
How Oxide cuts data center power consumption in half

195 points - 11/21/2024

  • KenoFischer

    11/22/2024

    I really love Oxide to an unhealthy amount (it's become a bit of a meme among my colleagues), but sometimes I do wonder whether they went about their go-to-market the right way. They really tried to do everything at once - custom servers, custom router, custom rack, everything. Their accomplishments are technologically impressive, but, as somebody who is in a position to make purchasing decisions, not economically attractive. They're 3x more expensive than our existing hardware, two generations behind (I'm aware they're on track for a refresh) and don't have any GPUs. E.g. what I would have loved to see is just an after-market BMC/NIC/firmware solution using their stack. Plug it into a cheap Gigabyte system (their BMC is pluggable and NIC is OCP) and just have the control plane manage it as a whole box. I'd have easily paid several thousand $ per server just for that. All the rack scale integration, virtualization, migration, network storage, etc stuff is cool, but not everyone needs it. Get your foot in the door at customers, build up some volume for better deals with AMD, and then start building the custom rack stuff ... Of course it's easy to be a critic from the sidelines. As I said, I do really love what the Oxide folks are doing, I just really hope it'll become possible for me to buy their gear at some point.

      • bcantrill

        11/22/2024

        First, thanks for the love -- it's deeply appreciated! Our go-to-market is not an accident: we spent a ton of time (too much time?) looking at how every company had endeavored (and failed) in this space, and then considering a bunch of other options besides. Plugging into a "cheap Gigabyte" system wouldn't actually allow us to build what we've built, and we know this viscerally: before we had our system built, we had to have hardware to build our software on -- which was... a bunch of cheap Gigabyte systems. We had the special pain of relearning all of the reasons why we took the approach we've taken: these systems are a non-starter with respect to foundation.

        You may very well not need the system that we have built, but lots of people do -- and the price point versus the alternatives (public cloud or on-prem commodity HW + pretty pricey SW) has proven to be pretty compelling. I don't know if we'll ever have a product that hits your price point (which sounds like... the cost of Gigabyte plus a few thousand bucks?), but at least the software is all open source!

          • KenoFischer

            11/22/2024

            Please forgive my tergiversation. I fully trust that you know your path and I know how annoying it is to be why-dont-they-just'd. As I said, I'm rooting for you.

              • m463

                11/22/2024

                > The meaning of TERGIVERSATION is evasion of straightforward action or clear-cut statement : equivocation

                  • KenoFischer

                    11/23/2024

                    There's two dictionary definitions of tergiversate. One is the one you quoted, the other is one of desertion. Both meanings of the word are pejorative in the sense that the word comes with a connotation of betrayal of a cause. What I wanted to express was an acknowledgement that I understood the feeling that you get when someone who's clearly a fan of your work nevertheless does not provide a clear endorsement. It's easy emotionally to dismiss people who "just don't get it". But when someone does get it but chooses to equivocate, that can feel like an emotional betrayal. So I was looking for a word that covered that with the right connotation. I originally used apostasy, but it didn't feel quite right, because I wasn't really renouncing, more failing to fully endorse, so tergiversation it was. Of course having to write an entire paragraph to explain your word choice kind of defeats the purpose of choosing a single well fitting word over just writing a sentence of simple words that explains what you mean. But hey, I write enough technical writing, documentation, reports, grants, etc. all day where clarity is paramount that I feel like I get to have a little vocabulary treat in my personal writing ;).

                • PeterCorless

                  11/22/2024

                  +1 for use of tergiversation

                • PeterCorless

                  11/22/2024

                  So my question: any Arm-based system or GPU-based system on the horizon?

              • alberth

                11/22/2024

                You just described why commodity servers won over engineered systems that came before Oxide (like Nutanix, Sun / Oracle Exa*, VCE etc).

                So I totally agree with your go-to-market comment, because it’s also a bet against cloud.

                I wish them luck though.

                  • panick21_

                    11/23/2024

                    And yet, none of the hyperscalers use commodity servers. They are buying parts from the OCP but those are hardly 'commodity' servers. So did they win?

                • chambers

                  11/22/2024

                  I kinda feel that their focus is more on building a great technology (& culture?) than a great business.

                  Not necessarily a bad choice; after all, for what shall it profit a man, if he shall gain the whole world, and lose his own soul?

                    • bcantrill

                      11/22/2024

                      We are definitely very much building a business! We have the iconoclastic belief that you can build a business by having a terrific team building a great product that customers love. And we're getting there![0]

                      [0] https://www.theregister.com/2024/11/18/llnl_oxide_compute/

                  • intelVISA

                    11/22/2024

                    Oxide are doing great work. Hoping they can probe the market a bit more for us out on the sidelines preparing to drop in and compete with some similar tech.

                    • preisschild

                      11/22/2024

                      I also wish I could get to play around with a cheaper version of their tech, but they probably have enough customers that really want a large-scale solution that is completely customizable

                      • cdchn

                        11/22/2024

                        I'm curious what their burn rate is.

                    • unsnap_biceps

                      11/21/2024

                      > When we started Oxide, the DC bus bar stood as one of the most glaring differences between the rack-scale machines at the hyperscalers and the rack-and-stack servers that the rest of the market was stuck with. That a relatively simple piece of copper was unavailable to commercial buyers

                      It seems that Oxide was founded in 2019 and Open Compute Project had been specifying DC bus bars for 6 years at that point. People could purchase racks if they wanted, but it seems like, by and large, people didn't care enough to go whole hog on it.

                      Wonder if the economics have changed or if it's still just neat but won't move the needle.

                        • walrus01

                          11/21/2024

                          Things like -48VDC bus bars in the 'telco' world significantly predate the OCP, all the way back to like 1952 in the Bell system.

                          In general, the telco world concept hasn't changed much. You have AC grid power coming from your local utility into some BIG ASS RECTIFIERS which create -48VDC (and are responsible for charging your BIG ASS BATTERY BANK to float voltage), then various DC fuses/breakers going to distribution of -48VDC bus bars powering the equipment in a CO.

                          Re: Open Compute, the general concept of what they did was go to a bunch of 1U/2U server power supply manufacturers and get them to make a series of 48VDC-to-12VDC power supplies (which can be 92%+ efficient), and cut out the need for legacy 5VDC feed from power supply into ATX-derived-design x86-64 motherboards.
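To make the efficiency argument concrete, here is a minimal sketch of how per-stage conversion efficiencies multiply along a power path. Only the 92% figure for the 48VDC-to-12VDC stage comes from the comment; the rectifier efficiency and the load are illustrative assumptions, not measurements.

```python
from math import prod

def chain_efficiency(stage_efficiencies):
    """Overall efficiency of conversion stages connected in series."""
    return prod(stage_efficiencies)

def input_power_watts(load_watts, stage_efficiencies):
    """Power drawn at the input in order to deliver load_watts at the output."""
    return load_watts / chain_efficiency(stage_efficiencies)

# 0.96 for the AC-to-48VDC rectifier is a hypothetical value.
# 0.92 is the "92%+" 48VDC-to-12VDC figure quoted in the comment.
stages = [0.96, 0.92]
load = 500.0  # hypothetical 12VDC load in watts

print(f"overall efficiency: {chain_efficiency(stages):.3f}")
print(f"input power for a {load:.0f} W load: {input_power_watts(load, stages):.1f} W")
```

Every extra conversion stage multiplies in another factor below 1, which is why removing a legacy rail or a redundant conversion step can matter at rack scale.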

                            • m463

                              11/23/2024

                              I remember seeing an old telephone switching system from the 20's and I think it was 48vdc. Uncertain though.

                                • ttyprintk

                                  11/23/2024

                                  Yeah, would have been 48 vdc for line operations, 60 and up AC for the ring.

                          • indrora

                            11/21/2024

                            You simply can't buy OCP hardware is part of the issue, not new anyway. What you're going to find is "OCP Inspired" hardware that has some overlap with the full OCP specification but is almost always meant to run on 240VAC on 19in racks because nobody wants to invest the money in something that can't be bought from CDW.

                              • p_l

                                11/22/2024

                                I remember the one time I had OCP hardware in a data center, and how it was essentially rumoured it's better to not ask too much how it got there - not the level of "fell off a truck", but some possibility it was ex-(big tech) equipment acquired through favours, or some really insistent negotiating with Quanta till "to be sold to (big tech)" racks ended up with us

                            • zamalek

                              11/21/2024

                              It's normally incredibly difficult for employees to disrupt at massive companies that would be the type which runs a data center. Disruption usually enters the corp in a sales deck, much like the one Oxide would have.

                              It's stupid, but that's why we all have jobs.

                                • hnthrowaway0328

                                  11/21/2024

                                  I think engineers should be more forceful in leading their own visions instead of being led by accountants and lawyers.

                                  After all, engineers have the power of implementation and de-implementation. They need to step into dirty politics and bend other people's views.

                                  It's either theirs or ours. Win-win is a fallacy.

                                    • andrewjf

                                      11/21/2024

                                      Being able to navigate this is what differentiates a very senior IC (principal, distinguished, etc) and random employees.

                                        • orochimaaru

                                          11/21/2024

                                          Yes. I think as an engineer at this level you need to also have the patience to deal with the bean counters.

                                          But as I’ve grown in my career I’ve actually found that line of thinking refreshing. Can you quantify benefit? If it requires too many assumptions it’s probably not worth it.

                                          But then again there’s always the VP or the SVP who wants to “showcase his towers’ innovative spirit” and then there goes money that could be used for better things. The innovative spirit of the day is random LLM apps.

                                      • philipov

                                        11/21/2024

                                        Let me know how that works out for you!

                                        • hinkley

                                          11/22/2024

                                          Once the accountants are convinced the entire company is about them, there’s not much the engineers can do. They just starve you out by refusing to buy anything. It’s a big reason why open source is as successful as it is. It’s free so they can’t stop you with the checkbook.

                                  • bigfatkitten

                                    11/21/2024

                                    OCP hardware is only really accessible to hyperscalers. You can't go out and just buy a rack or two, the Taiwanese OEMs don't do direct deals that small. Even if they did, no integration is done for you. You would have to integrate the compute hardware from one company, the network fabric from another company, and then the OS and everything else from yet another. That's a lot of risk, a lot of engineering resources, a lot of procurement overhead, and a lot of different vendors pointing fingers at each other when something doesn't work.

                                    If you're Amazon or Google, you can do this stuff yourself. If you're a normal company, you probably won't have the inhouse expertise.

                                    On the other hand, Oxide sells a turnkey IaaS platform that you can just roll off the pallet, plug in and start using immediately. You only need to pay one company, and you have one company to yell at if something goes wrong.

                                    You can buy a rack of 1-2U machines from Dell, HPE or Cisco with VMware or some other HCI platform, but you don't get that power efficiency or the really nice control plane Oxide have on their platform.

                                      • leoc

                                        11/22/2024

                                        But isn’t it a little surprising (I’m not an expert) that Dell or Supermicro or some firm like that hadn’t already started offering approachable access to either OCP gear or a proprietary knockoff of it? Presumably that may still happen if Oxide is seen to have proven the market.

                                          • kjellsbells

                                            11/22/2024

                                            Azure tried this, not with their hyperscaler stuff, but with Azure Operator Nexus.

                                            Basically an "opinionated" combination of Dell, Arista, and Pure storage with a special Azure AKS running on top and a metric ton of management and orchestration smarts. The target customer base was telcos who needed local capabilities in their data centers and who might otherwise have gone to OCP.

                                            As far as I can surmise, it's dead, but not EOLed. Microsoft nuked the operator business unit earlier in the year, and judging by recent job postings from contract shops, AT&T might be the only customer.

                                            • panick21_

                                              11/23/2024

                                              These companies are locked into their way of doing things. Also, they would be competing with themselves. It would also require more work on their side than they do now.

                                              I think the whole 'existing company is not doing something, therefore it's a bad idea' is a really dangerous take.

                                              Oxide is also not exactly OCP; they share some aspects, but Oxide racks are optimized for the typical DC of large organizations. Maybe there is a balance there that matters.

                                              • unsnap_biceps

                                                11/22/2024

                                                Supermicro does sell OCP racks.

                                                https://www.supermicro.com/solutions/Solution-Brief-Supermic...

                                                I recall them offering older versions of the specs but can't easily find a reference, so I might be wrong about how accessible they were.

                                        • Sylamore

                                          11/22/2024

                                          HP BladeSystem p-series chassis were all DC bus bar powered back in the mid 2000s. You had a power enclosure which provided DC output to one or more chassis in a rack over the bus bar. We were glad to be rid of those blades but it wasn't because of their power configuration.

                                          • TZubiri

                                            11/21/2024

                                            One is the specs and the other is an actual implementation, what am I missing?

                                        • walrus01

                                          11/21/2024

                                          They do have a good point here. If you do the total power budget on a typical 1U (discrete chassis, not blade) server which is packed full of a wall of 40mm fans pushing air, the highest speed screaming 40mm 12VDC fans can be 20W electrical load each. It's easy to "spend" at least 120W at maximum heat from the CPUs, in a dual socket system, just on the fans to pull air from the front/cold side of the server through to the rear heat exhaust.

                                          Just going up to 60mm or 80mm standard size DC fans can be a huge efficiency increase in watt-hours spent per cubic meters of air moved per hour.

                                          I am extremely skeptical of the "12x" but using larger fans is more efficient.

                                          from the URL linked:

                                          > Bigger fans = bigger efficiency gains Oxide server sleds are designed to a custom form factor to accommodate larger fans than legacy servers typically use. These fans can move more air more efficiently, cooling the systems using 12x less energy than legacy servers, which each contain as many as 7 fans, which must work much harder to move air over system components.
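The fan budget in the comment above can be sketched as a few lines of arithmetic. The six-fan count is inferred from the 20 W and 120 W figures, and the 600 W non-fan load is a purely hypothetical number chosen for illustration.

```python
def fan_power_watts(fan_count, watts_per_fan):
    """Total electrical load of the fan wall at full speed."""
    return fan_count * watts_per_fan

def fan_share(fan_watts, other_watts):
    """Fraction of the server's total draw that goes to moving air."""
    return fan_watts / (fan_watts + other_watts)

# Six fans at 20 W each reproduces the 120 W mentioned in the comment.
fans = fan_power_watts(6, 20.0)

# 600 W for CPUs, memory, and drives is a hypothetical figure.
share = fan_share(fans, 600.0)

print(f"fan load: {fans:.0f} W, {share:.1%} of total draw")
```

Under these assumed numbers roughly a sixth of the server's draw goes to fans alone, which is the overhead that larger, slower fans aim to reduce.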

                                            • eaasen

                                              11/21/2024

                                              FWIW, we had to have the idle speed of our fans lowered because the usual idle of around 5k RPM was WAY too much cooling. We generally run our fans at around 2.5kRPM (barely above idle). This is due to not only the larger fans, but also the fact that we optimized and prioritized as little restriction on airflow as possible. If you’ve taken apart a current gen 1U/2U server and then compare that to how little our airflow is restricted and how little our fans have to work, the 12X reduction becomes a bit clearer.
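As a rough illustration of why running fans slower saves so much, the idealized fan affinity laws say that for a given fan, airflow scales linearly with speed, pressure with its square, and power with its cube. This is a textbook approximation, not Oxide's measured data, and it does not by itself account for the 12x figure, which also depends on fan size and airflow restriction.

```python
def affinity_scale(rpm_old, rpm_new):
    """Idealized fan affinity laws for one fan in a fixed system.

    Returns the factors by which airflow, static pressure, and shaft
    power change when speed goes from rpm_old to rpm_new.
    """
    ratio = rpm_new / rpm_old
    return {
        "airflow": ratio,        # flow is proportional to speed
        "pressure": ratio ** 2,  # pressure is proportional to speed squared
        "power": ratio ** 3,     # power is proportional to speed cubed
    }

# The 5k and 2.5k RPM figures are the ones quoted in the comment.
scale = affinity_scale(5000, 2500)
for name, factor in scale.items():
    print(f"{name}: x{factor:.3f}")
```

Halving the speed gives half the airflow for roughly one eighth of the power under this model, so a design that needs less airflow per watt of heat benefits disproportionately.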

                                                • znpy

                                                  11/23/2024

                                                  > the usual idle of around 5k RPM was WAY too much cooling.

                                                  What does this mean? Can one actually get too much cooling? Do you get like condensation and stuff, that kind of "too much cooling" ?

                                                  I'm not being snarky, i actually don't know.

                                                    • kardos

                                                      11/24/2024

                                                      It must mean cooling significantly below the target temperature, and thus wasting power to do it

                                                        • znpy

                                                          11/24/2024

                                                          I see, thank you!

                                          • LeoPanthera

                                            11/21/2024

                                            I really wish Oxide had homelab/prosumer grade stuff. I'd be sending them so much money.

                                              • hinkley

                                                11/22/2024

                                                I kinda feel we need minicomputers back in this age of computing. Instead of making one giant rack that doesn’t fit through doorways, they should make a 4 ft tall unit that stacks. At least once they’re established enough that they can manage doing small installs instead of full data centers. I’ve looked around and there are tiny forklifts they could use to install 2 at once.

                                                Just the power demands for their full rack exceed capacity for most office spaces.

                                                That and someone needs to make a rack that has a port to plug a glycol line directly into. Doesn’t have to be Oxide, but someone should.

                                                • VTimofeenko

                                                  11/22/2024

                                                  A ~20U rack working off residential 15/20A would have been so cool.

                                                  Though given how it's designed for the datacenters, I'd expect the thing to be pretty darn loud.
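For a sense of what "residential 15/20A" allows, here is a minimal sketch of the power budget. It assumes a 120 V North American circuit and the common practice of limiting continuous loads to 80% of the breaker rating; both are assumptions, as is splitting the budget evenly across 20U.

```python
def continuous_watts(volts, breaker_amps, derate=0.8):
    """Usable continuous power on a branch circuit.

    derate=0.8 reflects the usual 80% limit for continuous loads;
    treat it as an assumption and check local electrical code.
    """
    return volts * breaker_amps * derate

RACK_UNITS = 20  # the "~20U" rack from the comment

for amps in (15, 20):
    budget = continuous_watts(120, amps)
    print(f"{amps} A circuit: {budget:.0f} W total, "
          f"{budget / RACK_UNITS:.0f} W per U")
```

Under these assumptions the whole rack would have to fit in roughly 1.4 to 1.9 kW, far below what a full-size data center rack is built to draw.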

                                                    • steveklabnik

                                                      11/22/2024

                                                      > Though given how it's designed for the datacenters, I'd expect the thing to be pretty darn loud.

                                                      It's actually very much the opposite: the rack is very, very quiet. You can hear for yourself: https://www.youtube.com/watch?v=bYcgPRIWf6I

                                                        • VTimofeenko

                                                          11/22/2024

                                                          That is quiet, indeed! Have you done any decibel measurements by any chance? I wonder how loud it would be when compared to just ambient residential noise level.

                                                            • steveklabnik

                                                              11/22/2024

                                                              I don't remember off the top of my head.

                                                              It's quiet enough that one customer is putting one just straight-up on their office floor, rather than in a colo somewhere. I've stood next to one in our office (which is a big garage, no soundproofing, so sound otherwise bounces around a lot) and had conversations easily.

                                                                • VTimofeenko

                                                                  11/22/2024

                                                                  Thanks for the info, it would definitely pass the WAF gate from that perspective :)

                                                  • TabTwo

                                                    11/22/2024

                                                    Isn't most of their stuff open source?

                                                      • steveklabnik

                                                        11/22/2024

                                                        It is, but if you're running on different hardware than us, you'd have to do a bunch of porting. Buying a solution would be a lot simpler, as we'd have already done the porting.

                                                          • Gormo

                                                            11/22/2024

                                                            Have you thought of building an affordable small-scale product for home labs and maybe SMBs? Even if that line didn't turn a profit, it could function as a loss leader in getting engineers and consultants familiar with Oxide, and an opportunity to experiment with (and ultimately evangelize) your tech stack without needing to already have an enterprise-scale use case.

                                                              • steveklabnik

                                                                11/23/2024

                                                                In general, we love the love we get from homelab folks, but the issue is that the current thesis of our designs is "take advantage of the scale of building at the full-rack level."

                                                                We really can't afford to do loss leaders before we have more of a business. It's already difficult enough to build a company like this, and that's with making money off of sales. I fully agree that in general, this idea completely makes sense, but you can only really employ it once you have a business to be able to absorb those losses. Right now, building and selling the current product takes up 110% of our time.

                                                                  • Gormo

                                                                    11/23/2024

                                                                    I respect that, and I hope you get to that point! As a tech leader in an organization that currently falls short of the scale we'd need to justify Oxide products, I'm hoping that day comes soon.

                                                                    We're getting to the point where people are building large clusters of Raspberry Pis and the like for hobbyist projects, so I hope that within a few years, the concept of "full-rack level" can encompass hardware with hundreds of nodes small and cheap enough to be packed into a "rack" that still fits under a desk and sells for a couple grand.

                                                                    In the meantime, I guess I'll have to settle for exploring your code and listening to your podcast!

                                                • renewiltord

                                                  11/21/2024

                                                  What I don't get is why tie to such an ancient platform. AMD Milan is my home lab. The new 9004 Epycs are so much better on power efficiency. I'm sure they've done their market research and the gains must be so significant. We used to have a few petabytes and tens of thousands of cores almost ten years ago and it's crazy how much higher data and compute density you can get with modern 30 TiB disks and Epyc 9654s. 100 such nodes and you have 10k cores and really fast data. I can't see myself running a 7003-series datacenter anymore unless the Oxide gains are that big.
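A quick check of the "100 such nodes and you have 10k cores" figure, using the 96 cores per EPYC 9654 socket. The socket count and the number of disks per node are assumptions, since the comment does not state them.

```python
CORES_PER_EPYC_9654 = 96  # cores per socket for this part

def cluster_cores(nodes, sockets_per_node):
    """Total physical cores across the cluster."""
    return nodes * sockets_per_node * CORES_PER_EPYC_9654

def cluster_raw_pib(nodes, disks_per_node, tib_per_disk=30):
    """Raw (pre-redundancy) capacity in PiB, using the 30 TiB disks mentioned."""
    return nodes * disks_per_node * tib_per_disk / 1024

print(cluster_cores(100, 1))  # single-socket nodes
print(cluster_cores(100, 2))  # dual-socket nodes

# 12 disks per node is a hypothetical chassis configuration.
print(f"{cluster_raw_pib(100, 12):.1f} PiB raw")
```

Single-socket nodes give 9,600 cores, which matches the "10k" in the comment; dual-socket nodes would double that.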

                                                    • farawayea

                                                      11/21/2024

                                                      They've built this a while ago. A hardware refresh takes time. The good news is that they may be able to upgrade the existing equipment with newer sleds.

                                                        • jclulow

                                                          11/21/2024

                                                          Yes we're definitely building the next generation of equipment to fit into the existing racks!

                                                      • znpy

                                                        11/23/2024

                                                        my understanding is that they had to build not only the entire hardware platform from scratch, but also the software.

                                                        in one of his talks Bryan Cantrill talks about how AMD cpus were meant to be booted off a uefi microcode, and AMD themselves told them such... Until they kinda reverse engineered the AGESA thingy and made the cpu boot without bios/uefi.

                                                        I guess that's the kind of thing that takes a lot of time... the first time. In the future they'll likely be iterating faster.

                                                        EDIT: i wrote the comment above to the best of my knowledge, somebody from Oxide might chime in and maybe add some more details :)

                                                    • zcw100

                                                      11/21/2024

                                                      I believe the telcos did DC power for years so I don’t think this is anything new. Any old hands out there want to school us on how it was done in the old days?

                                                        • iamthepieman

                                                          11/21/2024

                                                          Every old telco technician had a story about dropping a wrench on a busbar or other bare piece of high powered transmission equipment and having to shut that center down, get out the heavy equipment, and cut it off because the wrench had been welded to the bus bars.

                                                            • jclulow

                                                              11/21/2024

                                                              Note that the rack doesn't accept DC input, like lots of (e.g., NEBS certified) telco equipment. There's a bus bar, but it's enclosed within the rack itself. The rack takes single- or three-phase AC inputs to power the rectifiers, which are then attached to the internal bus bar.

                                                          • walrus01

                                                            11/21/2024

                                                            big ass rectifiers

                                                            big ass solid copper busbars

                                                            huge gauge copper cables going around a central office (google "telcoflex IV")

                                                            big DC breaker/fuse panels

                                                            specialized dc fuse panels for power distribution at the top of racks, using little tiny fuses

                                                            100% overhead steel ladder rack type cable trays, since your typical telco CO was never a raised floor type environment (UNLIKE legacy 1960s/1970s mainframe computer rooms), so all the power was kept accessible by a team of people working on stepladders.

                                                            The same general thing continues today in serious telco/ISP operations, with tech features to bring it into the modern era. The rectifiers are modular now, and there's also rectiverters. Monitoring is much better. People are moving rapidly away from wet cell 2V lead acid battery banks and AGM sealed lead acid stuff to LiFePo4 battery systems.

                                                            DC fuse panels can come with network-based monitoring, ability to turn on/off devices remotely.

                                                            equipment is a whole lot less power hungry now, a telco CO that has decommed a 5ESS will find itself with a ton of empty thermal and power budget.

                                                            when I say serious telco stuff is a lot less power hungry, it's by huge margins. randomly chosen example of radio transport equipment. For instance back in the day a powerful, very expensive point to point microwave radio system might be a full 42U rack, 800W in load, with waveguide going out to antennas on a roof. It would carry one, two or three DS3 equivalent of capacity (45 Mbps each).

                                                            now, that same telco might have a radio on its CO roof in the same microwave bands that is 1.3 Gbps FDD capacity, pure ethernet with a SFP+ fiber interface built into it, and the whole radio is a 40W electrical load. The radio is mounted directly on the antenna with some UV/IR resistant weatherproof 16 gauge DC power cable running down into the CO and plugged into a fuse panel.
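The "huge margins" in the two radio examples can be put into numbers with a short sketch. All inputs are taken from the comment (800 W for up to three DS3s at 45 Mbps each, versus 40 W for 1.3 Gbps); the choice of the three-DS3 case for the old system is an assumption that favors the older equipment.

```python
def watts_per_mbps(load_watts, capacity_mbps):
    """Electrical load divided by carried capacity."""
    return load_watts / capacity_mbps

# Old system: full 42U rack, 800 W, three DS3s at 45 Mbps each (best case).
old = watts_per_mbps(800.0, 3 * 45.0)

# New system: 40 W radio with 1.3 Gbps of capacity.
new = watts_per_mbps(40.0, 1300.0)

print(f"old: {old:.2f} W/Mbps")
print(f"new: {new:.4f} W/Mbps")
print(f"improvement: about {old / new:.0f}x")
```

Even giving the old radio its maximum three DS3s, the modern unit comes out close to 200 times more efficient per megabit under these figures.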

                                                              • applied_heat

                                                                11/22/2024

                                                                Can you give me a link to this 1.3 gbps radio product? I have some Alcatel radios with waveguides on a licensed band that only do 50 megabit that I would upgrade if there was something that could get more bits out of the same bandwidth and towers.

                                                                  • walrus01

                                                                    11/22/2024

                                                                    Ceragon is one brand name. If you need to keep an entirely indoor unit radio in a rack with the existing waveguide it'll cost a little more, since that's a more rare configuration for new 4096QAM modulation radios.

                                                                    The 1.3 Gbps full duplex capacity assumes dual linear H&V polarization simultaneously, and assumes an 80 MHz wide FDD channel split such as in the 11 GHz high/low band plan. If you're in FCC Part 101 regulatory band territory, then depending on what frequency your existing radios use and your existing path, you might not have that capacity. You could have an existing 40 MHz wide channel, which will be half the capacity.

                                                                    If you have a 50 Mbps radio product it's also very likely you're in a single polarity so you would need to recoordinate the path (around $1500) entirely to get the same MHz in the opposite polarity.
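
                                                                    As a back-of-the-envelope sketch of those caveats: the snippet below scales the quoted 1.3 Gbps figure by channel width and polarization count. Linear scaling at the same modulation is an assumption for illustration (the 40 MHz case matches the "half the capacity" statement above, the single-polarization case is extrapolated), and the function name is hypothetical. Real capacity depends on the radio, the modulation the path supports, and coding overhead.

```python
REFERENCE_MBPS = 1300.0       # quoted full duplex capacity
REFERENCE_WIDTH_MHZ = 80.0    # quoted channel width
REFERENCE_POLARIZATIONS = 2   # dual linear H&V


def estimated_capacity_mbps(width_mhz: float, polarizations: int) -> float:
    """Scale the quoted reference capacity linearly (an assumption)."""
    if width_mhz <= 0 or polarizations not in (1, 2):
        raise ValueError("width must be positive and polarizations 1 or 2")
    return (REFERENCE_MBPS
            * (width_mhz / REFERENCE_WIDTH_MHZ)
            * (polarizations / REFERENCE_POLARIZATIONS))


for width, pols in [(80, 2), (40, 2), (40, 1)]:
    print(f"{width} MHz, {pols} polarization(s): "
          f"{estimated_capacity_mbps(width, pols):.0f} Mbps")
```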

                                                                    • EvanAnderson

                                                                      11/22/2024

                                                                      I don't have a link handy (on my phone), but I was involved in installs of licensed Cambium 18 GHz radios last year that were pushing >1 Gbps. PTP-800 was the model number, if memory serves.

                                                              • hinkley

                                                                11/22/2024

                                                                The first large scale app I did we got offices in a building that used to have telco equipment in it. There wasn’t enough power or cooling to run about a rack worth of equipment split across several. It basically had a mini-split for AC. We had to bring in new wiring and run a glycol line to a condenser on the roof, and the smallest unit we were willing to pay for was too big so we had to knock out a wall to tack a reasonable sized office onto the end to get the volume large enough. So much wasted space for the amount of equipment in there.

                                                            • farawayea

                                                              11/21/2024

                                                              Their tech may be more than adequate today. Bigger businesses may not buy from a small startup company. They expect a lot more. Illumos is a less popular OS. It wouldn't be the first choice for the OS I'd rely on. Who writes the security mitigations for speculative execution bugs? Who patches CVEs in the shipped software which doesn't use Rust?

                                                                • AlotOfReading

                                                                  11/21/2024

                                                                  The answer to "who does X" is Oxide. That's the point. You're not going to Dell who's integrating multiple vendors in the same box in a way that "should" work. You're getting a rack where everything is designed to work together from top to bottom.

                                                                  The goal is that you can email Oxide and they'll be able to fix it regardless of where it is in the stack, even down to the processor ROM.

                                                                    • toomuchtodo

                                                                      11/21/2024

                                                                      This. If you want on prem cloud infra without having to roll it yourself, Oxide is the solution.

                                                                      (no affiliation, just a fan)

                                                                        • carlhjerpe

                                                                          11/21/2024

                                                                          If you want on prem infra in exactly the shape and form Oxide delivers*

                                                                          I've read and understood from Joyent and SmartOS that they believe fault tolerant block devices / filesystems is the wrong abstraction, your software should handle losing storage.

                                                                            • eaasen

                                                                              11/21/2024

                                                                              We do not put the onus on customers to tolerate data loss. Our storage is redundant and spread through the rack so that if you lose drives or even an entire computer, your data is still safe. https://oxide.computer/product/storage


                                                                                • panick21_

                                                                                  11/23/2024

                                                                                  They have partly changed their position on that. You can listen to their podcast on their distributed block storage solution.

                                                                          • yencabulator

                                                                            11/22/2024

                                                                            And a big enough customer will evaluate Oxide's resources and consider for themselves whether they think Oxide can provide a quick enough turnaround for everything. That's what GP is talking about.

                                                                        • throw0101d

                                                                          11/21/2024

                                                                          > Bigger businesses may not buy from a small startup company.

                                                                          What would you classify Shopify as?

                                                                          > One existing Oxide user is e-commerce giant Shopify, which indicates the growth potential for the systems available.

                                                                          * https://blocksandfiles.com/2024/07/04/oxide-ships-first-clou...

                                                                          Their CEO has tweeted about it:

                                                                          * https://twitter.com/tobi/status/1793798092212367669

                                                                          > Who writes the security mitigations for speculative execution bugs? Who patches CVEs in the shipped software which doesn't use Rust?

                                                                          Oxide.

                                                                          This is all a pre-canned solution: just use the API like you would an off-prem cloud. Do you worry about AWS patching stuff? And how many people purchasing 'traditional' servers from Dell/HPe/Lenovo worry about patching things like the LOM?

                                                                          Further, all of Oxide's stuff is on Github, so you're in better shape for old stuff, whereas if the traditional server vendors EO(S)L something firmware-wise you have no recourse.

                                                                            • cdchn

                                                                              11/22/2024

                                                                              How much did Shopify buy? Sounds like from what the CEO is saying they bought 1 unit.

                                                                              >We learned that Oxide has so far shipped “under 20 racks,” which illustrates the selective markets its powerful systems are aimed at.

                                                                              >B&F understands most of those systems were deployed as single units at customer sites. Therefore, Oxide hopes these and new customers will scale up their operations in response to positive outcomes.

                                                                              Yikes. If they sold 20 racks in July, how many are they up to now?

                                                                          • packetlost

                                                                            11/21/2024

                                                                            Illumos is the OS for the hypervisor and core services, they don't expect their customers to run their code directly on that OS, but inside VMs.

                                                                            • steveklabnik

                                                                              11/21/2024

                                                                              > Bigger businesses may not buy from a small startup company.

                                                                              Our early customers include government, finance, and places like Shopify.

                                                                              You’re not wrong that some places may prefer older companies but that doesn’t mean they all do.

                                                                              Illumos is not really directly relevant to the customer, it’s a non user facing implementation detail.

                                                                              We provide security updates.

                                                                              • mycoliza

                                                                                11/21/2024

                                                                                We write the security mitigations. We patch the CVEs. Oxide employs many, perhaps most, of the currently active illumos maintainers --- although I don't work on the illumos kernel personally, I talk to those folks every day.

                                                                                A big part of what we're offering our customers is the promise that there's one vendor who's responsible for everything in the rack. We want to be the responsible party for all the software we ship, whether it's firmware, the host operating system, the hypervisor, and everything else. Arguably, the promise that there's one vendor you can yell at for everything is a more important differentiator for us than any particular technical aspect of our hardware or software.

                                                                                • sunshowers

                                                                                  11/21/2024

                                                                                  The illumos bare-metal OS is not directly visible to customers.

                                                                              • throw0101d

                                                                                11/21/2024

                                                                                See perhaps "Oxide Cloud Computer Tour - Rear":

                                                                                * https://www.youtube.com/watch?v=lJmw9OICH-4

                                                                                • arpinum

                                                                                  11/21/2024

                                                                                  How long before a VPS pops up running Oxide racks? Or, why wouldn't a VPS build on top of Oxide if they offer better efficiency and server management?

                                                                                    • steveklabnik

                                                                                      11/22/2024

                                                                                      Someone could if they wanted to! We’ll see if anyone does.

                                                                                      • INTPenis

                                                                                        11/21/2024

                                                                                        Because they use such esoteric software that you'll forever be reliant on Oxide.

                                                                                        I'd rather they use more standardized open source software like Linux, Talos, k8s, Ceph, KubeVirt. Instead of rolling it all themselves on an OS that has a very small niche ecosystem.

                                                                                          • AceJohnny2

                                                                                            11/22/2024

                                                                                            Oxide is providing an x86 platform to run VMs/containers on. That's a commoditized market.

                                                                                            The value they're offering is that the rack-level consumption and management is improved over the competition, but you should be able to run whatever you want on the actual compute, k8s or whatnot.

                                                                                            This also means you'd not be forever reliant on Oxide.


                                                                                    • louwrentius

                                                                                      11/21/2024

                                                                                      I’m rooting for solutions like this as an alternative to the public cloud. I do see that an org would rely on one company that theoretically can do a ‘Broadcom VMware’ on them but I don’t get this vibe from 0x1d3 at all.

                                                                                      But they target large orgs, I wish a solution like this would be accessible for smaller companies.

                                                                                      I wish I could throw their stack on my second hand cots hardware, rent a few U’s in two colos for geo redundancy and cry of happiness each month realizing how much money we save on public cloud cost, yet having cloud capabilities/benefits

                                                                                      • huijzer

                                                                                        11/22/2024

                                                                                        > Here’s a sobering thought: today, data centers already consume 1-2% of the world’s power, and that percentage will likely rise to 3-4% by the end of the decade.

                                                                                        I don't get this marketing angle. I've made arguments here before that the cost of compute from an energy perspective is often negligible. If Google Maps, for example, can save you 1 mile due to better routing, then that is several orders of magnitude more efficient [1].

                                                                                        If it uses less resources, it uses less resources. Everybody (businesses and individuals) loves that.

                                                                                        [1]: https://news.ycombinator.com/threads?id=huijzer&next=4206549...

                                                                                          • adgjlsfhk1

                                                                                            11/22/2024

                                                                                            both are true. using computers to reduce emissions is good, and reducing computer emissions is good.

                                                                                        • grecy

                                                                                          11/21/2024

                                                                                          I'm amazed Apple don't have a rack mount version of their M series chips yet.

                                                                                          Even for their own internal use in their data centers they'd have to save an absolute boat load on power and cooling given their performance per watt compared to legacy stuff.

                                                                                            • bayindirh

                                                                                              11/21/2024

                                                                                              Oxide is not touching DLC systems in their post even with a 100ft barge pole.

                                                                                              Lenovo's DLC systems use 45 degrees C water to directly cool the power supplies and the servers themselves (water goes through them) for > 97% heat transfer to water. In cooler climates, you can just pump this to your drycoolers, and in winter you can freecool them with just air convection.

                                                                                              Yes, the TDP doesn't go down, but cooling efficiency shoots up considerably, cutting cooling costs and reducing PUE to 1.03 levels. You can put a tremendous amount of compute or GPU power in one rack, and cool it efficiently.

                                                                                              Every chassis handles its own power, but IIRC, all the chassis electricity is DC, and the PSUs are extremely efficient.
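
                                                                                              For readers unfamiliar with the metric: power usage effectiveness is total facility power divided by IT equipment power, so 1.03 means about 3% overhead on top of the IT load. A minimal sketch follows. The 100 kW IT load and the 1.5 comparison value are hypothetical numbers picked for illustration, not figures from Lenovo or Oxide.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, distribution losses) implied by a PUE."""
    return it_load_kw * (pue_value - 1.0)


IT_LOAD_KW = 100.0     # hypothetical IT load
DLC_PUE = 1.03         # figure quoted in the comment above
COMPARISON_PUE = 1.5   # hypothetical less efficient facility

for label, value in [("PUE 1.03", DLC_PUE), ("PUE 1.5", COMPARISON_PUE)]:
    print(f"{label}: {overhead_kw(IT_LOAD_KW, value):.1f} kW of overhead "
          f"on a {IT_LOAD_KW:.0f} kW IT load")
```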

                                                                                                • hinkley

                                                                                                  11/22/2024

                                                                                                  The in case PSUs I’ve seen them gesturing to in videos don’t even seem to have cooling fins on them.

                                                                                              • walrus01

                                                                                                11/21/2024

                                                                                                Companies buying massive cloud scale server hardware want to be able to choose from a dozen different Taiwanese motherboard manufacturers. Apple is in no way motivated to release or sell the M3/M4 CPUs as a product that major east asia motherboard manufacturers can design their own platform for. Apple is highly invested in tightly integrated ecosystems where everything is soldered down together in one package as a consumer product (take a look at a macbook air or pro motherboard for instance).

                                                                                                  • vineyardmike

                                                                                                    11/24/2024

                                                                                                    …Apple has made rack-mounted computers in recent history. They don’t sell chips, they sell complete boxes with rack mount hardware, motherboard and all.

                                                                                                    https://www.apple.com/shop/product/G1720LL/A/Refurbished-Mac...

                                                                                                      • walrus01

                                                                                                        11/26/2024

                                                                                                        An extremely niche product for things like video editing studios, not something you can deploy at scale in colocation/datacenter environments. Literally never seen rackmounted apple hardware in a serious datacenter since the apple xserve 20 to 22 years ago.

                                                                                                • rincebrain

                                                                                                  11/21/2024

                                                                                                  I don't think they'd admit much about it even if they had one internally, both because Apple isn't known for their openness about many things, and because they already exited the dedicated server hardware business years ago, so I think they're likely averse to re-entering it without very strong evidence that it would be beneficial for more than a brief period.

                                                                                                  In particular, while I'd enjoy such a device, Apple's whole thing is their whole-system integration and charging a premium because of it, and I'm not sure the markets that want to sell people access to Apple CPUs will pay a premium for a 1U over shoving multiple Mac Minis in the same 1U footprint, especially if they've already been doing that for years at this point...

                                                                                                  ...I might also speculate that if they did this, they'd have a serious problem, because if they're buying exclusive access to all TSMC's newest fab for extended intervals to meet demand on their existing products, they'd have issues finding sources to meet a potentially substantial demand in people wanting their machines for dense compute. (They could always opt to lag the server platforms behind on a previous fab that's not as competed with, of course, but that feels like self-sabotage if they're already competing with people shoving Mac Minis in a rack, and now the Mac Minis get to be a generation ahead, too?)

                                                                                                    • AceJohnny2

                                                                                                      11/21/2024

                                                                                                      I will add that consumer macOS is a piss-poor server OS.

                                                                                                      At one point, for many years, it would just sometimes fail to `exec()` a process. This would manifest as a random failure on our build farm about once/twice a month. (This would manifest as "/bin/sh: fail to exec binary file" because the error type from the kernel would have the libc fall back to trying to run the binary as a script, as normal for a Unix, but it isn't a script)

                                                                                                      This is likely stemming from their exiting the server business years ago, and focusing on consumer appeal more than robustness (see various terrible releases, security- and stability-wise).

                                                                                                      (I'll grant that macOS has many features that would make it a great server OS, but it's just not polished enough in that direction)
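
                                                                                                      The script fallback mentioned above is standard behavior for `execvp()`/`execlp()`: when `execve()` fails with `ENOEXEC`, the file is handed to `sh` instead. Below is a rough Python sketch of that logic. Python's `subprocess` does not do this fallback on its own, so it is spelled out by hand; the helper names are made up, and the demo assumes a POSIX system with `/bin/sh` and a temp directory that allows execution.

```python
import errno
import os
import stat
import subprocess
import tempfile


def run_with_sh_fallback(path: str) -> str:
    """Run `path`; if the kernel rejects it with ENOEXEC, retry via /bin/sh."""
    try:
        result = subprocess.run([path], capture_output=True, text=True, check=True)
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise
        # Same idea as execvp(3): treat the file as a shell script.
        result = subprocess.run(["/bin/sh", path], capture_output=True,
                                text=True, check=True)
    return result.stdout


def demo() -> str:
    # No "#!" line and no binary header, so execve() fails with ENOEXEC.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
        handle.write("echo hello from sh\n")
        script = handle.name
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    try:
        return run_with_sh_fallback(script)
    finally:
        os.unlink(script)


if __name__ == "__main__":
    print(demo(), end="")
```

                                                                                                      When the file really is a binary, the second step is what produces the confusing shell error instead of the original exec failure.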

                                                                                                        • AceJohnny2

                                                                                                          11/21/2024

                                                                                                          > as normal for a Unix

                                                                                                          veering offtopic, did you know macOS is a certified Unix?

                                                                                                          https://www.opengroup.org/openbrand/register/brand3581.htm

                                                                                                          As I recall, Apple advertised macOS as a Unix without such certification, got sued, and then scrambled to implement the required features to get certification as a result. Here's the story as told by the lead engineer of the project:

                                                                                                          https://www.quora.com/What-goes-into-making-an-OS-to-be-Unix...

                                                                                                            • jorams

                                                                                                              11/21/2024

                                                                                                              This comes up rather often, and on the last significant post about it I saw on HN someone pointed out that the certification is kind of meaningless[1]. macOS poll(2) is not Unix-compliant, hasn't been since forever, yet every new version of macOS gets certified regardless.

                                                                                                              [1]: https://news.ycombinator.com/item?id=41823078

                                                                                                                • znpy

                                                                                                                  11/23/2024

                                                                                                                  lovely, i favorited that comment!

                                                                                                              • autoexecbat

                                                                                                                11/21/2024

                                                                                                                and Windows used to be certified for POSIX, but none of that matters these days if it's not bug-compatible with Linux

                                                                                                            • rincebrain

                                                                                                              11/21/2024

                                                                                                              Did that ever get fixed? That...seems like a pretty critical problem.

                                                                                                                • AceJohnny2

                                                                                                                  11/21/2024

                                                                                                                  Yes, it quietly stopped happening a few years ago, sometime since 2020.

                                                                                                              • outworlder

                                                                                                                11/21/2024

                                                                                                                > I will add that consumer macOS is a piss-poor server OS.

                                                                                                                Windows is also abysmal but it hasn't stopped people from using it.

                                                                                                                But yes, it is too much of a desktop OS.

                                                                                                                  • toast0

                                                                                                                    11/22/2024

                                                                                                                    I wouldn't run a Windows server, but at least it can manage a SYN flood, whereas MacOS doesn't have syncookies or similar (their version of pf has the syncookie keyword, but it seems like it only works for traffic that transits the host, not for traffic that is terminated by the host). Windows also has some pretty nice stuff for servers like receive side scaling (afaik, Microsoft brought that to market, or at least was very early).

                                                                                                        • thatfrenchguy

                                                                                                          11/21/2024

                                                                                                          There is a rack mount version of the Mac Pro you can buy

                                                                                                            • bigfatkitten

                                                                                                              11/21/2024

                                                                                                              That's designed for the broadcast market, where they rack mount everything in the studio environment. It's not really a server, it has no out of band management, redundant power etc.

                                                                                                              There are third party rack mounts available for the Mac Mini and Mac Studio also.

                                                                                                                • wpm

                                                                                                                  11/22/2024

                                                                                                                  Rack mount models have LOM over MDM.

                                                                                                          • jauntywundrkind

                                                                                                            11/21/2024

                                                                                                            For who? How would this help their core mission?

                                                                                                            Maybe it becomes a big enough profit center to matter. Maybe. At the risk of taking focus away, splitting attention from the mission they're on today: building end user systems.

                                                                                                            Maybe they build them for themselves. For what upside? Maybe somewhat better compute efficiency maybe, but I think if you have big workloads the huge massive AMD Turin super-chips are going to be incredibly hard to beat.

                                                                                                            It's hard to emphasize just how efficient AMD is, with 192 very high performance cores on a 350-500W chip.

                                                                                                              • favorited

                                                                                                                11/21/2024

                                                                                                                > Maybe they build them for themselves. For what upside?

                                                                                                                They do build it for themselves. From their security blog:

                                                                                                                "The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot. We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing."

                                                                                                                <https://security.apple.com/blog/private-cloud-compute/>

                                                                                                                  • jauntywundrkind

                                                                                                                    11/22/2024

                                                                                                                    This is such a narrow narrow tiny corner of computing needs. That has such serious need for ownership, no matter the cost. And has extremely fantastically chill as shit overall computing needs, is as un-performance-sensitive as it gets.

                                                                                                                    I could not be less convinced by this information that this is a useful indicator for the other 99.999999999% of computing needs.

                                                                                                                      • favorited

                                                                                                                        11/22/2024

                                                                                                                        Good, because you can’t have one.

                                                                                                            • throawayonthe

                                                                                                              11/21/2024

                                                                                                              (some of?) their servers do run apple silicon: https://security.apple.com/blog/private-cloud-compute/

                                                                                                          • shivak

                                                                                                            11/21/2024

                                                                                                            > > The power shelf distributes DC power up and down the rack via a bus bar. This eliminates the 70 total AC power supplies found in an equivalent legacy server rack within 32 servers, two top-of-rack switches, and one out-of-band switch, each with two AC power supplies

                                                                                                            This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff. In general, the cost savings advertised by cloud infrastructure should be more holistic.

                                                                                                              • dralley

                                                                                                                11/21/2024

                                                                                                                >This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff.

                                                                                                                I'll happily take a single high quality power supply (which may have internal redundancy FWIW) over 70 much more cheaply made power supplies that stress other parts of my datacenter via sheer inefficiency, and also cost more in aggregate. Nobody drives down the highway with 10 spare tires for their SUV.

                                                                                                                  • shivak

                                                                                                                    11/21/2024

                                                                                                                    A DC busbar can propagate a short circuit across the rack, and DC circuit protection is harder than AC. So of course each server now needs its own current limiter, or a cheap fuse.

                                                                                                                    But I’m not debating the merits of this engineering tradeoff - which seems fine, and pretty widely adopted - just its advertisement. The healthcare industry understands the importance of assessing clinical endpoints (like mortality) rather than surrogate measures (like lab results). Whenever we replace “legacy” with “cloud”, it’d be nice to estimate the change in TCO.

                                                                                                                      • malfist

                                                                                                                        11/21/2024

                                                                                                                        DC circuit protection is absolutely not harder than AC. DC has the advantage in current flowing in only one direction, not two

                                                                                                                          • paddy_m

                                                                                                                            11/22/2024

                                                                                                                            Which makes it much harder to break the circuit vs AC

                                                                                                                              • wbl

                                                                                                                                11/22/2024

                                                                                                                                At 48 volts arcing shorts aren't the concern.

                                                                                                                    • fracus

                                                                                                                      11/21/2024

                                                                                                                      No one drives down the highway with one tire either.

                                                                                                                        • AcerbicZero

                                                                                                                          11/21/2024

                                                                                                                          Careful, unicyclists are an unforgiving bunch.

                                                                                                                      • hn-throw

                                                                                                                        11/21/2024

                                                                                                                        Let's say your high quality supply's yearly failure rate is 100 times less than the cheap ones

                                                                                                                        The probability of at least a single failure is 1-(1-r)^70.

                                                                                                                        This is quite high even w/out considering the higher quality of the one supply.

                                                                                                                        The probability of all 70 going down is

                                                                                                                        r^70 which is absurdly low.

                                                                                                                        Let's say r = 0.05 or one failed supply every 20 in a year.

                                                                                                                        1-(1-r)^70 = 97%

                                                                                                                        r^70 < 1E-91

                                                                                                                        The high quality supply has r = 0.0005, in between no failure and all failing. If your code can handle node failure, very many cheaper supplies appear to be more robust.

                                                                                                                        (Assuming uncorrelated events. YMMV)
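The arithmetic in the comment above can be checked with a short Python sketch. It uses the comment's own numbers (70 supplies, r = 0.05, and a rate 100 times lower for the single supply) and the same independence assumption. The variable names are illustrative only.

```python
# Sketch of the failure arithmetic above, assuming independent (uncorrelated) failures.
n = 70                    # cheap supplies in the legacy rack
r_cheap = 0.05            # assumed yearly failure rate of one cheap supply
r_good = r_cheap / 100    # "100 times less" for the single high quality supply

p_at_least_one = 1 - (1 - r_cheap) ** n   # some cheap supply fails during the year
p_all_fail = r_cheap ** n                 # every cheap supply fails during the year

print(f"at least one of {n} cheap supplies fails: {p_at_least_one:.1%}")
print(f"all {n} cheap supplies fail: {p_all_fail:.1e}")
print(f"the single high quality supply fails: {r_good:.2%}")
```

The comparison only holds if losing one supply costs one node. Correlated failures, such as a shared feed or a bad batch, would change the picture.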

                                                                                                                          • carlhjerpe

                                                                                                                            11/21/2024

                                                                                                                            Yeah but the failure rate of an analog piece of copper is pretty low, it'll keep being copper unless you do stupid things. You'll have multiple power supplies provide power on the same piece of copper

                                                                                                                              • hn-throw

                                                                                                                                11/21/2024

                                                                                                                                TL/DR, isn't there a single, shared, DC supply that supplies said piece of copper? Presumably connected to mains?

                                                                                                                                Or are they running on SOFCs?

                                                                                                                                  • mycoliza

                                                                                                                                    11/21/2024

                                                                                                                                    The big piece of copper is fed by redundant rectifiers. Each power shelf has six independent rectifiers which are 5+1 redundant if the rack is fully loaded with compute sleds, or 3+3 redundant if the rack is half-populated. Customers who want more redundancy can also have a second power shelf with six more rectifiers.
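To get a feel for what 5+1 versus 3+3 buys, here is a rough Python sketch of k-of-n redundancy. It assumes rectifiers fail independently and that `r` is the chance a given rectifier is out of service at some moment. Both the value of `r` and the function name `p_shelf_down` are hypothetical and not taken from Oxide.

```python
from math import comb

def p_shelf_down(n: int, needed: int, r: float) -> float:
    """Probability that fewer than `needed` of `n` independent units are working."""
    p_up = sum(
        comb(n, k) * (1 - r) ** k * r ** (n - k)   # exactly k units working
        for k in range(needed, n + 1)
    )
    return 1 - p_up

r = 0.05  # hypothetical per-rectifier unavailability, chosen only for illustration

print(f"5+1 (fully loaded rack): {p_shelf_down(6, 5, r):.4f}")
print(f"3+3 (half-populated rack): {p_shelf_down(6, 3, r):.6f}")
print(f"single supply, no redundancy: {p_shelf_down(1, 1, r):.4f}")
```

Because failed rectifiers can be swapped while the rack keeps running, a realistic `r` would reflect time-to-replace rather than a yearly failure rate, so real numbers should be far smaller than this illustration.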

                                                                                                                                      • hn-throw

                                                                                                                                        11/22/2024

                                                                                                                                        I'm going to assume this is on 3 phase power, but how is the ripple filtered?

                                                                                                                                          • applied_heat

                                                                                                                                            11/22/2024

                                                                                                                                            Inductors and capacitors usually

                                                                                                                    • sunshowers

                                                                                                                      11/21/2024

                                                                                                                      Look very carefully at the picture of the rack at https://oxide.computer/ :) there are two power shelves in the middle, not one.

                                                                                                                      We're absolutely aware of the tradeoffs here and have made quite considered decisions!

                                                                                                                      • jsolson

                                                                                                                        11/21/2024

                                                                                                                        The bus bar itself is an SPoF, but it's also just dumb copper. That doesn't mean that nothing can go wrong, but it's pretty far into the tail of the failure distribution.

                                                                                                                        The power shelf that keeps the busbar fed will have multiple rectifiers, often with at least N+1 redundancy so that you can have a rectifier fail and swap it without the rack itself failing. Similar things apply to the battery shelves.

                                                                                                                          • immibis

                                                                                                                            11/21/2024

                                                                                                                            It's also plausible to have multiple power supplies feeding the same bus bar in parallel (if they're designed to support this) e.g. one at each end of a row.

                                                                                                                              • eaasen

                                                                                                                                11/21/2024

                                                                                                                                This is how our rack works (Oxide employee). In each power shelf, there are 6 power supplies and only 5 need to be functional to run at full load. If you want even more redundancy, you can use both power shelves with independent power feeds to each so even if you lose a feed, the rack still has 5+1 redundant power supplies.

                                                                                                                        • walrus01

                                                                                                                          11/21/2024

                                                                                                                          The whole thing with eliminating 70 discrete 1U server size AC-to-DC power supplies is nothing new. It's the same general concept as the power distribution unit in the center of an open compute platform rack design from 10+ years ago.

                                                                                                                          Everyone who's doing serious datacenter stuff at scale knows that one of the absolute least efficient, labor intensive and cabling intensive/annoying ways of powering stuff is to have something like a 42U cabinet with 36 servers in it, each of them with dual power supplies, with power leads going to a pair of 208V 30A vertical PDUs in the rear of the cabinet. It gets ugly fast in terms of efficiency.

                                                                                                                          The single point of failure isn't really a problem as long as the software is architected to be tolerant of the disappearance of an entire node (mapping to a single motherboard that is a single or dual cpu socket config with a ton of DDR4 on it).

                                                                                                                            • formerly_proven

                                                                                                                              11/21/2024

                                                                                                                              That’s one reason why 2U4N systems are kinda popular. 1/4 the cabling in legacy infrastructure.

                                                                                                                              • jeffbee

                                                                                                                                11/21/2024

                                                                                                                                PDUs are also very failure-prone and not worth the trouble.

                                                                                                                            • sidewndr46

                                                                                                                              11/21/2024

                                                                                                                              This isn't even remotely close. Unless all 32 servers have redundant AC power feeds present, you've traded one single point of failure for another single point of failure.

                                                                                                                              In the event that all 32 servers had redundant AC power feeds, you could just install a pair of redundant DC power feeds.

                                                                                                                                • gruez

                                                                                                                                  11/21/2024

                                                                                                                                  >Unless all 32 servers have redundant AC power feeds present, you've traded one single point of failure for another single point of failure.

                                                                                                                                  Is this not standard? I vaguely remember that rack servers typically have two PSUs for this reason.

                                                                                                                                    • glitchcrab

                                                                                                                                      11/21/2024

                                                                                                                                      It's highly dependent on the individual server model and quite often how you spec it too. Most 1U Dell machines I worked with in the past only had a single slot for a PSU, whereas the beefier 2U (and above) machines generally came with 2 PSUs.

                                                                                                                                        • thfuran

                                                                                                                                          11/21/2024

                                                                                                                                          But 2 PSUs plugged into the same AC supply still have a single point of failure.

                                                                                                                                            • glitchcrab

                                                                                                                                              11/22/2024

                                                                                                                                              Which is why you have two separate PDUs in the rack which are fed by different power feeds and you connect the server's 2 PSUs to opposing PDUs.

                                                                                                                                                • growse

                                                                                                                                                  11/22/2024

                                                                                                                                                  This works brilliantly, right up to the point where your A side fails, and every single server suddenly doubles their demand on B.

                                                                                                                                                  Better have good capacity management so you don't go over 100% on B when that happens! (I've seen it happen and take a DC out).
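The capacity check described above comes down to one inequality: the combined A and B load must fit on a single feed. A minimal Python sketch follows, with a hypothetical function name and made-up loads against a 30A feed:

```python
def survives_feed_loss(load_a_amps: float, load_b_amps: float, feed_rating_amps: float) -> bool:
    """True if one feed alone can carry the load currently split across A and B."""
    return load_a_amps + load_b_amps <= feed_rating_amps

# Each side at 60% of a 30A feed looks healthy until one side drops.
print(survives_feed_loss(18.0, 18.0, 30.0))   # combined 36A exceeds one 30A feed
# Each side at 45% leaves room for a failover.
print(survives_feed_loss(13.5, 13.5, 30.0))   # combined 27A fits on one 30A feed
```

In practice this means keeping each side below 50% of its rating, and lower still if the feed must be derated for continuous load.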

                                                                                                                                      • jeffbee

                                                                                                                                        11/21/2024

                                                                                                                                        Rack servers have two PSUs because enterprise buyers are gullible and will buy anything. Generally what happens in case of a single PSU failure is the other PSU also fails or it asserts PROCHOT which means instead of a clean hard down server you have a slow server derping along at 400MHz which is worse in every possible way.

                                                                                                                                        • sidewndr46

                                                                                                                                          11/21/2024

                                                                                                                                          you could have 15 PSUs in a server. It doesn't mean they have redundant power feeds

                                                                                                                                  • MisterTea

                                                                                                                                    11/21/2024

                                                                                                                                    > This creates a single point of failure,

                                                                                                                                    Who told you there is only one PSU in the power shelf?

                                                                                                                                    • ZeroCool2u

                                                                                                                                      11/21/2024

                                                                                                                                      If any Oxide staff are here, I'm just curious, is BlueSky a customer? Seems like it would fit well with their on-prem setup.

                                                                                                                                        • mkeeter

                                                                                                                                          11/21/2024

                                                                                                                                          Nope, but many of us (Oxide staff) are big fans of what Bluesky is doing!

                                                                                                                                          One of the Bluesky team members posted about their requirements earlier this month, and why Oxide isn't a great fit for them at the moment:

                                                                                                                                          https://bsky.app/profile/jaz.bsky.social/post/3laha2upw3k2z

                                                                                                                                            • ZeroCool2u

                                                                                                                                              11/22/2024

                                                                                                                                              Appreciate the reply! Been following Oxide for a few years now and really enjoy the technical blogs :)

                                                                                                                                              • AceJohnny2

                                                                                                                                                11/22/2024

                                                                                                                                                > Also prices don't make sense for us.

                                                                                                                                                Oof.

                                                                                                                                                  • tptacek

                                                                                                                                                    11/22/2024

                                                                                                                                                    Why is that "oof"? They're using commodity servers today. Oxide does not offer commodity servers.

                                                                                                                                                      • AceJohnny2

                                                                                                                                                        11/23/2024

                                                                                                                                                        Just that it highlights the challenge that Oxide faces, that they're effectively offering a "luxury" product in a deeply commoditized space.

                                                                                                                                                          • tptacek

                                                                                                                                                            11/23/2024

                                                                                                                                                            That's true if you think the market is SaaS upstarts like Bluesky and maybe less true if you think of the market in terms of who buys hardware. I remember early on at Matasano working for a house account, a major US corp that isn't a household name, and being shocked 2 years in when I finally had to do something in their data center (a FCIP appliance assessment) and seeing how much they'd spent on it. Look at everyone who runs (and wishes they weren't) z/OS today, or Oracle. There's more of them than I think a lot of HN people think.

                                                                                                                                                        • cplwankery

                                                                                                                                                          11/23/2024

                                                                                                                                                          Good on 0x1d5 to bring back the era of expensive, proprietary hardware that everybody loved so much.

                                                                                                                                              • danpalmer

                                                                                                                                                11/21/2024

                                                                                                                                                Not Oxide or Bluesky, but firstly I'd suggest that asking the company about their customers is unlikely to get a response, most companies don't disclose their customers. Secondly, Bluesky have been growing quickly, I can only assume their hardware is too, and that means long lead time products like an Oxide rack aren't going to work, especially when you can have an off the shelf machine from Dell delivered in a few days.

                                                                                                                                                  • steveklabnik

                                                                                                                                                    11/21/2024

                                                                                                                                                    Oxide is very open, we are happy to talk about customers that allow us to talk about them. Some don’t want to, others are very happy to be mentioned, just like any other company.

                                                                                                                                                      • danpalmer

                                                                                                                                                        11/22/2024

                                                                                                                                                        > we are happy to talk about customers that allow us to talk about them

                                                                                                                                                        This is what I meant by "don't disclose", I didn't mean that Oxide was in any way secretive, but that usually this stuff doesn't get agreed, and that it would make more sense to ask the customer rather than the company selling as Oxide won't want to disclose unless there's already an agreement in place (formal or otherwise).

                                                                                                                                                          • steveklabnik

                                                                                                                                                            11/22/2024

                                                                                                                                                            Gotcha. That totally makes sense, I wouldn't have thought about it that way.

                                                                                                                                                    • ramon156

                                                                                                                                                      11/21/2024

                                                                                                                                                      > most companies don't disclose their customers

                                                                                                                                                      In my head I'm imagining an average landing page. They slap their customers on there like stickers. I doubt bluesky would stay secretive about using oxide if they did

                                                                                                                                                        • slyall

                                                                                                                                                          11/21/2024

                                                                                                                                                          Those customers listed on the front page of companies are there as part of an agreement. Usually something like a discount. Certainly they are not listed without permission. 10x that if it is a case study.

                                                                                                                                                            • danpalmer

                                                                                                                                                              11/22/2024

                                                                                                                                                              I think they often are listed without permission unfortunately, and often literally based on the email addresses of people signing up for a trial. I see my company's logo on the landing page of many products that we don't use or may even have a policy preventing our use of.

                                                                                                                                                  • tptacek

                                                                                                                                                    11/21/2024

                                                                                                                                                    events.bsky appears to be hosted on OVH. Single-product SAAS companies less than a few years old are unlikely to be a major customer cohort for Oxide.

                                                                                                                                                • ccorcos

                                                                                                                                                  11/22/2024

                                                                                                                                                  From the title, I was expecting to read about how oxidation (aka rust) reduces power throughput capacity

                                                                                                                                                  • rajnathani

                                                                                                                                                    11/23/2024

                                                                                                                                                    Is this just the main reason?

                                                                                                                                                    > Replacing low-efficiency AC power supplies with a high-efficiency DC Bus Bar

                                                                                                                                                    The part after it about better cooling fans, meh, there are more efficient liquid-cooling methods including immersion-cooling which are already there in implementation albeit niche.

                                                                                                                                                    • kev009

                                                                                                                                                      11/21/2024

                                                                                                                                                      Where is the GPU?

                                                                                                                                                        • steveklabnik

                                                                                                                                                          11/21/2024

                                                                                                                                                          We don’t currently have GPUs in the product. The closed-ness of the GPU space is a bit of a cultural difference, but we’ll surely have something eventually. As a small company, we have to focus on our strengths, and there’s plenty of folks who don’t need GPUs right now.

                                                                                                                                                            • kev009

                                                                                                                                                              11/21/2024

                                                                                                                                                              That's fine, just awkward because the GS report shows the TAM or problem depending on your perspective being accelerated computing.

                                                                                                                                                                • steveklabnik

                                                                                                                                                                  11/22/2024

                                                                                                                                                                  For sure. It’s not just GPUs; given that we have one product with three SKUs, there’s a variety of workloads we won’t be appropriate for just yet. Just takes time to diversify the offering.

                                                                                                                                                          • kev507

                                                                                                                                                            11/21/2024

                                                                                                                                                            maybe the real GPU was the friends we made along the way

                                                                                                                                                        • PreInternet01

                                                                                                                                                          11/21/2024

                                                                                                                                                          "If only they used DC from the wall socket, all those H100s would be green" is, not, I think, the hill you want to die on.

                                                                                                                                                          But, yeah, my three 18MW/y racks agree that more power efficiency would be nice, it's just that Rewrite It In (Safe) Rust is unlikely to help with that...

                                                                                                                                                            • yjftsjthsd-h

                                                                                                                                                              11/21/2024

                                                                                                                                                              > it's just that Rewrite It In (Safe) Rust is unlikely to help with that...

                                                                                                                                                              I didn't see any mention of Rust in the article?

                                                                                                                                                                • PreInternet01

                                                                                                                                                                  11/21/2024

                                                                                                                                                                  [flagged]

                                                                                                                                                                    • bigfatkitten

                                                                                                                                                                      11/21/2024

                                                                                                                                                                      They wrote their own BMC and various other bits and pieces in Rust. That's an extremely tiny part of the whole picture.

                                                                                                                                                                        • steveklabnik

                                                                                                                                                                          11/22/2024

                                                                                                                                                                          It’s significantly more than that, but it’s also true that we include stuff in other languages where appropriate. CockroachDB is in Go, and illumos is in C, as two examples. But almost all new code we write is in Rust. That is the stuff you’re talking about, but also like, our control plane.

                                                                                                                                                                          Oh and we write a lot of Typescript too.

                                                                                                                                                                      • rcxdude

                                                                                                                                                                        11/22/2024

                                                                                                                                                                        I think it's hard to call it a reason. It is a tool which fits in with the philosophy of the company in terms of how to achieve its goals, but I think it would still exist if rust didn't. I would describe the goal as making a hyperscaling system that can be sold as a product, the philosophy of how to make this is an aggressive focus on integration, openness, and quality, and that rust is a language that works well with the last two of those goals.

                                                                                                                                                                          • sam_bristow

                                                                                                                                                                            11/23/2024

                                                                                                                                                                            It's also not really a case of "rewriting in Rust" anyway, it's more just "writing it in Rust" since most of the stuff the Oxide team has built is greenfield work.

                                                                                                                                                                        • mycoliza

                                                                                                                                                                          11/22/2024

                                                                                                                                                                          We also sell computers... :)

                                                                                                                                                                          • transpute

                                                                                                                                                                            11/21/2024

                                                                                                                                                                            OSS Rust in Rack trenchcoat.

                                                                                                                                                                            • sophacles

                                                                                                                                                                              11/21/2024

                                                                                                                                                                              That's an interesting take. What's your reasoning? What's your evidence?

                                                                                                                                                                                • 0x457

                                                                                                                                                                                  11/22/2024

                                                                                                                                                                                  Pretty much everything Oxide publishes on github is either in rust or it's an sdk to service in rust. Well and web panel isn'tin rust, so negative points for that, true evangelists would have used WASM.

                                                                                                                                                                                  But Oxide reason to exist is to keep memory of cool racks from Sun running Solaris alive forever.

                                                                                                                                                                              • murderfs

                                                                                                                                                                                11/21/2024

                                                                                                                                                                                The raison d'ĆŖtre of Oxide isn't Rust, it's continuing to pretend that the bloated corpse of Solaris still has some signs of life.

                                                                                                                                                                                  • yjftsjthsd-h

                                                                                                                                                                                    11/22/2024

                                                                                                                                                                                    https://github.com/illumos/illumos-gate/commits/master/ looks alive to me.

                                                                                                                                                                                    (And for that matter, Oracle's proprietary Solaris seems better maintained than I ever expected, though in this context I think the open source fork is the relevant thing to look at.)

                                                                                                                                                                        • shrubble

                                                                                                                                                                          11/22/2024

                                                                                                                                                                          18MW/year is not a real unit of measurement; did you mean MWh?

                                                                                                                                                                      • einpoklum

                                                                                                                                                                        11/21/2024

                                                                                                                                                                        > How can organizations reduce power consumption and corresponding carbon emissions?

                                                                                                                                                                        Stop running so much useless stuff.

                                                                                                                                                                        Also maybe ARM over x86_64 and similar power-efficiency-oriented hardware.

                                                                                                                                                                        Rack-level system design, or at least power & cooling design, is certainly also a reasonable thing to do. But standardization is probably important here, rather than some bespoke solution which only one provider/supplier offers.

                                                                                                                                                                        > How can organizations keep pace with AI innovation as existing data centers run out of available power?

                                                                                                                                                                        Waste less energy on LLM chatbots?

                                                                                                                                                                          • zamadatix

                                                                                                                                                                            11/21/2024

                                                                                                                                                                            Current ARM servers actually generally offer "on par" (varies by workload) perf/Watt for generally worse absolute performance (varies by workload) i.e. require more other overhead to achieve the same total perf despite "on par" perf/Watt.

                                                                                                                                                                            Need either Apple to get into the general market server business or someone to start designing CPUs as well as Apple (based on the comparison between different ARM cores I'm not sure it really matters if they do so using a specific architecture or not).

                                                                                                                                                                              • p_l

                                                                                                                                                                                11/22/2024

                                                                                                                                                                                It's more a case of selection of optimization parameters and corresponding economy. It's not so much that apple towers over others in design (though they are absolutely no slouches and have wins there) but their design team is in position to coordinate with product directly and as such isn't as limited by "but will it sell in high enough numbers for the excel sheet at investor's desk?"

                                                                                                                                                                                The real show stopper for years is that ARM servers are just not prepared to be a proper platform. uBoot with grudgingly included FDT (after getting kicked out of Linux kernel) does not make a proper platform, and often there's also no BMC, unique approaches to various parts making the server that one annoying weirdo in the data center, etc.

                                                                                                                                                                                Cloud providers can spend the effort to backfill necessary features with custom parts, but doing so on your own on-prem is hard

                                                                                                                                                                                  • zamadatix

                                                                                                                                                                                    11/22/2024

                                                                                                                                                                                    Not sure what you mean wrt to Apple's uniqueness. AMD/Mediatek/Intel/Qualcomm/Samsung only make margin on how well they invest on their designs vs their competitors and they'd all love to be outshipping each other and Apple in any market. All, including Apple, also rely on the same manufacturer for their top products and the ones (Intel/Samsung) with alternatives have not been able to use that as an advantage for top performing products. Sure, Apple can work directly with their own product... but at the end of the day the goal and available customer pool to fight over is the same and they still ship fewer units than the others.

                                                                                                                                                                                    I'm not hands-on familiar with other serious ARM server market players but for several years now Ampere ARM server CPUs at least are nothing like you describe. Phoronix says it best in https://www.phoronix.com/review/linux-os-ampereone

                                                                                                                                                                                    > All the Linux distributions I attempted worked out effortlessly on this Supermicro AmpereOne server. Like with Ampere Altra and Ampere eMAG before that, it's a seamless AArch64 Linux experience. Thanks to supporting open standards like UEFI, Arm SBSA/SBBR and ACPI and not having to rely on DeviceTrees or other nuisances, installing an AArch64 Linux distribution on Ampere hardware is as easy as in the x86_64 space.

                                                                                                                                                                                      • p_l

                                                                                                                                                                                        11/22/2024

                                                                                                                                                                                        Ampere is a bright spot in all of this, indeed. Just considerably late. I remember being bombarded by "ARM servers are going to eat the world" in 2013, but ARM couldn't deliver SBSA in shape that would make it possible and to this day I am left with serious doubts if any ARM board will work out right (there are bright spots though).

                                                                                                                                                                                        As for Apple "uniqueness", I met a lot of people who think that Apple "just" has so much better design team, when it's similar to what you say and the unique part is them being able to properly narrow their design space instead of chasing cost-conscious manufacturers.