OpenBSD: PF queues break the 4 Gbps barrier

160 points - today at 1:43 PM

  • ralferoo

    today at 2:29 PM

    In the days when even cheap consumer hardware ships with 2.5G ports, this number seems weirdly low. Does this mean that basically nobody is currently using OpenBSD in the datacentre or anywhere that might be expecting to handle 10G or higher per port, or is it just filtering that's an issue?

    I'm not surprised that the issue exists as even 10 years ago these speeds were uncommon outside of the datacentre, I'm just surprised that nobody has felt a pressing enough need to fix this earlier in the previous few years.

      • Someone

        today at 3:03 PM

        The article is about allowing bandwidth restrictions in bits/second that are larger than 2³²-1, not about how fast pf can filter packets.

        I guess few people with faster ports felt the need to limit bandwidth for a service to something that’s that large.

        FTA:

        “OpenBSD's PF packet filter has long supported HFSC traffic shaping with the queue rules in pf.conf(5). However, an internal 32-bit limitation in the HFSC service curve structure (struct hfsc_sc) meant that bandwidth values were silently capped at approximately 4.29 Gbps, the maximum value of a u_int.

        With 10G, 25G, and 100G network interfaces now commonplace, OpenBSD devs making huge progress unlocking the kernel for SMP, and adding drivers for cards supporting some of these speeds, this limitation started to get in the way. Configuring bandwidth 10G on a queue would silently wrap around, producing incorrect and unpredictable scheduling behaviour.

        A new patch widens the bandwidth fields in the kernel's HFSC scheduler from 32-bit to 64-bit integers, removing this bottleneck entirely.”
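A minimal sketch of the wrap-around the quoted passage describes, assuming the configured rate is held in bits per second in an unsigned 32-bit field. The helper name `store_u32` is hypothetical and only mimics C's unsigned truncation; it is not taken from the OpenBSD source.

```python
U32_MAX = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def store_u32(bits_per_second: int) -> int:
    """Mimic assigning a wider integer to a 32-bit unsigned field (hypothetical helper)."""
    return bits_per_second & U32_MAX


ten_gbit = 10 * 10**9
print(U32_MAX)              # 4294967295, roughly 4.29 Gbps
print(store_u32(ten_gbit))  # 1410065408, roughly 1.41 Gbps instead of 10 Gbps
```

Under that assumption, a queue configured for 10G would end up scheduled as if it had been given about 1.41G, with no error reported.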

          • nine_k

            today at 4:03 PM

            > silently wrap around, producing incorrect and unpredictable

            Now I'm more scared to use OpenBSD than I was a minute before.

            I strongly prefer software that fails loudly and explicitly.

              • kaashif

                today at 4:57 PM

                Yeah, that's pretty appalling.

                Regardless of how good the philosophy of something is, if it's as niche and manpower constrained as OpenBSD is then it's going to accumulate problems like this.

                  • muvlon

                    today at 5:35 PM

                    I actually think this isn't even surprising from OpenBSD philosophically. They still subscribe to the Unix philosophy of old, more so than FreeBSD and much much more than Linux.

                    That is, "worse is better" and it's okay to accept a somewhat leaky abstraction or less helpful diagnostics if it simplifies the implementation.

                    This is why `ed` doesn't bother to say anything but "?" to erroneous commands. If the user messes up, why should it be the job of the OS to handhold them? Garbage in, garbage out. That attitude may seem out of place today but consider that it came from a time when a program might have one author and 1-20 users, so their time was valued almost equally.

                      • fluoridation

                        today at 8:46 PM

                        Even in that scenario that attitude seems out of place, considering a feature is implemented once and used many times.

        • traceroute66

          today at 3:02 PM

          > Does this mean that basically nobody is currently using OpenBSD in the datacentre or anywhere

          Half the problem is lack of proper drivers. I love OpenBSD but all the fibre stuff is just a bit half-baked.

          For a long time OpenBSD didn't even have DOM (light-level monitoring etc.) exposed in its 1g fibre drivers. Stuff like that automatically kills off OpenBSD as a choice for datacentres where DOM stats are a non-negotiable hard requirement as they are so critical to troubleshooting.

          OpenBSD finally introduced DOM stats for SFP somewhere around 2020–2021, but it doesn't always work, it depends if you have the right magic combination of SFP and card manufacturer. Whilst on FreeBSD it Just Works (TM).

          And then overall, for higher speed optics, FreeBSD simply remains lightyears ahead (forgive the pun !). For example, Decisio make nice little router boxes with 10g SFP+ on them, FreeBSD has the drivers out-of-the-box, OpenBSD doesn't. And that's only an SFP+ example, it's basically rolling-tumbleweed-in-a-desert territory if you start venturing up to QSFP etc. ...

            • CursedSilicon

              today at 4:23 PM

              How much work is it to port drivers between Free and Open BSD?

                • SoftTalker

                  today at 5:57 PM

                  OpenBSD doesn't allow binary blobs. So if there isn't a fully open-source driver (or adequate docs for a developer to write one), it won't happen (hence, no Nvidia support). Not sure about FreeBSD in this regard, but AFAIK most of these drivers start as ports from Linux.

                  • traceroute66

                    today at 5:08 PM

                    > How much work is it to port drivers between Free and Open BSD?

                    IIRC there are two problems at play:

                    First, I'm not a C coder so this is a bit above my pay-grade, but from what little I do remember about the subject, the problem relates to the OpenBSD requirement to adopt their security mechanisms such as pledge, unveil and strlcpy. IIRC the OpenBSD compiler is also (unsurprisingly !) more anal about stack protector W^X etc. So the porting process is perhaps more time-consuming and low-level than it might otherwise be on other porting projects.

                    Second, the licensing thing might come into it. OpenBSD has a high preference to most-permissive, and so things like GPL-licensed origins might not be acceptable. IIRC FreeBSD is a little more relaxed within reason ? And when you're working with network cards I would think that is perhaps hard to avoid to some extent if you're relying on certain bits being ultimately derived from Intel chipsets or whatever.

                    I'm open to correction by those more knowledgeable than me on porting intricacies. ;)

                      • toast0

                        today at 5:31 PM

                        The difficulty of porting NIC drivers is probably not in differences in the userland API; kernel drivers don't likely pledge anything. But OpenBSD and FreeBSD diverged a long time ago, and I'd be surprised if their kernel APIs are very close anymore. How to detect and interface with devices is probably a bit different, and rx/tx packets will be different too.

                        I think most of the vendor supplied NIC drivers in FreeBSD are BSD licensed, so that shouldn't be an issue. I checked Intel, Mellanox (now NVidia), Cavium/QLogic/Broadcom, Solarflare. The realtek driver in the tree is BSD licensed but not vendor provided; the vendor driver in ports is also BSD licensed. I'm not sure if there's a datacenter ethernet provider with in-kernel drivers I missed; but I don't think license is a problem here either --- anyway you could ship a driver module out of tree if it was.

            • citrin_ru

              today at 2:36 PM

              AFAIK performance is not a priority for OpenBSD project - security is (and other related qualities like code which is easy to understand and maintain). FreeBSD (at least when I followed it several years ago) had better performance both for ipfw and its own PF fork (not fully compatible with OpenBSD one).

                • traceroute66

                  today at 3:07 PM

                  > AFAIK performance is not a priority for OpenBSD project - security is

                  TBF that was the case historically, but they have absolutely been putting effort into performance in their more recent releases.

                  Lots of stuff that used to be simply horrific on OpenBSD, such as multi-peer BGP full-table refreshes, has got SIGNIFICANTLY better in the last couple of years.

                  Clearly still not as good as FreeBSD, but compared to what it was...

              • ffk

                today at 4:21 PM

                A lot of the time once you get into multi-gig+ territory the answer isn't "make the kernel faster," it's "stop doing it in the kernel."

                You end up pushing the hot path out to userland where you can actually scale across cores (DPDK/netmap/XDP style approaches), batch packets, and then DMA straight to and from the NIC. The kernel becomes more of a control plane than the data plane.

                PF/ALTQ is very much in the traditional in-kernel, per-packet model, so it hits those limits sooner.

                  • toast0

                    today at 5:05 PM

                    The big things to avoid are crossing the user/kernel divide and communication across cores.

                    Staying in the kernel is approximately the same as bypassing the kernel (caveats apply); for a packet filtering / smoothing use case, I don't think kernel bypass is needed. You probably want to tune NIC hashing so that inbound traffic for a given shaping queue arrives in the same NIC rx queue; but you probably want that in a kernel bypass case as well. Userspace is certainly nicer during development, as it's easier to push changes, but in 2026, it feels like traffic shaping has pretty static requirements and letting the kernel do all the work feels reasonable to me.

                    Otoh, OpenBSD is pretty far behind the curve on SMP and all that (I think their PF now has support for SMP, but maybe it's still in development? I'd bet there's lots of room to reduce cross-core communication as well, but I haven't examined it). You can't pin userspace threads to CPUs, I doubt their kernel data structures are built to reduce communication, etc. Kernel bypass won't help as much as you would hope, if it's available, which it might not be, because you can't control the userspace to limit cross-core communication.

                      • rpcope1

                        today at 6:43 PM

                        Just a single data point on the BSDs in general, as much as people like to jerk them off: having tested both recent FreeBSD (which should be much faster than OpenBSD) and Debian on the now kind of elderly APU2s I have, netfilter is noticeably faster (and I find nftables to be frankly less challenging than pf) and gets those devices right at gigabit line speed even with complex firewall rules, whereas pf leaves performance on the table. It probably has to do with the fact that it's an older 4-core design that wasn't super high-power to begin with (it still does its job extremely well), but still.

                          • toast0

                            today at 7:17 PM

                            One issue I've seen from a fair number of people on the APU2s running FreeBSD is if they've got PPPoE; inbound traffic (at least) all hashes to the same RX queue, and as a result there's no parallelism... if you're on gigE fiber with PPPoE, the APU2 can't really keep up single threaded. The earlier APU (1) boards use realtek nics that I think only have a single queue, so you won't get effective parallelism there either. If I'm finding the right information, APU2s with i210 have 4 rx queues which is well matched with a quad core, but those with i211 only have 2 rx queues, which means half of the processors will have nothing to do unless your kernel redistributes packets after rxing, but that comes at a cost too.
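A toy sketch of the receive-queue point above. It assumes the NIC can only hash fields of plain IPv4 frames and sends anything else, including PPPoE session frames, to queue 0. The function `rx_queue` and its byte-sum hash are hypothetical stand-ins, not how a real RSS/Toeplitz hash works, and the addresses are documentation-range placeholders.

```python
from collections import Counter

PPPOE_SESSION = 0x8864  # EtherType for PPPoE session frames
IPV4 = 0x0800           # EtherType for plain IPv4


def rx_queue(ethertype: int, flow: str, n_queues: int) -> int:
    """Toy stand-in for NIC receive hashing (not a real RSS/Toeplitz hash)."""
    if ethertype != IPV4:
        # Assumption: the NIC cannot parse inside the PPPoE header, so it
        # has no flow fields to hash and falls back to queue 0.
        return 0
    return sum(flow.encode()) % n_queues


flows = [f"198.51.100.7:{port}->203.0.113.9:443" for port in range(50000, 50008)]

plain = Counter(rx_queue(IPV4, f, 4) for f in flows)
pppoe = Counter(rx_queue(PPPOE_SESSION, f, 4) for f in flows)

print(sorted(plain.items()))  # flows spread over all four queues
print(sorted(pppoe.items()))  # [(0, 8)]: every flow lands on queue 0
```

With everything on one queue, only one core does receive processing, which is the single-threaded bottleneck described for PPPoE.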

                            Linux may have a different packet flow, or netfilter could be faster than pf.

                            > I find nftables to be frankly less challenging than pf

                            I also don't really care for how pf specifies rules. I would rather run ipfw, but pf has pfsync whereas ipfw doesn't have a way to do failover with state synchronization for stateful firewalls/NAT. So I figured out how to express my rules in pf.conf; because it was worth it, even if I don't like it :P

                    • cperciva

                      today at 5:01 PM

                      > pushing the hot path out to userland where you can actually scale across cores

                      What sort of kernel do you have which can't scale across cores?

                  • atmosx

                    today at 5:47 PM

                    PF itself is not tailored towards ISPs and/or big orgs. IPFW (FreeBSD) is more powerful and flexible.

                    OpenBSD shines as a secure all-in-one router SOHO solution. And it’s great because you get all the software you need in the base system. PF is intuitive and easy to work with, even for non network gurus.

                    • asmnzxklopqw

                      today at 6:18 PM

                      OpenBSD was a great OS back in the late 90s and even early 2000s. In some cases it was competing neck and neck with Linux. Since then, well, Linux grew a lot and OpenBSD not so much. There are multiple causes for this, I will go only through a few: Linux has more support from the big companies; the huge difference in userbase numbers; Linux is more welcoming to new users. And the difference is only growing.

                        • dim13

                          today at 7:06 PM

                          "OpenBSD does not want to attract GNU newbies." misc@

                          And that IMHO is a good thing.

                      • toast0

                        today at 2:44 PM

                        > Does this mean that basically nobody is currently using OpenBSD in the datacentre or anywhere that might be expecting to handle 10G or higher per port, or is it just filtering that's an issue?

                        This looks like it only affects bandwidth limiting. I suspect it's pretty niche to use OpenBSD as a traffic shaper at 10G+, and if you did, I'd imagine most of the queue limits would tend toward significantly less than 4G.

                        • IcePic

                          today at 2:46 PM

                          One thing could also be that by the time you have 10GE uplinks, shaping is not as important.

                          When we had 512kbit links, prioritizing VOIP would be a thing, and for asymmetric links like 128/512kbit it was prudent to prioritize small packets (ssh) and tcp ACKs on the outgoing link or the downloads would suffer, but when you have 5-10-25GE, not being able to stick an ACK packet in the queue is perhaps not the main issue.
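The intuition can be put into numbers with a small sketch. It assumes a 1500-byte packet and looks only at serialization delay, meaning how long one full-size packet occupies the link ahead of a small ACK; the function name is illustrative.

```python
def serialization_delay(packet_bytes: int, link_bits_per_second: float) -> float:
    """Seconds needed to clock one packet onto the wire."""
    return packet_bytes * 8 / link_bits_per_second


full_frame = 1500  # bytes, a typical Ethernet MTU-sized packet

print(serialization_delay(full_frame, 512_000))         # 0.0234375 s, about 23 ms
print(serialization_delay(full_frame, 10_000_000_000))  # 1.2e-06 s, about 1.2 microseconds
```

On the slow link a handful of queued full-size packets already adds a noticeable fraction of a second in front of an ACK; on 10GE the same queue costs microseconds.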

                            • hrmtst93837

                              today at 5:35 PM

                              At 10G and up, shaping still matters. Once you mix backups, CCTV, voice, and customer circuits on the same uplink, a brief saturation event can dump enough queueing delay into the path that the link looks fine on paper while the stuff people actually notice starts glitching, and latency budgets are tight. Fat pipes don't remove the need for control, they just make the billing mistake more expensive.

                                • Melatonic

                                  today at 5:50 PM

                                  At this level wouldn't a proper implementation be segregating the link into multiple VMs (or jails?)? Or is that the same thing on BSD?

                          • Melatonic

                            today at 5:47 PM

                            Isn't OpenBSD mainly used for security testing or do I have it wrong? Would be surprised if it was used in production datacenter networking hardware at all. Seems like most people would use one of the proprietary implementations (which likely would include drivers written specifically for that hardware) or something like FreeBSD.

                              • SoftTalker

                                today at 6:00 PM

                                It's widely used as a router, that's one of its primary uses. But not sure to what scale, likely at small orgs not at major ISPs.

                                But, OpenBSD is a project by and for its developers. They use it and develop it to do what they want; they don't really care what anyone else does or doesn't do with it.

                                • lstodd

                                  today at 6:06 PM

                                  You don't need 4gbps pf queues or even fiber on every single machine in a datacenter. So be surprised, it is used widely for its simplicity and reliability not to mention security compared to those proprietary implementations you speak of, may they rot in hell.

                          • haunter

                            today at 4:37 PM

                            My local fiber finally offers 4 Gbps connection but I’m not even sure what to use it for lol. I have 2 Gbps and that's more than enough already.

                              • shpingbing

                                today at 5:53 PM

                                I finally talked myself into going to 3Gbps (and working on internal network to 10). Internal transfer to NAS will be much faster, and downloading AI models should go from ~8 minutes to less than 3 minutes. Is it necessary? Not exactly. But super nice

                                  • darknavi

                                    today at 7:55 PM

                                    I do nightly offsite mirroring (just to a cloud provider) and making that go faster and not cannibalize all of my throughput is nice.

                            • rayiner

                              today at 2:17 PM

                              Can pf actually shape at speeds above 4 gbps?

                              • razighter777

                                today at 7:36 PM

                                I would love to use OpenBSD. I really wanna give it a try but the filesystem choices seem kinda meh. Are there any modern filesystems with good NVMe and FDE support for OpenBSD?

                                • gigatexal

                                  today at 4:09 PM

                                  It’s still single threaded. PF in FreeBSD is multithreaded. For home WANs I’d be using OpenBSD. For anything else FreeBSD.

                                              • bell-cot

                                                today at 1:53 PM

                                                "Values up to 999G are supported, more than enough for interfaces today and the future." - Article

                                                "When we set the upper limit of PC-DOS at 640K, we thought nobody would ever need that much memory." - Bill Gates

                                                  • throw0101d

                                                    today at 2:01 PM

                                                    > "Values up to 999G are supported, more than enough for interfaces today and the future." - Article

                                                    Especially given that IEEE 802.3dj is working on 1.6T / 1600G, and is expected to publish the final spec in Summer/Autumn 2026:

                                                    * https://en.wikipedia.org/wiki/Terabit_Ethernet

                                                    Currently these interfaces are only on switches, but there are already NICs at 800G (P1800GO, Thor Ultra, ConnectX-8/9), so if you LACP/LAGG two together your bond is at 1600G.

                                                  • bitfilped

                                                    today at 2:43 PM

                                                    Yes, we're already running 800G networks, so this phrasing seems really silly to me.

                                                    • WhyNotHugo

                                                      today at 1:56 PM

                                                      Honestly, I'm really curious about this number. 10 bits is 1024, so why 999G specifically?

                                                        • abound

                                                          today at 2:02 PM

                                                          Looking at the patch itself (linked in the article), the description has this:

                                                          > We now support configuring bandwidth up to ~1 Tbps (overflow in m2sm at m > 2^40).

                                                          So I think that's it, 2^40 is ~1.099 trillion
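One way to read that patch note, sketched under the assumption that `m2sm` shifts the rate left by 24 bits inside an unsigned 64-bit integer before scaling it. The shift width is an assumption picked because 40 + 24 = 64; nothing in this thread confirms it.

```python
SM_SHIFT = 24          # assumed fixed-point shift; 40 + 24 = 64 bits
U64_MAX = 2**64 - 1


def shift_fits_u64(rate: int) -> bool:
    """True if rate << SM_SHIFT still fits in an unsigned 64-bit integer."""
    return (rate << SM_SHIFT) <= U64_MAX


print(2**40)                         # 1099511627776, about 1.1 trillion
print(shift_fits_u64(999 * 10**9))   # True: 999G stays below 2^40
print(shift_fits_u64(2**40))         # False: the shifted value needs 65 bits
```

If that reading is right, 999G is simply the largest three-digit "G" value that stays safely under the overflow point.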

                                                          • elevation

                                                            today at 2:01 PM

                                                            Looks like an arbitrary validation cap. By the time we're maxing out the 64-bit underlying representation we probably won't be using Ethernet any more.

                                                              • palmotea

                                                                today at 2:05 PM

                                                                > By the time we're maxing out the 64-bit underlying representation we probably won't be using Ethernet any more.

                                                                We will be using Ethernet until the heat death of the universe, if we survive that long.

                                                                • bell-cot

                                                                  today at 2:37 PM

                                                                  https://en.wikipedia.org/wiki/Ethernet#History (& following sections)

                                                                  Calling something "Ethernet" amounts to a promise that:

                                                                  - From far enough up the OSI sandwich*, you can pretend that it's a magically-faster version of old-fashioned Ethernet

                                                                  - It sticks to broadly accepted standards, so you won't get bitten by cutting-edge or proprietary surprises

                                                                  *https://en.wikipedia.org/wiki/OSI_model
