Cloudflare outage on February 20, 2026

121 points - today at 7:05 PM

  • kgeist

    today at 9:27 PM

    It's something we debated in our team: if there's an API that returns data based on filters, what's the better behavior if no filters are provided - return everything or return nothing?

    The consensus was that returning everything is rarely what's desired, for two reasons: first, if the system grows, allowing API users to return everything at once can be a problem both for our server (lots of data in RAM when fetching from the DB => OOM, and additional stress on the DB) and for the user (the same problem on their side). Second, it's easy to forget to specify filters, especially in cases like "let's delete something based on some filters."

    So the standard practice now is to return nothing if no filters are provided, and we pay attention to it during code reviews. If the user does really want all the data, you can add pagination to your API. With pagination, it's very unlikely for the user to accidentally fetch everything because they must explicitly work with pagination tokens, etc.

    Another option, if you don't want pagination, is to have a separate method named accordingly, like ListAllObjects, without any filters.
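A minimal Go sketch of the convention described above, under stated assumptions: `Object`, `Filter`, the in-memory `store`, and the offset-style page token are all hypothetical and exist only for illustration. An empty filter yields nothing, and fetching everything requires calling the explicitly named, paginated `ListAllObjects`.

```go
package main

import (
	"errors"
	"fmt"
)

// Object and Filter are hypothetical types used only for this illustration.
type Object struct {
	ID    int
	Owner string
}

// Filter with an empty Owner means "no filter provided".
type Filter struct {
	Owner string
}

// store is an in-memory stand-in for a database table.
var store = []Object{
	{ID: 1, Owner: "alice"},
	{ID: 2, Owner: "bob"},
	{ID: 3, Owner: "alice"},
}

// ListObjects returns only matching objects. With no filter set it returns
// nothing instead of silently returning the whole table.
func ListObjects(f Filter) []Object {
	if f.Owner == "" {
		return nil
	}
	var out []Object
	for _, o := range store {
		if o.Owner == f.Owner {
			out = append(out, o)
		}
	}
	return out
}

// ListAllObjects is the explicit opt-in for fetching everything, one page at
// a time. The page token here is simply an offset into the store.
func ListAllObjects(pageToken, pageSize int) (page []Object, next int, more bool, err error) {
	if pageSize <= 0 {
		return nil, 0, false, errors.New("pageSize must be positive")
	}
	if pageToken < 0 || pageToken > len(store) {
		return nil, 0, false, errors.New("invalid page token")
	}
	end := pageToken + pageSize
	if end > len(store) {
		end = len(store)
	}
	return store[pageToken:end], end, end < len(store), nil
}

func main() {
	fmt.Println(len(ListObjects(Filter{})))               // no filter: empty result
	fmt.Println(len(ListObjects(Filter{Owner: "alice"}))) // filtered: two matches

	// Walking every page requires deliberately threading the token through.
	token, more := 0, true
	for more {
		var page []Object
		var err error
		page, token, more, err = ListAllObjects(token, 2)
		if err != nil {
			panic(err)
		}
		fmt.Println(page)
	}
}
```

The caller can still get all the data, but only by writing a loop that is hard to write by accident.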

      • alemanek

        today at 10:10 PM

        Returning an empty result in that case may cause a more subtle failure. I would think returning an error would be a bit better as it would clearly communicate that the caller called the API endpoint incorrectly. If it’s HTTP a 400 Bad Request status code would seem appropriate.
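A sketch of the reject-with-400 variant, assuming a hypothetical `listHandler` endpoint with a single assumed filter parameter named `owner`. It uses only the standard `net/http` and `net/http/httptest` packages and is not taken from any real API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// listHandler is a hypothetical endpoint; "owner" is an assumed filter name.
// A missing or empty filter is treated as a caller error, not as "match all".
func listHandler(w http.ResponseWriter, r *http.Request) {
	owner := r.URL.Query().Get("owner")
	if owner == "" {
		http.Error(w, "at least one filter is required, e.g. ?owner=NAME", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "objects owned by %s\n", owner)
}

// statusFor issues an in-process GET against listHandler and returns the
// HTTP status code that was written.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	listHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/objects"))             // no filter at all
	fmt.Println(statusFor("/objects?owner"))       // key present, value empty
	fmt.Println(statusFor("/objects?owner=alice")) // valid filter
}
```

The first two requests get a 400 and the third a 200, so a forgotten or malformed filter fails loudly at the call site instead of producing a quietly empty (or quietly complete) result.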

        • MobileVet

          today at 9:36 PM

          I like your thought process around the ‘empty’ case. While the opposite of a filter is no filter, to your point, that is probably not really the desire when it comes to data retrieval. We might have to revisit that ourselves.

      • CommonGuy

        today at 7:28 PM

        Insufficient mock data in the staging environment? Like no BYOIP prefixes at all? Since even one prefix should have shown that it would be deleted by that subtask...

        From all the recent outages, it sounds like Cloudflare is barely tested at all. Maybe they have lots of unit tests etc, but they do not seem to test their whole system... I get that their whole setup is vast, but even testing that subtask manually would have surfaced the bug

          • zmj

            today at 10:23 PM

            Testing the "whole system" for a mature enterprise product is quite difficult. The combinatorial explosion of account configurations and feature usage becomes intractable on two levels: engineers can't anticipate every scenario they need their tests to cover (because the product is too big to understand the whole of), and even if comprehensive testing were possible, it would be impractical on some combination of time, flakiness, and cost.

            • dabinat

              today at 7:43 PM

              I think Cloudflare does not sufficiently test lesser-used options. I lurk in the R2 Discord and a lot of users seem to have problems with custom domains.

              • asciii

                today at 7:37 PM

                It was also merged 15 days prior to production release...however, you're spot on with the empty test. That's a basic scenario that if it returned all...is like oh no.

                • suhputt

                  today at 10:15 PM

                  my guess is the company is rotting from the inside and drowning in tech debt

                  • martinald

                    today at 8:36 PM

                    Just crazy. Why does a staging environment matter? They should be running some integration tests against e.g. an in-memory database for these kinds of tasks, surely?

                • otar

                  today at 8:26 PM

                  Reliability was/is CF's label.

                  It's alarming already. Too many outages in the past months. CF should fix it, or it becomes unacceptable and people will leave the platform.

                  I really hope they will figure things out.

                    • argestes

                      today at 8:31 PM

                      I have many things dependent on Cloudflare. That makes me root for Cloudflare and I think I'm not the only one. Instead of finding better options we're getting stuck on an already failing HA solution. I wonder what caused this.

                        • slothsarecool

                          today at 9:11 PM

                          There are no alternatives, and those alternatives that did exist back in the day, had to shut down due to either going out of business or not being able to keep a paygo model.

                          Not everybody needs cloudflare, but those that need it and aren't major enterprises, have no other option.

                            • pocksuppet

                              today at 9:18 PM

                              Lots of people who think they need Cloudflare don't. What are you using it for?

                                • slothsarecool

                                  today at 9:36 PM

                                  L7 DDoS protection and global routing + CDN, there is not a single paygo provider that can handle the capacity CF can, especially not at this price range (mitigated attacks distributed from approximately 50-90k ips, adding up to about 300-700k rps).

                                  We tried Stackpath, Imperva (Incapsula back in the day), etc but they were either too expensive or went out of business.

                              • Sanzig

                                today at 9:19 PM

                                Bunny.net? Doesn't have near the same feature set as Cloudflare, but the essentials are there and you can easily pay as you go with a credit card.

                                  • slothsarecool

                                    today at 9:38 PM

                                    Their WAF isn't there yet, the moment it can build the expressions you can build with CF (and allows you to have as much visibility into the traffic as CF does), then it might be a solid option, assuming they have the compute/network capacity.

                            • arcatech

                              today at 8:58 PM

                              Do you not feel concern about you and everybody else deciding to put ALL of their eggs into one basket like this?

                                • ranger_danger

                                  today at 10:37 PM

                                  I would bet money that most people who use CF now are already hosting their endpoints at a single provider. I don't think most people care until it actually becomes enough of a problem.

                      • alansaber

                        today at 8:29 PM

                        Not sure why everyone is complaining, new MCP features are more important than uptime

                        • blibble

                          today at 7:45 PM

                          is this blog post LLM generated?

                          the explanation makes no sense:

                          > Because the client is passing pending_delete with no value, the result of Query().Get(“pending_delete”) here will be an empty string (“”), so the API server interprets this as a request for all BYOIP prefixes instead of just those prefixes that were supposed to be removed. The system interpreted this as all returned prefixes being queued for deletion.

                          client:

                              resp, err := d.doRequest(ctx, http.MethodGet, `/v1/prefixes?pending_delete`, nil)
                          
                          server:

                              if v := req.URL.Query().Get("pending_delete"); v != "" {
                                  // ignore other behavior and fetch pending objects from the ip_prefixes_deleted table
                                  prefixes, err := c.RO().IPPrefixes().FetchPrefixesPendingDeletion(ctx)
                                  if err != nil {
                                      api.RenderError(ctx, w, ErrInternalError)
                                      return
                                  }
                          
                                  api.Render(ctx, w, http.StatusOK, renderIPPrefixAPIResponse(prefixes, nil))
                                  return
                              }
                          
                          even if the client had passed a value it would have still done exactly the same thing, as the value of "v" (or anything from the request) is not used in that block

                            • subscribed

                              today at 9:57 PM

                              That's weird. They only removed some 6 of our prefixes out of perhaps 40 we have with them, so something seems off in this explanation.

                              • bretthoerner

                                today at 7:51 PM

                                > even if the client had passed a value it would have still done exactly the same thing, as the value of "v" (or anything from the request) is not used in that block

                                If they passed in any value, they would have entered the block and returned early with the results of FetchPrefixesPendingDeletion.

                                From the post:

                                > this was implemented as part of a regularly running sub-task that checks for BYOIP prefixes that should be removed, and then removes them.

                                They expected to drop into the block of code above, but since they didn't, they returned all routes.
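The parsing behavior behind this can be checked with Go's standard `net/url` package. The sketch below is illustrative only: `flagState` is a hypothetical helper, and only the `pending_delete` key is taken from the quoted snippet.

```go
package main

import (
	"fmt"
	"net/url"
)

// flagState reports what Values.Get and Values.Has return for key in the
// given raw query string. Has requires Go 1.17 or newer.
func flagState(rawQuery, key string) (value string, present bool) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", false
	}
	return q.Get(key), q.Has(key)
}

func main() {
	for _, raw := range []string{"pending_delete", "pending_delete=true", ""} {
		v, ok := flagState(raw, "pending_delete")
		fmt.Printf("query=%q Get=%q Has=%t\n", raw, v, ok)
	}
}
```

For the bare `pending_delete` query, `Get` returns an empty string while `Has` returns true, so a `v != ""` check cannot tell "key present without a value" apart from "key absent". Both cases skip the early-return block.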

                                  • blibble

                                    today at 8:05 PM

                                    okay so the code which returned everything isn't there

                                    actual explanation: the API server by default returns everything. the client attempted to make a request to return "pending_deletes", but as the request was malformed, the API instead went down the default path, which returned everything. then the client deleted everything.

                                    makes sense now

                                    but that explanation is even worse

                                    because that means the code path was never tested?

                                      • jbxntuehineoh

                                        today at 8:36 PM

                                        or they tested it, but not with a dataset that contained prefixes not pending deletion
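To illustrate why the dataset matters, here is a hedged sketch with an in-memory stand-in for the prefix tables. Everything in it is an assumption made for illustration: the prefixes are documentation ranges, `buggyHandler` only mirrors the `v != ""` check from the quoted snippet with an assumed return-everything default path, and `fixedHandler` is one hypothetical presence-based check, not Cloudflare's actual fix.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hypothetical in-memory data: three prefixes, only one pending deletion.
var (
	allPrefixes     = []string{"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"}
	pendingDeletion = []string{"203.0.113.0/24"}
)

// buggyHandler mirrors the quoted check: a bare ?pending_delete has an empty
// value, so the request falls through to the assumed default path.
func buggyHandler(w http.ResponseWriter, r *http.Request) {
	if v := r.URL.Query().Get("pending_delete"); v != "" {
		fmt.Fprint(w, strings.Join(pendingDeletion, "\n"))
		return
	}
	fmt.Fprint(w, strings.Join(allPrefixes, "\n"))
}

// fixedHandler is a hypothetical variant that checks for key presence.
func fixedHandler(w http.ResponseWriter, r *http.Request) {
	if r.URL.Query().Has("pending_delete") {
		fmt.Fprint(w, strings.Join(pendingDeletion, "\n"))
		return
	}
	fmt.Fprint(w, strings.Join(allPrefixes, "\n"))
}

// countReturned calls the handler in-process and counts the returned lines.
func countReturned(h http.HandlerFunc, target string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, target, nil))
	body := strings.TrimSpace(rec.Body.String())
	if body == "" {
		return 0
	}
	return len(strings.Split(body, "\n"))
}

func main() {
	fmt.Println(countReturned(buggyHandler, "/v1/prefixes?pending_delete"))
	fmt.Println(countReturned(fixedHandler, "/v1/prefixes?pending_delete"))
}
```

With this mixed dataset the two handlers return different counts (3 versus 1), so an assertion on the count catches the bug. If every prefix in the test data were pending deletion, both handlers would return the same list and the test would pass either way.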

                                • bstsb

                                  today at 7:49 PM

                                  doesn't look AI-generated. even if they have made a mistake, it's probably just from the rush of getting a postmortem out prior to root cause analysis

                                  • himata4113

                                    today at 7:55 PM

                                    yep, no mention that re-advertised prefixes would be withdrawn again as well during the entire impact even after they shut it down.

                                • atty

                                  today at 7:29 PM

                                  I do not work in the space at all, but it seems like Cloudflare has been having more network disruptions lately than they used to. To anyone who deals with this sort of thing, is that just recency bias?

                                    • Icathian

                                      today at 7:33 PM

                                      It is not. They went about 5 years without one of these, and had a handful over the last 6 months. They're really going to need to figure out what's going wrong and clean up shop.

                                        • NinjaTrance

                                          today at 7:41 PM

                                          Engineers have been vibe coding a lot recently...

                                            • jsheard

                                              today at 7:49 PM

                                              The featured blog post where one of their senior engineering PMs presented an allegedly "production grade" Matrix implementation, in which authentication was stubbed out as a TODO, says it all really. I'm glad a quarter of the internet is in such responsible hands.

                                                • gtowey

                                                  today at 8:55 PM

                                                  It's spreading and only going to get worse.

                                                  Management thinks AI tools should make everyone 10x as productive, so they're all trying to run lean teams and load up the remaining engineers with all the work. This will end about as well as the great offshoring of the early 2000s.

                                                  • blibble

                                                    today at 8:18 PM

                                                    there was also a post here where an engineer was parading around a vibe-coded oauth library he'd made as a demonstration of how great LLMs were

                                                    at which point the CVEs started to fly in

                                                    • ranger_danger

                                                      today at 10:39 PM

                                                      Matrix doesn't actually define how one should do authentication though... every homeserver software is free to implement it however they want.

                                                      • dana321

                                                        today at 7:53 PM

                                                        That's a classic Claude move, even the new Sonnet 4.6 still does this.

                                                          • bonesss

                                                            today at 8:01 PM

                                                            It’s almost as classic as just short circuiting tests in lightly obfuscated ways.

                                                            I could be quite the kernel developer if making the test green was the only criterion.

                                                            • brutalc

                                                              today at 8:00 PM

                                                              [dead]

                                                      • dakiol

                                                        today at 8:06 PM

                                                        No joke. In my company we "sabotaged" the AI initiative led by the CTO. We used LLMs to deliver features as requested by the CTO, but we introduced a couple of bugs here and there (intentionally). As a result, the quarter ended up with more time allocated to fixing bugs and tons of customer claims. The CTO is now undoing his initiative. We all now have some more time to keep our jobs.

                                                          • samrus

                                                            today at 8:42 PM

                                                            That's actively malicious. I understand not going out of your way to catch the LLMs' bugs so as to show the folly of the initiative, but actively sabotaging it is legitimately dangerous behavior. It's acting in bad faith. And I say this as someone who would mostly oppose such an initiative myself.

                                                            I would go so far as to say that you shouldn't be employed in the industry. Malicious actors like you will contribute to an erosion of trust that'll make everything worse.

                                                              • sp00chy

                                                                today at 8:53 PM

                                                                Might be but sometimes you don’t have another choice when employers are enforcing AIs which have no „feeling“ for context of all business processes involved created by human workers in the years before. Those who spent a lot of love and energy for them mostly. And who are now forced to work against an inferior but overpowered workforce.

                                                                Don’t stop sabotaging AI efforts.

                                                                • tovej

                                                                  today at 10:14 PM

                                                                  Forcing developers to use unsafe LLM tools is also malicious. This is completely ethical to me. Not commenting on legality. But ethically, this is correct.

                                                              • hypeatei

                                                                today at 8:48 PM

                                                                That's extremely unethical. You're being paid to do something and you deliberately broke it which not only cost your employer additional time and money, but it also cost your customers time and money. If I were you, I'd probably just quit and find another profession.

                                                                • renegade-otter

                                                                  today at 8:57 PM

                                                                  I see someone is not familiar with the joys of the current job market.

                                                                  • logicchains

                                                                    today at 8:22 PM

                                                                    That's not "sabotaged", that's sabotaged, if you intentionally introduced the bugs. Be very careful admitting something like that publicly unless you're absolutely completely sure nobody could map your HN username to your real identity.

                                                            • Ylpertnodi

                                                              today at 8:19 PM

                                                              Typo: "shop", should have been with an 'el'.

                                                              (: phonetically, because 'l's are hard to read.

                                                          • dazc

                                                            today at 7:55 PM

                                                            Launching a new service every 5 minutes is obviously stretching their resources.

                                                            • lysace

                                                              today at 7:51 PM

                                                              It has been roughly speaking five and a half years since the IPO. The original CTO (John Graham-Cumming) left about a year ago.

                                                                • jacquesm

                                                                  today at 7:55 PM

                                                                  They coasted on momentum for half a year. I don't even think it says anything negative about the current CTO, but more of what an exception JGC is relative to what is normal. A CTO leaving would never show up the next day in the stats, the position is strategic after all. But you'd expect to see the effect after a while, 6 months is longer than I would have expected, but short enough that cause and effect are undeniable.

                                                                  Even so, it is a strong reminder not to rely on any one vendor for critical stuff, in case that wasn't clear enough yet.

                                                                  • dazc

                                                                    today at 7:57 PM

                                                                    I wondered what happened to him?

                                                                      • jgrahamc

                                                                        today at 9:09 PM

                                                                        I am reading HN.

                                                                          • SoKamil

                                                                            today at 10:01 PM

                                                                            What is your opinion on the recent Cloudflare outages?

                                                                        • brcmthrowaway

                                                                          today at 8:03 PM

                                                                          He's on a yacht somewhere

                                                                            • tedd4u

                                                                              today at 8:11 PM

                                                                              For real

                                                                      • today at 8:18 PM

                                                                    • slophater

                                                                      today at 8:20 PM

                                                                      been at cf for 7 yrs but thinking of gtfo soon. the ceo is a manchild, new cto is an idiot, rest of leadership was replaced by yes-men, and the push for AI-first is being a disaster. c levels pretend they care about reliability but pressure teams to constantly ship, cto vibe codes terraform changes without warning anyone, and it's overall a bigger and bigger mess

                                                                      even the blog, that used to be a respected source of technical content, has morphed into a garbage fire of slop and vaporware announcements since jgc left.

                                                                        • sebmellen

                                                                          today at 10:43 PM

                                                                          Do you feel that Matthew Prince is still technically active/informed? I've interacted with him in the past and he seemed relatively technically grounded, but that doesn't seem as true these days.

                                                                          • goalieca

                                                                            today at 8:54 PM

                                                                            I’ve had a lot of problems lately. Basic things are failing and it’s like product isn’t involved at all in the dash. What’s worse? The support.. the chat is the buggiest thing I’ve ever seen.

                                                                              • slophater

                                                                                today at 10:24 PM

                                                                                don't worry, if it gets much worse the ceo will just throw all of support under the bus again. it will surely get better.

                                                                            • lysace

                                                                              today at 10:13 PM

                                                                              > the ceo is a manchild

                                                                              Checks out with what we have seen from the outside.

                                                                              • __turbobrew__

                                                                                today at 9:00 PM

                                                                                You know what they say, shit rolls downhill. I don't personally know the CEO, but the feeling I have got from their public fits on social media doesn't instill confidence.

                                                                                If I was a CF customer I would be migrating off now.

                                                                                • a24446ff87

                                                                                  today at 8:47 PM

                                                                                  GSD! GSD!! ship! ship! ship!

                                                                                  **everything breaks**

                                                                                  ...

                                                                                  **everything breaks again**

                                                                                  oh fuck! Code Orange! I repeat, Code Orange! we need to rebuild trust(R)(TM)! we've let our customers down!

                                                                                  ...

                                                                                  **everything breaks again**

                                                                                  Code Orangier! I repeat, Code Orangier!

                                                                                    • slophater

                                                                                      today at 10:20 PM

                                                                                      exactly. recently "if the cto is shipping more than you, you're doing something wrong"

                                                                                      cto can't even articulate a sentence without passing it through an LLM, and instead of doing his job he's posting the stupidest shit to his personal bootlicking chat channel. I cringe every time at the brown-nosers that inhabit that hovel.

                                                                                      no words for what the product org is becoming too. they should take their own advice a bit further and just replace all the leadership with an LLM, it would be cheaper and it's the same shit in practice

                                                                                  • slophater

                                                                                    today at 8:28 PM

                                                                                    amazing how my comment was flagged in 30 seconds... keep bootlicking

                                                                                • Betelbuddy

                                                                                  today at 7:45 PM

                                                                                  Cloudflare outages are as predictable as the Sun coming up tomorrow. It's their engineering culture.

                                                                                  https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

                                                                                  • candiddevmike

                                                                                    today at 8:00 PM

                                                                                    Wait till you see the drama around their horrible terraform provider update/rewrite:

                                                                                    https://github.com/cloudflare/terraform-provider-cloudflare/...

                                                                                • NinjaTrance

                                                                                  today at 7:45 PM

                                                                                  The irony is that the outage was caused by a change from the "Code Orange: Fail Small initiative".

                                                                                  They definitely failed big this time.

                                                                                  • anurag

                                                                                    today at 8:12 PM

                                                                                    The one redeeming feature of this failure is staged rollouts. As someone advertising routes through CF, we were quite happy to be spared from the initial 25%.

                                                                                    • himata4113

                                                                                      today at 7:54 PM

                                                                                      This blog post is inaccurate, the prefixes were being revoked over and over - to keep your prefixes advertised you had to have a script that would readd them or else it would be withdrawn again. The way they seemed to word it is really dishonest.

                                                                                      • boarush

                                                                                        today at 7:28 PM

                                                                                        While neither I nor the company I work for is directly impacted by this outage, I wonder how long Cloudflare can take these hits and keep apologizing for them. I truly appreciate them being transparent about it, but businesses care more about SLAs and uptime than the incident report.

                                                                                          • llama052

                                                                                            today at 7:42 PM

                                                                                            I’ll take clarity and actual RCAs over Microsoft’s approach of not notifying customers and keeping their status page green until enough people notice.

                                                                                            One thing I do appreciate about cloudflare is their actual use of their status page. That’s not to say these outages are okay. They aren’t. However I’m pretty confident in saying that a lot of providers would have a big paper trail of outages if they were more honest to the same degree or more so than cloudflare. At least from what I’ve noticed, especially this year.

                                                                                              • boarush

                                                                                                today at 7:47 PM

                                                                                                Azure straight up refuses to show me if there's even an incident even if I can literally not access shit.

                                                                                                But the last few months have been quite rough for Cloudflare, with a few outages on their Workers platform that didn't quite make the headlines too. Can't wait for Code Orange to get to production.

                                                                                            • jacquesm

                                                                                              today at 7:58 PM

                                                                                              Bluntly: they expended that credit a while ago. Those that can will move on. Those that can't have a real problem.

                                                                                              As for your last sentence:

                                                                                              Businesses really do care about the incident reports because they give good insight into whether they can trust the company going forward. Full transparency and a clear path to non-repetition due to process or software changes are called for. You be the judge of whether or not you think that standard has been met.

                                                                                                • boarush

                                                                                                  today at 8:04 PM

                                                                                                  I might be looking at it differently, but aren't decisions about which service provider to use made by management? Incident reports don't ever reach that level in my experience.

                                                                                                    • samrus

                                                                                                      today at 8:50 PM

                                                                                                      In my experience, the gist of it does reach management when it's an existing vendor, especially if management is tech literate.

                                                                                                      Because management wants to know why the graphs all went to zero, and the engineers have nothing else to do but relay the incident report.

                                                                                                      This builds management's perception of the vendor, and if the perception is that the vendor doesn't tell them shit or doesn't even seem to know there's an outage, then management can decide to shift vendors.

                                                                                          • dilyevsky

                                                                                            today at 8:37 PM

                                                                                            > Because the client is passing pending_delete with no value, the result of Query().Get(“pending_delete”) here will be an empty string (“”), so the API server interprets this as a request for all BYOIP prefixes instead of just those prefixes that were supposed to be removed.

                                                                                            Lmao, iirc long time ago Google's internal system had the same exact bug (treating empty as "all" in the delete call) that took down all their edges. Surprisingly there was little impact as traffic just routed through the next set of proxies.

                                                                                            • jaboostin

                                                                                              today at 8:12 PM

                                                                                              Hindsight is 20/20, but why not dry run this change in production and monitor the logs/metrics before enabling it? Seems prudent for any new “delete something in prod” change.

                                                                                              • ssiddharth

                                                                                                today at 7:49 PM

                                                                                                The eternal tech outage aphorism: It's always DNS, except for when it's BGP.

                                                                                                  • subscribed

                                                                                                    today at 10:00 PM

                                                                                                    You could argue BGP is like DNS for IPs :)

                                                                                                • vimda

                                                                                                  today at 9:08 PM

                                                                                                  One has to wonder when the board realises Dane was a bad replacement for JGC. These outages are getting ridiculous

                                                                                                  • user205738

                                                                                                    today at 9:09 PM

                                                                                                    They should have rewritten this code in Rust using these brilliant language models. /jk

                                                                                                    • tokyobreakfast

                                                                                                      today at 8:15 PM

                                                                                                      Is this trend of oversharing code snippets and TMI postmortems done purposely to distract their customers from raging over the outage and the next impending fuckup?

                                                                                                        • samrus

                                                                                                          today at 8:53 PM

                                                                                                          Just seems like transparency. I agree that we should also judge them based on the frequency of these incidents and whether they provide a path to non-repeatability, but I wouldn't criticize them for the transparency per se.

                                                                                                          • alansaber

                                                                                                            today at 8:19 PM

                                                                                                            Well I still appreciate a good postmortem even if I have no doubt it'll happen again imminently

                                                                                                            • bdangubic

                                                                                                              today at 8:21 PM

                                                                                                              And if they didn’t, we’d be posting about the lack of transparency. Damned if you do, damned if you don’t.

                                                                                                          • wa008

                                                                                                            today at 8:42 PM

                                                                                                            This transparent report can earn my trust

                                                                                                              • djfobbz

                                                                                                                today at 9:09 PM

                                                                                                                I'm honestly amazed that a company CF's size doesn't have a neat little cluster of Mac Minis running OpenClaw and quietly taking care of this for them.

                                                                                                                • VirusNewbie

                                                                                                                  today at 8:05 PM

                                                                                                                  If you track large SaaS and cloud uptime, it seems to correlate pretty highly with compensation at big companies. Is Cloudflare getting top talent?

                                                                                                                    • bombcar

                                                                                                                      today at 8:10 PM

                                                                                                                      Based on IPO date and lockups, I suspect top talent is moving on.

                                                                                                                  • henning

                                                                                                                    today at 8:04 PM

                                                                                                                    Sure, vibe-coded slop that has not been properly peer reviewed or tested prior to deployment is leading to major outages, but the point is they are producing lots of code. More code is good; that means you are a good programmer. Reading code would just slow things down.

                                                                                                                      • sp00chy

                                                                                                                        today at 8:48 PM

                                                                                                                        that’s my feeling also. We will get this more and more in future.

                                                                                                                    • NooneAtAll3

                                                                                                                      today at 8:25 PM

                                                                                                                      again?

                                                                                                                      • dryarzeg

                                                                                                                        today at 7:35 PM

                                                                                                                        DaaS - Downtime as a Service©

                                                                                                                        Just joking, no offence :)

                                                                                                                          • logicchains

                                                                                                                            today at 9:03 PM

                                                                                                                            DaaS is good ja