
Chess engines do weird stuff

124 points - today at 5:07 PM

Source
  • mpolson64

    today at 6:19 PM

    I'm no expert on chess engine development, but it's surprising to me that both lc0 and stockfish use SPSA for "tuning" the miscellaneous magic numbers which appear in the system rather than different black box optimization algorithms like Bayesian optimization or evolutionary algorithms. As far as I am aware both of these approaches are used more often for similar tasks in non-chess applications (ex. hyperparameter optimization in ML training) and have much more active research communities compared to SPSA.

    Is there something special about these chess engines that makes SPSA more desirable for these use cases specifically? My intuition is that something like Bayesian optimization could yield stronger optimization results, and that the computational overhead of doing BO would be minimal compared to the time it takes to train and evaluate the models.

      • LPisGood

        today at 7:58 PM

        One thing I wonder is why design of experiments (DOE) methodology is so seldom used for these things.

        Statisticians and operations researchers have spent a hundred years deciding how to do as few experiments as possible to tweak parameters in the ways that give the highest impact with statistical basis that the selections are good.

        In the language of information and decision trees, these experiments are trying to in some sense “branch” on the entropy minimizing variables.

          • agalunar

            today at 8:09 PM

            SPRT is used religiously in engine development today. There is enormous incentive to test efficiently.

            https://github.com/official-stockfish/fishtest/wiki/Fishtest...
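For readers unfamiliar with SPRT, here is a minimal sketch of Wald's sequential probability ratio test on plain win/loss outcomes. It is a simplification and not Fishtest's actual implementation: draws are ignored, and the function name `sprt` and the defaults `p0`, `p1`, `alpha`, `beta` are illustrative assumptions.

```python
import math

def sprt(outcomes, p0=0.50, p1=0.55, alpha=0.05, beta=0.05):
    """Wald SPRT for H0: win rate = p0 versus H1: win rate = p1.

    `outcomes` is an iterable of booleans (True = the patch won the game).
    Returns (decision, games_used). Draws are ignored in this toy model.
    """
    lower = math.log(beta / (1 - alpha))   # cross this bound: accept H0
    upper = math.log((1 - beta) / alpha)   # cross this bound: accept H1
    llr = 0.0                              # running log-likelihood ratio
    games = 0
    for games, won in enumerate(outcomes, start=1):
        if won:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", games
        if llr <= lower:
            return "accept H0", games
    return "undecided", games

# The test stops as soon as the evidence is strong enough,
# rather than after a fixed, pre-chosen number of games.
decision, games_used = sprt([True] * 100)
```

The point of the sequential form is that a clearly good or clearly bad patch is resolved after few games, while only borderline patches consume a long run.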

            • mpolson64

              today at 8:25 PM

              DOE is still very useful in many contexts, but when it's possible to use a sequential design these adaptive techniques really start to pull away in terms of optimization quality.

              There's simply a lot of sample efficiency to gain by adapting the experiment to incoming data in a regime where one can repeatedly design n candidates, observe their effects, and repeat m times compared to a setting where one must design a fixed experiment with n*m samples.

          • sscg13

            today at 7:06 PM

            Engines like Stockfish might have over 100 "search parameters" that need to be tuned. To the best of my knowledge, SPSA is preferred because the computational cost per iteration typically does not depend on the number of parameters.

            Or, if you use SPSA to, say, perform a final post-training tune of the last layers of a neural network, there could be thousands of parameters or more.
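The "cost does not depend on the number of parameters" point can be sketched as follows. This is a generic SPSA loop on a noise-free toy loss, not the tuning code of Stockfish or lc0; the function names, the gain constants `a` and `c`, and the quadratic objective are all assumptions for illustration. In real engine tuning the "loss" is a noisy match result.

```python
import random

def spsa_minimize(loss, theta, iterations=2000, a=0.05, c=0.1, seed=0):
    """Minimize `loss` with simultaneous perturbation stochastic approximation."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        ak = a / k ** 0.602   # step-size schedule (commonly used exponents)
        ck = c / k ** 0.101   # perturbation-size schedule
        # Perturb ALL parameters at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        # Exactly two loss evaluations per iteration, whatever len(theta) is.
        diff = loss(plus) - loss(minus)
        theta = [t - ak * diff / (2 * ck * d) for t, d in zip(theta, delta)]
    return theta

# Toy objective: squared distance to a hypothetical optimum, with a call counter.
target = [i / 10 for i in range(10)]
evaluations = 0

def toy_loss(x):
    global evaluations
    evaluations += 1
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

tuned = spsa_minimize(toy_loss, [0.0] * len(target))
max_error = max(abs(x - t) for x, t in zip(tuned, target))
```

A finite-difference gradient would need two evaluations per parameter per step; here the count stays at two per step for 10 parameters or 10,000, at the price of a noisier gradient estimate.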

              • mpolson64

                today at 8:28 PM

                The concern about the dimensionality of the search space is real, especially once things cross over into the 100s -- BO would certainly not be useful post-training the way the blog post talks about using SPSA.

                That being said, it still seems possible to me that using a different black box optimization technique for a fairly constrained set of related magic numbers (say, fewer than 50) might lead to some real performance improvements in these systems. It could be worth reaching out to the lc0 or Stockfish development communities.

        • incognito124

          today at 5:56 PM

          Please be careful when visiting the homepage

            • andix

              today at 10:04 PM

              If you paid for MS FrontPage, you better get some value out of it!

              • NooneAtAll3

                today at 6:54 PM

                as always, genius and insanity are only 1 step apart

                  • hmmmmmmmmmmmmmm

                    today at 7:10 PM

                    [flagged]

                      • tqpcharlie

                        today at 7:23 PM

                        or a generation of people who feel comfortable talking candidly about their mental illness(es) and feel safe seeking help among others like them??

                          • hmmmmmmmmmmmmmm

                            today at 7:27 PM

                            You mean the people marketing their mental health issues online?

                              • squeaky-clean

                                today at 8:10 PM

                                Where can I buy some of what they're marketing?

                        • Retric

                          today at 7:31 PM

                          Mental illness has always been common and often been cool in one form or another.

                          Serial killers get fan mail, that’s true now and it was true 100 years ago.

                            • hmmmmmmmmmmmmmm

                              today at 7:41 PM

                              I think a lot of people still grow out of that phase. Like wanting to be like the Joker or taking a 'am I a sociopath' test online and finding your new edgelord persona only to find it deeply cringeworthy later.

                          • simlevesque

                            today at 7:30 PM

                            I mean, some of the things on that homepage are truly the works of a genius IMO

                              • NooneAtAll3

                                today at 9:07 PM

                                no, homepage is insanity

                                blog post is good

                    • today at 6:06 PM

                      • pavel_lishin

                        today at 6:17 PM

                        This really reminds me of the web as I remember it from the mid-to-late 90's; I feel like I'm just a click away from the old deoxy.org, if anyone remembers that. (Don't go there now; the domain appears to have been long-ago hijacked.)

                          • fsiefken

                            today at 10:05 PM

                            I loved the deoxy site, it was one of my favorites :-) Next to the site and writings of the esoteric Brother Blue, who was he? It eventually caused me to go into a reality tunnel for a few years. It was a fascinating and puzzling experience, similar to what was described in Cosmic Trigger III by R.A. Wilson.

                            • incognito124

                              today at 6:25 PM

                              or kittens on encyclopediabrittanica

                          • WesolyKubeczek

                            today at 6:06 PM

                            It gave me serious vibes of the old internet homepages of highly eccentric people that became a part of the internet folklore, whether in a good way or a bad way.

                            The video is probably the least bizarre thing there, if that's what you are warning about.

                              • uncivilized

                                today at 6:15 PM

                                What were you browsing where someone cutting off their own testicles is not as bizarre as other things? I didn't watch the video but at least there was a warning.

                                Feds this guy right here ^^

                                  • ASalazarMX

                                    today at 7:18 PM

                                    Looks at the chain of comments, then at the URL domain

                                    Thanks for the warnings, kind strangers.

                                    • mghackerlady

                                      today at 8:27 PM

                                      As a transgender woman, that isn't something I'd expect to see but am not surprised to see on a site called girl.surgery. dead doves and all that

                                      • pavel_lishin

                                        today at 6:21 PM

                                        > What were you browsing where someone cutting off their own testicles is not as bizarre as other things?

                                        One of my formative early internet experiences was loading up a video of a man being beheaded with a knife.

                                        Luckily, I realized what was about to happen, and didn't subject myself to the whole thing.

                                        • WesolyKubeczek

                                          today at 7:15 PM

                                          There's some distance between setting pubes on fire and cutting testicles off, dare I say.

                                          Although, setting any kind of hair on fire in public should be punishable, primarily because of stench of the burnt hairs.

                              • t1234s

                                today at 7:04 PM

                                The homepage for this site is defiantly NSFW.

                                  • voxl

                                    today at 9:07 PM

                                    You probably meant definitely, but defiantly amusingly works too

                                    • thinkingtoilet

                                      today at 9:06 PM

                                      "definitely" or "defiantly"?

                                      The idea of something being "defiantly" NSFW gave me a chuckle.

                                  • GaggiX

                                    today at 5:41 PM

                                    https://cosmo.tardis.ac/files/2026-02-12-az-rl-and-spsa.html

                                    Response from the author of Viridithas, there is a link to this engine in her webpage.

                                      • dang

                                        today at 5:47 PM

                                        Thanks! I've put that link in the toptext as well.

                                        • twiclo

                                          today at 8:25 PM

                                          Her?

                                            • today at 8:38 PM

                                              • GaggiX

                                                today at 8:51 PM

                                                I read "girl.surgery" and guessed.

                                        • RivieraKid

                                          today at 5:58 PM

                                          AFAIK chess has been "solved" for a few years in the sense that Stockfish running on a modern laptop with 1 minute per move is unbeatable from the starting position.

                                            • helloplanets

                                              today at 6:19 PM

                                              This is not true. Stockfish is not unbeatable by another engine, or another copy of Stockfish.

                                              Chess engines have been impossible for humans to beat for well over a decade.

                                              But a position in chess being solved is a specific thing, which is still very far from having happened for the starting position. Chess has been solved up to 7 pieces. Solving basically amounts to some absolutely massive tables that have every variation accounted for, so that you know whether a given position will end in a draw, black win or white win. (https://syzygy-tables.info)

                                                • LeifCarrotson

                                                  today at 6:56 PM

                                                  The parent is using a different definition, so they put "solved" in quotes. What word would you suggest to describe the situation where the starting position with 32 pieces always ends in either a draw or win for white, regardless of the compute and creativity available to black?

                                                  I haven't verified OP's claim attributed to 'someone on the Stockfish discord', but if true, that's fascinating. There would be nothing left for the engine developers to do but improve efficiency and perhaps increase the win-to-draw ratio.

                                                    • helloplanets

                                                      today at 7:05 PM

                                                      Yea that's true, it's a pretty overloaded word. From what I remember though, even the top players thought that there wasn't anywhere left to go with chess engines, before Alpha Zero basically ripped the roof off with a completely different play style back in 2017, beating Stockfish.

                                                      And the play style of Alpha Zero wasn't different in a way that needs a super trained chess intuition to see, it's outrageously different if you take a look at the games.

                                                      I guess my point is that even if the current situation is basically a 'deadlock', it has not been proven to be some sort of eternal knowledge of the game as of yet. There's still the possibility that a new type of approach could blow the current top engines out of the water, with a completely different take on the game.

                                                        • sscg13

                                                          today at 7:16 PM

                                                          However, it is true that Elo gain on "balanced books" has stalled somewhat since Stockfish 16 in 2023, which is also reflected on the CCRL rating lists.

                                                          IMO AlphaZero was partially a result of the fact that using more compute also works. Stockfish 10 running on 4x as many CPUs would beat Stockfish 8 by a larger margin than AlphaZero did. To this day, nobody has determined what a "fair" GPU to CPU comparison is.

                                                      • gowld

                                                        today at 7:36 PM

                                                        It's a strange definition of "solved".

                                                        War was "solved" when someone made a weapon capable of killing all the enemy soldiers, until someone made a weapon capable of disabling the first weapon.

                                                    • RivieraKid

                                                      today at 6:29 PM

                                                      Do you have a source? I remember asking on the Stockfish Discord and being told that Stockfish on a modern laptop with 1 min per move will never lose against Stockfish with 1000 min per move from the starting position.

                                                      But I'm not sure whether that guy was guessing or confident about that claim.

                                                        • helloplanets

                                                          today at 6:53 PM

                                                          There's the TCEC [0] which is a big thing in some circles. Stockfish does lose every now and then against top engines. [1] Usually it's two different engines playing against one another, though. Like Leela Chess Zero [2] vs. Stockfish.

                                                          In that hypothetical of running 2 instances of Stockfish against one another on a modern laptop, with the key difference being minutes of compute time, it'd probably be very close to 100% of draws. Depending on how many games you run. So, if you run a million games, there's probably some outliers. If you run a hundred, maybe not.

                                                          When it comes to actually solved positions, the 7-piece tables take around 1TB of RAM to even run. These tablebases are used by Stockfish when you actually want to run it at peak strength. [3]

                                                          [0]: https://tcec-chess.com [1]: https://lichess.org/broadcast/tcec-s28-leagues--superfinal/m... [2]: https://lczero.org [3]: https://github.com/syzygy1/tb

                                                            • NooneAtAll3

                                                              today at 6:55 PM

                                                              doesn't TCEC use opening book?

                                                              I remember hearing that starting position is so draw-ish that it's not practical anymore

                                                                • LogicalRisk

                                                                  today at 7:12 PM

                                                                  TCEC does force different openings yes. Engines play both sides.

                                                          • LogicalRisk

                                                            today at 7:11 PM

                                                            Here's a game from a month ago where Stockfish loses to Lc0, played during the TCEC Cup. https://lichess.org/S9AwOvWn

                                                            Chess is a 2 player game of perfect, finite information, so by Zermelo's theorem either one side always wins with optimal play or it's a draw with optimal play. The argument from the Discord person simply says that Stockfish computationally can't come up with a way to beat itself. Whether this is true (and it really sounds like a question about depth in search) is separate from whether the game itself is solved, and it very much is not.

                                                            Solving chess would be a table that simply lists out the optimal strategy at every node in the game tree. Since this is computationally infeasible, we will certainly never solve chess absent some as yet unknown advance in computation.
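To make the "table of optimal outcomes at every node" idea concrete at toy scale, here is a minimal sketch that strongly solves tic-tac-toe by exhaustive backward induction. Tic-tac-toe is a stand-in chosen for illustration, not something the thread discusses; the same recursion is what becomes infeasible for full chess.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def solve(board, to_move):
    """Game-theoretic value for X: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won == "X":
        return 1
    if won == "O":
        return -1
    if "." not in board:
        return 0
    other = "O" if to_move == "X" else "X"
    values = [solve(board[:i] + to_move + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == "."]
    # X picks the best value for X, O picks the worst value for X.
    return max(values) if to_move == "X" else min(values)

start_value = solve("." * 9, "X")
table_size = solve.cache_info().currsize  # number of positions in the table
```

The memoization cache plays the role of the tablebase: once filled, the value of every reachable position is a lookup. Zermelo's theorem guarantees such a value exists for chess too; the obstacle is only the size of the table.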

                                                              • RivieraKid

                                                                today at 7:25 PM

                                                                What I meant by "solved" is "never loses from the starting position against Stockfish that has infinite time per move".

                                                                In the TCEC game, I see "2. f4?!", so I'm guessing Stockfish was forced to play some specific opening, i.e. it was forced to make a mistake.

                                                                  • gowld

                                                                    today at 7:39 PM

                                                                    That means that Stockfish's parameters are already optimized as far as practically possible for Rapid chess and Slow chess, not that chess itself is solved, or even that Stockfish is fully optimized for Blitz and Bullet.

                                                                • sscg13

                                                                  today at 7:18 PM

                                                                  Surely it is apparent to you that the first few moves are not independently chosen by the engine, but rather intentionally chosen by the TCEC bookmakers to create a position on the edge between a draw and a decisive result.

                                                                  For what it's worth, Stockfish wins the rematch also. https://tcec-chess.com/#game=13&round=fl&season=cup16

                                                                    • LogicalRisk

                                                                      today at 8:35 PM

                                                                      Yes, engines would almost certainly never play 2. f4. That's a different question than whether chess is solved, for which the question of interest would be "given optimal play after 1. e4 e5 2. f4 is the result a win for one side or a draw?"

                                                                      It's also almost certainly the case (though I don't know why you would do it) that Stockfish given the black pieces and extensive pondering would be meaningfully better than Stockfish with a time-capped move order. Most games are going to be draws, so practically it would take a while to determine this.

                                                                      I'm of the view that the actual answer for chess is "It's a draw with optimal play."

                                                              • MengerSponge

                                                                today at 6:41 PM

                                                                That just means that Stockfish doesn't get stronger with more than 1 minute per move on a modern computer. It doesn't say anything about other engines.

                                                                  • RivieraKid

                                                                    today at 6:54 PM

                                                                    Stockfish with 1000 minutes per move is an approximation of a perfect chess player. So if Stockfish with 1 minute per move will never lose against a perfect player, it is unbeatable by any chess engine.

                                                            • sscg13

                                                              today at 7:32 PM

                                                              Hypothetically, what reward would be worth the cost for you to attempt to beat Stockfish 18, 100 million nodes/move, from the starting position?

                                                          • sscg13

                                                            today at 7:30 PM

                                                            You can run Stockfish single threaded in a deterministic manner by specifying nodes searched instead of time, so in principle it is possible to set some kind of bounty for beating Stockfish X at Y nodes per move from the start position, but I haven't seen anyone willing to actually do so.
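A rough sketch of what such a fixed-node setup might look like at the UCI level. The helper name `fixed_node_commands` is hypothetical; the strings it returns are standard UCI commands, but this only builds the command list and does not launch or talk to an engine.

```python
def fixed_node_commands(nodes, moves=()):
    """Build UCI commands for a single-threaded, fixed-node search.

    `moves` is the game so far in UCI long algebraic notation, e.g. ["e2e4"].
    """
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    return [
        "uci",
        "setoption name Threads value 1",  # one thread, to avoid search races
        "ucinewgame",                      # start from a clean engine state
        position,
        f"go nodes {nodes}",               # budget in nodes, not wall-clock time
    ]

commands = fixed_node_commands(1_000_000, ["e2e4", "e7e5"])
```

Limiting by nodes rather than time removes the dependence on hardware speed, which is what would make a "beat version X at Y nodes" bounty well defined.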

                                                            • altruios

                                                              today at 6:02 PM

                                                              Even by a stockfish running on a modern laptop with 2 minutes per move (provided they are going second)?!

                                                                • RivieraKid

                                                                  today at 6:32 PM

                                                                  Yes, that's what "unbeatable from the starting position" means.

                                                              • bee_rider

                                                                today at 6:54 PM

                                                                “Solved” is a term of art. Defining it in some other way is not really wrong (since it is a definition) but it seems… unnecessary.

                                                            • TZubiri

                                                              today at 7:44 PM

                                                              I know a fair deal about the subject of chess AI, but when I was reading this I didn't understand it. I was polarized: was I reading a mastermind who was way above my level? Or someone way too confident who learned enough buzzwords through an LLM to briefly delude someone other than themselves?

                                                              A quick visit at the homepage suggests that it's probably the latter. I don't want to be rude, not posting out of malice, but if someone else was reading this and was trying to parse it, I think it might be helpful to compare notes and evaluate whether it's better to discard the article altogether.

                                                                • Paracompact

                                                                  today at 8:03 PM

                                                                  Curious, what has you believe that? As someone who doesn't know much about chess AI, I was mostly able to follow along, and figured there were simply some prereqs the author wasn't explaining (e.g. distillation, test-time search, RLVR). If the article is deeply confused in some way I would indeed like to calibrate my BS detectors.

                                                                    • TZubiri

                                                                      today at 10:18 PM

                                                                      Just to confirm, did you read Cosmo's article (cosmo.tardis.uk black background), or the girl.surgery (white background) article?

                                                                      ML isn't my strong suit so I wouldn't be able to explain how, but Cosmo's article is almost entirely a refutation of the points made by the root article. No doubt he is very friendly, as someone would be to anyone interested in their field.

                                                                      What I can speak about is the general construction of sentences, they read (in the most charitable of interpretations) like text messages:

                                                                      "Good model vs bad model is ~200 elo, but search is ~1200 elo, so even a bad model + search is essentially an oracle to a good model without, and you can distill from bad model + search → good model."

                                                                      I take it that by "is ~X elo" they mean that implementing that strategy results in a gain of 200 ELO? (Cosmo incorporates the expression, but adds a + sign for clarity, which would still be quite undefined, as 1000 to 1200 is not the same as 2800 to 3000). I get that this reads more like internal notes, but it was published, so there was some expectation that it would be understood by someone else.
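For reference, a small sketch of how an Elo difference maps to an expected score under the standard logistic Elo model. The function name is illustrative. Note that in this model the expected score depends only on the rating difference; whether a given technique yields the same Elo gain at different strength levels is a separate, empirical question.

```python
def expected_score(rating_diff):
    """Expected score (win = 1, draw = 0.5) for the side rated `rating_diff` higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

model_gap = expected_score(200)    # the "~200 elo" good model vs bad model gap
search_gap = expected_score(1200)  # the "~1200 elo" gap attributed to search
```

Under this model a 200-point edge corresponds to scoring roughly three points out of four, while a 1200-point edge corresponds to scoring almost every point, which is the sense in which the quoted sentence treats search as an "oracle".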

                                                                      The writing reminds me of when I used to write notes and I did drugs for a lot more reasons, and I also saw it in the writings of other loved ones that were using drugs/bordering schizophrenia. My take is that the article was written by a mind that used to be brilliant but is now echoing. I hope it is reversible and if per is reading this and my estimation is correct, that they perturb the weights in favour of quitting drugs and see if they win more or not.

                                                                  • potsandpans

                                                                    today at 8:21 PM

                                                                    This comment is another example of an "llm psychosis" that is currently occurring in common discourse.

                                                                    The mass delusion of, "I don't understand what I'm reading, therefore it must be produced by an llm."

                                                                    I think it's a pretty serious problem. Not that llm text exists on the internet, but that reasonable people are reflexively closed off to creativity because the mere existence of the possibility that something is created by an llm is in their minds grounds for disqualification.

                                                                      • TZubiri

                                                                        today at 9:46 PM

                                                                        Nono, the claim is not that it is produced by an llm, rather that the author researches the subject with llms and generally is a high frequency user.

                                                                        A common property of llm psychosis is the development of an internal vocabulary that the llm learns, often reusing words but adopting specific meanings, for some reason quantum and quantic are very popular for this.

                                                                • oldpersonintx

                                                                  today at 5:38 PM

                                                                  [dead]