gus_massa
today at 4:43 PM
> We therefore conclude that theoretically motivated experiment choice is potentially damaging for science, but in a way that will not be apparent to the scientists themselves.
They are analyzing a toy model of science. The details are in Figure 1. They have a search space that has a few Gaussians like
f(x,y,z) = A0 * exp(-(x-x0)^2-(y-y0)^2-(z-z0)^2) + A1 * exp(-(x-x1)^2-(y-y1)^2-(z-z1)^2)
but maybe in more than 3 dimensions and maybe with more than 2 Gaussians.
They want the agents to find all of the Gaussians.
It's somewhat similar to a maximization problem, which is easier. There are many strategies for this, from gradient ascent to random sampling to a million more variants. I like simulated annealing.
They claim that the best method is random sampling, which only works when the search space is small. It breaks quite fast for high dimensional problems, unless the Gaussians are so big that they cover most of the space, and perhaps I'm being too optimistic. Add noise and overlapping Gaussians and the problem gets super hard.
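A rough way to see how fast random sampling breaks down is the sketch below. It assumes a single Gaussian whose "useful" region is a ball of radius r centered in the cube [-1, 1]^d; the cube, the radius and the function name `hit_fraction` are illustrative assumptions, not taken from the paper. It computes the fraction of uniformly random points that land inside the ball.

```python
import math

def hit_fraction(d, r=0.5):
    """Fraction of the cube [-1, 1]^d covered by a centered ball of radius r.

    Assumes r <= 1 so the ball fits inside the cube (illustrative setup).
    Uses the d-ball volume formula V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d.
    """
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume

# Compare a 3D toy problem with the 54 dimensions of an 18-atom molecule.
for d in (1, 3, 10, 54):
    print(f"d={d:2d}  chance that one random sample hits: {hit_fraction(d):.3g}")
```

With r = 0.5 the hit chance is about 6.5% in 3 dimensions and shrinks roughly exponentially as dimensions are added, so the number of random samples needed to hit each peak even once grows just as fast.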
Let's get to a realistic example: all the molecules with 6 Carbons and 12 Hydrogens. Let's try to find all of them and their stable 3D configurations. This is chemistry from the first year in the university, perhaps earlier, no cutting edge science.
You have 18 atoms, so 18 * 3 = 54 dimensions, and the surface of -energy has a lot of mountain ranges and nasty stuff. Most of them are very sharp. Let's try to find the local points of maximal -energy, which is much easier than the full map. These are the stable molecules, which (usually) have names.
* There is a cyclic one with 6 Carbons, where each Carbon has 2 Hydrogens, https://en.wikipedia.org/wiki/Cyclohexane Note that it actually has two different 3D variants.
* There is one with a cycle of 5 Carbons and 1 carbon attached to the cycle https://en.wikipedia.org/wiki/Methylcyclopentane
* There are variants with shorter cycles, but I'm not sure how stable they are and Wikipedia has no page for them.
* There are also 3 linear versions, where the 6 Carbons form a wavy line and there is a double bond in one of the steps https://en.wikipedia.org/wiki/1-Hexene I'm not sure why the other two versions have no page in Wikipedia. I think they should be stable, but sometimes it's not a local maximum, or the local maximum is too shallow and the double bond jumps and the Hydrogens reorganize.
* And there may be other nasty stuff, take a look at the complete list https://en.wikipedia.org/wiki/C6H12.
And don't try to make the complete list of molecules that also include a few Nitrogens, because the number of molecules explodes exponentially.
So this random sampling method they propose does not even work for an elementary Chemistry problem.
Legend2440
today at 6:44 PM
That said, random or exhaustive search is a more scientifically useful method than you might think.
The first commercial antibiotics (Sulfa drugs) were found by systematically testing thousands of random chemicals on infected mice. This was a major drug discovery method up until the 1970s or so, when they had covered most of the search space of biologically-active small molecules.
StableAlkyne
today at 9:12 PM
Related, I was talking to a computational chemist at a conference a few years ago. Their work was mostly at the intersection of ML and material science.
An interesting concept they mentioned was this idea of "injected serendipity" when they were screening for novel materials with a certain target performance. They proceed as normal, but 10% or so of the screened materials are randomly sampled from the chemical space.
They claimed this had led them to several interesting candidates across several problems.
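A minimal sketch of how such a screening batch could be assembled, assuming a pool of candidates and a model score for each. The function `select_batch`, its parameters and the 10% default are hypothetical illustrations, not the chemist's actual pipeline.

```python
import random

def select_batch(candidates, score, batch_size, random_fraction=0.1, seed=None):
    """Pick a screening batch: mostly top-scored, plus a random slice.

    candidates: list of items to screen (hypothetical pool)
    score: function mapping a candidate to a predicted performance
    random_fraction: share of the batch drawn at random (the "serendipity")
    """
    rng = random.Random(seed)
    n_random = round(batch_size * random_fraction)
    n_top = batch_size - n_random

    # Exploit: take the best candidates according to the model.
    ranked = sorted(candidates, key=score, reverse=True)
    top = ranked[:n_top]

    # Explore: sample uniformly from everything the model did not pick.
    rest = ranked[n_top:]
    explore = rng.sample(rest, min(n_random, len(rest)))
    return top + explore

# Toy usage: candidates are integers and the "model" prefers large values.
batch = select_batch(list(range(100)), score=lambda x: x, batch_size=20, seed=0)
print(batch)
```

The random slice is drawn only from candidates the model ranked below the cutoff, so it never duplicates the top picks.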
gus_massa
today at 8:06 PM
A few months ago I went to a similar talk. They got a carboxylic acid from a plant (I forgot the name) that has some activity against caterpillars that eat corn, and made like 10 or 15 compounds with organic alcohols to get esters. They tried different doses on the caterpillars and then made a computer model to predict the activity of similar compounds (QSAR). The idea is to use it on a long list of other organic alcohols and try to find a better compound.
But they chose chemical reactions that are usual in the lab, so they guess they will be able to make it work in the lab, and they kept most of the structure without changes. So it's closer to what is classified here as looking near the known good points than to a true random search.
Eisenstein
today at 5:07 PM
They address this specifically and hand-wave it away:
> Moreover, both random and all other experimentation strategies we examined require constructing a bounded experimental space, a challenge that lies beyond the scope of the current work (see Almaatouq et al., 2024, for further discussion).
I think their conclusion is still important to consider, though. It makes a point beyond the practicalities and more towards the philosophy of approach.
gus_massa
today at 7:05 PM
That is an unrelated problem, and usually it is not even a problem.
For molecules, 10 angstroms away is probably as good as infinity.
For how many bananas should you eat per week to become the chess world champion, you can ask Wolfram Alpha to convert 2400kcal * 7 to bananas and get an upper bound.
I think everyone agrees that with infinite time and resources a brute force search is better, in case there is a weird combination. But for finite time and resources you need to select a better strategy, unless the search space is ridiculously small and smooth.
Eisenstein
today at 8:39 PM
I guess I am not following very well -- what exactly is an unrelated problem? Setting a bounded space?