One thing a benchmark like AutoCodeBenchmark cannot demonstrate is what happens when human-written type definitions model the domain before the LLM writes a single line of code.
I have found this very effective in F#: I model the domain with types, I work out the type signatures of the functions I need, and the LLM does the work of actually implementing those functions.
Here is a concrete example:
I have been playing around with a program to assist me with projects I make at home on my hobby-grade CNC router, which does not have an automatic toolchanger. I use a mix of Vectric VCarve and some older handwritten programs to generate GCode files. I end up with a USB drive with maybe 6 to 12 GCode files on it and a model in my head of "to make this product, I start with a board here, gotta install this square nose end mill and zero on this corner of the board, run files A and B. Then install a ball nose end mill and run file C. Then flip the board over lengthwise, switch to a smaller square nose end mill, zero here, run file D. etc. etc."
Although I try to name the GCode files in a self-documenting way, like 01_TopSide_25square.ngc, if I come back a year later and want to make the same thing again, I pretty much always have to open VCarve and eyeball what the hell all the files did, and confirm where to zero, what size board to use, etc. So I'm making a tool where I can define the human-operator steps that go with the GCode files, save them as a "project file", preview in 3D what each step will look like, and export a printable PDF with screenshots and step-by-step instructions. Hopefully this will reduce the rot these projects suffer and the cognitive overhead of picking up an old one.
Modeling the steps as F# types was the very first thing I did. A small excerpt:
type WorkpiecePlacement =
    { Id : WorkpieceId
      /// Corner of the workpiece we'll attach to the machine.
      WorkpieceCorner : WorkpieceSpace.Corner3D
      /// Point in machine-space we'll anchor this corner to.
      MachinePoint : MachineSpace.Point
      /// Which face of the workpiece is on top.
      FaceUp : WorkpieceSpace.Face
      /// Rotation around the up-axis.
      Yaw : WorkpieceSpace.Yaw
    }
type OperationType =
    | PlaceWorkpiece of placement : Operation.WorkpiecePlacement
    | InstallTool of id : ToolId * slot : int option
    | ZeroAt of point : MachineSpace.Point
    | RunGCode of source : GCode.Source
    | RemoveWorkpiece of id : WorkpieceId
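Because OperationType is a closed union, producing the step-by-step text for the printable PDF comes down to one pattern match. A minimal sketch, using hypothetical simplified stand-ins for the real types (strings for IDs and GCode sources, a bare record for points) and leaving out PlaceWorkpiece for brevity; `describe` is an illustrative name, not the tool's actual function:

```fsharp
// Hypothetical, simplified stand-ins for the real domain types:
// IDs and GCode sources are plain strings, points are a bare record.
type Point = { X : float; Y : float; Z : float }

type OperationType =
    | InstallTool of id : string * slot : int option
    | ZeroAt of point : Point
    | RunGCode of source : string
    | RemoveWorkpiece of id : string
    // PlaceWorkpiece omitted to keep the sketch short.

/// Render one operation as a line of the printable step-by-step instructions.
let describe (op : OperationType) : string =
    match op with
    | InstallTool (id, Some slot) -> sprintf "Install tool %s in slot %d." id slot
    | InstallTool (id, None) -> sprintf "Install tool %s." id
    | ZeroAt p -> sprintf "Zero the machine at X=%g Y=%g Z=%g." p.X p.Y p.Z
    | RunGCode source -> sprintf "Run %s." source
    | RemoveWorkpiece id -> sprintf "Remove workpiece %s." id

// Example project: one tool, one zero point, one file.
let steps =
    [ InstallTool ("1/4in square end mill", None)
      ZeroAt { X = 0.0; Y = 0.0; Z = 0.0 }
      RunGCode "01_TopSide_25square.ngc" ]

steps |> List.iteri (fun i op -> printfn "%d. %s" (i + 1) (describe op))
```

The last step prints as `3. Run 01_TopSide_25square.ngc.`. If a new case is later added to the union, the compiler warns about an incomplete match (FS0025) at every `match` that does not handle it yet, which points straight at the code that still needs updating.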
For the GCode simulator I needed a parser for GCode files. It produces a type that maps 1:1 onto the GCode instruction set:
type GCodeInstruction =
    // --- Motion ---
    | G0_RapidMove of axisMoves : (Axis * float<gcodeunit>) array
    | G1_Move of feedRate : float<gcodeunit/minute> option * axisMoves : (Axis * float<gcodeunit>) array
    | G2_ClockwiseArc of ArcParams
    | G3_CounterClockwiseArc of ArcParams
    | G4_Dwell of seconds : double
    // --- Plane selection ---
    | G17_SelectXYPlane
    | G18_SelectXZPlane
    | G19_SelectYZPlane
    // --- Unit selection ---
    | G20_Inches
    | G21_Millimeters
    // --- Distance mode ---
    | G90_AbsoluteDistance
    | G91_RelativeDistance
    // ... etc truncated, more instructions in real code
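A minimal sketch of the string-to-union parsing step for a handful of these instructions. It assumes a hypothetical simplified subset of GCodeInstruction (plain `float` instead of the `gcodeunit` units of measure) and ignores parenthesised comments, `N` line numbers, and modal lines that omit the G word; a real parser has to handle all of those:

```fsharp
open System
open System.Globalization

type Axis = X | Y | Z

// Hypothetical subset of the real GCodeInstruction, without units of measure.
type GCodeInstruction =
    | G0_RapidMove of axisMoves : (Axis * float) array
    | G1_Move of feedRate : float option * axisMoves : (Axis * float) array
    | G20_Inches
    | G21_Millimeters
    | G90_AbsoluteDistance
    | G91_RelativeDistance

/// Parse one line such as "G1 X10 Y-2.5 F600 ; comment".
/// Returns None for blank lines and anything this sketch does not handle.
let parseLine (line : string) : GCodeInstruction option =
    // Drop a trailing ';' comment, then split into words like "G1", "X10".
    let code = line.Split(';').[0].Trim().ToUpperInvariant()
    let words = code.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries)
    let value (word : string) = Double.Parse(word.Substring(1), CultureInfo.InvariantCulture)
    let axisMoves =
        words
        |> Array.choose (fun w ->
            match w.[0] with
            | 'X' -> Some (X, value w)
            | 'Y' -> Some (Y, value w)
            | 'Z' -> Some (Z, value w)
            | _ -> None)
    let feed = words |> Array.tryFind (fun w -> w.[0] = 'F') |> Option.map value
    match Array.tryHead words with
    | Some ("G0" | "G00") -> Some (G0_RapidMove axisMoves)
    | Some ("G1" | "G01") -> Some (G1_Move (feed, axisMoves))
    | Some "G20" -> Some G20_Inches
    | Some "G21" -> Some G21_Millimeters
    | Some "G90" -> Some G90_AbsoluteDistance
    | Some "G91" -> Some G91_RelativeDistance
    | _ -> None

/// A whole-file parse: keep only the lines this sketch understood.
let parse (text : string) : GCodeInstruction list =
    text.Split('\n') |> Array.choose parseLine |> Array.toList
```

The parser only translates syntax; it deliberately does not interpret which units or distance mode are in effect, which is what keeps the output 1:1 with the instruction set.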
But my tool also supports transforms on toolpaths, like rotating by 90 degrees or offsetting, so I can easily specify that I want to make tiled copies of the same project.
Implementing those transforms directly as GCodeInstruction[] -> GCodeInstruction[] would be a bad call. GCode is very stateful: instructions can switch units, toggle between relative and absolute coordinates, and so on, so every transform would have to track that state, which makes it awkward and tricky to write.
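To make the statefulness concrete: the same `G1 X1` ends up at a different absolute position depending on which unit and distance-mode instructions came before it. A minimal sketch of threading that modal state through a program with a fold, assuming hypothetical simplified types (only G1 plus four modal codes, plain floats, tool starting at the origin); `toAbsoluteMm` is an illustrative name:

```fsharp
type Axis = X | Y | Z

// Hypothetical subset of GCodeInstruction: one move type plus four modal codes.
type GCodeInstruction =
    | G1_Move of axisMoves : (Axis * float) array
    | G20_Inches
    | G21_Millimeters
    | G90_AbsoluteDistance
    | G91_RelativeDistance

type Point = { X : float; Y : float; Z : float }

/// The modal state that earlier instructions leave behind.
type ModalState = { Position : Point; MmPerUnit : float; Relative : bool }

/// Resolve each move to an absolute endpoint in millimetres by threading
/// the modal state through the program. Assumes the tool starts at the origin.
let toAbsoluteMm (program : GCodeInstruction list) : Point list =
    let step (state : ModalState, acc : Point list) instr =
        match instr with
        | G20_Inches -> { state with MmPerUnit = 25.4 }, acc
        | G21_Millimeters -> { state with MmPerUnit = 1.0 }, acc
        | G90_AbsoluteDistance -> { state with Relative = false }, acc
        | G91_RelativeDistance -> { state with Relative = true }, acc
        | G1_Move moves ->
            let apply (p : Point) (axis : Axis, v : float) =
                let mm = v * state.MmPerUnit
                let resolve current = if state.Relative then current + mm else mm
                match axis with
                | X -> { p with X = resolve p.X }
                | Y -> { p with Y = resolve p.Y }
                | Z -> { p with Z = resolve p.Z }
            let target = Array.fold apply state.Position moves
            { state with Position = target }, target :: acc
    let start = { Position = { X = 0.0; Y = 0.0; Z = 0.0 }; MmPerUnit = 1.0; Relative = false }
    program |> List.fold step (start, []) |> snd |> List.rev
```

For `G20`, `G1 X1`, `G91`, `G21`, `G1 X5 Z-2`, the first move lands at X = 25.4 mm and the second is read as a 5 mm / -2 mm step from there, giving (30.4, 0, -2). A transform written directly against GCode would have to redo this bookkeeping every time; resolving it once up front lets the transforms ignore it.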
So I have a ToolPath type that keeps the transforms clean. It normalizes the many ways of expressing the same toolpath in GCode into a single representation where all coordinates are absolute and in metric units.
type ToolPathInstruction =
    | Rapid of From : Point * To : Point
    | Linear of From : Point * To : Point * Feed : FeedRate
    | Arc of
        From : Point *
        To : Point *
        Center : Point *
        Plane : Plane *
        Direction : ArcDirection *
        Feed : FeedRate
    // ... etc truncated
That is the appropriate level for transforms like offset, rotate, and scale to operate on.
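Because every instruction at this level carries absolute, metric points, offset and a 90-degree rotation reduce to mapping a point function over the path. A minimal sketch with hypothetical simplified types (no units of measure, a list standing in for the real ToolPath); the one subtlety is that rotating about Z swaps XZ-plane and YZ-plane arcs, and the arc-direction convention used for that is this sketch's own assumption, stated in the comments:

```fsharp
// Hypothetical simplified types: feed is a bare float, points a plain record.
type Point = { X : float; Y : float; Z : float }
type Plane = XY | XZ | YZ
type ArcDirection = Clockwise | CounterClockwise

type ToolPathInstruction =
    | Rapid of From : Point * To : Point
    | Linear of From : Point * To : Point * Feed : float
    | Arc of
        From : Point *
        To : Point *
        Center : Point *
        Plane : Plane *
        Direction : ArcDirection *
        Feed : float

/// Apply a point transform to every coordinate in one instruction.
let mapPoints (f : Point -> Point) (instr : ToolPathInstruction) =
    match instr with
    | Rapid (a, b) -> Rapid (f a, f b)
    | Linear (a, b, feed) -> Linear (f a, f b, feed)
    | Arc (a, b, c, plane, dir, feed) -> Arc (f a, f b, f c, plane, dir, feed)

/// Translate every point by d. A translation leaves planes and directions alone.
let offset (d : Point) (path : ToolPathInstruction list) =
    path |> List.map (mapPoints (fun p -> { X = p.X + d.X; Y = p.Y + d.Y; Z = p.Z + d.Z }))

let flip dir =
    match dir with
    | Clockwise -> CounterClockwise
    | CounterClockwise -> Clockwise

/// Rotate 90 degrees counter-clockwise about Z: (x, y, z) -> (-y, x, z).
/// Assumes Direction is judged looking from the positive axis normal to the
/// plane (+Z for XY, +Y for XZ, +X for YZ); that convention is this sketch's.
let rotate90 (path : ToolPathInstruction list) =
    let r (p : Point) = { p with X = -p.Y; Y = p.X }
    path
    |> List.map (fun instr ->
        match instr with
        | Arc (a, b, c, plane, dir, feed) ->
            // XZ arcs land in YZ and the +Y normal becomes -X, so they flip;
            // YZ arcs land in XZ and the +X normal becomes +Y, so they don't.
            let plane', dir' =
                match plane with
                | XY -> XY, dir
                | XZ -> YZ, flip dir
                | YZ -> XZ, dir
            Arc (r a, r b, r c, plane', dir', feed)
        | other -> mapPoints r other)
```

Tiling a project is then just `offset` applied to copies of the same path, and the transforms never need to know about inches, relative mode, or any other GCode state.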
Yet there is still ANOTHER level of toolpath-related operations that deserves its own type. When I'm simulating material removal to check for crashes, or rendering the toolpath in 3D, I don't want to deal with arcs! Rendering and simulation are inherently approximations that break each arc down into line segments anyway. So simulation and rendering code shouldn't take a ToolPath; it should take what is basically a list of line segments, or in other words...
type ApproxMove =
    { From : Vector3
      To : Vector3
      FeedRate : double<m/minute>
      IsRapid : bool
    }
type ToolPathApproximation =
    { StartPosition : Vector3
      Moves : ApproxMove[]
    }
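The arc-to-segments step is where the approximation actually happens. A minimal sketch, assuming a hypothetical simplified ToolPath (XY-plane arcs only, points already in metres as `System.Numerics.Vector3`) and adding a `maxSegment` tolerance and an explicit start position as extra, hypothetical parameters; how the real code chooses its resolution is not something the types above say:

```fsharp
open System
open System.Numerics

[<Measure>] type m
[<Measure>] type minute

type ApproxMove =
    { From : Vector3
      To : Vector3
      FeedRate : float<m/minute>
      IsRapid : bool }

type ToolPathApproximation =
    { StartPosition : Vector3
      Moves : ApproxMove[] }

type ArcDirection = Clockwise | CounterClockwise

// Hypothetical simplified ToolPath: XY-plane arcs only, points in metres.
type ToolPathInstruction =
    | Rapid of From : Vector3 * To : Vector3
    | Linear of From : Vector3 * To : Vector3 * Feed : float<m/minute>
    | Arc of From : Vector3 * To : Vector3 * Center : Vector3 * Direction : ArcDirection * Feed : float<m/minute>

/// Chop an XY arc into chords, each spanning at most maxSegment metres of arc.
let approximateArc (maxSegment : float) (fromPt : Vector3) (toPt : Vector3) (center : Vector3) dir feed =
    let radius = float (Vector2(fromPt.X - center.X, fromPt.Y - center.Y).Length())
    let angleOf (v : Vector3) = atan2 (float (v.Y - center.Y)) (float (v.X - center.X))
    let a0, a1 = angleOf fromPt, angleOf toPt
    // Signed sweep: positive counter-clockwise; equal endpoints mean a full circle.
    let sweep =
        match dir with
        | CounterClockwise -> if a1 - a0 <= 0.0 then a1 - a0 + 2.0 * Math.PI else a1 - a0
        | Clockwise -> if a1 - a0 >= 0.0 then a1 - a0 - 2.0 * Math.PI else a1 - a0
    let n = max 1 (int (ceil (abs sweep * radius / maxSegment)))
    let point i =
        if i = 0 then fromPt
        elif i = n then toPt
        else
            let t = float i / float n
            let a = a0 + sweep * t
            // Interpolate Z linearly so helical moves come out right too.
            let z = fromPt.Z + (toPt.Z - fromPt.Z) * float32 t
            Vector3(center.X + float32 (radius * cos a), center.Y + float32 (radius * sin a), z)
    [| for i in 1 .. n -> { From = point (i - 1); To = point i; FeedRate = feed; IsRapid = false } |]

/// approximate, with the hypothetical maxSegment and start parameters added.
let approximate (maxSegment : float) (start : Vector3) (path : ToolPathInstruction list) =
    { StartPosition = start
      Moves =
        path
        |> List.toArray
        |> Array.collect (fun instr ->
            match instr with
            // Rapid speed is a machine setting, so this sketch leaves FeedRate at zero.
            | Rapid (a, b) -> [| { From = a; To = b; FeedRate = 0.0<m/minute>; IsRapid = true } |]
            | Linear (a, b, f) -> [| { From = a; To = b; FeedRate = f; IsRapid = false } |]
            | Arc (a, b, c, d, f) -> approximateArc maxSegment a b c d f) }
```

With `maxSegment = 0.001`, a quarter circle of radius 1 m (arc length about 1.571 m) becomes 1571 chords. Pinning the first and last points to the exact arc endpoints keeps consecutive moves connected, so simulation and rendering code only ever sees straight segments.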
Having defined all these types, it's clear that I need operations like:
    parse : string -> GCode
    serialize : GCode -> string
    normalizeToToolPath : GCode -> ToolPath
    denormalizeToGCode : ToolPath -> GCode
    offset : Vector3 -> ToolPath -> ToolPath
    rotate90 : ToolPath -> ToolPath
    scale : Vector3 -> ToolPath -> ToolPath
    approximate : ToolPath -> ToolPathApproximation
    simulate : ToolPathApproximation -> MachineState -> MachineState
    renderToolPathWireframe : ToolPathApproximation -> VBO
    renderMachineState : MachineState -> VBO
And so on. An LLM is absolutely awesome at one-shotting the implementations.
I would find it quite frustrating to model the same domain without any types. Either all the functions would work on a single toolpathy data structure that isn't really the right fit for any of the places it's used, or they would work on several toolpathy data structures that are all subtly but importantly different, with no clear delineation of which layer expects which one.