0xbadcafebee
today at 4:43 AM
Since this post is about art, I'll embed here my favorite LLM art: the IOCCC 2024 prize winner in bot talk, from Adrian Cable (https://www.ioccc.org/2024/cable1/index.html), minus the stdlib headers:
#define a(_)typedef _##t
#define _(_)_##printf
#define x f(i,
#define N f(k,
#define u _Pragma("omp parallel for")f(h,
#define f(u,n)for(I u=0;u<(n);u++)
#define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s;
a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F
_)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q
m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x
W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]=
_*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit(
puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(!
(*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s?
"":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U?
$=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N
2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x
s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L
,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x
V)w=k[i]>w?k[$=i]:w;}}
dwroberts
today at 3:51 PM
I enjoyed the footnote on their entry, where they link to ChatGPT confidently asserting that it was impossible for such an LLM to exist:
> You're about as close to writing this in 1800 characters of C as you are to launching a rocket to Mars with a paperclip and a match.
thatxliner
today at 5:10 AM
Wait, what does this do?
As the contest entry page explains:
> ChatIOCCC is the world’s smallest LLM (large language model) inference engine - a “generative AI chatbot” in plain-speak. ChatIOCCC runs a modern open-source model (Meta’s LLaMA 2 with 7 billion parameters) and has a good knowledge of the world, can understand and speak multiple languages, write code, and many other things. Aside from the model weights, it has no external dependencies and will run on any 64-bit platform with enough RAM.
(Model weights need to be downloaded using an enclosed shell script.)
https://www.ioccc.org/2024/cable1/index.html
throw310822
today at 10:33 AM
Good reminder of the fact that an LLM is not a program.
Only that every implementation of one is [realized through] a program?
Interestingly, the UK Supreme Court ruled on this in the Emotional Perception AI case, though I'd need to check whether that was obiter (i.e. not part of the binding legal ruling itself).
Without the weights, nothing (or anything, given arbitrary weights).