Show HN: Sosumi.ai – Convert Apple Developer docs to AI-readable Markdown
131 points - last Friday at 1:30 PM
I got tired of Claude hallucinating Swift APIs. It does a good job at Python and TypeScript, but ask it about SwiftUI and it's basically guessing.
The problem? Apple's docs are JavaScript-rendered, so when you paste URLs into AI tools, they just see a blank page. Copy-pasting works but... c'mon.
So I built something that converts Apple Developer docs to clean Markdown. Just swap developer.apple.com with sosumi.ai in any Apple docs URL and you get AI-readable content.
For example:
- Before: https://developer.apple.com/documentation/swift/double
- After: https://sosumi.ai/documentation/swift/double
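The swap is mechanical enough to script. Here is a minimal TypeScript sketch. It assumes only the host needs to change, so the path, query string and fragment are left alone. `toSosumiURL` is a hypothetical name and not part of Sosumi:

```typescript
// Hypothetical helper: rewrite an Apple Developer docs URL to its sosumi.ai equivalent.
// Assumes only the host changes; path, query, and fragment are kept as-is.
function toSosumiURL(appleURL: string): string {
  const url = new URL(appleURL);
  if (url.hostname !== "developer.apple.com") {
    throw new Error(`Not an Apple Developer URL: ${appleURL}`);
  }
  url.hostname = "sosumi.ai";
  return url.toString();
}

console.log(toSosumiURL("https://developer.apple.com/documentation/swift/double"));
// https://sosumi.ai/documentation/swift/double
```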
The site itself is a small Hono app running on Cloudflare Workers. Apple's docs are actually available as structured data, but Apple doesn't make it obvious how to get it. So what it does is map the URLs, fetch the original JSON, and render it as Markdown.
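Here is a rough sketch of the map, fetch and render pipeline, with several assumptions. The post doesn't say where the JSON lives, so the /tutorials/data/...json location below is an assumption. The model is also only a tiny assumed subset of DocC's render JSON: a title plus an abstract made of text and code-voice runs. Both function names are hypothetical.

```typescript
// Assumed subset of Apple's DocC render JSON; the real schema has many more fields.
interface InlineContent {
  type: string; // e.g. "text", "codeVoice", "reference"
  text?: string;
  code?: string;
}

interface RenderJSON {
  metadata?: { title?: string };
  abstract?: InlineContent[];
}

// Assumption: the JSON for /documentation/<path> is served at
// /tutorials/data/documentation/<path>.json on developer.apple.com.
function jsonURLFor(docPath: string): string {
  const clean = docPath.replace(/^\/+|\/+$/g, "");
  return `https://developer.apple.com/tutorials/data/${clean}.json`;
}

// Flatten inline content to Markdown; unsupported node types are dropped.
function inlineToMarkdown(items: InlineContent[] = []): string {
  return items
    .map((item) => {
      switch (item.type) {
        case "text":
          return item.text ?? "";
        case "codeVoice":
          return "`" + (item.code ?? "") + "`";
        default:
          return ""; // references, images, etc. are skipped in this sketch
      }
    })
    .join("");
}

function renderMarkdown(doc: RenderJSON): string {
  const title = doc.metadata?.title ?? "Untitled";
  return `# ${title}\n\n${inlineToMarkdown(doc.abstract)}\n`;
}
```

In a Worker, you would fetch `jsonURLFor(path)`, parse the response with `.json()`, and hand the result to `renderMarkdown`.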
It also provides an MCP interface that includes a tool to search the Apple developer website, which is helpful.
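For readers new to MCP: a server advertises each tool with a name, a description and a JSON Schema for its input. A client then invokes the tool with a JSON-RPC `tools/call` request. The tool name and schema below are hypothetical and may not match Sosumi's actual definition.

```typescript
// Shape of a tool as advertised in an MCP `tools/list` response.
interface McpTool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Hypothetical tool definition; the real name and schema may differ.
const searchTool: McpTool = {
  name: "searchAppleDocumentation",
  description: "Search developer.apple.com and return matching documentation pages.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search terms, e.g. \"NavigationStack\"" },
    },
    required: ["query"],
  },
};

// A client invokes the tool with a JSON-RPC `tools/call` request like this one.
function callSearch(id: number, query: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: searchTool.name, arguments: { query } },
  };
}
```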
Anyway, please give this a try and let me know what you think!
dewey
last Friday at 1:44 PM
For those wondering about the name, it has a fun story behind it: https://en.wikipedia.org/wiki/Sosumi
jbgt
last Friday at 9:48 PM
When I was a kid I used Sosumi as my chime on the Apple IIGS!
oneeyedpigeon
last Friday at 2:54 PM
"AI-readable Markdown" — can't we just say "Markdown"? I'm serious about this, why are we focussing on making things accessible to AI when they should just be machine-readable and accessible to human beings in the first place? No need to taint this by bringing AI into it.
forrestthewoods
last Friday at 3:00 PM
> why are we focussing on making things accessible to AI
Because that’s the author’s actual goal? To take a web page that looks fine to human eyes but is, unintuitively, not accessible to AI, and make it accessible. That’s genuinely useful and valuable.
Sure, it’s no different than converting it to markdown for human eyes. But it’s important to be clear about not just WHAT but also WHY.
C’mon now. This isn’t controversial or even bad.
socalgal2
last Saturday at 8:58 AM
Makes one wonder what Apple’s actual goal is.
DeepYogurt
last Friday at 10:46 PM
I mean.... it could have broader appeal without artificially restricting its audience
zapzupnz
last Friday at 11:30 PM
Not everything is for everyone. Y'all are criticising this project's bean soup because you don't like beans.
forrestthewoods
last Friday at 11:47 PM
If you think it has less appeal because it was built for and advertised for AI that’s a you problem.
qazxcvbnmlp
last Friday at 5:03 PM
Great promise; I sometimes need to reference docs to build context.
I took a quick look at the examples you posted. For example,
'''init?(exactly: Float80)'''
the tool converted it to
'''- [initexactly-63925](/documentation/Swift/Double/init(exactly:)-63925)'''
Given its goal, I'd be worried that it dropped the verbatim function signature. Claude still figured it out, but for more obscure APIs that could be an issue.
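One way to keep the verbatim signature is to build the link text from the declaration fragments in DocC's render JSON rather than from the URL slug. The `SymbolReference` shape below is an assumed subset of that JSON. `renderReference` is a hypothetical helper and is not the fix that was actually shipped.

```typescript
// Assumed subset of a symbol reference in DocC render JSON.
interface DeclarationFragment {
  kind: string; // e.g. "keyword", "identifier", "text", "externalParam", "typeIdentifier"
  text: string;
}

interface SymbolReference {
  title: string; // e.g. "init(exactly:)"
  url: string;
  fragments?: DeclarationFragment[];
}

// Prefer the verbatim declaration built from fragments; fall back to the title.
function renderReference(ref: SymbolReference): string {
  const signature = ref.fragments?.map((f) => f.text).join("") || ref.title;
  return `- [\`${signature}\`](${ref.url})`;
}
```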
_mattt
last Friday at 5:34 PM
Thanks for pointing that out. That’s most likely a mistake in how I’m translating into Markdown. I’ll look into this.
_mattt
last Sunday at 10:46 AM
Following up — I just pushed a fix for this. This latest version significantly improves how references like protocol conformances and default implementations are rendered.
danielfalbo
last Friday at 3:36 PM
How do you reliably convert HTML to MD for any page on the internet?
I remember struggling with this in the past.
How hard would it be to build an MCP that's basically a proxy for web search except it always tries to build the markdown version of the web pages instead of passing HTML?
Basically Sosumi.ai, but instead of working only for Apple docs, it works for any web page (including every doc on the internet).
_mattt
last Sunday at 10:31 AM
I think HTML -> Markdown is a bit of a red herring.
In many cases, a Markdown distillation of HTML can improve the signal-to-noise ratio — especially for sites mired in <div> tag soup (intentionally or not). But that's an optimization for token efficiency; LLMs can usually figure things out.
The motivation behind Sosumi is better understood as a matter of accessibility. The way AI assistants typically fetch content from the web precluded them from getting any useful information from developer.apple.com.
You could start to solve the generalized problem with an MCP that 1) used a headless browser to access content for sites that require JS, and 2) used sampling (i.e. having a tool use the host LLM) to summarize / distill HTML.
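Step 2 could look roughly like the request below. An MCP server asks the client's host LLM to distill HTML through a `sampling/createMessage` request. The field names follow the MCP sampling spec as I understand it and should be checked against the current version. `distillHTMLRequest` and the prompt text are hypothetical. Step 1, the headless browser, needs a browser automation library and is left out.

```typescript
// Message shape used by MCP sampling (text content only in this sketch).
interface SamplingMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

interface CreateMessageRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    systemPrompt?: string;
    maxTokens: number;
  };
}

// Hypothetical helper: ask the host LLM (via the MCP client) to distill raw HTML.
function distillHTMLRequest(id: number, html: string, maxTokens = 2000): CreateMessageRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      systemPrompt:
        "Rewrite this HTML as concise Markdown. Drop navigation and boilerplate; keep code and signatures verbatim.",
      messages: [{ role: "user", content: { type: "text", text: html } }],
      maxTokens,
    },
  };
}
```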
crazylogger
last Friday at 6:00 PM
https://pure.md is exactly what you're looking for.
But stripping complex formats like html & pdf down to simple markdown is a hard problem. It's nearly impossible to infer what the rendered page looks like by looking at the raw html / pdf code. https://github.com/mozilla/readability helps, but it often breaks down on unconventional div structures. I heard the state-of-the-art solution is using multimodal LLM OCR to really look at the rendered page and rewrite the thing in markdown.
Which makes me wonder: how did OpenAI make their model read pdf, docx and images at all?
michaelmior
last Friday at 4:50 PM
There are APIs such as Jina AI's reader API that do this pretty well. It doesn't produce output as clean as Sosumi for Apple docs, but it's free and does a decent job.
https://jina.ai/reader
Someone
last Friday at 8:04 PM
Hm, I would have extracted the markdown from the Swift source code. That’s what Apple uses to generate their pages, using https://www.swift.org/documentation/docc/.
For example, AFAIK, https://github.com/swiftlang/swift/blob/main/stdlib/public/c... is used to generate https://developer.apple.com/documentation/swift/array.
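DocC reads `///` documentation comments, which are already written in Markdown, so a naive extraction pass is easy to sketch. This is a hypothetical helper and not how DocC works internally: DocC goes through compiler-generated symbol graphs and handles far more cases.

```typescript
interface DocEntry {
  declaration: string; // the first non-blank, non-comment line after the doc comment
  markdown: string; // comment body with the `///` markers removed
}

// Naive sketch: pair each run of `///` lines with the next non-blank line.
// Ignores /** ... */ blocks and multi-line declarations, and would treat an
// attribute line such as @frozen as the declaration.
function extractDocComments(swiftSource: string): DocEntry[] {
  const entries: DocEntry[] = [];
  let pending: string[] = [];
  for (const rawLine of swiftSource.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("///")) {
      // Drop the marker plus one optional space; the rest is already Markdown.
      pending.push(line.slice(3).replace(/^ /, ""));
    } else if (line.length > 0) {
      if (pending.length > 0) {
        entries.push({ declaration: line, markdown: pending.join("\n") });
      }
      pending = [];
    }
  }
  return entries;
}
```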
_mattt
last Friday at 10:11 PM
This is only possible for some of Apple’s open source Swift code, including the Swift standard library. This is not the case for hundreds of other SDK frameworks, such as SwiftUI.
Even for those open source projects, there is still some value added in the generated documentation that isn’t directly available from documentation comments, such as type members and protocol conformances (though an LLM could certainly suss that out with the right context).
pomber
last Saturday at 10:17 AM
Nice! Do you think it could be adapted to other docs sites?
I made a small clone of the tutorials section (https://clone-swiftui-tutorial.vercel.app/) where the content is already Markdown (and used codehike to turn the markdown into a rich UI). This made me realize that codehike is AI-friendly, in the sense that even for non-linear UIs the original content is still AI-readable Markdown.
jcoletti
last Friday at 2:46 PM
This is awesome and timely for me...going to give it a whirl. Thanks for building. Also, there should totally be an easter egg where clicking something somewhere plays the sound!
_mattt
last Friday at 3:24 PM
Great idea! I just added that. Try clicking the icon in the header.
jcoletti
last Saturday at 7:48 PM
Ha, amazing and so nostalgic
smerrill25
last Friday at 1:50 PM
As someone who is currently building my first iOS app, I am extremely happy to have this. It will make going through the animation documentation much nicer.
novok
last Friday at 5:17 PM
Hope this makes Apple's Xcode team realize they should do this, especially with all the recent AI integration.
AzzyHN
last Saturday at 3:21 AM
This is really cool, but also totally something you'd think existing AI agents should have zero issues with. _Especially_ if they're supposed to be for coding, I'd expect loads of documentation to be baked in, so to speak.
croes
last Friday at 3:35 PM
Wasn’t one of the benefits of AI that we don’t need special documents for AI to understand the data?
zach_moore
last Friday at 11:01 PM
Understand, yes, but the problem this project is built to solve is accessing the data.
h1fra
last Friday at 2:43 PM
I think it's safe to assume most big players have browser rendering enabled (I hope so). imo AI is struggling with a lot of languages that are not as popular as javascript, mostly because it's more niche and you don't get a lot of good examples on the web.
_mattt
last Friday at 3:11 PM
In my experience, coding agents seem to do a normal fetch when provided links. Which makes sense — headless browser automation is expensive, and only really necessary for interacting with a webpage.
But with these RLHF'd AIs, being confident and helpful as they are, it took me a while to realize that they couldn't actually read the Apple developer links I was giving them. Like a kid who can't read the chalkboard, but doesn't realize they need glasses.
h1fra
last Friday at 4:13 PM
It's expensive for sure, but probably a drop in the bucket compared to the cost of renting H100s. Plus, it would be a massive boost in terms of data quality/quantity for them. But maybe you are right; I'm just surprised it's not the case.
tempodox
last Friday at 3:11 PM
An “AI” that makes text “AI”-readable. How does that make any sense?
ChrisMarshallNY
last Friday at 3:02 PM
I don't even bother trying to render docc catalogs into JS. It's a royal pain that breaks easily.
If GitHub could support .docc files, that would be great. Otherwise, I still use Jazzy Docs.
_mattt
last Friday at 3:15 PM
Once upon a time, I built a project called `swift-doc`, which eventually got Sherlocked. I think what I was most upset about was their decision to call their thing "DocC". Like, adding redundant consonants to avoid name collisions is my shtick.
Long live Jazzy.
ChrisMarshallNY
last Friday at 3:16 PM
I love your stuff!
Thanks!
fabiensanglard
last Friday at 3:32 PM
Is it for the same reasons that LLMs struggle to produce Rust code that compiles? I was under the impression that most Rust documentation was plain HTML.
saagarjha
last Saturday at 10:19 AM
Is it possible to download an archive of the data so I can run searches against it locally (without AI)?
_mattt
last Sunday at 10:37 AM
Hi Saagar! One could indeed use this method to export this information en masse; however, that would require the kind of automated access that the site's ToS explicitly forbids.
For the Swift standard library and other open-source frameworks, you could probably extract this from documentation comments using DocC, Jazzy, or the like.
thomask1995
last Friday at 1:35 PM
Very interesting. You have any before and after examples?
Curious how it handles some of the concurrency stuff: actors, async/await, etc.
diimdeep
last Friday at 2:21 PM
Could you share your motivation for setting up a whole domain and web hosting?
Personally, I feel that this whole AI-induced problem shouldn't even exist in the first place, but even so, it's ridiculous that you have to query some web API to solve it. Why not just publish a set of parsed, converted .md files for local use and be done with it?
_mattt
last Friday at 2:56 PM
I agree, it'd be great if Apple provided accessible documentation in the first place. Time was, Apple published self-contained docsets that you could download and read offline.
Apple's ToS pretty explicitly forbid the kind of automation required to download everything. But even if someone did that, it'd only be a snapshot in time. And a lot can change between OS releases.
As for the hosted web app, I wanted to provide this as a public service. I plan to open source it, so anyone can self-host instead, if they're inclined.
diimdeep
last Friday at 3:45 PM
Yeah, pre-Swift documentation was thorough, dense, and locally explorable; since then, documentation has started to resemble .h copypasta without comments.
the_arun
last Friday at 4:15 PM
Just wondering: can't AI read HTML? If so, how are we training our models?
justmedep
last Friday at 4:46 PM
The AI only sees a bit of HTML plus a bunch of JS that, when executed, generates more HTML. If the AI does not run the JS it won’t see everything. During training they probably use a crawler that runs a headless browser behind the scenes to get everything a human would get.
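Here is a rough sketch of what a fetcher that doesn't run JavaScript is left with, plus a crude heuristic for spotting a JS-only shell. The 200-character threshold and both function names are arbitrary assumptions, and the regex stripping is naive. Use a real HTML parser for anything serious.

```typescript
// Roughly what a fetcher that doesn't execute JavaScript can "read":
// the markup minus scripts, styles, noscript fallbacks, and tags.
function visibleText(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<noscript[\s\S]*?<\/noscript>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Heuristic: very little visible text usually means a JS-rendered shell.
function looksLikeJSShell(html: string, minChars = 200): boolean {
  return visibleText(html).length < minChars;
}
```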
Lockal
last Saturday at 5:29 AM
So... the answer is to use the same headless browser during real-time access as they used during training? Which they already do, unless you specifically ask it to write and run a Python script that uses simple requests?
It's like generating static webpages just for SEO: obsolete since 2012[1], and a few years later for other major websites.
[1] https://www.i-programmer.info/news/81-web-general/4248-googl...
y1n0
last Friday at 4:43 PM
AI can’t read something dynamically rendered with JavaScript. At the moment.
novok
last Friday at 5:18 PM
They can, but the content-to-token ratio is far lower, so they work less effectively when the page is put into the inference context window.
zach_moore
last Friday at 9:48 PM
I’m building a Swift app now and will most definitely give this a try.
jordanmorgan10
last Friday at 4:14 PM
Another awesome project that does this for Apple's docs: https://llm.codes/
Also, Apple has started shipping docs like this, too. They are a bit hidden but you can find them here:
/Applications/Xcode-beta.app/Contents/PlugIns/IDEIntelligenceChat.framework/Versions/A/Resources/AdditionalDocumentation
edomyrots
last Friday at 2:19 PM
Do you have a public repo? Would love to see how it's working.
_mattt
last Friday at 2:29 PM
Not yet, but I plan to open source it soon. Just gotta tidy up a little bit, you know?
sneak
last Friday at 2:30 PM
Why do people always say this?
It’s ok to just start coding with a public repo. Code isn’t a secret.
oneeyedpigeon
last Friday at 2:58 PM
I think it's because, in the world of programming, people are incredibly critical. Just try putting absolutely anything out there, and you'll get certain types of people picking it apart.
I remember an early experience in my working career, when someone was sharing their sample code with a group, to demonstrate a particular concept. And one of those present picked them up on their use of magic numbers, as if that was at all relevant in the context.
I don't blame anyone for being wary of showing their work in progress. To take another example, painters often don't like their subjects trying to sneak a peek at the work in progress.
joshstrange
last Friday at 2:55 PM
- Keys/tokens in the code. Yes, it’s bad practice but for a hobby/personal project it’s not the end of the world
- Not wanting to get roasted
- Open source = dealing with a lot of entitlement
And the list goes on. Putting code out into the world (publicly) often sets you up for future obligation of some kind (even if it’s just saying “no”).
None of this is a stance against open source, but I understand where people are coming from.
hirvi74
last Friday at 3:04 PM
> "Ever notice Claude struggling to write Swift code?"
Yes, that is why I quit using Claude and swapped to ChatGPT about a year ago. I've had substantially fewer issues with GPT.
asimovDev
last Friday at 4:12 PM
I remember Copilot with GPT helping me export Swift to C for interop, something I could hardly find good examples of or info on in my Google searches.
guestbest
last Friday at 3:17 PM
ChatGPT does Swift and Objective-C remarkably well, including UIKit and Core Data. It's gotten much better for project files with gpt5-mini.
awaseem
last Friday at 8:12 PM
Can't wait to use this. It's awesome!
awaseem
last Friday at 8:12 PM
Is it open source?
hamza_q_
last Friday at 8:39 PM
This is awesome; great work.
sosumi.md seems to be a better fit domain-wise, no?
miki123211
last Friday at 3:06 PM
Just saying, sites like these are also pretty great for accessibility, screen reader users in particular.
I think this one would be slightly better if it rendered that Markdown as simple HTML if accessed through a real browser, but I can imagine even this version being pretty useful.
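One way to serve both audiences from the same URL is HTTP content negotiation on the Accept header. Browsers list text/html explicitly, while tools like curl send */* by default. Below is a sketch of that decision, with hypothetical function names. Converting the Markdown to HTML is left out, and wildcards are deliberately not treated as a request for HTML.

```typescript
type Format = "html" | "markdown";

// Parse an Accept header into media types with their q-values (default 1).
function parseAccept(header: string): { type: string; q: number }[] {
  return header.split(",").map((part) => {
    const [type, ...params] = part.trim().split(";").map((s) => s.trim());
    const qParam = params.find((p) => p.startsWith("q="));
    const q = qParam ? Number(qParam.slice(2)) : 1;
    return { type: type.toLowerCase(), q: Number.isNaN(q) ? 0 : q };
  });
}

// Serve HTML only when the client explicitly prefers text/html over text/markdown.
function preferredFormat(accept: string | null | undefined): Format {
  if (!accept) return "markdown";
  const entries = parseAccept(accept);
  const qualityOf = (type: string) =>
    Math.max(0, ...entries.filter((e) => e.type === type).map((e) => e.q));
  return qualityOf("text/html") > qualityOf("text/markdown") ? "html" : "markdown";
}
```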
I think it could also make the "Small web" crowd pretty happy too.
_mattt
last Sunday at 10:39 AM
Amen to that. It's funny how virtues like accessibility and good API design matter just as much as ever in this new era. Good ideas don't go out of style.
rtk0
last Saturday at 2:55 AM
Love the name.
amelius
last Friday at 11:18 PM
Aha, another developer doing Apple's job.