Tiled Hacker news on React Router

DeepSeek Introduces Vision

135 points - today at 6:17 AM

jiehong
today at 7:39 AM
For those who haven't tried it: this allows DeepSeek to understand a picture (instead of just extracting text from it) and describe what's in it. It is not an image generation system, though, so you can't ask it to modify an image.
Personally, I'm a bit surprised the DS chat app still doesn't offer its own text-to-speech and speech-to-text features (I know DS doesn't have any ASR model, for example, but there are quite a few open ones).
rcMgD2BwE72F
today at 6:58 AM
Points to https://chat.deepseek.com/sign_in for me, and that's just a login screen. Is there any page with some info?
bjoli
today at 7:00 AM
What has been going on with DeepSeek recently? I have gotten lots of replies in Chinese and, even more frequently, reasoning in Chinese as well.
Is it a new silent update?
insumanth
today at 9:21 AM
Multimodal is the way to go. DeepMind nailed this a long time ago.
throwaw12
today at 8:06 AM
I wish they had published a post where we could read about capabilities, quality, accuracy, and other parameters.
tornikeo
today at 7:27 AM
I really need this as an API.
Turns out that to use Claude Agents SDK, you need a vision-enabled API. If the DeepSeek API could see, it could fully drive Claude Code and Claude Agents SDK. A project I'm working on relies on a Claude-in-CloudflareWorker setup, and I've been relying on Qwen and Gemini Flash Lite, both more expensive than DeepSeek.
Can't wait to have it available on DeepSeek.
arjie
today at 7:20 AM
If they'd do one of those little extraneous additions like Qwen does, so that I can have DS4 Flash with Vision that would be great. I've got to run a separate model entirely so that I can get vision and I'd prefer to just put it all in one space.
earth2mars
today at 7:02 AM
And it's really good and fast. I have tested it with a bunch of odd photos, asking what is happening in them. Overall, the training set seems large enough for it to know what's what and where.
crvdgc
today at 6:57 AM
Vision has been in A/B testing for a while now (at least in China). Is there an official announcement that this will be available for everyone?
alexwwang
today at 8:47 AM
Does the API support vision yet?
innis226
today at 6:57 AM
Nice, is this available in the API now as well?
tw1984
today at 8:35 AM
What is more interesting to me is why it took them so long to support vision.
Does it imply that Liang believes vision/voice is less important on the way to AGI?
thiago_fm
today at 9:17 AM
Just wait until they release their coding model. Once they do an Opus-level coding model, the sandcastle of the AI economy in the US will fall
hklohani
today at 7:26 AM
[flagged]
ValveFan6666
today at 6:56 AM
[dead]
andrewstuart
today at 7:12 AM
OpenAI and Anthropic need to get this free foreign competition banned.
