Tiled Hacker news on React Router

Microsoft VibeVoice: Open-Source Frontier Voice AI

196 points - today at 11:56 AM

nickandbro
today at 4:43 PM
This is a very good model, but can it be run on the web?
steinvakt2
today at 1:00 PM
This is not a new model. Also, it hallucinates a lot. Also, it's very heavy and slow at inference. It's also bad at multilingual input.
Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.
isodev
today at 4:19 PM
I think in this category, Voxtral by Mistral is a lot better. It also happens to be small enough to run on WebGPU: https://huggingface.co/spaces/mistralai/Voxtral-Realtime-Web...
maxloh
today at 1:11 PM
I think we should stop calling this type of model open source. They are in fact "open weight." The training code is proprietary and was never released.
https://github.com/microsoft/VibeVoice/issues/102
dragonfax
today at 4:35 PM
Shouldn't it be called something like "Copilot Voice"?
aqme28
today at 1:27 PM
Interesting to see "vibe" enshrined by the likes of Microsoft as an AI product word.
embedding-shape
today at 12:52 PM
Isn't this the project Microsoft published but then pulled soon after for security/safety reasons? What has changed since then?
mberg
today at 3:34 PM
I've been using VibeVoice's ASR (speech to text) model quite intensively for the past month and have found it to be a lot more reliable and out-of-the-box functional than Whisper, Parakeet and other models. The fact that it has diarization built into the model is a huge win in my book. Without that, you have to run a separate model just for diarization, which adds significantly to the overall processing time, whereas VibeVoice gives you reliably great results. Big fan.
xnx
today at 2:49 PM
Still waiting for the open-weights model that conclusively beats the multi-year-old Whisper in accuracy, features, and performance.
pluc
today at 1:16 PM
Interesting story about this repo/product/author by cybersecurity researcher Kevin Beaumont: https://cyberplace.social/@GossiTheDog/116454846703138243
CubsFan1060
today at 12:44 PM
Great post last night from Simon: https://simonwillison.net/2026/Apr/27/vibevoice/
chaosprint
today at 2:12 PM
Microsoft Store App Vibing.exe Accused of Harvesting Screens, Audio, and Clipboard Data:
https://cyberpress.org/microsoft-store-app-vibing-exe-accuse...
podgietaru
today at 12:45 PM
So we've really just settled on Vibe as the verb for AI then?
ryukoposting
today at 1:46 PM
Holy moly, a Microsoft AI product that isn't named Copilot!
Anonyneko
today at 1:05 PM
You have selected Microsoft Sam as the computer's default voice.
solomatov
today at 3:35 PM
It would have been better if they provided not just weights, but also some frontend where it is usable as is.
frangonf
today at 1:55 PM
I looked into local options for ASR and diarization some months ago, and I missed that VibeVoice now has this feature.
My conclusion back then (which came only from shallow research on the topic and zero real experience, mind you) was that Whisper + Pyannote was the "stable" approach.
Have the VibeVoice, Voxtral, Qwen or NeMo solutions caught up in segmentation and speaker recognition?
Mobius01
today at 2:21 PM
Microsoft has historically made poor choices in product naming, but this has to be a new low.
Void_
today at 1:10 PM
In the past month or so, I added 2 models to my app Whisper Memos (https://whispermemos.com):
- Cohere Transcribe (self hosted)
- Grok Speech To Text (they provide an API, only $0.10/hr!)
They are both excellent. I'm not sure about this one. Would you like to see it in a consumer speech to text app?
JumpCrisscross
today at 1:26 PM
What’s the current state of the art for learning my voice, both for training locally and for training in the cloud?
BlastBash192
today at 1:28 PM
Maybe Microsoft’s real strength was never making the best model; it was knowing you don’t need to, as long as you own the platform everyone builds on.
khimaros
today at 2:08 PM
looks like this offers ASR support in GGUF https://github.com/CrispStrobe/CrispASR -- haven't tested
mistic92
today at 1:17 PM
For me it's giving very poor results.
Zopieux
today at 2:54 PM
English only?
ChrisArchitect
today at 2:16 PM
Previously:
Sept 2025 https://news.ycombinator.com/item?id=45114245
walthamstow
today at 12:54 PM
Seems quite heavy for an STT model. Parakeet and Whisper are much smaller and perform great for quick dictation and transcription of longer files. I guess that's due to additional accuracy and speaker diarisation?
The TTS example clip in the repo of 'spontaneous singing' is creepy as fuck
starkeeper
today at 2:20 PM
Microsoft is famous for choosing terrible names, but how could they be this terrible?
villgax
today at 4:00 PM
lol they rug-pulled the 7B for our own safety some months ago
