VibeVoice: 微软开源的文本转语音项目

2025年09月16日 06:30:43
11825
语音对话 VibeVoice 语音 模型 音频 AI
VibeVoice microsoft/VibeVoice

让 AI 生成像真实聊天或播客那样自然、流畅的长篇对话语音。

项目大小 53.78 KB
涉及语言
许可协议 LICENSE
仓库同步说明
  • • 同步需要仓库写入权限以创建目标仓库
  • • 使用平台账号授权登录后将同步到您平台下的个人仓库

🎙️ VibeVoice: A Frontier Long Conversational Text-to-Speech Model

Project Page
Hugging Face
Technical Report

VibeVoice Logo

2025-09-05: VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have disabled this repo until we are confident that out-of-scope use is no longer possible.


VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking.

A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details.

The model can synthesize speech up to 90 minutes long with up to 4 distinct speakers, surpassing the typical 1-2 speaker limits of many prior models.

MOS Preference Results VibeVoice Overview

🎵 Demo Examples

Video Demo

We produced this video with Wan2.2. We sincerely appreciate the Wan-Video team for their great work.

English

Chinese

Cross-Lingual

Spontaneous Singing

Long Conversation with 4 people

For more examples, see the Project Page.

Risks and limitations

While efforts have been made to optimize it through various techniques, it may still produce outputs that are unexpected, biased, or inaccurate. VibeVoice inherits any biases, errors, or omissions produced by its base model (specifically, Qwen2.5 1.5b in this release).
Potential for Deepfakes and Disinformation: High-quality synthetic speech can be misused to create convincing fake audio content for impersonation, fraud, or spreading disinformation. Users must ensure transcripts are reliable, check content accuracy, and avoid using generated content in misleading ways. Users are expected to use the generated content and to deploy the models in a lawful manner, in full compliance with all applicable laws and regulations in the relevant jurisdictions. It is best practice to disclose the use of AI when sharing AI-generated content.

English and Chinese only: Transcripts in languages other than English or Chinese may result in unexpected audio outputs.

Non-Speech Audio: The model focuses solely on speech synthesis and does not handle background noise, music, or other sound effects.

Overlapping Speech: The current model does not explicitly model or generate overlapping speech segments in conversations.

We do not recommend using VibeVoice in commercial or real-world applications without further testing and development. This model is intended for research and development purposes only. Please use responsibly.


                

                

免责声明 © 2026 - 虚宝阁

本站部分源码来源于网络,版权归属原开发者,用户仅获得使用权。依据《计算机软件保护条例》第十六条,禁止:

  • 逆向工程破解技术保护措施
  • 未经许可的分发行为
  • 去除源码中的原始版权标识

※ 本站源码仅用于学习和研究,禁止用于商业用途。如有侵权, 请及时联系我们进行处理。

侵权举报请提供: 侵权页面URL | 权属证明模板

响应时效:收到完整材料后48小时内处理

相关推荐

Stable Diffusion AI绘画界的 扛把子

Stable Diffusion AI绘画界的 扛把子

提到AI画图,没人能绕开Stable Diffusion!由Stability AI开源,支持文本生成图像、图像修复、风格迁移,关键是完全免费商用(非商用更没问题),普通电脑装个WebUI就能玩到飞起。

41831 2025-09-13
scira: AI 驱动搜索引擎

scira: AI 驱动搜索引擎

极简的 AI 驱动搜索引擎,支持引用来源

10827 2025-10-20
YOLOv8-目标检测界的闪电侠

YOLOv8-目标检测界的闪电侠

YOLO系列的最新版,主打“又快又准”的目标检测。能瞬间识别图片/视频里的人、车、动物、物体,在普通显卡上就能实时处理视频流,工业级场景都在用它。

47034 2025-09-13
NBlog: 开源博客系统

NBlog: 开源博客系统

一个前后端分离的开源博客系统,基于 Spring Boot + Vue 技术栈开发,界面清新简洁,拥有多个丰富的博客组件,自带管理后台。

2636 2025-09-16
Retrieval-based-Voice-Conversion-WebUI

Retrieval-based-Voice-Conversion-WebUI

Retrieval-based-Voice-Conversion-WebUI是基于VITS的易用变声框架。底模用开源VCTK数据集训练,无版权问题。有训练推理和实时变声界面,具备很多优点。

32357 2025-09-06
Claudable:基于 Next.js 框架的网站生成器

Claudable:基于 Next.js 框架的网站生成器

把你用自然语言描述的应用想法,直接变成可以运行的网站代码。Claudable 背后依赖强大的 AI 编程助手,主要是 Claude Code,也支持 Cursor CLI 来理解你的需求并生成代码。你不需要懂复杂的 API 设置、数据库配置或者部署流程:用简单的语言告诉 Claudable 你想要什么应用。

2525 2025-09-15
Seelen-UI: 高度可定制的 Windows 桌面美化工具

Seelen-UI: 高度可定制的 Windows 桌面美化工具

一款免费开源的 Windows 桌面增强工具,专注于高度自定义和效率提升。它采用 Rust 语言开发,结合 Tauri 框架与 Web 技术,支持窗口平铺管理、应用启动器、Dock、任务栏、动态壁纸、插件扩展等功能。

13691 2025-10-04
presenton AI PPT 生成器

presenton AI PPT 生成器

一个免费的、能完全在你自己电脑上运行的 AI PPT 生成工具。和那些必须联网、依赖服务商云服务的工具不同,Presenton 的核心优势在于本地优先和开放可控。 你的数据你做主, 所有生成演示文稿的过程都在你的电脑上完成。这意味着你的内容创意、上传的文件等敏感信息,无需上传到第三方云端服务器,隐私更有保障。自由选择AI模型,它不绑定任何一家 AI 服务商。你可以灵活选择。

2412 2025-09-14
CubeCity

CubeCity

A city waiting to be built by you 🏙️🔨✨. Threejs Version (一个等着被你建造的城市)

510 2025-10-02

仓库下载

gitee

GitHub 下载代理

文件信息

文件名
文件大小
文件类型
代理耗时