ICCV 2023 Paper Roundup: Vision and Audio

Tags: computer vision, artificial intelligence

Sound Source Localization is All About Cross-Modal Alignment

Class-Incremental Grouping Network for Continual Audio-Visual Learning

Audio-Visual Class-Incremental Learning

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-Guided Speaker Embedding

The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

On the Audio-Visual Synchronization for Lip-to-Speech Synthesis

Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation

Hyperbolic Audio-Visual Zero-Shot Learning

AdVerb: Visually Guided Audio Dereverberation

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
