Video Demystified 5th Edition

Tags: Video

Document Summary

What doesn't have a video component nowadays? iPods, cell phones, computers: they all have video. And so, of course, does television, which is a major source of our entertainment and information. Any engineer involved in designing, manufacturing, or testing video electronics needs this book!

Document Preview

Video Demystified
A Handbook for the Digital Engineer
Fifth Edition

by Keith Jack

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Newnes is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
Linacre House, Jordan Hill, Oxford OX2 8DP, UK

Copyright © 2007, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: permissions@elsevier.com. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”

Recognizing the importance of preserving what has been written, Elsevier prints its books on acid-free paper whenever possible.

Library of Congress Cataloging-in-Publication Data
(Application submitted.)

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-7506-8395-1

For information on all Newnes publications visit our Web site at www.books.elsevier.com

07 08 09 10 11    10 9 8 7 6 5 4 3 2 1

Printed in the United States of America

Contents

About the Author

Chapter 1 • Introduction
  Contents
  Standards Organizations

Chapter 2 • Introduction to Video
  Analog vs. Digital
  Video Data
    Digital Video
  Video Timing
  Video Resolution
    Standard-Definition
    Enhanced-Definition
    High-Definition
  Audio and Video Compression
  Application Block Diagrams
    DVD Players
    Digital Media Adapters
    Digital Television Set-Top Boxes

Chapter 3 • Color Spaces
  RGB Color Space
    sRGB
    scRGB
  YUV Color Space
  YIQ Color Space
  YCbCr Color Space
    RGB-YCbCr Equations: SDTV
    RGB-YCbCr Equations: HDTV
    4:4:4 YCbCr Format
    4:2:2 YCbCr Format
    4:1:1 YCbCr Format
    4:2:0 YCbCr Format
  xvYCC Color Space
  PhotoYCC Color Space
  HSI, HLS, and HSV Color Spaces
  Chromaticity Diagram
  Non-RGB Color Space Considerations
  Gamma Correction
  Constant Luminance Problem
  References

Chapter 4 • Video Signals Overview
  Digital Component Video Background
    Coding Ranges
  480i and 480p Systems
  576i and 576p Systems
  720p Systems
  1080i and 1080p Systems
  Other Video Systems
  References

Chapter 5 • Analog Video Interfaces
  S-Video Interface
  SCART Interface
  SDTV RGB Interface
  HDTV RGB Interface
    Constrained Image
  SDTV YPbPr Interface
    VBI Data for 480p Systems
    VBI Data for 576p Systems
  HDTV YPbPr Interface
    VBI Data for 720p Systems
    VBI Data for 1080i Systems
    Constrained Image
  D-Connector Interface
  Other Pro-Video Analog Interfaces
  VGA Interface
  References

Chapter 6 • Digital Video Interfaces
  Pro-Video Component Interfaces
    Parallel Interfaces
    Serial Interfaces
  Pro-Video Composite Interfaces
  Pro-Video Transport Interfaces
    Serial Data Transport Interface (SDTI)
    High Data-Rate Serial Data Transport Interface (HD-SDTI)
  IC Component Interfaces
    BT.601 Video Interface
    Video Module Interface (VMI)
    BT.656 Interface
    Zoomed Video Port (ZV Port)
    Video Interface Port (VIP)
  Consumer Component Interfaces
    Digital Visual Interface (DVI)
    High-Definition Multimedia Interface (HDMI)
    Digital Flat Panel (DFP) Interface
    Open LVDS Display Interface (OpenLDI)
    Gigabit Video Interface (GVIF)
  Consumer Transport Interfaces
    USB 2.0
    Ethernet
    IEEE 1394
  References

Chapter 7 • Digital Video Processing
  Rounding Considerations
    Truncation
    Conventional Rounding
    Error Feedback Rounding
    Dynamic Rounding
  SDTV-HDTV YCbCr Transforms
    SDTV to HDTV
    HDTV to SDTV
  4:4:4 to 4:2:2 YCbCr Conversion
  Display Enhancement
    Brightness, Contrast, Saturation (Color), and Hue (Tint)
    Color Transient Improvement
    Luma Transient Improvement
    Sharpness
    Blue Stretch
    Green Enhancement
    Dynamic Contrast
    Color Correction
    Color Temperature Correction
  Video Mixing and Graphics Overlay
  Luma and Chroma Keying
    Luminance Keying
    Chroma Keying
    Superblack and Luma Keying
  Video Scaling
    Pixel Dropping and Duplication
    Linear Interpolation
    Anti-Aliased Resampling
    Display Scaling Examples
  Scan Rate Conversion
    Frame or Field Dropping and Duplicating
    Temporal Interpolation
    2:2 Pulldown
    3:2 Pulldown
    3:3 Pulldown
    24:1 Pulldown
  Noninterlaced-to-Interlaced Conversion
    Scan Line Decimation
    Vertical Filtering
  Interlaced-to-Noninterlaced Conversion
    Video Mode: Intra-Field Processing
    Video Mode: Inter-Field Processing
    Film Mode
    Frequency Response Considerations
  DCT-Based Compression
  Fixed Pixel Display Considerations
    Expanded Color Reproduction
    Detail Correction
    Non-uniform Quantization
    Scaling and Deinterlacing
  References

Chapter 8 • NTSC, PAL, and SECAM Overview
  NTSC Overview
    Luminance Information
    Color Information
    Color Modulation
    Composite Video Generation
    Color Subcarrier Frequency
    NTSC Standards
    RF Modulation
    Analog Channel Assignments
    Luminance Equation Derivation
  PAL Overview
    Luminance Information
    Color Information
    Color Modulation
    Composite Video Generation
    PAL Standards
    RF Modulation
    Analog Channel Assignments
    Luminance Equation Derivation
    PALplus
  SECAM Overview
    Luminance Information
    Color Information
    Color Modulation
    Composite Video Generation
    SECAM Standards
    Luminance Equation Derivation
  Video Test Signals
  VBI Data
    Timecode
    CEA-608 Closed Captioning
    Widescreen Signaling and CGMS
    Teletext
    AMOL (Automated Measurement of Lineups)
    Raw VBI Data
    Sliced VBI Data
  Enhanced Television Programming
  References

Chapter 9 • NTSC and PAL Digital Encoding and Decoding
  NTSC and PAL Encoding
    2× Oversampling
    Color Space Conversion
    Luminance (Y) Processing
    Color Difference Processing
    Analog Composite Video
    Color Subcarrier Generation
    Horizontal and Vertical Timing
    Clean Encoding
    Bandwidth-Limited Edge Generation
    Level Limiting
    Encoder Video Parameters
    Genlocking Support
    Alpha Channel Support
  NTSC and PAL Digital Decoding
    Digitizing the Analog Video
    Y/C Separation
    Color Difference Processing
    Luminance (Y) Processing
    User Adjustments
    Color Space Conversion
    Genlocking
    Video Timing Generation
    Auto-Detection of Video Signal Type
    Y/C Separation Techniques
    Alpha Channel Support
    Decoder Video Parameters
  References

Chapter 10 • H.261 and H.263
  H.261
    Video Coding Layer
    Video Bitstream
    Still Image Transmission
  H.263
    Video Coding Layer
    Video Bitstream
    Optional H.263 Modes
    Profiles
  References

Chapter 11 • Consumer DV
  Audio
  Video
  Digital Interfaces
    IEEE 1394
    SDTI
  100 Mbps DV Differences
  HDV Format
  AVCHD Format
  References

Chapter 12 • MPEG-1
  MPEG vs. JPEG
  Quality Issues
  Audio Overview
  Video Coding Layer
    Interlaced Video
    Encode Preprocessing
    Coded Frame Types
    Motion Compensation
    I Frames
    P Frames
    B Frames
    D Frames
  Video Bitstream
    Video Sequence
    Sequence Header
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551 Group of Pictures (GOP) Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 Picture Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 Slice Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 Macroblock (MB) Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558 Block Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562 System Bitstream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 ISO/IEC 11172 Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 Pack Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 System Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 Packet Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 Video Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 Real-World Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 Chapter 13 • MPEG-2 577 Audio Overview . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 Video Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584 Transport and Program Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584 Video Coding Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 YCbCr Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 Coded Picture Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 Motion Compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586 Macroblocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587 I Pictures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587 P Pictures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590 B Pictures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591 xiv Contents Video Bitstream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591 Video Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 Sequence Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 User Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596 Sequence Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596 Sequence Display Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598 Sequence Scalable Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601 Group of Pictures (GOP) Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603 Picture Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604 Content Description Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605 Picture Coding Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611 Quant Matrix Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614 Picture Display Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616 Picture Temporal Scalable Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617 Picture Spatial Scalable Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618 Copyright Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . 619 Camera Parameters Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620 ITU-T ext. D Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620 Slice Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620 Macroblock Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621 Block Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622 Motion Compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642 PES Packet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647 Program Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656 Pack Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657 System Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657 Program Stream Map (PSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659 Program Stream Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661 Transport Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661 Packet Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
661 Adaptation Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663 Program Specific Information (PSI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666 Program Association Table (PAT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668 Program Map Table (PMT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670 Transport Stream Description Table (TSDT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671 Conditional Access Table (CAT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672 Network Information Table (NIT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673 IPMP Control Information Table (ICIT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673 Contents xv Intellectual Property Management and Protection (IPMP) . . . . . . . . . . . . . . . . . . . . . . . . . . . 674 MPEG-4.2 Video over MPEG-2 Transport Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674 MPEG-4.10 (H.264) Video over MPEG-2 Transport Streams . . . . . . . . . . . . . . . . . . . . . . . . . 674 SMPTE 421M (VC-1) Video over MPEG-2 Transport Streams . . . . . . . . . . . . . . . . . . . . . . . . 675 MPEG-2 PMT/PSM Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675 MPEG-4 PMT/PSM Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 ARIB PMT Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692 ATSC PMT Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
695 DVB PMT Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698 OpenCable PMT Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704 Closed Captioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706 VBI Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712 Teletext . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717 Active Format Description (AFD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718 Subtitles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720 Enhanced Television Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725 Data Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727 Decoder Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737 Chapter 14 • MPEG-4 and H.264 738 Audio Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739 General Audio Object Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739 Speech Object Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
740 Synthesized Speech Object Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740 Synthesized Audio Object Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740 Visual Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 YCbCr Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 Visual Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 MPEG-4 Part 2 Natural Visual Object Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 MPEG-4 Part 2 Natural Visual Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743 Graphics Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747 Visual Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747 Visual Object Sequence (VS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747 Video Object (VO) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747 Video Object Layer (VOL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747 Group of Video Object Plane (GOV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Video Object Plane (VOP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 xvi Contents Object Description Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
749 Object Descriptor (OD) Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Object Content Information (OCI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751 Intellectual Property Management and Protection (IPMP) . . . . . . . . . . . . . . . . . . . . . . . . 751 Scene Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751 BIFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751 Synchronization of Elementary Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753 Sync Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753 DMIF Application Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754 Multiplexing of Elementary Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754 FlexMux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 MPEG-4 Over MPEG-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 MP4 File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 Intellectual Property Management and Protection (IPMP) . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 MPEG-4 Part 10 (H.264) Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756 Profiles and Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
756 Supplemental Enhancement Information (SEI) Messages . . . . . . . . . . . . . . . . . . . . . . . . 758 Video Coding Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759 Network Abstraction Layer (NAL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763 Chapter 15 • ATSC Digital Television 764 Video Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766 Audio Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766 Program and System Information Protocol (PSIP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768 Required Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768 Optional Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768 Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770 E-VSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772 Data Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773 Application Block Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
777 Contents xvii Chapter 16 • OpenCable™ Digital Television 778 Video Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 Audio Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 In-Band System Information (SI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 Required Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781 Optional Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782 Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784 Out-of-Band System Information (SI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 786 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 786 Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788 In-Band Data Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790 Data Service Announcements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790 Service Description Framework (SDF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Conditional Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Related Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
792 Application Block Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795 Chapter 17 • DVB Digital Television 796 Video Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798 Audio Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798 System Information (SI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798 Required Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798 Optional Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799 Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 804 Data Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808 Conditional Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808 Application Block Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 810 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 810 xviii Contents Chapter 18 • ISDB Digital Television 812 ISDB-S (Satellite) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813 ISDB-C (Cable) . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813 ISDB-T (Terrestrial) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814 Video Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814 Audio Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814 Still Picture Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814 Graphics Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814 System Information (SI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816 Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817 Captioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825 Data Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825 Application Block Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826 Chapter 19 • IPTV 827 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
827 Multicasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828 RTSP-Based Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828 RTSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828 RTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830 RTCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833 RSVP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834 ISMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834 Broadcast over IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835 Conditional Access (DRM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835 Chapter 20 • Glossary 837 Index 889 About the Author Contents xix About the Author Keith Jack is Director of Product Marketing at Sigma Designs. Sigma Designs develops and markets high-performance, highly-integrated System-on-a-Chip (SoC) semiconductors for IPTV Set-top Boxes, Blu-ray and HD DVD Players/Recorders, HDTVs, Digital Media Adapters, and Portable Media Players. Prior to joining Sigma Designs, Mr. Jack held various marketing and chip design positions at Harris Semiconductor, Brooktree, and Rockwell International. 
He has been involved in over 40 multimedia chips for the consumer market.

I dedicate this book to my wife Gabriela, and my two sons Ethan and Andy, all of whom have brought tremendous joy into my life.

Chapter 1
Introduction

A few short years ago, the applications for video were somewhat confined: analog was used for broadcast and cable television, VCRs, set-top boxes, televisions, and camcorders. Since then, there has been a tremendous and rapid conversion to digital video, mostly based on the MPEG-2 video compression standard.

Today, in addition to the legacy DV, MPEG-1, and MPEG-2 audio and video compression standards, there are three new high-performance video compression standards. These new video codecs offer much higher video compression for a given level of video quality.

• MPEG-4.2. This video codec typically offers a 1.5–2× improvement in compression ratio over MPEG-2. Although able to address a wide variety of markets, MPEG-4.2 never really achieved widespread acceptance due to its complexity. Also, many simply decided to wait for the new MPEG-4.10 (H.264) video codec to become available.

• MPEG-4.10 (H.264). This video codec typically offers a 2–3× improvement in compression ratio over MPEG-2. Additional improvements in compression ratios and quality are expected as the encoders become better and use more of the available tools that MPEG-4.10 (H.264) offers. Learning a lesson from MPEG-4, MPEG-4.10 (H.264) is optimized for implementation on low-cost single-chip solutions and has already been adopted by the DVB and ARIB.

• SMPTE 421M (VC-1). A competitor to MPEG-4.10 (H.264), this video codec also typically offers a 2–3× improvement in compression ratio over MPEG-2. Again, additional improvements in compression ratios and quality are expected as the encoders become better.
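As a rough illustration of what the compression-ratio figures in the list above mean in practice, the sketch below converts an MPEG-2 bit-rate into the approximate range a newer codec would need for similar quality. The function name and the 6 Mbps input are hypothetical, chosen only for the arithmetic; the improvement factors are the typical ranges quoted above, not guarantees for any particular encoder or content.

```python
# Typical compression-ratio improvement over MPEG-2 (low, high),
# taken from the ranges quoted in the list above.
IMPROVEMENT_OVER_MPEG2 = {
    "MPEG-4.2": (1.5, 2.0),
    "MPEG-4.10 (H.264)": (2.0, 3.0),
    "SMPTE 421M (VC-1)": (2.0, 3.0),
}


def equivalent_bitrate_range(mpeg2_mbps, codec):
    """Return (lowest, highest) bit-rate in Mbps that the given codec
    would typically need to match an MPEG-2 stream at mpeg2_mbps.

    Hypothetical helper: it simply divides by the improvement factors.
    """
    low_gain, high_gain = IMPROVEMENT_OVER_MPEG2[codec]
    # A larger improvement factor means a lower required bit-rate.
    return mpeg2_mbps / high_gain, mpeg2_mbps / low_gain


if __name__ == "__main__":
    # 6 Mbps is an assumed, illustrative MPEG-2 bit-rate.
    for name in IMPROVEMENT_OVER_MPEG2:
        lo, hi = equivalent_bitrate_range(6.0, name)
        print(f"{name}: roughly {lo:.1f} to {hi:.1f} Mbps")
```

Under these assumptions, a channel budget that carries one MPEG-2 program could carry roughly two to three programs encoded with MPEG-4.10 (H.264) or SMPTE 421M (VC-1).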
Many more audio codecs are also available as a result of the interest in 6.1- and 7.1-channel audio, multi-channel lossless compression, lower bit-rates for the same level of audio quality, and finally, higher bit-rates for applications needing the highest audio quality at a reasonable bit-rate. In addition to decoding audio, real-time high-quality audio encoding is needed for DVD, HD DVD, and Blu-ray recorders and digital video recorders (DVRs). Combining all these audio requirements mandates that any single-chip solution for the consumer market incorporate a DSP for audio processing.

Equipment for the consumer has also become more sophisticated, supporting a much wider variety of content and interconnectivity. Today we have:

• HD DVD and Blu-ray Players and Recorders. In addition to playing CDs and DVDs, these advanced HD players also support the playback of MPEG-4.10 (H.264) and SMPTE 421M (VC-1) content. Some include an Ethernet connection to enable content from a PC or media server to be easily enjoyed on the television.

• Digital Media Adapters. These small, low-cost boxes use an Ethernet or 802.11 connection to enable content from a PC or media server to be easily enjoyed on any television. Playback of MPEG-2, MPEG-4.10 (H.264), SMPTE 421M (VC-1), and JPEG content is typically supported.

• Digital Set-Top Boxes. Cable and satellite set-top boxes now include digital video recorder (DVR) capabilities, allowing viewers to enjoy content at their convenience. Use of MPEG-4.10 (H.264) and SMPTE 421M (VC-1) now enables more channels of content and reduces the chance of early product obsolescence.

• Digital Televisions (DTV). In addition to the tuners and decoders being incorporated inside the television, some also include the digital media adapter capability. Support for viewing on-line video content is also growing.

• IPTV Set-Top Boxes. These low-cost set-top boxes are gaining popularity in regions that have high-speed DSL and FTTH (fiber to the home) available. Use of MPEG-4.10 (H.264) and SMPTE 421M (VC-1) reduces the chance of early product obsolescence.

• Portable Media Players. Using an internal hard disc drive (HDD), these players connect to the PC via USB or an 802.11 network for downloading a wide variety of content. Playback of MPEG-2, MPEG-4.10 (H.264), SMPTE 421M (VC-1), and JPEG content is typically supported.

• Mobile Video Receivers. These receivers are being incorporated into cell phones; MPEG-4.10 (H.264) and SMPTE 421M (VC-1) are used to transmit a high-quality video signal. Example applications are the DMB, DVB-H, and DVB-SH standards.

Of course, making these advanced consumer products requires more than just supporting an audio and video codec. There is also the need to support:

• Closed Captioning, Subtitles, Teletext, and V-Chip. These standards were updated to support digital broadcasts.

• Advanced Video Processing. Due to the wide range of resolutions for both content and displays, sophisticated high-quality scaling and motion adaptive deinterlacing are usually required. Since the standard-definition (SD) and high-definition (HD) standards use different colorimetry standards, this also needs to be corrected when viewing SD content on an HDTV or HD content on an SDTV.

• Sophisticated Image Composition. The ability to render a sophisticated image composed of a variety of video, OSD (on-screen display), subtitle/captioning/subpicture, text, and graphics elements.

• ARIB and DVB over IP. The complexity of supporting IP video is increasing, with deployments now incorporating ARIB and DVB over IP.

• Digital Rights Management (DRM). The protection of content from unauthorized copying or viewing.

This fifth edition of Video Demystified has been updated to reflect these changing times.
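To make the SD/HD colorimetry point in the list above concrete, here is a minimal sketch of one way such a correction could be done: decode YCbCr to R'G'B' with the source luma coefficients, then re-encode with the destination coefficients. It assumes SD content uses BT.601 coefficients and HD content uses BT.709, works on normalized values (Y in 0 to 1, Cb and Cr in -0.5 to 0.5), and ignores digital offsets, quantization, and any difference in primaries or transfer characteristics. All function names are hypothetical.

```python
# Luma coefficients (Kr, Kb). Assumption: SD content uses BT.601
# and HD content uses BT.709.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)


def ycbcr_to_rgb(y, cb, cr, coeffs):
    """Decode normalized YCbCr to gamma-corrected R'G'B'."""
    kr, kb = coeffs
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    # Solve Y = Kr*R + Kg*G + Kb*B for G.
    g = (y - kr * r - kb * b) / kg
    return r, g, b


def rgb_to_ycbcr(r, g, b, coeffs):
    """Encode gamma-corrected R'G'B' to normalized YCbCr."""
    kr, kb = coeffs
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def convert_colorimetry(y, cb, cr, src=BT601, dst=BT709):
    """Re-express a YCbCr sample using a different set of luma coefficients."""
    return rgb_to_ycbcr(*ycbcr_to_rgb(y, cb, cr, src), dst)


if __name__ == "__main__":
    # Fully saturated red, encoded with the SD (BT.601) coefficients.
    sd_red = rgb_to_ycbcr(1.0, 0.0, 0.0, BT601)
    print("SD red:", sd_red)
    print("HD red:", convert_colorimetry(*sd_red))
```

A real video pipeline would normally fold both steps into a single 3×3 matrix applied per pixel; the two-step form is used here only because it makes the role of each coefficient set visible.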
Implementing real-world solutions is not easy, and many engineers have little knowledge or experience in this area. This book is a guide for those engineers charged with the task of understanding and implementing video features into next-generation designs.

This book can be used by engineers who need or desire to learn about video, VLSI design engineers working on new video products, or anyone who wants to evaluate or simply know more about video systems.

Contents

The book is organized as follows:

Chapter 2, an Introduction to Video, discusses the various video formats and signals, where they are used, and the differences between interlaced and progressive video. Block diagrams of DVD players and digital set-top boxes are provided.

Chapter 3 reviews the common Color Spaces, how they are mathematically related, and when a specific color space is used. Color spaces reviewed include RGB, YUV, YIQ, YCbCr, xvYCC, HSI, HSV, and HLS. Considerations for converting from a non-RGB to an RGB color space and gamma correction are also discussed.

Chapter 4 is a Video Signals Overview that reviews the video timing and the analog and digital representations of various video formats, including 480i, 480p, 576i, 576p, 720p, 1080i, and 1080p.

Chapter 5 discusses the Analog Video Interfaces, including the analog RGB, YPbPr, S-Video, and SCART interfaces for consumer and pro-video applications.

Chapter 6 discusses the various Digital Video Interfaces for semiconductors, pro-video equipment, and consumer equipment. It reviews the BT.601 and BT.656 semiconductor interfaces; the SDI, SDTI, and HD-SDTI pro-video interfaces; and the DVI, HDMI, and IEEE 1394 consumer interfaces.

Chapter 7 covers several Digital Video Processing requirements such as 4:4:4 to 4:2:2 YCbCr, YCbCr digital filter templates, scaling, interlaced/noninterlaced conversion, frame rate conversion, alpha mixing, flicker filtering, and chroma keying.
Brightness, contrast, saturation, hue, and sharpness controls are also discussed.

Chapter 8 provides an NTSC, PAL, and SECAM Overview. The various composite analog video signal formats are reviewed, along with video test signals. VBI data discussed includes timecode, closed captioning and extended data services (XDS), widescreen signaling, and teletext. In addition, PALplus, RF modulation, BTSC and Zweiton analog stereo audio, and NICAM 728 digital stereo audio are reviewed.

Chapter 9 covers digital techniques used for the Encoding and Decoding of NTSC and PAL color video signals. Also reviewed are various luma/chroma (Y/C) separation techniques and their trade-offs.

Chapter 10 discusses the H.261 and H.263 video compression standards used for video teleconferencing.

Chapter 11 discusses the Consumer DV video compression standards used by digital camcorders.

Chapter 12 reviews the MPEG-1 video compression standard.

Chapter 13 discusses the MPEG-2 video compression standard.

Chapter 14 discusses the MPEG-4 video compression standard, including MPEG-4.10 (H.264).

Chapter 15 discusses the ATSC Digital Television standard used in the United States.

Chapter 16 discusses the OpenCable™ Digital Television standard used in the United States.

Chapter 17 discusses the DVB Digital Television standard used in Europe and Asia.

Chapter 18 discusses the ISDB Digital Television standard used in Japan.

Chapter 19 discusses IPTV. This technology sends compressed video over broadband networks such as the Internet, DSL, and FTTH (fiber to the home).

Finally, Chapter 20 is a glossary of over 400 video terms. If you encounter an unfamiliar term, it likely will be defined in the glossary.

Standards Organizations

Many standards organizations, some of which are listed below, are involved in specifying video standards.
Advanced Television Systems Committee (ATSC)                    www.atsc.org
Association of Radio Industries and Businesses (ARIB)           www.arib.or.jp
Cable Television Laboratories                                   www.cablelabs.com
Consumer Electronics Association (CEA)                          www.ce.org
Digital Video Broadcasting (DVB)                                www.dvb.org
International Electrotechnical Commission (IEC)                 www.iec.ch
Institute of Electrical and Electronics Engineers (IEEE)        www.ieee.org
International Organization for Standardization (ISO)            www.iso.org
International Telecommunication Union (ITU)                     www.itu.int
Society of Cable Telecommunications Engineers (SCTE)            www.scte.org
Society of Motion Picture and Television Engineers (SMPTE)      www.smpte.org
Electronic Industries Alliance (EIA)                            www.eia.org
European Broadcasting Union (EBU)                               www.ebu.ch
Video Electronics Standards Association (VESA)                  www.vesa.org
European Telecommunications Standards Institute (ETSI)          www.etsi.org

Chapter 2: Introduction to Video

Although there are many variations and implementation techniques, video signals are just a way of transferring visual information from one point to another. The information may be from a VCR, DVD player, a channel on the local broadcast, cable television, or satellite system, the Internet, or one of many other sources.

Invariably, the video information must be transferred from one device to another. It could be from a satellite set-top box or DVD player to a television. Or it could be from one chip to another inside the satellite set-top box or television. Although it seems simple, there are many different requirements, and therefore many different ways of doing it.

Analog vs. Digital

Until a few years ago, most video equipment was designed primarily for analog video. Digital video was confined to professional applications, such as video editing. The average consumer now uses digital video every day thanks to continually falling costs.
This trend has led to the development of DVD players and recorders, digital set-top boxes, digital television (DTV), portable video players, and the ability to use the Internet for transferring video data.

Video Data

Initially, video contained only gray-scale (also called black-and-white) information. While color broadcasts were being developed, attempts were made to transmit color video using analog RGB (red, green, blue) data. However, this technique occupied 3× more bandwidth than the current gray-scale solution, so alternate methods were developed that led to using Y, R–Y, and B–Y data to represent color information. A technique was then developed to transmit this Y, R–Y, and B–Y information using one signal, instead of three separate signals, and in the same bandwidth as the original gray-scale video signal. This composite video signal is what the NTSC, PAL, and SECAM video standards are still based on today. This technique is discussed in more detail in Chapters 8 and 9.

Today, even though there are many ways of representing video, they are still all related mathematically to RGB. These variations are discussed in more detail in Chapter 3.

S-Video was developed for connecting consumer equipment together (it is not used for broadcast purposes). It is a set of two analog signals, one gray-scale (Y) and one that carries the analog R–Y and B–Y color information in a specific format (also called C or chroma). Once available only for S-VHS, it is now supported on most consumer video products. This is discussed in more detail in Chapter 9.

Although always used by the professional video market, analog RGB video data has made a temporary comeback for connecting high-end consumer equipment together. Like S-Video, it is not used for broadcast purposes.

A variation of the Y, R–Y, and B–Y video signals, called YPbPr, is now commonly used for connecting consumer video products together.
Its primary advantage is the ability to transfer high-definition video between consumer products. Some manufacturers incorrectly label the YPbPr connectors YUV, YCbCr, or Y(B-Y)(R-Y).

Chapter 5 discusses the various analog interconnect schemes in detail.

Digital Video

The most common digital signals used are RGB and YCbCr. RGB is simply the digitized version of the analog RGB video signals. YCbCr is basically the digitized version of the analog YPbPr video signals, and is the format used by DVD and digital television.

Chapter 6 further discusses the various digital interconnect schemes.

Best Connection Method

There is always the question of “what is the best connection method for equipment?” For DVD players and digital cable/satellite/terrestrial set-top boxes, the typical order of decreasing video quality is:

1. HDMI (digital YCbCr)
2. HDMI (digital RGB)
3. Analog YPbPr
4. Analog RGB
5. Analog S-Video
6. Analog Composite

Some will disagree about the order. However, most consumer products do digital video processing in the YCbCr color space. Therefore, using YCbCr as the interconnect for equipment reduces the number of color space conversions required. Color space conversion of digital signals is still preferable to D/A (digital-to-analog) conversion followed by A/D (analog-to-digital) conversion, hence the positioning of HDMI RGB above analog YPbPr.

The computer industry has standardized on analog and digital RGB for connecting to the computer monitor.

Video Timing

Although it looks like video is continuous motion, it is actually a series of still images, changing fast enough that it looks like continuous motion, as shown in Figure 2.1. This typically occurs 50 or 60 times per second for consumer video, and 70–90 times per second for computer displays. Special timing information, called vertical sync, is used to indicate when a new image is starting.

Figure 2.1.
Video Is Composed of a Series of Still Images. Each image is composed of individual lines of data.

Each still image is also composed of scan lines, lines of data that occur sequentially one after another down the display, as shown in Figure 2.1. Additional timing information, called horizontal sync, is used to indicate when a new scan line is starting.

The vertical and horizontal sync information is usually transferred in one of three ways:

1. Separate horizontal and vertical sync signals
2. Separate composite sync signal
3. Composite sync signal embedded within the video signal

The composite sync signal is a combination of both vertical and horizontal sync.

Computer and consumer equipment that uses analog RGB video usually uses technique 1 or 2. Consumer equipment that supports composite video or analog YPbPr video usually uses technique 3.

For digital video, either technique 1 is commonly used or timing code words are embedded within the digital video stream. This is discussed in Chapter 6.

Interlaced vs. Progressive

Since video is a series of still images, it makes sense to simply display each full image consecutively, one after another. This is the basic technique of progressive, or non-interlaced, displays. For progressive displays that “paint” an image on the screen, such as a CRT, each image is displayed starting at the top left corner of the display, moving to the right edge of the display. Scanning then moves down one line, and repeats left-to-right. This process is repeated until the entire screen is refreshed, as seen in Figure 2.2.

In the early days of television, a technique called “interlacing” was used to reduce the amount of information sent for each image. By transferring the odd-numbered lines, followed by the even-numbered lines (as shown in Figure 2.3), the amount of information sent for each image was halved.

Given this advantage of interlacing, why bother to use progressive?
With interlace, each scan line is refreshed half as often as it would be if it were a progressive display. Therefore, to avoid line flicker on sharp edges due to a too-low frame rate, the line-to-line changes are limited, essentially by vertically lowpass filtering the image. A progressive display has no limit on the line-to-line changes, so it is capable of providing a higher-resolution image (vertically) without flicker.

Today, most broadcasts (including HDTV) are still transmitted as interlaced. Most CRT-based displays are still interlaced while LCD, plasma, and computer displays are progressive.

Video Resolution

Video resolution is one of those “fuzzy” things in life. It is common to see video resolutions of 720 × 480 or 1920 × 1080. However, those are just the number of horizontal samples and vertical scan lines, and do not necessarily convey the amount of useful information.

For example, an analog video signal can be sampled at 13.5 MHz to generate 720 samples per line. Sampling the same signal at 27 MHz would generate 1440 samples per line. However, only the number of samples per line has changed, not the resolution of the content.

Therefore, video is usually measured using lines of resolution. In essence, how many distinct black and white vertical lines can be seen across the display? This number is then normalized to a 1:1 display aspect ratio (multiplying the number by 3/4 for a 4:3 display, or by 9/16 for a 16:9 display). Of course, this results in a lower value for widescreen (16:9) displays, which goes against intuition.

Standard-Definition

Standard-definition video is usually defined as having 480 or 576 interlaced active scan lines, and is commonly called “480i” and “576i,” respectively.

For a fixed-pixel (non-CRT) consumer display with a 4:3 aspect ratio, this translates into an active resolution of 720 × 480i or 720 × 576i. For a 16:9 aspect ratio, this translates into an active resolution of 960 × 480i or 960 × 576i.
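The arithmetic in the Video Resolution discussion above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the active-line duration is an assumption derived from the text's own figure of 720 samples at 13.5 MHz. The normalization scales the number of lines visible across the full display width by the height-to-width ratio, which is why a widescreen display ends up with the lower value.

```python
from fractions import Fraction

# Assumed active-line duration, derived from "720 samples at 13.5 MHz".
ACTIVE_LINE_SECONDS = Fraction(720, 13_500_000)


def samples_per_line(sample_rate_hz, active_line_seconds=ACTIVE_LINE_SECONDS):
    """Number of samples taken during one active line at a given sample rate."""
    return round(sample_rate_hz * active_line_seconds)


def lines_of_resolution(lines_across_width, aspect_width, aspect_height):
    """Normalize a count of distinct vertical lines to a 1:1 aspect ratio."""
    return int(lines_across_width * Fraction(aspect_height, aspect_width))


if __name__ == "__main__":
    # Doubling the sample rate doubles the samples, not the picture detail.
    print(samples_per_line(13_500_000), samples_per_line(27_000_000))
    # The same 720 lines across the width normalize differently per aspect ratio.
    print(lines_of_resolution(720, 4, 3), lines_of_resolution(720, 16, 9))
```

The second pair of results shows the counterintuitive effect the text mentions: the same horizontal detail is reported as fewer lines of resolution on a 16:9 display than on a 4:3 display.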
Enhanced-Definition

Enhanced-definition video is usually defined as having 480 or 576 progressive active scan lines, and is commonly called “480p” and “576p,” respectively.

For a fixed-pixel (non-CRT) consumer display with a 4:3 aspect ratio, this translates into an active resolution of 720 × 480p or 720 × 576p. For a 16:9 aspect ratio, this translates into an active resolution of 960 × 480p or 960 × 576p.

The difference between standard and enhanced definition is that standard-definition is interlaced, while enhanced-definition is progressive.

Figure 2.2. Progressive Displays “Paint” the Lines of an Image Consecutively, One After Another.

Figure 2.3. Interlaced Displays “Paint” First One-Half of the Image (Odd Lines), Then the Other Half (Even Lines).

High-Definition

High-definition video is usually defined as having 720 progressive (720p) or 1080 interlaced (1080i) active scan lines. For a fixed-pixel (non-CRT) consumer display with a 16:9 aspect ratio, this translates into an active resolution of 1280 × 720p or 1920 × 1080i, respectively.

However, HDTV displays are technically defined as being capable of displaying a minimum of 720p or 1080i active scan lines. They also must be capable of displaying 16:9 content using a minimum of 540 progressive (540p) or 810 interlaced (810i) active scan lines. This enables the manufacturing of CRT-based HDTVs with a 4:3 aspect ratio and LCD/plasma 16:9 aspect ratio displays with resolutions of 1024 × 1024p, 1280 × 768p, 1024 × 768p, and so on, lowering costs.
Audio and Video Compression

The recent advances in consumer electronics, such as digital television, DVD players and recorders, digital video recorders, and so on, were made possible due to audio and video compression based largely on MPEG-2 video with Dolby® Digital, DTS®, MPEG-1, or MPEG-2 audio.

New audio and video codecs, such as MPEG-4 HE-AAC, MPEG-4.10 (H.264), and SMPTE 421M (VC-1), offer better compression than previous codecs for the same quality. These advances are enabling new ways of distributing content (both to consumers and within the home), new consumer products (such as portable video players and mobile video/cell phones), and more cable/satellite channels.

Application Block Diagrams

Looking at a few simplified block diagrams helps envision how video flows through its various operations.

DVD Players

Figure 2.4 is a simplified block diagram for a basic DVD player, showing the common blocks. Today, all of this is on a single low-cost chip.

In addition to playing DVDs (which are based on MPEG-2 video compression), DVD players are now expected to handle MP3 and WMA audio, MPEG-4 video (for DivX Video), JPEG images, and so on. Special playback modes such as slow/fast forward/reverse at various speeds are also expected. Support for DVD Audio and SACD is also popular.

A recent enhancement to DVD players is the ability to connect to a home network for playing content (music, video, pictures, etc.) residing on the PC. These “networked DVD players” may also include the ability to play movies from the Internet and download content onto an internal hard disc drive (HDD) for later viewing. Support for playing audio, video, and pictures from a variety of flash-memory cards is also growing.

In an attempt to look different to quickly grab buyers’ attention, some DVD player manufacturers tweak the video frequency response. Since this feature is usually irritating over the long term, it should be defeated or properly adjusted.
For the film look many video enthusiasts strive for, the frequency response should be as flat as possible.

Another issue is the output levels of the analog video signals. Although it is easy to generate very accurate video levels, they vary considerably. Reviews now point out this issue since switching between sources may mean changing brightness or black levels, defeating any television calibration or personal adjustments that may have been done by the user.

Figure 2.4. Simplified Block Diagram of a Basic DVD Player.
[Blocks shown: input from read electronics; IR input; CSS descramble and program stream demux; closed captioning, teletext, and widescreen VBI data; video decompress (MPEG-2); scaling, brightness, contrast, hue, saturation, and sharpness; graphics overlay; audio decompress (Dolby Digital and DTS); NTSC/PAL video encode; stereo audio DAC; digital audio interface; CPU. Outputs: NTSC/PAL and S-Video, RGB/YPbPr, HDMI, audio L/R, 5.1 digital audio.]

Digital Media Adapters

Digital media adapters connect to a home network for playing content (music, video, pictures, and so on) residing on a PC or media server. These small, low-cost boxes enable content to be easily enjoyed on any or all televisions in the home. Many support optional wireless networking, simplifying installation.

Figure 2.5 is a simplified block diagram for a basic digital media adapter, showing the common blocks. Today, all of this is on a single low-cost chip.
Digital Television Set-Top Boxes

The digital television standards fall into seven major categories:

• ATSC (Advanced Television Systems Committee)
• DVB (Digital Video Broadcast)
• ARIB (Association of Radio Industries and Businesses)
• IPTV (including DVB and ARIB over IP)
• Open digital cable standards, such as OpenCable
• Proprietary digital cable standards
• Proprietary digital satellite standards

Figure 2.5. Simplified Block Diagram of a Digital Media Adapter.
[Blocks shown: Ethernet network; IR input; demux; closed captioning, teletext, and widescreen VBI data; video decompress; scaling, brightness, contrast, hue, saturation, and sharpness; graphics overlay; audio decompress; NTSC/PAL video encode; stereo audio DAC; digital audio interface; CPU. Outputs: NTSC/PAL and S-Video, RGB/YPbPr, HDMI, audio L/R, 5.1 digital audio.]

Originally based on MPEG-2 video and Dolby® Digital or MPEG audio, they now support new advanced audio and video standards, such as MPEG-4 HE-AAC audio, Dolby® Digital Plus audio, MPEG-4.10 (H.264) video, and SMPTE 421M (VC-1) video.

Figure 2.6 is a simplified block diagram for a digital television set-top box, showing the common audio and video processing blocks. It is used to receive digital television broadcasts from terrestrial (over-the-air), cable, or satellite sources. A digital television may include this circuitry inside the television.

Many set-top boxes now include two tuners and digital video recorder (DVR) capability. This enables recording one program onto an internal HDD while watching another. Two tuners are also common in digital television receivers to support a picture-in-picture (PIP) feature.

Figure 2.6. Simplified Block Diagram of a Digital Television Set-Top Box.
[Blocks shown: RF input; tuner; QAM/VSB/COFDM demodulation and FEC; channel descramble and transport stream demux; closed captioning, teletext, and widescreen VBI data; video decompress; scaling, brightness, contrast, hue, saturation, and sharpness; graphics overlay; audio decompress; NTSC/PAL video encode; stereo audio DAC; digital audio interface; IR input; NTSC/PAL video decode; NTSC/PAL audio decode; CPU. Outputs: NTSC/PAL and S-Video, RGB/YPbPr video, HDMI, audio L/R, 5.1 digital audio.]

Chapter 3: Color Spaces

A color space is a mathematical representation of a set of colors. The three most popular color models are RGB (used in computer graphics); YIQ, YUV, or YCbCr (used in video systems); and CMYK (used in color printing). However, none of these color spaces is directly related to the intuitive notions of hue, saturation, and brightness. This resulted in the temporary pursuit of other models, such as HSI and HSV, to simplify programming, processing, and end-user manipulation.

All of the color spaces can be derived from the RGB information supplied by devices such as cameras and scanners.

RGB Color Space

The red, green, and blue (RGB) color space is widely used for computer graphics and displays. Red, green, and blue are three primary additive colors (individual components are added together to form a desired color) and are represented by a three-dimensional, Cartesian coordinate system (Figure 3.1). The indicated diagonal of the cube, with equal amounts of each primary component, represents various gray levels. Table 3.1 contains the RGB values for 100% amplitude, 100% saturated color bars, a common video test signal.

Figure 3.1. The RGB Color Cube.
[Cube vertices: black, red, green, blue, yellow, cyan, magenta, white.]

     Nominal Range   White   Yellow   Cyan   Green   Magenta   Red   Blue   Black
R    0 to 255        255     255      0      0       255       255   0      0
G    0 to 255        255     255      255    255     0         0     0      0
B    0 to 255        255     0        255    0       255       0     255    0

Table 3.1. 100% RGB Color Bars.
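The regular pattern in Table 3.1 can be reproduced with a short Python sketch. The function name and the `amplitude` parameter are hypothetical, introduced only for illustration; the observation that green, red, and blue toggle every four, two, and one bars respectively is read off the table rather than taken from any standard.

```python
def rgb_color_bars(amplitude=255):
    """Return (name, R, G, B) for the eight color bars in Table 3.1 order."""
    names = ["White", "Yellow", "Cyan", "Green",
             "Magenta", "Red", "Blue", "Black"]
    bars = []
    for index, name in enumerate(names):
        g = amplitude if index < 4 else 0               # green: on for first four bars
        r = amplitude if (index // 2) % 2 == 0 else 0   # red: toggles every two bars
        b = amplitude if index % 2 == 0 else 0          # blue: toggles every bar
        bars.append((name, r, g, b))
    return bars


if __name__ == "__main__":
    for name, r, g, b in rgb_color_bars():
        print(f"{name:8s} R={r:3d} G={g:3d} B={b:3d}")
```

Ordering the bars this way places them in order of decreasing luma, since green contributes the most to luma and blue the least.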
The RGB color space is the most prevalent choice for computer graphics because color displays use red, green, and blue to create the desired color. Therefore, the choice of the RGB color space simplifies the architecture and design of the system. Also, a system that is designed using the RGB color space can take advantage of a large number of existing software routines, since this color space has been around for a number of years.

However, RGB is not very efficient when dealing with real-world images. All three RGB components need to be of equal bandwidth to generate any color within the RGB color cube. The result of this is a frame buffer that has the same pixel depth and display resolution for each RGB component. Also, processing an image in the RGB color space is usually not the most efficient method. For example, to modify the intensity or color of a given pixel, the three RGB values must be read from the frame buffer, the intensity or color calculated, the desired modifications performed, and the new RGB values calculated and written back to the frame buffer. If the system had access to an image stored directly in the intensity and color format, some processing steps would be faster.

For these and other reasons, many video standards use luma and two color difference signals. The most common are the YUV, YIQ, and YCbCr color spaces. Although all are related, there are some differences.

sRGB

Due to the many implementations of the RGB color space, the sRGB color space was formalized. The specification for sRGB (IEC 61966–2–1) uses BT.709 chromaticity, D65 reference white, a display gamma of 2.2, and linear RGB (8 bits per color).

sRGB values have a normalized range of 0–1, with 8-bit digital sRGB values having a range of 0–255 for black–white. A version called “Studio RGB” uses an 8-bit range of 16–235 for black–white, enabling compatibility with video applications.
One limitation of sRGB is that since the normalized values are restricted to the 0–1 range, colors outside the gamut (the triangle produced by them) cannot be used. For this reason, the extended RGB color space, “scRGB,” was developed.

scRGB

The scRGB color space (formerly called sRGB64) extends the dynamic range, color gamut, and bit precision over sRGB. The scRGB gamut is not only much larger than the sRGB gamut, but it is larger than what the human visual system can see.

The specification for scRGB (IEC 61966–2–2) uses BT.709 chromaticity, D65 reference white, and linear RGB data (16 bits per color). Instead of using a normalized range of 0–1, a range of –0.5 to +7.4999 is supported. Values below 0 and above 1 are what enable scRGB to have a larger gamut, compared to sRGB, even though it has the same primary colors.

The correlation between the linear 16-bit scRGB values and the normalized range is:

00000 = –0.5
04096 = 0.0 (black)
12288 = 1.0 (white)
16384 = 1.5
65535 = 7.4999

After gamma correction, the correlation between the nonlinear 16-bit scR´G´B´ values and the normalized range is:

00000 = –0.7354
04096 = 0.0 (black)
12288 = 1.0 (white)
65535 = 2.3876

scRGB to sRGB Conversion

To convert linear 16-bit scRGB to gamma-corrected 8-bit sRGB (notated as sR´G´B´8):

scR = (scR16 / 8192) − 0.5
scG = (scG16 / 8192) − 0.5
scB = (scB16 / 8192) − 0.5

if (scR16, scG16, scB16) ≤ 4095
    sR´8 = 0
    sG´8 = 0
    sB´8 = 0

if 4096 ≤ (scR16, scG16, scB16) ≤ 4243
    sR´8 = round[4.500 × scR × 255]
    sG´8 = round[4.500 × scG × 255]
    sB´8 = round[4.500 × scB × 255]

if 4244 ≤ (scR16, scG16, scB16) ≤ 12288
    sR´8 = round[(1.099 × scR^0.45 – 0.099) × 255]
    sG´8 = round[(1.099 × scG^0.45 – 0.099) × 255]
    sB´8 = round[(1.099 × scB^0.45 – 0.099) × 255]

if (scR16, scG16, scB16) ≥ 12289
    sR´8 = 255
    sG´8 = 255
    sB´8 = 255

YUV Color Space

The YUV color space is used by the PAL (Phase Alternation Line), NTSC (National Television System Committee), and SECAM (Sequentiel Couleur Avec Mémoire or
Sequential Color with Memory) composite color video standards. The black-and-white system used only luma (Y) information; color information (U and V) was added in such a way that a black-and-white receiver would still display a normal black-and-white picture. Color receivers decoded the additional color information to display a color picture.

The basic equations to convert between gamma-corrected RGB (notated as R´G´B´) and YUV are:

Y = 0.299R´ + 0.587G´ + 0.114B´
U = –0.147R´ – 0.289G´ + 0.436B´
  = 0.492(B´ – Y)
V = 0.615R´ – 0.515G´ – 0.100B´
  = 0.877(R´ – Y)

R´ = Y + 1.140V
G´ = Y – 0.395U – 0.581V
B´ = Y + 2.032U

For digital R´G´B´ values with a range of 0–255, Y has a range of 0–255, U a range of 0 to ±112, and V a range of 0 to ±157. These equations are usually scaled to simplify the implementation in an actual NTSC or PAL digital encoder or decoder.

Note that for digital data, 8-bit YUV and R´G´B´ data should be saturated at the 0 and 255 levels to avoid underflow and overflow wrap-around problems.

If the full range of (B´ – Y) and (R´ – Y) had been used, the composite NTSC and PAL levels would have exceeded what the (then current) black-and-white television transmitters and receivers were capable of supporting. Experimentation determined that modulated subcarrier excursions of 20% of the luma (Y) signal excursion could be permitted above white and below black. The scaling factors were then selected so that the maximum level of 75% amplitude, 100% saturation yellow and cyan color bars would be at the white level (100 IRE).

YIQ Color Space

The YIQ color space, further discussed in Chapter 8, is derived from the YUV color space and is optionally used by the NTSC composite color video standard. (The “I” stands for “in-phase” and the “Q” for “quadrature,” which is the modulation method used to transmit the color information.)
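Because YIQ is derived from YUV, it may help to see the YUV equations from the preceding section written out as code first. The following Python is a minimal sketch, not an encoder implementation: the function names are hypothetical, the coefficients are the unscaled ones quoted in the text, and saturation to 0–255 is applied only on the way back to R´G´B´, as the text recommends for 8-bit data.

```python
def clamp8(value):
    """Round and saturate to the 8-bit range 0..255 (avoids wrap-around)."""
    return max(0, min(255, int(round(value))))


def rgb_to_yuv(r, g, b):
    """Gamma-corrected R'G'B' (0..255) to unscaled Y, U, V."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)   # scaled (B' - Y)
    v = 0.877 * (r - y)   # scaled (R' - Y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Unscaled Y, U, V back to saturated 8-bit R'G'B'."""
    r = y + 1.140 * v
    g = y - 0.395 * u - 0.581 * v
    b = y + 2.032 * u
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (0, 0, 255), (255, 255, 255)]:
        y, u, v = rgb_to_yuv(*rgb)
        print(rgb, "->", (round(y), round(u), round(v)), "->", yuv_to_rgb(y, u, v))
```

Saturated red and blue exercise the extremes of V and U respectively, which is where the ±157 and ±112 ranges quoted above come from.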
The basic equations to convert between R´G´B´ and YIQ are:

Y = 0.299R´ + 0.587G´ + 0.114B´
I = 0.596R´ – 0.275G´ – 0.321B´
  = V cos 33° – U sin 33°
  = 0.736(R´ – Y) – 0.268(B´ – Y)
Q = 0.212R´ – 0.523G´ + 0.311B´
  = V sin 33° + U cos 33°
  = 0.478(R´ – Y) + 0.413(B´ – Y)

or, using matrix notation:

[ I ]   [ 0  1 ] [  cos(33°)  sin(33°) ] [ U ]
[ Q ] = [ 1  0 ] [ –sin(33°)  cos(33°) ] [ V ]

R´ = Y + 0.956I + 0.621Q
G´ = Y – 0.272I – 0.647Q
B´ = Y – 1.107I + 1.704Q

For digital R´G´B´ values with a range of 0–255, Y has a range of 0–255, I has a range of 0 to ±152, and Q has a range of 0 to ±134. I and Q are obtained by rotating the U and V axes 33°. These equations are usually scaled to simplify the implementation in an actual NTSC digital encoder or decoder.

Note that for digital data, 8-bit YIQ and R´G´B´ data should be saturated at the 0 and 255 levels to avoid underflow and overflow wrap-around problems.

YCbCr Color Space

The YCbCr color space was developed as part of ITU-R BT.601 during the development of a world-wide digital component video standard (discussed in Chapter 4). YCbCr is a scaled and offset version of the YUV color space. Y is defined to have a nominal 8-bit range of 16–235; Cb and Cr are defined to have a nominal range of 16–240. There are several YCbCr sampling formats, such as 4:4:4, 4:2:2, 4:1:1, and 4:2:0, that are also described.

RGB-YCbCr Equations: SDTV

RGB to YCbCr: Analog Equations

Many specifications assume the source is analog R´G´B´ with a normalized range of 0–1.
This is first converted to analog YPbPr:

Y = 0.299R´ + 0.587G´ + 0.114B´
Pb = –0.169R´ – 0.331G´ + 0.500B´
Pr = 0.500R´ – 0.419G´ – 0.081B´

To generate 8-bit YCbCr with the proper values, YPbPr is then quantized to 8 bits:

Y = round[219Y + 16]
Cb = round[224Pb + 128]
Cr = round[224Pr + 128]

RGB to YCbCr: Digital Equations

To convert 8-bit digital R´G´B´ data with a 16–235 nominal range (Studio R´G´B´) to YCbCr, the analog equations may be simplified to:

Y = 0.299R´ + 0.587G´ + 0.114B´
Cb = –0.172R´ – 0.339G´ + 0.511B´ + 128
Cr = 0.511R´ – 0.428G´ – 0.083B´ + 128

            Nominal Range   White   Yellow   Cyan   Green   Magenta   Red   Blue   Black
SDTV   Y    16 to 235       180     162      131    112     84        65    35     16
       Cb   16 to 240       128     44       156    72      184       100   212    128
       Cr   16 to 240       128     142      44     58      198       212   114    128
HDTV   Y    16 to 235       180     168      145    133     63        51    28     16
       Cb   16 to 240       128     44       147    63      193       109   212    128
       Cr   16 to 240       128     136      44     52      204       212   120    128

Table 3.2. 75% YCbCr Color Bars.

YCbCr to RGB: Analog Equations

Many specifications assume the source is analog YPbPr. This is first converted to analog R´G´B´:

R´ = Y + 1.402Pr
G´ = Y – 0.714Pr – 0.344Pb
B´ = Y + 1.772Pb

To generate 8-bit R´G´B´ with a 16–235 nominal range (Studio R´G´B´), R´G´B´ is then quantized to 8 bits:

out´ = round[219in´ + 16]

YCbCr to RGB: Digital Equations

To convert 8-bit YCbCr to R´G´B´ data with a 16–235 nominal range (Studio R´G´B´), the analog equations may be simplified to:

R´ = Y + 1.371(Cr – 128)
G´ = Y – 0.698(Cr – 128) – 0.336(Cb – 128)
B´ = Y + 1.732(Cb – 128)

YCbCr to RGB: General Considerations

When performing YCbCr to R´G´B´ conversion, the resulting R´G´B´ values have a nominal range of 16–235, with possible occasional excursions into the 0–15 and 236–255 values. This is due to Y and CbCr occasionally going outside the 16–235 and 16–240 ranges, respectively, due to video processing and noise.
Note that 8-bit YCbCr and R´G´B´ data should be saturated at the 0 and 255 levels to avoid underflow and overflow wrap-around problems.

Table 3.2 lists the YCbCr values for 75% amplitude, 100% saturated color bars, a common video test signal.

Computer Systems Considerations

If the R´G´B´ data has a range of 0–255, as is commonly found in computer systems, the following equations may be more convenient to use:

Y = 0.257R´ + 0.504G´ + 0.098B´ + 16
Cb = –0.148R´ – 0.291G´ + 0.439B´ + 128
Cr = 0.439R´ – 0.368G´ – 0.071B´ + 128

R´ = 1.164(Y – 16) + 1.596(Cr – 128)
G´ = 1.164(Y – 16) – 0.813(Cr – 128) – 0.391(Cb – 128)
B´ = 1.164(Y – 16) + 2.018(Cb – 128)

Note that 8-bit YCbCr and R´G´B´ data should be saturated at the 0 and 255 levels to avoid underflow and overflow wrap-around problems.

RGB-YCbCr Equations: HDTV

RGB to YCbCr: Analog Equations

Many specifications assume the source is analog R´G´B´ with a normalized range of 0–1. This is first converted to analog YPbPr:

Y = 0.213R´ + 0.715G´ + 0.072B´
Pb = –0.115R´ – 0.385G´ + 0.500B´
Pr = 0.500R´ – 0.454G´ – 0.046B´

To generate 8-bit YCbCr with the proper values, YPbPr is then quantized to 8 bits:

Y = round[219Y + 16]
Cb = round[224Pb + 128]
Cr = round[224Pr + 128]

RGB to YCbCr: Digital Equations

To convert 8-bit digital R´G´B´ data with a 16–235 nominal range (Studio R´G´B´) to YCbCr, the analog equations may be simplified to:

Y = 0.213R´ + 0.715G´ + 0.072B´
Cb = –0.117R´ – 0.394G´ + 0.511B´ + 128
Cr = 0.511R´ – 0.464G´ – 0.047B´ + 128

YCbCr to RGB: Analog Equations

Many specifications assume the source is analog YPbPr.
This is first converted to analog R´G´B´:

    R´ = Y + 1.575Pr
    G´ = Y – 0.468Pr – 0.187Pb
    B´ = Y + 1.856Pb

To generate 8-bit R´G´B´ with a 16–235 nominal range (Studio R´G´B´), R´G´B´ is then quantized to 8 bits:

    out´ = round[219in´ + 16]

YCbCr to RGB: Digital Equations

To convert 8-bit YCbCr to R´G´B´ data with a 16–235 nominal range (Studio R´G´B´), the analog equations may be simplified to:

    R´ = Y + 1.540(Cr – 128)
    G´ = Y – 0.459(Cr – 128) – 0.183(Cb – 128)
    B´ = Y + 1.816(Cb – 128)

YCbCr to RGB: General Considerations

When performing YCbCr to R´G´B´ conversion, the resulting R´G´B´ values have a nominal range of 16–235, with possible occasional excursions into the 0–15 and 236–255 values. This is due to Y and CbCr occasionally going outside the 16–235 and 16–240 ranges, respectively, due to video processing and noise. Note that 8-bit YCbCr and R´G´B´ data should be saturated at the 0 and 255 levels to avoid underflow and overflow wrap-around problems.

Table 3.2 lists the YCbCr values for 75% amplitude, 100% saturated color bars, a common video test signal.

Computer Systems Considerations

If the R´G´B´ data has a range of 0–255, as is commonly found in computer systems, the following equations may be more convenient to use:

    Y = 0.183R´ + 0.614G´ + 0.062B´ + 16
    Cb = –0.101R´ – 0.338G´ + 0.439B´ + 128
    Cr = 0.439R´ – 0.399G´ – 0.040B´ + 128

    R´ = 1.164(Y – 16) + 1.793(Cr – 128)
    G´ = 1.164(Y – 16) – 0.534(Cr – 128) – 0.213(Cb – 128)
    B´ = 1.164(Y – 16) + 2.115(Cb – 128)

Note that 8-bit YCbCr and R´G´B´ data should be saturated at the 0 and 255 levels to avoid underflow and overflow wrap-around problems.

4:4:4 YCbCr Format

Figure 3.2 illustrates the positioning of YCbCr samples for the 4:4:4 format. Each sample has a Y, a Cb, and a Cr value. Each sample is typically 8 bits (consumer applications) or 10 bits (pro-video applications) per component. Each sample therefore requires 24 bits (or 30 bits for pro-video applications).
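The 0–255 "computer systems" equations given for SDTV and HDTV, together with the saturation requirement, can be sketched in a few lines of Python. The coefficients are copied from the equations in this section; the function names, the table layout, and the `system` parameter are illustrative assumptions rather than anything defined by the text.

```python
# Coefficients from the "Computer Systems Considerations" equations
# (0-255 R'G'B' <-> 8-bit YCbCr). Names and layout are illustrative only.
FORWARD = {
    "sdtv": ((0.257, 0.504, 0.098),
             (-0.148, -0.291, 0.439),
             (0.439, -0.368, -0.071)),
    "hdtv": ((0.183, 0.614, 0.062),
             (-0.101, -0.338, 0.439),
             (0.439, -0.399, -0.040)),
}
# Inverse terms in the order: Cr->R', Cr->G', Cb->G', Cb->B'
INVERSE = {
    "sdtv": (1.596, 0.813, 0.391, 2.018),
    "hdtv": (1.793, 0.534, 0.213, 2.115),
}


def clamp8(x):
    """Round, then saturate at 0 and 255 so results never wrap around."""
    return max(0, min(255, int(round(x))))


def rgb_to_ycbcr(r, g, b, system="sdtv"):
    """Convert 0-255 R'G'B' to 8-bit YCbCr."""
    (yr, yg, yb), (br, bg, bb), (rr, rg, rb) = FORWARD[system]
    y = yr * r + yg * g + yb * b + 16
    cb = br * r + bg * g + bb * b + 128
    cr = rr * r + rg * g + rb * b + 128
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y, cb, cr, system="sdtv"):
    """Convert 8-bit YCbCr to 0-255 R'G'B'."""
    r_cr, g_cr, g_cb, b_cb = INVERSE[system]
    luma = 1.164 * (y - 16)
    r = luma + r_cr * (cr - 128)
    g = luma - g_cr * (cr - 128) - g_cb * (cb - 128)
    b = luma + b_cb * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)
```

For example, `rgb_to_ycbcr(255, 255, 255, "hdtv")` gives nominal white (235, 128, 128), and `ycbcr_to_rgb(255, 128, 128)` saturates at 255 instead of overflowing.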
4:2:2 YCbCr Format

Figure 3.3 illustrates the positioning of YCbCr samples for the 4:2:2 format. For every two horizontal Y samples, there is one Cb and Cr sample. Each sample is typically 8 bits (consumer applications) or 10 bits (pro-video applications) per component. Each sample therefore requires 16 bits (or 20 bits for pro-video applications), usually formatted as shown in Figure 3.4.

To display 4:2:2 YCbCr data, it is first converted to 4:4:4 YCbCr data, using interpolation to generate the missing Cb and Cr samples.

4:1:1 YCbCr Format

Figure 3.5 illustrates the positioning of YCbCr samples for the 4:1:1 format (also known as YUV12), used in some consumer video and DV video compression applications. For every four horizontal Y samples, there is one Cb and Cr value. Each component is typically 8 bits. Each sample therefore requires 12 bits, usually formatted as shown in Figure 3.6.

To display 4:1:1 YCbCr data, it is first converted to 4:4:4 YCbCr data, using interpolation to generate the missing Cb and Cr samples.

4:2:0 YCbCr Format

Rather than the horizontal-only 2:1 reduction of Cb and Cr used by 4:2:2, 4:2:0 YCbCr implements a 2:1 reduction of Cb and Cr in both the vertical and horizontal directions. It is commonly used for video compression.

As shown in Figures 3.7 through 3.11, there are several 4:2:0 sampling formats. Table 3.3 lists the YCbCr formats for various DV applications.

To display 4:2:0 YCbCr data, it is first converted to 4:4:4 YCbCr data, using interpolation to generate the new Cb and Cr samples. Note that some solutions do not properly convert the 4:2:0 YCbCr data to the 4:4:4 format, resulting in a “chroma bug.”

Figure 3.2. 4:4:4 Co-Sited Sampling. The sampling positions on the active scan lines of an interlaced picture. Line numbers marked X belong to field 1 (576i field 2); those marked [X] belong to field 2 (576i field 1).
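The text says only that the missing Cb and Cr samples are generated "using interpolation" and does not name a filter. As a minimal sketch, assuming simple two-tap linear interpolation (production designs often use longer filters), one line of co-sited 4:2:2 chroma can be expanded to 4:4:4 as follows. The function name is hypothetical.

```python
def upsample_chroma_422(chroma):
    """Expand one line of co-sited 4:2:2 Cb (or Cr) samples to 4:4:4.

    Assumption: linear interpolation between neighbouring chroma samples;
    the last sample is repeated at the end of the line.
    """
    out = []
    for i, c in enumerate(chroma):
        out.append(c)  # co-sited with an even-numbered Y sample
        nxt = chroma[i + 1] if i + 1 < len(chroma) else c
        out.append((c + nxt + 1) // 2)  # midpoint, rounded half up
    return out
```

The same routine applies to Cb and Cr independently; Y is passed through unchanged.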
Figure 3.3. 4:2:2 Co-Sited Sampling. The sampling positions on the active scan lines of an interlaced picture. Line numbers marked X belong to field 1 (576i field 2); those marked [X] belong to field 2 (576i field 1).

Figure 3.4. 4:2:2 Frame Buffer Formatting. Each sample occupies 16 bits: the eight Y bits (Y7–Y0) of that sample plus eight chroma bits. The number in parentheses is the sample whose data is stored.

    Sample         0        1        2        3        4        5
    Y7–Y0          Y (0)    Y (1)    Y (2)    Y (3)    Y (4)    Y (5)
    Chroma 7–0     Cb (0)   Cr (0)   Cb (2)   Cr (2)   Cb (4)   Cr (4)

Figure 3.5. 4:1:1 Co-Sited Sampling. The sampling positions on the active scan lines of an interlaced picture. Line numbers marked X belong to field 1 (576i field 2); those marked [X] belong to field 2 (576i field 1).
Figure 3.6. 4:1:1 Frame Buffer Formatting. Each sample occupies 12 bits: the eight Y bits (Y7–Y0) of that sample plus four chroma bits. The number in parentheses is the sample whose data is stored.

    Sample    Y bits        Chroma bits
    0         Y7–Y0 (0)     Cb7, Cb6, Cr7, Cr6 (0)
    1         Y7–Y0 (1)     Cb5, Cb4, Cr5, Cr4 (0)
    2         Y7–Y0 (2)     Cb3, Cb2, Cr3, Cr2 (0)
    3         Y7–Y0 (3)     Cb1, Cb0, Cr1, Cr0 (0)
    4         Y7–Y0 (4)     Cb7, Cb6, Cr7, Cr6 (4)
    5         Y7–Y0 (5)     Cb5, Cb4, Cr5, Cr4 (4)

Figure 3.7. 4:2:0 Sampling for H.261, H.263, and MPEG-1. The sampling positions on the active scan lines of a progressive or noninterlaced picture.

Figure 3.8. 4:2:0 Sampling for MPEG-2, MPEG-4.2, MPEG-4.10 (H.264), and SMPTE 421M (VC-1). The sampling positions on the active scan lines of a progressive or noninterlaced picture.

Table 3.3. YCbCr Formats for Various DV Applications.

    Application                     YCbCr Format
    25 Mbps DV
      480-Line DV                   4:1:1 Co-Sited
      576-Line DV                   4:2:0 Co-Sited
      480-Line DVCAM                4:1:1 Co-Sited
      576-Line DVCAM                4:2:0 Co-Sited
      D-7 (DVCPRO)                  4:1:1 Co-Sited
    50 Mbps DV
      DVCPRO 50                     4:2:2 Co-Sited
      Digital Betacam               4:2:2 Co-Sited
      D-9 (Digital S)               4:2:2 Co-Sited
    100 Mbps DV
      DVCPRO HD                     4:2:2 Co-Sited
      D-9 HD                        4:2:2 Co-Sited
    MPEG-1                          4:2:0
    MPEG-2, -4.2, -4.10 (H.264)     4:4:4 Co-Sited, 4:2:2 Co-Sited, 4:2:0
    H.261, H.263                    4:2:0

Figure 3.9. 4:2:0 Sampling for MPEG-2, MPEG-4.2, MPEG-4.10 (H.264), and SMPTE 421M (VC-1). The sampling positions on the active scan lines of an interlaced picture (top_field_first = 1).
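The bits-per-sample figures quoted for the 4:4:4, 4:2:2, and 4:1:1 formats follow directly from the chroma subsampling factors. The short sketch below reproduces them; the function name and the factor table are illustrative assumptions, and the 4:2:0 result is derived here by the same arithmetic rather than quoted from the text.

```python
# (horizontal, vertical) chroma subsampling factors for each format
FORMATS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
    "4:2:0": (2, 2),
}


def average_bits_per_sample(fmt, bits=8):
    """Average storage per Y sample, counting its share of Cb and Cr."""
    h_div, v_div = FORMATS[fmt]
    chroma_share = 2 * bits / (h_div * v_div)  # Cb + Cr, shared by h*v pixels
    return bits + chroma_share
```

With 8-bit components this gives 24, 16, 12, and 12 bits for 4:4:4, 4:2:2, 4:1:1, and 4:2:0 respectively; passing `bits=10` gives the 30-bit and 20-bit pro-video figures.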
Figure 3.10. 4:2:0 Sampling for MPEG-2, MPEG-4.2, MPEG-4.10 (H.264), and SMPTE 421M (VC-1). The sampling positions on the active scan lines of an interlaced picture (top_field_first = 0).

Figure 3.11. 4:2:0 Co-Sited Sampling for 576i DV and DVCAM. The sampling positions on the active scan lines of an interlaced picture.

xvYCC Color Space

The xvYCC (extended gamut YCbCr for video) color space extends the color gamut of normal YCbCr, enabling 1.8× more colors to be reproduced. The specification for xvYCC (IEC 61966–2–4) uses BT.709 chromaticity and D65 reference white.

The equations for converting between scR´G´B´ and xvYCbCr are the same as those used for converting between R´G´B´ and YCbCr. xvYCC-based YCbCr data has an 8-bit range of 1–254, enabling backwards compatibility with existing designs. Y has an 8-bit range of –15/219 to +238/219 (–0.068493 to +1.086758); CbCr has an 8-bit range of –15/224 to +238/224 (–0.066964 to +1.062500).

HDMI uses Gamut Boundary Description metadata to signal that xvYCC video data is being used.

PhotoYCC Color Space

PhotoYCC (a trademark of Eastman Kodak Company) was developed to encode Photo CD image data. The goal was to develop a display-device-independent color space. For maximum video display efficiency, the color space is based upon ITU-R BT.601 and BT.709.

The encoding process (RGB to PhotoYCC) assumes CIE Standard Illuminant D65 and that the spectral sensitivities of the image capture system are proportional to the color-matching functions of the BT.709 reference primaries. The RGB values, unlike those for a computer graphics system, may be negative. PhotoYCC includes colors outside the BT.709 color gamut; these are encoded using negative values.
RGB to PhotoYCC

Linear RGB data (normalized to have values of 0 to 1) is nonlinearly transformed to PhotoYCC as follows:

for R, G, B ≥ 0.018

    $R' = 1.099 R^{0.45} - 0.099$
    $G' = 1.099 G^{0.45} - 0.099$
    $B' = 1.099 B^{0.45} - 0.099$

for –0.018 < R, G, B < 0.018

    $R' = 4.5 R$
    $G' = 4.5 G$
    $B' = 4.5 B$

for R, G, B ≤ –0.018

    $R' = -1.099 |R|^{0.45} + 0.099$
    $G' = -1.099 |G|^{0.45} + 0.099$
    $B' = -1.099 |B|^{0.45} + 0.099$

From R´G´B´ with a 0–255 range, a luma and two chrominance signals (C1 and C2) are generated:

    Y = 0.213R´ + 0.419G´ + 0.081B´
    C1 = –0.131R´ – 0.256G´ + 0.387B´ + 156
    C2 = 0.373R´ – 0.312G´ – 0.061B´ + 137

As an example, a 20% gray value (R, G, and B = 0.2) would be recorded on the PhotoCD disc using the following values:

    Y = 79
    C1 = 156
    C2 = 137

PhotoYCC to RGB

Since PhotoYCC attempts to preserve the dynamic range of film, decoding PhotoYCC images requires the selection of a color space and range appropriate for the output device. Thus, the decoding equations are not always the exact inverse of the encoding equations. The following equations are suitable for generating RGB values for driving a CRT display, and assume a unity relationship between the luma in the encoded image and the displayed image.

    R´ = 0.981Y + 1.315(C2 – 137)
    G´ = 0.981Y – 0.311(C1 – 156) – 0.669(C2 – 137)
    B´ = 0.981Y + 1.601(C1 – 156)

The R´G´B´ values should be saturated to a range of 0 to 255. The equations above assume the display uses phosphor chromaticities that are the same as the BT.709 reference primaries, and that the video signal luma (V) and the display luminance (L) have the relationship:

for V ≥ 0.0812

    $L = ((V + 0.099) / 1.099)^{1/0.45}$

for V < 0.0812

    $L = V / 4.5$

HSI, HLS, and HSV Color Spaces

The HSI (hue, saturation, intensity) and HSV (hue, saturation, value) color spaces were developed to be more “intuitive” in manipulating color and were designed to approximate the way humans perceive and interpret color.
They were developed when colors had to be specified manually, and are rarely used now that users can select colors visually or specify Pantone colors. These color spaces are discussed for historic interest. HLS (hue, lightness, saturation) is similar to HSI; the term lightness is used rather than intensity.

The difference between HSI and HSV is the computation of the brightness component (I or V), which determines the distribution and dynamic range of both the brightness (I or V) and saturation (S). The HSI color space is best for traditional image processing functions such as convolution, equalization, histograms, and so on, which operate by manipulation of the brightness values since I is equally dependent on R, G, and B. The HSV color space is preferred for manipulation of hue and saturation (to shift colors or adjust the amount of color) since it yields a greater dynamic range of saturation.

Figure 3.12 illustrates the single hexcone HSV color model. The top of the hexcone corresponds to V = 1, or the maximum intensity colors. The point at the base of the hexcone is black and here V = 0. Complementary colors are 180° opposite one another as measured by H, the angle around the vertical axis (V), with red at 0°. The value of S is a ratio, ranging from 0 on the center line vertical axis (V) to 1 on the sides of the hexcone. Any value of S between 0 and 1 may be associated with the point V = 0. The point S = 0, V = 1 is white. Intermediate values of V for S = 0 are the grays. Note that when S = 0, the value of H is irrelevant.

From an artist’s viewpoint, any color with V = 1, S = 1 is a pure pigment (whose color is defined by H). Adding white corresponds to decreasing S (without changing V); adding black corresponds to decreasing V (without changing S). Tones are created by decreasing both S and V. Table 3.4 lists the 75% amplitude, 100% saturated HSV color bars.

Figure 3.13 illustrates the double hexcone HSI color model.
The top of the hexcone corresponds to I = 1, or white. The point at the base of the hexcone is black and here I = 0. Complementary colors are 180° opposite one another as measured by H, the angle around the vertical axis (I), with red at 0° (for consistency with the HSV model, we have changed from the Tektronix convention of blue at 0°). The value of S ranges from 0 on the vertical axis (I) to 1 on the surfaces of the hexcone. The grays all have S = 0, but maximum saturation of hues is at S = 1, I = 0.5. Table 3.5 lists the 75% amplitude, 100% saturated HSI color bars.

Figure 3.12. Single Hexcone HSV Color Model. Hue (H) is the angle around the vertical V axis: red 0°, yellow 60°, green 120°, cyan 180°, blue 240°, magenta 300°. V runs from black (0.0) at the base to white (1.0) at the top; S increases outward from the axis.

Table 3.4. 75% HSV Color Bars.

         Nominal Range   White   Yellow   Cyan   Green   Magenta   Red    Blue   Black
    H    0° to 360°      –       60°      180°   120°    300°      0°     240°   –
    S    0 to 1          0       1        1      1       1         1      1      0
    V    0 to 1          0.75    0.75     0.75   0.75    0.75      0.75   0.75   0

Figure 3.13. Double Hexcone HSI Color Model. Hue (H) is the angle around the vertical I axis: red 0°, yellow 60°, green 120°, cyan 180°, blue 240°, magenta 300°. I runs from black (0.0) to white (1.0). For consistency with the HSV model, we have changed from the Tektronix convention of blue at 0° and depict the model as a double hexcone rather than as a double cone.

Table 3.5. 75% HSI Color Bars. For consistency with the HSV model, we have changed from the Tektronix convention of blue at 0°.

         Nominal Range   White   Yellow   Cyan    Green   Magenta   Red     Blue    Black
    H    0° to 360°      –       60°      180°    120°    300°      0°      240°    –
    S    0 to 1          0       1        1       1       1         1       1       0
    I    0 to 1          0.75    0.375    0.375   0.375   0.375     0.375   0.375   0

Chromaticity Diagram

The color gamut perceived by a person with normal vision (the 1931 CIE Standard Observer) is shown in Figure 3.14. The diagram and underlying mathematics were updated in 1960 and 1976; however, the NTSC television system is based on the 1931 specifications.

Color perception was measured by viewing combinations of the three standard CIE (International Commission on Illumination or Commission Internationale de l’Éclairage) primary colors: red with a 700-nm wavelength, green at 546.1 nm, and blue at 435.8 nm. These primary colors, and the other spectrally pure colors resulting from mixing of the primary colors, are located along the curved outer boundary line (called the spectrum locus), shown in Figure 3.14.

The ends of the spectrum locus (at red and blue) are connected by a straight line that represents the purples, which are combinations of red and blue. The area within this closed boundary contains all the colors that can be generated by mixing light of different colors. The closer a color is to the boundary, the more saturated it is. Colors within the boundary are perceived as becoming more pastel as the center of the diagram (white) is approached. Each point on the diagram, representing a unique color, may be identified by its x and y coordinates.

In the CIE system, the intensities of red, green, and blue are transformed into what are called the tristimulus values, which are represented by the capital letters X, Y, and Z. These values represent the relative quantities of the primary colors.

The coordinate axes of Figure 3.14 are derived from the tristimulus values:

    x = X/(X + Y + Z) = red/(red + green + blue)
    y = Y/(X + Y + Z) = green/(red + green + blue)
    z = Z/(X + Y + Z) = blue/(red + green + blue)

The coordinates x, y, and z are called chromaticity coordinates, and they always add up to 1. As a result, z can always be expressed in terms of x and y, which means that only x and y are required to specify any color, and the diagram can be two-dimensional.

Typically, a source or display specifies three (x, y) coordinates to define the three primary colors it uses.
The triangle formed by the three (x, y) coordinates encloses the gamut of colors that the source or display can reproduce. This is shown in Figure 3.15, which compares the color gamuts of NTSC, PAL and HDTV. Note that no set of three colors can generate all possible colors, which is why television pictures are never completely accurate.

Figure 3.14. CIE 1931 Chromaticity Diagram Showing Various Color Regions. Axes are the x and y chromaticity coordinates; the spectrum locus is marked in wavelength from 380 nm to 780 nm, and the labeled regions are green, yellow, orange, red, pink, white, cyan, blue, and purple.

Figure 3.15. CIE 1931 Chromaticity Diagram Showing Various Color Gamuts. Two gamut triangles are drawn on the x, y axes: the 1953 NTSC color gamut and the NTSC / PAL / SECAM / HDTV color gamut.

In addition, a source or display usually specifies the (x, y) coordinate of the white color used, since pure white is not usually captured or reproduced. White is defined as the color captured or produced when all three primary signals are equal, and it has a subtle shade of color to it. Note that luminance, or brightness, information is not included in the standard CIE 1931 chromaticity diagram, but is an axis that is orthogonal to the (x, y) plane. The lighter a color is, the more restricted the chromaticity range is.
The RGB chromaticities and reference white (CIE illuminant C) for the 1953 NTSC standard are:

    R:     xr = 0.67     yr = 0.33
    G:     xg = 0.21     yg = 0.71
    B:     xb = 0.14     yb = 0.08
    white: xw = 0.3101   yw = 0.3162

Modern NTSC, 480i and 480p video systems use a different set of RGB chromaticities (SMPTE “C”) and reference white (CIE illuminant D65):

    R:     xr = 0.630    yr = 0.340
    G:     xg = 0.310    yg = 0.595
    B:     xb = 0.155    yb = 0.070
    white: xw = 0.3127   yw = 0.3290

The RGB chromaticities and reference white (CIE illuminant D65) for PAL, SECAM, 576i and 576p video systems are:

    R:     xr = 0.64     yr = 0.33
    G:     xg = 0.29     yg = 0.60
    B:     xb = 0.15     yb = 0.06
    white: xw = 0.3127   yw = 0.3290

The RGB chromaticities and reference white (CIE illuminant D65) for sRGB, scRGB, xvYCC, and HDTV are based on BT.709 and SMPTE 274M:

    R:     xr = 0.64     yr = 0.33
    G:     xg = 0.30     yg = 0.60
    B:     xb = 0.15     yb = 0.06
    white: xw = 0.3127   yw = 0.3290

Since different chromaticity and reference white values are used for various video standards, minor color errors may occur when the source and display values do not match; for example, displaying a 480i or 480p program on an HDTV, or displaying an HDTV program on an NTSC television. These minor color errors can easily be corrected at the display by using a 3 × 3 matrix multiplier, as discussed in Chapter 7.

The RGB chromaticities for consumer displays are usually slightly different from the standards. As a result, one or more of the RGB colors are slightly off, such as having too much orange in the red, or too much blue in the green. This can usually be compensated for by having the display professionally calibrated.
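The chromaticity coordinates and the gamut triangle described in this section lend themselves to a short numerical check. The sketch below computes (x, y) from tristimulus values and tests whether a chromaticity lies inside the triangle formed by three primaries, using the BT.709 and 1953 NTSC values listed above. The function names and the sign-of-cross-product test are illustrative choices, not something the text prescribes.

```python
def xy_from_xyz(X, Y, Z):
    """Chromaticity coordinates (x, y); z is implied because x + y + z = 1."""
    total = X + Y + Z
    return X / total, Y / total


# (x, y) of the R, G, B primaries and white, copied from the lists above
BT709 = ((0.64, 0.33), (0.30, 0.60), (0.15, 0.06))
NTSC_1953 = ((0.67, 0.33), (0.21, 0.71), (0.14, 0.08))
D65 = (0.3127, 0.3290)


def _cross(a, b, p):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def in_gamut(p, primaries):
    """True if chromaticity p is inside (or on) the primaries' triangle."""
    r, g, b = primaries
    sides = (_cross(r, g, p), _cross(g, b, p), _cross(b, r, p))
    return all(s >= 0 for s in sides) or all(s <= 0 for s in sides)
```

For instance, `in_gamut(D65, BT709)` is true, while the 1953 NTSC green primary falls outside the BT.709 triangle, which is one way to see why the two gamuts in Figure 3.15 differ.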
Non-RGB Color Space Considerations

When processing information in a non-RGB color space (such as YIQ, YUV, or YCbCr), care must be taken that combinations of values are not created that result in the generation of invalid RGB colors. The term invalid refers to RGB components outside the normalized RGB limits of (1, 1, 1).

For example, given that RGB has a normalized value of (1, 1, 1), the resulting YCbCr value is (235, 128, 128). If Cb and Cr are manipulated to generate a YCbCr value of (235, 64, 73), the corresponding RGB normalized value becomes (0.6, 1.29, 0.56); note that the green value exceeds the normalized value of 1.

As this example shows, there are many combinations of Y, Cb, and Cr that result in invalid RGB values; these YCbCr values must be processed so as to generate valid RGB values. Figure 3.16 shows the RGB normalized limits transformed into the YCbCr color space.

Figure 3.16. RGB Limits Transformed into 3-D YCbCr Space. The valid YCbCr color block, with corners at red, green, blue, yellow, cyan, magenta, white, and black, sits inside the cube of all possible YCbCr values (Y, Cb, and Cr each from 0 to 255); white lies on the line Cb = Cr = 128.

Best results are obtained using a constant luma and constant hue approach: Y is not altered while Cb and Cr are limited to the maximum valid values having the same hue as the invalid color prior to limiting. The constant hue principle corresponds to moving invalid CbCr combinations directly towards the CbCr origin (128, 128), until they lie on the surface of the valid YCbCr color block.

When converting to the RGB color space from a non-RGB color space, care must be taken to include saturation logic to ensure overflow and underflow wrap-around conditions do not occur due to the finite precision of digital circuitry. 8-bit RGB values less than 0 must be set to 0, and values greater than 255 must be set to 255.
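The constant luma, constant hue approach can be sketched as a scaling of (Cb, Cr) toward (128, 128). The example below is one possible reading of that description, not a reference implementation: it assumes the SDTV digital YCbCr-to-Studio-R´G´B´ equations from Chapter 3, treats 16–235 as the valid R´G´B´ range, and assumes Y itself is already within 16–235. The function name is hypothetical.

```python
def limit_constant_hue(y, cb, cr, lo=16, hi=235):
    """Pull (Cb, Cr) toward (128, 128) until studio R'G'B' fits lo..hi.

    Y is returned unchanged. Assumes lo <= y <= hi and the SDTV digital
    equations R' = Y + 1.371(Cr-128), G' = Y - 0.698(Cr-128) - 0.336(Cb-128),
    B' = Y + 1.732(Cb-128).
    """
    db, dr = cb - 128, cr - 128
    # Contribution of the chroma terms to R', G', and B'
    deltas = (1.371 * dr, -0.698 * dr - 0.336 * db, 1.732 * db)
    scale = 1.0
    for d in deltas:
        if d > 0:      # component may overshoot the upper limit
            scale = min(scale, (hi - y) / d)
        elif d < 0:    # component may undershoot the lower limit
            scale = min(scale, (lo - y) / d)
    # int() truncates toward zero, so rounding never moves chroma outward
    return y, 128 + int(scale * db), 128 + int(scale * dr)
```

Scaling Cb and Cr by the same factor keeps their ratio, and therefore the hue angle, unchanged. For the (235, 64, 73) example above the scale factor is 0, so the limited value is (235, 128, 128): at peak luma only the neutral color is valid.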
Gamma Correction

The transfer function of most CRT displays produces an intensity that is proportional to some power (referred to as gamma) of the signal amplitude. As a result, high-intensity ranges are expanded and low-intensity ranges are compressed (see Figure 3.17). This is an advantage in combatting noise, as the eye is approximately equally sensitive to equal relative intensity changes. By “gamma correcting” the video signals before transmission, the intensity output of the display is roughly linear (the gray line in Figure 3.17), and transmission-induced noise is reduced.

To minimize noise in the darker areas of the image, modern video systems limit the gain of the curve in the black region. This technique limits the gain close to black and stretches the remainder of the curve to maintain function and tangent continuity.

Although video standards assume a display gamma of about 2.2, a gamma of about 2.5 is more realistic for CRT displays. However, this difference improves the viewing in a dimly lit environment. More accurate viewing in a brightly lit environment may be accomplished by applying another gamma factor of about 1.14 (2.5/2.2). It is also common to tweak the gamma curve in the display to get closer to the “film look.”

Figure 3.17. Effect of Gamma. Output versus input (both 0.0 to 1.0), showing the transmitted pre-correction curve and the display characteristic.

Early NTSC Systems

Early NTSC systems assumed a simple transform at the display, with a gamma of 2.2. RGB values are normalized to have a range of 0 to 1:

    $R = R'^{2.2}$
    $G = G'^{2.2}$
    $B = B'^{2.2}$

To compensate for the nonlinear display, linear RGB data was “gamma-corrected” prior to transmission by the inverse transform. RGB values are normalized to have a range of 0 to 1:

    $R' = R^{1/2.2}$
    $G' = G^{1/2.2}$
    $B' = B^{1/2.2}$

Early PAL and SECAM Systems

Most early PAL and SECAM systems assumed a simple transform at the display, with a gamma of 2.8.
RGB values are normalized to have a range of 0 to 1:

    $R = R'^{2.8}$
    $G = G'^{2.8}$
    $B = B'^{2.8}$

To compensate for the nonlinear display, linear RGB data was “gamma-corrected” prior to transmission by the inverse transform. RGB values are normalized to have a range of 0 to 1:

    $R' = R^{1/2.8}$
    $G' = G^{1/2.8}$
    $B' = B^{1/2.8}$

Current Systems

Current NTSC, 480i, 480p, and HDTV video systems assume the following transform at the display, with a gamma of [1/0.45]. RGB values are normalized to have a range of 0 to 1:

if (R´, G´, B´) < 0.081

    $R = R' / 4.5$
    $G = G' / 4.5$
    $B = B' / 4.5$

if (R´, G´, B´) ≥ 0.081

    $R = ((R' + 0.099) / 1.099)^{1/0.45}$
    $G = ((G' + 0.099) / 1.099)^{1/0.45}$
    $B = ((B' + 0.099) / 1.099)^{1/0.45}$

Extended gamut color spaces, such as scRGB, do additional processing for below-zero values:

if (R´, G´, B´) < −0.081

    $R = -((R' - 0.099) / (-1.099))^{1/0.45}$
    $G = -((G' - 0.099) / (-1.099))^{1/0.45}$
    $B = -((B' - 0.099) / (-1.099))^{1/0.45}$

if −0.081 ≤ (R´, G´, B´) < 0.081

    $R = R' / 4.5$
    $G = G' / 4.5$
    $B = B' / 4.5$

To compensate for the nonlinear display, linear RGB data is “gamma-corrected” prior to transmission by the inverse transform. RGB values are normalized to have a range of 0 to 1:

if (R, G, B) < 0.018

    $R' = 4.5R$
    $G' = 4.5G$
    $B' = 4.5B$

if (R, G, B) ≥ 0.018

    $R' = 1.099R^{0.45} - 0.099$
    $G' = 1.099G^{0.45} - 0.099$
    $B' = 1.099B^{0.45} - 0.099$

Extended gamut color spaces, such as scRGB, do additional processing for below-zero values:

if (R, G, B) < −0.018

    $R' = -1.099(-R)^{0.45} + 0.099$
    $G' = -1.099(-G)^{0.45} + 0.099$
    $B' = -1.099(-B)^{0.45} + 0.099$

if −0.018 ≤ (R, G, B) < 0.018

    $R' = 4.5R$
    $G' = 4.5G$
    $B' = 4.5B$

Although most PAL and SECAM standards specify a gamma of 2.8, a value of [1/0.45] is now commonly used. Thus, these equations are also now used for PAL, SECAM, 576i, and 576p video systems.

Non-CRT Displays

Since they are not based on CRTs, the LCD, LCOS, DLP, and plasma displays have different display transforms.
To simplify interfacing to these displays, their electronics are designed to accept standard gamma-corrected video and then compensate for the actual transform of the display panel.

Constant Luminance Problem

Due to the wrong order of the gamma and matrix operations, the U and V (or Cb and Cr) signals also contribute to the luminance (Y) signal. This causes an error in the perceived luminance when the amplitude of U and V is not correct. This may be due to bandwidth-limiting U and V or a non-nominal setting of the U and V gain (color saturation).

For low color frequencies, there is no problem. For high color frequencies, U and V disappear and consequently R´, G´, and B´ degrade to be equal to (only) Y.

References

1. Benson, K. Blair, Television Engineering Handbook. McGraw-Hill, Inc., 1986.
2. Devereux, V. G., 1987, Limiting of YUV digital video signals, BBC Research Department Report BBC RD1987 22.
3. EIA Standard EIA-189–A, July 1976, Encoded Color Bar Signal.
4. Faroudja, Yves Charles, NTSC and Beyond. IEEE Transactions on Consumer Electronics, Vol. 34, No. 1, February 1988.
5. IEC 61966–2–1, 1999, Colour Management–Default RGB Colour Space–sRGB.
6. IEC 61966–2–2, 2003, Colour Management–Extended RGB Colour Space–scRGB.
7. IEC 61966–2–4, 2006, Colour Management–Extended-Gamut YCC Colour Space for Video Applications–xvYCC.
8. ITU-R BT.470–6, 1998, Conventional Television Systems.
9. ITU-R BT.601–5, 1995, Studio Encoding Parameters of Digital Television for Standard 4:3 and Widescreen 16:9 Aspect Ratios.
10. ITU-R BT.709–5, 2002, Parameter Values for the HDTV Standards for Production and International Programme Exchange.
11. Photo CD Information Bulletin, Fully Utilizing Photo CD Images–PhotoYCC Color Encoding and Compression Schemes, May 1994, Eastman Kodak Company.

Chapter 4

Video Signals Overview

Video signals come in a wide variety of options: number of scan lines, interlaced vs.
progressive, analog vs. digital, and so on. This chapter provides an overview of the common video signal formats and their timing.

Digital Component Video Background

In digital component video, the video signals are in digital form (YCbCr or R´G´B´), being encoded to composite NTSC, PAL, or SECAM only when it is necessary for broadcasting or recording purposes.

The European Broadcasting Union (EBU) became interested in a standard for digital component video due to the difficulties of exchanging video material between the 576i PAL and SECAM systems. The format held the promise that the digital video signals would be identical whether sourced in a PAL or SECAM country, allowing subsequent encoding to the appropriate composite form for broadcasting. Consultations with the Society of Motion Picture and Television Engineers (SMPTE) resulted in the development of an approach to support international program exchange, including 480i systems.

A series of demonstrations was carried out to determine the quality and suitability for signal processing of various methods. From these investigations, the main parameters of the digital component coding, filtering, and timing were chosen and incorporated into ITU-R BT.601. BT.601 has since served as the starting point for other digital component video standards.

Coding Ranges

The selection of the coding ranges balanced the requirements of adequate capacity for signals beyond the normal range and minimizing quantizing distortion. Although the black level of a video signal is reasonably well defined, the white level can be subject to variations due to video signal and equipment tolerances. Noise, gain variations, and transients produced by filtering can produce signal levels outside the nominal ranges.

8 or 10 bits per sample are used for each of the YCbCr or R´G´B´ components.
Although 8-bit coding introduces some quantizing distortion, it was originally felt that most video sources contained sufficient noise to mask most of the quantizing distortion. However, if the video source is virtually noise-free, the quantizing distortion is noticeable as contouring in areas where the signal brightness gradually changes. In addition, at least two additional bits of fractional YCbCr or R´G´B´ data were desirable to reduce rounding effects when transmitting between equipment in the studio editing environment. For these reasons, most pro-video equipment uses 10-bit YCbCr or R´G´B´, allowing 2 bits of fractional YCbCr or R´G´B´ data to be maintained.

Initial proposals had equal coding ranges for all three YCbCr components. However, this was changed so that Y had a greater margin for overloads at the white levels, as white level limiting is more visible than black. Thus, the nominal 8-bit Y levels are 16–235, while the nominal 8-bit CbCr levels are 16–240 (with 128 corresponding to no color). Occasional excursions into the other levels are permissible, but never at the 0 and 255 levels.

For 8-bit systems, the values of 0x00 and 0xFF are reserved for timing information. For 10-bit systems, the values of 0x000–0x003 and 0x3FC–0x3FF are reserved for timing information, to maintain compatibility with 8-bit systems.

The YCbCr or R´G´B´ levels to generate 75% color bars are discussed in Chapter 3. Digital R´G´B´ signals are defined to have the same nominal levels as Y to provide processing margin and simplify the digital matrix conversions between R´G´B´ and YCbCr.

SDTV Sample Rate Selection

Analog R´G´B´ or YUV video signals are sampled using line-locked sampling. This technique produces a static orthogonal sampling grid in which samples on the current scan line fall directly beneath those on previous scan lines and fields, as shown in Figures 3.2 through 3.11.
Another important feature is that the sampling is locked in phase so that one sample is coincident with the 50% amplitude point of the falling edge of analog horizontal sync (0H). This ensures that different sources produce samples at nominally the same positions in the picture. Making this feature common simplifies conversion from one standard to another.

For 480i and 576i video systems, several Y sampling frequencies were initially examined, including four times Fsc. However, the four-times Fsc sampling rates did not support the requirement of simplifying international exchange of programs, so they were dropped in favor of a single common sampling rate. Because the lowest sample rate possible (while still supporting quality video) was a goal, a 12 MHz sample rate was preferred for a long time, but eventually was considered to be too close to the Nyquist limit, complicating the filtering requirements. When the frequencies between 12 MHz and 14.3 MHz were examined, it became evident that a 13.5 MHz sample rate for Y provided some commonality between 480i and 576i systems. Cb and Cr, being color difference signals, do not require the same bandwidth as the Y, so may be sampled at one-half the Y sample rate, or 6.75 MHz.

The “4:2:2” notation now commonly used originally applied to NTSC and PAL video, implying that Y, U and V were sampled at 4×, 2×, and 2× the color subcarrier frequency, respectively. The “4:2:2” notation was then adapted to BT.601 digital component video, implying that the sampling frequencies of Y, Cb and Cr were 4×, 2×, and 2× 3.375 MHz, respectively. “4:2:2” now commonly means that the sample rate of Cb and Cr is one-half that of Y, regardless of the actual sample rates used.

With 13.5 MHz sampling, each scan line contains 858 samples (480i systems) or 864 samples (576i systems) and consists of a digital blanking interval followed by an active line period.
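The 858 and 864 samples-per-line figures can be checked by dividing the sample rate by the line rate. The sketch below does this with exact fractions; the 525-line, 30/1.001 Hz and 625-line, 25 Hz frame structures are the standard values for 480i and 576i systems and are supplied here as assumptions, as is the function name.

```python
from fractions import Fraction


def samples_per_line(sample_rate_hz, total_lines, frame_rate_hz):
    """Total samples per scan line = sample rate / (lines x frame rate)."""
    line_rate = total_lines * Fraction(frame_rate_hz)
    return Fraction(sample_rate_hz) / line_rate


# 480i: 525 total lines at 30/1.001 Hz; 576i: 625 total lines at 25 Hz
y_480i = samples_per_line(13_500_000, 525, Fraction(30000, 1001))
y_576i = samples_per_line(13_500_000, 625, 25)
print(y_480i, y_576i)
```

Both results are whole numbers, which is what makes a line-locked, static orthogonal sampling grid possible at a single common 13.5 MHz rate for the two systems.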
Both the 480i and 576i systems use 720 samples during the active line period. Having a common number of samples for the active line period simplifies the design of multistandard equipment and standards conversion. With a sample rate of 6.75 MHz for Cb and Cr (4:2:2 sampling), each active line period contains 360 Cr samples and 360 Cb samples.

With analog systems, problems may arise with repeated processing, causing an extension of the blanking intervals and softening of the blanking edges. Using 720 digital samples for the active line period accommodates the range of analog blanking tolerances of both the 480i and 576i systems. Therefore, repeated processing may be done without affecting the digital blanking interval. Blanking to define the analog picture width need only be done once, preferably at the display or upon conversion to analog video.

Initially, BT.601 supported only 480i and 576i systems with a 4:3 aspect ratio (720 × 480i and 720 × 576i active resolutions). Support for a 16:9 aspect ratio was then added (960 × 480i and 960 × 576i active resolutions) using an 18 MHz sample rate.

EDTV Sample Rate Selection

ITU-R BT.1358 defines the progressive SDTV video signals, also known as 480p or 576p, or Enhanced Digital Television (EDTV). The sample rate is doubled to 27 MHz (4:3 aspect ratio) or 36 MHz (16:9 aspect ratio) in order to keep the same static orthogonal sampling grid as that used by BT.601.

HDTV Sample Rate Selection

ITU-R BT.709 defines the 720p, 1080i, and 1080p video signals. With HDTV, a different technique was used: the number of active samples per line and the number of active lines per frame are constant regardless of the frame rate. Thus, in order to keep a static orthogonal sampling grid, each frame rate uses a different sample clock rate.

480i and 480p Systems

Interlaced Analog Composite Video

(M) NTSC and (M) PAL are analog composite video signals that carry all timing and color information within a single signal.
These analog interfaces use 525 lines per frame and are discussed in detail in Chapter 8.

Interlaced Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr. Referred to as 480i (since there are typically 480 active scan lines per frame and they are interlaced), the frame rate is usually 29.97 Hz (30/1.001) for compatibility with (M) NTSC timing. The analog interface uses 525 lines per frame, with active video present on lines 23–262 and 286–525, as shown in Figure 4.1.

Figure 4.1. 480i Vertical Interval Timing.

Figure 4.2. 480p Vertical Interval Timing.

For the 29.97 Hz frame rate, each scan line time (H) is about 63.556 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

Progressive Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr. Referred to as 480p (since there are typically 480 active scan lines per frame and they are progressive), the frame rate is usually 59.94 Hz (60/1.001) for easier compatibility with (M) NTSC timing. The analog interface uses 525 lines per frame, with active video present on lines 45–524, as shown in Figure 4.2.

For the 59.94 Hz frame rate, each scan line time (H) is about 31.778 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

Interlaced Digital Component Video

BT.601 and SMPTE 267M specify the representation for 480i digital R´G´B´ or YCbCr video signals.
Active resolutions defined within BT.601 and SMPTE 267M, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  960 × 480i    18.0 MHz     29.97 Hz
  720 × 480i    13.5 MHz     29.97 Hz

Other common active resolutions, their 1× sample rates (Fs), and frame rates, are:

  864 × 480i    16.38 MHz    29.97 Hz
  704 × 480i    13.50 MHz    29.97 Hz
  640 × 480i    12.27 MHz    29.97 Hz
  544 × 480i    10.12 MHz    29.97 Hz
  528 × 480i    9.900 MHz    29.97 Hz
  480 × 480i    9.000 MHz    29.97 Hz
  352 × 480i    6.750 MHz    29.97 Hz

Figure 4.3. 480i Analog-Digital Relationship (4:3 Aspect Ratio, 29.97 Hz Frame Rate, 13.5 MHz Sample Clock). Front porch: 16 samples; digital blanking: 138 samples (0–137); digital active line: 720 samples (138–857); total line: 858 samples (0–857). BT.601 specifies 16 samples for the front porch; CEA-861D (DVI and HDMI timing) specifies 19 samples for the front porch.

Figure 4.4. 480i Analog-Digital Relationship (16:9 Aspect Ratio, 29.97 Hz Frame Rate, 18 MHz Sample Clock). Front porch: 21.5 samples; digital blanking: 184 samples (0–183); digital active line: 960 samples (184–1143); total line: 1144 samples (0–1143).

Figure 4.5. 480i Analog-Digital Relationship (4:3 Aspect Ratio, 29.97 Hz Frame Rate, 12.27 MHz Sample Clock). Front porch: 14.5 samples; digital blanking: 140 samples (0–139); digital active line: 640 samples (140–779); total line: 780 samples (0–779).

Figure 4.6. 480i Analog-Digital Relationship (4:3 Aspect Ratio, 29.97 Hz Frame Rate, 10.125 MHz Sample Clock). Front porch: 12 samples; digital blanking: 99 samples (0–98); digital active line: 544 samples (99–642); total line: 643 samples (0–642).

Figure 4.7. 480i Analog-Digital Relationship (4:3 Aspect Ratio, 29.97 Hz Frame Rate, 9 MHz Sample Clock). Front porch: 10.7 samples; digital blanking: 92 samples (0–91); digital active line: 480 samples (92–571); total line: 572 samples (0–571).
Figure 4.8. 480i Digital Vertical Timing (480 Active Lines). Field 1 (F = 0, odd) starts at line 4 and field 2 (F = 1, even) starts at line 266; H = 1 at EAV and H = 0 at SAV. F and V change state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.3 through 4.7. These active lines are used by the SMPTE RP-202, ATSC A/54a, and ARIB STD-B32 standards. CEA-861D (DVI and HDMI timing) specifies lines 22–261 and 285–524 for active video. IEC 61834-2, ITU-R BT.1618, and SMPTE 314M (DV formats) specify lines 23–262 and 285–524 for active video. ITU-R BT.656 specifies lines 20–263 and 283–525 for active video, resulting in 487 total active lines per frame.

  Line Number    F    V
  1–3            1    1
  4–22           0    1
  23–262         0    0
  263–265        0    1
  266–285        1    1
  286–525        1    0

864 × 480i is a 16:9 square pixel format, while 640 × 480i is a 4:3 square pixel format. Although the ideal 16:9 resolution is 854 × 480i, 864 × 480i supports the MPEG 16 × 16 block structure. The 704 × 480i format is done by using the 720 × 480i format, and blanking the first eight and last eight samples of each active scan line.

Example relationships between the analog and digital signals are shown in Figures 4.3 through 4.7. The H (horizontal blanking), V (vertical blanking), and F (field) signals are defined in Figure 4.8. The H, V, and F timing indicated is compatible with video compression standards rather than BT.656 discussed in Chapter 6.

Progressive Digital Component Video

BT.1358 and SMPTE 293M specify the representation for 480p digital R´G´B´ or YCbCr video signals.
Active resolutions defined within BT.1358 and SMPTE 293M, their 1× sample rates (Fs), and frame rates, are:

  960 × 480p    36.0 MHz     59.94 Hz
  720 × 480p    27.0 MHz     59.94 Hz

Other common active resolutions, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  864 × 480p    32.75 MHz    59.94 Hz
  704 × 480p    27.00 MHz    59.94 Hz
  640 × 480p    24.54 MHz    59.94 Hz
  544 × 480p    20.25 MHz    59.94 Hz
  528 × 480p    19.80 MHz    59.94 Hz
  480 × 480p    18.00 MHz    59.94 Hz
  352 × 480p    13.50 MHz    59.94 Hz

864 × 480p is a 16:9 square pixel format, while 640 × 480p is a 4:3 square pixel format. Although the ideal 16:9 resolution is 854 × 480p, 864 × 480p supports the MPEG 16 × 16 block structure. The 704 × 480p format is done by using the 720 × 480p format, and blanking the first eight and last eight samples of each active scan line.

Example relationships between the analog and digital signals are shown in Figures 4.9 through 4.12. The H (horizontal blanking), V (vertical blanking), and F (field) signals are defined in Figure 4.13. The H, V, and F timing indicated is compatible with video compression standards rather than BT.656 discussed in Chapter 6.

Figure 4.9. 480p Analog-Digital Relationship (4:3 Aspect Ratio, 59.94 Hz Frame Rate, 27 MHz Sample Clock). Front porch: 16 samples; digital blanking: 138 samples (0–137); digital active line: 720 samples (138–857); total line: 858 samples (0–857).

Figure 4.10. 480p Analog-Digital Relationship (16:9 Aspect Ratio, 59.94 Hz Frame Rate, 36 MHz Sample Clock). Front porch: 21.5 samples; digital blanking: 184 samples (0–183); digital active line: 960 samples (184–1143); total line: 1144 samples (0–1143).

Figure 4.11. 480p Analog-Digital Relationship (4:3 Aspect Ratio, 59.94 Hz Frame Rate, 24.54 MHz Sample Clock). Front porch: 14.5 samples; digital blanking: 140 samples (0–139); digital active line: 640 samples (140–779); total line: 780 samples (0–779).

Figure 4.12. 480p Analog-Digital Relationship (4:3 Aspect Ratio, 59.94 Hz Frame Rate, 20.25 MHz Sample Clock). Front porch: 12 samples; digital blanking: 99 samples (0–98); digital active line: 544 samples (99–642); total line: 643 samples (0–642).

Figure 4.13. 480p Digital Vertical Timing (480 Active Lines). H = 1 at EAV and H = 0 at SAV. V changes state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.9 through 4.12. These active lines are used by the SMPTE RP-202, ATSC A/54, and ARIB STD-B32 standards. However, CEA-861 (DVI and HDMI timing) specifies lines 43–522 for active video.

  Line Number    F    V
  1–44           0    1
  45–524         0    0
  525            0    1

SIF and QSIF

SIF is defined to have an active resolution of 352 × 240p. Square pixel SIF is defined to have an active resolution of 320 × 240p. QSIF is defined to have an active resolution of 176 × 120p. Square pixel QSIF is defined to have an active resolution of 160 × 120p.

576i and 576p Systems

Interlaced Analog Composite Video

(B, D, G, H, I, N, NC) PAL are analog composite video signals that carry all timing and color information within a single signal. These analog interfaces use 625 lines per frame and are discussed in detail in Chapter 8.

Interlaced Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr. Referred to as 576i (since there are typically 576 active scan lines per frame and they are interlaced), the frame rate is usually 25 Hz for compatibility with PAL timing. The analog interface uses 625 lines per frame, with active video present on lines 23–310 and 336–623, as shown in Figure 4.14.

For the 25 Hz frame rate, each scan line time (H) is 64 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

Figure 4.14. 576i Vertical Interval Timing.

Figure 4.15. 576p Vertical Interval Timing.

Progressive Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr. Referred to as 576p (since there are typically 576 active scan lines per frame and they are progressive), the frame rate is usually 50 Hz for compatibility with PAL timing. The analog interface uses 625 lines per frame, with active video present on lines 45–620, as shown in Figure 4.15.

For the 50 Hz frame rate, each scan line time (H) is 32 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

Interlaced Digital Component Video

BT.601 specifies the representation for 576i digital R´G´B´ or YCbCr video signals. Active resolutions defined within BT.601, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  960 × 576i     18.0 MHz     25 Hz
  720 × 576i     13.5 MHz     25 Hz

Other common active resolutions, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1024 × 576i    19.67 MHz    25 Hz
  768 × 576i     14.75 MHz    25 Hz
  704 × 576i     13.50 MHz    25 Hz
  544 × 576i     10.12 MHz    25 Hz
  480 × 576i     9.000 MHz    25 Hz

1024 × 576i is a 16:9 square pixel format, while 768 × 576i is a 4:3 square pixel format. The 704 × 576i format is done by using the 720 × 576i format, and blanking the first eight and last eight samples of each active scan line.

Example relationships between the analog and digital signals are shown in Figures 4.16 through 4.19. The H (horizontal blanking), V (vertical blanking), and F (field) signals are defined in Figure 4.20. The H, V, and F timing indicated is compatible with video compression standards rather than BT.656 discussed in Chapter 6.

Figure 4.16. 576i Analog-Digital Relationship (4:3 Aspect Ratio, 25 Hz Frame Rate, 13.5 MHz Sample Clock). Front porch: 12 samples; digital blanking: 144 samples (0–143); digital active line: 720 samples (144–863); total line: 864 samples (0–863).

Figure 4.17. 576i Analog-Digital Relationship (16:9 Aspect Ratio, 25 Hz Frame Rate, 18 MHz Sample Clock). Front porch: 16 samples; digital blanking: 192 samples (0–191); digital active line: 960 samples (192–1151); total line: 1152 samples (0–1151).

Figure 4.18. 576i Analog-Digital Relationship (4:3 Aspect Ratio, 25 Hz Frame Rate, 14.75 MHz Sample Clock). Front porch: 13 samples; digital blanking: 176 samples (0–175); digital active line: 768 samples (176–943); total line: 944 samples (0–943).

Figure 4.19. 576i Analog-Digital Relationship (4:3 Aspect Ratio, 25 Hz Frame Rate, 10.125 MHz Sample Clock). Front porch: 9 samples; digital blanking: 104 samples (0–103); digital active line: 544 samples (104–647); total line: 648 samples (0–647).

Figure 4.20. 576i Digital Vertical Timing (576 Active Lines). Field 1 (F = 0, even) starts at line 1 and field 2 (F = 1, odd) starts at line 313; H = 1 at EAV and H = 0 at SAV. F and V change state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.16 through 4.19. IEC 61834-2, ITU-R BT.1618, and SMPTE 314M (DV formats) specify lines 23–310 and 335–622 for active video.

  Line Number    F    V
  1–22           0    1
  23–310         0    0
  311–312        0    1
  313–335        1    1
  336–623        1    0
  624–625        1    1
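The F and V values of Figure 4.20 amount to a lookup by line number. A minimal sketch of that lookup follows, with the ranges transcribed from the figure; the table and function names are hypothetical and not part of any standard API.

```python
# (first_line, last_line, F, V) rows transcribed from Figure 4.20 (576i).
FV_TABLE_576I = [
    (1, 22, 0, 1),
    (23, 310, 0, 0),
    (311, 312, 0, 1),
    (313, 335, 1, 1),
    (336, 623, 1, 0),
    (624, 625, 1, 1),
]

def fv_flags_576i(line):
    """Return the (F, V) pair for a 576i digital line number (1-625)."""
    for first, last, f, v in FV_TABLE_576I:
        if first <= line <= last:
            return f, v
    raise ValueError(f"line {line} is outside 1-625")

# Lines with V = 0 carry active video; counting them recovers 576.
active_lines = sum(1 for n in range(1, 626) if fv_flags_576i(n)[1] == 0)
print(active_lines)  # 576
```

The same structure would serve for the 480i timing of Figure 4.8 by substituting that figure's line ranges.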
Progressive Digital Component Video

BT.1358 specifies the representation for 576p digital R´G´B´ or YCbCr signals. Active resolutions defined within BT.1358, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  960 × 576p     36.0 MHz     50 Hz
  720 × 576p     27.0 MHz     50 Hz

Other common active resolutions, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1024 × 576p    39.33 MHz    50 Hz
  768 × 576p     29.50 MHz    50 Hz
  704 × 576p     27.00 MHz    50 Hz
  544 × 576p     20.25 MHz    50 Hz
  480 × 576p     18.00 MHz    50 Hz

1024 × 576p is a 16:9 square pixel format, while 768 × 576p is a 4:3 square pixel format. The 704 × 576p format is done by using the 720 × 576p format, and blanking the first eight and last eight samples of each active scan line.

Example relationships between the analog and digital signals are shown in Figures 4.21 through 4.24.

Figure 4.21. 576p Analog-Digital Relationship (4:3 Aspect Ratio, 50 Hz Frame Rate, 27 MHz Sample Clock). Front porch: 12 samples; digital blanking: 144 samples (0–143); digital active line: 720 samples (144–863); total line: 864 samples (0–863).

Figure 4.22. 576p Analog-Digital Relationship (16:9 Aspect Ratio, 50 Hz Frame Rate, 36 MHz Sample Clock). Front porch: 16 samples; digital blanking: 192 samples (0–191); digital active line: 960 samples (192–1151); total line: 1152 samples (0–1151).

Figure 4.23. 576p Analog-Digital Relationship (4:3 Aspect Ratio, 50 Hz Frame Rate, 29.5 MHz Sample Clock). Front porch: 13 samples; digital blanking: 176 samples (0–175); digital active line: 768 samples (176–943); total line: 944 samples (0–943).

Figure 4.24. 576p Analog-Digital Relationship (4:3 Aspect Ratio, 50 Hz Frame Rate, 20.25 MHz Sample Clock). Front porch: 9 samples; digital blanking: 104 samples (0–103); digital active line: 544 samples (104–647); total line: 648 samples (0–647).
Figure 4.25. 576p Digital Vertical Timing (576 Active Lines). H = 1 at EAV and H = 0 at SAV. V changes state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.21 through 4.24.

  Line Number    F    V
  1–44           0    1
  45–620         0    0
  621–625        0    1

The H (horizontal blanking), V (vertical blanking), and F (field) signals are defined in Figure 4.25. The H, V, and F timing indicated is compatible with video compression standards rather than BT.656 discussed in Chapter 6.

720p Systems

Progressive Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr. Referred to as 720p (since there are typically 720 active scan lines per frame and they are progressive), the frame rate is usually 59.94 Hz (60/1.001) to simplify the generation of (M) NTSC video. The analog interface uses 750 lines per frame, with active video present on lines 26–745, as shown in Figure 4.26.

For the 59.94 Hz frame rate, each scan line time (H) is about 22.24 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

Progressive Digital Component Video

SMPTE 296M specifies the representation for 720p digital R´G´B´ or YCbCr signals. Active resolutions defined within SMPTE 296M, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1280 × 720p    74.176 MHz    23.976 Hz
  1280 × 720p    74.250 MHz    24.000 Hz
  1280 × 720p    74.250 MHz    25.000 Hz
  1280 × 720p    74.176 MHz    29.970 Hz
  1280 × 720p    74.250 MHz    30.000 Hz
  1280 × 720p    74.250 MHz    50.000 Hz
  1280 × 720p    74.176 MHz    59.940 Hz
  1280 × 720p    74.250 MHz    60.000 Hz

Note that square pixels and a 16:9 aspect ratio are used. Example relationships between the analog and digital signals are shown in Figures 4.27 and 4.28, and Table 4.1.
The H (horizontal blanking), V (vertical blanking), and F (field) signals are as defined in Figure 4.29.

Figure 4.26. 720p Vertical Interval Timing.

Figure 4.27. 720p Analog-Digital Relationship (16:9 Aspect Ratio, 59.94 Hz Frame Rate, 74.176 MHz Sample Clock and 60 Hz Frame Rate, 74.25 MHz Sample Clock). Front porch: 110 samples; digital blanking: 370 samples (0–369); digital active line: 1280 samples (370–1649); total line: 1650 samples (0–1649).

Figure 4.28. General 720p Analog-Digital Relationship. Front porch: [C] samples; digital blanking: [B] samples; digital active line: 1280 samples; total line: [A] samples.

Table 4.1. Various 720p Analog-Digital Parameters for Figure 4.28. All rows use 1280 active horizontal samples.

  Frame Rate    1× Y Sample Rate    Total Horizontal    Horizontal Blanking    C
  (Hz)          (MHz)               Samples (A)         Samples (B)            Samples
  24/1.001      74.25/1.001         4125                2845                   2585
  24            74.25               4125                2845                   2585
  25 (1)        48                  1536                256                    21
  25 (1)        49.5                1584                304                    25
  25            74.25               3960                2680                   2420
  30/1.001      74.25/1.001         3300                2020                   1760
  30            74.25               3300                2020                   1760
  50            74.25               1980                700                    440
  60/1.001      74.25/1.001         1650                370                    110
  60            74.25               1650                370                    110

Note: 1. Useful for CRT-based 50 Hz HDTVs based on a 31.250 kHz horizontal frequency. Sync pulses are –300 mV bi-level, rather than ±300 mV tri-level. 720p content scaled vertically to 1152i active scan lines; 1250i total scan lines instead of 750p.

Figure 4.29. 720p Digital Vertical Timing (720 Active Lines). H = 1 at EAV and H = 0 at SAV. V changes state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.27 and 4.28.

  Line Number    F    V
  1–25           0    1
  26–745         0    0
  746–750        0    1
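The 750-line rows of Table 4.1 can be reproduced from the sample rate and frame rate alone: the scan line time H is 1 / (total lines × frame rate), the total samples per line (A) is the sample rate multiplied by H, and the blanking samples (B) are A minus the 1280 active samples. The sketch below illustrates this under those assumptions; it skips the footnoted 25 Hz rows, which use 1250 total lines, and the names are illustrative.

```python
from fractions import Fraction

TOTAL_LINES = 750      # 720p total lines per frame
ACTIVE_SAMPLES = 1280  # active samples per line

def line_timing(sample_rate_hz, frame_rate_hz):
    """Return (line time in seconds, total samples A, blanking samples B)."""
    line_time_s = Fraction(1) / (TOTAL_LINES * Fraction(frame_rate_hz))
    total_samples = Fraction(sample_rate_hz) * line_time_s
    return line_time_s, total_samples, total_samples - ACTIVE_SAMPLES

FS = Fraction(74_250_000)  # 74.25 MHz
cases = {
    "24": (FS, 24),
    "25": (FS, 25),
    "50": (FS, 50),
    "60/1.001": (FS * 1000 / 1001, Fraction(60000, 1001)),
    "60": (FS, 60),
}
for label, (fs, rate) in cases.items():
    h, a, b = line_timing(fs, rate)
    print(f"{label:>9} Hz: H = {float(h) * 1e6:.2f} us, A = {a}, B = {b}")
```

Because the 1/1.001 factor is applied to both the sample rate and the frame rate, the 59.94 Hz and 60 Hz cases share the same sample counts and differ only in line time.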
1080i and 1080p Systems

Interlaced Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr. Referred to as 1080i (since there are typically 1080 active scan lines per frame and they are interlaced), the frame rate is usually 25 or 29.97 Hz (30/1.001) to simplify the generation of (B, D, G, H, I) PAL or (M) NTSC video. The analog interface uses 1125 lines per frame, with active video present on lines 21–560 and 584–1123, as shown in Figure 4.30.

MPEG-2 and MPEG-4 systems use 1088 lines, rather than 1080, in order to have a multiple of 32 scan lines per frame. In this case, an additional 4 lines per field after the active video are used.

For the 25 Hz frame rate, each scan line time is about 35.56 μs. For the 29.97 Hz frame rate, each scan line time is about 29.66 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

1152i Format

The 1152i active (1250 total) line format is not a broadcast transmission format. However, it is being used as an analog interconnection standard from HD set-top boxes and DVD players to 50 Hz CRT-based HDTVs. This enables 50 Hz HDTVs to use a fixed 31.25 kHz horizontal frequency, reducing their cost. Other HDTV display technologies, such as DLP, LCD, and plasma, are capable of handling the native timing of 720p50 (750p50 with VBI) and 1080i25 (1125i25 with VBI) analog signals.

The set-top box or DVD player converts 720p50 and 1080i25 content to the 1152i25 format. 1280 × 720p50 content is scaled to 1280 × 1152i25; 1920 × 1080i25 content is presented letterboxed in a 1920 × 1152i25 format. HDTVs will have a nominal vertical zoom mode for correcting the geometry of 1080i25, which can be recognized by the vertical synchronizing signal.

Progressive Analog Component Video

Analog component signals consist of three signals, analog R´G´B´ or YPbPr.
Referred to as 1080p (since there are typically 1080 active scan lines per frame and they are progressive), the frame rate is usually 50 or 59.94 Hz (60/1.001) to simplify the generation of (B, D, G, H, I) PAL or (M) NTSC video. The analog interface uses 1125 lines per frame, with active video present on lines 42–1121, as shown in Figure 4.31.

MPEG-2 and MPEG-4 systems use 1088 lines, rather than 1080, in order to have a multiple of 16 scan lines per frame. In this case, an additional 8 lines per frame after the active video are used.

For the 50 Hz frame rate, each scan line time is about 17.78 μs. For the 59.94 Hz frame rate, each scan line time is about 14.83 μs. Detailed horizontal timing is dependent on the specific video interface used, as discussed in Chapter 5.

Figure 4.30. 1080i Vertical Interval Timing.

Figure 4.31. 1080p Vertical Interval Timing.

Figure 4.32. 1080i Analog-Digital Relationship (16:9 Aspect Ratio, 29.97 Hz Frame Rate, 74.176 MHz Sample Clock and 30 Hz Frame Rate, 74.25 MHz Sample Clock). Front porch: 88 samples; digital blanking: 280 samples (0–279); digital active line: 1920 samples (280–2199); total line: 2200 samples (0–2199).

Figure 4.33. General 1080i Analog-Digital Relationship. Front porch: [D] samples; digital blanking: [C] samples; digital active line: [A] samples; total line: [B] samples.
Table 4.2. Various 1080i Analog-Digital Parameters for Figure 4.33.

  Active Horizontal    Frame Rate    1× Y Sample Rate    Total Horizontal    Horizontal Blanking    D
  Samples (A)          (Hz)          (MHz)               Samples (B)         Samples (C)            Samples
  1920                 25 (1)        72                  2304                384                    32
  1920                 25 (1)        74.25               2376                456                    38
  1920                 25            74.25               2640                720                    528
  1920                 30/1.001      74.25/1.001         2200                280                    88
  1920                 30            74.25               2200                280                    88
  1440                 25 (1)        54                  1728                288                    24
  1440                 25            55.6875             1980                540                    396
  1440                 30/1.001      55.6875/1.001       1650                210                    66
  1440                 30            55.6875             1650                210                    66
  1280                 25 (1)        48                  1536                256                    21
  1280                 25            49.5                1760                480                    352
  1280                 30/1.001      49.5/1.001          1466.7              186.7                  58.7
  1280                 30            49.5                1466.7              186.7                  58.7

Notes: 1. Useful for CRT-based 50 Hz HDTVs based on a 31.250 kHz horizontal frequency. Sync pulses are –300 mV bi-level, rather than ±300 mV tri-level. 1080i content letterboxed in 1152i active scan lines; 1250i total scan lines instead of 1125i.

Interlaced Digital Component Video

ITU-R BT.709 and SMPTE 274M specify the digital component format for the 1080i digital R´G´B´ or YCbCr signal. Active resolutions defined within BT.709 and SMPTE 274M, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1920 × 1080i    74.250 MHz    25.00 Hz
  1920 × 1080i    74.176 MHz    29.97 Hz
  1920 × 1080i    74.250 MHz    30.00 Hz

Note that square pixels and a 16:9 aspect ratio are used. Other common active resolutions, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1280 × 1080i    49.500 MHz    25.00 Hz
  1280 × 1080i    49.451 MHz    29.97 Hz
  1280 × 1080i    49.500 MHz    30.00 Hz
  1440 × 1080i    55.688 MHz    25.00 Hz
  1440 × 1080i    55.632 MHz    29.97 Hz
  1440 × 1080i    55.688 MHz    30.00 Hz

Example relationships between the analog and digital signals are shown in Figures 4.32 and 4.33, and Table 4.2. The H (horizontal blanking) and V (vertical blanking) signals are as defined in Figure 4.34.
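The 1440- and 1280-sample rows of Table 4.2 appear to be the 1920-sample case with the sample clock and the line's sample counts scaled by the ratio of active widths. The sketch below checks that reading against the 30 Hz rows; it is an observation about the table values rather than a rule stated by the standards, and the names are illustrative.

```python
from fractions import Fraction

FULL_WIDTH = 1920                 # active samples of the reference format
FULL_RATE_MHZ = Fraction(297, 4)  # 74.25 MHz sample clock at 30 Hz
FULL_TOTAL = 2200                 # total samples per line at 30 Hz

def scaled_format(active_width):
    """Scale the sample clock and total samples per line by the width ratio."""
    ratio = Fraction(active_width, FULL_WIDTH)
    return FULL_RATE_MHZ * ratio, FULL_TOTAL * ratio

for width in (1920, 1440, 1280):
    rate_mhz, total = scaled_format(width)
    print(f"{width}: {float(rate_mhz):.4f} MHz, {float(total):.1f} samples per line")
```

The 1280-sample case does not yield a whole number of samples per line, which matches the fractional 1466.7 entry in the table.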
Figure 4.34. 1080i Digital Vertical Timing (1080 Active Lines). Field 1 (F = 0, odd) starts at line 1 and field 2 (F = 1, even) starts at line 583; H = 1 at EAV and H = 0 at SAV. F and V change state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.32 and 4.33.

  Line Number    F    V
  1–20           0    1
  21–560         0    0
  561–562        0    1
  563–583        1    1
  584–1123       1    0
  1124–1125      1    1

Progressive Digital Component Video

ITU-R BT.709 and SMPTE 274M specify the digital component format for the 1080p digital R´G´B´ or YCbCr signal. Active resolutions defined within BT.709 and SMPTE 274M, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1920 × 1080p    74.176 MHz    23.976 Hz
  1920 × 1080p    74.250 MHz    24.000 Hz
  1920 × 1080p    74.250 MHz    25.000 Hz
  1920 × 1080p    74.176 MHz    29.970 Hz
  1920 × 1080p    74.250 MHz    30.000 Hz
  1920 × 1080p    148.50 MHz    50.000 Hz
  1920 × 1080p    148.35 MHz    59.940 Hz
  1920 × 1080p    148.50 MHz    60.000 Hz

Note that square pixels and a 16:9 aspect ratio are used.
Other common active resolutions, their 1× Y and R´G´B´ sample rates (Fs), and frame rates, are:

  1280 × 1080p    49.451 MHz    23.976 Hz
  1280 × 1080p    49.500 MHz    24.000 Hz
  1280 × 1080p    49.500 MHz    25.000 Hz
  1280 × 1080p    49.451 MHz    29.970 Hz
  1280 × 1080p    49.500 MHz    30.000 Hz
  1280 × 1080p    99.000 MHz    50.000 Hz
  1280 × 1080p    98.901 MHz    59.940 Hz
  1280 × 1080p    99.000 MHz    60.000 Hz
  1440 × 1080p    55.632 MHz    23.976 Hz
  1440 × 1080p    55.688 MHz    24.000 Hz
  1440 × 1080p    55.688 MHz    25.000 Hz
  1440 × 1080p    55.632 MHz    29.970 Hz
  1440 × 1080p    55.688 MHz    30.000 Hz
  1440 × 1080p    111.38 MHz    50.000 Hz
  1440 × 1080p    111.26 MHz    59.940 Hz
  1440 × 1080p    111.38 MHz    60.000 Hz

Example relationships between the analog and digital signals are shown in Figures 4.35 and 4.36, and Table 4.3. The H (horizontal blanking), V (vertical blanking), and F (field) signals are as defined in Figure 4.37.

Figure 4.35. 1080p Analog-Digital Relationship (16:9 Aspect Ratio, 59.94 Hz Frame Rate, 148.35 MHz Sample Clock and 60 Hz Frame Rate, 148.5 MHz Sample Clock). Front porch: 88 samples; digital blanking: 280 samples (0–279); digital active line: 1920 samples (280–2199); total line: 2200 samples (0–2199).

Figure 4.36. General 1080p Analog-Digital Relationship. Front porch: [D] samples; digital blanking: [C] samples; digital active line: [A] samples; total line: [B] samples.

Table 4.3. Various 1080p Analog-Digital Parameters for Figure 4.36.

  Active Horizontal    Frame Rate    1× Y Sample Rate    Total Horizontal    Horizontal Blanking    D
  Samples (A)          (Hz)          (MHz)               Samples (B)         Samples (C)            Samples
  1920                 24/1.001      74.25/1.001         2750                830                    638
  1920                 24            74.25               2750                830                    638
  1920                 25            74.25               2640                720                    528
  1920                 30/1.001      74.25/1.001         2200                280                    88
  1920                 30            74.25               2200                280                    88
  1920                 50            148.5               2640                720                    528
  1920                 60/1.001      148.5/1.001         2200                280                    88
  1920                 60            148.5               2200                280                    88
  1440                 24/1.001      55.6875/1.001       2062.5              622.5                  478.5
  1440                 24            55.6875             2062.5              622.5                  478.5
  1440                 25            55.6875             1980                540                    396
  1440                 30/1.001      55.6875/1.001       1650                210                    66
  1440                 30            55.6875             1650                210                    66
  1440                 50            111.375             1980                540                    396
  1440                 60/1.001      111.375/1.001       1650                210                    66
  1440                 60            111.375             1650                210                    66
  1280                 24/1.001      49.5/1.001          1833.3              553.3                  425.3
  1280                 24            49.5                1833.3              553.3                  425.3
  1280                 25            49.5                1760                480                    352
  1280                 30/1.001      49.5/1.001          1466.7              186.7                  58.7
  1280                 30            49.5                1466.7              186.7                  58.7
  1280                 50            99                  1760                480                    352
  1280                 60/1.001      99/1.001            1466.7              186.7                  58.7
  1280                 60            99                  1466.7              186.7                  58.7

Figure 4.37. 1080p Digital Vertical Timing (1080 Active Lines). H = 1 at EAV and H = 0 at SAV. V changes state at the EAV sequence at the beginning of the digital line. Note that the digital line number changes state prior to the start of horizontal sync, as shown in Figures 4.35 and 4.36.

  Line Number    F    V
  1–41           0    1
  42–1121        0    0
  1122–1125      0    1

Other Video Systems

Some consumer displays, such as those based on LCD and plasma technologies, have adopted other resolutions as their native resolution. Common active resolutions and their names are:

  640 × 400      VGA
  640 × 480      VGA
  854 × 480      WVGA
  800 × 600      SVGA
  1024 × 768     XGA
  1280 × 768     WXGA
  1366 × 768     WXGA
  1024 × 1024    XGA
  1280 × 1024    SXGA
  1600 × 1024    WSXGA
  1600 × 1200    UXGA
  1920 × 1200    WUXGA

These resolutions, and their timings, are defined for computer monitors by the Video Electronics Standards Association (VESA). Displays based on one of these native resolutions are usually capable of accepting many input resolutions, scaling the source to match the display resolution.

References

1. CEA-861D, A DTV Profile for Uncompressed High Speed Digital Interfaces, July 2006.
2. EIA-770.1, Analog 525-Line Component Video Interface - Three Channels, November 2001.
3. EIA-770.2, Standard-Definition TV Analog Component Video Interface, November 2001.
4. EIA-770.3, High-Definition TV Analog Component Video Interface, November 2001.
5. ITU-R BT.601–5, 1995, Studio Encoding Parameters of Digital Television for Standard 4:3 and Widescreen 16:9 Aspect Ratios.
6. ITU-R BT.709–5, 2002, Parameter Values for the HDTV Standards for Production and International Programme Exchange.
7. ITU-R BT.1358, 1998, Studio Parameters of 625 and 525 Line Progressive Scan Television Systems.
8. SMPTE 267M–1995, Television - Bit-Parallel Digital Interface - Component Video Signal 4:2:2 16 × 9 Aspect Ratio.
9. SMPTE 274M–2005, Television - 1920 × 1080 Image Sample Structure, Digital Representation and Digital Timing Reference Sequences for Multiple Picture Rates.
10. SMPTE 293M–2003, Television - 720 × 483 Active Line at 59.94 Hz Progressive Scan Production - Digital Representation.
11. SMPTE 296M–2001, Television - 1280 × 720 Progressive Image Sample Structure, Analog and Digital Representation and Analog Interface.

Chapter 5: Analog Video Interfaces

For years, the primary video signal used by the consumer market has been composite NTSC or PAL video (see Figures 8.2 and 8.13). Attempts have been made to support S-video, but, until recently, it has been largely limited to S-VHS VCRs and high-end televisions.

With the introduction of DVD players, digital set-top boxes, and DTV, there has been renewed interest in providing high-quality video to the consumer market. This equipment not only supports very high-quality composite and S-video signals, but many devices also allow the option of using analog R´G´B´ or YPbPr video.

Using analog R´G´B´ or YPbPr video eliminates NTSC/PAL encoding and decoding artifacts. As a result, the picture is sharper and has less noise. More color bandwidth is also available, increasing the horizontal detail.

S-Video Interface

The RCA phono connector (consumer market) or BNC connector (pro-video market) transfers a composite NTSC or PAL video signal, made by adding the intensity (Y) and color (C) video signals together. The television then has to separate these Y and C video signals in order to display the picture.
The problem is that the Y/C separation process is never perfect, as discussed in Chapter 9.

Many video components now support a 4-pin “S1” S-video connector, illustrated in Figure 5.1 (the female connector viewpoint). This connector keeps the intensity (Y) and color (C) video signals separate, eliminating the Y/C separation process in the TV. As a result, the picture is sharper and has less noise. Figures 9.2 and 9.3 illustrate the Y signal, and Figures 9.10 and 9.11 illustrate the C signal.

NTSC and PAL VBI (vertical blanking interval) data, discussed in Chapter 8, may be present on the 480i or 576i Y video signal.

The “S2” version adds a +5V DC offset to the C signal when a widescreen (16:9) anamorphic program (horizontally squeezed by 25%) is present. A 16:9 TV detects the DC offset and horizontally expands the 4:3 image to fill the screen, restoring the correct aspect ratio of the program. The “S3” version also supports using a +2.3V offset when a program is letterboxed.

The IEC 60933-5 standard specifies the S-video connector, including signal levels.

Extended S-Video Interface

The PC market also uses an extended S-Video interface. This interface has 7 pins, as shown in Figure 5.1, and is backwards compatible with the 4-pin interface.

The use of the three additional pins varies by manufacturer. They may be used to support an I2C interface (SDA bi-directional data pin and SCL clock pin), +12V power, a composite NTSC/PAL video signal (CVBS), or analog R´G´B´ or YPbPr video signals.

SCART Interface

Most consumer video components in Europe support one or two 21-pin SCART connectors (also known as Peritel, Peritelevision, and Euroconnector). This connection allows analog R´G´B´ video or S-video, composite video, and analog stereo audio to be transmitted between equipment using a single cable. The composite video signal must always be present, as it provides the basic video timing for the analog R´G´B´ video signals.
Note that the 700 mV R´G´B´ signals do not have a blanking pedestal or sync information, as illustrated in Figure 5.4.

PAL VBI (vertical blanking interval) data, discussed in Chapter 8, may be present on the 576i composite video signal.

There are now several types of SCART pinouts, depending on the specific functions implemented, as shown in Tables 5.1 through 5.3. Pinout details are shown in Figure 5.2.

The CENELEC EN 50049–1 and IEC 60933 standards specify the basic SCART connector, including signal levels.

[Figure 5.1 pin assignments]
7-pin mini-DIN connector: 1, 2 = GND; 3 = Y; 4 = C; 5 = CVBS / SCL (serial clock); 6 = GND / SDA (serial data); 7 = NC / +12V
4-pin mini-DIN connector: 1, 2 = GND; 3 = Y; 4 = C

Figure 5.1. S-Video Connector and Signal Names.

[Figure 5.2 shows the 21-pin connector, with odd-numbered pins 1–21 in one row and even-numbered pins 2–20 in the other.]

Figure 5.2. SCART Connector.

Pin   Function                              Signal Level               Impedance
1     right audio out                       0.5V rms                   < 1K ohm
2     right audio in                        0.5V rms                   > 10K ohm
3     left / mono audio out                 0.5V rms                   < 1K ohm
4     ground - for pins 1, 2, 3, 6
5     ground - for pin 7
6     left / mono audio in                  0.5V rms                   > 10K ohm
7     blue (or C) video in / out            0.7V (or 0.3V burst)       75 ohms
8     status and aspect ratio in / out      9.5V–12V = 4:3 source      > 10K ohm
                                            4.5V–7V = 16:9 source
                                            0V–2V = inactive source
9     ground - for pin 11
10    data 2
11    green video in / out                  0.7V                       75 ohms
12    data 1
13    ground - for pin 15
14    ground - for pin 16
15    red (or C) video in / out             0.7V (or 0.3V burst)       75 ohms
16    RGB control in / out                  1–3V = RGB                 75 ohms
                                            0–0.4V = composite
17    ground - for pin 19
18    ground - for pin 20
19    composite (or Y) video out            1V                         75 ohms
20    composite (or Y) video in             1V                         75 ohms
21    ground - for pins 8, 10, 12, shield

Note: Often, the SCART 1 connector supports composite video and RGB, the SCART 2 connector supports composite video and S-Video, and the SCART 3 connector supports only composite video.
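The pin 8 and pin 16 voltage ranges listed in Table 5.1 can be read as a small decoder. The following Python sketch is only an illustration: the function name and the "undefined" result for voltages that fall between the listed ranges are assumptions, not part of the SCART specification text quoted here.

```python
def decode_scart_status(pin8_volts, pin16_volts):
    """Classify SCART pin 8 and pin 16 levels using the Table 5.1 ranges.

    Hypothetical helper: voltages outside the listed ranges are reported
    as "undefined" because the table does not say how to treat them.
    """
    # Pin 8: status and aspect ratio.
    if 9.5 <= pin8_volts <= 12.0:
        source = "4:3"
    elif 4.5 <= pin8_volts <= 7.0:
        source = "16:9"
    elif 0.0 <= pin8_volts <= 2.0:
        source = "inactive"
    else:
        source = "undefined"

    # Pin 16: RGB control.
    if 1.0 <= pin16_volts <= 3.0:
        video = "RGB"
    elif 0.0 <= pin16_volts <= 0.4:
        video = "composite"
    else:
        video = "undefined"

    return source, video


if __name__ == "__main__":
    print(decode_scart_status(12.0, 0.0))   # 4:3 source, composite video
    print(decode_scart_status(6.0, 2.0))    # 16:9 source, RGB video
```

A real receiver would probably add hysteresis around the thresholds; that is left out to keep the mapping to the table obvious.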
Table 5.1. SCART Connector Signals.

SCART connections may also be used to add external decoders or descramblers to the video path; the video signal goes out and comes back in.

The RGB control signal controls the TV switch between the composite and RGB inputs, enabling the overlaying of text onto the video, even the internal TV program. This enables an external teletext or closed captioning decoder to add information over the current program.

If pin 16 is held high, signifying RGB signals are present, the sync is still carried on the Composite Video pin. Some devices (such as DVD players) may provide RGB on a SCART and hold pin 16 permanently high.

When a source becomes active, it sets a 12V level on pin 8. This causes the TV to automatically switch to that SCART input. When the source stops, the signal returns to 0V and TV viewing is resumed. If an anamorphic 16:9 program is present, the source raises the signal on pin 8 to only 6V. This causes the TV to switch to that SCART input and at the same time enable the video processing for anamorphic 16:9 programs.

SDTV RGB Interface

Some SDTV consumer video equipment supports an analog R´G´B´ video interface. NTSC and PAL VBI (vertical blanking interval) data, discussed in Chapter 8, may be present on 480i or 576i R´G´B´ video signals. Three separate RCA phono connectors (consumer market) or BNC connectors (pro-video and PC market) are used.

The horizontal and vertical video timing are dependent on the video standard, as discussed in Chapter 4. For sources, the video signal at the connector should have a source impedance of 75Ω ±5%. For receivers, video inputs should be AC-coupled and have a 75-Ω ±5% input impedance. The three signals must be coincident with respect to each other within ±5 ns.

Sync information may be present on just the green channel, all three channels, as a separate composite sync signal, or as separate horizontal and vertical sync signals. A gamma of 1/0.45 is used.
7.5 IRE Blanking Pedestal

As shown in Figure 5.3, the nominal active video amplitude is 714 mV, including a 7.5 ±2 IRE blanking pedestal. A 286 ±6 mV composite sync signal may be present on just the green channel (consumer market), or all three channels (pro-video market). DC offsets up to ±1V may be present.

Analog R´G´B´ Generation

Assuming 10-bit D/A converters (DACs) with an output range of 0–1.305V (to match the video DACs used by the NTSC/PAL encoder in Chapter 9), the 10-bit YCbCr to R´G´B´ equations are:

R´ = 0.591(Y – 64) + 0.810(Cr – 512)
G´ = 0.591(Y – 64) – 0.413(Cr – 512) – 0.199(Cb – 512)
B´ = 0.591(Y – 64) + 1.025(Cb – 512)

R´G´B´ has a nominal 10-bit range of 0–518 to match the active video levels used by the NTSC/PAL encoder in Chapter 9. Note that negative values of R´G´B´ should be supported at this point.

To implement the 7.5 IRE blanking pedestal, a value of 42 is added to the digital R´G´B´ data during active video. 0 is added during the blanking time.

After the blanking pedestal is added, the R´G´B´ data is clamped by a blanking signal that has a raised cosine distribution to slow the slew rate of the start and end of the video signal. For 480i and 576i systems, blank rise and fall times are 140 ±20 ns. For 480p and 576p systems, blank rise and fall times are 70 ±10 ns.

Composite sync information may be added to the R´G´B´ data after the blank processing has been performed. Values of 16 (sync present) or 240 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution (between 16 and 240) to slow the slew rate of the sync signal.

For 480i and 576i systems, sync rise and fall times are 140 ±20 ns, and horizontal sync width at the 50% point is 4.7 ±0.1 μs. For 480p and 576p systems, sync rise and fall times are 70 ±10 ns, and horizontal sync width at the 50% point is 2.33 ±0.05 μs.

At this point, we have digital R´G´B´ with sync and blanking information, as shown in Figure 5.3 and Table 5.2.
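The generation steps above (matrix, then pedestal, then blank level) can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name is made up, rounding to the nearest integer is an assumption, and the constant 240 offset is inferred from the blank level in Table 5.2 rather than stated as a separate step in the text. Raised-cosine edge shaping and sync insertion are deliberately left out.

```python
BLANK_LEVEL = 240   # 10-bit DAC code for blanking (Table 5.2, 7.5 IRE column)
PEDESTAL = 42       # 7.5 IRE pedestal, added only during active video


def ycbcr_to_rgb_dac(y, cb, cr, active=True):
    """Map one 10-bit YCbCr sample to (R', G', B') DAC codes.

    Hypothetical helper for the 7.5 IRE pedestal case. During blanking
    the output is simply the blank level on all three channels.
    """
    if not active:
        return (BLANK_LEVEL, BLANK_LEVEL, BLANK_LEVEL)

    # YCbCr to R'G'B' equations from the text (nominal range 0-518).
    r = 0.591 * (y - 64) + 0.810 * (cr - 512)
    g = 0.591 * (y - 64) - 0.413 * (cr - 512) - 0.199 * (cb - 512)
    b = 0.591 * (y - 64) + 1.025 * (cb - 512)

    # Negative intermediate values are allowed, so nothing is clipped here.
    return tuple(BLANK_LEVEL + PEDESTAL + round(v) for v in (r, g, b))


if __name__ == "__main__":
    print(ycbcr_to_rgb_dac(940, 512, 512))                # white
    print(ycbcr_to_rgb_dac(64, 512, 512))                 # black
    print(ycbcr_to_rgb_dac(64, 512, 512, active=False))   # blanking
```

With these assumptions, white and black land on the 800 and 282 codes listed in Table 5.2, which is a useful sanity check on the offsets.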
The numbers in parentheses in Figure 5.3 indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V. The digital R´G´B´ data drives three 10-bit DACs to generate the analog R´G´B´ video signals.

As the sample-and-hold action of the DAC introduces a (sin x)/x characteristic, the video data may be digitally filtered by a [(sin x)/x]^–1 filter to compensate. Alternately, as an analog lowpass filter is usually present after each DAC, the correction may take place in the analog filter.

[Figure 5.3 levels for a green, blue, or red channel, with 10-bit DAC codes in parentheses: white 1.020 V (800), black 0.357 V (282), blank 0.306 V (240), sync 0.020 V (16); 100 IRE white, 7.5 IRE pedestal, 40 IRE sync. Shown with and without sync present.]

Figure 5.3. SDTV Analog RGB Levels. 7.5 IRE blanking level.

[Figure 5.4 levels for a green, blue, or red channel, with 10-bit DAC codes in parentheses: white 1.020 V (800), black / blank 0.321 V (252), sync 0.020 V (16); 100 IRE white, 43 IRE sync. Shown with and without sync present.]

Figure 5.4. SDTV Analog RGB Levels. 0 IRE blanking level.

Video Level   7.5 IRE Blanking Pedestal   0 IRE Blanking Pedestal
white         800                         800
black         282                         252
blank         240                         252
sync          16                          16

Table 5.2. SDTV 10-Bit R´G´B´ Values.

Analog R´G´B´ Digitization

Assuming 10-bit A/D converters (ADCs) with an input range of 0–1.305V (to match the video ADCs used by the NTSC/PAL decoder in Chapter 9), the 10-bit R´G´B´ to YCbCr equations are:

Y = 0.506(R´ – 282) + 0.992(G´ – 282) + 0.193(B´ – 282) + 64
Cb = –0.291(R´ – 282) – 0.573(G´ – 282) + 0.864(B´ – 282) + 512
Cr = 0.864(R´ – 282) – 0.724(G´ – 282) – 0.140(B´ – 282) + 512

R´G´B´ has a nominal 10-bit range of 282–800 to match the active video levels used by the NTSC/PAL decoder in Chapter 9.
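As a quick check of the digitization equations for the 7.5 IRE case, the sketch below applies them to ADC codes directly. The function name and the rounding to the nearest integer are assumptions made for illustration; a real decoder would also clamp the inputs and remove sync before this step.

```python
BLACK_LEVEL = 282   # 10-bit ADC code for black with a 7.5 IRE pedestal


def rgb_adc_to_ycbcr(r, g, b):
    """Convert 10-bit R'G'B' ADC codes (282-800) to 10-bit YCbCr.

    Hypothetical helper that applies the 7.5 IRE equations from the text.
    """
    r, g, b = r - BLACK_LEVEL, g - BLACK_LEVEL, b - BLACK_LEVEL
    y = 0.506 * r + 0.992 * g + 0.193 * b + 64
    cb = -0.291 * r - 0.573 * g + 0.864 * b + 512
    cr = 0.864 * r - 0.724 * g - 0.140 * b + 512
    return round(y), round(cb), round(cr)


if __name__ == "__main__":
    print(rgb_adc_to_ycbcr(800, 800, 800))   # white
    print(rgb_adc_to_ycbcr(282, 282, 282))   # black
    print(rgb_adc_to_ycbcr(800, 282, 282))   # 100% red
```

Under these assumptions, 100% red comes out as Y = 326, Cb = 361, Cr = 960, which agrees with the red column of the SDTV 100% color bar values in Table 5.4.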
Table 5.2 and Figure 5.3 illustrate the 10-bit R´G´B´ values for the white, black, blank, and (optional) sync levels.

0 IRE Blanking Pedestal

As shown in Figure 5.4, the nominal active video amplitude is 700 mV, with no blanking pedestal. A 300 ±6 mV composite sync signal may be present on just the green channel (consumer market), or all three channels (pro-video market). DC offsets up to ±1V may be present.

Analog R´G´B´ Generation

Assuming 10-bit DACs with an output range of 0–1.305V (to match the video DACs used by the NTSC/PAL encoder in Chapter 9), the 10-bit YCbCr to R´G´B´ equations are:

R´ = 0.625(Y – 64) + 0.857(Cr – 512)
G´ = 0.625(Y – 64) – 0.437(Cr – 512) – 0.210(Cb – 512)
B´ = 0.625(Y – 64) + 1.084(Cb – 512)

R´G´B´ has a nominal 10-bit range of 0–548 to match the active video levels used by the NTSC/PAL encoder in Chapter 9. Note that negative values of R´G´B´ should be supported at this point.

The R´G´B´ data is processed as discussed when using a 7.5 IRE blanking pedestal. However, no blanking pedestal is added during active video, and the sync values are 16–252 instead of 16–240.

At this point, we have digital R´G´B´ with sync and blanking information, as shown in Figure 5.4 and Table 5.2. The numbers in parentheses in Figure 5.4 indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V. The digital R´G´B´ data drives three 10-bit DACs to generate the analog R´G´B´ video signals.

Analog R´G´B´ Digitization

Assuming 10-bit ADCs with an input range of 0–1.305V (to match the video ADCs used by the NTSC/PAL decoder in Chapter 9), the 10-bit R´G´B´ to YCbCr equations are:

Y = 0.478(R´ – 252) + 0.938(G´ – 252) + 0.182(B´ – 252) + 64
Cb = –0.275(R´ – 252) – 0.542(G´ – 252) + 0.817(B´ – 252) + 512
Cr = 0.817(R´ – 252) – 0.685(G´ – 252) – 0.132(B´ – 252) + 512

R´G´B´ has a nominal 10-bit range of 252–800 to match the active video levels used by the NTSC/PAL decoder in Chapter 9.
Table 5.2 and Figure 5.4 illustrate the 10-bit R´G´B´ values for the white, black, blank, and (optional) sync levels.

HDTV RGB Interface

Some HDTV consumer video equipment supports an analog R´G´B´ video interface. Three separate RCA phono connectors (consumer market) or BNC connectors (pro-video and PC market) are used.

The horizontal and vertical video timing are dependent on the video standard, as discussed in Chapter 4. For sources, the video signal at the connector should have a source impedance of 75Ω ±5%. For receivers, video inputs should be AC-coupled and have a 75-Ω ±5% input impedance. The three signals must be coincident with respect to each other within ±5 ns.

Sync information may be present on just the green channel, all three channels, as a separate composite sync signal, or as separate horizontal and vertical sync signals. A gamma of 1/0.45 is used.

As shown in Figure 5.5, the nominal active video amplitude is 700 mV, and has no blanking pedestal. A ±300 ±6 mV tri-level composite sync signal may be present on just the green channel (consumer market), or all three channels (pro-video market). DC offsets up to ±1V may be present.

Analog R´G´B´ Generation

Assuming 10-bit DACs with an output range of 0–1.305V (to match the video DACs used by the NTSC/PAL encoder in Chapter 9), the 10-bit YCbCr to R´G´B´ equations are:

R´ = 0.625(Y – 64) + 0.963(Cr – 512)
G´ = 0.625(Y – 64) – 0.287(Cr – 512) – 0.114(Cb – 512)
B´ = 0.625(Y – 64) + 1.136(Cb – 512)

R´G´B´ has a nominal 10-bit range of 0–548 to match the active video levels used by the NTSC/PAL encoder in Chapter 9. Note that negative values of R´G´B´ should be supported at this point.

The R´G´B´ data is clamped by a blanking signal that has a raised cosine distribution to slow the slew rate of the start and end of the video signal. For 1080i and 720p systems, blank rise and fall times are 54 ±20 ns. For 1080p systems, blank rise and fall times are 27 ±10 ns.
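To make the raised cosine shaping concrete, the following Python sketch samples a raised-cosine transition between two DAC codes. Several things here are assumptions rather than statements from the text: the function name, the interpretation of the quoted rise time as a 10% to 90% figure, and the 148.5 MHz sample rate used in the example. It shows only the shape of one edge, not a complete blanking or sync generator.

```python
import math


def raised_cosine_edge(start, end, rise_time_ns, sample_rate_mhz):
    """Sample a raised-cosine transition from `start` to `end` (DAC codes).

    Assumption: rise_time_ns is the 10%-90% time. For a raised cosine the
    10%-90% interval is (acos(-0.8) - acos(0.8)) / pi, about 0.59, of the
    full 0%-100% transition, so the full edge is stretched accordingly.
    """
    fraction = (math.acos(-0.8) - math.acos(0.8)) / math.pi
    total_ns = rise_time_ns / fraction
    period_ns = 1000.0 / sample_rate_mhz

    # Round up to a whole number of sample periods (at least one).
    n = max(1, math.ceil(total_ns / period_ns))

    samples = []
    for i in range(n + 1):
        shape = 0.5 * (1.0 - math.cos(math.pi * i / n))   # goes 0 -> 1
        samples.append(round(start + (end - start) * shape))
    return samples


if __name__ == "__main__":
    # Blank (252) to white (800) edge, 27 ns rise time, 148.5 MHz samples.
    print(raised_cosine_edge(252, 800, 27, 148.5))
```

Because the edge is quantized to whole sample periods, the realized rise time is slightly longer than requested; whether that stays inside the stated tolerance depends on the sample rate chosen.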
Composite sync information may be added to the R´G´B´ data after the blank processing has been performed. Values of 16 (sync low), 488 (high sync), or 252 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution to slow the slew rate of the sync signal.

For 1080i systems, sync rise and fall times are 54 ±20 ns, and the horizontal sync low and high widths at the 50% points are 593 ±40 ns. For 720p systems, sync rise and fall times are 54 ±20 ns, and the horizontal sync low and high widths at the 50% points are 539 ±40 ns. For 1080p systems, sync rise and fall times are 27 ±10 ns, and the horizontal sync low and high widths at the 50% points are 296 ±20 ns.

At this point, we have digital R´G´B´ with sync and blanking information, as shown in Figure 5.5 and Table 5.3. The numbers in parentheses in Figure 5.5 indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V. The digital R´G´B´ data drive three 10-bit DACs to generate the analog R´G´B´ video signals.

[Figure 5.5 levels for a green, blue, or red channel, with 10-bit DAC codes in parentheses: white 1.020 V (800), sync high 0.622 V (488), black / blank 0.321 V (252), sync low 0.020 V (16); 100 IRE white, 43 IRE for each sync excursion. Shown with and without sync present.]

Figure 5.5. HDTV Analog RGB Levels. 0 IRE blanking level.

Video Level   0 IRE Blanking Pedestal
white         800
sync high     488
black         252
blank         252
sync low      16

Table 5.3. HDTV 10-Bit R´G´B´ Values.
Analog R´G´B´ Digitization

Assuming 10-bit ADCs with an input range of 0–1.305V (to match the video ADCs used by the NTSC/PAL decoder in Chapter 9), the 10-bit R´G´B´ to YCbCr equations are:

Y = 0.341(R´ – 252) + 1.143(G´ – 252) + 0.115(B´ – 252) + 64
Cb = –0.188(R´ – 252) – 0.629(G´ – 252) + 0.817(B´ – 252) + 512
Cr = 0.817(R´ – 252) – 0.743(G´ – 252) – 0.074(B´ – 252) + 512

R´G´B´ has a nominal 10-bit range of 252–800 to match the active video levels used by the NTSC/PAL decoder in Chapter 9. Table 5.3 and Figure 5.5 illustrate the 10-bit R´G´B´ values for the white, black, blank, and (optional) sync levels.

Constrained Image

Due to the limited availability of copy protection technology for high-definition analog interfaces, some standards and DRM implementations only allow a constrained image to be output. A constrained image has an effective maximum resolution of 960 × 540p, although the total number of video samples and the video timing remain unchanged (for example, 1280 × 720p or 1920 × 1080i). In these situations, the full resolution image is still available via an approved secure digital video output, such as HDMI.

SDTV YPbPr Interface

Some SDTV consumer video equipment supports an analog YPbPr video interface. NTSC and PAL VBI (vertical blanking interval) data, discussed in Chapter 8, may be present on 480i or 576i Y video signals. Three separate RCA phono connectors (consumer market) or BNC connectors (pro-video market) are used.

The horizontal and vertical video timing are dependent on the video standard, as discussed in Chapter 4. For sources, the video signal at the connector should have a source impedance of 75Ω ±5%. For receivers, video inputs should be AC-coupled and have a 75-Ω ±5% input impedance. The three signals must be coincident with respect to each other within ±5 ns.
[Figure 5.6 levels, with 10-bit DAC codes in parentheses. Y channel, sync present: white 1.020 V (800), black / blank 0.321 V (252), sync 0.020 V (16); 100 IRE white, 43 IRE sync. Pb or Pr channel, no sync present: peak 1.003 V (786), black / blank 0.653 V (512), peak 0.303 V (238); 50 IRE in each direction.]

Figure 5.6. EIA-770.2 SDTV Analog YPbPr Levels. Sync on Y.

[Figure 5.7 levels, with 10-bit DAC codes in parentheses. Y channel, sync present: white 1.020 V (800), black / blank 0.321 V (252), sync 0.020 V (16); 100 IRE white, 43 IRE sync. Pb or Pr channel, sync present: peak 1.003 V (786), black / blank 0.653 V (512), sync 0.352 V (276), peak 0.303 V (238); 50 IRE in each direction, 43 IRE sync.]

Figure 5.7. SDTV Analog YPbPr Levels. Sync on YPbPr.

                    White   Yellow   Cyan    Green   Magenta   Red     Blue    Black
Y    IRE            100     88.6     70.1    58.7    41.3      29.9    11.4    0
Y    mV             700     620      491     411     289       209     80      0
Pb   IRE            0       –50      16.8    –33.1   33.1      –16.8   50      0
Pb   mV             0       –350     118     –232    232       –118    350     0
Pr   IRE            0       8.1      –50     –41.8   41.8      50      –8.1    0
Pr   mV             0       57       –350    –293    293       350     –57     0
Y    (64 to 940)    940     840      678     578     426       326     164     64
Cb   (64 to 960)    512     64       663     215     809       361     960     512
Cr   (64 to 960)    512     585      64      137     887       960     439     512

Table 5.4. EIA-770.2 SDTV YPbPr and YCbCr 100% Color Bars. YPbPr values relative to the blanking level.

                    White   Yellow   Cyan    Green   Magenta   Red     Blue    Black
Y    IRE            75      66.5     52.6    44      31        22.4    8.6     0
Y    mV             525     465      368     308     217       157     60      0
Pb   IRE            0       –37.5    12.6    –24.9   24.9      –12.6   37.5    0
Pb   mV             0       –262     88      –174    174       –88     262     0
Pr   IRE            0       6.1      –37.5   –31.4   31.4      37.5    –6.1    0
Pr   mV             0       43       –262    –220    220       262     –43     0
Y    (64 to 940)    721     646      525     450     335       260     139     64
Cb   (64 to 960)    512     176      625     289     735       399     848     512
Cr   (64 to 960)    512     567      176     231     793       848     457     512

Table 5.5. EIA-770.2 SDTV YPbPr and YCbCr 75% Color Bars. YPbPr values relative to the blanking level.

For consumer products, composite sync is present on only the Y channel. For pro-video applications, composite sync is present on all three channels. A gamma of 1/0.45 is specified.
As shown in Figures 5.6 and 5.7, the Y signal consists of 700 mV of active video (with no blanking pedestal). Pb and Pr have a peak-to-peak amplitude of 700 mV. A 300 ±6 mV composite sync signal is present on just the Y channel (consumer market), or all three channels (pro-video market). DC offsets up to ±1V may be present. The 100% and 75% YPbPr color bar values are shown in Tables 5.4 and 5.5.

Analog YPbPr Generation

Assuming 10-bit DACs with an output range of 0–1.305V (to match the video DACs used by the NTSC/PAL encoder in Chapter 9), the 10-bit YCbCr to YPbPr equations are:

Y = ((800 – 252)/(940 – 64))(Y – 64)
Pb = ((800 – 252)/(960 – 64))(Cb – 512)
Pr = ((800 – 252)/(960 – 64))(Cr – 512)

Y has a nominal 10-bit range of 0–548 to match the active video levels used by the NTSC/PAL encoder in Chapter 9. Pb and Pr have a nominal 10-bit range of 0 to ±274. Note that negative values of Y should be supported at this point.

The YPbPr data is clamped by a blanking signal that has a raised cosine distribution to slow the slew rate of the start and end of the video signal. For 480i and 576i systems, blank rise and fall times are 140 ±20 ns. For 480p and 576p systems, blank rise and fall times are 70 ±10 ns.

Composite sync information is added to the Y data after the blank processing has been performed. Values of 16 (sync present) or 252 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution (between 16 and 252) to slow the slew rate of the sync signal.

Composite sync information may also be added to the PbPr data after the blank processing has been performed. Values of 276 (sync present) or 512 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution (between 276 and 512) to slow the slew rate of the sync signal.

For 480i and 576i systems, sync rise and fall times are 140 ±20 ns, and horizontal sync width at the 50% point is 4.7 ±0.1 μs.
For 480p and 576p systems, sync rise and fall times are 70 ±10 ns, and horizontal sync width at the 50% point is 2.33 ±0.05 μs.

At this point, we have digital YPbPr with sync and blanking information, as shown in Figures 5.6 and 5.7 and Table 5.6. The numbers in parentheses in Figures 5.6 and 5.7 indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V. The digital YPbPr data drive three 10-bit DACs to generate the analog YPbPr video signals.

Video Level   Y     PbPr
white         800   512
black         252   512
blank         252   512
sync          16    276

Table 5.6. SDTV 10-Bit YPbPr Values.

Analog YPbPr Digitization

Assuming 10-bit ADCs with an input range of 0–1.305V (to match the video ADCs used by the NTSC/PAL decoder in Chapter 9), the 10-bit YPbPr to YCbCr equations are:

Y = 1.5985(Y – 252) + 64
Cb = 1.635(Pb – 512) + 512
Cr = 1.635(Pr – 512) + 512

Y has a nominal 10-bit range of 252–800 to match the active video levels used by the NTSC/PAL decoder in Chapter 9. Table 5.6 and Figures 5.6 and 5.7 illustrate the 10-bit YPbPr values for the white, black, blank, and (optional) sync levels.

VBI Data for 480p Systems

CGMS Type A

CEA-805, IEC 61880–2, and EIA-J CPR–1204–1 define the format of CGMS (Copy Generation Management System) data on line 41 for 480p systems. The waveform is illustrated in Figure 5.8.

A sample clock rate of 27 MHz (59.94 Hz frame rate) or 27.027 MHz (60 Hz frame rate) is used. Each data bit is 26 clock cycles, or 963 ±30 ns, wide with a maximum rise and fall time of 50 ns. A logical “1” has an amplitude of 70 ±10 IRE; a logical “0” has an amplitude of 0 ±5 IRE.

The 2-bit start symbol begins 156 clock cycles, or about 5.778 μs, after 0H. It consists of a “1” followed by a “0.”

The 6-bit header follows the start symbol, and defines the nature of the payload data as shown in Table 5.7.

The End of Message immediately follows the last packet of any data service that uses more than one packet.
It has an associated payload consisting of all zeros. ECCI is a data service that may use more than one packet, thus requiring the use of the End of Message.

The 14-bit payload for CGMS data is shown in Table 5.8. The 14-bit payload for ECCI data is currently “reserved,” consisting of all ones.

[Figure 5.8 waveform: start symbol (2 bits), header H0–H5 (6 bits), and data D0–D13 (14 bits) at 70 ±10 IRE above the blank level, following the sync pulse; the start symbol begins 5.778 µs after 0H.]

Figure 5.8. CEA-805, IEC 61880–2, and EIA-J CPR–1204–1 Line 41 Timing.

H0 H1   Aspect Ratio   Picture Display Format
00      4:3            normal
01      4:3            letter box
10      16:9           normal
11      CEA-805 Type A packet

H2 H3 H4 H5   Service Name
0000          CGMS (see Table 5.8)
0001          Extended Copy Control Information (ECCI)
0010
:             reserved
1110
1111          End of Message (default if no copyright information)

Table 5.7. CEA-805, IEC 61880–2, and EIA-J CPR–1204–1 Line 41 Header Format. The H2-H5 bits must be “0000” if Type A packet is indicated.

D0   D1   D2   D3   D4    D5   D6   D7   D8 D9 D10 D11 D12 D13
G0   G1   G2   G3   ASB   0    0    0    CRC = x^6 + x + 1

G0–G1: CGMS Definition
  00  copying permitted
  01  no more copies (one copy has already been made)
  10  one copy permitted
  11  no copying permitted

G2–G3: Analog Protection Service (valid only if both G0–G1 are “01” or “10”)
  00  no Analog Protection Service
  01  PSP on, color striping off
  10  PSP on, 2-line color striping on
  11  PSP on, 4-line color striping on

ASB: Analog Source Bit
  0  not analog pre-recorded medium
  1  analog pre-recorded medium

Table 5.8. CEA-805, IEC 61880–2, and EIA-J CPR–1204–1 Line 41 CGMS Service Format.

CGMS Type B

CEA-805 defines the format of CGMS (Copy Generation Management System) data on line 40 for 480p systems. The waveform is illustrated in Figure 5.9.

A sample clock rate of 27 MHz (59.94 Hz frame rate) or 27.027 MHz (60 Hz frame rate) is used. Each data bit is four clock cycles, or 148 ±18.5 ns, wide with a maximum rise and fall time of 37 ns.
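The bit widths quoted for the two CGMS packet types follow directly from the sample clock, and the arithmetic can be checked with a short Python sketch. The function name is hypothetical; the clock rates, cycle counts, and the 2 + 6 + 14 bit Type A packet layout are taken from the text.

```python
def cgms_timing(clock_hz, cycles_per_bit, start_cycles=156):
    """Return (bit width in ns, start-symbol delay after 0H in microseconds)."""
    bit_ns = cycles_per_bit / clock_hz * 1e9
    start_us = start_cycles / clock_hz * 1e6
    return bit_ns, start_us


if __name__ == "__main__":
    for clock_hz in (27.0e6, 27.027e6):
        type_a_bit_ns, start_us = cgms_timing(clock_hz, 26)   # Type A: 26 cycles/bit
        type_b_bit_ns, _ = cgms_timing(clock_hz, 4)           # Type B: 4 cycles/bit

        # Type A packet: 2-bit start symbol + 6-bit header + 14-bit payload.
        type_a_packet_us = (2 + 6 + 14) * type_a_bit_ns / 1000.0

        print(f"{clock_hz / 1e6:.3f} MHz: "
              f"Type A bit {type_a_bit_ns:.1f} ns, "
              f"Type B bit {type_b_bit_ns:.1f} ns, "
              f"start at {start_us:.3f} us, "
              f"Type A packet {type_a_packet_us:.2f} us")
```

At 27 MHz this gives about 963 ns and 148 ns per bit and a start delay of about 5.778 μs, matching the nominal values in the text.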
A logical “1” has an amplitude of 70 ±10 IRE; a logical “0” has an amplitude of 0 ±5 IRE.

The 2-bit start symbol begins 156 clock cycles, or about 5.778 μs, after 0H. It consists of a “1” followed by a “0.”

The 6-bit header follows the start symbol, and defines the nature of the payload data as shown in Table 5.9.

The 16-byte payload is shown in Table 5.10.

[Figure 5.9 waveform: start symbol (2 bits), header H0–H5 (6 bits), and data D0–D127 (128 bits) at 70 ±10 IRE above the blank level, following the sync pulse; the start symbol begins 5.778 µs after 0H.]

Figure 5.9. CEA-805 Line 40 Timing.

H0 H1 H2 H3 H4 H5   Service Name
000000
:                   reserved for future use
110001
110010              Type B packet
110010
:                   reserved for future use
111111

Table 5.9. CEA-805 Line 40 Header Format.

D7    D6    D5    D4    D3    D2    D1    D0
version number = 0000 0001
length of payload packet = 0001 0000
AR1   AR0   ASB   A0    1     B0    S1    S0
C3    C2    C1    C0    R3    R2    R1    R0
RCI   1     1     1     G3    G2    G1    G0
0     0     0     0     0     0     0     0
0     0     0     0     0     0     0     0
line number of end of top bar (lower 8 bits)
line number of end of top bar (upper 8 bits)
line number of start of bottom bar (lower 8 bits)
line number of start of bottom bar (upper 8 bits)
pixel number of end of left bar (lower 8 bits)
pixel number of end of left bar (upper 8 bits)
pixel number of start of right bar (lower 8 bits)
pixel number of start of right bar (upper 8 bits)
1     1     CRC = x^6 + x + 1

AR1–AR0: Intended display aspect ratio
  00  4:3 normal
  01  4:3 letterbox
  10  16:9 normal
  11  reserved

ASB: Analog Source Bit

A0: Active Format Description (AFD) data flag
  0  no AFD data (R0–R3)
  1  AFD data (R0–R3) valid

B0: Bar data (for letterboxing)
  0  no bar data
  1  bar data present

S1–S0: Scan data (amount of overscan and underscan is not indicated)
  00  no data
  01  overscanned (television)
  10  underscanned (computer)
  11  reserved

C3–C0: Colorimetry
  0000  no data
  0001  BT.601
  0010  BT.709
  0011  reserved
  :     :
  1111  reserved

R0–R3: Active Format Description (AFD) active_format value (refer to Table 13.56)

RCI: Redistribution Control Information (RCI) flag

G0–G1: CGMS
Definition (refer to Table 5.8)

G2–G3: Analog Protection Services (refer to Table 5.8)

Table 5.10. CEA-805 Line 40 Payload Format.

VBI Data for 576p Systems

CGMS

IEC 62375 defines the format of CGMS (Copy Generation Management System) and widescreen signaling (WSS) data on line 43 for 576p systems. The waveform is illustrated in Figure 5.10. This standard allows a WSS-enhanced 16:9 TV to display programs in their correct aspect ratio.

Data Timing

CGMS and WSS data is normally on line 43, as shown in Figure 5.10. However, due to video editing, the data may reside on any line between 43–47.

The clock frequency is 10 MHz (±1 kHz). The signal waveform should be a sine-squared pulse, with a half-amplitude duration of 100 ±10 ns. The signal amplitude is 500 mV ±5%.

The NRZ data bits are processed by a biphase code modulator, such that one data period equals 6 elements at 10 MHz.

Data Content

The WSS consists of a run-in code, a start code, and 14 bits of data, as shown in Table 5.11.

Run-In

The run-in consists of 29 elements of a specific sequence at 10 MHz, shown in Table 5.11.

Start Code

The start code consists of 24 elements of a specific sequence at 10 MHz, shown in Table 5.11.

[Figure 5.10 waveform: run-in (29 10 MHz elements), start code (24 10 MHz elements), and data B0–B13 (84 10 MHz elements) at 500 mV ±5% above the blank level, following the 43 IRE sync pulse; the data begins 5.5 ±0.125 µs after 0H and lasts 13.7 µs; rise and fall times are 90–110 ns (2T bar shaping).]

Figure 5.10. IEC 62375 Line 43 CGMS Timing.

Group 1 Data

The group 1 data consists of 4 data bits that specify the aspect ratio. Each data bit generates 6 elements at 10 MHz. b0 is the LSB.

Table 5.11 lists the data bit assignments and usage. The numbers of active lines listed in Table 5.12 are for the exact aspect ratio (a = 1.33, 1.56, or 1.78).
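The element counts above can be tied together with a short Python sketch that assembles one WSS line as a string of 10 MHz elements. The run-in and start code bit patterns and the biphase mapping ("0" = 000 111, "1" = 111 000) are taken from Table 5.11. The function names are made up, and sending b0 first is an assumption, since the text only says that b0 is the LSB.

```python
# Sequences from Table 5.11, written as strings of 10 MHz elements.
RUN_IN = format(0x1F1C71C7, "029b")     # 29 elements
START_CODE = format(0x1E3C1F, "024b")   # 24 elements
BIPHASE = {0: "000111", 1: "111000"}    # one data bit = 6 elements


def wss_elements(bits):
    """Build the element string for 14 data bits given in order b0..b13.

    Hypothetical helper; assumes b0 is transmitted first.
    """
    if len(bits) != 14:
        raise ValueError("WSS carries exactly 14 data bits")
    return RUN_IN + START_CODE + "".join(BIPHASE[b] for b in bits)


def wss_duration_us(elements, clock_mhz=10.0):
    """Duration of the element string at the given element clock."""
    return len(elements) / clock_mhz


if __name__ == "__main__":
    # 4:3 full format: b0..b3 = 0001 (Table 5.12), all other bits 0.
    bits = [0, 0, 0, 1] + [0] * 10
    elements = wss_elements(bits)
    print(len(elements), "elements,", wss_duration_us(elements), "us")
```

The total of 29 + 24 + 84 = 137 elements at 10 MHz works out to 13.7 μs, which matches the duration marked in Figure 5.10.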
The aspect ratio label indicates a range of possible aspect ratios (a) and number of active lines:

Label    Aspect Ratio (a)     Active Lines
4:3      a ≤ 1.46             527–576
14:9     1.46 < a ≤ 1.66      463–526
16:9     1.66 < a ≤ 1.90      405–462
>16:9    a > 1.90             < 405

To allow automatic selection of the display mode, a 16:9 receiver should support the following minimum requirements:

Case 1: The 4:3 aspect ratio picture should be centered on the display, with black bars on the left and right sides.

Case 2: The 14:9 aspect ratio picture should be centered on the display, with black bars on the left and right sides. Alternately, the picture may be displayed using the full display width by using a small (typically 8%) horizontal geometrical error.

Case 3: The 16:9 aspect ratio picture should be displayed using the full width of the display.

Case 4: The >16:9 aspect ratio picture should be displayed as in Case 3 or use the full height of the display by zooming in.

Group 3 Data

The group 3 data consists of three data bits that specify subtitles. Each data bit generates six elements at 10 MHz. Data bit b8 is the LSB.

b9, b10: open subtitles
  00  no
  01  outside active picture
  10  inside active picture
  11  reserved

Group 4 Data

The group 4 data consists of three data bits that specify surround sound and copy protection. Each data bit generates six elements at 10 MHz. Data bit b11 is the LSB.
b11: surround sound
  0  no
  1  yes

b12: copyright
  0  no copyright asserted or unknown
  1  copyright asserted

b13: copy protection
  0  copying not restricted
  1  copying restricted

run-in                        29 elements at 10 MHz     1 1111 0001 1100 0111 0001 1100 0111 (0x1F1C71C7)
start code                    24 elements at 10 MHz     0001 1110 0011 1100 0001 1111 (0x1E3C1F)
group 1 (aspect ratio)        24 elements at 10 MHz     b0, b1, b2, b3
                              “0” = 000 111
                              “1” = 111 000
group 2 (enhanced services)   24 elements at 10 MHz     b4, b5, b6, b7
                              “0” = 000 111             (b4, b5, b6 and b7 = “0” since reserved)
                              “1” = 111 000
group 3 (subtitles)           18 elements at 10 MHz     b8, b9, b10
                              “0” = 000 111             (b8 = “0” since reserved)
                              “1” = 111 000
group 4 (reserved)            18 elements at 10 MHz     b11, b12, b13
                              “0” = 000 111
                              “1” = 111 000

Table 5.11. IEC 62375 Line 43 WSS Information.

b0, b1, b2, b3   Aspect Ratio Label   Format                     Position On 4:3 Display   Active Lines   Minimum Requirements
0001             4:3                  full format                –                         576            case 1
1000             14:9                 letterbox                  center                    504            case 2
0100             14:9                 letterbox                  top                       504            case 2
1101             16:9                 letterbox                  center                    430            case 3
0010             16:9                 letterbox                  top                       430            case 3
1011             > 16:9               letterbox                  center                    –              case 4
0111             14:9                 full format                center                    576            –
1110             16:9                 full format (anamorphic)   –                         576            –

Table 5.12. IEC 62375 Group 1 (Aspect Ratio) Data Bit Assignments and Usage.

HDTV YPbPr Interface

Most HDTV consumer video equipment supports an analog YPbPr video interface. Three separate RCA phono connectors (consumer market) or BNC connectors (pro-video market) are used.

The horizontal and vertical video timing is dependent on the video standard, as discussed in Chapter 4. For sources, the video signal at the connector should have a source impedance of 75Ω ±5%. For receivers, video inputs should be AC-coupled and have a 75-Ω ±5% input impedance. The three signals must be coincident with respect to each other within ±5 ns.

For consumer products, composite sync is present on only the Y channel.
For pro-video applications, composite sync is present on all three channels. A gamma of 1/0.45 is specified.

As shown in Figures 5.11 and 5.12, the Y signal consists of 700 mV of active video (with no blanking pedestal). Pb and Pr have a peak-to-peak amplitude of 700 mV. A ±300 ±6 mV composite sync signal is present on just the Y channel (consumer market), or all three channels (pro-video market). DC offsets up to ±1V may be present. The 100% and 75% YPbPr color bar values are shown in Tables 5.13 and 5.14.

Analog YPbPr Generation

Assuming 10-bit DACs with an output range of 0–1.305V (to match the video DACs used by the NTSC/PAL encoder in Chapter 9), the 10-bit YCbCr to YPbPr equations are:

    Y  = ((800 – 252)/(940 – 64))(Y – 64)
    Pb = ((800 – 252)/(960 – 64))(Cb – 512)
    Pr = ((800 – 252)/(960 – 64))(Cr – 512)

Y has a nominal 10-bit range of 0–548 to match the active video levels used by the NTSC/PAL encoder in Chapter 9. Pb and Pr have a nominal 10-bit range of 0 to ±274. Note that negative values of Y should be supported at this point.

The YPbPr data is clamped by a blanking signal that has a raised cosine distribution to slow the slew rate of the start and end of the video signal. For 1080i and 720p systems, blank rise and fall times are 54 ±20 ns. For 1080p systems, blank rise and fall times are 27 ±10 ns.

Composite sync information is added to the Y data after the blank processing has been performed. Values of 16 (sync low), 488 (sync high), or 252 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution to slow the slew rate of the sync signal.

Composite sync information may be added to the PbPr data after the blank processing has been performed. Values of 276 (sync low), 748 (sync high), or 512 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution to slow the slew rate of the sync signal.
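The YCbCr to YPbPr scaling above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the use of rounding are hypothetical, and the blanking offsets (252 for Y, 512 for Pb and Pr) are taken from the "no sync" values quoted above and the levels in Table 5.15 rather than from an explicit equation in the text.

```python
# Sketch of the 10-bit YCbCr -> YPbPr DAC-code conversion described above.
# Assumptions: the function name and rounding are illustrative only; the
# blanking offsets (252 for Y, 512 for Pb/Pr) come from the "no sync" values.

Y_GAIN = (800 - 252) / (940 - 64)   # Y scale factor from the text
C_GAIN = (800 - 252) / (960 - 64)   # Pb/Pr scale factor from the text
Y_BLANK = 252                       # 10-bit Y black/blank code
C_BLANK = 512                       # 10-bit Pb/Pr blank code


def ycbcr_to_ypbpr_codes(y, cb, cr):
    """Return (Y, Pb, Pr) 10-bit DAC codes for one 10-bit YCbCr sample."""
    y_rel = Y_GAIN * (y - 64)       # nominal 0..548, may go negative
    pb_rel = C_GAIN * (cb - 512)    # nominal -274..+274
    pr_rel = C_GAIN * (cr - 512)    # nominal -274..+274
    return (round(y_rel) + Y_BLANK,
            round(pb_rel) + C_BLANK,
            round(pr_rel) + C_BLANK)


if __name__ == "__main__":
    print(ycbcr_to_ypbpr_codes(940, 512, 512))  # 100% white
    print(ycbcr_to_ypbpr_codes(64, 960, 64))    # black luma, extreme chroma
```

With these offsets, nominal white maps to the 800 code and the Pb/Pr extremes map to the 786 and 238 peak codes shown in Figures 5.11 and 5.12. Blanking and sync insertion are left out of the sketch.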
For 1080i systems, sync rise and fall times are 54 ±20 ns, and the horizontal sync low and high widths at the 50% points are 593 ±40 ns. For 720p systems, sync rise and fall times are 54 ±20 ns, and the horizontal sync low and high widths at the 50% points are 539 ±40 ns. For 1080p systems, sync rise and fall times are 27 ±10 ns, and the horizontal sync low and high widths at the 50% points are 296 ±20 ns.

At this point, we have digital YPbPr with sync and blanking information, as shown in Figures 5.11 and 5.12 and Table 5.15. The numbers in parentheses in Figures 5.11 and 5.12 indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V. The digital YPbPr data drives three 10-bit DACs to generate the analog YPbPr video signals.

Figure 5.11. EIA-770.3 HDTV Analog YPbPr Levels. Sync on Y.
    Y channel, sync present: white level 1.020 V (800); sync level 0.622 V (488); black/blank level 0.321 V (252); sync level 0.020 V (16).
    Pb or Pr channel, no sync present: peak level 1.003 V (786); black/blank level 0.653 V (512); peak level 0.303 V (238).

Figure 5.12. SMPTE 274M and 296M HDTV Analog YPbPr Levels. Sync on YPbPr.
    Y channel, sync present: white level 1.020 V (800); sync level 0.622 V (488); black/blank level 0.321 V (252); sync level 0.020 V (16).
    Pb or Pr channel, sync present: peak level 1.003 V (786); sync level 0.954 V (748); black/blank level 0.653 V (512); sync level 0.352 V (276); peak level 0.303 V (238).

                   Y              Pb              Pr            Y           Cb          Cr
               IRE    mV     IRE     mV      IRE     mV     (64–940)    (64–960)    (64–960)
    White      100    700    0       0       0       0      940         512         512
    Yellow     92.8   650    –50     –350    4.6     32     877         64          553
    Cyan       78.7   551    11.4    80      –50     –350   753         614         64
    Green      71.5   501    –38.5   –270    –45.4   –318   690         167         106
    Magenta    28.5   200    38.5    270     45.4    318    314         857         918
    Red        21.3   149    –11.4   –80     50      350    251         410         960
    Blue       7.2    50     50      350     –4.6    –32    127         960         471
    Black      0      0      0       0       0       0      64          512         512

Table 5.13. EIA-770.3 HDTV YPbPr and YCbCr 100% Color Bars. YPbPr values relative to the blanking level.

                   Y              Pb              Pr            Y           Cb          Cr
               IRE    mV     IRE     mV      IRE     mV     (64–940)    (64–960)    (64–960)
    White      75     525    0       0       0       0      721         512         512
    Yellow     69.6   487    –37.5   –263    3.5     24     674         176         543
    Cyan       59     413    8.6     60      –37.5   –263   581         589         176
    Green      53.7   376    –28.9   –202    –34     –238   534         253         207
    Magenta    21.3   149    28.9    202     34      238    251         771         817
    Red        16     112    –8.6    –60     37.5    263    204         435         848
    Blue       5.4    38     37.5    263     –3.5    –24    111         848         481
    Black      0      0      0       0       0       0      64          512         512

Table 5.14. EIA-770.3 HDTV YPbPr and YCbCr 75% Color Bars. YPbPr values relative to the blanking level.

    Video Level    Y      PbPr
    white          800    512
    sync high      488    748
    black          252    512
    blank          252    512
    sync low       16     276

Table 5.15. HDTV 10-Bit YPbPr Values.

Analog YPbPr Digitization

Assuming 10-bit ADCs with an input range of 0–1.305V (to match the video ADCs used by the NTSC/PAL decoder in Chapter 9), the 10-bit YPbPr to YCbCr equations are:

    Y  = 1.5985(Y – 252) + 64
    Cb = 1.635(Pb – 512) + 512
    Cr = 1.635(Pr – 512) + 512

Y has a nominal 10-bit range of 252–800 to match the active video levels used by the NTSC/PAL decoder in Chapter 9. Table 5.15 and Figures 5.11 and 5.12 illustrate the 10-bit YPbPr values for the white, black, blank, and (optional) sync levels.

VBI Data for 720p Systems

CGMS Type A

CEA-805 and EIA-J CPR–1204–2 define the format of CGMS (Copy Generation Management System) data on line 24 for 720p systems. The waveform is illustrated in Figure 5.13. A sample clock rate of 74.176 MHz (59.94 Hz frame rate) or 74.25 MHz (60 Hz frame rate) is used. Each data bit is 58 clock cycles, or 782 ±30 ns, wide with a maximum rise and fall time of 50 ns. A logical “1” has an amplitude of 70 ±10 IRE; a logical “0” has an amplitude of 0 ±5 IRE. The 2-bit start symbol begins 232 clock cycles, or about 3.128 μs, after 0H.
It consists of a “1” followed by a “0.” The 6-bit header and 14-bit CGMS payload data format is the same as for 480p systems discussed earlier in this chapter.

Figure 5.13. CEA-805 and EIA-J CPR–1204–2 Line 24 Timing.
    Waveform: 2-bit start symbol beginning 3.128 µs after 0H, 6-bit header (H0–H5), 14 data bits (D0–D13); a “1” is 70 ±10 IRE above the blank level.

CGMS Type B

CEA-805 defines the format of CGMS (Copy Generation Management System) data on line 23 for 720p systems. The waveform is illustrated in Figure 5.14. A sample clock rate of 74.176 MHz (59.94 Hz frame rate) or 74.25 MHz (60 Hz frame rate) is used. Each data bit is eight clock cycles, or 107.7 ±18.5 ns, wide with a maximum rise and fall time of 37 ns. A logical “1” has an amplitude of 70 ±10 IRE; a logical “0” has an amplitude of 0 ±5 IRE. The 2-bit start symbol begins 232 clock cycles, or about 3.128 μs, after 0H. It consists of a “1” followed by a “0.” The 6-bit header and 16-byte payload data format is the same as for 480p systems discussed earlier in this chapter.

Figure 5.14. CEA-805 Line 23 Timing.
    Waveform: 2-bit start symbol beginning 3.128 µs after 0H, 6-bit header (H0–H5), 128 data bits (D0–D127); a “1” is 70 ±10 IRE above the blank level.

VBI Data for 1080i Systems

CGMS Type A

CEA-805 and EIA-J CPR–1204–2 define the format of CGMS (Copy Generation Management System) data on lines 19 and 582 for 1080i systems. The waveform is illustrated in Figure 5.15. A sample clock rate of 74.176 MHz (59.94 Hz field rate) or 74.25 MHz (60 Hz field rate) is used. Each data bit is 77 clock cycles, or 1038 ±30 ns, wide with a maximum rise and fall time of 50 ns. A logical “1” has an amplitude of 70 ±10 IRE; a logical “0” has an amplitude of 0 ±5 IRE. The 2-bit start symbol begins 308 clock cycles, or about 4.152 μs, after 0H.
It consists of a “1” followed by a “0.”

Figure 5.15. CEA-805 and EIA-J CPR–1204–2 Lines 19 and 582 Timing.
    Waveform: 2-bit start symbol beginning 4.152 µs after 0H, 6-bit header (H0–H5), 14 data bits (D0–D13); a “1” is 70 ±10 IRE above the blank level.

CGMS Type B

CEA-805 defines the format of CGMS (Copy Generation Management System) data on lines 18 and 581 for 1080i systems. The waveform is illustrated in Figure 5.16. A sample clock rate of 74.176 MHz (59.94 Hz field rate) or 74.25 MHz (60 Hz field rate) is used. Each data bit is ten clock cycles, or 134.6 ±18.5 ns, wide with a maximum rise and fall time of 37 ns. A logical “1” has an amplitude of 70 ±10 IRE; a logical “0” has an amplitude of 0 ±5 IRE. The 2-bit start symbol begins 308 clock cycles, or about 4.152 μs, after 0H. It consists of a “1” followed by a “0.” The 6-bit header and 16-byte payload data format is the same as for 480p systems discussed earlier in this chapter.

Figure 5.16. CEA-805 Lines 18 and 581 Timing.
    Waveform: 2-bit start symbol beginning 4.152 µs after 0H, 6-bit header (H0–H5), 128 data bits (D0–D127); a “1” is 70 ±10 IRE above the blank level.

Constrained Image

Due to the limited availability of copy protection technology for high-definition analog interfaces, some standards and DRM implementations only allow a constrained image to be output. A constrained image has an effective maximum resolution of 960 × 540p, although the total number of video samples and the video timing remain unchanged (for example, 1280 × 720p60 or 1920 × 1080i30). In these situations, the full resolution image is still available via an approved secure digital video output, such as HDMI.

D-Connector Interface

A 14-pin female D-connector (EIA-J CP–4120 standard, EIA-J RC–5237 connector) is optionally used on some high-end consumer equipment in Japan, Hong Kong, and Singapore. It is used to transfer EIA-770.2 or EIA-770.3 interlaced or progressive analog YPbPr video.
There are five flavors of the D-connector, referred to as D1, D2, D3, D4, and D5, each used to indicate supported video formats, as shown in Table 5.16. Figure 5.17 illustrates the connector and Table 5.17 lists the pin names. Three line signals (Line 1, Line 2, and Line 3) indicate the resolution and frame rate of the YPbPr source video, as indicated in Table 5.18. 98 Chapter 5: Analog Video Interfaces 480i 480p 720p 1080i 1080p D1 × D2 × × D3 × × × D4 × × × × D5 × × × × × Table 5.16. D-Connector Supported Video Formats. Figure 5.17. D-Connector. Pin Function 1 Y 2 ground - Y 3 Pb 4 ground - Pb 5 Pr 6 ground - Pr 7 reserved 1 8 line 1 9 line 2 10 reserved 2 11 line 3 12 ground - detect plugged 13 reserved 3 14 detect plugged Signal Level 0.700V + sync ±0.350V ±0.350V 0V, 2.2V, or 5V1 0V, 2.2V, or 5V1 0V, 2.2V, or 5V1 0V = plugged in2 Impedance 75 ohms 75 ohms 75 ohms 10K ±3Κ ohm 10K ±3Κ ohm 10K ±3Κ ohm > 100K ohm Notes: 1. 2.2V has range of 2.2V ±0.8V. 5V has a range of 5V ±1.5V. 2. Inside equipment, pin 12 is connected to ground and pin 14 is pulled to 5V through a resistor. Inside each D-Connector plug, pins 12 and 14 are shorted together. Table 5.17. D-Connector Pin Descriptions. D-Connector Interface 99 Resolution Frame Rate Line 1 Scan Lines Line 2 Frame Rate Line 3 Aspect Ratio Chromaticity and Reference White Color Space Equations Gamma Correction Sync Amplitude on Y 30i 5V 0V 5V 25i2 5V 2.2V 5V 1920x1080 30p 5V 2.2V 5V 25p2 5V 2.2V 5V 24p2 5V 2.2V 5V 24sF2 5V 2.2V 5V 60p 2.2V 5V 5V 50p2 2.2V 2.2V 5V 1280x720 640x480 30p 2.2V 2.2V 5V 25p2 2.2V 2.2V 5V 24p2 2.2V 2.2V 5V 60p2 0V 5V 0V 16:9 Squeeze 60p 0V 5V 5V 720x480 16:9 Squeeze 30i 0V 16:9 Letterbox 30i 0V 0V 5V 0V 2.2V 4:3 30i 0V 0V 0V EIA-770.3 EIA-770.3 EIA-770.3 ±0.300V3 EIA-770.2 EIA-770.2 EIA-770.2 –0.300V3 Notes: 1. 60p, 30i, 30p, and 24p frame rates also include the 59.94p, 29.97i, 29.97p, and 23.976p frame rates. 2. Not part of EIAJ CP-4120 specification, but commonly supported by equipment. 3. 
Relative to the blanking level.

Table 5.18. Voltage Levels of Line Signals for Various Video Formats for D-Connector.

Other Pro-Video Analog Interfaces

Tables 5.19 and 5.20 list some other common component analog video formats. The horizontal and vertical timing is the same as for 525-line (M) NTSC and 625-line (B, D, G, H, I) PAL. The 100% and 75% color bar values are shown in Tables 5.21 through 5.24. The SMPTE, EBU N10, 625-line Betacam, and 625-line MII values are the same as for SDTV YPbPr.

VGA Interface

Table 5.25 and Figure 5.18 illustrate the 15-pin VGA connector used by computer equipment, and some consumer equipment, to transfer analog RGB signals. The analog RGB signals do not contain sync information and have no blanking pedestal, as shown in Figure 5.4.

References

1. CEA-805-C, Data on the Component Video Interfaces, July 2006.
2. EIA-770.1, Analog 525-Line Component Video Interface - Three Channels, November 2001.
3. EIA-770.2, Standard-Definition TV Analog Component Video Interface, November 2001.
4. EIA-770.3, High-Definition TV Analog Component Video Interface, November 2001.
5. EIA-J CPR–1204–1, Transfer Method of Video ID Information Using Vertical Blanking Interval (525P System), March 1998.
6. EIA-J CPR–1204–2, Transfer Method of Video ID Information Using Vertical Blanking Interval (720P, 1125I System), January 2000.
7. EIA-J CP–4120, Interface Between Digital Tuner and Television Receiver Using D-Connector, January 2000.
8. IEC 60933–1, Audio, Video and Audiovisual Systems - Interconnections and Matching Values - Part 1: 21-pin Connector for Video Systems, Application No. 1, April 1988.
9. IEC 61880–2, Video Systems (525/60) - Video and Accompanied Data Using the Vertical Blanking Interval - Part 2: 525 Progressive Scan System, September 2002.
10. IEC 62375, Video Systems (625/50 Progressive) - Video and Accompanied Data Using the Vertical Blanking Interval - Analog Interface, February 2004.
11. ITU-R BT.709–5, 2002, Parameter Values for the HDTV Standards for Production and International Programme Exchange.
12. SMPTE 253M–1998, Television - Three-Channel RGB Analog Video Interface.
13. SMPTE 274M–2005, Television - 1920 × 1080 Image Sample Structure, Digital Representation and Digital Timing Reference Sequences for Multiple Picture Rates.
14. SMPTE 293M–2003, Television - 720 × 483 Active Line at 59.94 Hz Progressive Scan Production - Digital Representation.
15. SMPTE RP-160–1997, Three-Channel Parallel Analog Component High-Definition Video Interface.

    Format               Y (volts)   sync (volts)   R´–Y, B´–Y (volts)   Setup
    SMPTE, EBU N10       +0.700      –0.300         ±0.350               0% setup on Y
    525-Line Betacam1    +0.714      –0.286         ±0.467               7.5% setup on Y only
    625-Line Betacam1    +0.700      –0.300         ±0.350               0% setup on Y
    525-Line MII2        +0.700      –0.300         ±0.324               7.5% setup on Y only
    625-Line MII2        +0.700      –0.300         ±0.350               0% setup on Y

    All formats: 100% saturation; three wire = (Y + sync), (R´–Y), (B´–Y).

    Notes:
    1. Trademark of Sony Corporation.
    2. Trademark of Matsushita Corporation.

Table 5.19. Common Pro-Video Component Analog Video Formats.

    Format              G´, B´, R´ (volts)   sync (volts)   Setup
    SMPTE, EBU N10      +0.700               –0.300         0% setup on G´, B´, and R´
    NTSC (setup)        +0.714               –0.286         7.5% setup on G´, B´, and R´
    NTSC (no setup)     +0.714               –0.286         0% setup on G´, B´, and R´
    MII1                +0.700               –0.300         7.5% setup on G´, B´, and R´

    All formats: 100% saturation; three wire = (G´ + sync), B´, R´.

    Notes:
    1. Trademark of Matsushita Corporation.

Table 5.20. Common Pro-Video RGB Analog Video Formats.

                   Y              B´–Y             R´–Y
               IRE    mV     IRE     mV      IRE     mV
    White      100    714    0       0       0       0
    Yellow     89.5   639    –65.3   –466    10.6    76
    Cyan       72.3   517    22.0    157     –65.3   –466
    Green      61.8   441    –43.3   –309    –54.7   –391
    Magenta    45.7   326    43.3    309     54.7    391
    Red        35.2   251    –22.0   –157    65.3    466
    Blue       18.0   129    65.3    466     –10.6   –76
    Black      7.5    54     0       0       0       0

Table 5.21. 525-Line Betacam 100% Color Bars. Values are relative to the blanking level.

                   Y              B´–Y             R´–Y
               IRE    mV     IRE     mV      IRE     mV
    White      76.9   549    0       0       0       0
    Yellow     69.0   492    –49.0   –350    8.0     57
    Cyan       56.1   401    16.5    118     –49.0   –350
    Green      48.2   344    –32.5   –232    –41.0   –293
    Magenta    36.2   258    32.5    232     41.0    293
    Red        28.2   202    –16.5   –118    49.0    350
    Blue       15.4   110    49.0    350     –8.0    –57
    Black      7.5    54     0       0       0       0

Table 5.22. 525-Line Betacam 75% Color Bars. Values are relative to the blanking level.

                   Y              B´–Y             R´–Y
               IRE    mV     IRE     mV      IRE     mV
    White      100    700    0       0       0       0
    Yellow     89.5   626    –46.3   –324    7.5     53
    Cyan       72.3   506    15.6    109     –46.3   –324
    Green      61.8   433    –30.6   –214    –38.7   –271
    Magenta    45.7   320    30.6    214     38.7    271
    Red        35.2   246    –15.6   –109    46.3    324
    Blue       18.0   126    46.3    324     –7.5    –53
    Black      7.5    53     0       0       0       0

Table 5.23. 525-Line MII 100% Color Bars. Values are relative to the blanking level.

                   Y              B´–Y             R´–Y
               IRE    mV     IRE     mV      IRE     mV
    White      76.9   538    0       0       0       0
    Yellow     69.0   483    –34.7   –243    5.6     39
    Cyan       56.1   393    11.7    82      –34.7   –243
    Green      48.2   338    –23.0   –161    –29.0   –203
    Magenta    36.2   253    23.0    161     29.0    203
    Red        28.2   198    –11.7   –82     34.7    243
    Blue       15.4   108    34.7    243     –5.6    –39
    Black      7.5    53     0       0       0       0

Table 5.24. 525-Line MII 75% Color Bars. Values are relative to the blanking level.

Figure 5.18. VGA 15-Pin D-SUB Female Connector.
    Pin rows as drawn: 5 to 1, 10 to 6, 15 to 11.
    Pin   Function                  Signal Level   Impedance
    1     red                       0.7V           75 ohms
    2     green                     0.7V           75 ohms
    3     blue                      0.7V           75 ohms
    4     ground
    5     ground
    6     ground - red
    7     ground - green
    8     ground - blue
    9     +5V DC
    10    ground - HSYNC
    11    ground - VSYNC
    12    DDC SDA (data)            ≥ 2.4V
    13    HSYNC (horizontal sync)   ≥ 2.4V
    14    VSYNC (vertical sync)     ≥ 2.4V
    15    DDC SCL (clock)           ≥ 2.4V

    Notes:
    1. DDC = Display Data Channel.

Table 5.25. VGA Connector Signals.

Chapter 6
Digital Video Interfaces

Pro-Video Component Interfaces

Pro-video equipment, such as that used within studios, has unique requirements and therefore its own set of digital video interconnect standards. Table 6.1 lists the various pro-video parallel and serial digital interface standards.

Video Timing

Rather than digitize and transmit the blanking intervals, special sequences are inserted into the digital video stream to indicate the start of active video (SAV) and end of active video (EAV). These EAV and SAV sequences indicate when horizontal and vertical blanking is present and which field is being transmitted. They also enable the transmission of ancillary data such as digital audio, teletext, captioning, etc. during the blanking intervals.

The EAV and SAV sequences must have priority over active video data or ancillary data to ensure that correct video timing is always maintained at the receiver. The receiver decodes the EAV and SAV sequences to recover the video timing.

The video timing sequence of the encoder is controlled by three timing signals discussed in Chapter 4: H (horizontal blanking), V (vertical blanking), and F (Field 1 or Field 2). A zero-to-one transition of H triggers an EAV sequence while a one-to-zero transition triggers an SAV sequence. F and V are allowed to change only at EAV sequences.
Usually, both 8-bit and 10-bit interfaces are supported, with the 10-bit interface used to transmit 2 bits of fractional video data to minimize cumulative processing errors and to support 10-bit ancillary data.

YCbCr or R´G´B´ data may not use the 10-bit values of 0x000–0x003 and 0x3FC–0x3FF, or the 8-bit values of 0x00 and 0xFF, since they are used for timing information.

    Active         Total          Display   Frame    1× Y Sample   SDTV or   Digital Parallel              Digital Serial
    Resolution     Resolution1    Aspect    Rate     Rate (MHz)    HDTV      Standard                      Standard
    (H × V)        (H × V)        Ratio     (Hz)
    720 × 480i     858 × 525i     4:3       29.97    13.5          SDTV      BT.656, BT.799, SMPTE 125M    BT.656, BT.799
    720 × 480p     858 × 525p     4:3       59.94    27            SDTV      –                             BT.1362, SMPTE 294M
    720 × 576i     864 × 625i     4:3       25       13.5          SDTV      BT.656, BT.799                BT.656, BT.799
    720 × 576p     864 × 625p     4:3       50       27            SDTV      –                             BT.1362
    960 × 480i     1144 × 525i    16:9      29.97    18            SDTV      BT.1302, BT.1303, SMPTE 267M  BT.1302, BT.1303
    960 × 576i     1152 × 625i    16:9      25       18            SDTV      BT.1302, BT.1303              BT.1302, BT.1303
    1280 × 720p    1650 × 750p    16:9      59.94    74.176        HDTV      SMPTE 274M                    –
    1280 × 720p    1650 × 750p    16:9      60       74.25         HDTV      SMPTE 274M                    –
    1920 × 1080i   2200 × 1125i   16:9      29.97    74.176        HDTV      BT.1120, SMPTE 274M           BT.1120, SMPTE 292M
    1920 × 1080i   2200 × 1125i   16:9      30       74.25         HDTV      BT.1120, SMPTE 274M           BT.1120, SMPTE 292M
    1920 × 1080p   2200 × 1125p   16:9      59.94    148.35        HDTV      BT.1120, SMPTE 274M           –
    1920 × 1080p   2200 × 1125p   16:9      60       148.5         HDTV      BT.1120, SMPTE 274M           –
    1920 × 1080i   2376 × 1250i   16:9      25       74.25         HDTV      BT.1120                       BT.1120
    1920 × 1080p   2376 × 1250p   16:9      50       148.5         HDTV      BT.1120                       –

Table 6.1. Pro-Video Parallel and Serial Digital Interface Standards for Various Component Video Formats. 1i = interlaced, p = progressive.

The EAV and SAV sequences are shown in Table 6.2.
The status word is defined as:

    F = “0” for Field 1
    F = “1” for Field 2
    V = “1” during vertical blanking
    H = “0” at SAV
    H = “1” at EAV
    P3–P0 = protection bits

    P3 = V ⊕ H
    P2 = F ⊕ H
    P1 = F ⊕ V
    P0 = F ⊕ V ⊕ H

where ⊕ represents the exclusive-OR function. These protection bits enable 1- and 2-bit errors to be detected and 1-bit errors to be corrected at the receiver. For most progressive video systems, F is usually a “0” since there is no field information.

For 4:2:2 YCbCr data, after each SAV sequence, the stream of active data words always begins with a Cb sample, as shown in Figure 6.1. In the multiplexed sequence, the co-sited samples (those that correspond to the same point on the picture) are grouped as Cb, Y, Cr.

During blanking intervals, unless ancillary data is present, 10-bit Y or R´G´B´ values should be set to 0x040 and 10-bit CbCr values should be set to 0x200.

The receiver detects the EAV and SAV sequences by looking for the 8-bit 0xFF 0x00 0x00 preamble. The status word (optionally error corrected at the receiver, see Table 6.3) is used to recover the H, V, and F timing signals.

Ancillary Data

Ancillary data packets are used to transmit non-video information (such as digital audio, closed captioning, teletext, etc.) during the blanking intervals. A wide variety of ITU-R and SMPTE specifications describe the various ancillary data formats.

During horizontal blanking, ancillary data may be transmitted in the interval between the EAV and SAV sequences. During vertical blanking, ancillary data may be transmitted in the interval between the SAV and EAV sequences. Multiple ancillary packets may be present in a horizontal or vertical blanking interval, but they must be contiguous with each other.

                   D9 (MSB)   D8   D7   D6   D5   D4   D3   D2   D1   D0
    preamble       1          1    1    1    1    1    1    1    1    1
                   0          0    0    0    0    0    0    0    0    0
                   0          0    0    0    0    0    0    0    0    0
    status word    1          F    V    H    P3   P2   P1   P0   0    0

    8-bit data uses D9–D2; 10-bit data uses D9–D0.

Table 6.2. EAV and SAV Sequence.
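The status word definition given earlier in this section can be sketched as a short Python routine. This is a minimal illustration, not code from any standard: the function names are hypothetical, and the bit positions follow Table 6.2 (D9 = 1, then F, V, H, P3–P0, with D1 and D0 zero in 10-bit systems).

```python
# Sketch of EAV/SAV status word generation per the F, V, H and P3-P0
# definitions above. Function names are illustrative assumptions.

def status_word(f, v, h, bits=10):
    """Build the status word from the F, V, and H flags (each 0 or 1)."""
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    word8 = (1 << 7) | (f << 6) | (v << 5) | (h << 4) \
        | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0
    # In a 10-bit system the two extra LSBs (D1, D0) are zero.
    return word8 << 2 if bits == 10 else word8


def timing_sequence(f, v, h, bits=10):
    """Return the four-word EAV (h=1) or SAV (h=0) sequence."""
    all_ones = 0x3FF if bits == 10 else 0xFF
    return [all_ones, 0, 0, status_word(f, v, h, bits)]


def protection_ok(word8):
    """Check that the protection bits of an 8-bit status word are consistent."""
    f = (word8 >> 6) & 1
    v = (word8 >> 5) & 1
    h = (word8 >> 4) & 1
    return status_word(f, v, h, bits=8) == word8


if __name__ == "__main__":
    for f in (0, 1):
        for v in (0, 1):
            for h in (0, 1):
                print(f, v, h, hex(status_word(f, v, h, bits=8)))
```

The check in `protection_ok` only detects an inconsistent word; correcting single-bit errors requires a lookup such as the one in Table 6.3.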
    Received    Received F, V, H (Bits D8–D6)
    D5–D2       000   001   010   011   100   101   110   111
    0000        000   000   000   *     000   *     *     111
    0001        000   *     *     111   *     111   111   111
    0010        000   *     *     011   *     101   *     *
    0011        *     *     010   *     100   *     *     111
    0100        000   *     *     011   *     *     110   *
    0101        *     001   *     *     100   *     *     111
    0110        *     011   011   011   100   *     *     011
    0111        100   *     *     011   100   100   100   *
    1000        000   *     *     *     *     101   110   *
    1001        *     001   010   *     *     *     *     111
    1010        *     101   010   *     101   101   *     101
    1011        010   *     010   010   *     101   010   *
    1100        *     001   110   *     110   *     110   110
    1101        001   001   *     001   *     001   110   *
    1110        *     *     *     011   *     101   110   *
    1111        *     001   010   *     100   *     *     *

    Notes: * = uncorrectable error.

Table 6.3. SAV and EAV Error Correction at Decoder.

Figure 6.1. BT.656 Parallel Interface Data For One Scan Line. 480i; 4:2:2 YCbCr; 720 active samples per line; 27 MHz clock; 10-bit system. The values for 576i systems are shown in parentheses.
    Per digital line: EAV code (4 words), blanking (268 words; 280), SAV code (4 words), multiplexed co-sited Cb, Y, Cr active data (1440 words); 1716 (1728) words total.

There are two types of ancillary data formats. The older Type 1 format uses a single data ID word to indicate the type of ancillary data; the newer Type 2 format uses two words for the data ID. The general packet format is shown in Table 6.4.

Data ID (DID)

DID indicates the type of data being sent. The assignment of most of the DID values is controlled by the ITU and SMPTE to ensure equipment compatibility. A few DID values are available that don’t require registration.

Secondary ID (SDID, Type 2 Only)

SDID is also part of the data ID for Type 2 ancillary formats. The assignment of most of the SDID values is also controlled by the ITU and SMPTE to ensure equipment compatibility. A few SDID values are available that don’t require registration.

Data Block Number (DBN, Type 1 Only)

DBN is used to allow multiple ancillary packets (sharing the same DID) to be put back together at the receiver. This is the case when there are more than 255 user data words required to be transmitted, thus requiring more than one ancillary packet to be used. The DBN value increments by one for each consecutive ancillary packet.

Data Count (DC)

DC specifies the number of user data words in the packet. In 8-bit applications, it specifies the six MSBs of an 8-bit value, so the number of user data words must be an integral multiple of four.

User Data Words (UDW)

Up to 255 user data words may be present in the packet. In 8-bit applications, the number of user data words must be an integral multiple of four. Padding words may be added to ensure that an integral multiple of four user data words is present.

User data may not use the 10-bit values of 0x000–0x003 and 0x3FC–0x3FF, or the 8-bit values of 0x00 and 0xFF, since they are used for timing information.

Parallel Interfaces

25-pin Parallel Interface

This interface is used to transfer SDTV resolution 4:2:2 YCbCr data. 8-bit or 10-bit data and a clock are transferred. The individual bits are labeled D0–D9, with D9 being the most significant bit. The pin allocations for the signals are shown in Table 6.5.

Y has a nominal 10-bit range of 0x040–0x3AC. Values less than 0x040 or greater than 0x3AC may be present due to processing. During blanking, Y data should have a value of 0x040, unless other information is present.

Cb and Cr have a nominal 10-bit range of 0x040–0x3C0. Values less than 0x040 or greater than 0x3C0 may be present due to processing. During blanking, CbCr data should have a value of 0x200, unless other data is present.

Signal levels are compatible with ECL-compatible balanced drivers and receivers. The generator must have a balanced output with a maximum source impedance of 110 Ω; the signal must be 0.8–2.0V peak-to-peak measured across a 110-Ω load.
At the receiver, the transmission line is terminated by 110 ±10 Ω.

    Word                            D9 (MSB)   D8            D7–D0
    ancillary data flag (ADF)       0          0             0000 0000
                                    1          1             1111 1111
                                    1          1             1111 1111
    data ID (DID)                   not D8     even parity   value of 0000 0000 to 1111 1111
    data block number or SDID       not D8     even parity   value of 0000 0000 to 1111 1111
    data count (DC)                 not D8     even parity   value of 0000 0000 to 1111 1111
    user data word 0                value of 00 0000 0100 to 11 1111 1011 (D9–D0)
        :
    user data word N                value of 00 0000 0100 to 11 1111 1011 (D9–D0)
    checksum                        not D8     D8–D0: sum of D0–D8 of data ID through last user data word. Preset to all zeros; carry is ignored.

    8-bit data uses D9–D2; 10-bit data uses D9–D0.

Table 6.4. Ancillary Data Packet General Format.

    Pin   Signal             Pin   Signal
    1     clock              14    clock–
    2     system ground A    15    system ground B
    3     D9                 16    D9–
    4     D8                 17    D8–
    5     D7                 18    D7–
    6     D6                 19    D6–
    7     D5                 20    D5–
    8     D4                 21    D4–
    9     D3                 22    D3–
    10    D2                 23    D2–
    11    D1                 24    D1–
    12    D0                 25    D0–
    13    cable shield

Table 6.5. 25-Pin Parallel Interface Connector Pin Assignments. For 8-bit interfaces, D9–D2 are used.

27 MHz Parallel Interface

This BT.656 and SMPTE 125M interface is used for 480i and 576i systems with an aspect ratio of 4:3. Y and multiplexed CbCr information at a sample rate of 13.5 MHz are multiplexed into a single 8-bit or 10-bit data stream, at a clock rate of 27 MHz.

The 27 MHz clock signal has a clock pulse width of 18.5 ±3 ns. The positive transition of the clock signal occurs midway between data transitions with a tolerance of ±3 ns (as shown in Figure 6.2).

To permit reliable operation at interconnect lengths of 50–200 meters, the receiver must use frequency equalization, with typical characteristics shown in Figure 6.3. This example enables operation with a range of cable lengths down to zero.

36 MHz Parallel Interface

This BT.1302 and SMPTE 267M interface is used for 480i and 576i systems with an aspect ratio of 16:9.
Y and multiplexed CbCr information at a sample rate of 18 MHz are multiplexed into a single 8-bit or 10-bit data stream, at a clock rate of 36 MHz.

The 36 MHz clock signal has a clock pulse width of 13.9 ±2 ns. The positive transition of the clock signal occurs midway between data transitions with a tolerance of ±2 ns (as shown in Figure 6.4).

To permit reliable operation at interconnect lengths of 40–160 meters, the receiver must use frequency equalization, with typical characteristics shown in Figure 6.3.

Figure 6.2. 25-Pin 27 MHz Parallel Interface Waveforms.
    Clock and data timing: TW = 18.5 ±3 ns, TC = 37 ns, TD = 18.5 ±3 ns.

Figure 6.3. Example Line Receiver Equalization Characteristics for Small Signals.
    Plot of relative gain (dB, 0 to 20) versus frequency (MHz, 0.1 to 100).

Figure 6.4. 25-Pin 36 MHz Parallel Interface Waveforms.
    Clock and data timing: TW = 13.9 ±2 ns, TC = 27.8 ns, TD = 13.9 ±2 ns.

93-pin Parallel Interface

This interface is used to transfer HDTV resolution R´G´B´ data, 4:2:2 YCbCr data, or 4:2:2:4 YCbCrK data. The pin allocations for the signals are shown in Table 6.6. The most significant bits are R9, G9, and B9.

When transferring 4:2:2 YCbCr data, the green channel carries Y information and the red channel carries multiplexed CbCr information.

When transferring 4:2:2:4 YCbCrK data, the green channel carries Y information, the red channel carries multiplexed CbCr information, and the blue channel carries K (alpha keying) information.

Y has a nominal 10-bit range of 0x040–0x3AC. Values less than 0x040 or greater than 0x3AC may be present due to processing. During blanking, Y data should have a value of 0x040, unless other information is present.

Cb and Cr have a nominal 10-bit range of 0x040–0x3C0. Values less than 0x040 or greater than 0x3C0 may be present due to processing. During blanking, CbCr data should have a value of 0x200, unless other information is present.
R´G´B´ and K have a nominal 10-bit range of 0x040–0x3AC. Values less than 0x040 or greater than 0x3AC may be present due to processing. During blanking, R´G´B´ data should have a value of 0x040, unless other information is present.

Signal levels are compatible with ECL-compatible balanced drivers and receivers. The generator must have a balanced output with a maximum source impedance of 110 Ω; the signal must be 0.6–2.0V peak-to-peak measured across a 110-Ω load. At the receiver, the transmission line must be terminated by 110 ±10 Ω.

    Pins     Signals
    1        clock
    2–11     G9 to G0
    12–16    B9 to B5
    17–32    GND
    33       clock–
    34–43    G9– to G0–
    44–48    B9– to B5–
    49–53    B4 to B0
    54–63    R9 to R0
    64–78    GND
    79–83    B4– to B0–
    84–93    R9– to R0–

Table 6.6. 93-Pin Parallel Interface Connector Pin Assignments. For 8-bit interfaces, bits 9–2 are used.

74.25 and 74.176 MHz Parallel Interface

This ITU-R BT.1120 and SMPTE 274M interface is primarily used for HDTV systems.

The 74.25 or 74.176 MHz (74.25/1.001) clock signal has a clock pulse width of 6.73 ±1.48 ns. The positive transition of the clock signal occurs midway between data transitions with a tolerance of ±1 ns (as shown in Figure 6.5).

To permit reliable operation at interconnect lengths greater than 20 meters, the receiver must use frequency equalization.

148.5 and 148.35 MHz Parallel Interface

This BT.1120 and SMPTE 274M interface is used for HDTV systems.
The 148.5 or 148.35 MHz (148.5/1.001) clock signal has a clock pulse width of 3.37 ±0.74 ns. The positive transition of the clock signal occurs midway between data transitions with a tolerance of ±0.5 ns (similar to Figure 6.5).

To permit reliable operation at interconnect lengths greater than 14 meters, the receiver must use frequency equalization.

Applications

One or more parallel interfaces may be used to transfer various video formats between equipment.

4:2:2 YCbCr - Interlaced SDTV

The ITU-R BT.656 and BT.1302 parallel interfaces were developed to transfer BT.601 4:2:2 YCbCr digital video between equipment. SMPTE 125M and 267M further clarify the operation for 480i systems.

Figure 6.6 illustrates the timing for one scan line for the 4:3 aspect ratio, using a 27 MHz sample clock. Figure 6.7 shows the timing for one scan line for the 16:9 aspect ratio, using a 36 MHz sample clock. The 25-pin parallel interface is used.

4:4:4:4 YCbCrK - Interlaced SDTV

The ITU-R BT.799 and BT.1303 parallel interfaces were developed to transfer BT.601 4:4:4:4 YCbCrK digital video between equipment. K is an alpha keying signal, used to mix two video sources, discussed in Chapter 7. SMPTE RP-175 further clarifies the operation for 480i systems.

[Figure 6.5. 93-Pin 74.25 and 74.176 MHz Parallel Interface Waveforms. Clock and data timing: TW = 6.73 ±1.48 ns; TC = 13.47 ns; TD = 6.73 ±1 ns.]

[Figure 6.6. BT.656 and SMPTE 125M Parallel Interface Data for One Scan Line. 480i; 4:2:2 YCbCr; 720 active samples per line; 27 MHz clock; 10-bit system. The values for 576i systems are shown in parentheses. Each digital line of 1716 (1728) words consists of a 4-word EAV code (3FF, 000, 000, XYZ), 268 (280) words of blanking (200, 040, repeated), a 4-word SAV code, and 1440 words of co-sited Cb, Y, Cr, Y active video.]
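The word counts shown for the 27 MHz and 36 MHz scan lines can be cross-checked with simple arithmetic. The following is a minimal Python sketch rather than anything defined by BT.656 or BT.1302; the `LineFormat` class and its field names are hypothetical, and the word counts are the ones quoted in Figures 6.6 and 6.7.

```python
from dataclasses import dataclass

EAV_WORDS = 4  # end-of-active-video code length, in words
SAV_WORDS = 4  # start-of-active-video code length, in words


@dataclass(frozen=True)
class LineFormat:
    """Word budget of one digital scan line (hypothetical helper type)."""
    name: str
    clock_mhz: float     # parallel interface clock, one word per cycle
    blanking_words: int  # words between the EAV and SAV codes
    active_words: int    # multiplexed CbCr and Y words

    @property
    def total_words(self) -> int:
        return EAV_WORDS + self.blanking_words + SAV_WORDS + self.active_words

    @property
    def line_time_us(self) -> float:
        # MHz is words per microsecond, so words / MHz gives microseconds.
        return self.total_words / self.clock_mhz


FORMATS = [
    LineFormat("480i, 27 MHz", 27.0, 268, 1440),
    LineFormat("576i, 27 MHz", 27.0, 280, 1440),
    LineFormat("480i, 36 MHz", 36.0, 360, 1920),
    LineFormat("576i, 36 MHz", 36.0, 376, 1920),
]

for fmt in FORMATS:
    print(f"{fmt.name}: {fmt.total_words} words, "
          f"{fmt.line_time_us:.3f} us per line")
```

Both 576i entries work out to 64 µs per line and both 480i entries to about 63.556 µs, so the 27 MHz and 36 MHz interfaces carry the same scan line duration with different sample counts.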
[Figure 6.7. BT.1302 and SMPTE 267M Parallel Interface Data for One Scan Line. 480i; 4:2:2 YCbCr; 960 active samples per line; 36 MHz clock; 10-bit system. The values for 576i systems are shown in parentheses. Each digital line of 2288 (2304) words consists of a 4-word EAV code (3FF, 000, 000, XYZ), 360 (376) words of blanking (200, 040, repeated), a 4-word SAV code, and 1920 words of co-sited Cb, Y, Cr, Y active video.]

Two transmission links are used. Link A contains all the Y samples plus those Cb and Cr samples located at even-numbered sample points. Link B contains samples from the keying channel and the Cb and Cr samples from the odd-numbered sample points. Although Link A is commonly referred to as 4:2:2 and Link B as 2:2:4, Link A is not a true 4:2:2 signal, since the CbCr data was sampled at 13.5 MHz rather than 6.75 MHz.

Figure 6.8 shows the contents of links A and B when transmitting 4:4:4:4 YCbCrK video data. Figure 6.9 illustrates the contents when transmitting R´G´B´K video data. If the keying signal (K) is not present, the K sample values should have a 10-bit value of 0x3AC.

Figure 6.10 illustrates the YCbCrK timing for one scan line for the 4:3 aspect ratio, using a 27 MHz sample clock. Figure 6.11 shows the YCbCrK timing for one scan line for the 16:9 aspect ratio, using a 36 MHz sample clock. Two 25-pin parallel interfaces are used.

RGBK - Interlaced SDTV

BT.799 and BT.1303 also support transferring BT.601 R´G´B´K digital video between equipment. For additional information, see the 4:4:4:4 YCbCrK interface. SMPTE RP-175 further clarifies the operation for 480i systems. The G´ samples are sent in the Y locations, the R´ samples are sent in the Cr locations, and the B´ samples are sent in the Cb locations.

4:2:2 YCbCr - Progressive SDTV

ITU-R BT.1362 defines two 10-bit 4:2:2 YCbCr data streams (Figure 6.12), using a 27 MHz sample clock.
SMPTE 294M further clarifies the operation for 480p systems. Table 6.7 shows which stream carries each scan line.

4:2:2 YCbCr - Interlaced HDTV

The ITU-R BT.1120 parallel interface was developed to transfer interlaced HDTV 4:2:2 YCbCr digital video between equipment. SMPTE 274M further clarifies the operation for 29.97 and 30 Hz systems.

Figure 6.13 illustrates the timing for one scan line for the 1920 × 1080i active resolutions. The 93-pin parallel interface is used with a sample clock rate of 74.25 MHz (25 or 30 Hz frame rate) or 74.176 MHz (29.97 Hz frame rate).

4:2:2:4 YCbCrK - Interlaced HDTV

BT.1120 also supports transferring HDTV 4:2:2:4 YCbCrK digital video between equipment. SMPTE 274M further clarifies the operation for 29.97 and 30 Hz systems.

Figure 6.14 illustrates the timing for one scan line for the 1920 × 1080i active resolutions. The 93-pin parallel interface is used with a sample clock rate of 74.25 MHz (25 or 30 Hz frame rate) or 74.176 MHz (29.97 Hz frame rate).

RGB - Interlaced HDTV

BT.1120 also supports transferring HDTV R´G´B´ digital video between equipment. SMPTE 274M further clarifies the operation for 29.97 and 30 Hz systems.

Figure 6.15 illustrates the timing for one scan line for the 1920 × 1080i active resolutions. The 93-pin parallel interface is used with a sample clock rate of 74.25 MHz (25 or 30 Hz frame rate) or 74.176 MHz (29.97 Hz frame rate).

[Figure 6.8. Link Content Representation for YCbCrK Video Signals. Y, Cb, Cr, and K samples at sample numbers 0–7, divided between Link A and Link B.]

[Figure 6.9. Link Content Representation for R´G´B´K Video Signals. G, B, R, and K samples at sample numbers 0–7, divided between Link A and Link B.]
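The division of 4:4:4:4 YCbCrK samples between the two links can be sketched in a few lines of Python. This is an illustration only: the function name `split_links` is hypothetical, and the Cb, Y, Cr, Y word order within each link is an assumption carried over from the BT.656 multiplex rather than something stated in the link description above.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, int, int, int]  # (Y, Cb, Cr, K) at one sample point

K_NOT_PRESENT = 0x3AC  # 10-bit K value to send when no keying signal exists


def split_links(samples: Sequence[Sample]) -> Tuple[List[int], List[int]]:
    """Split 4:4:4:4 samples into Link A and Link B word sequences.

    Link A: every Y sample plus Cb/Cr from even-numbered sample points.
    Link B: every K sample plus Cb/Cr from odd-numbered sample points.
    The Cb, Y, Cr, Y ordering is an assumption (see the lead-in).
    """
    if len(samples) % 2:
        raise ValueError("an even number of samples is required")
    link_a: List[int] = []
    link_b: List[int] = []
    for i in range(0, len(samples), 2):
        y0, cb0, cr0, k0 = samples[i]      # even-numbered sample point
        y1, cb1, cr1, k1 = samples[i + 1]  # odd-numbered sample point
        link_a += [cb0, y0, cr0, y1]
        link_b += [cb1, k0, cr1, k1]
    return link_a, link_b


# Four labelled samples: Y = 100+n, Cb = 200+n, Cr = 300+n, K = 400+n.
demo = [(100 + n, 200 + n, 300 + n, 400 + n) for n in range(4)]
link_a, link_b = split_links(demo)
print("Link A:", link_a)
print("Link B:", link_b)
```

Each link ends up with the same word count as an ordinary 4:2:2 stream, which is why two standard 25-pin interfaces can carry the signal.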
[Figure 6.10. BT.799 and SMPTE RP-175 Parallel Interface Data for One Scan Line. 480i; 4:4:4:4 YCbCrK; 720 active samples per line; 27 MHz clock; 10-bit system. The values for 576i systems are shown in parentheses. Link A and Link B each carry a 4:2:2-style stream of 1716 (1728) words per line: a 4-word EAV code, 268 (280) words of blanking, a 4-word SAV code, and 1440 active words. Link A carries Cb, Y, Cr, Y; Link B carries Cb, K, Cr, K.]

[Figure 6.11. BT.1303 Parallel Interface Data for One Scan Line. 480i; 4:4:4:4 YCbCrK; 960 active samples per line; 36 MHz clock; 10-bit system. The values for 576i systems are shown in parentheses. Link A and Link B each carry 2288 (2304) words per line: a 4-word EAV code, 360 (376) words of blanking, a 4-word SAV code, and 1920 active words.]

[Figure 6.12. BT.1362 and SMPTE 294M Parallel Data for Two Scan Lines. 480p; 4:2:2 YCbCr; 720 active samples per line; 27 MHz clock; 10-bit system. The values for 576p systems are shown in parentheses. Link A and Link B each carry a BT.656 4:2:2 stream of 1716 (1728) words per line: a 4-word EAV code, 268 (280) words of blanking, a 4-word SAV code, and 1440 active words.]
Table 6.7. BT.1362 and SMPTE 294M Scan Line Numbering and Link Assignment.

480p (525p) System
Link A   Link B   Link A   Link B
7        8        6        7
9        10       :        :
:        :        522      523
523      524      524      525
525      1        1        2
2        3        3        4
4        5        5        6

576p (625p) System
Link A   Link B   Link A   Link B
1        2        4        5
3        4        6        7
:        :        8        9
621      622      :        :
623      624      620      621
625      1        622      623
2        3        624      625

[Figure 6.13. BT.1120 and SMPTE 274M Parallel Interface Data for One Scan Line. 1080i29.97, 1080i30, 1080p59.94, and 1080p60 systems; 4:2:2 YCbCr; 1920 active samples per line; 74.176, 74.25, 148.35, or 148.5 MHz clock; 10-bit system. The values for 1080i25 and 1080p50 systems are shown in parentheses. The Y channel and the CbCr channel each carry 2200 (2640) words per line: a 4-word EAV code, 272 (712) words of blanking, a 4-word SAV code, and 1920 active words. Blanking words are 040 on the Y channel and 200 on the CbCr channel.]

[Figure 6.14. BT.1120 and SMPTE 274M Parallel Interface Data for One Scan Line. 1080i29.97, 1080i30, 1080p59.94, and 1080p60 systems; 4:2:2:4 YCbCrK; 1920 active samples per line; 74.176, 74.25, 148.35, or 148.5 MHz clock; 10-bit system. The values for 1080i25 and 1080p50 systems are shown in parentheses. The Y, CbCr, and K channels each carry 2200 (2640) words per line with the same EAV, blanking, and SAV structure as Figure 6.13.]
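The HDTV sample clock rates follow from the total samples per line shown in Figures 6.13 and 6.14. The sketch below is a hedged cross-check in Python, not part of BT.1120: the function name `sample_clock_hz` is hypothetical, and the figure of 1125 total lines per frame is an assumption taken from the 1080-line raster definition rather than from this section.

```python
from fractions import Fraction

LINES_PER_FRAME = 1125  # assumption: total lines of the 1080-line raster


def sample_clock_hz(samples_per_line: int, frame_rate_hz,
                    lines_per_frame: int = LINES_PER_FRAME) -> Fraction:
    """Sample clock = total samples per line x total lines x frame rate."""
    return Fraction(samples_per_line) * lines_per_frame * Fraction(frame_rate_hz)


# (total samples per line, frame rate in Hz) for the systems in the captions.
SYSTEMS = {
    "1080i30":    (2200, Fraction(30)),
    "1080i29.97": (2200, Fraction(30000, 1001)),  # 30 / 1.001
    "1080i25":    (2640, Fraction(25)),
    "1080p60":    (2200, Fraction(60)),
    "1080p59.94": (2200, Fraction(60000, 1001)),  # 60 / 1.001
    "1080p50":    (2640, Fraction(50)),
}

for name, (samples, rate) in SYSTEMS.items():
    mhz = float(sample_clock_hz(samples, rate)) / 1e6
    print(f"{name}: {mhz:.3f} MHz")
```

The wider 2640-sample line of the 25 and 50 Hz systems exactly compensates for the lower frame rate, which is why one pair of clock frequencies serves both families.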
[Figure 6.15. BT.1120 and SMPTE 274M Parallel Interface Data for One Scan Line. 1080i29.97, 1080i30, 1080p59.94, and 1080p60 systems; R´G´B´; 1920 active samples per line; 74.176, 74.25, 148.35, or 148.5 MHz clock; 10-bit system. The values for 1080i25 and 1080p50 systems are shown in parentheses. The green, red, and blue channels each carry 2200 (2640) words per line: a 4-word EAV code, 272 (712) words of blanking (040), a 4-word SAV code, and 1920 active words.]

4:2:2 YCbCr - Progressive HDTV

The ITU-R BT.1120 and SMPTE 274M parallel interfaces were developed to transfer progressive HDTV 4:2:2 YCbCr digital video between equipment.

Figure 6.13 illustrates the timing for one scan line for the 1920 × 1080p active resolutions. The 93-pin parallel interface is used with a sample clock rate of 148.5 MHz (24, 25, 30, 50, or 60 Hz frame rate) or 148.35 MHz (23.98, 29.97, or 59.94 Hz frame rate).

Figure 6.16 illustrates the timing for one scan line for the 1280 × 720p active resolutions. The 93-pin parallel interface is used with a sample clock rate of 74.25 MHz (24, 25, 30, 50, or 60 Hz frame rate) or 74.176 MHz (23.98, 29.97, or 59.94 Hz frame rate).

4:2:2:4 YCbCrK - Progressive HDTV

BT.1120 and SMPTE 274M also support transferring HDTV 4:2:2:4 YCbCrK digital video between equipment.

Figure 6.14 illustrates the timing for one scan line for the 1920 × 1080p active resolutions. The 93-pin parallel interface is used with a sample clock rate of 148.5 MHz (24, 25, 30, 50, or 60 Hz frame rate) or 148.35 MHz (23.98, 29.97, or 59.94 Hz frame rate).
Figure 6.17 illustrates the timing for one scan line for the 1280 × 720p active resolutions. The 93-pin parallel interface is used with a sample clock rate of 74.25 MHz (24, 25, 30, 50, or 60 Hz frame rate) or 74.176 MHz (23.98, 29.97, or 59.94 Hz frame rate).

RGB - Progressive HDTV

BT.1120 and SMPTE 274M also support transferring HDTV R´G´B´ digital video between equipment.

Figure 6.15 illustrates the timing for one scan line for the 1920 × 1080p active resolutions. The 93-pin parallel interface is used with a sample clock rate of 148.5 MHz (24, 25, 30, 50, or 60 Hz frame rate) or 148.35 MHz (23.98, 29.97, or 59.94 Hz frame rate).

Figure 6.18 illustrates the timing for one scan line for the 1280 × 720p active resolutions. The 93-pin parallel interface is used with a sample clock rate of 74.25 MHz (24, 25, 30, 50, or 60 Hz frame rate) or 74.176 MHz (23.98, 29.97, or 59.94 Hz frame rate).

Serial Interfaces

The parallel formats can be converted to a serial format (Figure 6.19), allowing data to be transmitted using a 75-Ω coaxial cable or optical fiber.

For cable interconnect, the generator has an unbalanced output with a source impedance of 75 Ω; the signal must be 0.8V ±10% peak-to-peak measured across a 75-Ω load. The receiver has an input impedance of 75 Ω.

In an 8-bit environment, before serialization, the 0x00 and 0xFF codes during EAV and SAV are expanded to 10-bit values of 0x000 and 0x3FF, respectively. All other 8-bit data is appended with two least significant “0” bits before serialization.

The 10 bits of data are serialized (LSB first) and processed using a scrambled and polarity-free NRZI algorithm:

$G(x) = (x^{9} + x^{4} + 1)(x + 1)$

The input signal to the scrambler (Figure 6.20) uses positive logic (the highest voltage represents a logical one; lowest voltage represents a logical zero).

The formatted serial data is output at the 10× sample clock rate.
Since the parallel clock may contain large amounts of jitter, deriving the 10× sample clock directly from an unfiltered parallel clock may result in excessive signal jitter.

[Figure 6.16. SMPTE 274M Parallel Interface Data for One Scan Line. 720p59.94 and 720p60 systems; 4:2:2 YCbCr; 1280 active samples per line; 74.176 or 74.25 MHz clock; 10-bit system. The values for 720p50 systems are shown in parentheses. Timing reference: SMPTE 296M H signal. The Y channel and the CbCr channel each carry 1650 (1980) words per line: a 4-word EAV code, 362 (692) words of blanking, a 4-word SAV code, and 1280 active words.]

[Figure 6.17. SMPTE 274M Parallel Interface Data for One Scan Line. 720p59.94 and 720p60 systems; 4:2:2:4 YCbCrK; 1280 active samples per line; 74.176 or 74.25 MHz clock; 10-bit system. The values for 720p50 systems are shown in parentheses. Timing reference: SMPTE 296M H signal. The Y, CbCr, and K channels each carry 1650 (1980) words per line with the same structure as Figure 6.16.]
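The scrambled NRZI coding used by the serial interfaces, G(x) = (x^9 + x^4 + 1)(x + 1), can be modelled bit by bit. The following Python sketch is illustrative only and makes several assumptions that are not taken from the text: the function names are hypothetical, the registers start at zero, and the polynomial terms are mapped to feedback delays of 4 and 9 bits. Some conventions map the same polynomial to mirrored tap positions, so a bit-exact implementation should be checked against Figure 6.20 and the relevant standard.

```python
from typing import Iterable, List, Tuple


def word_to_bits_lsb_first(word: int, width: int = 10) -> List[int]:
    """Serialize one parallel word, least significant bit first."""
    return [(word >> i) & 1 for i in range(width)]


def scramble_nrzi(bits: Iterable[int], taps: Tuple[int, ...] = (4, 9)) -> List[int]:
    """Self-synchronizing scrambler followed by NRZ-to-NRZI conversion."""
    history = [0] * max(taps)  # history[k - 1] is the scrambled bit k bits ago
    level = 0                  # current NRZI output level
    out = []
    for bit in bits:
        scrambled = bit
        for tap in taps:
            scrambled ^= history[tap - 1]
        history = [scrambled] + history[:-1]
        level ^= scrambled     # x + 1 stage: a one toggles the output level
        out.append(level)
    return out


def descramble_nrzi(bits: Iterable[int], taps: Tuple[int, ...] = (4, 9)) -> List[int]:
    """NRZI-to-NRZ conversion followed by the inverse scrambler."""
    history = [0] * max(taps)
    previous = 0
    out = []
    for level in bits:
        scrambled = level ^ previous  # a transition decodes as a one
        previous = level
        data = scrambled
        for tap in taps:
            data ^= history[tap - 1]
        history = [scrambled] + history[:-1]
        out.append(data)
    return out


words = [0x3FF, 0x000, 0x000, 0x200, 0x040]
tx_bits = [b for w in words for b in word_to_bits_lsb_first(w)]
line_bits = scramble_nrzi(tx_bits)
inverted = [b ^ 1 for b in line_bits]  # simulate a polarity inversion
print(descramble_nrzi(line_bits) == tx_bits)
print(descramble_nrzi(inverted)[10:] == tx_bits[10:])
```

Because only transitions carry information, an inverted line signal decodes to the same data once the first few bits have flushed through the descrambler, which is what "polarity-free" refers to.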
[Figure 6.18. SMPTE 274M Parallel Interface Data for One Scan Line. 720p59.94 and 720p60 systems; R´G´B´; 1280 active samples per line; 74.176 or 74.25 MHz clock; 10-bit system. The values for 720p50 systems are shown in parentheses. Timing reference: SMPTE 296M H signal. The green, red, and blue channels each carry 1650 (1980) words per line: a 4-word EAV code, 362 (692) words of blanking, a 4-word SAV code, and 1280 active words.]

[Figure 6.19. Serial Interface Block Diagram. Transmit path: 10-bit parallel 4:2:2 video and the sample clock feed a shift register and scrambler clocked by a 10× PLL, driving 75-ohm coax. Receive path: descrambler, shift register, SAV and EAV detect, and a PLL with divide-by-10 to recover the sample clock and 10-bit parallel 4:2:2 video.]

[Figure 6.20. Typical Scrambler Circuit. Serial data in (NRZ) passes through a nine-stage G(x) = x^9 + x^4 + 1 section followed by a one-stage G(x) = x + 1 section to produce the encoded data out (NRZI).]

[Figure 6.21. Typical Descrambler Circuit. Encoded data in (NRZI) passes through the inverse stages to produce serial data out (NRZ).]

At the receiver, phase-lock synchronization is done by detecting the EAV and SAV sequences. The PLL is continuously adjusted slightly each scan line to ensure that these patterns are detected and to avoid bit slippage. The recovered 10× sample clock is divided by ten to generate the sample clock, although care must be taken not to mask word-related jitter components. The serial data is low- and high-frequency equalized, inverse scrambling is performed (Figure 6.21), and the data is deserialized.

270 Mbps Serial Interface

This BT.656 and SMPTE 259M interface (also called SDI) converts a 27 MHz parallel stream into a 270 Mbps serial stream. The 10× PLL generates a 270 MHz clock from the 27 MHz clock signal. This interface is primarily used for 480i and 576i 4:3 systems.

360 Mbps Serial Interface

This BT.1302 and SMPTE 259M interface converts a 36 MHz parallel stream into a 360 Mbps serial stream. The 10× PLL generates a 360 MHz clock from the 36 MHz clock signal. This interface is primarily used for 480i and 576i 16:9 systems.

540 Mbps Serial Interface

This SMPTE 344M interface converts a 54 MHz parallel stream, or two 27 MHz parallel streams, into a 540 Mbps serial stream. The 10× PLL generates a 540 MHz clock from the 54 MHz clock signal. This interface is primarily used for 480p and 576p 4:3 systems.

1.485 and 1.4835 Gbps Serial Interface

This BT.1120 and SMPTE 292M interface multiplexes two 74.25 or 74.176 (74.25/1.001) MHz parallel streams (Y and CbCr) into a single 1.485 or 1.4835 Gbps serial stream. A 20× PLL generates a 1.485 GHz clock from the 74.25 or 74.176 MHz clock signal. This interface is used for HDTV systems.

Before multiplexing the two parallel streams together, line number and CRC information (Table 6.8) is added to each stream after each EAV sequence.

Table 6.8. Line Number and CRC Data.

        D9 (MSB)   D8      D7      D6      D5      D4      D3      D2      D1      D0
LN0     NOT D8     L6      L5      L4      L3      L2      L1      L0      0       0
LN1     NOT D8     0       0       0       L10     L9      L8      L7      0       0
CRC0    NOT D8     crc8    crc7    crc6    crc5    crc4    crc3    crc2    crc1    crc0
CRC1    NOT D8     crc17   crc16   crc15   crc14   crc13   crc12   crc11   crc10   crc9

The CRC is used to detect errors in the active video and EAV. It consists of two words generated by the polynomial:

$\mathrm{CRC} = x^{18} + x^{5} + x^{4} + 1$

The initial value is set to zero. The calculation starts with the first active line word and ends at the last word of the line number (LN1).

Applications

One or more serial interfaces may be used to transfer various video formats between equipment.

RGBK - Interlaced SDTV

BT.799 and BT.1303 also define a R´G´B´K serial interface.
The two 10-bit R´G´B´K parallel streams in Figure 6.10 are serialized using two 270 or 360 Mbps serial interfaces.

4:2:2 YCbCr - Progressive SDTV

ITU-R BT.1362 and SMPTE 294M also define a 4:2:2 YCbCr serial interface. The two 10-bit 4:2:2 YCbCr parallel streams in Figure 6.12 are serialized using two 270 Mbps serial interfaces.

4:2:2 YCbCr - Interlaced HDTV

BT.1120 and SMPTE 292M also define a 4:2:2 YCbCr serial interface. The two 10-bit 4:2:2 YCbCr parallel streams shown in Figure 6.13 are multiplexed together, then serialized using a 1.485 or 1.4835 Gbps serial interface.

Pro-Video Composite Interfaces

Digital composite video is essentially a digital version of a composite analog (M) NTSC or (B, D, G, H, I) PAL video signal. The sample clock rate is four times FSC: about 14.32 MHz for (M) NTSC and about 17.73 MHz for (B, D, G, H, I) PAL.

Usually, both 8-bit and 10-bit interfaces are supported, with the 10-bit interface used to transmit 2 bits of fractional video data to minimize cumulative processing errors and to support 10-bit ancillary data.

Table 6.9 lists the digital composite levels. Video data may not use the 10-bit values of 0x000–0x003 and 0x3FC–0x3FF, or the 8-bit values of 0x00 and 0xFF, since they are used for timing information.

NTSC Video Timing

There are 910 total samples per scan line, as shown in Figure 6.22. Horizontal count 0 corresponds to the start of active video, and a horizontal count of 768 corresponds to the start of horizontal blanking.

Sampling is along the ±I and ±Q axes (33°, 123°, 213°, and 303°). The sampling phase at horizontal count 0 of line 10, Field 1 is on the +I axis (123°).

The sync edge values, and the horizontal counts at which they occur, are defined as shown in Figure 6.23 and Tables 6.10–6.12. The 8-bit values for one color burst cycle are 45, 83, 75, and 37. The burst envelope starts at horizontal count 857, and lasts for 43 clock cycles, as shown in Table 6.10. Note that the peak amplitudes of the burst are not sampled.
To maintain zero SCH phase, horizontal count 784 occurs 25.6 ns (33° of the subcarrier phase) before the 50% point of the falling edge of horizontal sync, and horizontal count 785 occurs 44.2 ns (57° of the subcarrier phase) after the 50% point of the falling edge of horizontal sync.

PAL Video Timing

There are 1135 total samples per line, except for two lines per frame which have 1137 samples per line, making a total of 709,379 samples per frame. Figure 6.24 illustrates the typical line timing. Horizontal count 0 corresponds to the start of active video, and a horizontal count of 948 corresponds to the start of horizontal blanking.

Sampling is along the ±U and ±V axes (0°, 90°, 180°, and 270°), with the sampling phase at horizontal count 0 of line 1, Field 1 on the +V axis (90°).

The 8-bit color burst values are 95, 64, 32, and 64, continuously repeated. The swinging burst causes the peak burst (32 and 95) and zero burst (64) samples to change places. The burst envelope starts at horizontal count 1058, and lasts for 40 clock cycles.

Unlike (M) NTSC, sampling is not H-coherent, so the position of the sync pulses changes from line to line. Zero SCH phase is defined when alternate burst samples have a value of 64.

Ancillary Data

Ancillary data packets are used to transmit information (such as digital audio, closed captioning, and teletext data) during the blanking intervals. ITU-R BT.1364 and SMPTE 291M describe the ancillary data formats.

The ancillary data formats are the same as for digital component video, discussed earlier in this chapter. However, instead of a 3-word preamble, a one-word ancillary data flag is used, with a 10-bit value of 0x3FC. There may be multiple ancillary data flags following the TRS-ID, with each flag identifying the beginning of another ancillary packet.

Ancillary data may be present within the following word number boundaries (see Figures 6.25 through 6.30).
                            NTSC                 PAL
horizontal sync period      795–849              972–1035
equalizing pulse periods    795–815, 340–360     972–994, 404–426
vertical sync periods       795–260, 340–715     972–302, 404–869

Table 6.9. 10-Bit Video Levels for Digital Composite Video Signals.

Video Level    (M) NTSC    (B, D, G, H, I) PAL
peak chroma    972         1040 (limited to 1023)
white          800         844
peak burst     352         380
black          280         256
blank          240         256
peak burst     128         128
peak chroma    104         128
sync           16          4

[Figure 6.22. Digital Composite (M) NTSC Analog and Digital Timing Relationship. Digital active line: 768 samples (0–767). Digital blanking: 142 samples (768–909). Total line: 910 samples (0–909).]

[Figure 6.23. Digital Composite (M) NTSC Sync Timing. The horizontal counts are shown with the corresponding 8-bit sample values in parentheses: 768 (60), 784 (41), 785 (17), 787 (4). The 50% point of the sync edge falls between counts 784 and 785.]

Table 6.10a. Digital Values During the Horizontal Blanking Intervals for Digital Composite (M) NTSC Video Signals.

           8-bit Hex Value            10-bit Hex Value
Sample     Fields 1, 3  Fields 2, 4   Fields 1, 3  Fields 2, 4
768–782    3C           3C            0F0          0F0
783        3A           3A            0E9          0E9
784        29           29            0A4          0A4
785        11           11            044          044
786        04           04            011          011
787–849    04           04            010          010
850        06           06            017          017
851        17           17            05C          05C
852        2F           2F            0BC          0BC
853        3C           3C            0EF          0EF
854–856    3C           3C            0F0          0F0
857        3C           3C            0F0          0F0
858        3D           3B            0F4          0EC
859        37           41            0DC          104
860        36           42            0D6          10A
861        4B           2D            12C          0B4
862        49           2F            123          0BD
863        25           53            096          14A
864        2D           4B            0B3          12D
865        53           25            14E          092
866        4B           2D            12D          0B3
867        25           53            092          14E
868        2D           4B            0B3          12D
869        53           25            14E          092
870        4B           2D            12D          0B3
871        25           53            092          14E
872        2D           4B            0B3          12D
873        53           25            14E          092
Table 6.10b. Digital Values During the Horizontal Blanking Intervals for Digital Composite (M) NTSC Video Signals.

           8-bit Hex Value            10-bit Hex Value
Sample     Fields 1, 3  Fields 2, 4   Fields 1, 3  Fields 2, 4
874        4B           2D            12D          0B3
875        25           53            092          14E
876        2D           4B            0B3          12D
877        53           25            14E          092
878        4B           2D            12D          0B3
879        25           53            092          14E
880        2D           4B            0B3          12D
881        53           25            14E          092
882        4B           2D            12D          0B3
883        25           53            092          14E
884        2D           4B            0B3          12D
885        53           25            14E          092
886        4B           2D            12D          0B3
887        25           53            092          14E
888        2D           4B            0B3          12D
889        53           25            14E          092
890        4B           2D            12D          0B3
891        25           53            092          14E
892        2D           4B            0B3          12D
893        53           25            14E          092
894        4A           2E            129          0B7
895        2A           4E            0A6          13A
896        33           45            0CD          113
897        44           34            112          0CE
898        3F           39            0FA          0E6
899        3B           3D            0EC          0F4
900–909    3C           3C            0F0          0F0

Table 6.11. Equalizing Pulse Values During the Vertical Blanking Intervals for Digital Composite (M) NTSC Video Signals. The 8-bit and 10-bit values are the same for both field pairs; only the sample numbers differ.

Sample (Fields 1, 3)   Sample (Fields 2, 4)   8-bit Hex Value   10-bit Hex Value
768–782                313–327                3C                0F0
783                    328                    3A                0E9
784                    329                    29                0A4
785                    330                    11                044
786                    331                    04                011
787–815                332–360                04                010
816                    361                    06                017
817                    362                    17                05C
818                    363                    2F                0BC
819                    364                    3C                0EF
820–327                365–782                3C                0F0
328                    783                    3A                0E9
329                    784                    29                0A4
330                    785                    11                044
331                    786                    04                011
332–360                787–815                04                010
361                    816                    06                017
362                    817                    17                05C
363                    818                    2F                0BC
364                    819                    3C                0EF
365–782                820–327                3C                0F0
Table 6.12. Serration Pulse Values During the Vertical Blanking Intervals for Digital Composite (M) NTSC Video Signals. The 8-bit and 10-bit values are the same for both field pairs; only the sample numbers differ.

Sample (Fields 1, 3)   Sample (Fields 2, 4)   8-bit Hex Value   10-bit Hex Value
782                    327                    3C                0F0
783                    328                    3A                0E9
784                    329                    29                0A4
785                    330                    11                044
786                    331                    04                011
787–260                332–715                04                010
261                    716                    06                017
262                    717                    17                05C
263                    718                    2F                0BC
264                    719                    3C                0EF
265–327                720–782                3C                0F0
328                    783                    3A                0E9
329                    784                    29                0A4
330                    785                    11                044
331                    786                    04                011
332–715                787–260                04                010
716                    261                    06                017
717                    262                    17                05C
718                    263                    2F                0BC
719                    264                    3C                0EF
720–782                265–327                3C                0F0

User data may not use the 10-bit values of 0x000–0x003 and 0x3FC–0x3FF, or the 8-bit values of 0x00 and 0xFF, since they are used for timing information.

Parallel Interface

The SMPTE 244M 25-pin parallel interface is based on that used for 27 MHz 4:2:2 digital component video (Table 6.5), except for the timing differences. This interface is used to transfer SDTV resolution digital composite data. 8-bit or 10-bit data and a 4× FSC clock are transferred.

Signal levels are compatible with ECL-compatible balanced drivers and receivers. The generator must have a balanced output with a maximum source impedance of 110 Ω; the signal must be 0.8–2.0V peak-to-peak measured across a 110-Ω load. At the receiver, the transmission line must be terminated by 110 ±10 Ω.

The clock signal is a 4× FSC square wave, with a clock pulse width of 35 ±5 ns for (M) NTSC or 28 ±5 ns for (B, D, G, H, I) PAL. The positive transition of the clock signal occurs midway between data transitions with a tolerance of ±5 ns (as shown in Figure 6.31).

To permit reliable operation at interconnect lengths of 50–200 meters, the receiver must use frequency equalization, with typical characteristics shown in Figure 6.3.
This example enables operation with a range of cable lengths down to zero.

[Figure 6.24. Digital Composite (B, D, G, H, I) PAL Analog and Digital Timing Relationship. Digital active line: 948 samples (0–947). Digital blanking: 187 samples (948–1134). Total line: 1135 samples (0–1134).]

Serial Interface

The parallel format can be converted to a SMPTE 259M serial format (Figure 6.32), allowing data to be transmitted using a 75-Ω coaxial cable (or optical fiber). This interface converts the 14.32 or 17.73 MHz parallel stream into a 143 or 177 Mbps serial stream. The 10× PLL generates the 143 or 177 MHz clock from the 14.32 or 17.73 MHz clock signal.

For cable interconnect, the generator has an unbalanced output with a source impedance of 75 Ω; the signal must be 0.8V ±10% peak-to-peak measured across a 75-Ω load. The receiver has an input impedance of 75 Ω.

The 10 bits of data are serialized (LSB first) and processed using a scrambled and polarity-free NRZI algorithm:

$G(x) = (x^{9} + x^{4} + 1)(x + 1)$

This algorithm is the same as used for digital component video discussed earlier. In an 8-bit environment, 8-bit data is appended with two least significant “0” bits before serialization.

The input signal to the scrambler (Figure 6.20) uses positive logic (the highest voltage represents a logical one; lowest voltage represents a logical zero).

The formatted serial data is output at the 40× FSC rate.

At the receiver, phase-lock synchronization is done by detecting the TRS-ID sequences. The PLL is continuously adjusted slightly each scan line to ensure that these patterns are detected and to avoid bit slippage. The recovered 10× clock is divided by ten to generate the 4× FSC sample clock. The serial data is low- and high-frequency equalized, inverse scrambling is performed (Figure 6.21), and the data is deserialized.

[Figure 6.25. (M) NTSC TRS-ID and Ancillary Data Locations During Horizontal Sync Intervals. TRS-ID at horizontal counts 790–794; optional ancillary data at 795–849.]

[Figure 6.26. (M) NTSC TRS-ID and Ancillary Data Locations During Vertical Sync Intervals. TRS-ID at 790–794; optional ancillary data at 795–260 and 340–715.]

[Figure 6.27. (M) NTSC TRS-ID and Ancillary Data Locations During Equalizing Pulse Intervals. TRS-ID at 790–794; optional ancillary data at 795–815 and 340–360.]

[Figure 6.28. (B, D, G, H, I) PAL TRS-ID and Ancillary Data Locations During Horizontal Sync Intervals. TRS-ID at horizontal counts 967–971; optional ancillary data at 972–1035.]

[Figure 6.29. (B, D, G, H, I) PAL TRS-ID and Ancillary Data Locations During Vertical Sync Intervals. TRS-ID at 967–971; optional ancillary data at 972–302 and 404–869.]

[Figure 6.30. (B, D, G, H, I) PAL TRS-ID and Ancillary Data Locations During Equalizing Pulse Intervals. TRS-ID at 967–971; optional ancillary data at 972–994 and 404–426.]

TRS-ID

When using the serial interface, a special five-word sequence, known as the TRS-ID, must be inserted into the digital video stream during the horizontal sync time. The TRS-ID is present only following sync leading edges which identify a horizontal transition, and occupies horizontal counts 790–794, inclusive (NTSC) or 967–971, inclusive (PAL). Table 6.13 shows the TRS-ID format; Figures 6.25 through 6.30 show the TRS-ID locations for digital composite (M) NTSC and (B, D, G, H, I) PAL video signals.

The line number ID word at horizontal count 794 (NTSC) or 971 (PAL) is defined as shown in Table 6.14.

PAL requires the reset of the TRS-ID position relative to horizontal sync once per field on only one of lines 625–4 and 313–317 due to the 25 Hz offset. All lines have 1135 samples except the two lines used for reset, which have 1137 samples.
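As a worked check, the digital composite sample counts follow directly from the 4× FSC sample rate. The subcarrier frequencies used below are the standard (M) NTSC and (B, D, G, H, I) PAL values; they are not quoted in this section and are included here as outside reference figures.

```latex
\begin{align*}
\text{(M) NTSC:}\quad
  & 4 f_{SC} = 4 \times 3.579545\ \text{MHz} \approx 14.318\ \text{MHz}, \\
  & f_{SC} = 227.5\, f_H \;\Rightarrow\; \frac{4 f_{SC}}{f_H} = 4 \times 227.5 = 910\ \text{samples per line}. \\[4pt]
\text{(B, D, G, H, I) PAL:}\quad
  & 4 f_{SC} = 4 \times 4.43361875\ \text{MHz} = 17.734475\ \text{MHz}, \\
  & \frac{17\,734\,475\ \text{Hz}}{25\ \text{Hz}} = 709\,379
    = 623 \times 1135 + 2 \times 1137\ \text{samples per frame}.
\end{align*}
```

The PAL frame therefore cannot be divided into 625 equal lines of whole samples, which is why two lines per frame carry 1137 samples.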
The two additional samples are numbered 1135 and 1136, and occur just prior to the first active picture sample (sample 0).

Due to the 25 Hz offset, the samples occur slightly earlier each line. Initial determination of the TRS-ID position should be done on line 1, Field 1, or a nearby line. The TRS-ID location always starts at sample 967, but the distance from the leading edge of sync varies due to the 25 Hz offset.

[Figure 6.31. Digital Composite Video Parallel Interface Waveforms. Clock and data timing: TW = 35 ±5 ns (M) NTSC, 28 ±5 ns (B, D, G, H, I) PAL; TC = 69.84 ns (M) NTSC, 56.39 ns (B, D, G, H, I) PAL; TD = 35 ±5 ns (M) NTSC, 28 ±5 ns (B, D, G, H, I) PAL.]

[Figure 6.32. Serial Interface Block Diagram. Transmit path: 10-bit digital composite video with TRS-ID insertion and the 4× FSC clock feed a shift register and scrambler clocked by a 10× PLL (40× FSC), driving 75-ohm coax. Receive path: descrambler, shift register, TRS-ID detect, and a 40× FSC PLL with divide-by-10 to recover the 4× FSC clock and 10-bit digital composite video.]

Table 6.13. TRS-ID Format.

                 D9 (MSB)   D8   D7   D6   D5   D4   D3   D2   D1   D0
TRS word 0       1          1    1    1    1    1    1    1    1    1
TRS word 1       0          0    0    0    0    0    0    0    0    0
TRS word 2       0          0    0    0    0    0    0    0    0    0
TRS word 3       0          0    0    0    0    0    0    0    0    0
line number ID   NOT D8     EP   line number ID (D7–D0)

Note: EP = even parity for D0–D7.

Table 6.14. Line Number ID Word at Horizontal Count 794 (NTSC) or 971 (PAL).

D2 D1 D0   (M) NTSC                 (B, D, G, H, I) PAL
0  0  0    line 1–263 field 1       line 1–313 field 1
0  0  1    line 264–525 field 2     line 314–625 field 2
0  1  0    line 1–263 field 3       line 1–313 field 3
0  1  1    line 264–525 field 4     line 314–625 field 4
1  0  0    not used                 line 1–313 field 5
1  0  1    not used                 line 314–625 field 6
1  1  0    not used                 line 1–313 field 7
1  1  1    not used                 line 314–625 field 8

D7–D3        (M) NTSC                      (B, D, G, H, I) PAL
1 ≤ x ≤ 30   line number 1–30 [264–293]    line number 1–30 [314–343]
x = 31       line number ≥ 31 [294]        line number ≥ 31 [344]
x = 0        not used                      not used
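Tables 6.13 and 6.14 together determine the five TRS-ID words for any line. The following Python sketch shows one way to assemble them. It is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, D9 is taken to be the complement of D8, and EP is taken to be the bit that makes the number of ones in D0–D8 even.

```python
from typing import List

ODD_FIELD_LAST_LINE = {"ntsc": 263, "pal": 313}  # last line of fields 1, 3, ...
TOTAL_LINES = {"ntsc": 525, "pal": 625}
FIELD_COUNT = {"ntsc": 4, "pal": 8}              # field sequence length


def parity_bit(value: int) -> int:
    """Bit that makes the count of ones in value plus this bit even."""
    return bin(value).count("1") & 1


def line_number_id(standard: str, field: int, frame_line: int) -> int:
    """Build the 10-bit line number ID word (fifth TRS-ID word)."""
    split = ODD_FIELD_LAST_LINE[standard]
    if not 1 <= field <= FIELD_COUNT[standard]:
        raise ValueError("field out of range for this standard")
    if field % 2:  # odd fields cover frame lines 1..split
        if not 1 <= frame_line <= split:
            raise ValueError("line does not belong to an odd field")
        line_in_field = frame_line
    else:          # even fields cover the remaining frame lines
        if not split < frame_line <= TOTAL_LINES[standard]:
            raise ValueError("line does not belong to an even field")
        line_in_field = frame_line - split
    d7_d3 = min(line_in_field, 31)       # 31 means "line 31 or later"
    low8 = (d7_d3 << 3) | (field - 1)    # D7-D3 line code, D2-D0 field code
    d8 = parity_bit(low8)                # EP (assumed even over D0-D8)
    d9 = d8 ^ 1                          # assumed complement of D8
    return (d9 << 9) | (d8 << 8) | low8


def trs_id(standard: str, field: int, frame_line: int) -> List[int]:
    """Return the five TRS-ID words for one line."""
    return [0x3FF, 0x000, 0x000, 0x000,
            line_number_id(standard, field, frame_line)]


print([f"{w:03X}" for w in trs_id("ntsc", 1, 1)])
print([f"{w:03X}" for w in trs_id("pal", 8, 314)])
```

Because D9 and D8 always differ, the line number ID word can never fall in the 0x000–0x003 or 0x3FC–0x3FF ranges reserved for timing information.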
Pro-Video Transport Interfaces

Serial Data Transport Interface (SDTI)

SMPTE 305M and ITU-R BT.1381 define a Serial Data Transport Interface (SDTI) that enables transferring data between equipment. The physical layer uses the 270 or 360 Mbps BT.656, BT.1302, and SMPTE 259M digital component video serial interface. Figure 6.33 illustrates the signal format.

A 53-word header is inserted immediately after the EAV sequence, specifying the source, destination, and data format. Table 6.15 illustrates the header contents.

The payload data is defined within BT.1381 and by other application-specific standards such as SMPTE 326M. It may consist of MPEG-2 program or transport streams, DV streams, etc., and uses either 8-bit words plus even parity and D8, or 9-bit words plus D8.

Line Number

The line number specifies a value of 1–525 (480i systems) or 1–625 (576i systems). L0 is the least significant bit.

Line Number CRC

The line number CRC applies to the data ID through the line number, for the entire 10 bits. C0 is the least significant bit. It is an 18-bit value, with an initial value set to all ones:

CRC = x^18 + x^5 + x^4 + 1

Code and AAI

The 4-bit code value (CD3–CD0) specifies the length of the payload (the user data contained between the SAV and EAV sequences):

0000    4:2:2 YCbCr video data
0001    1440 word payload (uses 270 Mbps interface)
0010    1920 word payload (uses 360 Mbps interface)
1000    143 Mbps digital composite video

The 4-bit authorized address identifier (AAI) value, AAI3–AAI0, specifies the format of the destination and source addresses:

0000    unspecified format
0001    IPv6 address

Destination and Source Addresses

These specify the address of the source and destination devices. A universal address is indicated when all address bits are zero and AAI3–AAI0 = 0000.

Block Type

The block type value specifies the segmentation of the payload.
BL7–BL6 indicate the payload block structure:

00    fixed block size without ECC
01    fixed block size with ECC
10    unassigned
11    variable block size

BL5–BL0 indicate the segmentation for fixed block sizes. Variable block sizes are indicated by BL7–BL0 having a value of 11000001. The ECC format is application-dependent.

Figure 6.33. SDTI Signal Format. (Each line carries EAV, header, SAV, and user data (payload), in that order.)

Payload CRC Flag

The CRCF bit indicates whether or not the payload CRC is present at the end of the payload:

0    no CRC
1    CRC present

Header CRC

The header CRC applies to the code and AAI word through the last reserved data word, for the entire 10 bits. C0 is the least significant bit. It is an 18-bit value, with an initial value set to all ones:

CRC = x^18 + x^5 + x^4 + 1

                            D9     D8    D7     D6     D5     D4     D3     D2     D1     D0
                            (MSB)
ancillary data flag (ADF)   0      0     0      0      0      0      0      0      0      0
                            1      1     1      1      1      1      1      1      1      1
                            1      1     1      1      1      1      1      1      1      1
data ID (DID)               ~D8    EP    0      1      0      0      0      0      0      0
SDID                        ~D8    EP    0      0      0      0      0      0      0      1
data count (DC)             ~D8    EP    0      0      1      0      1      1      1      0
line number                 ~D8    EP    L7     L6     L5     L4     L3     L2     L1     L0
                            ~D8    EP    0      0      0      0      0      0      L9     L8
line number CRC             ~D8    C8    C7     C6     C5     C4     C3     C2     C1     C0
                            ~D8    C17   C16    C15    C14    C13    C12    C11    C10    C9
code and AAI                ~D8    EP    AAI3   AAI2   AAI1   AAI0   CD3    CD2    CD1    CD0
destination address         ~D8    EP    DA7    DA6    DA5    DA4    DA3    DA2    DA1    DA0
                            ~D8    EP    DA15   DA14   DA13   DA12   DA11   DA10   DA9    DA8
                            :
                            ~D8    EP    DA127  DA126  DA125  DA124  DA123  DA122  DA121  DA120
source address              ~D8    EP    SA7    SA6    SA5    SA4    SA3    SA2    SA1    SA0
                            ~D8    EP    SA15   SA14   SA13   SA12   SA11   SA10   SA9    SA8
                            :
                            ~D8    EP    SA127  SA126  SA125  SA124  SA123  SA122  SA121  SA120

Note: EP = even parity for D0–D7; ~D8 = complement of D8.

Table 6.15a. SDTI Header Structure.

                            D9     D8    D7     D6     D5     D4     D3     D2     D1     D0
                            (MSB)
block type                  ~D8    EP    BL7    BL6    BL5    BL4    BL3    BL2    BL1    BL0
payload CRC flag            ~D8    EP    0      0      0      0      0      0      0      CRCF
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
header CRC                  ~D8    C8    C7     C6     C5     C4     C3     C2     C1     C0
                            ~D8    C17   C16    C15    C14    C13    C12    C11    C10    C9
checksum                    ~D8    Sum of D0–D8 of data ID through last header CRC word. Preset to all zeros; carry is ignored.

Note: EP = even parity for D0–D7; ~D8 = complement of D8.

Table 6.15b. SDTI Header Structure (continued).

High Data-Rate Serial Data Transport Interface (HD-SDTI)

SMPTE 348M and ITU-R BT.1577 define a High Data-Rate Serial Data Transport Interface (HD-SDTI) that enables transferring data between equipment. The physical layer uses the 1.485 (or 1.485/1.001) Gbps SMPTE 292M digital component video serial interface.

Figure 6.34 illustrates the signal format. Two data channels are multiplexed onto the single HD-SDTI stream such that one 74.25 (or 74.25/1.001) MHz data stream occupies the Y data space and the other 74.25 (or 74.25/1.001) MHz data stream occupies the CbCr data space.

Figure 6.34. HD-SDTI Signal Format. LN = line number (two 10-bit words), CRC = line number CRC (two 10-bit words). (The C channel and the Y channel each carry EAV, LN, CRC, header, SAV, and user data (payload), in that order.)

A 49-word header is inserted immediately after the line number CRC data, specifying the source, destination, and data format. Table 6.16 illustrates the header contents.

The payload data is defined by other application-specific standards. It may consist of MPEG-2 program or transport streams, DV streams, etc., and uses either 8-bit words plus even parity and D8, or 9-bit words plus D8.

                            D9     D8    D7     D6     D5     D4     D3     D2     D1     D0
                            (MSB)
ancillary data flag (ADF)   0      0     0      0      0      0      0      0      0      0
                            1      1     1      1      1      1      1      1      1      1
                            1      1     1      1      1      1      1      1      1      1
data ID (DID)               ~D8    EP    0      1      0      0      0      0      0      0
SDID                        ~D8    EP    0      0      0      0      0      0      1      0
data count (DC)             ~D8    EP    0      0      1      0      1      0      1      0
code and AAI                ~D8    EP    AAI3   AAI2   AAI1   AAI0   CD3    CD2    CD1    CD0
destination address         ~D8    EP    DA7    DA6    DA5    DA4    DA3    DA2    DA1    DA0
                            ~D8    EP    DA15   DA14   DA13   DA12   DA11   DA10   DA9    DA8
                            :
                            ~D8    EP    DA127  DA126  DA125  DA124  DA123  DA122  DA121  DA120
source address              ~D8    EP    SA7    SA6    SA5    SA4    SA3    SA2    SA1    SA0
                            ~D8    EP    SA15   SA14   SA13   SA12   SA11   SA10   SA9    SA8
                            :
                            ~D8    EP    SA127  SA126  SA125  SA124  SA123  SA122  SA121  SA120
block type                  ~D8    EP    BL7    BL6    BL5    BL4    BL3    BL2    BL1    BL0
payload CRC flag            ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0

Note: EP = even parity for D0–D7; ~D8 = complement of D8.

Table 6.16a. HD-SDTI Header Structure.

                            D9     D8    D7     D6     D5     D4     D3     D2     D1     D0
                            (MSB)
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
reserved                    ~D8    EP    0      0      0      0      0      0      0      0
header CRC                  ~D8    C8    C7     C6     C5     C4     C3     C2     C1     C0
                            ~D8    C17   C16    C15    C14    C13    C12    C11    C10    C9
checksum                    ~D8    Sum of D0–D8 of data ID through last header CRC word. Preset to all zeros; carry is ignored.

Note: EP = even parity for D0–D7; ~D8 = complement of D8.

Table 6.16b. HD-SDTI Header Structure (continued).

Code and AAI

The 4-bit code value (CD3–CD0) specifies the length of the payload (the user data contained between the SAV and EAV sequences):

0000    4:2:2 YCbCr video data
0001    1440 word payload
0010    1920 word payload
0011    1280 word payload
1000    143 Mbps digital composite video
1001    2304 word payload (extended mode)
1010    2400 word payload (extended mode)
1011    1440 word payload (extended mode)
1100    1728 word payload (extended mode)
1101    2880 word payload (extended mode)
1110    3456 word payload (extended mode)
1111    3600 word payload (extended mode)

The extended mode advances the timing of the SAV sequence, shortening the blanking interval, so that the payload data rate remains a constant 129.6 (or 129.6/1.001) MBps.

The 4-bit authorized address identifier (AAI) format is the same as for SDTI.

Destination and Source Addresses

The source and destination address formats are the same as for SDTI.

Block Type

The block type format is the same as for SDTI.
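Most header words in Tables 6.15 and 6.16 share the same wrapper: eight data bits in D0–D7, even parity in D8, and a ninth bit in D9. The Python sketch below shows one way to pack such words and to compute the checksum word. The function names are hypothetical, and the sketch assumes D9 is the complement of D8.

```python
# Sketch only: function names are illustrative.

def pack_parity_word(byte: int) -> int:
    """Wrap an 8-bit value as a 10-bit word: D8 = even parity, D9 = ~D8 (assumed)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    d8 = bin(byte).count("1") & 1       # even parity over D0-D7
    d9 = d8 ^ 1                         # assumption: D9 is the complement of D8
    return (d9 << 9) | (d8 << 8) | byte


def header_checksum(words) -> int:
    """Checksum word: 9-bit sum of D0-D8 of each word, preset to zero, carry ignored."""
    total = 0
    for w in words:
        total = (total + (w & 0x1FF)) & 0x1FF   # keep nine bits, drop the carry
    d9 = ((total >> 8) & 1) ^ 1                  # assumption: D9 is the complement of D8
    return (d9 << 9) | total


# The fixed DID, SDID, and data count bytes from Table 6.15a (SDTI)
sdti_prefix = [pack_parity_word(b) for b in (0x40, 0x01, 0x2E)]
```

The checksum is accumulated over the words from the data ID through the last header CRC word; the three-word prefix above is only a short illustration of the arithmetic.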
Header CRC

The header CRC applies to the DID through the last reserved data word, for the entire 10 bits. C0 is the least significant bit. It is an 18-bit value, with an initial value set to all ones:

CRC = x^18 + x^5 + x^4 + 1

IC Component Interfaces

Many solutions for transferring digital video between chips are derived from the pro-video interconnect standards. Chips for the pro-video market typically support 10 or 12 bits of data per video component, while chips for the consumer market typically use 8 bits of data per video component. BT.601 and BT.656 are the most popular interfaces.

YCbCr Values: 8-bit Data

Y has a nominal range of 0x10–0xEB. Values less than 0x10 or greater than 0xEB may be present due to processing. Cb and Cr have a nominal range of 0x10–0xF0. Values less than 0x10 or greater than 0xF0 may be present due to processing. YCbCr data may not use the values of 0x00 and 0xFF since those values may be used for timing information.

During blanking, Y data should have a value of 0x10 and CbCr data should have a value of 0x80, unless other information is present.

YCbCr Values: 10-bit Data

For higher accuracy, pro-video solutions typically use 10-bit YCbCr data. Y has a nominal range of 0x040–0x3AC. Values less than 0x040 or greater than 0x3AC may be present due to processing. Cb and Cr have a nominal range of 0x040–0x3C0. Values less than 0x040 or greater than 0x3C0 may be present due to processing. The values 0x000–0x003 and 0x3FC–0x3FF may not be used to avoid timing contention with 8-bit systems.

During blanking, Y data should have a value of 0x040 and CbCr data should have a value of 0x200, unless other information is present.

RGB Values: 8-bit Data

Consumer solutions typically use 8-bit R´G´B´ data, with a range of 0x10–0xEB (note that PCs typically use a range of 0x00–0xFF). Values less than 0x10 or greater than 0xEB may be present due to processing.
During blanking, R´G´B´ data should have a value of 0x10, unless other information is present.

RGB Values: 10-bit Data

For higher accuracy, pro-video solutions typically use 10-bit R´G´B´ data, with a nominal range of 0x040–0x3AC. Values less than 0x040 or greater than 0x3AC may be present due to processing. The values 0x000–0x003 and 0x3FC–0x3FF may not be used to avoid timing contention with 8-bit systems.

During blanking, R´G´B´ data should have a value of 0x040, unless other data is present.

BT.601 Video Interface

The BT.601 video interface has been used for years, with the control signal names and timing reflecting the video standard. Supported active resolutions and sample clock rates are dependent on the video standard and aspect ratio.

Devices usually support multiple data formats to simplify using them in a wide variety of applications.

Video Data Formats

The 24-bit 4:4:4 YCbCr data format is shown in Figure 6.35. Y, Cb, and Cr are each 8 bits, and all are sampled at the same rate, resulting in 24 bits of data per sample clock. Pro-video solutions typically use a 30-bit interface, with the Y, Cb, and Cr streams each being 10 bits. Y0, Cb0, and Cr0 are the least significant bits.

Figure 6.35. 24-Bit 4:4:4 YCbCr Data Format.

The 16-bit 4:2:2 YCbCr data format is shown in Figure 6.36. Cb and Cr are sampled at one-half the Y sample rate, then multiplexed together. The CbCr stream of active data words always begins with a Cb sample. Pro-video solutions typically use a 20-bit interface, with the Y and CbCr streams each being 10 bits.

Figure 6.36. 16-Bit 4:2:2 YCbCr Data Format.

The 8-bit 4:2:2 YCbCr data format is shown in Figure 6.37. The Y and CbCr streams from the 16-bit 4:2:2 YCbCr format are simply multiplexed at 2× the sample clock rate. The YCbCr stream of active data words always begins with a Cb sample. Pro-video solutions typically use a 10-bit interface.

Figure 6.37. 8-Bit 4:2:2 YCbCr Data Format.

Tables 6.17 and 6.18 illustrate the 15-bit RGB, 16-bit RGB, and 24-bit RGB formats. For the 15-bit RGB format, the unused bit is sometimes used for keying (alpha) information. R0, G0, and B0 are the least significant bits.

Control Signals

In addition to the video data, there are four control signals:

HSYNC# (or HREF)      horizontal sync
VSYNC# (or VREF)      vertical sync
BLANK# (or ACTIVE)    blanking
CLK                   1× or 2× sample clock

For the 8-bit and 10-bit 4:2:2 YCbCr data formats, CLK is a 2× sample clock. For the other data formats, CLK is a 1× sample clock. For sources, the control signals and video data are output following the rising edge of CLK. For receivers, the control signals and video data are sampled on the rising edge of CLK.

While BLANK# is negated, active R´G´B´ or YCbCr video data is present.

HSYNC# is asserted during the horizontal sync time each scan line, with the leading edge indicating the start of a new line. The amount of time that HSYNC# is asserted is usually the same as that specified by the video standard.

VSYNC# is asserted during the vertical sync time each field or frame, with the leading edge indicating the start of a new field or frame. The number of scan lines that VSYNC# is asserted is usually the same as that specified by the video standard.
For interlaced video, if the leading edges of VSYNC# and HSYNC# are coincident, the field is Field 1. If the leading edge of VSYNC# occurs mid-line, the field is Field 2. For noninterlaced video, the leading edge of VSYNC# indicates the start of a new frame. Figure 6.38 illustrates the typical HSYNC# and VSYNC# relationships. Some products use different signal names (such as HREF, VREF, and ACTIVE), different polarity, and slightly different signal timing. Some products can also transfer data and control information using both edges of the clock to reduce pin count or to be able to handle HDTV data rates without increasing pin count. Receiver Considerations Assumptions should not be made about the number of samples per line or horizontal blanking interval. Otherwise, the implementation may not work with all sources. To ensure compatibility between various sources, horizontal counters should be reset by the leading edge of HSYNC#, not by the trailing edge of BLANK#. To handle real-world sources, a receiver should use a window for detecting whether Field 1 or Field 2 is present. For example, if the leading edge of VSYNC# occurs within ±64 1× clock cycles of the leading edge of HSYNC#, the field is Field 1. Otherwise, the field is Field 2. Some video sources indicate sync timing by having Y data be an 8-bit value less than 0x10. However, most video ICs do not do this. 
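The windowed field test described above can be sketched in a few lines of Python. The function name and arguments are hypothetical; the sketch assumes edge positions are expressed in 1× clock cycles and that the line length is measured between successive HSYNC# leading edges rather than assumed.

```python
# Sketch only: names and the counter-based interface are illustrative.

def detect_field(vsync_edge: int, hsync_edge: int, line_length: int, window: int = 64) -> int:
    """Return 1 or 2 for the field that starts at this VSYNC# leading edge.

    vsync_edge:  1x clock count at the VSYNC# leading edge
    hsync_edge:  1x clock count at any HSYNC# leading edge
    line_length: measured 1x clocks between successive HSYNC# leading edges
    window:      tolerance in 1x clocks (64 in the text above)
    """
    if line_length <= 2 * window:
        raise ValueError("line_length must exceed twice the window")
    offset = (vsync_edge - hsync_edge) % line_length
    # Distance to the nearest HSYNC# edge, whether just before or just after VSYNC#
    distance = min(offset, line_length - offset)
    return 1 if distance <= window else 2
```

Taking the distance to the nearest HSYNC# edge means a VSYNC# edge that arrives slightly early is treated the same as one that arrives slightly late.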
Table 6.17. Transferring YCbCr and RGB Data over a 12-bit, 16-bit, or 24-bit Interface. Signals for each format are listed from the top row of the table to the bottom; for the double clock edge formats, each pin carries the pair of signals shown.

24-bit RGB, single clock edge:           R7–R0, G7–G0, B7–B0
24-bit RGB, double clock edge:           G3/R7, G2/R6, G1/R5, G0/R4, B7/R3, B6/R2, B5/R1, B4/R0, B3/G7, B2/G6, B1/G5, B0/G4
16-bit RGB (5,6,5):                      R4–R0, G5–G0, B4–B0
15-bit RGB (5,5,5):                      unused bit, R4–R0, G4–G0, B4–B0
24-bit 4:4:4 YCbCr, single clock edge:   Cr7–Cr0, Y7–Y0, Cb7–Cb0
24-bit 4:4:4 YCbCr, double clock edge:   Y3/Cr7, Y2/Cr6, Y1/Cr5, Y0/Cr4, Cb7/Cr3, Cb6/Cr2, Cb5/Cr1, Cb4/Cr0, Cb3/Y7, Cb2/Y6, Cb1/Y5, Cb0/Y4
16-bit 4:2:2 YCbCr*:                     Y7–Y0, then multiplexed Cb7,Cr7 through Cb0,Cr0
8-bit 4:2:2 YCbCr:                       multiplexed Cb7,Y7,Cr7 through Cb0,Y0,Cr0

*Many designs alternately use the red channel to transfer the multiplexed CbCr data.

Table 6.18. Transferring YCbCr and RGB Data over a 32-bit Interface. Signals for each format are listed from the top row of the table to the bottom.

24-bit RGB:              R7–R0, G7–G0, B7–B0
16-bit RGB (5,6,5):      R4–R0, G5–G0, B4–B0, followed by a second pixel R4–R0, G5–G0, B4–B0
15-bit RGB (5,5,5):      unused bit, R4–R0, G4–G0, B4–B0, followed by a second pixel in the same order
24-bit 4:4:4 YCbCr:      Cr7–Cr0, Y7–Y0, Cb7–Cb0
16-bit 4:2:2 YCbCr:      Y7–Y0, multiplexed Cb7,Cr7 through Cb0,Cr0, followed by a second sample in the same order
8-bit 4:2:2 YCbCr:       multiplexed Cb7,Y7,Cr7 through Cb0,Y0,Cr0

Figure 6.38. Typical HSYNC# and VSYNC# Relationships (Not to Scale).

In addition, to allow real-world video and test signals to be passed through with minimum disruption, many ICs now allow the Y data to have a value less than 0x10 during active video. Thus, receiver designs assuming sync timing is present on the Y channel may no longer work.

Video Module Interface (VMI)

VMI (Video Module Interface) was developed in cooperation with several multimedia IC manufacturers. The goal was to standardize the video interfaces between devices such as MPEG decoders, NTSC/PAL decoders, and graphics chips.

Video Data Formats

The VMI specification specifies an 8-bit 4:2:2 YCbCr data format as shown in Figure 6.39. Many devices also support the other YCbCr and R´G´B´ formats discussed in the “BT.601 Video Interface” section.

Control Signals

In addition to the video data, there are four control signals:

HREF       horizontal blanking
VREF       vertical sync
VACTIVE    active video
PIXCLK     2× sample clock

For the 8-bit and 10-bit 4:2:2 YCbCr data formats, PIXCLK is a 2× sample clock. For the other data formats, PIXCLK is a 1× sample clock. For sources, the control signals and video data are output following the rising edge of PIXCLK. For receivers, the control signals and video data are sampled on the rising edge of PIXCLK.

While VACTIVE is asserted, active R´G´B´ or YCbCr video data is present. Although transitions in VACTIVE are allowed, the signal is intended to provide a hardware mechanism for cropping video data. For systems that do not support a VACTIVE signal, HREF can generally be connected to VACTIVE with minimal loss of function.

To support video sources that do not generate a line-locked clock, a DVALID# (data valid) signal may also be used. While DVALID# is asserted, valid data is present.
HREF is asserted during the active video time each scan line, including during the vertical blanking interval.

VREF is asserted for 6 scan line times, starting one-half scan line after the start of vertical sync.

For interlaced video, the trailing edge of VREF is used to sample HREF. If HREF is asserted, the field is Field 1. If HREF is negated, the field is Field 2. For noninterlaced video, the leading edge of VREF indicates the start of a new frame. Figure 6.40 illustrates the typical HREF and VREF relationships.

Receiver Considerations

Assumptions should not be made about the number of samples per line or horizontal blanking interval. Otherwise, the implementation may not work with all sources.

Video data has input setup and hold times, relative to the rising edge of PIXCLK, of 5 and 0 ns, respectively.

VACTIVE has input setup and hold times, relative to the rising edge of PIXCLK, of 5 and 0 ns, respectively.

HREF and VREF both have input setup and hold times, relative to the rising edge of PIXCLK, of 5 and 5 ns, respectively.

Figure 6.39. VMI 8-bit 4:2:2 YCbCr Data for One Scan Line.

Figure 6.40. VMI Typical HREF and VREF Relationships (Not to Scale).

BT.656 Interface

The BT.656 interface for ICs is based on the pro-video BT.656-type parallel interfaces, discussed earlier in this chapter (Figures 6.1 and 6.9xxx). Using EAV and SAV sequences to indicate video timing reduces the number of pins required. The timing of the H, V, and F signals for common video formats is illustrated in Chapter 4.

Standard IC signal levels and timing are used, and any resolution can be supported.

Video Data Formats

8-bit or 10-bit 4:2:2 YCbCr data is used, as shown in Figures 6.1 and 6.6.
Although sources should generate the four protection bits in the EAV and SAV sequences, receivers may choose to ignore them due to the reliability of point-to-point transfers between chips.

Control Signals

CLK is a 2× sample clock. For sources, the video data is output following the rising edge of CLK. For receivers, the video data is sampled on the rising edge of CLK.

To be able to handle HDTV data rates, some designs use a 16-bit or 20-bit YCbCr interface (essentially two BT.656 streams, one for Y data and one for CbCr data) or transfer data using both edges of the clock.

Zoomed Video Port (ZV Port)

An early standard for notebook PCs, the ZV Port was a point-to-point uni-directional bus between the PC Card host adaptor and the graphics controller. It enabled video data to be transferred in real time directly from the PC Card into the graphics frame buffer.

The PC Card host adaptor had a special multimedia mode configuration. If a non-ZV PC Card was plugged into the slot, the host adaptor was not switched into the multimedia mode, and the PC Card behaved as expected. Once a ZV card had been plugged in and the host adaptor had been switched to the multimedia mode, the pin assignments changed. As shown in Table 6.19, the PC Card signals A6–A25, SPKR#, INPACK#, and IOIS16# were replaced by ZV Port video signals (Y0–Y7, CbCr0–CbCr7, HREF, VREF, and PCLK) and 4-channel audio signals (MCLK, SCLK, LRCK, and SDATA).

Video Data Formats

16-bit 4:2:2 YCbCr data was used, as shown in Figure 6.36.

Control Signals

In addition to the video data, there were three control signals:

HREF    horizontal reference
VREF    vertical sync
PCLK    1× sample clock

HREF, VREF, and PCLK had the same timing as the VMI interface discussed earlier in this chapter.
PC Card Signal    ZV Port Signal
A25               CbCr7
A24               CbCr5
A23               CbCr3
A22               CbCr1
A21               CbCr0
A20               Y7
A19               Y5
A18               Y3
A17               Y1
A16               CbCr2
A15               CbCr4
A14               Y6
A13               Y4
A12               CbCr6
A11               VREF
A10               HREF
A9                Y0
A8                Y2
A7                SCLK
A6                MCLK
SPKR#             SDATA
IOIS16#           PCLK
INPACK#           LRCK

Table 6.19. PC Card vs. ZV Port Signal Assignments.

Video Interface Port (VIP)

The VESA VIP specification is an enhancement to the BT.656 interface for ICs, previously discussed. The primary application is to interface up to four devices to a graphics controller chip, although the concept can easily be applied to other applications.

There are three sections to the interface:

Host Interface:
VIPCLK          host clock
HAD0–HAD7       host address/data bus
HCTL            host control

Video Interface:
PIXCLK          video sample clock
VID0–VID7       lower video data bus
VIDA, VIDB      10-bit data extension
XPIXCLK         video sample clock
XVID0–XVID7     upper video data bus
XVIDA, XVIDB    10-bit data extension

System Interface:
VRST#           reset
VIRQ#           interrupt request

The host interface signals are provided by the graphics controller. Essentially, a 2-, 4-, or 8-bit version of the PCI interface is used. VIPCLK has a frequency range of 25–33 MHz. PIXCLK and XPIXCLK have a maximum frequency of 75 and 80 MHz, respectively.

Video Interface

As with the BT.656 interface, special four-word sequences are inserted into the 8-bit or 10-bit 4:2:2 YCbCr video stream to indicate the start of active video (SAV) and end of active video (EAV). These sequences also indicate when horizontal and vertical blanking are present and which field is being transmitted.

VIP modifies the BT.656 EAV and SAV sequences as shown in Table 6.20. BT.656 uses four protection bits (P0–P3) in the status word since it was designed for long cable connections between equipment. With chip-to-chip interconnect, this protection isn’t required, so the bits are used for other purposes.
The timing of the H, V, and F signals for common video formats is illustrated in Chapter 4. The status word for VIP is defined as:

T = “0” for task B
T = “1” for task A
F = “0” for Field 1
F = “1” for Field 2
V = “1” during vertical blanking
H = “0” at SAV
H = “1” at EAV

The task bit, T, is programmable. If BT.656 compatibility is required, it should always be a “1.” Otherwise, it may be used to indicate which one of two data streams is present: stream A = “1” and stream B = “0.” Alternately, T may be a “0” when raw 2× oversampled VBI data is present, and a “1” otherwise.

The noninterlaced bit, N, indicates whether the source is progressive (“1”) or interlaced (“0”).

The repeat bit, R, is a “1” if the current field is a repeat field. This occurs only during 3:2 pull-down. The repeat bit (R), in conjunction with the noninterlaced bit (N), enables the graphics controller to handle Bob and Weave, as well as 3:2 pull-down (further discussed in Chapter 7), in hardware.

The extra flag bit, E, is a “1” if another byte follows the EAV. Table 6.21 illustrates the extra flag byte. This bit is valid only during EAV sequences. If the E bit in the extra byte is “1,” another extra byte immediately follows. This allows chaining any number of extra bytes together as needed.

               D7 (MSB)  D6   D5   D4   D3   D2   D1   D0
preamble       1         1    1    1    1    1    1    1
               0         0    0    0    0    0    0    0
               0         0    0    0    0    0    0    0
status word    T         F    V    H    N    R    0    E

Table 6.20. VIP EAV and SAV Sequence.

               D7 (MSB)  D6–D1           D0
extra byte     D0        user defined    E

Table 6.21. VIP EAV Extra Byte.

Unlike pro-video interfaces, code 0x00 may be used during active video data to indicate an invalid video sample. This is used to accommodate scaled video and square pixel timing.

Video Data Formats

In the 8-bit mode (Figure 6.41), the video interface is similar to BT.656, except for the differences mentioned. XVID0–XVID7 are not used.
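To make the VIP status word layout in Table 6.20 concrete, here is a minimal Python sketch that unpacks the T, F, V, H, N, R, and E bits. The function name and the dictionary keys are hypothetical and are not taken from the VIP specification.

```python
# Sketch only: the function and key names are illustrative.

def decode_vip_status(word: int) -> dict:
    """Unpack the fourth byte of a VIP EAV/SAV sequence (bit order T F V H N R 0 E)."""
    if not 0 <= word <= 0xFF:
        raise ValueError("status word must be 8 bits")
    return {
        "task": "A" if (word >> 7) & 1 else "B",        # T: 1 = task A, 0 = task B
        "field": 2 if (word >> 6) & 1 else 1,           # F: 0 = Field 1, 1 = Field 2
        "vertical_blanking": bool((word >> 5) & 1),     # V: 1 during vertical blanking
        "eav": bool((word >> 4) & 1),                   # H: 1 at EAV, 0 at SAV
        "progressive": bool((word >> 3) & 1),           # N: 1 = noninterlaced source
        "repeat_field": bool((word >> 2) & 1),          # R: 1 for a repeat field
        "extra_byte_follows": bool(word & 1),           # E: valid only during EAV
    }


status = decode_vip_status(0xB0)   # binary 1011 0000
```

A receiver that also needs BT.656 compatibility would treat T as always “1” and ignore the N, R, and E positions, since BT.656 uses those bits for protection.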
In the 16-bit mode (Figure 6.42), SAV sequences, EAV sequences, Y video data, ancillary packet headers, and even-numbered ancillary data values are transferred across the lower 8 bits (VID0–VID7). CbCr video data and odd-numbered ancillary data values are transferred across the upper 8 bits (XVID0–XVID7). Note that “skip data” (value 0x00) during active video must also appear in 16-bit format to preserve the 16-bit data alignment.

10-bit video data is supported by the VIDA, VIDB, XVIDA, and XVIDB signals. VIDA and XVIDA are the least significant bits.

Ancillary Data

Ancillary data packets are used to transmit information (such as digital audio, closed captioning, and teletext data) during the blanking intervals, as shown in Table 6.22. Unlike pro-video interfaces, the 0x00 and 0xFF values may be used by the ancillary data. Note that the ancillary data formats were defined prior to many of the pro-video ancillary data formats, and therefore may not match.

Figure 6.41. VIP 8-Bit Interface Data for One Scan Line. 480i; 720 active samples per line; 27 MHz clock.

Figure 6.42. VIP 16-Bit Interface Data for One Scan Line. 1080i; 1920 active samples per line; 74.176 or 74.25 MHz clock.
                             D7 (MSB)  D6   D5    D4    D3    D2    D1    D0
ancillary data flag (ADF)    0         0    0     0     0     0     0     0
                             1         1    1     1     1     1     1     1
                             1         1    1     1     1     1     1     1
data ID (DID)                D6        EP   0     1     0     DID2  DID1  DID0
SDID                         D6        EP   user defined value
data count (DC)              D6        EP   DC5   DC4   DC3   DC2   DC1   DC0
internal data ID 0           user defined value
internal data ID 1           user defined value
data word 0                  D7        D6   D5    D4    D3    D2    D1    D0
                             :
data word N                  D7        D6   D5    D4    D3    D2    D1    D0
checksum                     D6        EP   CS5   CS4   CS3   CS2   CS1   CS0
optional fill data           D6        EP   0     0     0     0     0     0

Note: EP = even parity for D0–D5.

Table 6.22. VIP Ancillary Data Packet General Format.

DID2 of the DID field indicates whether Field 1 or Field 2 ancillary data is present:

0    Field 1
1    Field 2

DID1–DID0 of the DID field indicate the type of ancillary data present:

00    start of field
01    sliced VBI data, lines 1–23
10    end of field VBI data, line 23
11    sliced VBI data, line 24 to end of field

The data count value (DC) specifies the number of D-words (4-byte blocks) of ancillary data present. Thus, the number of data words in the ancillary packet after the DID must be a multiple of four. 1–3 optional fill bytes may be added after the checksum data to meet this requirement.

When DID1–DID0 are “00” or “10,” no ancillary data or checksum is present. The data count (DC) value is “00000,” and is the last field present in the packet.

Consumer Component Interfaces

Many solutions for transferring digital video between equipment have been developed over the years. HDMI, originally derived from DVI, is the most popular digital video interface for consumer equipment.

Digital Visual Interface (DVI)

In 1998, the Digital Display Working Group (DDWG) was formed to address the need for a standardized digital video interface between a PC and VGA monitor, as illustrated in Figure 6.43. The DVI 1.0 specification was released in April 1999.
Designed to transfer uncompressed real-time digital video, DVI supports PC graphics resolutions beyond 1600 × 1200 and HDTV resolutions, including 720p, 1080i, and 1080p.

In 2003, the consumer electronics industry started adding DVI outputs to DVD players and cable/satellite set-top boxes. DVI inputs also started appearing on digital televisions and LCD/plasma monitors.

Technology

DVI is based on the Digital Flat Panel (DFP) Interface, enhancing it by supporting more formats and timings. It also includes support for the High-bandwidth Digital Content Protection (HDCP) specification to deter unauthorized copying of content.

DVI also supports VESA’s Extended Display Identification Data (EDID) standard, Display Data Channel (DDC) standard (used to read the EDID), and Monitor Timing Specification (DMT).

DDC and EDID enable automatic display detection and configuration. Extended Display Identification Data (EDID) was created to enable plug and play capabilities of displays. Data is stored in the display, describing the supported video formats. This information is supplied to the source device, over DVI, at the request of the source device. The source device then chooses its output format, taking into account the format of the original video stream and the formats supported by the display. The source device is responsible for the format conversions necessary to supply video in an understandable form to the display.

Figure 6.43. Using DVI to Connect a VGA Monitor to a PC.

In addition, the CEA-861 standard specifies mandatory and optionally supported resolutions and timings, and how to include data such as aspect ratio and format information.

TMDS Links

DVI uses transition-minimized differential signaling (TMDS). Eight bits of video data are converted to a 10-bit transition-minimized, DC-balanced value, which is then serialized.
The receiver deserializes the data, and converts it back to 8 bits. Thus, to transfer digital R´G´B´ data requires three TMDS signals that comprise one TMDS link.

“TFT data mapping” is supported as the minimum requirement: 1 pixel per clock, 8 bits per channel, MSB justified.

Either one or two TMDS links may be used, as shown in Figures 6.44 and 6.45, depending on the formats and timing required. A system supporting two TMDS links must be able to switch dynamically between formats requiring a single link and formats requiring a dual link. A single DVI connector can handle two TMDS links.

A single TMDS link supports resolutions and timings using a video sample rate of 25–165 MHz. Resolutions and timings using a video sample rate of 165–330 MHz are implemented using two TMDS links, with each TMDS link operating at one-half the frequency. Thus, the two TMDS links share the same clock and the bandwidth is shared evenly between the two links.

Video Data Formats

Typically, 24-bit R´G´B´ data is transferred over a link. For applications requiring more than 8 bits per color component, the second TMDS link may be used for the additional least significant bits.

For PC applications, R´G´B´ data typically has a range of 0x00–0xFF. For consumer applications, R´G´B´ data typically has a range of 0x10–0xEB (values less than 0x10 or greater than 0xEB may be occasionally present due to processing).

Figure 6.44. DVI Single TMDS Link.
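The two R´G´B´ ranges described under Video Data Formats above (0x00–0xFF for PC applications, 0x10–0xEB for consumer applications) can be related by a simple linear scaling. The following is a minimal sketch, assuming plain linear scaling with rounding and clipping; the function names are hypothetical and the text does not prescribe a particular conversion.

```python
def full_to_limited(value: int) -> int:
    """Map an 8-bit full-range code (0x00-0xFF) to limited range (0x10-0xEB).

    Assumption: plain linear scaling by 219/255 with rounding.
    """
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value")
    return 16 + round(value * 219 / 255)


def limited_to_full(value: int) -> int:
    """Map an 8-bit limited-range code back to full range.

    Codes below 0x10 or above 0xEB (which the text notes may occasionally
    be present due to processing) are clipped to 0x00 and 0xFF.
    """
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value")
    scaled = round((value - 16) * 255 / 219)
    return max(0, min(255, scaled))


if __name__ == "__main__":
    for code in (0x00, 0x80, 0xFF):
        print(hex(code), "->", hex(full_to_limited(code)))
```

Clipping on the way back to full range is one possible design choice; a processing chain that needs to preserve excursions below 0x10 or above 0xEB would keep more headroom instead.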
Figure 6.45. DVI Dual TMDS Link.

Control Signals

In addition to the video data, DVI transmitter and receiver chips typically use up to 14 control signals for interfacing to other chips in the system:

HSYNC        horizontal sync
VSYNC        vertical sync
DE           data enable
CTL0–CTL3    reserved (link 0)
CTL4–CTL9    reserved (link 1)
CLK          1× sample clock

While DE is a “1,” active video is processed. While DE is a “0,” the HSYNC, VSYNC, and CTL0–CTL9 signals are processed. HSYNC and VSYNC may be either polarity.

One issue is that some HDTVs use the falling edge of the YPbPr tri-level sync, rather than the center (rising edge), for horizontal timing. When displaying content from DVI, this results in the image shifting by 2.3%. Providing the ability to adjust the DVI embedded sync timing relative to the YPbPr tri-level sync timing is a useful capability in this case. Many fixed-pixel displays, such as DLP, LCD, and plasma, instead use the DE signal as a timing reference, avoiding the issue.

Digital-Only (DVI-D) Connector

The digital-only connector, which supports dual link operation, contains 24 contacts arranged as three rows of eight contacts, as shown in Figure 6.46.
Table 6.23 lists the pin assignments.

Digital-Analog (DVI-I) Connector

In addition to the 24 contacts used by the digital-only connector, the 29-contact digital-analog connector adds five contacts to support analog video, as shown in Figure 6.47. Table 6.24 lists the pin assignments.

HSYNC    horizontal sync
VSYNC    vertical sync
RED      analog red video
GREEN    analog green video
BLUE     analog blue video

The operation of the analog signals is the same as for a standard VGA connector.

DVI-A is available as a plug (male) connector only and mates to the analog-only pins of a DVI-I connector. DVI-A is only used in adapter cables, where there is the need to convert to or from a traditional analog VGA signal.

Figure 6.46. DVI-D Connector.

Figure 6.47. DVI-I Connector.

Pin  Signal      Pin  Signal             Pin  Signal
1    D2–         9    D1–                17   D0–
2    D2          10   D1                 18   D0
3    shield      11   shield             19   shield
4    D4–         12   D3–                20   D5–
5    D4          13   D3                 21   D5
6    DDC SCL     14   +5V                22   shield
7    DDC SDA     15   ground             23   CLK
8    reserved    16   Hot Plug Detect    24   CLK–

Table 6.23. DVI-D Connector Signal Assignments.

Pin  Signal      Pin  Signal             Pin  Signal
1    D2–         9    D1–                17   D0–
2    D2          10   D1                 18   D0
3    shield      11   shield             19   shield
4    D4–         12   D3–                20   D5–
5    D4          13   D3                 21   D5
6    DDC SCL     14   +5V                22   shield
7    DDC SDA     15   ground             23   CLK
8    VSYNC       16   Hot Plug Detect    24   CLK–
C1   RED         C2   GREEN              C3   BLUE
C4   HSYNC       C5   ground

Table 6.24. DVI-I Connector Signal Assignments.

High-Definition Multimedia Interface (HDMI)

Although DVI handles transferring uncompressed real-time digital RGB video to a display, the consumer electronics industry preferred a smaller, more flexible solution, based on DVI technology. In April 2002, the HDMI working group was formed by Hitachi, Matsushita Electric (Panasonic), Philips, Silicon Image, Sony, Thomson, and Toshiba.
HDMI is capable of replacing up to eight audio cables (7.1 channels) and up to three video cables with a single cable, as illustrated in Figure 6.48.

In 2004, the consumer electronics industry started adding HDMI outputs to DVD players and cable/satellite set-top boxes. HDMI inputs started appearing on digital televisions and monitors in 2005.

Through the use of an adaptor cable, HDMI is backwards compatible with equipment using DVI and the CEA-861 DTV profile. However, the advanced features of HDMI, such as digital audio, Consumer Electronics Control (used to enable passing control commands between equipment), and color gamut metadata, are not available.

Technology

HDMI, based on DVI, supports VESA’s Extended Display Identification Data (EDID) standard and Display Data Channel (DDC) standard (used to read the EDID).

In addition, the CEA-861 standard specifies mandatory and optionally supported resolutions and timings, and how to include data such as aspect ratio and format information.

HDMI also supports the High-bandwidth Digital Content Protection (HDCP) specification to deter unauthorized copying of content. A common problem is sources not polling the TV often enough (twice per second) to see if its HDCP circuit is active. This results in snow if the TV’s HDMI input is deselected, then later selected again.

The 19-pin Type A connector uses a single TMDS link and can therefore carry video signals with a 25–340 MHz sample rate. Video with a sample rate below 25 MHz (for example, 13.5 MHz 480i and 576i) is transmitted using a pixel-repetition scheme. To support video signals sampled at greater than 340 MHz, the dual-link capability of the 29-pin Type B connector is used.

The 19-pin Type C connector, designed for mobile applications, is a smaller version of the Type A connector.

Figure 6.48. Using HDMI Eliminates Confusing Cable Connections for Consumers.
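The sample-rate rules above (a single link for 25–340 MHz, pixel repetition below 25 MHz, dual link above 340 MHz) can be sketched as a small planning function. This is an illustration only: the function name is hypothetical, and choosing the smallest integer repetition factor is an assumption, since the repetition factors actually permitted for each format are defined by the HDMI and CEA-861 specifications rather than by this text.

```python
import math

# Limits taken from the text for the single-link Type A connector.
SINGLE_LINK_MIN_MHZ = 25.0
SINGLE_LINK_MAX_MHZ = 340.0


def plan_hdmi_link(sample_rate_mhz: float) -> dict:
    """Pick a pixel-repetition factor and connector type for a sample rate."""
    if sample_rate_mhz <= 0:
        raise ValueError("sample rate must be positive")

    repeat = 1
    if sample_rate_mhz < SINGLE_LINK_MIN_MHZ:
        # Assumption: use the smallest integer factor that reaches 25 MHz.
        repeat = math.ceil(SINGLE_LINK_MIN_MHZ / sample_rate_mhz)

    link_rate_mhz = sample_rate_mhz * repeat
    if link_rate_mhz <= SINGLE_LINK_MAX_MHZ:
        connector = "Type A (single link)"
    else:
        connector = "Type B (dual link)"

    return {"repeat": repeat, "link_rate_mhz": link_rate_mhz, "connector": connector}


if __name__ == "__main__":
    for rate in (13.5, 74.25, 148.5, 400.0):
        print(rate, plan_hdmi_link(rate))
```

With these assumptions, 13.5 MHz 480i or 576i video is repeated twice and carried at 27 MHz on a single link.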
Video Data Formats

HDMI supports R´G´B´, 4:4:4 YCbCr, 4:2:2 YCbCr, 4:4:4 xvYCC, and 4:2:2 xvYCC. 24, 30, 36, or 48 bits per pixel can be transferred; color depths greater than 24 bits per pixel are called “deep color.”

Video data is either “full range” (0x00–0xFF for 8-bit RGB data) or “limited range” (0x10–0xEB for 8-bit RGB or Y data, 0x10–0xF0 for 8-bit CbCr data; values less than or greater than these may be present). R´G´B´ data may be either “full range” or “limited range,” except for the 640 × 480 resolution, which must always be “full range.” YCbCr and xvYCC video data must always be “limited range.”

Audio Data Formats

Driven by the DVD-Audio standard, audio support consists of 1–8 uncompressed audio streams with a sample rate of up to 48, 96, or 192 kHz, depending on the video format. It can alternately carry a compressed multi-channel audio stream at sample rates up to 192 kHz.

Digital Flat Panel (DFP) Interface

The VESA DFP interface was developed for transferring uncompressed digital video from a computer to a digital flat panel display. It supports VESA’s Plug and Display (P&D) standard, Extended Display Identification Data (EDID) standard, Display Data Channel (DDC) standard, and Monitor Timing Specification (DMT). DDC and EDID enable automatic display detection and configuration.

Only TFT data mapping is supported: 1 pixel per clock, 8 bits per channel, MSB justified.

Like DVI, DFP uses transition-minimized differential signaling (TMDS). Eight bits of video data are converted to a 10-bit transition-minimized, DC-balanced value, which is then serialized. The receiver deserializes the data, and converts it back to 8 bits. Thus, to transfer digital R´G´B´ data requires three TMDS signals that comprise one TMDS link.

Cable lengths may be up to 5 meters.
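To make the "transition-minimized" part of TMDS concrete, the sketch below implements only the first encoding stage: each data bit is XORed or XNORed with the previous encoded bit, and a ninth bit records which operation was used. The selection rule follows the commonly published description of the DVI encoder and should be checked against the DVI specification. The second stage, which produces the tenth bit and maintains DC balance, is deliberately omitted. All function names are hypothetical.

```python
def tmds_minimize(byte: int) -> int:
    """Stage 1 only: return the 9-bit transition-minimized word.

    Bit 8 of the result is 1 when XOR was used and 0 when XNOR was used.
    """
    bits = [(byte >> i) & 1 for i in range(8)]   # LSB first
    ones = sum(bits)
    use_xnor = ones > 4 or (ones == 4 and bits[0] == 0)

    q = [bits[0]]
    for i in range(1, 8):
        x = q[i - 1] ^ bits[i]
        q.append(x ^ 1 if use_xnor else x)
    q.append(0 if use_xnor else 1)
    return sum(bit << i for i, bit in enumerate(q))


def tmds_restore(q_m: int) -> int:
    """Invert stage 1: recover the original 8-bit value."""
    q = [(q_m >> i) & 1 for i in range(9)]
    use_xnor = q[8] == 0
    bits = [q[0]]
    for i in range(1, 8):
        x = q[i] ^ q[i - 1]
        bits.append(x ^ 1 if use_xnor else x)
    return sum(bit << i for i, bit in enumerate(bits))


def transitions(word: int, width: int = 8) -> int:
    """Count changes between adjacent bits in the low `width` bits."""
    return sum(((word >> i) ^ (word >> (i + 1))) & 1 for i in range(width - 1))


if __name__ == "__main__":
    value = 0x55  # alternating bits: the worst case for raw transitions
    encoded = tmds_minimize(value)
    print(transitions(value), "->", transitions(encoded & 0xFF))
```

For the alternating pattern 0x55, the raw byte has seven bit-to-bit transitions, while the encoded data bits have three.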
Figure 6.49. DFP TMDS Link.

Figure 6.50. DFP Connector.

TMDS Links

A single TMDS link, as shown in Figure 6.49, supports formats and timings requiring a clock rate of 22.5–160 MHz.

Video Data Formats

24-bit R´G´B´ data is transferred over the link, as shown in Figure 6.49.

Control Signals

In addition to the video data, DFP transmitter and receiver chips typically use up to 8 control signals for interfacing to other chips in the system:

HSYNC        horizontal sync
VSYNC        vertical sync
DE           data enable
CTL0–CTL3    reserved
CLK          1× sample clock

While DE is a “1,” active video is processed. While DE is a “0,” the HSYNC, VSYNC, and CTL0–CTL3 signals are processed. HSYNC and VSYNC may be either polarity.

Connector

The 20-pin mini-D ribbon (MDR) connector contains 20 contacts arranged as two rows of ten contacts, as shown in Figure 6.50. Table 6.25 lists the pin assignments.

Pin  Signal        Pin  Signal
1    D1            11   D2
2    D1–           12   D2–
3    shield        13   shield
4    shield        14   shield
5    CLK           15   D0
6    CLK–          16   D0–
7    ground        17   no connect
8    +5V           18   Hot Plug Detect
9    no connect    19   DDC SDA
10   no connect    20   DDC SCL

Table 6.25. DFP Connector Signal Assignments.

Open LVDS Display Interface (OpenLDI)

OpenLDI was developed for transferring uncompressed digital video from a computer to a digital flat panel display.
It enhances the FPD-Link standard used to drive the displays of laptop computers, and adds support for VESA’s Plug and Display (P&D) standard, Extended Display Identification Data (EDID) standard, and Display Data Channel (DDC) standard. DDC and EDID enable automatic display detection and configuration.

Unlike DVI and DFP, OpenLDI uses low-voltage differential signaling (LVDS). Cable lengths may be up to 10 meters.

LVDS Link

The LVDS link, as shown in Figure 6.51, supports formats and timings requiring a clock rate of 32.5–160 MHz.

Eight serial data lines (A0–A7) and two sample clock lines (CLK1 and CLK2) are used. The number of serial data lines actually used is dependent on the pixel format, with the serial data rate being 7× the sample clock rate. The CLK2 signal is used in the dual pixel modes for backwards compatibility with FPD-Link receivers.

Video Data Formats

18-bit single pixel, 24-bit single pixel, 18-bit dual pixel, or 24-bit dual pixel R´G´B´ data is transferred over the link. Table 6.26 illustrates the mapping between the pixel data bit number and the OpenLDI bit number.

The 18-bit single pixel R´G´B´ format uses three 6-bit R´G´B´ values: R0–R5, G0–G5, and B0–B5. OpenLDI serial data lines A0–A2 are used to transfer the data.

The 24-bit single pixel R´G´B´ format uses three 8-bit R´G´B´ values: R0–R7, G0–G7, and B0–B7. OpenLDI serial data lines A0–A3 are used to transfer the data.

The 18-bit dual pixel R´G´B´ format represents two pixels as three upper/lower pairs of 6-bit R´G´B´ values: RU0–RU5, GU0–GU5, BU0–BU5, RL0–RL5, GL0–GL5, BL0–BL5. Each upper/lower pair represents two pixels. OpenLDI serial data lines A0–A2 and A4–A6 are used to transfer the data.

The 24-bit dual pixel R´G´B´ format represents two pixels as three upper/lower pairs of 8-bit R´G´B´ values: RU0–RU7, GU0–GU7, BU0–BU7, RL0–RL7, GL0–GL7, BL0–BL7. Each upper/lower pair represents two pixels. OpenLDI serial data lines A0–A7 are used to transfer the data.
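The relationship between pixel format, serial data lines, and serial rate described above can be captured in a small lookup. This is a sketch for illustration: the dictionary and function names are hypothetical, and the total figure simply multiplies the per-line rate by the number of lines in use, without accounting for how the bits within each 7-bit slot are allocated.

```python
# Serial data lines used by each pixel format, as listed in the text.
OPENLDI_LINES = {
    "18-bit single": ["A0", "A1", "A2"],
    "24-bit single": ["A0", "A1", "A2", "A3"],
    "18-bit dual": ["A0", "A1", "A2", "A4", "A5", "A6"],
    "24-bit dual": ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7"],
}

SERIAL_BITS_PER_CLOCK = 7          # serial data rate is 7x the sample clock
CLOCK_RANGE_MHZ = (32.5, 160.0)    # supported clock range from the text


def openldi_link_budget(pixel_format: str, clock_mhz: float) -> dict:
    """Return the lines used and the serial rates for a format and clock."""
    low, high = CLOCK_RANGE_MHZ
    if not low <= clock_mhz <= high:
        raise ValueError("clock rate outside the 32.5-160 MHz range")
    lines = OPENLDI_LINES[pixel_format]
    per_line_mbps = SERIAL_BITS_PER_CLOCK * clock_mhz
    return {
        "lines": lines,
        "per_line_mbps": per_line_mbps,
        "total_mbps": per_line_mbps * len(lines),
    }


if __name__ == "__main__":
    print(openldi_link_budget("24-bit single", 65.0))
```

For example, a 24-bit single pixel format at a 65 MHz sample clock uses four lines at 455 Mbps each.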
Control Signals

In addition to the video data, OpenLDI transmitter and receiver chips typically use up to seven control signals for interfacing to other chips in the system:

HSYNC    horizontal sync
VSYNC    vertical sync
DE       data enable
CNTLE    reserved
CNTLF    reserved
CLK1     1× sample clock
CLK2     1× sample clock

During unbalanced operation, the DE, HSYNC, VSYNC, CNTLE, and CNTLF levels are sent as unencoded bits within the A2 and A6 bitstreams.

During balanced operation (used to minimize short- and long-term DC bias), a DC Balance bit is sent within each of the A0–A7 bitstreams to indicate whether the data is unmodified or inverted. Since there is no room left for the control signals to be sent directly, the DE level is sent by slightly modifying the timing of the falling edge of the CLK1 and CLK2 signals. The HSYNC, VSYNC, CNTLE, and CNTLF levels are sent during the blanking intervals using 7-bit code words on the A0, A1, A5, and A4 signals, respectively.

Figure 6.51. OpenLDI LVDS Link.

18 Bits per Pixel Bit Number    5  4  3  2  1  0
24 Bits per Pixel Bit Number    7  6  5  4  3  2  1  0
OpenLDI Bit Number              5  4  3  2  1  0  7  6

Table 6.26. OpenLDI Bit Number Mappings.

Connector

The 36-pin mini-D ribbon (MDR) connector is similar to the one shown in Figure 6.50, except that there are two rows of eighteen contacts. Table 6.27 lists the pin assignments.

Gigabit Video Interface (GVIF)

The Sony GVIF was developed for transferring uncompressed digital video using a single differential signal, instead of the multiple signals that DVI, DFP, and OpenLDI use. Cable lengths may be up to 10 meters.
GVIF Link

The GVIF link, as shown in Figure 6.52, supports formats and timings requiring a clock rate of 20–80 MHz. For applications requiring higher clock rates, more than one GVIF link may be used.

The serial data rate is 24× the sample clock rate for 18-bit R´G´B´ data, or 30× the sample clock rate for 24-bit R´G´B´ data.

Video Data Formats

18-bit or 24-bit R´G´B´ data, plus timing, is transferred over the link. The 18-bit R´G´B´ format uses three 6-bit R´G´B´ values: R0–R5, G0–G5, and B0–B5. The 24-bit R´G´B´ format uses three 8-bit R´G´B´ values: R0–R7, G0–G7, and B0–B7.

Pin  Signal      Pin  Signal      Pin  Signal
1    A0–         13   +5V         25   reserved
2    A1–         14   A4–         26   reserved
3    A2–         15   A5–         27   ground
4    CLK1–       16   A6–         28   DDC SDA
5    A3–         17   A7–         29   ground
6    ground      18   CLK2–       30   USB–
7    reserved    19   A0          31   ground
8    reserved    20   A1          32   A4
9    reserved    21   A2          33   A5
10   DDC SCL     22   CLK1        34   A6
11   +5V         23   A3          35   A7
12   USB         24   reserved    36   CLK2

Table 6.27. OpenLDI Connector Signal Assignments.

18-bit R´G´B´ data is converted to 24-bit data by slicing the R´G´B´ data into six 3-bit values that are in turn transformed into six 4-bit codes. This ensures rich transitions for receiver PLL locking and good DC balance.

24-bit R´G´B´ data is converted to 30-bit data by slicing the R´G´B´ data into six 4-bit values that are in turn transformed into six 5-bit codes.

Control Signals

In addition to the video data, there are six control signals:

HSYNC    horizontal sync
VSYNC    vertical sync
DE       data enable
CTL0     reserved
CTL1     reserved
CLK      1× sample clock

If any of the HSYNC, VSYNC, DE, CTL0, or CTL1 signals change, a special 30-bit format is used during the next CLK cycle. The first 6 bits are header data indicating the new levels of HSYNC, VSYNC, DE, CTL0, and CTL1. This is followed by 24 bits of R´G´B´ data (unencoded except for inverting the odd bits).

Note that during the blanking periods, non-video data, such as digital audio, may be transferred.
The CTL signals may be used to indicate when non-video data is present.

Figure 6.52. GVIF Link.

Consumer Transport Interfaces

Several transport interfaces, such as USB 2.0, Ethernet, and IEEE 1394, are available for consumer products. Of course, each standard has its own advantages and disadvantages.

USB 2.0

USB (Universal Serial Bus) 2.0 is well known in the PC market for connecting peripherals to a PC, and there is growing interest in using it to transfer compressed audio/video data between products.

USB 2.0 is capable of operating up to 480 Mbps and supports an isochronous mode to guarantee data delivery timing. Thus, it can easily transfer compressed real-time audio/video data from a cable/satellite set-top box or DVD player to a digital television.

DTCP (Digital Transmission Content Protection) may be used to encrypt the audio and video content over USB.

Due to USB’s lower cost and widespread usage, many companies are interested in using USB 2.0 instead of IEEE 1394 to transfer compressed audio/video data between products. However, some still prefer IEEE 1394 since the methods for transferring various types of data are much better defined.

USB On-the-Go

With portable devices increasing in popularity, there was a growing desire for them to communicate directly with each other without requiring a PC or other USB host.

On-the-Go addresses this desire by allowing a USB device to communicate directly with other On-the-Go products. It also features a smaller USB connector and low power features to preserve battery life.

Ethernet

With the widespread adoption of home networks, DSL, and FTTH (Fiber-to-the-Home), Ethernet has become a common interface for transporting digital audio and video data.
Initially used for file transfers, streaming of real-time compressed video over wired (802.3) or wireless (802.11) Ethernet networks is now becoming common. Ethernet supports up to 1 Gbps.

DTCP/IP (Digital Transmission Content Protection for Internet Protocol) may be used to encrypt the audio and video content over wired or wireless networks.

IEEE 1394

IEEE 1394 was originally developed by Apple Computer as FireWire. Designed to be a generic interface between devices, 1394 specifies the physical characteristics; separate application-specific specifications describe how to transfer data over the 1394 network.

1394 is a transaction-based packet technology, using a bi-directional serial interconnect that features hot plug-and-play. This enables devices to be connected and disconnected without affecting the operation of other devices connected to the network.

Guaranteed delivery of time-sensitive data is supported, enabling digital audio and video to be transferred in real time. In addition, multiple independent streams of digital audio and video can be carried.

Figure 6.53. IEEE 1394 Network Topology Examples.

Specifications

The original 1394-1995 specification supports bit-rates of 98.304, 196.608, and 393.216 Mbps.

The 1394A-2000 specification clarifies areas that were vague and led to system interoperability issues. It also reduces the overhead lost to bus control, arbitration, bus reset duration, and concatenation of packets. 1394A-2000 also introduces advanced power-saving features. The electrical signaling method is also common between 1394-1995 and 1394A-2000, using data-strobe (DS) encoding and analog-speed signaling.

The 1394B-2002 specification adds support for bit-rates of 786.432, 1572.864, and 3145.728 Mbps.
It also includes:

- The 8B/10B encoding technique used by Gigabit Ethernet
- Continuous dual simplex operation
- Longer distance (up to 100 meters over Cat5)
- A change of the speed signaling to a more digital method
- Three types of ports: Legacy (1394A compatible), Beta, and Bilingual (supports both Legacy and Beta). Connector keying ensures that incompatible connections cannot physically be made.

Endian Issues

1394 uses a big-endian architecture, defining the most significant bit as bit 0. However, many processors are based on the little-endian architecture, which defines the most significant bit as bit 31 (assuming a 32-bit word).

Network Topology

As in many networks, there is no designated bus master. The tree-like network structure has a root node, branching out to logical nodes in other devices (Figure 6.53). The root is responsible for certain control functions, and is chosen during initialization. Once chosen, it retains that function for as long as it remains powered on and connected to the network.

A network can include up to 63 nodes, with each node (or device) specified by a 6-bit physical identification number. Multiple networks may be connected by bridges, up to a system maximum of 1,023 networks, with each network represented by a separate 10-bit bus ID. Combined, the 16-bit address allows up to 64,449 nodes in a system. Since device addresses are 64 bits, and 16 of these bits are used to specify nodes and networks, 48 bits remain for memory addresses, allowing up to 256 TB of memory space per node.

Node Types

Nodes on a 1394 bus may vary in complexity and capability (listed simplest to most complex):

Transaction nodes respond to asynchronous communication, implement the minimal set of control status registers (CSR), and implement a minimal configuration ROM.

Isochronous nodes add a 24.576 MHz clock used to increment a cycle timer register that is updated by cycle start packets.
Cycle master nodes add the ability to generate the 8 kHz cycle start event, generate cycle start packets, and implement a bus timer register.

Isochronous resource manager (IRM) nodes add the ability to detect bad self-ID packets, determine the node ID of the chosen IRM, and implement the channels available, bandwidth available, and bus manager ID registers. At least one node must be capable of acting as an IRM to support isochronous communication.

Bus manager (BM) nodes are the most complex. This level adds responsibility for storing every self-ID packet in a topology map and analyzing that map to produce a speed map of the entire bus. These two maps are used to manage the bus. Finally, the BM must be able to activate the cycle master node, write configuration packets to allow optimization of the bus, and act as the power manager.

Node Ports

In the network topology, a one-port device is known as a “leaf” device since it is at the end of a network branch. Such devices can be connected to the network, but cannot expand the network.

Two-port devices can be used to form daisy-chained topologies. They can be connected to and continue the network, as shown in Figure 6.53. Devices with three or more ports are able to branch the network to the full 63-node capability.

Figure 6.54. IEEE 1394 Data and Strobe Signal Timing.

It is important to note that no loops or parallel connections are allowed within the network. Also, there are no reserved connectors; any connector may be used to add a new device to the network.

Since 1394-1995 mandates a maximum of 16 cable hops between any two nodes, a maximum of 17 peripherals can be included in a network if only two-port peripherals are used. Later specifications implement a ping packet to measure the round-trip delay to any node, removing the 16-hop limitation.

For 1394-1995 and 1394A-2000, a 4- or 6-pin connector is used. The 6-pin connector can provide power to peripherals.
For 1394B-2002, the 9-pin Beta and Bilingual connector includes power, two extra pins for signal integrity, and one pin reserved for future use.

Figure 6.54 illustrates the 1394-1995 and 1394A-2000 data and strobe timing. The strobe signal changes state on every bit period for which the data signal does not. Therefore, by exclusive-ORing the data and strobe signals, the clock is recovered.

Physical Layer

The typical hardware topology of a 1394 network consists of a physical layer (PHY) and link layer (LINK), as shown in Figure 6.55. The 1394-1995 standard also defined two software layers, the transaction layer and the bus management layer, parts of which may be implemented in hardware.

Figure 6.55. IEEE 1394 Typical Physical and Link Layer Block Diagrams.

The PHY transforms the point-to-point network into a logical physical bus. Each node is also essentially a data repeater since data is reclocked at each node. The PHY also defines the electrical and mechanical connection to the network. Physical signaling circuits and logic responsible for power-up initialization, arbitration, bus-reset sensing, and data signaling are also included.
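The data-strobe rule described above (the strobe changes state on every bit period in which the data does not) can be demonstrated in a few lines. This is a minimal sketch: the function names are hypothetical, and the assumed idle levels of 0 for both signals before the first bit are chosen only for illustration.

```python
from typing import List


def ds_encode(data_bits: List[int]) -> List[int]:
    """Generate the strobe sequence for a sequence of data bits.

    Assumption: data and strobe both idle at 0 before the first bit.
    """
    strobe = []
    prev_data = 0
    level = 0
    for bit in data_bits:
        if bit == prev_data:
            level ^= 1          # data did not change, so the strobe toggles
        strobe.append(level)
        prev_data = bit
    return strobe


def recover_clock(data_bits: List[int], strobe_bits: List[int]) -> List[int]:
    """Exclusive-OR data and strobe to recover the clock."""
    return [d ^ s for d, s in zip(data_bits, strobe_bits)]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 0, 1]
    strobe = ds_encode(data)
    print("data  ", data)
    print("strobe", strobe)
    print("clock ", recover_clock(data, strobe))
```

Because exactly one of the two signals changes in each bit period, their exclusive-OR toggles once per bit, which is the recovered clock.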
Link Layer

The Link provides interfacing between the physical layer and application layer, formatting data into packets for transmission over the network. It supports both asynchronous and isochronous data.

Asynchronous Data

Delivery of asynchronous packets is guaranteed: after an asynchronous packet is received, the receiver transmits an acknowledgment to the sender, as shown in Figure 6.56. However, there is no guaranteed bandwidth. This type of communication is useful for commands, non-real-time data, and error-free transfers.

The delivery latency of asynchronous packets is not guaranteed and depends upon the network traffic. However, the sender may continually retry until an acknowledgment is received.

Asynchronous packets are targeted to one node on the network or can be sent to all nodes, but cannot be broadcast to a subset of nodes on the bus.

The maximum asynchronous packet size is:

512 * (n / 100) bytes
n = network speed in Mbps

Isochronous Data

Isochronous communications have a guaranteed bandwidth, with up to 80% of the network bandwidth available for isochronous use. Up to 63 independent isochronous channels are available, although the 1394 Open Host Controller Interface (OHCI) currently only supports 4–32 channels.

Figure 6.56. IEEE 1394 Isochronous and Asynchronous Packets.

This type of communication is useful for real-time audio and video transfers since the maximum delivery latency of isochronous packets is calculable and packets may be targeted to multiple destinations. However, the sender may not retry sending a packet.

The maximum isochronous packet size is:

1024 * (n / 100) bytes
n = network speed in Mbps

Isochronous operation guarantees a time slice each 125 µs.
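As a worked example of the two packet-size formulas and the 125 µs cycle, the sketch below tabulates the limits for the 1394-1995 speeds. It assumes that n in the formulas is the nominal speed (100, 200, or 400) rather than the exact bit rate, which is an interpretation of "network speed in Mbps." The byte counts per cycle ignore all arbitration and packet overhead, and the function names are hypothetical.

```python
# Exact 1394-1995 bit rates from the text, keyed by nominal speed (assumption).
BIT_RATE_BPS = {100: 98_304_000, 200: 196_608_000, 400: 393_216_000}
CYCLE_US = 125  # one isochronous cycle


def max_async_packet_bytes(n: int) -> int:
    """512 * (n / 100) bytes."""
    return 512 * n // 100


def max_iso_packet_bytes(n: int) -> int:
    """1024 * (n / 100) bytes."""
    return 1024 * n // 100


def raw_bytes_per_cycle(n: int) -> int:
    """Raw bytes the bus can carry in one 125 us cycle, ignoring overhead."""
    bits = BIT_RATE_BPS[n] * CYCLE_US // 1_000_000
    return bits // 8


def iso_bytes_per_cycle(n: int) -> int:
    """Upper bound on isochronous bytes per cycle (80% of the raw figure)."""
    return raw_bytes_per_cycle(n) * 80 // 100


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, max_async_packet_bytes(n), max_iso_packet_bytes(n),
              raw_bytes_per_cycle(n), iso_bytes_per_cycle(n))
```

Under these assumptions, the maximum isochronous packet at each speed (for example, 4096 bytes at the 400 speed) fits within the 80% share of a single cycle.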
Since time slots are guaranteed, and isochronous communication takes priority over asynchronous, isochronous bandwidth is assured.

Once an isochronous channel is established, the sending device is guaranteed to have the requested amount of bus time for that channel every isochronous cycle. Only one device may send data on a particular channel, but any number of devices may receive data on a channel. A device may use multiple isochronous channels as long as capacity is available.

Transaction Layer

The transaction layer supports asynchronous write, read, and lock commands. A lock combines a write with a read by producing a round trip routing of data between the sender and receiver, including processing by the receiver.

Bus Management Layer

The bus management layer controls functions of the network at the physical, link, and transaction layers.

Digital Transmission Content Protection (DTCP)

To prevent unauthorized copying of content, the DTCP system was developed. Although originally designed for 1394, it is applicable to any digital network that supports bi-directional communications, such as USB and Ethernet.

Device authentication, content encryption, and renewability (should a device ever be compromised) are supported by DTCP. The Digital Transmission Licensing Administrator (DTLA) licenses the content protection system and distributes cipher keys and device certificates.

DTCP outlines four elements of content protection:

1. Copy control information (CCI)
2. Authentication and key exchange (AKE)
3. Content encryption
4. System renewability

Copy Control Information (CCI)

CCI allows content owners to specify how their content can be used, such as “copy-never,” “copy-one-generation,” “no-more-copies,” and “copy-free.” DTCP is capable of securely communicating copy control information between devices. Two different CCI mechanisms are supported: embedded and encryption mode indicator.

Embedded CCI is carried within the content stream.
Tampering with the content stream results in incorrect decryption, maintaining the integrity of the embedded CCI.

The encryption mode indicator (EMI) provides a secure, yet easily accessible, transmission of CCI by using the two most significant bits of the sync field of the isochronous packet header. Devices can immediately determine the CCI of the content stream without decoding the content. If the two EMI bits are tampered with, the encryption and decryption modes do not match, resulting in incorrect content decryption.

Authentication and Key Exchange (AKE)

Before sharing content, a device must first verify that the other device is authentic. DTCP includes a choice of two authentication levels: full and restricted. Full authentication can be used with all content protected by the system. Restricted authentication enables the protection of “copy-one-generation” and “no-more-copies” content only.

Full Authentication

Compliant devices are assigned a unique public/private key pair and a device certificate by the DTLA, both stored within the device so as to prevent their disclosure. In addition, devices store other necessary constants and keys.

Full authentication uses the public key-based digital signature standard (DSS) and Diffie-Hellman (DH) key exchange algorithms. DSS is a method for digitally signing and verifying the signatures of digital documents to verify the integrity of the data. DH key exchange is used to establish control-channel symmetric cipher keys, which allows two or more devices to generate a shared key.

Initially, the receiver sends a request to the source to exchange device certificates and random challenges. Then, each device calculates a DH key exchange first-phase value. The devices then exchange signed messages that contain the following elements:

1. The other device’s random challenge
2. The DH key-exchange first-phase value
3. The renewability message version number of the newest system renewability message (SRM) stored by the device

The devices check the message signatures using the other device’s public key to verify that the message has not been tampered with and also verify the integrity of the other device’s certificate. Each device also examines the certificate revocation list (CRL) embedded in its system renewability message (SRM) to verify that the other device’s certificate has not been revoked due to its security having been compromised. If no errors have occurred, the two devices have successfully authenticated each other and established an authorization key.

Restricted Authentication

Restricted authentication may be used between sources and receivers for the exchange of “copy-one-generation” and “no-more-copies” content. It relies on the use of a shared secret to respond to a random challenge.

The source initiates a request to the receiver, requests its device ID, and sends a random challenge. After receiving the random challenge from the source, the receiver computes a response and sends it to the source.

The source compares this response with similar information generated by the source using its service key and the ID of the receiver. If the response matches the source’s own calculation, the receiver has been verified and authenticated. The source and receiver then each calculate an authorization key.

Content Encryption

To ensure interoperability, all compliant devices must support the 56-bit M6 baseline cipher. Additional content protection may be supported by using additional, optional ciphers.

System Renewability

Devices that support full authentication can receive and process SRMs that are created by the DTLA and distributed with content. System renewability is used to ensure the long-term system integrity by revoking the device IDs of compromised devices.
SRMs can be updated from other compliant devices that have a newer list, from media with prerecorded content, or via compliant devices with external communication capability (Internet, phone, cable, network, and so on).

Example Operation

For this example, the source has been instructed to transmit a copy-protected system stream of content.

The source initiates the transmission of content marked with the copy protection status: “copy-one-generation,” “copy-never,” “no-more-copies,” or “copy-free.”

Upon receiving the content stream, the receiver determines the copy protection status. If the content is marked “copy-never,” the receiver requests that the source initiate full authentication. If the content is marked “copy-one-generation” or “no-more-copies,” the receiver will request full authentication if supported, or restricted authentication if it isn’t.

When the source receives the authentication request, it proceeds with the requested type of authentication. If full authentication is requested but the source can only support restricted authentication, then restricted authentication is used.

Once the devices have completed the authentication procedure, a content-channel encryption key (content key) is exchanged between them. This key is used to encrypt the content at the source device and decrypt the content at the receiver.

1394 Open Host Controller Interface (OHCI)

The 1394 Open Host Controller Interface (OHCI) specification is an implementation of the 1394 link layer, with additional features to support the transaction and bus management layers. It provides a standardized way of interacting with the 1394 network.

Home AV Interoperability (HAVi)

Home AV Interoperability (HAVi) is another layer of protocols for 1394. HAVi is directed at making 1394 devices plug-and-play interoperable in a 1394 network whether or not a PC host is present.
Serial Bus Protocol (SBP-2)

The ANSI Serial Bus Protocol 2 (SBP-2) defines a standard way of delivering command and status packets over 1394 for devices such as DVD players, printers, scanners, and hard drives.

IEC 61883 Specifications

Certain types of isochronous signals, such as MPEG-2 and the IEC 61834, SMPTE 314M, and ITU-R BT.1618 digital video (DV) standards, use specific data transport protocols and formats. When this data is sent isochronously over a 1394 network, special packetization techniques are used.

The IEC 61883 series of specifications defines the details for transferring various application-specific data over 1394:

IEC 61883-1 = General specification
IEC 61883-2 = SD-DVCR data transmission (25 Mbps continuous bit rate)
IEC 61883-3 = HD-DVCR data transmission
IEC 61883-4 = MPEG-2 TS data transmission (bit-rate bursts up to 44 Mbps)
IEC 61883-5 = SDL-DVCR data transmission
IEC 61883-6 = Audio and music data transmission
IEC 61883-7 = Transmission of ITU-R BO.1294 System B

IEC 61883-1

IEC 61883-1 defines the general structure for transferring digital audio and video data over 1394. It describes the general packet format, data flow management, and connection management for digital audio and video data, and also the general transmission rules for control commands.

A common isochronous packet (CIP) header is placed at the beginning of the data field of isochronous data packets, as shown in Figure 6.57. It specifies the source node, data block size, data block count, time stamp, type of real-time data contained in the data field, etc.

A connection management procedure (CMP) is also defined for making isochronous connections between devices.

In addition, a functional control protocol (FCP) is defined for exchanging control commands over 1394 using asynchronous data.
IEC 61883-2

IEC 61883-2 and SMPTE 396M define the CIP header, data packet format, and transmission timing for IEC 61834, SMPTE 314M, and ITU-R BT.1618 digital video (DV) standards over 1394. Active resolutions of 720 × 480 (at 29.97 frames per second) and 720 × 576 (at 25 frames per second) are supported.

DV data packets are 488 bytes long, made up of 8 bytes of CIP header and 480 bytes of DV data, as shown in Figure 6.57. Figure 6.58 illustrates the frame data structure.

Each 720 × 480 4:1:1 YCbCr frame is compressed to 103,950 bytes, resulting in a 4.9:1 compression ratio. Including overhead and audio increases the amount of data to 120,000 bytes.

Figure 6.57. 61883-2 Isochronous Packet Formatting.

Figure 6.58. IEC 61834, SMPTE 314M, and ITU-R BT.1618 Packet Formatting for 720 × 480 Systems (4:1:1 YCbCr).

The compressed 720 × 480 frame is divided into 10 DIF (data in frame) sequences. Each DIF sequence contains 150 DIF blocks of 80 bytes each, used as follows:

135 DIF blocks for video
9 DIF blocks for audio
6 DIF blocks used for Header, Subcode, and Video Auxiliary (VAUX) information

Figure 6.59 illustrates the DIF sequence structure in detail.
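The frame sizes quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes that the "compressed" byte count covers the 77 bytes left in each video DIF block after its 3-byte ID (1 header byte plus 76 data bytes), which is an inference from the quoted totals rather than something the text states outright.

```python
# Sizes taken from the text. The 77-byte video payload is an assumption:
# an 80-byte DIF block minus its 3-byte ID.
DIF_BLOCK_BYTES = 80
DIF_BLOCKS_PER_SEQUENCE = 150
VIDEO_BLOCKS_PER_SEQUENCE = 135
VIDEO_PAYLOAD_BYTES = DIF_BLOCK_BYTES - 3


def dv_frame_bytes(dif_sequences):
    """Return (total_bytes, compressed_video_bytes) for one DV frame."""
    total = dif_sequences * DIF_BLOCKS_PER_SEQUENCE * DIF_BLOCK_BYTES
    video = dif_sequences * VIDEO_BLOCKS_PER_SEQUENCE * VIDEO_PAYLOAD_BYTES
    return total, video


def uncompressed_411_bytes(width, height):
    """8-bit 4:1:1 frame: full-rate Y plus Cb and Cr at 1/4 horizontal rate."""
    return width * height + 2 * (width // 4) * height


# 10 DIF sequences per 720 x 480 frame.
total_480, video_480 = dv_frame_bytes(10)
ratio_480 = uncompressed_411_bytes(720, 480) / video_480
```

Under these assumptions the totals come out to 120,000 and 103,950 bytes, matching the figures in the text, and the ratio evaluates to just under 5, which the text quotes as 4.9:1.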
The audio DIF blocks contain both audio data and audio auxiliary data (AAUX). IEC 61834 supports four 32-kHz, 12-bit nonlinear audio signals or two 48-, 44.1-, or 32-kHz, 16-bit audio signals. SMPTE 314M and ITU-R BT.1618 at 25 Mbps support two 48-kHz 16-bit audio signals, while the 50 Mbps version supports four. Video auxiliary data (VAUX) DIF blocks include recording date and time, lens aperture, shutter speed, color balance, and other camera setting data. The subcode DIF blocks store a variety of information, the most important of which is timecode.

Each video DIF block contains 80 bytes of compressed macroblock data:

3 bytes for DIF block ID information
1 byte for the header that includes the quantization number (QNO) and block status (STA)
14 bytes each for Y0, Y1, Y2, and Y3
10 bytes each for Cb and Cr

As the 488-byte packets come across the 1394 network, the start of a video frame is determined. Once the start of a frame is detected, 250 valid packets of data are collected to have a complete DV frame; each packet contains 6 DIF blocks of data. Every 15th packet is a null packet and should be discarded. Once 250 valid packets of data are in the buffer, discard the CIP headers. If all went well, you have a frame buffer with a 120,000 byte compressed DV frame in it.

720 × 576 frames may use either the 4:2:0 YCbCr format (IEC 61834) or the 4:1:1 YCbCr format (SMPTE 314M and ITU-R BT.1618), and require 12 DIF sequences. Each 720 × 576 frame is compressed to 124,740 bytes. Including overhead and audio increases the amount of data to 144,000 bytes, requiring 300 packets to transfer.

Note that the organization of data transferred over 1394 differs from the actual DV recording format since error correction is not required for digital transmission. In addition, although the video blocks are numbered in sequence in Figure 6.59, the sequence does not correspond to the left-to-right, top-to-bottom transmission of blocks of video data.
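The packet-collection procedure described above (collect 250 valid packets, discard null packets, strip the CIP headers) can be sketched as follows. This is a minimal illustration, not a driver: `assemble_dv_frame` is a hypothetical helper, the caller is assumed to have already located the start of a frame, and a null packet is assumed to be any packet that is not the full 488 bytes long.

```python
CIP_HEADER_BYTES = 8
DV_PACKET_BYTES = 488          # 8-byte CIP header + 480 bytes (6 DIF blocks)


def assemble_dv_frame(packets, packets_per_frame=250):
    """Build one compressed DV frame from an iterable of isochronous payloads.

    `packets` must begin at the first packet of a frame. Use
    packets_per_frame=300 for 720 x 576 systems.
    """
    frame = bytearray()
    valid = 0
    for packet in packets:
        if len(packet) != DV_PACKET_BYTES:
            continue           # assumption: anything else is a null packet
        frame += packet[CIP_HEADER_BYTES:]   # drop the CIP header
        valid += 1
        if valid == packets_per_frame:
            return bytes(frame)
    raise ValueError("stream ended before a complete frame was collected")
```

Stripping each CIP header as the packet arrives gives the same result as buffering all 250 packets first and removing the headers afterwards; it simply avoids a second pass over the buffer.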
Compressed macroblocks are shuffled to minimize the effect of errors and aid in error concealment. Audio data also is shuffled. Data is transmitted in the same shuffled order as recorded.

To illustrate the video data shuffling, DV video frames are organized as 50 superblocks, with each superblock being composed of 27 compressed macroblocks, as shown in Figure 6.60. A group of 5 superblocks (one from each superblock column) makes up one DIF sequence. Table 6.28 illustrates the transmission order of the DIF blocks. Additional information on the DV data structure is available in Chapter 11.

Figure 6.59. IEC 61834, SMPTE 314M, and ITU-R BT.1618 DIF Sequence Detail (25 Mbps). DIF blocks 0–5 carry H, SC0, SC1, VA0, VA1, and VA2. Each following group of 16 DIF blocks carries one audio block followed by 15 video blocks: A0 and V0–V14 in blocks 6–21, A1 and V15–V29 in blocks 22–37, and so on through A8 and V120–V134 in blocks 134–149. H = header section; SC0, SC1 = subcode section; VA0, VA1, VA2 = VAUX section; A0–A8 = audio section; V0–V134 = video section.

Figure 6.60. Relationship Between Superblocks and Macroblocks (720 × 480, 4:1:1 YCbCr). The 720-sample by 480-line frame is divided into superblocks Si,j, with i = 0–9 and j = 0–4; the macroblocks within each superblock are numbered 0–26.

Table 6.28. Video DIF Blocks and Compressed Macroblocks for 25 Mbps.

DIF Sequence   Video DIF Block   Compressed Macroblock
Number         Number            Superblock Number   Macroblock Number
0              0                 2, 2                0
               1                 6, 1                0
               2                 8, 3                0
               3                 0, 0                0
               4                 4, 4                0
               :                 :                   :
               133               0, 0                26
               134               4, 4                26
1              0                 3, 2                0
               1                 7, 1                0
               2                 9, 3                0
               3                 1, 0                0
               4                 5, 4                0
               :                 :                   :
               133               1, 0                26
               134               5, 4                26
:              :                 :                   :
n–1            0                 1, 2                0
               1                 5, 1                0
               2                 7, 3                0
               3                 n–1, 0              0
               4                 3, 4                0
               :                 :                   :
               133               n–1, 0              26
               134               3, 4                26

Note: 1. n = 10 for 480-line systems, n = 12 for 576-line systems.

IEC 61883-4

IEC 61883-4 defines the CIP header, data packet format, and transmission timing for MPEG-2 transport streams over 1394.

It is most efficient to carry an integer number of 192-byte units (188 bytes of MPEG-2 data plus 4 bytes of time stamp) per isochronous packet, as shown in Figure 6.61. However, MPEG data rates are rarely integer multiples of the isochronous data rate. Thus, it is more efficient to divide the MPEG packets into smaller components of 24 bytes each to maximize available bandwidth. The transmitter then uses an integer number of data blocks (restricted multiples of 0, 1, 2, 4, or 8), placing them in an isochronous packet and adding the 8-byte CIP header.

50 Mbps DV

Like the 25 Mbps DV format, the 50 Mbps DV format supports 720 × 480i30 and 720 × 576i25 sources. However, the 50 Mbps DV format uses 4:2:2 YCbCr rather than 4:1:1 YCbCr.
As previously discussed, the source packet size for the 25 Mbps DV streams is 480 bytes (consisting of 6 DIF blocks). The 250 packets (300 packets for 576i25 systems) of 480-byte data are transferred over a 25 Mbps channel.

The source packet size for the 50 Mbps DV streams is 960 bytes (consisting of 12 DIF blocks). The first 125 packets (150 packets for 576i25 systems) of 960-byte data are sent over one 25 Mbps channel and the next 125 packets (150 packets for 576i25 systems) of 960-byte data are sent over a second 25 Mbps channel.

100 Mbps DV

100 Mbps DV streams support 1920 × 1080i30, 1920 × 1080i25, and 1280 × 720p60 sources. 1920 × 1080i30 sources are horizontally scaled to 1280 × 1080i30. 1920 × 1080i25 sources are horizontally scaled to 1440 × 1080i25. 1280 × 720p60 sources are horizontally scaled to 960 × 720p60. The 4:2:2 YCbCr format is used.

The source packet size for the 100 Mbps DV streams is 1920 bytes (consisting of 24 DIF blocks). The first 63 packets (75 packets for 1080i25 systems) of 1920-byte data are sent over one 25 Mbps channel, the next 62 packets (75 packets for 1080i25 systems) of 1920-byte data are sent over a second 25 Mbps channel, the next 63 packets (75 packets for 1080i25 systems) of 1920-byte data are sent over a third 25 Mbps channel, and the last 62 packets (75 packets for 1080i25 systems) of 1920-byte data are sent over a fourth 25 Mbps channel.

Figure 6.61. 61883-4 Isochronous Packet Formatting.

Digital Camera Specification

The 1394 Trade Association has written a specification for 1394-based digital video cameras. This was done to avoid the silicon and software cost of implementing the full IEC 61883 specification.
Seven resolutions are defined, with a wide range of format support:

160 × 120      4:4:4 YCbCr
320 × 240      4:2:2 YCbCr
640 × 480      4:1:1, 4:2:2 YCbCr, 24-bit RGB
800 × 600      4:2:2 YCbCr, 24-bit RGB
1024 × 768     4:2:2 YCbCr, 24-bit RGB
1280 × 960     4:2:2 YCbCr, 24-bit RGB
1600 × 1200    4:2:2 YCbCr, 24-bit RGB

Supported frame rates are 1.875, 3.75, 7.5, 15, 30, and 60 frames per second. Isochronous packets are used to transfer the uncompressed digital video data over the 1394 network.

References

1. 1394-based Digital Camera Specification, Version 1.20, July 23, 1998.
2. Digital Transmission Content Protection Specification, Volume 1 (Informational Version), July 25, 2000.
3. Digital Visual Interface (DVI), April 2, 1999.
4. EBU Tech. 3267-E, 1992, EBU Interfaces for 625-Line Digital Video Signals at the 4:2:2 Level of CCIR Recommendation 601, European Broadcasting Union, June, 1991.
5. IEC 61883-1, 2003, Consumer Audio/Video Equipment - Digital Interface - Part 1: General.
6. IEC 61883-2, 1998, Consumer Audio/Video Equipment - Digital Interface - Part 2: SD-DVCR Data Transmission.
7. IEC 61883-3, 1998, Consumer Audio/Video Equipment - Digital Interface - Part 3: HD-DVCR Data Transmission.
8. IEC 61883-4, 1998, Consumer Audio/Video Equipment - Digital Interface - Part 4: MPEG-2 TS Data Transmission.
9. IEC 61883-5, 1998, Consumer Audio/Video Equipment - Digital Interface - Part 5: SDL-DVCR Data Transmission.
10. ITU-R BT.656-4, 1998, Interfaces for Digital Component Video Signals in 525-Line and 625-Line Television Systems Operating at the 4:2:2 Level of Recommendation ITU-R BT.601.
11. ITU-R BT.799-3, 1998, Interfaces for Digital Component Video Signals in 525-Line and 625-Line Television Systems Operating at the 4:4:4 Level of Recommendation ITU-R BT.601 (Part A).
12. ITU-R BT.1302, 1997, Interfaces for Digital Component Video Signals in 525-Line and 625-Line Television Systems Operating at the 4:2:2 Level of ITU-R BT.601.
13. ITU-R BT.1303, 1997, Interfaces for Digital Component Video Signals in 525-Line and 625-Line Television Systems Operating at the 4:4:4 Level of Recommendation ITU-R BT.601 (Part B).
14. ITU-R BT.1304, 1997, Checksum for Error Detection and Status Information in Interfaces Conforming to ITU-R BT.656 and ITU-R BT.799.
15. ITU-R BT.1305, 1997, Digital Audio and Auxiliary Data as Ancillary Data Signals in Interfaces Conforming to ITU-R BT.656 and ITU-R BT.799.
16. ITU-R BT.1362, 1998, Interfaces for Digital Component Video Signals in 525-Line and 625-Line Progressive Scan Television Systems.
17. ITU-R BT.1364, 1998, Format of Ancillary Data Signals Carried in Digital Component Studio Interfaces.
18. ITU-R BT.1365, 1998, 24-Bit Digital Audio Format as Ancillary Data Signals in HDTV Serial Interfaces.
19. ITU-R BT.1366, 1998, Transmission of Time Code and Control Code in the Ancillary Data Space of a Digital Television Stream According to ITU-R BT.656, ITU-R BT.799, and ITU-R BT.1120.
20. ITU-R BT.1381-1, 2001, Serial Digital Interface-based Transport Interface for Compressed Television Signals in Networked Television Production Based on Recommendations ITU-R BT.656 and ITU-R BT.1302.
21. ITU-R BT.1577, 2002, Serial Digital Interface-based Transport Interface for Compressed Television Signals in Networked Television Production Based on Recommendation ITU-R BT.1120.
22. ITU-R BT.1616, 2003, Data Stream Format for the Exchange of DV-based Audio, Data and Compressed Video over Interfaces Complying with Recommendation ITU-R BT.1381.
23. ITU-R BT.1617, 2003, Format for Transmission of DV Compressed Video, Audio and Data over Interfaces Complying with Recommendation ITU-R BT.1381.
24. ITU-R BT.1618, 2003, Data Structure for DV-based Audio, Data and Compressed Video at Data Rates of 25 and 50 Mbit/s.
25. ITU-R BT.1619, 2003, Vertical Ancillary Data Mapping for Serial Digital Interface.
26. ITU-R BT.1620, 2003, Data Structure for DV-based Audio, Data and Compressed Video at a Data Rate of 100 Mbit/s.
27. Kikuchi, Hidekazu, et al., A 1-bit Serial Interface Chip Set for Full-Color XGA Pictures, Society for Information Display, 1999.
28. Kikuchi, Hidekazu, et al., Gigabit Video Interface: A Fully Serialized Data Transmission System for Digital Moving Pictures, International Conference on Consumer Electronics, 1998.
29. Open LVDS Display Interface (OpenLDI) Specification, v0.95, May 13, 1999.
30. SMPTE 125M-1995, Television - Component Video Signal 4:2:2 - Bit-Parallel Digital Interface.
31. SMPTE 240M-1999, Television - Signal Parameters - 1125-Line High-Definition Production Systems.
32. SMPTE 244M-2003, Television - System M/NTSC Composite Video Signals - Bit-Parallel Digital Interface.
33. SMPTE 259M-1997, Television - 10-Bit 4:2:2 Component and 4FSC Composite Digital Signals - Serial Digital Interface.
34. SMPTE 260M-1999, Television - 1125/60 High-Definition Production System - Digital Representation and Bit-Parallel Interface.
35. SMPTE 266M-2002, Television - 4:2:2 Digital Component Systems - Digital Vertical Interval Time Code.
36. SMPTE 267M-1995, Television - Bit-Parallel Digital Interface - Component Video Signal 4:2:2 16 × 9 Aspect Ratio.
37. SMPTE 272M-1994, Television - Formatting AES/EBU Audio and Auxiliary Data into Digital Video Ancillary Data Space.
38. SMPTE 274M-2005, Television - 1920 × 1080 Image Sample Structure, Digital Representation and Digital Timing Reference Sequences for Multiple Picture Rates.
39. SMPTE 291M-1998, Television - Ancillary Data Packet and Space Formatting.
40. SMPTE 292M-1998, Television - Bit-Serial Digital Interface for High-Definition Television Systems.
41. SMPTE 293M-2003, Television - 720 × 483 Active Line at 59.94 Hz Progressive Scan Production - Digital Representation.
42. SMPTE 294M-2001, Television - 720 × 483 Active Line at 59.94 Hz Progressive Scan Production - Bit-Serial Interfaces.
43. SMPTE 296M-2001, Television - 1280 × 720 Progressive Image Sample Structure, Analog and Digital Representation and Analog Interface.
44. SMPTE 305.2M-2000, Television - Serial Data Transport Interface (SDTI).
45. SMPTE 314M-1999, Television - Data Structure for DV-Based Audio, Data and Compressed Video - 25 and 50 Mb/s.
46. SMPTE 326M-2000, Television - SDTI Content Package Format (SDTI-CP).
47. SMPTE 334M-2000, Television - Vertical Ancillary Data Mapping for Bit-Serial Interface.
48. SMPTE 344M-2000, Television - 540 Mbps Serial Digital Interface.
49. SMPTE 348M-2000, Television - High Data-Rate Serial Data Transport Interface (HD-SDTI).
50. SMPTE 370M-2002, Television - Data Structure for DV-Based Audio, Data and Compressed Video at 100 Mb/s 1080/60i, 1080/50i, 720/60p.
51. SMPTE 372M-2002, Television - Dual Link 292M Interface for 1920 × 1080 Picture Raster.
52. SMPTE 396M-2003, Television - Packet Format and Transmission Timing of DV-Based Data Streams over IEEE 1394.
53. SMPTE RP-165-1994, Error Detection Checkwords and Status Flags for Use in Bit-Serial Digital Interfaces for Television.
54. SMPTE RP-174-1993, Bit-Parallel Digital Interface for 4:4:4:4 Component Video Signal (Single Link).
55. SMPTE RP-175-1997, Digital Interface for 4:4:4:4 Component Video Signal (Dual Link).
56. SMPTE RP-168-2002, Definition of Vertical Interval Switching Point for Synchronous Video Switching.
57. SMPTE RP-188-1999, Transmission of Time Code and Control Code in the Ancillary Data Space of a Digital Television Data Stream.
58. SMPTE RP-208-2002, Transport of VBI Packet Data in Ancillary Data Packets.
59. Teener, Michael D. Johas, IEEE 1394-1995 High Performance Serial Bus, 1394 Developer’s Conference, 1997.
60. VESA DFP 1.0: Digital Flat Panel (DFP) Standard.
61. VESA Video Interface Port (VIP), Version 2, October 21, 1998.
62. VMI Specification, v1.4, January 30, 1996.
63. Wickelgren, Ingrid J., The Facts about FireWire, IEEE Spectrum, April 1997.
Chapter 7: Digital Video Processing

In addition to encoding and decoding MPEG, NTSC/PAL, and many other types of video, a typical system usually requires considerable additional video processing.

Since many consumer displays, and most computer displays, are progressive (noninterlaced), interlaced video must be converted to progressive (“deinterlaced”). Progressive video must be converted to interlaced to drive a conventional analog VCR or interlaced TV, requiring noninterlaced-to-interlaced conversion.

Many computer displays support refresh rates up to at least 75 frames per second. CRT-based televisions have a refresh rate of 50 or 59.94 (60/1.001) fields per second. Refresh rates of up to 120 frames per second are becoming common for flat-panel televisions. For film-based compressed content, the source may only be 24 frames per second. Thus, some form of frame rate conversion must be done.

Another not-so-subtle problem is video scaling. SDTV and HDTV support multiple resolutions, yet the display may be a single, fixed resolution.

Alpha mixing and chroma keying are used to mix multiple video signals or video with computer-generated text and graphics. Alpha mixing ensures a smooth crossover between sources, allows subpixel positioning of text, and limits source transition bandwidths to simplify eventual encoding to composite video signals.

Since no source is perfect, even digital sources, user controls for adjustable brightness, contrast, saturation, and hue are always desirable.

Rounding Considerations

When two 8-bit values are multiplied together, a 16-bit result is generated. At some point, a result must be rounded to some lower precision (for example, 16 bits to 8 bits or 32 bits to 16 bits) in order to realize a cost-effective hardware implementation.
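To make the precision reduction concrete, the Python sketch below reduces unsigned 16-bit values to 8 bits using the four techniques this section describes (truncation, conventional rounding, error feedback rounding, and dynamic rounding). It is a software illustration under simplifying assumptions, not a hardware design: the function names are hypothetical, only unsigned data is handled, results are clamped to 255, and Python's `random` module stands in for a pseudo-random binary sequence generator.

```python
import random


def truncate(x16):
    """Drop the 8 fractional bits of an unsigned 16-bit value."""
    return x16 >> 8


def round_conventional(x16):
    """Round up when the fraction is 0.5 or greater, clamping to 8 bits."""
    return min((x16 + 0x80) >> 8, 255)


def round_error_feedback(samples16):
    """Add the residue of each truncation to the next sample."""
    out, residue = [], 0
    for x in samples16:
        total = x + residue
        out.append(min(total >> 8, 255))
        residue = total & 0xFF           # keep the discarded LSBs
    return out


def round_dynamic(x16, rng):
    """Dither the output LSB with probability equal to the discarded fraction."""
    integer, fraction = x16 >> 8, x16 & 0xFF
    carry = 1 if fraction > rng.randrange(256) else 0   # comparator: A > B
    return min(integer + carry, 255)


flat = [0x1280] * 8                      # a flat area at 18.5 (8.8 fixed point)
rng = random.Random(1)                   # seeded only to make the sketch repeatable
results = {
    "truncation": [truncate(x) for x in flat],
    "conventional": [round_conventional(x) for x in flat],
    "error feedback": round_error_feedback(flat),
    "dynamic": [round_dynamic(x, rng) for x in flat],
}
```

On a flat input such as this one, truncation and conventional rounding each settle on a single output code, while the error feedback and dynamic variants alternate between the two neighboring codes so that the average tracks the original 16-bit value.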
There are several rounding techniques: truncation, conventional rounding, error feedback rounding, and dynamic rounding.

Truncation

Truncation drops any fractional data during each rounding operation. As a result, after only a few operations, a significant error may be introduced. This may result in contours being visible in areas of solid colors.

Conventional Rounding

Conventional rounding uses the fractional data bits to determine whether to round up or round down. If the fractional data is 0.5 or greater, rounding up should be performed: positive numbers should be made more positive and negative numbers should be made more negative. If the fractional data is less than 0.5, rounding down should be performed: positive numbers should be made less positive and negative numbers should be made less negative.

Error Feedback Rounding

Error feedback rounding follows the principle of “never throw anything away.” This is accomplished by storing the residue of a truncation and adding it to the next video sample. This approach substitutes less visible noiselike quantizing errors in place of contouring effects caused by simple truncation. An example of an error feedback rounding implementation is shown in Figure 7.1. In this example, 16 bits are reduced to 8 bits using error feedback.

Figure 7.1. Error Feedback Rounding.

Dynamic Rounding

This technique (a licensable Quantel patent) dithers the LSB according to the weighting of the discarded fractional bits. The original data word is divided into two parts, one representing the resolution of the final output word and one dealing with the remaining fractional data. The fractional data is compared to the output of a random number generator equal in resolution to the fractional data. The output of the comparator is a 1-bit random pattern weighted by the value of the fractional data, and serves as a carry-in to the adder. In all instances, only one LSB of the output word is changed, in a random fashion. An example of a dynamic rounding implementation is shown in Figure 7.2.

Figure 7.2. Dynamic Rounding.

SDTV-HDTV YCbCr Transforms

SDTV and HDTV applications have different colorimetric characteristics, as discussed in Chapter 3. Thus, when SDTV (HDTV) data is displayed on an HDTV (SDTV) display, the YCbCr data should be processed to compensate for the different colorimetric characteristics.

SDTV to HDTV

A 3 × 3 matrix can be used to convert from Y601CbCr (SDTV) to Y709CbCr (HDTV):

1    –0.11554975    –0.20793764
0     1.01863972     0.11461795
0     0.07504945     1.02532707

Note that before processing, the 8-bit DC offset (16 for Y and 128 for CbCr) must be removed, then added back in after processing.

HDTV to SDTV

A 3 × 3 matrix can be used to convert from Y709CbCr (HDTV) to Y601CbCr (SDTV):

1     0.09931166     0.19169955
0     0.98985381    –0.11065251
0    –0.07245296     0.98339782

Note that before processing, the 8-bit DC offset (16 for Y and 128 for CbCr) must be removed, then added back in after processing.

4:4:4 to 4:2:2 YCbCr Conversion

Converting 4:4:4 YCbCr to 4:2:2 YCbCr (Figure 7.3) is a common function in digital video. 4:2:2 YCbCr is the basis for many digital video interfaces, and requires fewer connections to implement than 4:4:4.

Saturation logic should be included in the Y, Cb, and Cr data paths to limit the 8-bit range to 1–254. The 16 and 128 values shown in Figure 7.3 are used to generate the proper levels during blanking intervals.

Y Filtering

A template for the Y lowpass filter is shown in Figure 7.4 and Table 7.1.
Because there may be many cascaded conversions (up to 10 were envisioned), the filters were designed to adhere to very tight tolerances to avoid a buildup of visual artifacts. Departure from flat amplitude and group delay response due to filtering is amplified through successive stages. For example, if filters exhibiting –1 dB at 1 MHz and –3 dB at 1.3 MHz were employed, the overall response would be –8 dB (at 1 MHz) and –24 dB (at 1.3 MHz) after four conversion stages (assuming two filters per stage).

Although the sharp cut-off results in ringing on Y edges, the visual effect should be minimal provided that group-delay performance is adequate. When cascading multiple filtering operations, the passband flatness and group-delay characteristics are very important. The passband tolerances, coupled with the sharp cut-off, make the template very difficult (some say impossible) to match. As a result, there is usually a temptation to relax passband accuracy, but the best approach is to reduce the rate of cut-off and keep the passband as flat as possible.

Figure 7.3. 4:4:4 to 4:2:2 YCbCr Conversion.

Figure 7.4. Y Filter Template. Fs = Y 1× sample rate. (Attenuation in dB versus frequency.)

Table 7.1. Y Filter Ripple and Group Delay Tolerances. Fs = Y 1× sample rate. T = 1 / Fs.

Parameter                        Frequency Range     Typical SDTV Tolerances             Typical HDTV Tolerances
Passband Ripple Tolerance        0 to 0.40Fs         ±0.01 dB increasing to ±0.05 dB     ±0.05 dB
Passband Group Delay Tolerance   0 to 0.27Fs         0 increasing to ±1.35 ns            ±0.075T
                                 0.27Fs to 0.40Fs    ±1.35 ns increasing to ±2 ns        ±0.110T

Figure 7.5. Cb and Cr Filter Template for Digital Filter for Sample Rate Conversion from 4:4:4 to 4:2:2. Fs = Y 1× sample rate. (Attenuation in dB versus frequency.)

Table 7.2. CbCr Filter Ripple and Group Delay Tolerances. Fs = Y 1× sample rate. T = 1 / Fs.

Parameter                        Frequency Range     Typical SDTV Tolerances             Typical HDTV Tolerances
Passband Ripple Tolerance        0 to 0.20Fs         0 dB increasing to ±0.05 dB         ±0.05 dB
Passband Group Delay Tolerance   0 to 0.20Fs         delay distortion is zero by design (SDTV and HDTV)

CbCr Filtering

Cb and Cr are lowpass filtered and decimated. In a standard design, the lowpass and decimation filters may be combined into a single filter, and a single filter may be used for both Cb and Cr by multiplexing.

As with Y filtering, the Cb and Cr lowpass filtering requires a sharp cut-off to prevent repeated conversions from producing a cumulative resolution loss. However, due to the low cut-off frequency, the sharp cut-off produces ringing that is more noticeable than for Y.

A template for the Cb and Cr filters is shown in Figure 7.5 and Table 7.2.

Since aliasing is less noticeable in color difference signals, the attenuation at half the sampling frequency is only 6 dB. There is an advantage in using a skew-symmetric response passing through the –6 dB point at half the sampling frequency: this makes alternate coefficients in the digital filter zero, almost halving the number of taps, and also allows using a single digital filter for both the Cb and Cr signals. Use of a transversal digital filter has the advantage of providing perfect linear phase response, eliminating the need for group-delay correction.

As with the Y filter, the passband flatness and group-delay characteristics are very important, and the best approach again is to reduce the rate of cut-off and keep the passband as flat as possible.
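The remark about alternate coefficients being zero can be illustrated with a small half-band filter. The Python sketch below builds a Hamming-windowed sinc lowpass whose response passes through –6 dB at 0.25 Fs (half the 4:2:2 chroma sample rate) and uses it to decimate one chroma channel by two. The design method, the 15-tap length, the edge handling, and the function names are all assumptions chosen for brevity; a filter this short is not intended to meet the template in Figure 7.5.

```python
import math


def halfband_taps(half_length=7):
    """Hamming-windowed sinc lowpass with its -6 dB point at Fs/4.

    half_length should be odd so the outermost taps are non-zero.
    """
    raw = []
    for n in range(-half_length, half_length + 1):
        if n == 0:
            ideal = 0.5
        elif n % 2 == 0:
            ideal = 0.0                      # sin(pi*n/2) vanishes for even n
        else:
            ideal = math.sin(math.pi * n / 2) / (math.pi * n)
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_length)
        raw.append(ideal * window)
    # Rescale the odd taps so the DC gain is 1 while the center tap stays 0.5.
    odd_sum = sum(raw) - raw[half_length]
    return [0.5 if i == half_length else t * 0.5 / odd_sum
            for i, t in enumerate(raw)]


def decimate_chroma(samples, taps):
    """Lowpass filter Cb or Cr and keep every second sample (4:4:4 to 4:2:2)."""
    half = len(taps) // 2
    last = len(samples) - 1
    out = []
    for center in range(0, len(samples), 2):
        acc = 0.0
        for k, h in enumerate(taps):
            if h != 0.0:                     # zero taps need no multiplier
                idx = min(max(center + k - half, 0), last)  # repeat edge samples
                acc += h * samples[idx]
        out.append(min(max(round(acc), 1), 254))            # saturate to 1-254
    return out


taps = halfband_taps()
cb_422 = decimate_chroma([128] * 16, taps)
```

Because the filter is symmetric, it is linear phase, and because every even-offset tap other than the center is exactly zero, only about half of the taps require a multiplication. The same tap set can be applied to both Cb and Cr.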
Display Enhancement

Brightness, Contrast, Saturation (Color), and Hue (Tint)

Working in the YCbCr color space simplifies the implementation of brightness, contrast, saturation, and hue controls, as shown in Figure 7.6. Also illustrated are multiplexers to allow the output of black screen, blue screen, and color bars.

The design should ensure that no overflow or underflow wraparound errors occur, effectively saturating results to the 0 and 255 values.

Y Processing

16 is subtracted from the Y data to position the black level at zero. This removes the DC offset so adjusting the contrast does not vary the black level. Since the Y input data may have values below 16, negative Y values should be supported at this point.

The contrast (or picture or white level) control is implemented by multiplying the YCbCr data by a constant. If Cb and Cr are not adjusted, a color shift will result whenever the contrast is changed. A typical 8-bit contrast adjustment range is 0–1.992×.

The brightness (or black level) control is implemented by adding or subtracting from the Y data. Brightness is done after the contrast to avoid introducing a varying DC offset due to adjusting the contrast. A typical 8-bit brightness adjustment range is –128 to +127.

Finally, 16 is added to position the black level at 16.

CbCr Processing

128 is subtracted from Cb and Cr to position the range about zero.

The hue (or tint) control is implemented by mixing the Cb and Cr data:

Cb´ = Cb cos θ + Cr sin θ
Cr´ = Cr cos θ – Cb sin θ

where θ is the desired hue angle. A typical 8-bit hue adjustment range is –30° to +30°.

The saturation (or color) control is implemented by multiplying both Cb and Cr by a constant.
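The Y and CbCr steps above can be sketched in a few lines. This is a minimal floating-point illustration, not a fixed-point hardware datapath; the function and parameter names are hypothetical, and the final rounding and saturation to 0–255 follow the overflow requirement stated earlier.

```python
import math


def clamp8(value):
    """Round and saturate to 0-255 so results never wrap around."""
    return max(0, min(255, int(round(value))))


def adjust_pixel(y, cb, cr, contrast=1.0, brightness=0,
                 saturation=1.0, hue_deg=0.0):
    """Apply contrast, brightness, hue, and saturation to one YCbCr sample."""
    # Y path: remove the black-level offset, apply contrast, then brightness.
    y_out = (y - 16) * contrast + brightness + 16

    # CbCr path: center about zero, then rotate by the hue angle.
    cb0, cr0 = cb - 128, cr - 128
    theta = math.radians(hue_deg)
    cb_rot = cb0 * math.cos(theta) + cr0 * math.sin(theta)
    cr_rot = cr0 * math.cos(theta) - cb0 * math.sin(theta)

    # Contrast also scales Cb and Cr so that changing it causes no color shift.
    chroma_gain = contrast * saturation
    return (clamp8(y_out),
            clamp8(cb_rot * chroma_gain + 128),
            clamp8(cr_rot * chroma_gain + 128))
```

With the default settings the sample passes through unchanged; a contrast of 2.0 applied to Y = 235 saturates at 255 rather than wrapping.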
A typical 8-bit saturation adjustment range is 0–1.992×. In the example shown in Figure 7.6, the contrast and saturation values are multiplied together to reduce the number of multipliers in the CbCr datapath.

Finally, 128 is added to both Cb and Cr.

Many displays also use separate hue and saturation controls for each of the red, green, blue, cyan, yellow, and magenta colors. This enables tuning the image at production time to better match the display’s characteristics.

Figure 7.6. Hue, Saturation, Contrast, and Brightness Controls. (Output multiplexer select codes: 00 = black screen, 01 = blue screen, 10 = color bars, 11 = normal video.)

Color Transient Improvement

YCbCr transitions should be aligned. However, the Cb and Cr transitions are usually slower and time-offset due to the narrower bandwidth of color difference information.

By monitoring coincident Y transitions, faster horizontal and vertical transitions may be synthesized for Cb and Cr. Small pre- and after-shoots may also be added to the Cb and Cr signals. The new Cb and Cr edges are then aligned with the Y edge, as shown in Figure 7.7.

Displays commonly use this technique to provide a sharper-looking picture.

Luma Transient Improvement

In this case, the Y horizontal and vertical transitions are shortened, and small pre- and after-shoots may also be added, to artificially sharpen the image.

Displays commonly use this technique to provide a sharper-looking picture.

Sharpness

The apparent sharpness of a picture may be increased by increasing the amplitude of high-frequency luminance information. As shown in Figure 7.8, a simple bandpass filter with selectable gain (also called a peaking filter) may be used.
The frequency where maximum gain occurs is usually selectable to be either at the color subcarrier frequency or at about 2.6 MHz. A coring circuit is typically used after the filter to reduce low-level noise.

Figure 7.9 illustrates a more complex sharpness control circuit. The high-frequency luminance is increased using a variable bandpass filter, with adjustable gain. The coring function (typically ±1 LSB) removes low-level noise. The modified luminance is then added to the original luminance signal.

In addition to selectable gain, selectable attenuation of high frequencies should also be supported. Many televisions boost high-frequency gain to improve the apparent sharpness of the picture. Although the sharpness control on the television may be turned down, this affects the picture quality of analog broadcasts.

Figure 7.7. Color Transient Improvement.

Figure 7.8. Simple Adjustable Sharpness Control. (A) NTSC. (B) PAL. (Axes: gain in dB versus frequency in MHz.)

Figure 7.9. More Complex Sharpness Control. (A) Typical implementation. (B) Coring function.

Blue Stretch

Blue stretch increases the blue value of white and near-white colors in order to make whites appear brighter. When applying blue stretch, only colors within a specified color range should be processed.

Colors that have a Y value of ~80% or more of the maximum, have a low saturation value, and fall within a white detection area in the CbCr plane have their blue components increased by ~4% (the blue gain factor) and their red components decreased by the same amount. For more complex designs, the white detection area and blue gain factor can be dependent on the color’s Y value and saturation level.
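A rough sketch of the simple form of blue stretch follows. Several details are assumptions made only for illustration: full-range 8-bit R´G´B´ input, BT.601 luma weights with Cb and Cr computed as scaled color differences centered on zero, and a circular white detection area whose radius (12) doubles as the low-saturation test. None of these values come from a particular display design.

```python
def blue_stretch(r, g, b, y_threshold=0.80, white_radius=12.0, blue_gain=0.04):
    """Boost blue and cut red by blue_gain for bright, near-white colors."""
    # Luma and zero-centered color differences (assumed BT.601 weights).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)

    # Hypothetical white detection area: a circle about the CbCr origin.
    near_white = (cb * cb + cr * cr) ** 0.5 <= white_radius

    if y >= y_threshold * 255 and near_white:
        b = min(255, int(round(b * (1 + blue_gain))))
        r = max(0, int(round(r * (1 - blue_gain))))
    return r, g, b
```

A light gray such as (240, 240, 240) becomes (230, 240, 250), while saturated or dark colors are returned unchanged.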
A transition boundary can be used around the white detection area for gradually decreasing the blue gain factor as colors move away from the white detection area boundary. This can prevent hard transitions between areas that are blue stretched and areas that are not. If a color falls inside the transition boundary area, it is blue stretched using a fraction of the blue gain factor, with the fraction decreasing as the distance from the edge of the detection area boundary increases.

Green Enhancement

Green enhancement creates a richer, more saturated green color when the level of green is low. Displays commonly use this technique to provide greener looking grass, plants, etc. When applying green enhancement, only colors within a specified color range should be processed.

Colors that have a low green saturation value and fall within a green detection area in the CbCr plane have their saturation increased. Rather than centering the green detection area about the green axis (241° in Figure 9.28), some designs use ~213° for the green detection axis so the same design can also easily be used to implement skin tone correction.

Simple implementations have the maximum saturation gain (~1.2×) occurring on the green detection axis, with the saturation gain decreasing to 1× as the distance from the green detection axis increases. For more complex designs, the green detection area and maximum saturation gain can be dependent on the color’s Y value and saturation level.

Some displays also use this technique to implement blue enhancement, used to make the sky appear more blue.

Dynamic Contrast

Using dynamic contrast (also called adaptive contrast enhancement), the differences between dark and light portions of the image are artificially enhanced based on the content in the image. Displays commonly use this technique to improve their contrast ratio.

Bright colors in mostly dark images are enhanced by making them brighter (white stretch).
This is typically done by using histogram information to modify the upper portion of the gamma curve.

Dark colors in mostly light images are enhanced by making them darker (black stretch). This is typically done by using histogram information to modify the lower portion of the gamma curve.

For a medium-bright image, both techniques may be applied. A minor gamma correction adjustment may also be applied to colors that are between dark and light, resulting in a more detailed and contrasting picture.

Color Correction

The RGB chromaticities are usually slightly different between the source video and what the display uses. This results in red, green and blue colors that are not completely accurate. Color correction can be done on the source video to compensate for the display characteristics, enabling more accurate red, green and blue colors to be displayed.

An alternate type of color correction is to perform color expansion, taking advantage of the greater color reproduction capabilities of modern displays. This can result in greener greens, bluer blues, etc. One common technique of implementing color expansion is to use independent hue and saturation controls for each primary and complementary color, plus the skin color.

Color Temperature Correction

In an uncalibrated television, the color temperature (white color) varies based on the brightness level. The color temperature of D65, the white point specified by most video standards, is 6500 K. Color temperatures above 6500 K are more bluish (cool); color temperatures below 6500 K are more reddish (warm).

Many televisions ship from the factory with a very high average color temperature (7000–8000 K) to emphasize the brightness of the set. Viewers can select from two or three factory presets (warm, cool, etc.) or viewing modes (movies, sports, etc.), which refer to the color temperature.
A “cool” setting is brighter (like what you see in midday light) and is better for daylight viewing, such as sporting events, because of the enhanced brightness. A “warm” setting is softer (like what you see in a softly lit indoor environment) and is better for viewing movies, or in darkened environments.

The color temperature may be finely adjusted by using a 3 × 3 matrix multiplier to process the YCbCr or R´G´B´ data. Ten registers (one for every 10 IRE step from 10–100 IRE) provide the nine coefficients for the 3 × 3 matrix multiplier. The values of the registers are determined by a calibrating process. YCbCr or R´G´B´ values for intermediate IRE levels may be determined using interpolation.

Video Mixing and Graphics Overlay

Mixing video signals may be as simple as switching between two video sources. This is adequate if the resulting video is to be displayed on a computer monitor.

For most other applications, a technique known as alpha mixing should be used. Alpha mixing may also be used to fade to or from a specific color (such as black) or to overlay computer-generated text and graphics onto a video signal.

Alpha mixing must be used if the video is to be encoded to composite video. Otherwise, ringing and blurring may appear at the source switching points, such as around the edges of computer-generated text and graphics. This is due to the color information being lowpass filtered within the NTSC/PAL encoder. If the filters have a sharp cut-off, a fast color transition will produce ringing. In addition, the intensity information may be bandwidth-limited to about 4–5 MHz somewhere along the video path, slowing down intensity transitions.

Mathematically, with alpha normalized to have values of 0–1, alpha mixing is implemented as:

out = (alpha_0)(in_0) + (alpha_1)(in_1) + ...

In this instance, each video source has its own alpha information. The alpha information may not total to one (unity gain).
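As a small illustration of the general equation, the sketch below mixes any number of sources, each with its own alpha. It assumes zero-based 8-bit components (for example R´G´B´ with a range of 0–255) and adds a rounding and limiting step, since the alpha values need not total one; the function name is hypothetical.

```python
def alpha_mix(samples, alphas):
    """Mix zero-based 8-bit samples, one normalized alpha (0-1) per source."""
    if len(samples) != len(alphas):
        raise ValueError("one alpha value is needed per source")
    total = sum(a * s for a, s in zip(alphas, samples))
    # Round and limit: the sum can exceed 255 when the alphas total more than 1.
    return max(0, min(255, int(round(total))))
```

For example, `alpha_mix([100, 200], [0.5, 0.5])` gives 150, while `alpha_mix([255, 255], [0.75, 0.75])` is limited to 255.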
Figure 7.10 shows mixing of two YCbCr video signals, each with its own alpha information. As YCbCr uses an offset binary notation, the offset (16 for Y and 128 for Cb and Cr) is removed prior to mixing the video signals. After mixing, the offset is added back in. Note that two 4:2:2 YCbCr streams may also be processed directly; there is no need to convert them to 4:4:4 YCbCr, mix, then convert the result back to 4:2:2 YCbCr.

When only two video sources are mixed and alpha_0 + alpha_1 = 1 (implementing a crossfader), a single alpha value may be used, shown mathematically as:

out = (alpha)(in_0) + (1 – alpha)(in_1)

When alpha = 0, the output is equal to the in_1 video signal; when alpha = 1, the output is equal to the in_0 video signal. When alpha is between 0 and 1, the two video signals are proportionally multiplied, and added together.

Expanding and rearranging the previous equation shows how a two-channel mixer may be implemented using a single multiplier:

out = (alpha)(in_0 – in_1) + in_1

Fading to and from a specific color is done by setting one of the input sources to a constant color.

Figure 7.11 illustrates mixing two YCbCr sources using a single alpha channel. Figures 7.12 and 7.13 illustrate mixing two R´G´B´ video sources (R´G´B´ has a range of 0–255). Figures 7.14 and 7.15 show mixing two digital composite video signals.

A common problem in computer graphics systems that use alpha is that the frame buffer may contain preprocessed R´G´B´ or YCbCr data; that is, the R´G´B´ or YCbCr data in the frame buffer has already been multiplied by alpha. Assuming an alpha (A) value of 0.5, nonprocessed R´G´B´A values for white are (255, 255, 255, 128); preprocessed R´G´B´A values for white are (128, 128, 128, 128). Therefore, any mixing circuit that accepts R´G´B´ or YCbCr data from a frame buffer should be able to handle either format.
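The single-multiplier crossfade and the two frame buffer formats can be sketched as follows. The function names and the `premultiplied` flag are hypothetical; the code is a floating-point illustration of the equations above rather than a model of any particular mixer.

```python
def clamp8(value):
    """Round and limit to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def crossfade_ycbcr(px0, px1, alpha):
    """Crossfade two (Y, Cb, Cr) samples; alpha = 1 selects px0, 0 selects px1."""
    out = []
    for in0, in1, offset in zip(px0, px1, (16, 128, 128)):
        a, b = in0 - offset, in1 - offset   # remove the offset before mixing
        mixed = alpha * (a - b) + b         # one multiply per component
        out.append(clamp8(mixed + offset))  # add the offset back in
    return tuple(out)


def mix_rgb(fg, bg, alpha, premultiplied=False):
    """Overlay fg on bg for zero-based R'G'B' data in either frame buffer format."""
    out = []
    for f, b in zip(fg, bg):
        # Preprocessed data has already been multiplied by alpha.
        fg_term = f if premultiplied else alpha * f
        out.append(clamp8(fg_term + (1 - alpha) * b))
    return tuple(out)
```

Using the white example from the text with alpha = 128/255, `mix_rgb((255, 255, 255), (0, 0, 0), 128 / 255)` and `mix_rgb((128, 128, 128), (0, 0, 0), 128 / 255, premultiplied=True)` both yield (128, 128, 128), which is the point of handling each format correctly.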
By adjusting the alpha values, slow to fast crossfades are possible, as shown in Figure 7.16. Large differences in alpha between samples result in a fast crossfade; smaller differences result in a slow crossfade. If using alpha mixing for special effects, such as wipes, the switching point (where 50% of each video source is used) must be able to be adjusted to an accuracy of less than one sample to ensure smooth movement. By controlling the alpha values, the switching point can be effectively positioned anywhere, as shown in Figure 7.16a.

Text can be overlaid onto video by having a character generator control the alpha inputs. By setting one of the input sources to a constant color, the text will assume that color.

Note that for those designs that subtract 16 (the black level) from the Y channel before processing, negative Y values should be supported after the subtraction. This allows the design to pass through real-world and test video signals with minimum artifacts.

Figure 7.10. Mixing Two YCbCr Video Signals, Each with Its Own Alpha Channel.

Figure 7.11. Simplified Mixing (Crossfading) of Two YCbCr Video Signals Using a Single Alpha Channel.

Figure 7.12. Mixing Two RGB Video Signals (RGB Has a Range of 0–255), Each with Its Own Alpha Channel.

Figure 7.13. Simplified Mixing (Crossfading) of Two RGB Video Signals (RGB Has a Range of 0–255) Using a Single Alpha Channel.

Figure 7.14. Mixing Two Digital Composite Video Signals, Each with Its Own Alpha Channel.

Figure 7.15. Simplified Mixing (Crossfading) of Two Digital Composite Video Signals Using a Single Alpha Channel.

Figure 7.16. Controlling Alpha Values to Implement (A) Fast or (B) Slow Keying. In (A), the effective switching point lies between two samples. In (B), the transition is wider and is aligned at a sample instant. (Axes: normalized alpha value versus sample clock.)

Luma and Chroma Keying

Keying involves specifying a desired foreground color; areas containing this color are replaced with a background image. Alternately, an area of any size or shape may be specified; foreground areas inside (or outside) this area are replaced with a background image.

Luminance Keying

Luminance keying involves specifying a desired foreground luminance level; foreground areas containing luminance levels above (or below) the keying level are replaced with the background image.

Alternately, this hard keying implementation may be replaced with soft keying by specifying two luminance values of the foreground image: YH and YL (YL < YH). For keying the background into white foreground areas, foreground luminance values (YFG) above YH are replaced with the background image; YFG values below YL contain the foreground image. For YFG values between YL and YH, linear mixing is done between the foreground and background images. This operation may be expressed as:

if YFG > YH, K = 1 (background only)
if YFG < YL, K = 0 (foreground only)
if YH ≥ YFG ≥ YL, K = (YFG – YL)/(YH – YL) (mix)

By subtracting K from 1, the new luminance keying signal for keying into black foreground areas can be generated.

Figure 7.17 illustrates luminance keying for two YCbCr sources. Although chroma keying typically uses a suppression technique to remove information from the foreground image, this is not done when luminance keying as the magnitudes of Cb and Cr are usually not related to the luminance level.

Figure 7.18 illustrates luminance keying for R´G´B´ sources, which is more applicable for computer graphics. YFG may be obtained by the equation:

YFG = 0.299R´ + 0.587G´ + 0.114B´

In some applications, the red and blue data is ignored, resulting in YFG being equal to only the green data.

Figure 7.19 illustrates one technique of luminance keying between two digital composite video sources.

Figure 7.17. Luminance Keying of Two YCbCr Video Signals.

Figure 7.18. Luminance Keying of Two RGB Video Signals.
RGB range is 0–255.

Figure 7.19. Luminance Keying of Two Digital Composite Video Signals.

Chroma Keying

Chroma keying involves specifying a desired foreground key color; foreground areas containing the key color are replaced with the background image. Cb and Cr are used to specify the key color; luminance information may be used to increase the realism of the chroma keying function. The actual mixing of the two video sources may be done in the component or composite domain, although component mixing reduces artifacts.

Early chroma keying circuits simply performed a hard or soft switch between the foreground and background sources. In addition to limiting the amount of fine detail maintained in the foreground image, the background was not visible through transparent or translucent foreground objects, and shadows from the foreground were not present in areas containing the background image.

Linear keyers were developed that combine the foreground and background images in a proportion determined by the key level, resulting in the foreground image being attenuated in areas containing the background image. Although this allows foreground objects to appear transparent, there is a limit on the fineness of detail maintained in the foreground. Shadows from the foreground are not present in areas containing the background image unless additional processing is done: the luminance levels of specific areas of the background image must be reduced to create the effect of shadows cast by foreground objects.

If the blue or green backing used with the foreground scene is evenly lit except for shadows cast by the foreground objects, the effect on the background will be that of shadows cast by the foreground objects.
This process, referred to as shadow chroma keying, or luminance modulation, enables the background luminance levels to be adjusted in proportion to the brightness of the blue or green backing in the foreground scene. This results in more realistic keying of transparent or translucent foreground objects by preserving the specular highlights. Note that green backgrounds are now more commonly used due to lower chroma noise.

Chroma keyers are also limited in their ability to handle foreground colors that are close to the key color without switching to the background image. Another problem may be a bluish tint to the foreground objects as a result of blue light reflecting off the blue backing or being diffused in the camera lens. Chroma spill is difficult to remove since the spill color is not the original key color; some mixing occurs, changing the original key color slightly.

One solution to many of the chroma keying problems is to process the foreground and background images individually before combining them, as shown in Figure 7.20. Rather than choosing between the foreground and background, each is processed individually and then combined. Figure 7.21 illustrates the major processing steps for both the foreground and background images during the chroma key process. Not shown in Figure 7.20 is the circuitry to initially subtract 16 (Y) or 128 (Cb and Cr) from the foreground and background video signals and the addition of 16 (Y) or 128 (Cb and Cr) after the final output adder. Any DC offset not removed will be amplified or attenuated by the foreground and background gain factors, shifting the black level.

Figure 7.20. Typical Component Chroma Key Circuit.

Figure 7.21. Major Processing Steps During Chroma Keying. (A) Original foreground scene. (B) Original background scene. (C) Suppressed foreground scene. (D) Background keying signal. (E) Background scene after multiplication by background key. (F) Composite scene generated by adding (C) and (E).

The foreground key (KFG) and background key (KBG) signals have a range of 0 to 1. The garbage matte key signal (the term matte comes from the film industry) forces the mixer to output the foreground source in one of two ways.

The first method is to reduce KBG in proportion to increasing KFG. This provides the advantage of minimizing black edges around the inserted foreground.

The second method is to force the background to black for all nonzero values of the matte key, and insert the foreground into the background hole. This requires a cleanup function to remove noise around the black level, as this noise affects the background picture due to the straight addition process.

The garbage matte is added to the foreground key signal (KFG) using a non-additive mixer (NAM). A non-additive mixer takes the brighter of the two pictures, on a sample-by-sample basis, to generate the key signal. Matting is ideal for any source that generates its own keying signal, such as character generators, and so on.

The key generator monitors the foreground Cb and Cr data, generating the foreground keying signal, KFG. A desired key color is selected, as shown in Figure 7.22. The foreground Cb and Cr data are normalized (generating Cb´ and Cr´) and rotated θ degrees to generate the X and Z data, such that the positive X axis passes as close as possible to the desired key color. Typically, θ may be varied in 1° increments, and optimum chroma keying occurs when the X axis passes through the key color.

X and Z are derived from Cb and Cr using the equations:

X = Cb´ cos θ + Cr´ sin θ
Z = Cr´ cos θ – Cb´ sin θ

Since Cb´ and Cr´ are normalized to have a range of ±1, X and Z have a range of ±1.
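The rotation can be sketched as below. Choosing θ with `atan2` so that the X axis passes exactly through the key color is an assumption made for illustration (the text only says θ is typically adjustable in 1° steps), and the inputs are taken to be already normalized to ±1. Both function names are hypothetical.

```python
import math


def key_angle_deg(cb_key, cr_key):
    """Angle theta that puts the positive X axis through the key color."""
    return math.degrees(math.atan2(cr_key, cb_key))


def rotate_to_key_axes(cb_n, cr_n, theta_deg):
    """Rotate normalized (Cb', Cr') by theta to obtain (X, Z)."""
    theta = math.radians(theta_deg)
    x = cb_n * math.cos(theta) + cr_n * math.sin(theta)
    z = cr_n * math.cos(theta) - cb_n * math.sin(theta)
    return x, z
```

When the key color itself is rotated by its own angle, Z comes out as zero (to within rounding) and X equals the key color's saturation, which is the condition described for optimum keying.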
The foreground keying signal (KFG) is generated from X and Z and has a range of 0–1:

KFG = X – (|Z|/(tan (α/2)))
KFG = 0 if X < (|Z|/(tan (α/2)))

where α is the acceptance angle, symmetrically centered about the positive X axis, as shown in Figure 7.23. Outside the acceptance angle, KFG is always set to zero. Inside the acceptance angle, the magnitude of KFG linearly increases the closer the foreground color approaches the key color and as its saturation increases. Colors inside the acceptance angle are further processed by the foreground suppressor.

The foreground suppressor reduces foreground color information by implementing X = X – KFG, with the key color being clamped to the black level. To avoid processing Cb and Cr when KFG = 0, the foreground suppressor performs the operations:

CbFG = Cb – KFG cos θ
CrFG = Cr – KFG sin θ

where CbFG and CrFG are the foreground Cb and Cr values after key color suppression. Early implementations suppressed foreground information by multiplying Cb and Cr by a clipped version of the KFG signal. This, however, generated in-band alias components due to the multiplication and clipping process and produced a hard edge at key color boundaries.

Figure 7.22. Rotating the Normalized Cb and Cr (Cb´ and Cr´) Axes by θ to Obtain the X and Z Axes, Such That the X Axis Passes Through the Desired Key Color (Blue in This Example).

Figure 7.23. Foreground Key Values and Acceptance Angle.

Unless additional processing is done, the CbFG and CrFG components are set to zero only if they are exactly on the X axis. Hue variations due to noise or lighting will result in areas of the foreground not being entirely suppressed.
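The key generation and suppression equations above can be combined into a short sketch. It works on normalized Cb´ and Cr´ values (±1), uses floating point throughout, and omits the suppression angle refinement; the function names are hypothetical.

```python
import math


def foreground_key(x, z, alpha_deg):
    """KFG = X - |Z| / tan(alpha / 2), or zero outside the acceptance angle."""
    if not 0.0 < alpha_deg < 180.0:
        raise ValueError("acceptance angle must be between 0 and 180 degrees")
    limit = abs(z) / math.tan(math.radians(alpha_deg) / 2)
    return x - limit if x >= limit else 0.0


def suppress_key_color(cb_n, cr_n, theta_deg, alpha_deg):
    """Return (CbFG, CrFG, KFG) for one normalized foreground sample."""
    theta = math.radians(theta_deg)
    # Rotate into the X/Z axes aligned with the key color.
    x = cb_n * math.cos(theta) + cr_n * math.sin(theta)
    z = cr_n * math.cos(theta) - cb_n * math.sin(theta)
    k_fg = foreground_key(x, z, alpha_deg)
    # Subtracting KFG along the key axis is equivalent to X = X - KFG.
    cb_fg = cb_n - k_fg * math.cos(theta)
    cr_fg = cr_n - k_fg * math.sin(theta)
    return cb_fg, cr_fg, k_fg
```

A sample lying exactly on the key axis is suppressed to zero chroma, while a sample outside the acceptance angle is returned unchanged with KFG = 0. A sample inside the angle but off the axis keeps a residual chroma component.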
To compensate, a suppression angle (β) is set, symmetrically centered about the positive X axis. It is typically configurable from a minimum of zero degrees to a maximum of about one-third the acceptance angle (α). Any CbCr components that fall within this suppression angle are set to zero. Figure 7.24 illustrates the use of the suppression angle.

Foreground luminance, after being normalized to have a range of 0–1, is suppressed by:

YFG = Y´ – (yS)(KFG)
YFG = 0 if (yS)(KFG) > Y´

Here, yS is a programmable value used to adjust YFG so that it is clipped at the black level in the key color areas.

The foreground suppressor also removes key-color fringes on wanted foreground areas caused by chroma spill, the overspill of the key color, by removing discolorations of the wanted foreground objects.

Ultimatte® improves on this process by measuring the difference between the blue and green colors, as the blue backing is never pure blue and there may be high levels of blue in the foreground objects. Pure blue is rarely found in nature, and most natural blues have a higher content of green than red. For this reason, the red, green, and blue levels are monitored to differentiate between the blue backing and blue in wanted foreground objects.

If the difference between blue and green is great enough, all three colors are set to zero to produce black; this is what happens in areas of the foreground containing the blue backing.

If the difference between blue and green is not large, the blue is set to the green level unless the green exceeds red. This technique allows the removal of the bluish tint caused by the blue backing while being able to reproduce natural blues in the foreground. As an example, a white foreground area normally would consist of equal levels of red, green, and blue. If the white area is affected by the key color (blue in this instance), it will have a bluish tint: the blue levels will be greater than the red or green levels.
Since the green does not exceed the red, the blue level is made equal to the green, removing the bluish tint.

There is a price to pay, however. Magenta in the foreground is changed to red. A green backing can be used, but in this case, yellow in the foreground is modified. Usually, the clamping is released gradually to increase the blue content of magenta areas.

The key processor generates the initial background key signal (K´BG) used to remove areas of the background image where the foreground is to be visible. K´BG is adjusted to be zero in desired foreground areas and unity in background areas with no attenuation. It is generated from the foreground key signal (KFG) by applying lift (kL) and gain (kG) adjustments followed by clipping at zero and unity values:

K´BG = (KFG – kL)kG

Figure 7.25 illustrates the operation of the background key signal generation. The transition between K´BG = 0 and K´BG = 1 should be made as wide as possible to minimize discontinuities in the transitions between foreground and background areas.

Figure 7.24. Suppression Angle Operation for a Gradual Change from a Red Foreground Object to the Blue Key Color. (A) Simple suppression. (B) Improved suppression using a suppression angle.

Figure 7.25. Background Key Generation.

For foreground areas containing the same CbCr values as the key color, but different luminance (Y) values, the key processor may also reduce the background key value as the foreground luminance level increases. This allows the background to be turned off in foreground areas containing a lighter key color, such as light blue. This is done by:

KBG = K´BG – (yc)(YFG)
KBG = 0 if (yc)(YFG) > K´BG

To handle shadows cast by foreground objects, and opaque or translucent foreground objects, the luminance level of the blue backing of the foreground image is monitored. Where the luminance of the blue backing is reduced, the luminance of the background image also is reduced. The amount of background luminance reduction must be controlled so that defects in the blue backing (such as seams or footprints) are not interpreted as foreground shadows.

Additional controls may be implemented to enable the foreground and background signals to be controlled independently. Examples are adjusting the contrast of the foreground so it matches the background or fading the foreground in various ways (such as fading to the background to make a foreground object vanish or fading to black to generate a silhouette).

In the computer environment, there may be relatively slow, smooth edges, especially edges involving smooth shading. As smooth edges are easily distorted during the chroma keying process, a wide keying process is usually used in these circumstances. During wide keying, the keying signal starts before the edge of the graphic object.

Composite Chroma Keying

In some instances, the component signals (such as YCbCr) are not directly available. For these situations, composite chroma keying may be implemented, as shown in Figure 7.26.

To detect the chroma key color, the foreground video source must be decoded to produce the Cb and Cr color difference signals. The keying signal, KFG, is then used to mix between the two composite video sources. The garbage matte key signal forces the mixer to output the background source by reducing KFG.
Chroma keying using composite video signals usually results in unrealistic keying, since there is inadequate color bandwidth. As a result, there is a lack of fine detail, and halos may be present on edges.

Figure 7.26. Typical Composite Chroma Key Circuit.

Superblack and Luma Keying

Video systems also may make use of superblack or luma keying. Areas of the foreground video that have a value within a specified range below the blanking level (analog video) or black level (digital video) are replaced with the background video information.

Video Scaling

With all the various video resolutions (Table 7.3), scaling is usually needed in almost every solution.

When generating objects that will be displayed on SDTV, computer users must be concerned with such things as text size, line thickness, and so forth. For example, text readable on a 1280 × 1024 computer display may not be readable on an SDTV display due to the large amount of downscaling involved. Thin horizontal lines may either disappear completely or flicker at a 25 or 29.97 Hz rate when converted to interlaced SDTV.

Note that scaling must be performed on component video signals (such as R´G´B´ or YCbCr). Composite color video signals cannot be scaled directly due to the color subcarrier phase information present, which would be meaningless after scaling.

In general, the spacing between output samples can be defined by a Target Increment (tarinc) value:

$$\mathit{tarinc} = I / O$$

where I and O are the number of input (I) and output (O) samples, either horizontally or vertically.
The first and last output samples may be aligned with the first and last input samples by adjusting the equation to be:

$$\mathit{tarinc} = (I - 1) / (O - 1)$$

Table 7.3. Common Active Resolutions for Consumer Displays and Broadcast Sources.

Displays: 704 × 480, 854 × 480, 704 × 576, 854 × 576, 1280 × 720, 1280 × 768, 640 × 480, 800 × 600, 1024 × 768, 1280 × 768, 1366 × 768, 1024 × 1024, 1920 × 1080, 1280 × 1024

SDTV Sources: 704 × 360 (1), 704 × 432 (1), 480 × 480, 480 × 576, 528 × 480, 544 × 480, 544 × 576, 640 × 480, 704 × 480, 704 × 576, 768 × 576

HDTV Sources: 1280 × 720, 1440 × 816 (2), 1440 × 1040 (3), 1280 × 1080, 1440 × 1080, 1920 × 1080

(1) 16:9 letterbox on a 4:3 display.
(2) 2.35:1 anamorphic for a 16:9 1920 × 1080 display.
(3) 1.85:1 anamorphic for a 16:9 1920 × 1080 display.

Pixel Dropping and Duplication

This is also called “nearest neighbor” scaling since only the input sample closest to the output sample is used.

The simplest form of scaling down is pixel dropping, where (m) out of every (n) samples are thrown away both horizontally and vertically. A modified version of the Bresenham line-drawing algorithm (described in most computer graphics books) is typically used to determine which samples not to discard.

Simple upscaling can be accomplished by pixel duplication, where (m) out of every (n) samples are duplicated both horizontally and vertically. Again, a modified version of the Bresenham line-drawing algorithm can be used to determine which samples to duplicate.

Scaling using pixel dropping or duplication is not recommended due to the visual artifacts and the introduction of aliasing components.

Linear Interpolation

An improvement in video quality of scaled images is possible using linear interpolation. When an output sample falls between two input samples (horizontally or vertically), the output sample is computed by linearly interpolating between the two input samples. However, scaling to images smaller than one-half of the original still results in deleted samples.
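As a rough illustration of how the target increment drives linear interpolation, the following Python sketch scales one line of samples. The function name `scale_line_linear` and the `align_ends` switch are hypothetical; the sketch assumes a plain list of numeric samples and handles the last sample by clamping, which is one possible edge treatment rather than a prescribed one.

```python
def scale_line_linear(samples, out_count, align_ends=True):
    """Scale a 1-D run of samples to out_count samples by linear interpolation.

    align_ends=True  uses tarinc = (I - 1) / (O - 1)
    align_ends=False uses tarinc = I / O
    """
    in_count = len(samples)
    if in_count < 1 or out_count < 1:
        raise ValueError("need at least one input and one output sample")
    if align_ends and out_count > 1:
        tarinc = (in_count - 1) / (out_count - 1)
    else:
        tarinc = in_count / out_count

    out = []
    for m in range(out_count):
        pos = m * tarinc                        # output position in input-sample units
        left = min(int(pos), in_count - 1)      # input sample at or before pos
        right = min(left + 1, in_count - 1)     # clamp at the last input sample
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


print(scale_line_linear([0, 10, 20, 30], 7))    # upscale 4 samples to 7
```

Because each output uses only the two nearest inputs, a target increment greater than 2 steps over input samples entirely, which is the deleted-samples limitation noted above.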
Figure 7.27 illustrates the vertical scaling of a 16:9 image to fit on a 4:3 display. A simple bi-linear vertical filter is commonly used, as shown in Figure 7.28a. Two source samples, $L_n$ and $L_{n+1}$, are weighted and added together to form a destination sample, $D_m$.

$$D_0 = 0.75L_0 + 0.25L_1$$

$$D_1 = 0.5L_1 + 0.5L_2$$

$$D_2 = 0.25L_2 + 0.75L_3$$

However, as seen in Figure 7.28a, this results in uneven line spacing, which may result in visual artifacts. Figure 7.28b illustrates vertical filtering that results in the output lines being more evenly spaced:

$$D_0 = L_0$$

$$D_1 = (2/3)L_1 + (1/3)L_2$$

$$D_2 = (1/3)L_2 + (2/3)L_3$$

The linear interpolator is a poor bandwidth-limiting filter. Excess high-frequency detail is removed unnecessarily and too much energy above the Nyquist limit is still present, resulting in aliasing.

Anti-Aliased Resampling

The most desirable approach is to ensure the frequency content scales proportionally with the image size, both horizontally and vertically.

Figure 7.29 illustrates the fundamentals of an anti-aliased resampling process. The input data is upsampled by A and lowpass filtered to remove image frequencies created by the interpolation process. Filter B bandwidth-limits the signal to remove frequencies that will alias in the resampling process B. The ratio of B/A determines the scaling factor.

Filters A and B are usually combined into a single filter. The response of the filter largely determines the quality of the interpolation. The ideal lowpass filter would have a very flat passband, a sharp cutoff at half of the lowest sampling frequency (either input or output), and very high attenuation in the stopband. However, since such a filter generates ringing on sharp edges, it is usually desirable to roll off the top of the passband. This makes for slightly softer pictures, but with less pronounced ringing.
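The upsample, filter, and resample structure can be sketched as follows. This is a minimal, unoptimized illustration and not a production scaler: the Hamming-windowed sinc lowpass, the `half_width` parameter, and the function names are all assumptions introduced here, with filters A and B merged into one filter whose cutoff is half of the lower of the input and output sampling rates.

```python
import math


def lowpass_taps(cutoff, num_taps):
    """Hamming-windowed sinc lowpass; cutoff is in cycles per sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def resample(samples, a, b, half_width=4):
    """Upsample by a (zero insertion), lowpass filter, then keep every b-th sample."""
    # Step 1: insert a - 1 zeros after every input sample.
    up = []
    for s in samples:
        up.append(s)
        up.extend([0.0] * (a - 1))

    # Step 2: one combined lowpass filter, cut off at the lower Nyquist limit.
    num_taps = 2 * half_width * max(a, b) + 1
    taps = lowpass_taps(0.5 / max(a, b), num_taps)
    gain = a / sum(taps)                 # DC gain of a makes up for the inserted zeros
    taps = [t * gain for t in taps]
    mid = num_taps // 2

    # Step 3: evaluate the filter only at the positions that are kept.
    out = []
    for centre in range(0, len(up), b):
        acc = 0.0
        for k, t in enumerate(taps):
            idx = centre + k - mid
            if 0 <= idx < len(up):       # samples outside the line count as zero
                acc += t * up[idx]
        out.append(acc)
    return out


line = [100.0] * 40
scaled = resample(line, a=3, b=4)        # 40 input samples become 30 output samples
print(len(scaled), round(scaled[15], 1))
```

Treating samples beyond the ends of the line as zero darkens the first and last few outputs; a real design would extend the edges or adjust the coefficients there.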
Figure 7.27. Vertical Scaling of 16:9 Images to Fit on a 4:3 Display. (A) 480-line systems: (480) * (4 / 3) / (16 / 9) = 360 visible active lines, with 60 lines above and below. (B) 576-line systems: (576) * (4 / 3) / (16 / 9) = 432 visible active lines, with 72 lines above and below.

Figure 7.28. 75% Vertical Scaling of 16:9 Images to Fit on a 4:3 Display. (A) Unevenly spaced results. (B) Evenly spaced results.

Figure 7.29. General Anti-Aliased Resampling Structure.

Passband ripple and stopband attenuation of the filter provide some measure of scaling quality, but the subjective effect of ringing means a flat passband might not be as good as one might think. Lots of stopband attenuation is almost always a good thing.

There are essentially three variations of the general resampling structure. Each combines the elements of Figure 7.29 in various ways.

One approach is a variable-bandwidth antialiasing filter followed by a combined interpolator/resampler. In this case, the filter needs new coefficients for each scale factor; as the scale factor is changed, the quality of the image may vary. In addition, the overall response is poor if linear interpolation is used. However, the filter coefficients are time-invariant and there are no gain problems.

A second approach is a combined filter/interpolator followed by a resampler. Generally, the higher the order of interpolation, n, the better the overall response. The center of the filter transfer function is always aligned over the new output sample.
With each scaling factor, the filter transfer function is stretched or compressed to remain aligned over n output samples. Thus, the filter coefficients, and the number of input samples used, change with each new output sample and scaling factor. Dynamic gain normalization is required to ensure the sum of the filter coefficients is always equal to one.

A third approach is an interpolator followed by a combined filter/resampler. The input data is interpolated up to a common multiple of the input and output rates by the insertion of zero samples. This is filtered with a lowpass finite-impulse-response (FIR) filter to interpolate samples in the zero-filled gaps, then re-sampled at the required locations. This type of design is usually achieved with a “polyphase” filter, which switches its coefficients as the relative position of input and output samples changes.

Display Scaling Examples

Figures 7.30 through 7.38 illustrate various scaling examples for displaying 16:9 and 4:3 pictures on 4:3 and 16:9 displays, respectively.

How content is displayed is a combination of user preferences and content aspect ratio. For example, when displaying 16:9 content on a 4:3 display, many users prefer to have the entire display filled with the cropped picture (Figure 7.31) rather than seeing black or gray bars with the letterbox solution (Figure 7.32).

In addition, some displays incorrectly assume any progressive video signal on their YPbPr inputs is from an “anamorphic” source. As a result, they horizontally upscale progressive 16:9 programs by 25% when no scaling should be applied. Therefore, for set-top boxes it is useful to include a “16:9 (Compressed)” mode, which horizontally downscales the progressive 16:9 program by 25% to pre-compensate for the horizontal upscaling performed by the 16:9 display.

Scan Rate Conversion

In many cases, some form of scan rate conversion (also called temporal rate conversion, frame rate conversion, or field rate conversion) is needed.
Multi-standard analog VCRs and scan converters use scan rate conversion to convert between various video standards. Computers usually operate the display at about 75 Hz noninterlaced, yet need to display 50 and 60 Hz interlaced video. With digital television, multiple frame rates can be supported.

Figure 7.30. 16:9 Source Example (1920 samples, 1080 scan lines).

Figure 7.31. Scaling 16:9 Content for a 4:3 Display: “Normal” or pan-and-scan mode (720 samples, 480 scan lines). Results in some of the 16:9 content being ignored (indicated by gray regions).

Figure 7.32. Scaling 16:9 Content for a 4:3 Display: “Letterbox” mode (720 samples, 360 of 480 scan lines). Entire 16:9 program visible, with black bars at top and bottom of display.

Figure 7.33. Scaling 16:9 Content for a 4:3 Display: “Squeezed” mode (720 samples, 480 scan lines). Entire 16:9 program horizontally squeezed to fit 4:3 display, resulting in a distorted picture.

Figure 7.34. 4:3 Source Example (720 samples, 480 scan lines).

Figure 7.35. Scaling 4:3 Content for a 16:9 Display: “Normal” mode (1920 samples, 1080 scan lines). Left and right portions of 16:9 display not used, so made black or gray.

Figure 7.36. Scaling 4:3 Content for a 16:9 Display: “Wide” mode (1920 samples, 1080 scan lines). Entire picture linearly scaled horizontally to fill 16:9 display, resulting in distorted picture unless used with anamorphic content.

Figure 7.37. Scaling 4:3 Content for a 16:9 Display: “Zoom” mode (1920 samples, 1080 scan lines). Top and bottom portion of 4:3 picture deleted, then scaled to fill 16:9 display.

Figure 7.38. Scaling 4:3 Content for a 16:9 Display: “Panorama” mode (1920 samples, 1080 scan lines). Left and right 25% edges of picture are nonlinearly scaled horizontally to fill 16:9 display, distorting the picture on the left and right sides.
Note that processing must be performed on component video signals (such as R´G´B´ or YCbCr). Composite color video signals cannot be processed directly due to the color subcarrier phase information present, which would be meaningless after processing.

Frame or Field Dropping and Duplicating

Simple scan-rate conversion may be done by dropping or duplicating one out of every N fields. For example, the conversion of 60 Hz to 50 Hz interlaced operation may drop one out of every six fields, as shown in Figure 7.39, using a single field store.

The disadvantage of this technique is that the viewer may see jerky motion, or motion judder. In addition, some video decompression products use top-field only to convert from 60 Hz to 50 Hz, degrading the vertical resolution.

The worst artifacts are present when a non-integer scan rate conversion is done, for example, when some frames are displayed three times, while others are displayed twice. In this instance, the viewer will observe double or blurred objects. As the human brain tracks an object in successive frames, it expects to see a regular sequence of positions, and has trouble reconciling the apparent stop-start motion of objects. As a result, it incorrectly concludes that there are two objects moving in parallel.

Figure 7.39. 60 Hz to 50 Hz Conversion Using a Single Field Store by Dropping One out of Every Six Fields.

Figure 7.40. 50 Hz to 60 Hz Conversion Using Temporal Interpolation with No Motion Compensation. (Mixing weights shown between adjacent 50 Hz fields: 100%, 17%, 33%, 50%, 67%, 83% and 83%, 67%, 50%, 33%, 17%, 100%.)

Temporal Interpolation

This technique generates new frames from the original frames as needed to generate the desired frame rate. Information from both past and future input frames should be used to optimally handle objects appearing and disappearing.
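For temporal interpolation without motion compensation, the mixing weights follow from where each output field falls in time between two input fields. The sketch below computes them with exact fractions; the function name `temporal_mix_weights` is hypothetical, and the sketch ignores the vertical offset between interlaced fields, dealing only with timing.

```python
from fractions import Fraction


def temporal_mix_weights(in_rate, out_rate, out_fields):
    """For each output field return (first_input, weight, second_input, weight).

    Output field j occurs at time j / out_rate, which is j * in_rate / out_rate
    measured in input-field periods.
    """
    weights = []
    for j in range(out_fields):
        pos = Fraction(j * in_rate, out_rate)
        first = pos.numerator // pos.denominator   # input field at or before pos
        frac = pos - first                         # how far towards the next field
        weights.append((first, 1 - frac, first + 1, frac))
    return weights


for first, w0, second, w1 in temporal_mix_weights(50, 60, 6):
    print(f"{float(w0):4.0%} of field {first} + {float(w1):4.0%} of field {second}")
```

For 50 Hz to 60 Hz this gives 100/0, 17/83, 33/67, 50/50, 67/33, and 83/17 percent, which appears consistent with the percentages in Figure 7.40.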
Conversion of 50 Hz to 60 Hz operation using temporal interpolation is illustrated in Figure 7.40. For every five fields of 50 Hz video, there are six fields of 60 Hz video.

After both sources are aligned, two adjacent 50 Hz fields are mixed together to generate a new 60 Hz field. This technique is used in some inexpensive standards converters to convert between 50 Hz and 60 Hz standards. Note that no motion analysis is done. Therefore, if the camera operating at 50 Hz pans horizontally past a narrow vertical object, you see one object once every six 60 Hz fields, and for the five fields in between, you see two objects, one fading in while the other fades out.

50 Hz to 60 Hz Examples

Figure 7.41 illustrates a scan rate converter that implements vertical, followed by temporal, interpolation. Figure 7.42 illustrates the spectral representation of the design in Figure 7.41.

Many designs now combine the vertical and temporal interpolation into a single design, as shown in Figure 7.43, with the corresponding spectral representation shown in Figure 7.44. This example uses vertical, followed by temporal, interpolation. If temporal, followed by vertical, interpolation were implemented, the field stores would be half the size. However, the number of line stores would increase from four to eight.

In either case, the first interpolation process must produce an intermediate, higher-resolution progressive format to avoid interlace components that would interfere with the second interpolation process. It is insufficient to interpolate, either vertically or temporally, using a mixture of lines from both fields, due to the interpolation process not being able to compensate for the temporal offset of interlaced lines.

Motion Compensation

Higher-quality scan rate converters using temporal interpolation incorporate motion compensation to minimize motion artifacts. This results in extremely smooth and natural motion, and images appear sharper and do not suffer from motion judder.
Motion estimation for scan rate conversion differs from that used by MPEG. In MPEG, the goal is to minimize the displaced frame difference (error) by searching for a high correlation between areas in subsequent frames. The resulting motion vectors do not necessarily correspond to true motion vectors.

For scan rate conversion, it is important to determine true motion information to perform correct temporal interpolation. The interpolation should be tolerant of incorrect motion vectors to avoid introducing artifacts as unpleasant as those the technique is attempting to remove. Motion vectors could be incorrect for several reasons, such as insufficient time to track the motion, out-of-range motion vectors, and estimation difficulties due to aliasing.

100 Hz Interlaced Television Example

A standard 50 Hz interlaced television shows 50 fields per second. The images flicker, especially when you look at large areas of highly saturated color. A much improved picture can be achieved using a 100 Hz interlaced frame rate (also called double scan).

Figure 7.41. Typical 50 Hz to 60 Hz Conversion Using Vertical, Followed by Temporal, Interpolation. (625/50 interlaced input, 525/50 sequential intermediate format, 525/60 interlaced output; F = field store, H = line store.)

Figure 7.42. Spectral Representation of Vertical, Followed by Temporal, Interpolation. (A) Vertical lowpass filtering. (B) Resampling to intermediate sequential format and temporal lowpass filtering. (C) Resampling to final standard. (Axes: vertical frequency in cycles per picture height versus temporal frequency in Hz.)
Figure 7.43. Typical 50 Hz to 60 Hz Conversion Using Combined Vertical and Temporal Interpolation. (625/50 interlaced input, 525/60 interlaced output; F = field store, H = line store.)

Figure 7.44. Spectral Representation of Combined Vertical and Temporal Interpolation. (A) Two-dimensional lowpass filtering. (B) Resampling to final standard. (Axes: vertical frequency in cycles per picture height versus temporal frequency in Hz.)

Figure 7.45. 50 Hz to 100 Hz (Double Scan Interlaced) Techniques. (A) Repeated fields. (B) Calculated fields.

Early 100 Hz televisions simply repeated fields (F1F1F2F2F3F3F4F4...), as shown in Figure 7.45a. However, they still had line flicker, where horizontal lines constantly jumped between the odd and even lines. This disturbance occurred once every twenty-fifth of a second.

The field sequence F1F2F1F2F3F4F3F4... can be used, which solves the line flicker problem. Unfortunately, this gives rise to the problem of judder in moving images. This can be compensated for by using the F1F2F1F2F3F4F3F4... sequence for static images, and the F1F1F2F2F3F3F4F4... sequence for moving images.

An ideal picture is still not obtained when viewing programs created for film. They are subject to judder, owing to the fact that each film frame is transmitted twice. Instead of the field sequence F1F1F2F2F3F3F4F4..., the situation calls for the sequence F1F1´F2F2´F3F3´F4F4´... (Figure 7.45b), where Fn´ is a motion-compensated generated image between Fn and Fn+1.

2:2 Pulldown

This technique is used with some film-based compressed content for 50 Hz regions. Film is usually recorded at 24 frames per second.
During compression, the telecine machine is sped up from 24 to 25 frames per second, making the content 25 frames per second progressive. During decompression, each film frame is simply mapped into two video fields (resulting in 576i25 or 1080i25 video) or two video frames (resulting in 576p50, 720p50, or 1080p50 video).

This technique provides higher video quality and avoids motion judder artifacts. However, it shortens the duration of the program by about 4%, cutting the duration of a 2-hour movie by ~5 minutes.

Some audio decoders cannot handle the 4% faster audio data via S/PDIF (IEC 60958).

To compensate for the change in audio pitch caused by the telecine speedup, the audio may be resampled during decoding to restore the original pitch (costly to do in a low-cost consumer product), or the resampling may be done during program authoring.

3:2 Pulldown

When converting 24 frames per second content to 60 Hz, 3:2 pulldown is commonly used, as shown in Figure 7.46. During compression, the film speed is slowed down by 0.1% to 23.976 (24/1.001) frames per second since 59.94 Hz is used for NTSC timing compatibility. During decompression, 2 film frames generate 5 video fields (resulting in 480i30 or 1080i30 video) or 5 video frames (resulting in 480p60, 720p60, or 1080p60 video).

Figure 7.46. 3:2 Pulldown for Converting 24 Hz Film to 60 Hz Video. (Film frames N through N+3 map to video fields or frames N through N+9, with white flags marking the start of each group; O = odd lines and E = even lines if the output is interlaced.)

In scenes of high-speed motion of objects, the specific film frame used for a particular video field or frame may be manually adjusted to minimize motion artifacts.

3:2 pulldown may also be used during video decompression simply to increase the frame rate from 23.976 (24/1.001) to 59.94 (60/1.001) frames per second, avoiding the deinterlacing issue.
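The 3:2 cadence itself is easy to express in code. The sketch below maps a list of film frames to a list of interlaced fields; the function name `pulldown_3_2`, the choice to start the cadence with three fields, and the convention that the first output field carries the odd lines are assumptions made for illustration only.

```python
def pulldown_3_2(film_frames, first_repeat=3):
    """Map film frames to video fields using an alternating 3, 2, 3, 2 cadence.

    Returns a list of (film_frame, parity) tuples, where parity alternates
    between "odd" and "even" lines on successive output fields.
    """
    if first_repeat not in (2, 3):
        raise ValueError("first_repeat must be 2 or 3")
    fields = []
    repeat = first_repeat
    for frame in film_frames:
        for _ in range(repeat):
            parity = "odd" if len(fields) % 2 == 0 else "even"
            fields.append((frame, parity))
        repeat = 5 - repeat        # alternate between 3 and 2 fields per frame
    return fields


for frame, parity in pulldown_3_2(["A", "B", "C", "D"]):
    print(frame, parity)
```

Four film frames yield ten fields, so 24 film frames per second yield 60 fields per second (before the 0.1% slowdown described above).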
Varispeed may be used to cover up problems such as defects, splicing, censorship cuts, or to change the running time of a program. Rather than repeating film frames and causing a stutter, the 3:2 relationship between the film and video is disrupted long enough to ensure a smooth temporal rate.

Analog laserdiscs used a white flag signal to indicate the start of another sequence of related fields for optimum still-frame performance. During still-frame mode, the white flag signal tells the system to back up two fields (to use two fields that have no motion between them) to re-display the current frame.

3:3 Pulldown

This technique is used in some displays that support 72 Hz frame rate. The 24 frames per second film-based content is converted to 72 Hz progressive by simply duplicating each film frame three times.

24:1 Pulldown

This technique, also called 12:1 pulldown, can be used to convert 24 frames/second content to 50 fields per second.

Two video fields are generated from every film frame, except every 12th film frame generates 3 video fields. Although the audio pitch is correct, motion judder is present every one-half second when smooth motion is present.

Noninterlaced-to-Interlaced Conversion

In some applications, it is necessary to display a noninterlaced video signal on an interlaced display. Thus, some form of noninterlaced-to-interlaced conversion may be required.

Noninterlaced-to-interlaced conversion must be performed on component video signals (such as R´G´B´ or YCbCr). Composite color video signals (such as NTSC or PAL) cannot be processed directly due to the presence of color subcarrier phase information, which would be meaningless after processing. These signals must be decoded into component color signals, such as R´G´B´ or YCbCr, prior to conversion.

There are essentially two techniques: scan line decimation and vertical filtering.
Scan Line Decimation

The easiest approach is to throw away every other active scan line in each noninterlaced frame, as shown in Figure 7.47. Although the cost is minimal, there are problems with this approach, especially with the top and bottom of objects.

If there is a sharp vertical transition of color or intensity, it will flicker at one-half the frame rate. The reason is that it is only displayed every other field as a result of the decimation. For example, a horizontal line that is one noninterlaced scan line wide will flicker on and off. Horizontal lines that are two noninterlaced scan lines wide will oscillate up and down.

Simple decimation may also add aliasing artifacts. While not necessarily visible, they will affect any future processing of the picture.

Figure 7.47. Noninterlaced-to-Interlaced Conversion Using Scan Line Decimation. (Noninterlaced frame N produces interlaced field 1; noninterlaced frame N + 1 produces interlaced field 2.)

Figure 7.48. Noninterlaced-to-Interlaced Conversion Using 3-Line Vertical Filtering. (Noninterlaced frame N produces interlaced field 1; noninterlaced frame N + 1 produces interlaced field 2.)

Vertical Filtering

A better solution is to use two or more lines of noninterlaced data to generate one line of interlaced data. Fast vertical transitions are smoothed out over several interlaced lines.

For a 3-line filter, such as shown in Figure 7.48, typical coefficients are [0.25, 0.5, 0.25]. Using more than three lines usually results in excessive blurring, making small text difficult to read.
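A minimal sketch of the 3-line vertical filter is shown below, assuming a frame is held as a list of scan lines, each a list of samples. The function name `frame_to_field` is hypothetical, and repeating the edge line at the top and bottom of the frame is just one possible boundary treatment.

```python
def frame_to_field(frame, parity, taps=(0.25, 0.5, 0.25)):
    """Build one interlaced field from a noninterlaced frame.

    parity 0 keeps frame lines 0, 2, 4, ...; parity 1 keeps lines 1, 3, 5, ...
    Each output line is a weighted sum of the line above, the line itself,
    and the line below. taps=(0, 1, 0) reduces to plain scan line decimation.
    """
    last = len(frame) - 1
    field = []
    for centre in range(parity, len(frame), 2):
        above = frame[max(centre - 1, 0)]       # repeat the edge line at the top
        below = frame[min(centre + 1, last)]    # and at the bottom
        line = [taps[0] * a + taps[1] * c + taps[2] * b
                for a, c, b in zip(above, frame[centre], below)]
        field.append(line)
    return field


# A horizontal line one scan line thick (frame line 3) in an 8-line frame.
frame = [[0.0] for _ in range(8)]
frame[3] = [100.0]
print(frame_to_field(frame, 0), frame_to_field(frame, 1))
print(frame_to_field(frame, 0, taps=(0, 1, 0)))
```

With the filter, the thin line contributes to both fields (25 + 25 in one, 50 in the other), whereas plain decimation leaves it entirely absent from one field, which is the source of the on-off flicker described above.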
An alternate implementation uses IIR rather than FIR filtering. In addition to averaging, this technique produces a reduction in brightness around objects, further reducing flicker.

Note that care must be taken at the beginning and end of each frame in the event that fewer scan lines are available for filtering.

Interlaced-to-Noninterlaced Conversion

In some applications, it is necessary to display an interlaced video signal on a noninterlaced display. Thus, some form of deinterlacing or progressive scan conversion may be required.

Note that deinterlacing must be performed on component video signals (such as R´G´B´ or YCbCr). Composite color video signals (such as NTSC or PAL) cannot be deinterlaced directly due to the presence of color subcarrier phase information, which would be meaningless after processing. These signals must be decoded into component color signals, such as R´G´B´ or YCbCr, prior to deinterlacing.

There are two fundamental deinterlacing algorithms: video mode and film mode. Video mode deinterlacing can be further broken down into inter-field and intra-field processing.

The goal of a good deinterlacer is to correctly choose the best algorithm needed at a particular moment. In systems where the vertical resolution of the source and display do not match (due to, for example, displaying SDTV content on an HDTV), the deinterlacing and vertical scaling can be merged into a single process.

Video Mode: Intra-Field Processing

This is the simplest method for generating additional scan lines using only information in the original field. The computer industry calls this technique bob.

Although there are two common techniques for implementing intra-field processing, scan line duplication and scan line interpolation, the resulting vertical resolution is always limited by the content of the original field.

Scan Line Duplication

Scan line duplication (Figure 7.49) simply duplicates the previous active scan line.
Although the number of active scan lines is doubled, there is no increase in the vertical resolution.

Scan Line Interpolation

Scan line interpolation generates interpolated scan lines between the original active scan lines. Although the number of active scan lines is doubled, the vertical resolution is not.

The simplest implementation, shown in Figure 7.50, uses linear interpolation to generate a new scan line between two input scan lines:

$$out_n = (in_{n-1} + in_{n+1}) / 2$$

Better results, at additional cost, may be achieved by using a FIR filter:

$$out_n = \bigl(160(in_{n-1} + in_{n+1}) - 48(in_{n-3} + in_{n+3}) + 24(in_{n-5} + in_{n+5}) - 12(in_{n-7} + in_{n+7}) + 6(in_{n-9} + in_{n+9}) - 2(in_{n-11} + in_{n+11})\bigr) / 256$$

Figure 7.49. Deinterlacing Using Scan Line Duplication. New scan lines are generated by duplicating the active scan line above it.

Figure 7.50. Deinterlacing Using Scan Line Interpolation. New scan lines are generated by averaging the previous and next active scan lines.

Figure 7.51. Deinterlacing Using Field Merging. Shaded scan lines are generated by using the input scan line from the next or previous field.

Figure 7.52. Producing Deinterlaced Frames at Field Rates.

Fractional Ratio Interpolation

In many cases, there is a periodic, but non-integral, relationship between the number of input scan lines and the number of output scan lines. In this case, fractional ratio interpolation may be necessary, similar to the polyphase filtering used for scaling, but performed only in the vertical direction.
This technique combines deinterlacing and vertical scaling into a single process.

Variable Interpolation

In a few cases, there is no periodicity in the relationship between the number of input and output scan lines. Therefore, in theory, an infinite number of filter phases and coefficients are required. Since this is not feasible, the solution is to use a large, but finite, number of filter phases. The number of filter phases determines the interpolation accuracy. This technique also combines deinterlacing and vertical scaling into a single process.

Video Mode: Inter-Field Processing

In this method, video information from more than one field is used to generate a single progressive frame. This method can provide higher vertical resolution since it uses content from more than a single field.

Field Merging

This technique merges two consecutive fields together to produce a frame of video (Figure 7.51). At each field time, the active scan lines of that field are merged with the active scan lines of the previous field. The result is that for each input field time, a pair of fields combine to generate a frame (see Figure 7.52).

Although simple to implement, the vertical resolution is doubled only in regions of no movement. Moving objects will have artifacts, also called combing, due to the time difference between two fields: a moving object is located in a different position from one field to the next. When the two fields are merged, moving objects will have a double image (see Figure 7.53).

Figure 7.53. Movement Artifacts When Field Merging Is Used. (Object position in field one, object position in field two, and object positions in the merged fields.)

It is common to soften the image slightly in the vertical direction to attempt to reduce the visibility of combing. When implemented, it causes a loss of vertical resolution and jitter on movement and pans.
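The two basic video mode approaches described so far, intra-field scan line interpolation and inter-field field merging, can be sketched side by side. The function names `bob_interpolate` and `merge_fields` are hypothetical, fields are assumed to be lists of scan lines (each a list of samples), and the missing line at the top or bottom edge is simply duplicated from its only neighbor.

```python
def bob_interpolate(field, parity):
    """Intra-field: build a frame from one field by scan line interpolation.

    parity 0 means the field supplies frame lines 0, 2, 4, ...;
    parity 1 means it supplies lines 1, 3, 5, ...
    """
    height = 2 * len(field)
    frame = [None] * height
    for i, line in enumerate(field):
        frame[2 * i + parity] = list(line)
    for row in range(height):
        if frame[row] is None:
            # Neighbours of a missing row are always original field lines.
            above = frame[row - 1] if row > 0 else frame[row + 1]
            below = frame[row + 1] if row + 1 < height else frame[row - 1]
            frame[row] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame


def merge_fields(top_field, bottom_field):
    """Inter-field: interleave the lines of two consecutive fields."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))
        frame.append(list(bottom_line))
    return frame


top = [[10], [30]]
bottom = [[20], [40]]
print(bob_interpolate(top, 0))     # uses only the top field
print(merge_fields(top, bottom))   # full detail, but only valid without motion
```

Field merging keeps every original sample, so it is exact for static content, while interpolation never sees the second field and therefore cannot recover its detail.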
The computer industry refers to this technique as weave, but weave also includes the inverse telecine process to remove any 3:2 pulldown present in the source. Theoretically, this eliminates the double image artifacts since two identical fields are now being merged.

Motion Adaptive Deinterlacing

A good deinterlacing solution is to use field merging for still areas of the picture and scan line interpolation for areas of movement. To accomplish this, motion, on a sample-by-sample basis, must be detected over the entire picture in real time, requiring processing several fields of video.

As two fields are combined, full vertical resolution is maintained in still areas of the picture, where the eye is most sensitive to detail. The sample differences may have any value, from 0 (no movement and noise-free) to maximum (for example, a change from full intensity to black). A choice must be made when to use a sample from the previous field (which is in the wrong location due to motion) or to interpolate a new sample from adjacent scan lines in the current field. Sudden switching between methods is visible, so crossfading (also called soft switching) is used. At some magnitude of sample difference, the loss of resolution due to a double image is equal to the loss of resolution due to interpolation. That amount of motion should result in the crossfader being at the 50% point. Less motion will result in a fade towards field merging and more motion in a fade towards the interpolated values.

Rather than “per pixel” motion adaptive deinterlacing, which makes decisions for every sample, some low-cost solutions use “per field” motion adaptive deinterlacing. In this case, the algorithm is selected each field, based on the amount of motion between the fields. “Per pixel” motion adaptive deinterlacing, although difficult to implement, looks quite good when properly done. “Per field” motion adaptive deinterlacing rarely looks much better than vertical interpolation.
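The crossfade at the heart of "per pixel" motion adaptive deinterlacing can be sketched as below. Several things here are assumptions rather than anything the text specifies: the function names, the linear fade curve, the `midpoint` value of 16, and the idea that a per-sample motion magnitude has already been measured elsewhere (for example, from differences between fields).

```python
def crossfade_sample(merged, interpolated, motion, midpoint):
    """Blend a field-merged sample with an interpolated one.

    motion   : non-negative sample difference used as the motion measure
    midpoint : motion value at which the crossfader sits at 50%
    """
    fade = min(1.0, max(0.0, motion / (2.0 * midpoint)))
    return (1.0 - fade) * merged + fade * interpolated


def deinterlace_line(above, below, prev_field_line, motion_line, midpoint=16.0):
    """Generate one missing scan line, sample by sample.

    above, below    : adjacent scan lines in the current field
    prev_field_line : the line at this position in the previous field
    motion_line     : per-sample motion magnitudes (assumed precomputed)
    """
    out = []
    for a, b, merged, motion in zip(above, below, prev_field_line, motion_line):
        interpolated = (a + b) / 2.0
        out.append(crossfade_sample(merged, interpolated, motion, midpoint))
    return out


# First sample is static (pure field merging); second is moving (pure interpolation).
print(deinterlace_line([40, 40], [60, 60], [90, 90], [0, 32]))
```

A real detector would also need to reject noise and look at more than two fields, which is part of what makes the per-pixel approach hard to implement well.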
Motion-Compensated Deinterlacing

Motion-compensated (or motion vector steered) deinterlacing is several orders of magnitude more complex than motion adaptive deinterlacing, and is commonly found in pro-video format converters.

Motion-compensated processing requires calculating motion vectors between fields for each sample, and interpolating along each sample’s motion trajectory. Motion vectors must also be found that pass through each of any missing samples. Areas of the picture may be covered or uncovered as you move between frames. The motion vectors must also have sub-pixel accuracy, and be determined in two temporal directions between frames.

In MPEG, motion vector errors are self-correcting since the residual difference between the predicted macroblocks is encoded. As motion-compensated deinterlacing is a single-ended system, motion vector errors will produce artifacts, so different search and verification algorithms must be used.

Film Mode (Using Inverse Telecine)

For sources that have 3:2 pulldown (i.e., 60 fields/second video converted from 24 frames/second film), higher deinterlacing performance may be obtained by removing duplicate fields prior to processing.

The inverse telecine process detects the 3:2 field sequence and the redundant third fields are removed. The remaining field pairs are merged (since there is no motion between them) to form progressive frames at 24 frames/second. These are then repeated in a 3:2 sequence to get to 60 frames/second.

Although this may seem to be the ideal solution, some content uses both 60 fields/second video and 24 frames/second video (film-based) within a program. In addition, some content may occasionally have both video types present simultaneously. In other cases, the 3:2 pulldown timing (cadence) doesn’t stay regular, or the source was never originally from film. Thus, the deinterlacer has to detect each video type and process it differently (video mode vs.
film mode). Display artifacts are common due to the delay between the video type changing and the deinterlacer detecting the change.

Frequency Response Considerations

Various two-times vertical upsampling techniques for deinterlacing may be implemented by stuffing zero values between two valid lines and filtering, as shown in Figure 7.54.

Line A shows the frequency response for line duplication, in which the lowpass filter coefficients for the filter shown are 1, 1, and 0. Line interpolation, using lowpass filter coefficients of 0.5, 1.0, and 0.5, results in the frequency response curve of Line B. Note that line duplication results in a better high-frequency response. Vertical filters with a better frequency response than the one for line duplication are possible, at the cost of more line stores and processing.

The best vertical frequency response is obtained when field merging is implemented. The spatial position of the lines is already correct and no vertical processing is required, resulting in a flat curve (Line C). Again, this applies only for stationary areas of the image.

[Figure 7.54. Frequency Response of Various Deinterlacing Filters. (A) Line duplication. (B) Line interpolation. (C) Field merging. Axes: gain versus vertical frequency (cycles per picture height).]

DCT-Based Compression

The transform process of many video compression standards is based on the Discrete Cosine Transform, or DCT. The easiest way to envision it is as a filter bank with all the filters computed in parallel.

During encoding, the DCT is usually followed by several other operations, such as quantization, zig-zag scanning, run-length encoding, and variable-length encoding. During decoding, this process flow is reversed.

Many times, the terms macroblocks and blocks are used when discussing video compression. Figure 7.55 illustrates the relationship between these two terms, and shows why transform processing is usually done on 8 × 8 samples.
DCT

The 8 × 8 DCT processes an 8 × 8 block of samples to generate an 8 × 8 block of DCT coefficients, as shown in Figure 7.56. The input may be samples from an actual frame of video or motion-compensated difference (error) values, depending on the encoder mode of operation. Each DCT coefficient indicates the amount of a particular horizontal or vertical frequency within the block.

DCT coefficient (0,0) is the DC coefficient, or average sample value. Since natural images tend to vary only slightly from sample to sample, low frequency coefficients are typically larger values and high frequency coefficients are typically smaller values.

The 8 × 8 DCT is defined in Figure 7.57. f(x, y) denotes sample (x, y) of the 8 × 8 input block and F(u, v) denotes coefficient (u, v) of the DCT transformed block.

A reconstructed 8 × 8 block of samples is generated using an 8 × 8 inverse DCT (IDCT), defined in Figure 7.58. Although exact reconstruction is theoretically achievable, it is not practical due to finite-precision arithmetic, quantization and differing IDCT implementations. As a result, there are mismatches between different IDCT implementations.

Mismatch control attempts to reduce the drift between encoder and decoder IDCT results by eliminating bit patterns having the greatest contribution towards mismatches. MPEG-1 mismatch control is known as “oddification” since it forces all reconstructed (inverse-quantized) DCT coefficients to odd values. MPEG-2 and MPEG-4.2 use an improved method called “LSB toggling” which affects only the LSB of DCT coefficient 63 (the last, highest-frequency coefficient in the block) after inverse quantization.

H.264 (also known as MPEG-4.10) neatly sidesteps the issue by using an “exact-match inverse transform.” Every decoder will produce exactly the same pictures, all else being equal.

Quantization

The 8 × 8 block of DCT coefficients is quantized, which reduces the overall precision of the integer coefficients and tends to eliminate high frequency coefficients, while maintaining perceptual quality.
Higher frequencies are usually quantized more coarsely (fewer values allowed) than lower frequencies, due to visual perception of quantization error. The quantizer is also used for constant bit-rate applications where it is varied to control the output bit-rate.

[Figure 7.55. The Relationship between Macroblocks and Blocks. The picture is divided into 16 × 16 blocks (macroblocks). Each macroblock is 16 samples by 16 lines (4 blocks). Each block is 8 samples by 8 lines.]

[Figure 7.56. The DCT Processes the 8 × 8 Block of Samples or Error Terms to Generate an 8 × 8 Block of DCT Coefficients. Labels: DC term; frequency coefficients; increasing horizontal frequency; increasing vertical frequency; isolated high-frequency term.]

$$F(u, v) = 0.25\,C(u)\,C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \cos\left(\frac{(2x + 1)u\pi}{16}\right) \cos\left(\frac{(2y + 1)v\pi}{16}\right)$$

where u, v, x, y = 0, 1, 2, . . ., 7; (x, y) are spatial coordinates in the sample domain; (u, v) are coordinates in the transform domain; and C(u), C(v) = 1/√2 for u, v = 0, and 1 otherwise.

Figure 7.57. 8 × 8 Two-Dimensional DCT Definition.

$$f(x, y) = 0.25 \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\,C(v)\,F(u, v) \cos\left(\frac{(2x + 1)u\pi}{16}\right) \cos\left(\frac{(2y + 1)v\pi}{16}\right)$$

Figure 7.58. 8 × 8 Two-Dimensional Inverse DCT (IDCT) Definition.

Zig-Zag Scanning

The quantized DCT coefficients are rearranged into a linear stream by scanning them in a zig-zag order. This rearrangement places the DC coefficient first, followed by frequency coefficients arranged in order of increasing frequency, as shown in Figures 7.59, 7.60, and 7.61. This produces long runs of zero coefficients.

Run Length Coding

The linear stream of quantized frequency coefficients is converted into a series of [run, amplitude] pairs. [run] indicates the number of zero coefficients, and [amplitude] the nonzero coefficient that ended the run.

Variable-Length Coding

The [run, amplitude] pairs are coded using a variable-length code, resulting in additional lossless compression. This produces shorter codes for common pairs and longer codes for less common pairs.

This coding method produces a more compact representation of the DCT coefficients, as a large number of DCT coefficients are usually quantized to zero and the re-ordering results (ideally) in the grouping of long runs of consecutive zero values.

In Figures 7.59 through 7.61, each entry gives the position of that coefficient of the 8 × 8 block of quantized frequency coefficients in the linear array of 64 frequency coefficients.

     0   1   5   6  14  15  27  28
     2   4   7  13  16  26  29  42
     3   8  12  17  25  30  41  43
     9  11  18  24  31  40  44  53
    10  19  23  32  39  45  52  54
    20  22  33  38  46  51  55  60
    21  34  37  47  50  56  59  61
    35  36  48  49  57  58  62  63

Figure 7.59. The 8 × 8 Block of Quantized DCT Coefficients Are Zig-Zag Scanned to Arrange in Order of Increasing Frequency. This scanning order is used for H.261, H.263, MPEG-1, MPEG-2, MPEG-4.2, ITU-R BT.1618, ITU-R BT.1620, SMPTE 314M, and SMPTE 370M.

     0   4   6  20  22  36  38  52
     1   5   7  21  23  37  39  53
     2   8  19  24  34  40  50  54
     3   9  18  25  35  41  51  55
    10  17  26  30  42  46  56  60
    11  16  27  31  43  47  57  61
    12  15  28  32  44  48  58  62
    13  14  29  33  45  49  59  63

Figure 7.60. H.263, MPEG-2, and MPEG-4.2 “Alternate-Vertical” Scanning Order.

     0   1   2   3  10  11  12  13
     4   5   8   9  17  16  15  14
     6   7  19  18  26  27  28  29
    20  21  24  25  30  31  32  33
    22  23  34  35  42  43  44  45
    36  37  40  41  46  47  48  49
    38  39  50  51  56  57  58  59
    52  53  54  55  60  61  62  63

Figure 7.61. H.263 and MPEG-4.2 “Alternate-Horizontal” Scanning Order.
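The encoding steps described in this section (DCT, quantization, zig-zag scanning, and run length coding) can be sketched end to end in Python. This is a minimal illustration under stated assumptions rather than any standard's actual procedure: the single uniform quantizer step of 16 is hypothetical (real encoders use per-coefficient quantization matrices), trailing zeros are simply dropped instead of being signalled, the normalization C(0) = 1/√2 is the conventional one, and all function names are made up for this example.

```python
import math

N = 8

def norm(k):
    # Conventional DCT normalization: 1/sqrt(2) for the zero-frequency term.
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_8x8(block):
    """Direct 8 x 8 DCT; block[y][x] in, coeff[v][u] out."""
    coeff = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeff[v][u] = 0.25 * norm(u) * norm(v) * total
    return coeff

def quantize(coeff, step=16):
    # Hypothetical uniform quantizer: one step size for every coefficient.
    return [[round(value / step) for value in row] for row in coeff]

def zigzag_order():
    """(row, column) positions in the classic zig-zag scanning order."""
    positions = [(r, c) for r in range(N) for c in range(N)]
    # Walk the anti-diagonals, alternating direction on each one.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order()]

def run_length(stream):
    """Convert a linear stream into (run, amplitude) pairs; trailing zeros dropped."""
    pairs, run = [], 0
    for value in stream:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

flat = [[100] * N for _ in range(N)]   # a block with no detail at all
pairs = run_length(zigzag_scan(quantize(dct_8x8(flat))))
```

For the flat block, only the DC coefficient survives (800 before quantization, 50 after), so `pairs` holds the single entry `(0, 50)`. The direct four-loop DCT is used here for clarity only; practical implementations use fast factorizations.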
Fixed Pixel Display Considerations

The unique designs and color reproduction gamuts of fixed pixel displays have resulted in new video processing technologies being developed. The result is brighter, sharper, more colorful images regardless of the video source.

Detail Correction

In CRT-based televisions, enhancing the image is commonly done by altering the electron beam diameter. With fixed-pixel displays, adding overshoot and undershoot to the video signals causes distortion. An acceptable implementation is to gradually change the brightness of the images before and after regions needing contour enhancement.

Expanded Color Reproduction

Broadcast stations are usually tuned to meet the limited color reproduction characteristics of CRT-based televisions. To fit the color reproduction capabilities of PDP and LCD, manufacturers have introduced various color expansion technologies. These include using independent hue and saturation controls for each primary and complementary color, plus the flesh color.

Non-Uniform Quantization

Rather than simply increasing the number of quantization levels, the quantization steps can be changed in accordance with the intensity of the image. This is possible since people better detect small changes in brightness for dark images than for bright images. In addition, the brighter the image, the less sensitive people are to changes in brightness. This means that more quantization steps can be used for dark images than for bright ones.

This technique can also be used to increase the quantization steps for shades that appear frequently.

Scaling and Deinterlacing

Fixed-pixel displays, such as LCD and plasma, usually upscale then downscale during deinterlacing to minimize moiré noise due to folded distortion. For example, a 1080i source is deinterlaced to 2160p, scaled to 1536p, then finally scaled to 768p (to drive a 1024 × 768 display).
Alternately, some solutions deinterlace and upscale to 1500p, then scale to the display's native resolution.

Application Example

Figures 7.62 and 7.63 illustrate the typical video processing done after video decompression and deinterlacing.

In addition to the primary video source, additional video sources typically include an on-screen-display (OSD), content navigation graphics, closed captioning or subtitles, and a second video for picture-in-picture (PIP).

The OSD plane displays configuration menus for the box, such as video output format and resolution, audio output format, etc. OSD design is unique to each product, so the OSD plane usually supports a wide variety of RGB/YCbCr formats and resolutions. Lookup tables can gamma-correct linear RGB data, convert 2-, 4-, or 8-bit indexed color to 32-bit YCbCrA data, or translate 0–255 graphics levels to the 16–235 video levels.

The content navigation plane displays graphics generated by Blu-ray BD-J, HD DVD HDi, electronic program guides, etc. It should support the same formats and capabilities as the OSD plane.

The subtitle plane is a useful region for rendering closed captioning, DVB subtitles, DVD subpictures, etc. Lookup tables convert 2-, 4-, or 8-bit indexed color to 32-bit YCbCrA data.

The secondary video plane is usually used to support a second video source for picture-in-picture (PIP) or graphics (such as JPEG images). For graphics data, lookup tables can gamma-correct linear RGB data, convert 2-, 4-, or 8-bit indexed color to 32-bit YCbCrA data, or translate 0–255 graphics levels to the 16–235 video levels.

Being able to scale each source independently offers maximum flexibility. In addition to being able to output any resolution regardless of the source resolutions, special effects can also be accommodated.

Chromaticity correction ensures colors are accurate independent of the sources and display (SDTV vs. HDTV).
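As a small illustration of the level translation that the lookup tables above perform, the sketch below builds a 256-entry table mapping 0–255 graphics levels onto 16–235 video levels. It assumes a simple linear mapping with rounding to the nearest code. The function and variable names are hypothetical, and a real product may add clipping or dithering that is not shown here.

```python
def build_level_lut(in_max=255, out_min=16, out_max=235):
    """Return a table mapping 0..in_max linearly onto out_min..out_max."""
    span = out_max - out_min
    return [out_min + round(code * span / in_max) for code in range(in_max + 1)]

# One table is built once and then indexed per sample.
LEVEL_LUT = build_level_lut()

def to_video_levels(samples):
    """Translate a sequence of 8-bit graphics levels to video levels."""
    return [LEVEL_LUT[s] for s in samples]
```

For example, `to_video_levels([0, 128, 255])` gives `[16, 126, 235]`. A table is a natural fit here because an 8-bit input has only 256 possible values, so the arithmetic is done once rather than per sample.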
Independent brightness, contrast, saturation, hue, and sharpness controls for each source and video output interface offer the most flexibility. For example, PIP can be adjusted without affecting the main picture, video can be adjusted without affecting still picture video quality, etc.

The optional downscaling and progressive-to-interlaced conversion block for the top NTSC/PAL encoder in Figure 7.63 enables simultaneous HD and SD outputs, or simultaneous progressive and interlaced outputs, without affecting the HD or progressive video quality.

The second NTSC/PAL encoder shown at the bottom of Figure 7.63 is useful for recording a program without any OSD or subtitle information being accidentally recorded.

[Figure 7.62. Video Composition Simplified Block Diagram. Planes, in priority order: (1) OSD plane (on-screen menus); (2) content navigation plane (content navigation graphics); (3) subtitle plane (subpictures, subtitles, captioning); (4) secondary video plane (PIP video, JPEG images, graphics); (5) primary video plane (primary video source). Blocks shown: lookup tables (RGB gamma, 4 × 256 × 8 LUT; index to YCbCr, 256 × 32 LUT), RGB-to-YCbCr conversion producing 24-bit 4:4:4 YCbCr plus 8-bit alpha, optional deinterlacing, X-Y scaling with flicker filter, colorimetry correction, brightness, contrast, saturation, hue, and sharpness controls, and the alpha mixer feeding output formatting, plus a separate recording path to output formatting.]
[Figure 7.63. Video Output Port Processing. Inputs: from the alpha mixer; from the recording colorimetry correction. Blocks shown: optional X-Y downscale, optional constrained image X-Y filter, brightness, contrast, saturation, hue, and sharpness controls, colorimetry correction, NTSC/PAL encoders, video DACs, and HDMI transmitter. Outputs: NTSC/PAL, S-video, YPbPr, and HDMI.]

Chapter 8: NTSC, PAL, and SECAM Overview

To fully understand the NTSC, PAL, and SECAM encoding and decoding processes, it is helpful to review the background of these standards and how they came about.

NTSC Overview

The first color television system was developed in the United States, and on December 17, 1953, the Federal Communications Commission (FCC) approved the transmission standard, with broadcasting approved to begin January 23, 1954. Most of the work for developing a color transmission standard that was compatible with the (then current) 525-line, 60-field-per-second, 2:1 interlaced monochrome standard was done by the National Television System Committee (NTSC).

Luminance Information

The monochrome luminance (Y) signal is derived from gamma-corrected red, green, and blue (R´G´B´) signals:

Y = 0.299R´ + 0.587G´ + 0.114B´

Due to the sound subcarrier at 4.5 MHz, a requirement was made that the color signal fit within the same bandwidth as the monochrome video signal (0–4.2 MHz). For economic reasons, another requirement was made that monochrome receivers must be able to display the black and white portion of a color broadcast and that color receivers must be able to display a monochrome broadcast.

Color Information

The eye is most sensitive to spatial and temporal variations in luminance; therefore, luminance information was still allowed the entire bandwidth available (0–4.2 MHz).
Color information, to which the eye is less sensitive and which therefore requires less bandwidth, is represented as hue and saturation information.

The hue and saturation information is transmitted using a 3.58 MHz subcarrier, encoded so that the receiver can separate the hue, saturation, and luminance information and convert them back to RGB signals for display. Although this allows the transmission of color signals within the same bandwidth as monochrome signals, the problem still remains as to how to separate the color and luminance information cost-effectively, since they occupy the same portion of the frequency spectrum.

To transmit color information, U and V or I and Q “color difference” signals are used:

R´ – Y = 0.701R´ – 0.587G´ – 0.114B´
B´ – Y = –0.299R´ – 0.587G´ + 0.886B´

U = 0.492(B´ – Y)
V = 0.877(R´ – Y)

I = 0.596R´ – 0.275G´ – 0.321B´
  = Vcos 33° – Usin 33°
  = 0.736(R´ – Y) – 0.268(B´ – Y)

Q = 0.212R´ – 0.523G´ + 0.311B´
  = Vsin 33° + Ucos 33°
  = 0.478(R´ – Y) + 0.413(B´ – Y)

The scaling factors to generate U and V from (B´ – Y) and (R´ – Y) were derived due to overmodulation considerations during transmission. If the full range of (B´ – Y) and (R´ – Y) were used, the modulated chrominance levels would exceed what the monochrome transmitters were capable of supporting. Experimentation determined that modulated subcarrier amplitudes of 20% of the Y signal amplitude could be permitted above white and below black. The scaling factors were then selected so that the maximum level of 75% color would be at the white level.

I and Q were initially selected since they more closely related to the variation of color acuity than U and V. The color response of the eye decreases as the size of viewed objects decreases. Small objects, occupying frequencies of 1.3–2.0 MHz, provide little color sensation. Medium objects, occupying the 0.6–1.3 MHz frequency range, are acceptable if reproduced along the orange-cyan axis.
Larger objects, occupying the 0–0.6 MHz frequency range, require full three-color reproduction.

The I and Q bandwidths were chosen accordingly, and the preferred color reproduction axis was obtained by rotating the U and V axes by 33°. The Q component, representing the green-purple color axis, was band-limited to about 0.6 MHz. The I component, representing the orange-cyan color axis, was band-limited to about 1.3 MHz.

Another advantage of limiting the I and Q bandwidths to 1.3 MHz and 0.6 MHz, respectively, is to minimize crosstalk due to asymmetrical sidebands as a result of lowpass filtering the composite video signal to about 4.2 MHz. Q is a double sideband signal; however, I is asymmetrical, bringing up the possibility of crosstalk between I and Q. The symmetry of Q avoids crosstalk into I; since Q is bandwidth limited to 0.6 MHz, I crosstalk falls outside the Q bandwidth.

U and V, both bandwidth-limited to 1.3 MHz, are now commonly used instead of I and Q. When broadcast, UV crosstalk occurs above 0.6 MHz; however, this is not usually visible due to the limited UV bandwidths used by NTSC decoders for consumer equipment.

The UV and IQ vector diagram is shown in Figure 8.1.

Color Modulation

I and Q (or U and V) are used to modulate a 3.58 MHz color subcarrier using two balanced modulators operating in phase quadrature: one modulator is driven by the subcarrier at sine phase; the other modulator is driven by the subcarrier at cosine phase. The outputs of the modulators are added together to form the modulated chrominance signal:

C = Q sin (ωt + 33°) + I cos (ωt + 33°)
ω = 2πFSC
FSC = 3.579545 MHz (± 10 Hz)

or, if U and V are used instead of I and Q:

C = U sin ωt + V cos ωt

Hue information is conveyed by the chrominance phase relative to the subcarrier. Saturation information is conveyed by chrominance amplitude. In addition, if an object has no color (such as a white, gray, or black object), the subcarrier is suppressed.
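The color difference equations and the quadrature modulation above can be checked numerically. The Python sketch below is illustrative only. It assumes R´G´B´ values normalized to the range 0 to 1, the function names are invented for this example, and phase is measured from the +U axis, which follows from writing C = U sin ωt + V cos ωt as a single sinusoid.

```python
import math

def rgb_to_yuv(r, g, b):
    """Gamma-corrected R'G'B' (0..1) to Y, U, V using the coefficients above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def uv_to_iq(u, v):
    """Rotate the U and V axes by 33 degrees to obtain I and Q."""
    angle = math.radians(33)
    i = v * math.cos(angle) - u * math.sin(angle)
    q = v * math.sin(angle) + u * math.cos(angle)
    return i, q

def chroma_vector(u, v):
    """Amplitude (saturation) and phase in degrees (hue) of the chrominance."""
    amplitude = math.hypot(u, v)
    phase = math.degrees(math.atan2(v, u)) % 360
    return amplitude, phase

def chroma_sample(u, v, t, fsc=3.579545e6):
    """Instantaneous modulated chrominance C = U sin(wt) + V cos(wt)."""
    w = 2 * math.pi * fsc
    return u * math.sin(w * t) + v * math.cos(w * t)

# 75% yellow: R' = G' = 0.75, B' = 0
_, u_yellow, v_yellow = rgb_to_yuv(0.75, 0.75, 0.0)
yellow_amplitude, yellow_phase = chroma_vector(u_yellow, v_yellow)
```

For 75% yellow the computed phase is close to 167°, and for 75% blue it is close to 347°, matching the angles labeled in Figure 8.1. A gray input gives an amplitude of essentially zero, which corresponds to the suppressed subcarrier for colorless objects.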
Composite Video Generation

The modulated chrominance is added to the luminance information along with appropriate horizontal and vertical sync signals, blanking information, and color burst information, to generate the composite color video waveform shown in Figure 8.2.

composite NTSC = Y + Q sin (ωt + 33°) + I cos (ωt + 33°) + timing

or, if U and V are used instead of I and Q:

composite NTSC = Y + U sin ωt + V cos ωt + timing

[Figure 8.1. UV and IQ Vector Diagram for 75% Color Bars. Phase angles: +U 0°, +Q 33°, magenta 61°, +V 90°, red 103°, yellow 167°, burst 180°, green 241°, cyan 283°, –I 303°, blue 347°. Amplitudes in IRE scale units: red and cyan 88, magenta and green 82, yellow and blue 62.]

[Figure 8.2. (M) NTSC Composite Video Signal for 75% Color Bars. Bars: white, yellow, cyan, green, magenta, red, blue, black. Labeled levels: white level (100 IRE), black level (7.5 IRE), blank level, sync level (40 IRE), and the 3.58 MHz color burst (9 ± 1 cycles, 20 IRE either side of blank). Annotations: luminance level; phase = hue; color saturation.]

The bandwidth of the resulting composite video signal is shown in Figure 8.3.

The I and Q (or U and V) information can be transmitted without loss of identity as long as the proper color subcarrier phase relationship is maintained at the encoding and decoding process. A color burst signal, consisting of nine cycles of the subcarrier frequency at a specific phase, follows most horizontal sync pulses, and provides the decoder a reference signal so as to be able to recover the I and Q (or U and V) signals properly. The color burst phase is defined to be along the –U axis as shown in Figure 8.1.

Color Subcarrier Frequency

The specific choice for the color subcarrier frequency was dictated by several factors. The first was the need to provide horizontal interlace to reduce the visibility of the subcarrier, requiring that the subcarrier frequency, FSC, be an odd multiple of one-half the horizontal line rate.
The second factor was selection of a frequency high enough that it generated a fine interference pattern having low visibility. Third, double sidebands for I and Q (or U and V) bandwidths below 0.6 MHz had to be allowed.

The choice of the frequencies is:

FH = (4.5 × 10^6/286) Hz = 15,734.27 Hz
FV = FH/(525/2) = 59.94 Hz
FSC = ((13 × 7 × 5)/2) × FH = (455/2) × FH = 3.579545 MHz

The resulting FV (field) and FH (line) rates were slightly different from the monochrome standards, but fell well within the tolerance ranges and were therefore acceptable. Figure 8.4 illustrates the resulting spectral interleaving.

The luminance (Y) components are modulated due to the horizontal blanking process, resulting in bunches of luminance information spaced at intervals of FH. These signals are further modulated by the vertical blanking process, resulting in luminance frequency components occurring at NFH ± MFV. N has a maximum value of about 267 with a 4.2 MHz bandwidth-limited luminance (4.2 MHz divided by 15.734 kHz). Thus, luminance information is limited to areas about integral harmonics of the line frequency (FH), with additional spectral lines offset from NFH by the 29.97 Hz vertical frame rate.

The area in the spectrum between luminance groups, occurring at odd multiples of one-half the line frequency, contains minimal spectral energy and is therefore used for the transmission of chrominance information. The harmonics of the color subcarrier are separated from each other by FH since they are odd multiples of one-half FH, providing a half-line offset and resulting in an interlace pattern that moves upward. Four complete fields are required to repeat a specific sample position, as shown in Figure 8.5.

NTSC Standards

Figure 8.6 shows the common designations for NTSC systems. The letter M refers to the monochrome standard for line and field rates (525/59.94), a video bandwidth of 4.2 MHz, an audio carrier frequency 4.5 MHz above the video carrier frequency, and an RF channel bandwidth of 6 MHz.
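The frequency relationships given above under Color Subcarrier Frequency can be verified with exact rational arithmetic. The short Python sketch below is a check of the arithmetic only. The variable names are chosen for this example, and the four-field result is obtained by asking after how many fields the subcarrier has completed a whole number of cycles.

```python
from fractions import Fraction

SOUND_OFFSET_HZ = 4_500_000     # audio carrier offset from the video carrier
LINES_PER_FRAME = 525

f_h = Fraction(SOUND_OFFSET_HZ, 286)        # line rate
f_v = f_h / Fraction(LINES_PER_FRAME, 2)    # field rate (262.5 lines per field)
f_sc = Fraction(455, 2) * f_h               # color subcarrier

# Subcarrier cycles in one field: 227.5 cycles per line x 262.5 lines per field.
cycles_per_field = f_sc / f_v

# The subcarrier phase repeats once a whole number of cycles has elapsed,
# which takes as many fields as the denominator of cycles_per_field.
fields_to_repeat = cycles_per_field.denominator

print(float(f_h), float(f_v), float(f_sc), fields_to_repeat)
```

The line rate comes out at about 15,734.27 Hz, the field rate at about 59.94 Hz, and the subcarrier at about 3,579,545 Hz. Because 455 is odd, the subcarrier is an odd multiple of one-half the line rate, and `fields_to_repeat` evaluates to 4, in agreement with the four-field sequence.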
NTSC refers to the technique to add color information to the monochrome signal. Detailed timing parameters can be found in Table 8.9.

NTSC 4.43 is commonly used for multistandard analog VCRs. The horizontal and vertical timing is the same as (M) NTSC; color encoding uses the PAL modulation format and a 4.43361875 MHz color subcarrier frequency.

NTSC–J, used in Japan, is the same as (M) NTSC, except there is no blanking pedestal during active video. Thus, active video has a nominal amplitude of 714 mV.

Noninterlaced NTSC is a 262-line, 60 frames-per-second version of NTSC, as shown in Figure 8.7. This format is identical to standard (M) NTSC, except that there are 262 lines per frame.

[Figure 8.3. Video Bandwidths of Baseband (M) NTSC Video. (A) Using 1.3 MHz I and 0.6 MHz Q signals. (B) Using 1.3 MHz U and V signals. Axes: amplitude versus frequency (MHz), 0.0 to 4.2, with the chrominance subcarrier at 3.58 MHz.]

[Figure 8.4. Luma and Chroma Frequency Interleave Principle. Note that 227.5FH = FSC. Y groups appear at multiples of FH (227FH, 228FH, 229FH; 15.734 kHz apart) with 29.97 Hz spacing between spectral lines; I, Q groups appear at 227.5FH and 228.5FH, offset by FH/2.]

[Figure 8.5. Four-Field (M) NTSC Sequence and Burst Blanking. Shows analog fields 1 through 4 with line numbering, equalizing and serration pulses, start of VSYNC, and burst phase (burst begins with a positive or a negative half-cycle; burst phase = 180° relative to U).]

All three systems use a quadrature modulated subcarrier, with phase = hue and amplitude = saturation.

                         “M”             “NTSC–J”        “NTSC 4.43”
    Line / field         525 / 59.94     525 / 59.94     525 / 59.94
    FH                   15.734 kHz      15.734 kHz      15.734 kHz
    FV                   59.94 Hz        59.94 Hz        59.94 Hz
    FSC                  3.579545 MHz    3.579545 MHz    4.43361875 MHz
    Blanking setup       7.5 IRE         0 IRE           7.5 IRE
    Video bandwidth      4.2 MHz         4.2 MHz         4.2 MHz
    Audio carrier        4.5 MHz         4.5 MHz         4.5 MHz
    Channel bandwidth    6 MHz           6 MHz           6 MHz

Figure 8.6. Common NTSC Systems.

RF Modulation

Figures 8.8, 8.9, and 8.10 illustrate the basic process of converting baseband (M) NTSC composite video to an RF (radio frequency) signal.

Figure 8.8a shows the frequency spectrum of a baseband composite video signal. It is similar to Figure 8.3. However, Figure 8.3 only shows the upper sideband for simplicity. The “video carrier” notation at 0 MHz serves only as a reference point for comparison with Figure 8.8b.

Figure 8.8b shows the audio/video signal as it resides within a 6 MHz channel (such as channel 3). The video signal has been lowpass filtered, most of the lower sideband has been removed, and audio information has been added.

Figure 8.8c details the information present on the audio subcarrier for stereo (BTSC) operation.
As shown in Figures 8.9 and 8.10, back porch clamping (see glossary) of the analog video signal ensures that the back porch level is constant, regardless of changes in the average picture level. White clipping of the video signal prevents the modulated signal from going below 10%; below 10% may result in overmodulation and buzzing in television receivers. The video signal is then lowpass filtered to 4.2 MHz and drives the AM (amplitude modulation) video modulator. The sync level corresponds to 100% modulation, the blanking level corresponds to 75%, and the white level corresponds to 10%. (M) NTSC systems use an IF (intermediate frequency) for the video of 45.75 MHz.

At this point, audio information is added on a subcarrier at 41.25 MHz. A monaural audio signal is processed as shown in Figure 8.9 and drives the FM (frequency modulation) modulator. The output of the FM modulator is added to the IF video signal.

The SAW filter, used as a vestigial sideband filter, provides filtering of the IF signal. The mixer, or up converter, mixes the IF signal with the desired broadcast frequency. Both sum and difference frequencies are generated by the mixing process, so the difference signal is extracted by using a bandpass filter.

[Figure 8.7. Noninterlaced NTSC Frame Sequence. Shows line numbering (261, 262, 1 through 10, 23), start of VSYNC, and burst phase (burst begins with a positive or a negative half-cycle; burst phase = reference phase = 180° relative to U).]

Stereo Audio (Analog)

BTSC

This standard, defined by EIA TVSB5 and known as the BTSC system (Broadcast Television Systems Committee), is shown in Figure 8.10. Countries that use this system include the United States, Canada, Mexico, Brazil, and Taiwan.

To enable stereo, L–R information is transmitted using a suppressed AM subcarrier.
A SAP (secondary audio program) channel may also be present, used to transmit a second language or video description (descriptive audio for the visually impaired). A professional channel may also be present, allowing communication with remote equipment and people.

Zweiton M

This standard (ITU-R BS.707), also known as A2 M, is similar to that used with PAL. The L+R information is transmitted on an FM subcarrier at 4.5 MHz. The L–R information, or a second L+R audio signal, is transmitted on a second FM subcarrier at 4.724212 MHz.

If stereo or dual mono signals are present, the FM subcarrier at 4.724212 MHz is amplitude-modulated with a 55.0699 kHz subcarrier. This 55.0699 kHz subcarrier is 50% amplitude-modulated at 149.9 Hz to indicate stereo audio or 276.0 Hz to indicate dual mono audio.

This system is used in South Korea.

Figure 8.8. Transmission Channel for (M) NTSC. (A) Frequency spectrum of baseband composite video. (B) Frequency spectrum of typical channel including audio information. (C) Detailed frequency spectrum of BTSC stereo audio information. (Panel A spans –4.5 to +4.5 MHz around the video carrier at 0 MHz, with the chrominance subcarrier at ±3.58 MHz. Panel B shows the 6 MHz channel from –1.25 to 4.75 MHz with a 0.75 MHz vestigial sideband, the chrominance subcarrier at 3.58 MHz, and the audio carrier at 4.5 MHz. Panel C shows, relative to the audio carrier, L+R (FM), the stereo pilot at FH = 15,734 Hz, L–R (AM) around 2 FH, SAP (FM) at 5 FH, and the professional channel (FM) at 6.5 FH.)

EIA-J

This standard is similar to BTSC, and is used in Japan. The L+R information is transmitted on an FM subcarrier at 4.5 MHz. The L–R signal, or a second L+R signal, is transmitted on a second FM subcarrier at +2FH.

If stereo or dual mono signals are present, a +3.5FH subcarrier is amplitude-modulated with either a 982.5 Hz subcarrier (stereo audio) or a 922.5 Hz subcarrier (dual mono audio).
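The BTSC component positions labelled in Figure 8.8c, and the matching IF carriers labelled in Figure 8.10, are all expressed as multiples of the line rate FH. The sketch below tabulates them. The function names and dictionary layout are illustrative assumptions, and FH is taken as the 15,734 Hz value printed in the figure.

```python
# Illustrative sketch: BTSC component positions as multiples of the line rate.
# The multiples (1, 2, 5, 6.5) are read from Figure 8.8c; all names are hypothetical.

FH_HZ = 15_734.0  # (M) NTSC line rate as labelled in Figure 8.8c

BTSC_MULTIPLES = {
    "stereo pilot": 1.0,
    "L-R (AM, suppressed carrier)": 2.0,
    "SAP (FM)": 5.0,
    "professional channel (FM)": 6.5,
}

def btsc_subcarriers_hz(fh_hz=FH_HZ):
    """Return each component's offset from the audio carrier in Hz."""
    return {name: mult * fh_hz for name, mult in BTSC_MULTIPLES.items()}

def btsc_if_carriers_mhz(audio_if_mhz=41.25, fh_hz=FH_HZ):
    """Return the IF position of each component, as labelled in Figure 8.10
    (41.25 MHz minus the multiple of FH)."""
    return {name: audio_if_mhz - mult * fh_hz / 1e6
            for name, mult in BTSC_MULTIPLES.items()}

for name, freq in btsc_subcarriers_hz().items():
    print(f"{name:30s} {freq / 1000:8.3f} kHz")
```

The same approach applies to EIA-J, where the text places the second FM subcarrier at +2FH and the identification subcarrier at +3.5FH.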
Analog Channel Assignments

Tables 8.1 through 8.4 list the typical channel assignments for VHF, UHF, and cable for various NTSC systems.

Note that cable systems routinely reassign channel numbers to alternate frequencies to minimize interference and provide multiple levels of programming (such as regular and preview premium movie channels).

Figure 8.9. Typical RF Modulation Implementation for (M) NTSC: Mono Audio. (Block diagram. Audio path: left and right audio are summed to L+R, then pass through 75 µs pre-emphasis and a 50–15,000 Hz bandpass filter into the FM modulator on the 41.25 MHz IF audio carrier. Video path: (M) NTSC composite video passes through the back porch clamp and white level clip and a 4.2 MHz lowpass filter into the AM modulator on the 45.75 MHz IF video carrier. The two outputs are summed (41–47 MHz bandwidth), SAW filtered, up-converted by the mixer to the video carrier of the desired channel, and bandpass filtered to give modulated RF audio/video with a 6 MHz bandwidth. The IF and channel carriers are each generated by a PLL; a 5 kHz clock is also labelled.)

Figure 8.10. Typical RF Modulation Implementation for (M) NTSC: BTSC Stereo Audio. (Block diagram. Professional channel audio passes through 150 µs pre-emphasis and a 300–3,400 Hz bandpass filter into an FM modulator on the 41.25 MHz – 6.5FH IF audio carrier. Secondary audio passes through a 50–10,000 Hz bandpass filter and BTSC compression into an FM modulator on the 41.25 MHz – 5FH SAP IF audio carrier. L–R passes through a 50–15,000 Hz bandpass filter and BTSC compression; L+R passes through 75 µs pre-emphasis and a 50–15,000 Hz bandpass filter; both feed the stereo modulator, with the stereo pilot signal at 41.25 MHz – FH. The video path, summation (41–47 MHz bandwidth), SAW filter, mixer with channel select, and output bandpass filter are as in Figure 8.9.)
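The up-conversion step in Figures 8.9 and 8.10 comes down to simple arithmetic on the IF carriers. The sketch below assumes the local oscillator is placed above the desired channel by the video IF; that placement is a common arrangement but is an assumption here, as are the function and variable names.

```python
# Illustrative sketch of the mixer (up converter) arithmetic for (M) NTSC.
# Assumption: the local oscillator (LO) sits at channel video carrier + video IF,
# so the difference product LO - IF lands on the wanted channel.

VIDEO_IF_MHZ = 45.75  # (M) NTSC video IF, from the text
AUDIO_IF_MHZ = 41.25  # (M) NTSC audio IF, from the text

def up_convert(channel_video_carrier_mhz):
    """Return (video, audio) RF carriers in MHz taken from the difference products."""
    lo = channel_video_carrier_mhz + VIDEO_IF_MHZ
    video_rf = lo - VIDEO_IF_MHZ  # difference product for the video carrier
    audio_rf = lo - AUDIO_IF_MHZ  # difference product for the audio carrier
    return video_rf, audio_rf

video, audio = up_convert(61.25)  # 61.25 MHz is the channel 3 video carrier
print(video, audio)
```

Under this assumption, taking the difference product inverts the spectrum, which is why the audio IF sits 4.5 MHz below the video IF yet the audio RF carrier ends up 4.5 MHz above the video RF carrier.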
Broadcast   Video Carrier   Audio Carrier   Channel Range
Channel     (MHz)           (MHz)           (MHz)
2           55.25           59.75           54–60
3           61.25           65.75           60–66
4           67.25           71.75           66–72
5           77.25           81.75           76–82
6           83.25           87.75           82–88
7           175.25          179.75          174–180
8           181.25          185.75          180–186
9           187.25          191.75          186–192
10          193.25          197.75          192–198
11          199.25          203.75          198–204
12          205.25          209.75          204–210
13          211.25          215.75          210–216
14          471.25          475.75          470–476
15          477.25          481.75          476–482
16          483.25          487.75          482–488
17          489.25          493.75          488–494
18          495.25          499.75          494–500
19          501.25          505.75          500–506
20          507.25          511.75          506–512
21          513.25          517.75          512–518
22          519.25          523.75          518–524
23          525.25          529.75          524–530
24          531.25          535.75          530–536
25          537.25          541.75          536–542
26          543.25          547.75          542–548
27          549.25          553.75          548–554
28          555.25          559.75          554–560
29          561.25          565.75          560–566
30          567.25          571.75          566–572
31          573.25          577.75          572–578
32          579.25          583.75          578–584
33          585.25          589.75          584–590
34          591.25          595.75          590–596
35          597.25          601.75          596–602
36          603.25          607.75          602–608
37          609.25          613.75          608–614
38          615.25          619.75          614–620
39          621.25          625.75          620–626
40          627.25          631.75          626–632
41          633.25          637.75          632–638
42          639.25          643.75          638–644
43          645.25          649.75          644–650
44          651.25          655.75          650–656
45          657.25          661.75          656–662
46          663.25          667.75          662–668
47          669.25          673.75          668–674
48          675.25          679.75          674–680
49          681.25          685.75          680–686
50          687.25          691.75          686–692
51          693.25          697.75          692–698
52          699.25          703.75          698–704
53          705.25          709.75          704–710
54          711.25          715.75          710–716
55          717.25          721.75          716–722
56          723.25          727.75          722–728
57          729.25          733.75          728–734
58          735.25          739.75          734–740
59          741.25          745.75          740–746
60          747.25          751.75          746–752
61          753.25          757.75          752–758
62          759.25          763.75          758–764
63          765.25          769.75          764–770
64          771.25          775.75          770–776
65          777.25          781.75          776–782
66          783.25          787.75          782–788
67          789.25          793.75          788–794
68          795.25          799.75          794–800
69          801.25          805.75          800–806

Table 8.1. Analog Broadcast Nominal Frequencies for North America.
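The entries in Table 8.1 follow a regular pattern: each channel is 6 MHz wide, the video carrier sits 1.25 MHz above the lower channel edge, and the audio carrier sits 4.5 MHz above the video carrier. The sketch below regenerates a table row from the channel number. The band edges are read from the table, and the function name is a hypothetical one chosen for illustration.

```python
# Illustrative sketch: nominal carriers for North American analog broadcast
# channels, reproducing the pattern visible in Table 8.1.

# (first channel, last channel, lower edge of first channel in MHz)
BANDS = [
    (2, 4, 54.0),     # low VHF
    (5, 6, 76.0),     # low VHF, after the 72-76 MHz gap
    (7, 13, 174.0),   # high VHF
    (14, 69, 470.0),  # UHF
]

CHANNEL_WIDTH_MHZ = 6.0
VIDEO_OFFSET_MHZ = 1.25  # video carrier above the lower channel edge
AUDIO_OFFSET_MHZ = 4.5   # audio carrier above the video carrier

def broadcast_channel(channel):
    """Return (video carrier, audio carrier, lower edge, upper edge) in MHz."""
    for first, last, base in BANDS:
        if first <= channel <= last:
            lower = base + CHANNEL_WIDTH_MHZ * (channel - first)
            video = lower + VIDEO_OFFSET_MHZ
            return video, video + AUDIO_OFFSET_MHZ, lower, lower + CHANNEL_WIDTH_MHZ
    raise ValueError(f"channel {channel} is not in Table 8.1")

print(broadcast_channel(3))
```

A lookup table would serve equally well in practice; the formula form simply makes the 1.25 MHz and 4.5 MHz offsets explicit.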
Broadcast   Video Carrier   Audio Carrier   Channel Range
Channel     (MHz)           (MHz)           (MHz)
1           91.25           95.75           90–96
2           97.25           101.75          96–102
3           103.25          107.75          102–108
4           171.25          175.75          170–176
5           177.25          181.75          176–182
6           183.25          187.75          182–188
7           189.25          193.75          188–194
8           193.25          197.75          192–198
9           199.25          203.75          198–204
10          205.25          209.75          204–210
11          211.25          215.75          210–216
12          217.25          221.75          216–222
13          471.25          475.75          470–476
14          477.25          481.75          476–482
15          483.25          487.75          482–488
16          489.25          493.75          488–494
17          495.25          499.75          494–500
18          501.25          505.75          500–506
19          507.25          511.75          506–512
20          513.25          517.75          512–518
21          519.25          523.75          518–524
22          525.25          529.75          524–530
23          531.25          535.75          530–536
24          537.25          541.75          536–542
25          543.25          547.75          542–548
26          549.25          553.75          548–554
27          555.25          559.75          554–560
28          561.25          565.75          560–566
29          567.25          571.75          566–572
30          573.25          577.75          572–578
31          579.25          583.75          578–584
32          585.25          589.75          584–590
33          591.25          595.75          590–596
34          597.25          601.75          596–602
35          603.25          607.75          602–608
36          609.25          613.75          608–614
37          615.25          619.75          614–620
38          621.25          625.75          620–626
39          627.25          631.75          626–632
40          633.25          637.75          632–638
41          639.25          643.75          638–644
42          645.25          649.75          644–650
43          651.25          655.75          650–656
44          657.25          661.75          656–662
45          663.25          667.75          662–668
46          669.25          673.75          668–674
47          675.25          679.75          674–680
48          681.25          685.75          680–686
49          687.25          691.75          686–692
50          693.25          697.75          692–698
51          699.25          703.75          698–704
52          705.25          709.75          704–710
53          711.25          715.75          710–716
54          717.25          721.75          716–722
55          723.25          727.75          722–728
56          729.25          733.75          728–734
57          735.25          739.75          734–740
58          741.25          745.75          740–746
59          747.25          751.75          746–752
60          753.25          757.75          752–758
61          759.25          763.75          758–764
62          765.25          769.75          764–770

Table 8.2. Analog Broadcast Nominal Frequencies for Japan.
272 Chapter 8: NTSC, PAL, and SECAM Overview Cable Channel – – 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Video Carrier (MHz) – – 55.25 61.25 67.25 77.25 83.25 175.25 181.25 187.25 193.25 199.25 205.25 211.25 121.2625 127.2625 133.2625 139.25 145.25 151.25 157.25 163.25 169.25 217.25 223.25 229.2625 235.2625 241.2625 247.2625 253.2625 259.2625 265.2625 271.2625 277.2625 283.2625 289.2625 295.2625 301.2625 307.2625 313.2625 Audio Carrier (MHz) – – 59.75 65.75 71.75 81.75 87.75 179.75 185.75 191.75 197.75 203.75 209.75 215.75 125.7625 131.7625 137.7625 143.75 149.75 155.75 161.75 167.75 173.75 221.75 227.75 233.7625 239.7625 245.7625 251.7625 257.7625 263.7625 269.7625 275.7625 281.7625 287.7625 293.7625 299.7625 305.7625 311.7625 317.7625 Channel Range (MHz) – – 54–60 60–66 66–72 76–82 82–88 174–180 180–186 186–192 192–198 198–204 204–210 210–216 120–126 126–132 132–138 138–144 144–150 150–156 156–162 162–168 168–174 216–222 222–228 228–234 234–240 240–246 246–252 252–258 258–264 264–270 270–276 276–282 282–288 288–294 294–300 300–306 306–312 312–318 Cable Channel 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 Video Carrier (MHz) 319.2625 325.2625 331.2750 337.2625 343.2625 349.2625 355.2625 361.2625 367.2625 373.2625 379.2625 385.2625 391.2625 397.2625 403.25 409.25 415.25 421.25 427.25 433.25 439.25 445.55 451.25 457.25 463.25 469.25 475.25 481.25 487.25 493.25 499.25 505.25 511.25 517.25 523.25 529.25 535.25 541.25 547.25 553.25 Audio Carrier (MHz) 323.7625 329.7625 335.7750 341.7625 347.7625 353.7625 359.7625 365.7625 371.7625 377.7625 383.7625 389.7625 395.7625 401.7625 407.75 413.75 419.75 425.75 431.75 437.75 443.75 449.75 455.75 461.75 467.75 473.75 479.75 485.75 491.75 497.75 503.75 509.75 515.75 521.75 527.75 533.75 539.75 545.75 551.75 557.75 Channel Range (MHz) 318–324 324–330 330–336 336–342 342–348 348–354 354–360 
360–366 366–372 372–378 378–384 384–390 390–396 396–402 402–408 408–414 414–420 420–426 426–432 432–438 438–444 444–450 450–456 456–462 462–468 468–474 474–480 480–486 486–492 492–498 498–504 504–510 510–516 516–522 522–528 528–534 534–540 540–546 546–552 552–558 Table 8.3a. Standard Analog Cable TV Nominal Frequencies for USA. NTSC Overview 273 Cable Channel 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 Video Carrier (MHz) 559.25 565.25 571.25 577.25 583.25 589.25 595.25 601.25 607.25 613.25 619.25 625.25 631.25 637.25 643.25 91.25 97.25 103.25 109.2750 115.2750 649.25 655.25 661.25 667.25 673.25 679.25 685.25 691.25 697.25 703.25 709.25 715.25 721.25 727.25 733.25 739.25 745.25 751.25 757.25 763.25 Audio Carrier (MHz) 563.75 569.75 575.75 581.75 587.75 593.75 599.75 605.75 611.75 617.75 623.75 629.75 635.75 641.75 647.75 95.75 101.75 107.75 113.7750 119.7750 653.75 659.75 665.75 671.75 677.75 683.75 689.75 695.75 701.75 707.75 713.75 719.75 725.75 731.75 737.75 743.75 749.75 755.75 761.75 767.75 Channel Range (MHz) 558–564 564–570 570–576 576–582 582–588 588–594 594–600 600–606 606–612 612–618 618–624 624–630 630–636 636–642 642–648 90–96 96–102 102–108 108–114 114–120 648–654 654–660 660–666 666–672 672–678 678–684 684–690 690–696 696–702 702–708 708–714 714–720 720–726 726–732 732–738 738–744 744–750 750–756 756–762 762–768 Cable Channel 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 – Video Carrier (MHz) 769.25 775.25 781.25 787.25 793.25 799.25 805.25 811.25 817.25 823.25 829.25 835.25 841.25 847.25 853.25 859.25 865.25 871.25 877.25 883.25 889.25 895.25 901.25 907.25 913.25 919.25 925.25 931.25 937.25 943.25 949.25 955.25 961.25 967.25 973.25 979.25 985.25 991.25 997.25 – Audio Carrier (MHz) 773.75 779.75 785.75 791.75 797.75 803.75 809.75 815.75 821.75 
827.75 833.75 839.75 845.75 851.75 857.75 863.75 869.75 875.75 881.75 887.75 893.75 899.75 905.75 911.75 917.75 923.75 929.75 935.75 941.75 947.75 953.75 959.75 965.75 971.75 977.75 983.75 989.75 995.75 1001.75 – Channel Range (MHz) 768–774 774–780 780–786 786–792 792–798 798–804 804–810 810–816 816–822 822–828 828–834 834–840 840–846 846–852 852–858 858–864 864–870 870–876 876–882 882–888 888–894 894–900 900–906 906–912 912–918 918–924 924–930 930–936 936–942 942–948 948–954 954–960 960–966 966–972 972–978 978–984 984–990 990–996 996–1002 – Table 8.3b. Standard Analog Cable TV Nominal Frequencies for USA. 274 Chapter 8: NTSC, PAL, and SECAM Overview Cable Channel – 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Video Carrier (MHz) – 73.2625 55.2625 61.2625 67.2625 79.2625 85.2625 175.2625 181.2625 187.2625 193.2625 199.2625 205.2625 211.2625 121.2625 127.2625 133.2625 139.2625 145.2625 151.2625 157.2625 163.2625 169.2625 217.2625 223.2625 229.2625 235.2625 241.2625 247.2625 253.2625 259.2625 265.2625 271.2625 277.2625 283.2625 289.2625 295.2625 301.2625 307.2625 313.2625 Audio Carrier (MHz) – 77.7625 59.7625 65.7625 71.7625 83.7625 89.7625 179.7625 185.7625 191.7625 197.7625 203.7625 209.7625 215.7625 125.7625 131.7625 137.7625 143.7625 149.7625 155.7625 161.7625 167.7625 173.7625 221.7625 227.7625 233.7625 239.7625 245.7625 251.7625 257.7625 263.7625 269.7625 275.7625 281.7625 287.7625 293.7625 299.7625 305.7625 311.7625 317.7625 Cable Channel 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 Video Carrier (MHz) 319.2625 325.2625 331.2750 337.2625 343.2625 349.2625 355.2625 361.2625 367.2625 373.2625 379.2625 385.2625 391.2625 397.2625 403.2625 409.2625 415.2625 421.2625 427.2625 433.2625 439.2625 445.2625 451.2625 457.2625 463.2625 469.2625 475.2625 481.2625 487.2625 493.2625 499.2625 505.2625 511.2625 517.2625 523.2625 529.2625 
535.2625 541.2625 547.2625 553.2625 Audio Carrier (MHz) 323.7625 329.7625 335.7750 341.7625 347.7625 353.7625 359.7625 365.7625 371.7625 377.7625 383.7625 389.7625 395.7625 401.7625 407.7625 413.7625 419.7625 425.7625 431.7625 437.7625 443.7625 449.7625 455.7625 461.7625 467.7625 473.7625 479.7625 485.7625 491.7625 497.7625 503.7625 509.7625 515.7625 521.7625 527.7625 533.7625 539.7625 545.7625 551.7625 557.7625 Table 8.3c. Analog Cable TV Nominal Frequencies for USA: Incrementally Related Carrier (IRC) Systems. NTSC Overview 275 Cable Channel 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 Video Carrier (MHz) 559.2625 565.2625 571.2625 577.2625 583.2625 589.2625 595.2625 601.2625 607.2625 613.2625 619.2625 625.2625 631.2625 637.2625 643.2625 91.2625 97.2625 103.2625 109.2750 115.2625 649.2625 655.2625 661.2625 667.2625 673.2625 679.2625 685.2625 691.2625 697.2625 703.2625 709.2625 715.2625 721.2625 727.2625 733.2625 739.2625 745.2625 751.2625 757.2625 763.2625 Audio Carrier (MHz) 563.7625 569.7625 575.7625 581.7625 587.7625 593.7625 599.7625 605.7625 611.7625 617.7625 623.7625 629.7625 635.7625 641.7625 647.7625 95.7625 101.7625 107.7625 113.7750 119.7625 653.7625 659.7625 665.7625 671.7625 677.7625 683.7625 689.7625 695.7625 701.7625 707.7625 713.7625 719.7625 725.7625 731.7625 737.7625 743.7625 749.7625 755.7625 761.7625 767.7625 Cable Channel 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 – Video Carrier (MHz) Audio Carrier (MHz) 769.2625 775.2625 781.2625 787.2625 793.2625 799.2625 805.2625 811.2625 817.2625 823.2625 829.2625 835.2625 841.2625 847.2625 853.2625 859.2625 865.2625 871.2625 877.2625 883.2625 889.2625 895.2625 901.2625 907.2625 913.2625 919.2625 925.2625 931.2625 937.2625 943.2625 949.2625 955.2625 961.2625 967.2625 973.2625 979.2625 985.2625 
991.2625 997.2625 – 773.7625 779.7625 785.7625 791.7625 797.7625 803.7625 809.7625 815.7625 821.7625 827.7625 833.7625 839.7625 845.7625 851.7625 857.7625 863.7625 869.7625 875.7625 881.7625 887.7625 893.7625 899.7625 905.7625 911.7625 917.7625 923.7625 929.7625 935.7625 941.7625 947.7625 953.7625 959.7625 965.7625 971.7625 977.7625 983.7625 989.7625 995.7625 1001.7625 – Table 8.3d. Analog Cable TV Nominal Frequencies for USA: Incrementally Related Carrier (IRC) Systems. 276 Chapter 8: NTSC, PAL, and SECAM Overview Cable Channel – 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Video Carrier (MHz) – 72.0036 54.0027 60.0030 66.0033 72.0036 78.0039 174.0087 180.0090 186.0093 192.0096 198.0099 204.0102 210.0105 120.0060 126.0063 132.0066 138.0069 144.0072 150.0075 156.0078 162.0081 168.0084 216.0108 222.0111 228.0114 234.0117 240.0120 246.0123 252.0126 258.0129 264.0132 270.0135 276.0138 282.0141 288.0144 294.0147 300.0150 306.0153 312.0156 Audio Carrier (MHz) – 76.5036 58.5027 64.5030 70.5030 82.5039 88.5042 178.5087 184.5090 190.5093 196.5096 202.5099 208.5102 214.5105 124.5060 130.5063 136.5066 142.5069 148.5072 154.5075 160.5078 166.5081 172.5084 220.5108 226.5111 232.5114 238.5117 244.5120 250.5123 256.5126 262.5129 268.5132 274.5135 280.5138 286.5141 292.5144 298.5147 304.5150 310.5153 316.5156 Cable Channel 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 Video Carrier (MHz) 318.0159 324.0162 330.0165 336.0168 342.0168 348.0168 354.0168 360.0168 366.0168 372.0168 378.0168 384.0168 390.0168 396.0168 402.0201 408.0204 414.0207 420.0210 426.0213 432.0216 438.0219 444.0222 450.0225 456.0228 462.0231 468.0234 474.0237 480.0240 486.0243 492.0246 498.0249 504.0252 510.0255 516.0258 522.0261 528.0264 534.0267 540.0270 546.0273 552.0276 Audio Carrier (MHz) 322.5159 328.5162 334.5165 340.5168 346.5168 352.5168 358.5168 364.5168 370.5168 
376.5168 382.5168 388.5168 394.5168 400.5168 406.5201 412.5204 418.5207 424.5210 430.5213 436.5216 442.5219 448.5222 454.5225 460.5228 466.5231 472.5234 478.5237 484.5240 490.5243 496.5246 502.5249 508.5252 514.5255 520.5258 526.5261 532.5264 538.5267 544.5270 550.5273 556.5276 Table 8.3e. Analog Cable TV Nominal Frequencies for USA: Harmonically Related Carrier (HRC) Systems. NTSC Overview 277 Cable Channel 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 Video Carrier (MHz) 558.0279 564.0282 570.0285 576.0288 582.0291 588.0294 594.0297 600.0300 606.0303 612.0306 618.0309 624.0312 630.0315 636.0318 642.0321 90.0045 96.0048 102.0051 – – 648.0324 654.0327 660.0330 666.0333 672.0336 678.0339 684.0342 690.0345 696.0348 702.0351 708.0354 714.0357 720.0360 726.0363 732.0366 738.0369 744.0372 750.0375 756.0378 762.0381 Audio Carrier (MHz) 562.5279 568.5282 574.5285 580.5288 586.5291 592.5294 598.5297 604.5300 610.5303 616.5306 622.5309 628.5312 634.5315 640.5318 646.5321 94.5045 100.5048 106.5051 – – 652.5324 658.5327 664.5330 670.5333 676.5336 682.5339 688.5342 694.5345 700.5348 706.5351 712.5354 718.5357 724.5360 730.5363 736.5366 742.5369 748.5372 754.5375 760.5378 766.5381 Cable Channel 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 – Video Carrier (MHz) Audio Carrier (MHz) 768.0384 774.0387 780.0390 786.0393 792.0396 798.0399 804.0402 810.0405 816.0408 822.0411 828.0414 834.0417 840.0420 846.0423 852.0426 858.0429 864.0432 870.0435 876.0438 882.0441 888.0444 894.0447 900.0450 906.0453 912.0456 918.0459 924.0462 930.0465 936.0468 942.0471 948.0474 954.0477 960.0480 966.0483 972.0486 978.0489 984.0492 990.0495 996.0498 – 772.5384 778.5387 784.5390 790.5393 796.5396 802.5399 808.5402 814.5405 820.5408 826.5411 832.5414 838.5417 844.5420 850.5423 856.5426 862.5429 868.5432 
874.5435 880.5438 888.5441 892.5444 898.5447 904.5450 910.5453 916.5456 922.5459 928.5462 934.5465 940.5468 946.5471 952.5474 958.5477 964.5480 970.5483 976.5486 982.5489 988.5492 994.5495 1000.5498 – Table 8.3f. Analog Cable TV Nominal Frequencies for USA: Harmonically Related Carrier (HRC) Systems. 278 Chapter 8: NTSC, PAL, and SECAM Overview Cable Channel – – – 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Video Carrier (MHz) – – – 109.25 115.25 121.25 127.25 133.25 139.25 145.25 151.25 157.25 165.25 223.25 231.25 237.25 243.25 249.25 253.25 259.25 265.25 271.25 277.25 283.25 289.25 295.25 301.25 307.25 313.25 319.25 Audio Carrier (MHz) – – – 113.75 119.75 125.75 131.75 137.75 143.75 149.75 155.75 161.75 169.75 227.75 235.75 241.75 247.75 253.75 257.75 263.75 269.75 275.75 281.75 287.75 293.75 299.75 305.75 311.75 317.75 323.75 Channel Range (MHz) – – – 108–114 114–120 120–126 126–132 132–138 138–144 144–150 150–156 156–162 164–170 222–228 230–236 236–242 242–248 248–254 252–258 258–264 264–270 270–276 276–282 282–288 288–294 294–300 300–306 306–312 312–318 318–324 Cable Channel 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 – – – – – – Video Carrier (MHz) 325.25 331.25 337.25 343.25 349.25 355.25 361.25 367.25 373.25 379.25 385.25 391.25 397.25 403.25 409.25 415.25 421.25 427.25 433.25 439.25 445.25 451.25 457.25 463.25 – – – – – – Audio Carrier (MHz) 329.75 335.75 341.75 347.75 353.75 359.75 365.75 371.75 377.75 383.75 389.75 395.75 401.75 407.75 413.75 419.75 425.75 431.75 437.75 443.75 449.75 455.75 461.75 467.75 – – – – – – Channel Range (MHz) 324–330 330–336 336–342 342–348 348–354 354–360 360–366 366–372 372–378 378–384 384–390 390–396 396–402 402–408 408–414 414–420 420–426 426–432 432–438 438–444 444–450 450–456 456–462 462–468 – – – – – – Table 8.4. Analog Cable TV Nominal Frequencies for Japan. 
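The HRC video carriers listed in Tables 8.3e and 8.3f appear to sit on integer multiples of a common reference of 6.0003 MHz (for example, 54.0027 = 9 × 6.0003). The sketch below tests a listed carrier against that pattern. The reference value is inferred from the table entries rather than quoted from the text, and the function name and tolerance are illustrative assumptions.

```python
# Illustrative check: is an HRC video carrier an integer multiple of 6.0003 MHz?
# The reference is inferred from the table entries, not stated in the text.

HRC_REFERENCE_MHZ = 6.0003

def hrc_harmonic(carrier_mhz, tolerance_mhz=0.0005):
    """Return the harmonic number if the carrier fits the pattern, else None."""
    n = round(carrier_mhz / HRC_REFERENCE_MHZ)
    if abs(carrier_mhz - n * HRC_REFERENCE_MHZ) <= tolerance_mhz:
        return n
    return None

for carrier in (54.0027, 120.0060, 552.0276, 996.0498, 348.0168):
    print(carrier, hrc_harmonic(carrier))
```

A carrier that returns None under this check, such as the 348.0168 entry, may indicate a misprint in the table rather than a real exception to the pattern.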
Luminance Equation Derivation

The equation for generating luminance from RGB is determined by the chromaticities of the three primary colors used by the receiver and by the chromaticity of reference white. The chromaticities of the RGB primaries and reference white (CIE illuminant C) were specified in the 1953 NTSC standard to be:

R: xr = 0.67 yr = 0.33 zr = 0.00
G: xg = 0.21 yg = 0.71 zg = 0.08
B: xb = 0.14 yb = 0.08 zb = 0.78
white: xw = 0.3101 yw = 0.3162 zw = 0.3737

where x and y are the specified CIE 1931 chromaticity coordinates; z is calculated by knowing that x + y + z = 1.

Luminance is calculated as a weighted sum of RGB, with the weights representing the actual contributions of each of the RGB primaries in generating the luminance of reference white. We find the linear combination of RGB that gives reference white by solving the equation:

\[
\begin{bmatrix} x_r & x_g & x_b \\ y_r & y_g & y_b \\ z_r & z_g & z_b \end{bmatrix}
\begin{bmatrix} K_r \\ K_g \\ K_b \end{bmatrix}
=
\begin{bmatrix} x_w / y_w \\ 1 \\ z_w / y_w \end{bmatrix}
\]

Rearranging to solve for Kr, Kg, and Kb yields:

\[
\begin{bmatrix} K_r \\ K_g \\ K_b \end{bmatrix}
=
\begin{bmatrix} x_r & x_g & x_b \\ y_r & y_g & y_b \\ z_r & z_g & z_b \end{bmatrix}^{-1}
\begin{bmatrix} x_w / y_w \\ 1 \\ z_w / y_w \end{bmatrix}
\]

Substituting the known values gives us the solution for Kr, Kg, and Kb:

\[
\begin{bmatrix} K_r \\ K_g \\ K_b \end{bmatrix}
=
\begin{bmatrix} 0.67 & 0.21 & 0.14 \\ 0.33 & 0.71 & 0.08 \\ 0.00 & 0.08 & 0.78 \end{bmatrix}^{-1}
\begin{bmatrix} 0.3101 / 0.3162 \\ 1 \\ 0.3737 / 0.3162 \end{bmatrix}
=
\begin{bmatrix} 1.730 & -0.482 & -0.261 \\ -0.814 & 1.652 & -0.023 \\ 0.083 & -0.169 & 1.284 \end{bmatrix}
\begin{bmatrix} 0.9807 \\ 1 \\ 1.1818 \end{bmatrix}
=
\begin{bmatrix} 0.906 \\ 0.827 \\ 1.430 \end{bmatrix}
\]

Y is defined to be

Y = (Kryr)R´ + (Kgyg)G´ + (Kbyb)B´
  = (0.906)(0.33)R´ + (0.827)(0.71)G´ + (1.430)(0.08)B´

or

Y = 0.299R´ + 0.587G´ + 0.114B´

Modern receivers use a different set of RGB phosphors, resulting in slightly different chromaticities of the RGB primaries and reference white (CIE illuminant D65):

R: xr = 0.630 yr = 0.340 zr = 0.030
G: xg = 0.310 yg = 0.595 zg = 0.095
B: xb = 0.155 yb = 0.070 zb = 0.775
white: xw = 0.3127 yw = 0.3290 zw = 0.3583

where x and y are the specified CIE 1931 chromaticity coordinates; z is calculated by knowing that x + y + z = 1.
Once again, substituting the known values gives us the solution for Kr, Kg, and Kb:

\[
\begin{bmatrix} K_r \\ K_g \\ K_b \end{bmatrix}
=
\begin{bmatrix} 0.630 & 0.310 & 0.155 \\ 0.340 & 0.595 & 0.070 \\ 0.030 & 0.095 & 0.775 \end{bmatrix}^{-1}
\begin{bmatrix} 0.3127 / 0.3290 \\ 1 \\ 0.3583 / 0.3290 \end{bmatrix}
=
\begin{bmatrix} 0.6243 \\ 1.1770 \\ 1.2362 \end{bmatrix}
\]

Since Y is defined to be

Y = (Kryr)R´ + (Kgyg)G´ + (Kbyb)B´
  = (0.6243)(0.340)R´ + (1.1770)(0.595)G´ + (1.2362)(0.070)B´

this results in:

Y = 0.212R´ + 0.700G´ + 0.086B´

However, the standard Y = 0.299R´ + 0.587G´ + 0.114B´ equation is still used. Adjustments are made in the receiver to minimize color errors.

PAL Overview

Europe delayed adopting a color television standard, evaluating various systems between 1953 and 1967 that were compatible with their 625-line, 50-field-per-second, 2:1 interlaced monochrome standard. The NTSC specification was modified to overcome the high order of phase and amplitude integrity required during broadcast to avoid color distortion. The Phase Alternation Line (PAL) system implements a line-by-line reversal of the phase of one of the color components, originally relying on the eye to average any color distortions to the correct color. Broadcasting began in 1967 in Germany and the United Kingdom, with each using a slightly different variant of the PAL system.

Luminance Information

The monochrome luminance (Y) signal is derived from R´G´B´:

Y = 0.299R´ + 0.587G´ + 0.114B´

As with NTSC, the luminance signal occupies the entire video bandwidth. PAL has several variations, depending on the video bandwidth and placement of the audio subcarrier. The composite video signal has a bandwidth of 4.2, 5.0, 5.5, or 6.0 MHz, depending on the specific PAL standard.

Color Information

To transmit color information, U and V are used:

U = 0.492(B´ – Y)
V = 0.877(R´ – Y)

U and V have a typical bandwidth of 1.3 MHz.
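The luminance weights derived earlier for both sets of primaries, and the U and V scaling above, can be checked numerically. The sketch below solves the 3 × 3 system with Cramer's rule so that it needs no external libraries. The function names are hypothetical, and the code is an illustration of the derivation rather than a reference implementation.

```python
# Illustrative sketch: derive luminance weights from primary and white-point
# chromaticities, then apply the U and V scaling given in the text.

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def luminance_weights(red, green, blue, white):
    """Each argument is an (x, y) chromaticity pair; returns (wr, wg, wb)."""
    prim = [(x, y, 1.0 - x - y) for x, y in (red, green, blue)]
    xw, yw = white
    zw = 1.0 - xw - yw
    # Columns of the matrix are the primaries; rows are x, y, z.
    matrix = [[p[row] for p in prim] for row in range(3)]
    rhs = [xw / yw, 1.0, zw / yw]
    d = det3(matrix)
    k = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [[rhs[r] if c == col else matrix[r][c] for c in range(3)]
                    for r in range(3)]
        k.append(det3(replaced) / d)
    # The weight of each primary is K times its y chromaticity.
    return tuple(k[i] * prim[i][1] for i in range(3))

def rgb_to_yuv(r, g, b, weights=(0.299, 0.587, 0.114)):
    """Gamma-corrected R'G'B' (0..1) to Y, U, V using the scaling in the text."""
    y = weights[0] * r + weights[1] * g + weights[2] * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

ntsc_1953 = luminance_weights((0.67, 0.33), (0.21, 0.71), (0.14, 0.08),
                              (0.3101, 0.3162))
modern = luminance_weights((0.630, 0.340), (0.310, 0.595), (0.155, 0.070),
                           (0.3127, 0.3290))
print([round(w, 3) for w in ntsc_1953])
print([round(w, 3) for w in modern])
print(rgb_to_yuv(1.0, 0.0, 0.0))
```

The computed weights should land close to the rounded coefficients quoted in the derivation; small differences in the last digit come from rounding the intermediate K values.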
Color Modulation

As in the NTSC system, U and V are used to modulate the color subcarrier using two balanced modulators operating in phase quadrature: one modulator is driven by the subcarrier at sine phase; the other modulator is driven by the subcarrier at cosine phase. The outputs of the modulators are added together to form the modulated chrominance signal:

C = U sin ωt ± V cos ωt
ω = 2πFSC
FSC = 4.43361875 MHz (± 5 Hz) for (B, D, G, H, I, N) PAL
FSC = 3.58205625 MHz (± 5 Hz) for (NC) PAL
FSC = 3.57561149 MHz (± 10 Hz) for (M) PAL

In PAL, the phase of V is reversed every other line. V was chosen for the reversal process since it has a lower gain factor than U and therefore is less susceptible to a one-half FH switching rate imbalance. The result of alternating the V phase at the line rate is that any color subcarrier phase errors produce complementary errors, allowing line-to-line averaging at the receiver to cancel the errors and generate the correct hue with slightly reduced saturation.

This technique requires the PAL receiver to be able to determine the correct V phase. This is done using a technique known as AB sync, PAL sync, PAL switch, or swinging burst, consisting of alternating the phase of the color burst by ±45° at the line rate. The UV vector diagrams are shown in Figures 8.11 and 8.12.

Simple PAL decoders rely on the eye to average the line-by-line hue errors. Standard PAL decoders use a 1H delay line to separate U from V in an averaging process. Both implementations have the problem of Hanover bars, in which pairs of adjacent lines have a real and complementary hue error. Chrominance vertical resolution is reduced as a result of the line averaging process.

Composite Video Generation

The modulated chrominance is added to the luminance information along with appropriate horizontal and vertical sync signals, blanking signals, and color burst signals, to generate the composite color video waveform shown in Figure 8.13.
composite PAL = Y + U sin ωt ± V cos ωt + timing

Figure 8.11. UV Vector Diagram for 75% Color Bars. Line [n], PAL switch = 0. (Phase angles relative to +U at 0°, with +V at 90°: burst 135°, yellow 167°, red 103°, magenta 61°, blue 347°, cyan 283°, green 241°. Amplitudes are marked in IRE scale units.)

Figure 8.12. UV Vector Diagram for 75% Color Bars. Line [n + 1], PAL switch = 1. (Phase angles: burst 225°, yellow 193°, red 257°, magenta 300°, blue 13°, cyan 77°, green 120°.)

Figure 8.13. (B, D, G, H, I, NC) PAL Composite Video Signal for 75% Color Bars. (Waveform for the white, yellow, cyan, green, magenta, red, blue, and black bars, marking the white level at 100 IRE, the black/blank level, the sync level at 43 IRE below blanking, and a color burst of 10 ± 1 cycles at 21.43 IRE either side of blanking. Luminance level sets brightness, subcarrier amplitude sets color saturation, and subcarrier phase sets hue.)

Figure 8.14. Video Bandwidths of Some PAL Systems. (Amplitude versus frequency in MHz, showing Y, U, and ±V around the 4.43 MHz chrominance subcarrier, for (I) PAL extending to 5.5 MHz and for (B, G, H) PAL extending to 5.0 MHz.)

Figure 8.15. Luma and Chroma Frequency Interleave Principle. (Y, U, and V spectral components shown at spacings of FH/4, FH/2, and FH.)

The bandwidth of the resulting composite video signal is shown in Figure 8.14.

Like NTSC, the luminance components are spaced at FH intervals due to horizontal blanking. Since the V component is switched symmetrically at one-half the line rate, only odd harmonics are generated, resulting in V components that are spaced at intervals of FH. The V components are spaced at half-line intervals from the U components, which also have FH spacing. If the subcarrier had a half-line offset like NTSC uses, the U components would be perfectly interleaved, but the V components would coincide with the Y components and thus not be interleaved, creating vertical stationary dot patterns. For this reason, PAL uses a 1/4 line offset for the subcarrier frequency:
FSC = ((1135/4) + (1/625)) FH for (B, D, G, H, I, N) PAL
FSC = (909/4) FH for (M) PAL
FSC = ((917/4) + (1/625)) FH for (NC) PAL

The additional (1/625) FH factor (equal to 25 Hz) provides motion to the color dot pattern, reducing its visibility. Figure 8.15 illustrates the resulting frequency interleaving. Eight complete fields are required to repeat a specific sample position, as shown in Figures 8.16 and 8.17.

PAL Standards

Figure 8.19 shows the common designations for PAL systems. The letters refer to the monochrome standard for line and field rate, video bandwidth (4.2, 5.0, 5.5, or 6.0 MHz), audio carrier relative frequency, and RF channel bandwidth (6.0, 7.0, or 8.0 MHz). PAL refers to the technique used to add color information to the monochrome signal. Detailed timing parameters may be found in Table 8.9.

Noninterlaced PAL, shown in Figure 8.18, is a 312-line, 50-frames-per-second version of PAL common among video games and on-screen displays. This format is identical to standard PAL, except that there are 312 lines per frame.

RF Modulation

Figures 8.20 and 8.21 illustrate the process of converting baseband (G) PAL composite video to an RF (radio frequency) signal. The process for the other PAL standards is similar; the main differences are the video bandwidths and subcarrier frequencies.

Figure 8.20a shows the frequency spectrum of a (G) PAL baseband composite video signal. It is similar to Figure 8.14. However, Figure 8.14 only shows the upper sideband for simplicity. The video carrier notation at 0 MHz serves only as a reference point for comparison with Figure 8.20b.

Figure 8.20b shows the audio/video signal as it resides within an 8 MHz channel. The video signal has been lowpass filtered, most of the lower sideband has been removed, and audio information has been added. Note that (H) and (I) PAL have a vestigial sideband of 1.25 MHz, rather than 0.75 MHz.
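Returning briefly to the subcarrier formulas given under Composite Video Generation, the quarter-line relationships can be checked with exact arithmetic. The sketch below uses a 15,625 Hz line rate for 625-line systems, which follows from the statement that FH/625 equals 25 Hz, and assumes 15,734.264 Hz for 525-line (M) PAL. The function name and argument layout are illustrative.

```python
# Illustrative check of the PAL subcarrier relationships quoted in the text.
# Fractions keep the arithmetic exact; the 525-line rate is an assumed value.
from fractions import Fraction

FH_625 = Fraction(15625)            # Hz, 625-line systems
FH_525 = Fraction(15734264, 1000)   # Hz, assumed for (M) PAL

def pal_fsc(quarter_lines, fh, with_25hz_offset):
    """FSC = (quarter_lines / 4) * FH, plus FH / 625 when the offset applies."""
    fsc = Fraction(quarter_lines, 4) * fh
    if with_25hz_offset:
        fsc += fh / 625
    return fsc

fsc_bdghi = pal_fsc(1135, FH_625, True)   # (B, D, G, H, I, N) PAL
fsc_nc = pal_fsc(917, FH_625, True)       # (NC) PAL
fsc_m = pal_fsc(909, FH_525, False)       # (M) PAL
print(float(fsc_bdghi), float(fsc_nc), float(fsc_m))
```

Each result can be compared against the FSC values listed under Color Modulation; the (M) PAL figure depends on how many digits of the 525-line rate are carried.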
Figure 8.20c details the information present on the audio subcarrier for analog stereo operation.

As shown in Figure 8.21, back porch clamping of the analog video signal ensures that the back porch level is constant, regardless of changes in the average picture level. The video signal is then lowpass filtered to 5.0 MHz and drives the AM (amplitude modulation) video modulator. The sync level corresponds to 100% modulation; the blanking and white modulation levels are dependent on the specific version of PAL:

blanking level (% modulation)
B, G             75%
D, H, M, N       75%
I                76%

white level (% modulation)
B, G, H, M, N    10%
D                10%
I                20%

Figure 8.16a and 8.16b. Eight-Field (B, D, G, H, I, NC) PAL Sequence and Burst Blanking. See Figure 8.5 for equalization and serration pulse details. (Analog fields 1 through 8 are shown relative to the start of VSYNC, with the burst blanking intervals marked for each field and the –U component of the burst phase indicated. Burst phase = reference phase = 135° relative to U when PAL switch = 0, +V component. Burst phase = reference phase + 90° = 225° relative to U when PAL switch = 1, –V component.)

Figure 8.17. Eight-Field (M) PAL Sequence and Burst Blanking. See Figure 8.5 for equalization and serration pulse details. (Analog fields 1/5, 2/6, 3/7, and 4/8 of the 525-line sequence, with the same burst phase and PAL switch conventions as Figure 8.16.)

Figure 8.18. Noninterlaced PAL Frame Sequence. (Lines 308 through 312, 1 through 7, 23, and 24 are shown relative to the start of VSYNC, with the same burst phase and PAL switch conventions as Figure 8.16.)

Note that PAL systems use a variety of video and audio IF frequencies (values in MHz):

System    Video IF    Audio IF    Note
B, G      38.900      33.400
B         36.875      31.375      Australia
D         37.000      30.500      China
D         38.900      32.400      OIRT
I         38.900      32.900
I         39.500      33.500      U.K.
M, N      45.750      41.250

At this point, audio information is added on the audio subcarrier. A monaural L+R audio signal is processed as shown in Figure 8.21 and drives the FM (frequency modulation) modulator.
The output of the FM modulator is added to the IF video signal.

The SAW filter, used as a vestigial sideband filter, provides filtering of the IF signal. The mixer, or up converter, mixes the IF signal with the desired broadcast frequency. Both sum and difference frequencies are generated by the mixing process, so the difference signal is extracted by using a bandpass filter.

Stereo Audio (Analog)

The standard (ITU-R BS.707), also known as Zweiton or A2, is shown in Figure 8.21. The L+R information is transmitted on an FM subcarrier. The R information, or a second L+R audio signal, is transmitted on a second FM subcarrier at +15.5FH.

If stereo or dual mono signals are present, the FM subcarrier at +15.5FH is amplitude-modulated with a 54.6875 kHz (3.5FH) subcarrier. This 54.6875 kHz subcarrier is 50% amplitude-modulated at 117.5 Hz (FH/133) to indicate stereo audio or 274.1 Hz (FH/57) to indicate dual mono audio.

Countries that use this system include Australia, Austria, China, Germany, Italy, Malaysia, Netherlands, Slovenia, and Switzerland.
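The Zweiton frequencies quoted above all derive from the horizontal line rate, which the short sketch below makes explicit. The `classify_pilot` helper, its tolerance, and the treatment of a missing pilot as mono are illustrative assumptions, not part of BS.707.

```python
FH = 15625.0  # Hz, horizontal line rate for 625-line PAL

SECOND_CARRIER_OFFSET = 15.5 * FH   # spacing of the second FM subcarrier, Hz
PILOT_SUBCARRIER = 3.5 * FH         # AM pilot subcarrier, Hz
STEREO_TONE = FH / 133              # identification tone for stereo, Hz
DUAL_MONO_TONE = FH / 57            # identification tone for dual mono, Hz

def classify_pilot(tone_hz, tolerance_hz=2.0):
    """Map a measured identification tone to an audio mode (hypothetical helper)."""
    if tone_hz is None:
        return "mono"  # assumption: no pilot means a single L+R carrier
    if abs(tone_hz - STEREO_TONE) <= tolerance_hz:
        return "stereo"
    if abs(tone_hz - DUAL_MONO_TONE) <= tolerance_hz:
        return "dual mono"
    return "unknown"

print(f"second carrier offset: {SECOND_CARRIER_OFFSET / 1e3:.4f} kHz")
print(f"pilot subcarrier:      {PILOT_SUBCARRIER / 1e3:.4f} kHz")
print(f"stereo tone:           {STEREO_TONE:.1f} Hz")
print(f"dual mono tone:        {DUAL_MONO_TONE:.1f} Hz")
```

Deriving every frequency from FH keeps the audio identification locked to the video timing, which is presumably why the tones are specified as ratios rather than round numbers.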
Stereo Audio (Digital)

The standard uses NICAM 728 (Near Instantaneous Companded Audio Multiplex), discussed within BS.707 and ETSI EN 300 163. It was developed by the BBC and IBA to increase sound quality, provide multiple channels of digital sound or data, and be more resistant to transmission interference.

The subcarrier resides either 5.85 MHz above the video carrier for (B, D, G, H) PAL and (L) SECAM systems or 6.552 MHz above the video carrier for (I) PAL systems.

Countries that use NICAM 728 include Belgium, China, Denmark, Finland, France, Hungary, New Zealand, Norway, Singapore, South Africa, Spain, Sweden, and the United Kingdom.

NICAM 728 is a digital system that uses a 32 kHz sampling rate and 14-bit resolution. A bit rate of 728 kbps is used, giving it the name NICAM 728. Data is transmitted in frames, with each frame containing 1 ms of audio. As shown in Figure 8.22, each frame consists of:

  8-bit frame alignment word (01001110)
  5 control bits (C0–C4)
  11 additional data bits (AD0–AD10), currently undefined
  704 audio data bits (A000–A703)

C0 is a “1” for eight successive frames and a “0” for the next eight frames, defining a 16-frame sequence. C1–C3 specify the format transmitted:

  “000” = one stereo signal with the left channel being odd-numbered samples and the right channel being even-numbered samples
  “010” = two independent mono channels transmitted in alternate frames
  “100” = one mono channel and one 352 kbps data channel transmitted in alternate frames
  “110” = one 704 kbps data channel

C4 is a “1” if the analog sound is the same as the digital sound.

Figure 8.19. Common PAL Systems. All systems use a quadrature modulated subcarrier (phase = hue, amplitude = saturation) with line alternation of the V component.

System        Line/Field   FH (kHz)  FV (Hz)  FSC (MHz)    Setup (IRE)  Video BW (MHz)  Audio Carrier (MHz)  Channel BW (MHz)
I             625/50       15.625    50       4.43361875   0            5.5             5.9996               8
B, B1, G, H   625/50       15.625    50       4.43361875   0            5.0             5.5                  B = 7; B1, G, H = 8
M             525/59.94    15.734    59.94    3.57561149   7.5          4.2             4.5                  6
D             625/50       15.625    50       4.43361875   0            6.0             6.5                  8
N             625/50       15.625    50       4.43361875   7.5          5.0             5.5                  6
NC            625/50       15.625    50       3.58205625   0            4.2             4.5                  6

Figure 8.20. Transmission Channel for (G) PAL. (A) Frequency spectrum of baseband composite video. (B) Frequency spectrum of typical channel including audio information. (C) Detailed frequency spectrum of Zweiton analog stereo audio information.

Figure 8.21. Typical RF Modulation Implementation for (G) PAL: Zweiton Stereo Audio.

Stereo Audio Encoding

The 32 14-bit samples (1 ms of audio, 2’s complement format) per channel are pre-emphasized to the ITU-T J.17 curve.

The largest positive or negative sample of the 32 is used to determine which 10 bits of all 32 samples to transmit. Three range bits per channel (R0L, R1L, R2L, and R0R, R1R, R2R) are used to indicate the scaling factor. D13 is the sign bit (“0” = positive).

D13–D0            R2–R0   Bits Used
01xxxxxxxxxxxx    111     D13, D12–D4
001xxxxxxxxxxx    110     D13, D11–D3
0001xxxxxxxxxx    101     D13, D10–D2
00001xxxxxxxxx    011     D13, D9–D1
000001xxxxxxxx    100     D13, D8–D0
0000001xxxxxxx    010     D13, D8–D0
0000000xxxxxxx    00x     D13, D8–D0
1111111xxxxxxx    00x     D13, D8–D0
1111110xxxxxxx    010     D13, D8–D0
111110xxxxxxxx    100     D13, D8–D0
11110xxxxxxxxx    011     D13, D9–D1
1110xxxxxxxxxx    101     D13, D10–D2
110xxxxxxxxxxx    110     D13, D11–D3
10xxxxxxxxxxxx    111     D13, D12–D4

A parity bit for the six MSBs of each sample is added, resulting in each sample being 11 bits. The 64 samples are interleaved, generating L0, R0, L1, R1, L2, R2, ..., L31, R31, and numbered 0–63.

The parity bits are used to convey to the decoder what scaling factor was used for each channel (“signaling-in-parity”).

  If R2L = “0,” even parity for samples 0, 6, 12, 18, ..., 48 is used. If R2L = “1,” odd parity is used.
  If R2R = “0,” even parity for samples 1, 7, 13, 19, ..., 49 is used. If R2R = “1,” odd parity is used.
  If R1L = “0,” even parity for samples 2, 8, 14, 20, ..., 50 is used. If R1L = “1,” odd parity is used.
  If R1R = “0,” even parity for samples 3, 9, 15, 21, ..., 51 is used. If R1R = “1,” odd parity is used.
  If R0L = “0,” even parity for samples 4, 10, 16, 22, ..., 52 is used. If R0L = “1,” odd parity is used.
  If R0R = “0,” even parity for samples 5, 11, 17, 23, ..., 53 is used. If R0R = “1,” odd parity is used.

Figure 8.22. NICAM 728 Bitstream for One Frame. The frame alignment word (0, 1, 0, 0, 1, 1, 1, 0) is followed by the control bits (C0–C4), the additional data bits (AD0–AD10), and the 704 audio data bits in interleaved order (A000, A044, A088, ..., A660, A001, A045, A089, ..., A661, ..., A043, A087, A131, ..., A703).

The parity of samples 54–63 is normally even. However, they may be modified to transmit two additional bits of information:

  If CIB0 = “0,” even parity for samples 54, 55, 56, 57, and 58 is used. If CIB0 = “1,” odd parity is used.
  If CIB1 = “0,” even parity for samples 59, 60, 61, 62, and 63 is used. If CIB1 = “1,” odd parity is used.

The audio data is bit-interleaved as shown in Figure 8.22 to reduce the influence of dropouts. If the bits are numbered 0–703, they are transmitted in the order 0, 44, 88, ..., 660, 1, 45, 89, ..., 661, 2, 46, 90, ..., 703.

The whole frame, except the frame alignment word, is exclusive-ORed with a 1-bit pseudo-random binary sequence (PRBS). The PRBS generator is reinitialized after the frame alignment word of each frame so that the first bit of the sequence processes the C0 bit. The polynomial of the PRBS is x^9 + x^4 + 1 with an initialization word of “111111111.”

Actual transmission consists of taking bits in pairs from the 728 kbps bitstream, then generating 364k symbols per second using Differential Quadrature Phase-Shift Keying (DQPSK). If the symbol is “00,” the subcarrier phase is left unchanged. If the symbol is “01,” the subcarrier phase is delayed 90°. If the symbol is “11,” the subcarrier phase is inverted. If the symbol is “10,” the subcarrier phase is advanced 90°.

Finally, the signal is spectrum-shaped to a –30 dB bandwidth of ~700 kHz for (I) PAL or ~500 kHz for (B, G) PAL.
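The last three encoding steps (bit interleaving, PRBS scrambling, and DQPSK mapping) can be sketched as follows. This is an illustration under stated assumptions: the shift-register tap and output convention chosen for the x^9 + x^4 + 1 polynomial is one plausible arrangement and has not been checked against ETSI EN 300 163, and all function names are hypothetical.

```python
def interleave_order():
    """Transmission order of audio bits 0-703: 0, 44, 88, ..., 660, 1, 45, ..."""
    return [row + 44 * col for row in range(44) for col in range(16)]

def prbs_bits(count):
    """Bits from a 9-stage LFSR seeded with all ones (tap convention assumed)."""
    state = [1] * 9                      # state[0] is newest, state[8] is oldest
    out = []
    for _ in range(count):
        feedback = state[4] ^ state[8]   # recurrence s[n] = s[n-5] XOR s[n-9]
        out.append(state[8])             # emit the oldest bit
        state = [feedback] + state[:8]
    return out

def scramble(frame):
    """XOR everything after the 8-bit frame alignment word with the PRBS."""
    faw, rest = frame[:8], frame[8:]
    return faw + [b ^ p for b, p in zip(rest, prbs_bits(len(rest)))]

# Phase change in degrees for each bit pair, as listed in the text.
PHASE_STEP = {(0, 0): 0, (0, 1): -90, (1, 1): 180, (1, 0): 90}

def dqpsk_phases(bits, start_phase=0):
    """Accumulate subcarrier phase (degrees, modulo 360) for successive bit pairs."""
    phase, phases = start_phase, []
    for i in range(0, len(bits), 2):
        phase = (phase + PHASE_STEP[(bits[i], bits[i + 1])]) % 360
        phases.append(phase)
    return phases

frame = [0, 1, 0, 0, 1, 1, 1, 0] + [0] * 720   # alignment word plus dummy payload
symbols = dqpsk_phases(scramble(frame))
print(len(symbols), "symbols per 1 ms frame")
```

Because scrambling is a plain XOR with a sequence restarted every frame, applying `scramble` a second time recovers the original frame, which is how the decoder undoes it. A 728-bit frame pairs up into 364 symbols per millisecond.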
Stereo Audio Decoding

A PLL locks to the NICAM subcarrier frequency and recovers the phase changes that represent the encoded symbols. The symbols are decoded to generate the 728 kbps bitstream.

The frame alignment word is found and the following bits are exclusive-ORed with a locally generated PRBS to recover the packet. The C0 bit is tested for 8 frames high, 8 frames low behavior to verify it is a NICAM 728 bitstream.

The bit-interleaving of the audio data is reversed, and the signaling-in-parity decoded:

  A majority vote is taken on the parity of samples 0, 6, 12, ..., 48. If even, R2L = “0”; if odd, R2L = “1.”
  A majority vote is taken on the parity of samples 1, 7, 13, ..., 49. If even, R2R = “0”; if odd, R2R = “1.”
  A majority vote is taken on the parity of samples 2, 8, 14, ..., 50. If even, R1L = “0”; if odd, R1L = “1.”
  A majority vote is taken on the parity of samples 3, 9, 15, ..., 51. If even, R1R = “0”; if odd, R1R = “1.”
  A majority vote is taken on the parity of samples 4, 10, 16, ..., 52. If even, R0L = “0”; if odd, R0L = “1.”
  A majority vote is taken on the parity of samples 5, 11, 17, ..., 53. If even, R0R = “0”; if odd, R0R = “1.”
  A majority vote is taken on the parity of samples 54, 55, 56, 57, and 58. If even, CIB0 = “0”; if odd, CIB0 = “1.”
  A majority vote is taken on the parity of samples 59, 60, 61, 62, and 63. If even, CIB1 = “0”; if odd, CIB1 = “1.”

Any samples whose parity disagreed with the vote are ignored and replaced with an interpolated value.

The left channel uses range bits R2L, R1L, and R0L to determine which bits below the sign bit were discarded during encoding. The sign bit is duplicated into those positions to generate a 14-bit sample.

The right channel is similarly processed, using range bits R2R, R1R, and R0R. Both channels are then de-emphasized using the J.17 curve.
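The near-instantaneous companding that the decoder reverses can be sketched as a per-block shift. The sketch below assumes samples are held as ordinary signed integers in the 14-bit range and ignores pre-emphasis, parity, and the range-bit signaling; the function names are hypothetical.

```python
def coding_shift(block):
    """Number of low-order bits (0-4) dropped so every sample fits in 10 bits."""
    shift = 0
    for sample in block:
        # A sample fits when it lies within the signed range of (10 + shift) bits.
        while shift < 4 and not (-(1 << (9 + shift)) <= sample < (1 << (9 + shift))):
            shift += 1
    return shift

def compress(block):
    """Return (shift, 10-bit samples); the largest sample sets the shift."""
    shift = coding_shift(block)
    return shift, [sample >> shift for sample in block]

def expand(shift, coded):
    """Rebuild 14-bit samples; the sign is extended above the 9 retained bits."""
    return [value << shift for value in coded]

block = [100, -300, 2000, -1, 5]          # example samples, 14-bit range
shift, coded = compress(block)
restored = expand(shift, coded)
print(shift, coded, restored)
```

Working on signed integers makes the sign extension implicit: shifting the 10-bit value left by the block's shift is equivalent to duplicating the sign bit into the discarded upper positions. The reconstruction error for each sample is smaller than 2 to the power of the shift.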
Dual Mono Audio Encoding

Two blocks of 32 14-bit samples (2 ms of audio, 2’s complement format) are pre-emphasized to the ITU-T J.17 specification. As with the stereo audio, three range bits per block (R0A, R1A, R2A, and R0B, R1B, R2B) are used to indicate the scaling factor. Unlike stereo audio, the samples are not interleaved.

  If R2A = “0,” even parity for samples 0, 3, 6, 9, ..., 24 is used. If R2A = “1,” odd parity is used.
  If R2B = “0,” even parity for samples 27, 30, 33, ..., 51 is used. If R2B = “1,” odd parity is used.
  If R1A = “0,” even parity for samples 1, 4, 7, 10, ..., 25 is used. If R1A = “1,” odd parity is used.
  If R1B = “0,” even parity for samples 28, 31, 34, ..., 52 is used. If R1B = “1,” odd parity is used.
  If R0A = “0,” even parity for samples 2, 5, 8, 11, ..., 26 is used. If R0A = “1,” odd parity is used.
  If R0B = “0,” even parity for samples 29, 32, 35, ..., 53 is used. If R0B = “1,” odd parity is used.

The audio data is bit-interleaved; however, odd packets contain 64 samples of audio channel 1 while even packets contain 64 samples of audio channel 2. The rest of the processing is the same as for stereo audio.

Analog Channel Assignments

Tables 8.5 through 8.7 list the channel assignments for VHF, UHF, and cable for various PAL systems.

Note that cable systems routinely reassign channel numbers to alternate frequencies to minimize interference and provide multiple levels of programming (such as two versions of a premium movie channel: one for subscribers, and one for nonsubscribers during preview times).

Luminance Equation Derivation

The equation for generating luminance from RGB information is determined by the chromaticities of the three primary colors used by the receiver and what color white actually is.
The chromaticities of the RGB primaries and reference white (CIE illuminant D65) are:

  R:      xr = 0.64     yr = 0.33     zr = 0.03
  G:      xg = 0.29     yg = 0.60     zg = 0.11
  B:      xb = 0.15     yb = 0.06     zb = 0.79
  white:  xw = 0.3127   yw = 0.3290   zw = 0.3583

Table 8.5. Analog Broadcast and Cable TV Nominal Frequencies for (B, I) PAL in Various Countries.

(B) PAL, Australia, 7 MHz Channel
Channel   Video Carrier (MHz)   Audio Carrier (MHz)   Channel Range (MHz)
0         46.25                 51.75                 45–52
1         57.25                 62.75                 56–63
2         64.25                 69.75                 63–70
3         86.25                 91.75                 85–92
4         95.25                 100.75                94–101
5         102.25                107.75                101–108
5A        138.25                143.75                137–144
6         175.25                180.75                174–181
7         182.25                187.75                181–188
8         189.25                194.75                188–195
9         196.25                201.75                195–202
10        209.25                214.75                208–215
11        216.25                221.75                215–222
12        223.25

(I) PAL, Ireland, 8 MHz Channel
Channel   Video Carrier (MHz)   Audio Carrier (MHz)   Channel Range (MHz)
1         45.75                 51.75                 44.5–52.5
2         53.75                 59.75                 52.5–60.5
3         61.75                 67.75                 60.5–68.5
4         175.25                181.25                174–182
5         183.25                189.25                182–190
6         191.25                197.25                190–198
7         199.25                205.25                198–206
8         207.25                213.25                206–214
9         215.25                221.25                214–222

(B) PAL, Italy, 7 MHz Channel
Channel   Video Carrier (MHz)   Audio Carrier (MHz)   Channel Range (MHz)
A         53.75                 59.25                 52.5–59.5
B         62.25                 67.75                 61–68
C         82.25                 87.75                 81–88
D         175.25                180.75                174–181
E         183.75                189.25                182.5–189.5
F         192.25                197.75                191–198
G         201.25                206.75                200–207
H         210.25                215.75                209–216
H–1       217.25                222.75                216–223
H–2       224.25                229.75                223–230

(B) PAL, New Zealand, 7 MHz Channel
Channel   Video Carrier (MHz)   Audio Carrier (MHz)   Channel Range (MHz)
1         45.25                 50.75                 44–51
2         55.25                 60.75                 54–61
3         62.25                 67.75                 61–68
4         175.25                180.75                174–181
5         182.25                187.75                181–188
6         189.25                194.75                188–195
7         196.25                201.75                195–202
8         203.25                208.75                202–209
9         210.25                215.75                209–216
Table 8.6a. Analog Broadcast Nominal Frequencies for the United Kingdom [1], Ireland [1], South Africa [1], Hong Kong [1], and Western Europe [2]. Channel numbers marked [1] or [2] apply only to the regions carrying the same mark.

Broadcast   Video Carrier   Audio Carrier (MHz)         Channel Range
Channel     (MHz)           (G, H) PAL    (I) PAL       (MHz)
2 [1]       45.75           51.25         51.75         44.5–52.5
3 [1]       53.75           59.25         59.75         52.5–60.5
4 [1]       61.75           67.25         67.75         60.5–68.5
5 [1]       175.25          180.75        181.25        174–182
6 [1]       183.25          188.75        189.25        182–190
7 [1]       191.25          196.75        197.25        190–198
8 [1]       199.25          204.75        205.25        198–206
9 [1]       207.25          212.75        213.25        206–214
10 [1]      215.25          220.75        221.25        214–222
2 [2]       48.25           53.75         –             47–54
3 [2]       55.25           60.75         –             54–61
4 [2]       62.25           67.75         –             61–68
5 [2]       175.25          180.75        –             174–181
6 [2]       182.25          187.75        –             181–188
7 [2]       189.25          194.75        –             188–195
8 [2]       196.25          201.75        –             195–202
9 [2]       203.25          208.75        –             202–209
10 [2]      210.25          215.75        –             209–216
11 [2]      217.25          222.75        –             216–223
12 [2]      224.25          229.75        –             223–230
21          471.25          476.75        477.25        470–478
22          479.25          484.75        485.25        478–486
23          487.25          492.75        493.25        486–494
24          495.25          500.75        501.25        494–502
25          503.25          508.75        509.25        502–510
26          511.25          516.75        517.25        510–518
27          519.25          524.75        525.25        518–526
28          527.25          532.75        533.25        526–534
29          535.25          540.75        541.25        534–542
30          543.25          548.75        549.25        542–550
31          551.25          556.75        557.25        550–558
32          559.25          564.75        565.25        558–566
33          567.25          572.75        573.25        566–574
34          575.25          580.75        581.25        574–582
35          583.25          588.75        589.25        582–590
36          591.25          596.75        597.25        590–598
37          599.25          604.75        605.25        598–606
38          607.25          612.75        613.25        606–614
39          615.25          620.75        621.25        614–622
Table 8.6b. Analog Broadcast Nominal Frequencies for the United Kingdom, Ireland, South Africa, Hong Kong, and Western Europe.

Broadcast   Video Carrier   Audio Carrier (MHz)         Channel Range
Channel     (MHz)           (G, H) PAL    (I) PAL       (MHz)
40          623.25          628.75        629.25        622–630
41          631.25          636.75        637.25        630–638
42          639.25          644.75        645.25        638–646
43          647.25          652.75        653.25        646–654
44          655.25          660.75        661.25        654–662
45          663.25          668.75        669.25        662–670
46          671.25          676.75        677.25        670–678
47          679.25          684.75        685.25        678–686
48          687.25          692.75        693.25        686–694
49          695.25          700.75        701.25        694–702
50          703.25          708.75        709.25        702–710
51          711.25          716.75        717.25        710–718
52          719.25          724.75        725.25        718–726
53          727.25          732.75        733.25        726–734
54          735.25          740.75        741.25        734–742
55          743.25          748.75        749.25        742–750
56          751.25          756.75        757.25        750–758
57          759.25          764.75        765.25        758–766
58          767.25          772.75        773.25        766–774
59          775.25          780.75        781.25        774–782
60          783.25          788.75        789.25        782–790
61          791.25          796.75        797.25        790–798
62          799.25          804.75        805.25        798–806
63          807.25          812.75        813.25        806–814
64          815.25          820.75        821.25        814–822
65          823.25          828.75        829.25        822–830
66          831.25          836.75        837.25        830–838
67          839.25          844.75        845.25        838–846
68          847.25          852.75        853.25        846–854
69          855.25          860.75        861.25        854–862
Table 8.7. Analog Cable TV Nominal Frequencies for the United Kingdom, Ireland, South Africa, Hong Kong, and Western Europe.

Cable Channel   Video Carrier (MHz)   Audio Carrier (MHz)   Channel Range (MHz)
E2              48.25                 53.75                 47–54
E3              55.25                 60.75                 54–61
E4              62.25                 67.75                 61–68
S 01            69.25                 74.75                 68–75
S 02            76.25                 81.75                 75–82
S 03            83.25                 88.75                 82–89
S1              105.25                110.75                104–111
S2              112.25                117.75                111–118
S3              119.25                124.75                118–125
S4              126.25                131.75                125–132
S5              133.25                138.75                132–139
S6              140.25                145.75                139–146
S7              147.25                152.75                146–153
S8              154.25                159.75                153–160
S9              161.25                166.75                160–167
S 10            168.25                173.75                167–174
E5              175.25                180.75                174–181
E6              182.25                187.75                181–188
E7              189.25                194.75                188–195
E8              196.25                201.75                195–202
E9              203.25                208.75                202–209
E 10            210.25                215.75                209–216
E 11            217.25                222.75                216–223
E 12            224.25                229.75                223–230
S 11            231.25                236.75                230–237
S 12            238.25                243.75                237–244
S 13            245.25                250.75                244–251
S 14            252.25                257.75                251–258
S 15            259.25                264.75                258–265
S 16            266.25                271.75                265–272
S 17            273.25                278.75                272–279
S 18            280.25                285.75                279–286
S 19            287.25                292.75                286–293
S 20            294.25                299.75                293–300
S 21            303.25                308.75                302–310
S 22            311.25                316.75                310–318
S 23            319.25                324.75                318–326
S 24            327.25                332.75                326–334
S 25            335.25                340.75                334–342
S 26            343.25                348.75                342–350
S 27            351.25                356.75                350–358
S 28            359.25                364.75                358–366
S 29            367.25                372.75                366–374
S 30            375.25                380.75                374–382
S 31            383.25                388.75                382–390
S 32            391.25                396.75                390–398
S 33            399.25                404.75                398–406
S 34            407.25                412.75                406–414
S 35            415.25                420.75                414–422
S 36            423.25                428.75                422–430
S 37            431.25                436.75                430–438
S 38            439.25                444.75                438–446
S 39            447.25                452.75                446–454
S 40            455.25                460.75                454–462
S 41            463.25                468.75                462–470

where x and y are the specified CIE 1931 chromaticity coordinates; z is calculated by knowing that x + y + z = 1.
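Given the chromaticities listed earlier for the primaries and reference white, the scale factors Kr, Kg, and Kb follow from solving a 3 x 3 linear system. The sketch below does this with Cramer's rule in plain Python; the helper names `det3` and `solve3` are hypothetical, and the result is a numerical illustration rather than the book's own working.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, v):
    """Solve m * k = v by Cramer's rule."""
    d = det3(m)
    result = []
    for col in range(3):
        replaced = [[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        result.append(det3(replaced) / d)
    return result

# Rows are x, y, z; columns are the R, G, B primaries.
primaries = [
    [0.64, 0.29, 0.15],
    [0.33, 0.60, 0.06],
    [0.03, 0.11, 0.79],
]
xw, yw, zw = 0.3127, 0.3290, 0.3583
white = [xw / yw, 1.0, zw / yw]       # reference white scaled so that Y = 1

kr, kg, kb = solve3(primaries, white)
luma_weights = [kr * 0.33, kg * 0.60, kb * 0.06]   # K multiplied by each y
print([round(k, 3) for k in (kr, kg, kb)])
print([round(w, 3) for w in luma_weights])
```

The three luminance weights necessarily sum to 1 because the middle row of the system forces Kr yr + Kg yg + Kb yb = 1. Small differences in the third decimal place relative to printed values come from rounding.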
As with NTSC, substituting the known values gives us the solution for Kr, Kg, and Kb:

\[
\begin{bmatrix} K_r \\ K_g \\ K_b \end{bmatrix}
=
\begin{bmatrix} 0.64 & 0.29 & 0.15 \\ 0.33 & 0.60 & 0.06 \\ 0.03 & 0.11 & 0.79 \end{bmatrix}^{-1}
\begin{bmatrix} 0.3127/0.3290 \\ 1 \\ 0.3583/0.3290 \end{bmatrix}
=
\begin{bmatrix} 0.674 \\ 1.177 \\ 1.190 \end{bmatrix}
\]

Y is defined to be

Y = (Kryr)R´ + (Kgyg)G´ + (Kbyb)B´
  = (0.674)(0.33)R´ + (1.177)(0.60)G´ + (1.190)(0.06)B´

or

Y = 0.222R´ + 0.706G´ + 0.071B´

However, the standard Y = 0.299R´ + 0.587G´ + 0.114B´ equation is still used. Adjustments are made in the receiver to minimize color errors.

PALplus

PALplus (ITU-R BT.1197 and ETSI ETS 300 731) is the result of a cooperative project started in 1990, undertaken by several European broadcasters. By 1995, they wanted to provide an enhanced definition television system (EDTV), compatible with existing receivers. PALplus has been transmitted by a few broadcasters since 1994.

A PALplus picture has a 16:9 aspect ratio. On conventional TVs, it is displayed as a 16:9 letterboxed image with 430 active lines. On PALplus TVs, it is displayed as a 16:9 picture with 574 active lines, with extended vertical resolution. The full video bandwidth is available for luminance detail. Cross color artifacts are reduced by clean encoding.

Wide Screen Signaling

Line 23 contains a widescreen signaling (WSS) control signal, defined by ITU-R BT.1119 and ETSI EN 300 294, used by PALplus TVs. This signal indicates:

Program Aspect Ratio:
  Full Format 4:3
  Letterbox 14:9 Center
  Letterbox 14:9 Top
  Full Format 14:9 Center
  Letterbox 16:9 Center
  Letterbox 16:9 Top
  Full Format 16:9 Anamorphic
  Letterbox > 16:9 Center

Enhanced services:
  Camera Mode
  Film Mode

Subtitles:
  Teletext Subtitles Present
  Open Subtitles Present

PALplus is defined as being letterbox 16:9 center, camera mode or film mode, helper signals present using modulation, and clean encoding used. Teletext subtitles may or may not be present, and open subtitles may be present only in the active picture area.

During a PALplus transmission, any active video on lines 23 and 623 is blanked prior to encoding.
In addition to WSS data, line 23 includes 48 ±1 cycles of a 300 ±9 mV subcarrier with a –U phase, starting 51 μs ±250 ns after 0H. Line 623 contains a 10 μs ±250 ns white pulse, starting 20 μs ±250 ns after 0H.

A PALplus TV has the option of deinterlacing a film mode signal and displaying it on a 50 Hz progressive-scan display or using field repeating on a 100 Hz interlaced display.

Ghost Cancellation

An optional ghost cancellation signal on line 318, defined by ITU-R BT.1124 and ETSI ETS 300 732, allows a suitably adapted TV to measure the ghost signal and cancel any ghosting during the active video. A PALplus TV may or may not support this feature.

Vertical Filtering

All PALplus sources start out as a 16:9 YCbCr anamorphic image, occupying all 576 active scan lines. Any active video on lines 23 and 623 is blanked prior to encoding (since these lines are used for WSS and reference information), resulting in 574 active lines per frame. Lines 24–310 and 336–622 are used for active video.

Before transmission, the 574 active scan lines of the 16:9 image are squeezed into 430 scan lines. To avoid aliasing problems, the vertical resolution is reduced by lowpass filtering.

For Y, vertical filtering is done using a quadrature mirror filter (QMF) highpass and lowpass pair. Using the QMF process allows the highpass and lowpass information to be resampled, transmitted, and later recombined with minimal loss.

The Y QMF lowpass output is resampled into three-quarters of the original height; little information is lost to aliasing. After clean encoding, it is the letterboxed signal that conventional 4:3 TVs display.

The Y QMF highpass output contains the rest of the original vertical frequency. It is used to generate the helper signals that are transmitted using the black scan lines not used by the letterbox picture.

Film Mode

A film mode broadcast has both fields of a frame coming from the same image, as is usually the case with a movie scanned on a telecine.
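The line counts quoted for PALplus can be cross-checked with simple integer arithmetic, applied to a full frame (574 active lines) for film mode and to a single field (287 active lines) for camera mode. The function name and the choice to round down are assumptions made for this sketch.

```python
def palplus_budget(active_lines):
    """Split active lines into a three-quarter-height letterbox plus helper lines."""
    letterbox = active_lines * 3 // 4        # rounded down (assumed)
    helper = active_lines - letterbox
    return {
        "letterbox_lines": letterbox,
        "helper_lines": helper,
        "helper_above": helper // 2,
        "helper_below": helper - helper // 2,
        "max_cph": active_lines // 2,        # two lines per vertical cycle
        "letterbox_cph": letterbox // 2,
    }

film = palplus_budget(576 - 2)    # lines 23 and 623 are blanked
camera = palplus_budget(287)      # one field of the 574-line frame
print("film:  ", film)
print("camera:", camera)
```

The results reproduce the 430 + 144 split per frame and the 215 + 72 split per field, with the helper lines divided equally above and below the letterbox image.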
In film mode, the maximum vertical resolution per frame is about 287 cycles per active picture height (cph), limited by the 574 active scan lines per frame.

The vertical resolution of Y is reduced to 215 cph so it can be transmitted using only 430 active lines. The QMF lowpass and highpass filters split the Y vertical information into DC–215 cph and 216–287 cph.

The Y lowpass information is re-scanned into 430 lines to become the letterbox image. Since the vertical frequency is limited to a maximum of 215 cph, no information is lost.

The Y highpass output is decimated so only one in four lines is transmitted. These 144 lines are used to transmit the helper signals. Because of the QMF process, no information is lost to decimation.

The 72 lines above and 72 lines below the central 430 lines of the letterbox image are used to transmit the 144 lines of the helper signal. This results in a standard 574 active line picture, but with the original image in its correct aspect ratio, centered between the helper signals. The scan lines containing the 300 mV helper signals are modulated using the U subcarrier so they look black and are not visible to the viewer.

After Fixed ColorPlus processing, the 574 scan lines are PAL encoded and transmitted as a standard interlaced PAL frame.

Camera Mode

Camera (or video) mode assumes the fields of a frame are independent of each other, as would be the case when a camera scans a scene in motion. Therefore, the image may have changed between fields. Only intra-field processing is done.

In camera mode, the maximum vertical resolution per field is about 143 cycles per active picture height (cph), limited by the 287 active scan lines per field.

The vertical resolution of Y is reduced to 107 cph so it can be transmitted using only 215 active lines. The QMF lowpass and highpass filter pair split the Y vertical information into DC–107 cph and 108–143 cph.
The Y lowpass information is re-scanned into 215 lines to become the letterbox image. Since the vertical frequency is limited to a maximum of 107 cph, no information is lost.

The Y highpass output is decimated so only one in four lines is transmitted. These 72 lines are used to transmit the helper signals. Because of the QMF process, no information is lost to decimation.

The 36 lines above and 36 lines below the central 215 lines of the letterbox image are used to transmit the 72 lines of the helper signal. This results in a 287 active line picture, but with the original image in its correct aspect ratio, centered between the helper signals. The scan lines containing the 300 mV helper signals are modulated using the U subcarrier so they look black and are not visible to the viewer.

After either Fixed or Motion Adaptive ColorPlus processing, the 287 scan lines are PAL encoded and transmitted as a PAL field.

Clean Encoding

Only the letterboxed portion of the PALplus signal is clean encoded. The helper signals are not actual PAL video. However, they are close enough to video to pass through the transmission path and remain fairly invisible on standard TVs.

ColorPlus Processing

Fixed ColorPlus

Film mode uses a Fixed ColorPlus technique, making use of the lack of motion between the two fields of the frame.

Fixed ColorPlus depends on the subcarrier phase of the composite PAL signal being of opposite phase when 312 lines apart. If these two lines have the same luminance and chrominance information, it can be separated by adding and subtracting the composite signals from each other. Adding cancels the chrominance, leaving luminance. Subtracting cancels the luminance, leaving chrominance.

In practice, Y information above 3 MHz (YHF) is intra-frame averaged since it shares the frequency spectrum with the modulated chrominance.
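The add-and-subtract separation described above can be illustrated with two synthetic composite lines that carry identical luminance and chrominance but opposite subcarrier phase. This is a simplified sketch: it samples at an assumed four times the subcarrier rate, uses arbitrary example amplitudes, and ignores the PAL V switch and any real filtering.

```python
import math

def composite_line(luma, chroma_amp, chroma_phase_deg, subcarrier_phase_deg, samples=16):
    """Luma plus a modulated subcarrier, sampled at 4x the subcarrier rate."""
    out = []
    for k in range(samples):
        angle = math.radians(subcarrier_phase_deg + chroma_phase_deg + 90 * k)
        out.append(luma + chroma_amp * math.cos(angle))
    return out

# Lines n and n+312: same picture content, subcarrier 180 degrees apart.
line_n = composite_line(0.5, 0.2, 30, subcarrier_phase_deg=0)
line_n312 = composite_line(0.5, 0.2, 30, subcarrier_phase_deg=180)

luma = [(a + b) / 2 for a, b in zip(line_n, line_n312)]     # chrominance cancels
chroma = [(a - b) / 2 for a, b in zip(line_n, line_n312)]   # luminance cancels

print([round(v, 3) for v in luma[:4]])
print([round(v, 3) for v in chroma[:4]])
```

The separation is exact only while the two lines really do carry the same content, which is why the technique relies on the lack of motion between fields.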
For line [n], YHF is calculated as follows:

0 ≤ n ≤ 214 for 430-line letterboxed image
  YHF(60 + n) = 0.5(YHF(372 + n) + YHF(60 + n))
  YHF(372 + n) = YHF(60 + n)

YHF is then added to the low-frequency Y (YLF) information. The same intra-frame averaging process is also used for Cb and Cr. The 430-line letterbox image is then PAL encoded.

Thus, Y information above 3 MHz, and CbCr information, is the same on lines [n] and [n+312]. Y information below 3 MHz may be different on lines [n] and [n+312]. The full vertical resolution of 287 cph is reconstructed by the decoder with the aid of the helper signals.

Motion Adaptive ColorPlus (MACP)

Camera mode uses either Motion Adaptive ColorPlus or Fixed ColorPlus, depending on the amount of motion between fields. This requires a motion detector in both the encoder and decoder.

To detect movement, the CbCr data on lines [n] and [n+312] are compared. If they match, no movement is assumed, and Fixed ColorPlus operation is used. If the CbCr data doesn’t match, movement is assumed, and Motion Adaptive ColorPlus operation is used.

During Motion Adaptive ColorPlus operation, the amount of YHF added to YLF is dependent on the difference between CbCr(n) and CbCr(n+312). For the maximum CbCr difference, no YHF data for lines [n] and [n+312] is transmitted.

In addition, the amount of intra-frame averaged CbCr data mixed with the direct CbCr data is dependent on the difference between CbCr(n) and CbCr(n+312). For the maximum CbCr difference, only direct CbCr data is transmitted separately for lines [n] and [n+312].

SECAM Overview

SECAM (Sequentiel Couleur Avec Mémoire, or Sequential Color with Memory) was developed in France, with broadcasting starting in 1967. It grew out of the observation that if color could be bandwidth-limited horizontally, it could also be bandwidth-limited vertically. The two pieces of color information (Db and Dr) added to the monochrome signal could be transmitted on alternate lines, avoiding the possibility of crosstalk.
The receiver requires memory to store one line so that it is concurrent with the next line, and also requires the addition of a line-switching identification technique.

Like PAL, SECAM is a 625-line, 50-field-per-second, 2:1 interlaced system. SECAM was adopted by other countries; however, many are changing to PAL due to the abundance of professional and consumer PAL equipment.

Luminance Information

The monochrome luminance (Y) signal is derived from R´G´B´:

Y = 0.299R´ + 0.587G´ + 0.114B´

As with NTSC and PAL, the luminance signal occupies the entire video bandwidth. SECAM has several variations, depending on the video bandwidth and placement of the audio subcarrier. The video signal has a bandwidth of 5.0 or 6.0 MHz, depending on the specific SECAM standard.

Color Information

SECAM transmits Db information during one line and Dr information during the next line; luminance information is transmitted each line. Db and Dr are scaled versions of B´ – Y and R´ – Y:

Dr = –1.902(R´ – Y)
Db = 1.505(B´ – Y)

Since there is an odd number of lines, any given line contains Db information on one field and Dr information on the next field. The decoder requires a 1H delay, switched synchronously with the Db and Dr switching, so that Db and Dr exist simultaneously in order to convert to YCbCr or RGB.

Color Modulation

SECAM uses frequency modulation (FM) to transmit the Db and Dr color difference information, with each component having its own subcarrier.

Db and Dr are lowpass filtered to 1.3 MHz and pre-emphasis is applied. The curve for the pre-emphasis is expressed by:

$$A = \frac{1 + j\left(\frac{f}{85}\right)}{1 + j\left(\frac{f}{255}\right)}$$

where ƒ = signal frequency in kHz.

After pre-emphasis, Db and Dr frequency modulate their respective subcarriers. The frequency of each subcarrier is defined as:

FOB = 272 FH = 4.250000 MHz (± 2 kHz)
FOR = 282 FH = 4.406250 MHz (± 2 kHz)

These frequencies represent no color information. Nominal Dr deviation is ±280 kHz and the nominal Db deviation is ±230 kHz. Figure 8.23 illustrates the frequency modulation process of the color difference signals. The choice of frequency shifts reflects the idea of keeping the frequencies representing critical colors away from the upper limit of the spectrum to minimize distortion.

After modulation of Db and Dr, subcarrier pre-emphasis is applied, changing the amplitude of the subcarrier as a function of the frequency deviation. The intention is to reduce the visibility of the subcarriers in areas of low luminance and to improve the signal-to-noise ratio of highly saturated colors. This pre-emphasis is given as:

$$G = M\,\frac{1 + j16F}{1 + j1.26F}$$

where F = (ƒ/4286) – (4286/ƒ), ƒ = instantaneous subcarrier frequency in kHz, and 2M = 23 ± 2.5% of luminance amplitude.

As shown in Figure 8.24, Db and Dr information is transmitted on alternate scan lines. The initial phase of the color subcarrier is also modified as shown in Table 8.8 to further reduce subcarrier visibility. Note that subcarrier phase information in the SECAM system carries no picture information.

Composite Video Generation

The subcarrier data is added to the luminance along with appropriate horizontal and vertical sync signals, blanking signals, and burst signals to generate composite video.

As with PAL, SECAM requires some means of identifying the line-switching sequence. Modern practice has been to use an FOR/FOB burst after most horizontal syncs to derive the switching synchronization information, as shown in Figure 8.25.

SECAM Standards

Figure 8.26 shows the common designations for SECAM systems. The letters refer to the monochrome standard for line and field rates, video bandwidth (5.0 or 6.0 MHz), audio carrier relative frequency, and RF channel bandwidth.
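As a rough numerical check of the SECAM color equations given under Color Information and Color Modulation, the following Python sketch derives Db and Dr from R´G´B´, computes the two rest subcarrier frequencies from FH = 15.625 kHz, and evaluates the magnitude of both pre-emphasis curves. The function names are illustrative only, and the subcarrier pre-emphasis is returned normalized to M, since the text specifies only 2M relative to the luminance amplitude.

```python
FH_HZ = 15_625  # line frequency of the 625-line, 50-field systems

# Rest (undeviated) subcarrier frequencies, defined as multiples of FH.
FOB_HZ = 272 * FH_HZ  # subcarrier used on Db lines
FOR_HZ = 282 * FH_HZ  # subcarrier used on Dr lines


def color_difference(r, g, b):
    """Return (Y, Db, Dr) for normalized gamma-corrected R'G'B' (0..1)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    db = 1.505 * (b - y)
    dr = -1.902 * (r - y)
    return y, db, dr


def lf_preemphasis_gain(f_khz):
    """Magnitude of A = (1 + j f/85) / (1 + j f/255), f in kHz."""
    return abs(complex(1.0, f_khz / 85.0) / complex(1.0, f_khz / 255.0))


def subcarrier_preemphasis_gain(f_khz):
    """Magnitude of G / M = (1 + j16F) / (1 + j1.26F), f in kHz (f > 0)."""
    big_f = f_khz / 4286.0 - 4286.0 / f_khz
    return abs(complex(1.0, 16.0 * big_f) / complex(1.0, 1.26 * big_f))


if __name__ == "__main__":
    print(FOB_HZ, FOR_HZ)
    print(color_difference(0.75, 0.0, 0.0))      # 75% amplitude red
    print(lf_preemphasis_gain(85.0))
    print(subcarrier_preemphasis_gain(FOR_HZ / 1000.0))
```

The subcarrier pre-emphasis has unity normalized gain at 4286 kHz, where F = 0, and rises on either side of that frequency. With the scale factors above, a 75% amplitude red works out to Dr of about –1 and a 75% amplitude blue to Db of about +1.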
The “SECAM” refers to the technique used to add color information to the monochrome signal. Detailed timing parameters may be found in Table 8.9.

Figure 8.23. SECAM FM Color Modulation.

Table 8.8. SECAM Subcarrier Timing.
The initial phase of the subcarrier undergoes in each line a variation defined by:
Frame to frame: 0°, 180°, 0°, 180° ...
Line to line: 0°, 0°, 180°, 0°, 0°, 180° ... or 0°, 0°, 0°, 180°, 180°, 180° ...

Figure 8.24. Four-Field SECAM Sequence. See Figure 8.5 for equalization and serration pulse details.

Figure 8.25. SECAM Chroma Synchronization Signals.

Luminance Equation Derivation

The equation for generating luminance from RGB information is determined by the chromaticities of the three primary colors used by the receiver and what color white actually is.
The chromaticities of the RGB primaries and reference white (CIE illuminant D65) are:

R: xr = 0.64 yr = 0.33 zr = 0.03
G: xg = 0.29 yg = 0.60 zg = 0.11
B: xb = 0.15 yb = 0.06 zb = 0.79
white: xw = 0.3127 yw = 0.3290 zw = 0.3583

where x and y are the specified CIE 1931 chromaticity coordinates; z is calculated by knowing that x + y + z = 1. Once again, substituting the known values gives us the solution for Kr, Kg, and Kb:

$$\begin{bmatrix} K_r \\ K_g \\ K_b \end{bmatrix} = \begin{bmatrix} 0.64 & 0.29 & 0.15 \\ 0.33 & 0.60 & 0.06 \\ 0.03 & 0.11 & 0.79 \end{bmatrix}^{-1} \begin{bmatrix} 0.3127/0.3290 \\ 1 \\ 0.3583/0.3290 \end{bmatrix} = \begin{bmatrix} 0.674 \\ 1.177 \\ 1.190 \end{bmatrix}$$

Y is defined to be

Y = (Kryr)R´ + (Kgyg)G´ + (Kbyb)B´
  = (0.674)(0.33)R´ + (1.177)(0.60)G´ + (1.190)(0.06)B´

or

Y = 0.222R´ + 0.706G´ + 0.071B´

However, the standard Y = 0.299R´ + 0.587G´ + 0.114B´ equation is still used. Adjustments are made in the receiver to minimize color errors.

Figure 8.26. Common SECAM Systems.
All systems: frequency modulated subcarriers, FOR = 282 FH, FOB = 272 FH; line sequential Dr and Db signals.
“D, K, K1, L”: line / field = 625 / 50; FH = 15.625 kHz; FV = 50 Hz; blanking setup = 0 IRE; video bandwidth = 6.0 MHz; audio carrier = 6.5 MHz; channel bandwidth = 8 MHz.
“B, G”: line / field = 625 / 50; FH = 15.625 kHz; FV = 50 Hz; blanking setup = 0 IRE; video bandwidth = 5.0 MHz; audio carrier = 5.5 MHz; channel bandwidth: B = 7 MHz, G = 8 MHz.

Table 8.9a. Basic Characteristics of Color Video Signals.
Systems: M; N; B, G; H; I; D, K; K1; L
Scan lines per frame: 525 (M); 625 (all others)
Field frequency (fields / second): 59.94 (M); 50 (all others)
Line frequency (Hz): 15,734 (M); 15,625 (all others)
Peak white level (IRE): 100
Sync tip level (IRE): –40 (M); –40 (–43) (N); –43 (all others)
Setup (IRE): 7.5 ± 2.5 (M); 7.5 ± 2.5 (0) (N); 0 (all others)
Peak video level (IRE): 120 133 133 115 115 125
Gamma of receiver: 2.2 (M); 2.8 (all others)
Video bandwidth (MHz): 4.2 (M); 5.0 (4.2) (N); 5.0 (B, G); 5.0 (H); 5.5 (I); 6.0 (D, K); 6.0 (K1); 6.0 (L)
Luminance signal: Y = 0.299R´ + 0.587G´ + 0.114B´ (RGB are gamma-corrected)
Note: Values in parentheses apply to (NC) PAL used in Argentina.
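The luminance equation derivation can be checked numerically. The Python sketch below solves the 3 × 3 system for Kr, Kg, and Kb with Cramer's rule and then forms the luminance coefficients Kr·yr, Kg·yg, and Kb·yb. The helper names are hypothetical, and a plain-Python solver is used only to keep the example dependency-free; the results should land close to the rounded values quoted in the derivation.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, v):
    """Solve m * k = v for k using Cramer's rule."""
    d = det3(m)
    result = []
    for col in range(3):
        # Replace one column of m with v, then take the determinant ratio.
        mc = [[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        result.append(det3(mc) / d)
    return result


def luminance_coefficients(primaries, white):
    """primaries: [(xr, yr), (xg, yg), (xb, yb)]; white: (xw, yw)."""
    xyz = [(x, y, 1.0 - x - y) for x, y in primaries]       # z = 1 - x - y
    matrix = [[p[row] for p in xyz] for row in range(3)]     # rows are x, y, z
    xw, yw = white
    zw = 1.0 - xw - yw
    k = solve3(matrix, [xw / yw, 1.0, zw / yw])
    y_coeffs = [k[i] * xyz[i][1] for i in range(3)]          # K * y per primary
    return k, y_coeffs


K, Y_COEFFS = luminance_coefficients(
    [(0.64, 0.33), (0.29, 0.60), (0.15, 0.06)],  # R, G, B primaries from the text
    (0.3127, 0.3290),                            # D65 reference white
)

if __name__ == "__main__":
    print([round(v, 3) for v in K])
    print([round(v, 3) for v in Y_COEFFS])
```

Because the middle row of the system forces Kr·yr + Kg·yg + Kb·yb = 1, the three luminance coefficients always sum to one, which is a convenient sanity check when different primaries are substituted.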
Table 8.9b. Details of Line Synchronization Signals.

Characteristics | M | N | B, D, G, H, I, K, K1, L, NC
Nominal line period (μs) | 63.5555 | 64 | 64
Line blanking interval (μs) | 10.7 ± 0.1 | 10.88 ± 0.64 | 11.85 ± 0.15
0H to start of active video (μs) | 9.2 ± 0.1 | 9.6 ± 0.64 | 10.5
Front porch (μs) | 1.5 ± 0.1 | 1.92 ± 0.64 | 1.65 ± 0.15
Line synchronizing pulse (μs) | 4.7 ± 0.1 | 4.99 ± 0.77 | 4.7 ± 0.2
Rise and fall time of line blanking (10%, 90%) (ns) | 140 ± 20 | 300 ± 100 | 300 ± 100
Rise and fall time of line synchronizing pulses (10%, 90%) (ns) | 140 ± 20 | ≤ 250 | 250 ± 50

Notes:
1. 0H is at 50% point of falling edge of horizontal sync.
2. In case of different standards having different specifications and tolerances, the tightest specification and tolerance are listed.
3. Timing is measured between half-amplitude points on appropriate signal edges.

Table 8.9c. Details of Field Synchronization Signals.

Characteristics | M | N | B, D, G, H, I, K, K1, L, NC
Field period (ms) | 16.6833 | 20 | 20
Field blanking interval | 20 lines | 19–25 lines | 25 lines
Rise and fall time of field blanking (10%, 90%) (ns) | 140 ± 20 | ≤ 250 | 300 ± 100
Duration of equalizing and synchronizing sequences | 3H | 3H | 2.5H
Equalizing pulse width (μs) | 2.3 ± 0.1 | 2.43 ± 0.13 | 2.35 ± 0.1
Serration pulse width (μs) | 4.7 ± 0.1 | 4.7 ± 0.8 | 4.7 ± 0.1
Rise and fall time of synchronizing and equalizing pulses (10%, 90%) (ns) | 140 ± 20 | < 250 | 250 ± 50

Notes:
1. In case of different standards having different specifications and tolerances, the tightest specification and tolerance are listed.
2. Timing is measured between half-amplitude points on appropriate signal edges.
Table 8.9d. Basic Characteristics of Color Video Signals.

Attenuation of color difference signals:
M / NTSC: U, V, I, Q: < 2 dB at 1.3 MHz, > 20 dB at 3.6 MHz; or Q: < 2 dB at 0.4 MHz, < 6 dB at 0.5 MHz, > 6 dB at 0.6 MHz
M / PAL: < 2 dB at 1.3 MHz, > 20 dB at 3.6 MHz
B, D, G, H, I, N / PAL: < 3 dB at 1.3 MHz, > 20 dB at 4 MHz (> 20 dB at 3.6 MHz)
B, D, G, K, K1, L / SECAM: < 3 dB at 1.3 MHz, > 30 dB at 3.5 MHz (before low-frequency pre-correction)

Characteristics | M / NTSC | M / PAL | B, D, G, H, I, N / PAL
Start of burst after 0H (μs) | 5.3 ± 0.07 | 5.8 ± 0.1 | 5.6 ± 0.1
Burst duration (cycles) | 9 ± 1 | 9 ± 1 | 10 ± 1 (9 ± 1)
Burst peak amplitude | 40 ± 1 IRE | 42.86 ± 4 IRE | 42.86 ± 4 IRE

Note: Values in parentheses apply to (NC) PAL used in Argentina.

Video Test Signals

Many industry-standard video test signals have been defined to help test the relative quality of encoding, decoding, and the transmission path, and to perform calibration. Note that some video test signals cannot properly be generated by providing RGB data to an encoder; in this case, YCbCr data may be used.

If the video standard uses a 7.5-IRE setup, typically only test signals used for visual examination use the 7.5-IRE setup. Test signals designed for measurement purposes typically use a 0-IRE setup, providing the advantage of defining a known blanking level.

Color Bars Overview

Color bars are one of the standard video test signals, and there are several variations, depending on the video standard and application. For this reason, this section reviews the most common color bar formats. Color bars have two major characteristics: amplitude and saturation.

The amplitude of a color bar signal is determined by:

$$\text{amplitude (\%)} = \frac{\max(R, G, B)_a}{\max(R, G, B)_b} \times 100$$

where max(R,G,B)a is the maximum value of R´G´B´ during colored bars and max(R,G,B)b is the maximum value of R´G´B´ during reference white.
The saturation of a color bar signal is less than 100% if the minimum value of any one of the R´G´B´ components is not zero. The saturation is determined by:

$$\text{saturation (\%)} = \left[1 - \left(\frac{\min(R, G, B)}{\max(R, G, B)}\right)^{\gamma}\right] \times 100$$

where min(R,G,B) and max(R,G,B) are the minimum and maximum values, respectively, of R´G´B´ during color bars, and γ is the gamma exponent, typically [1/0.45].

NTSC Color Bars

In 1953, it was normal practice for the analog R´G´B´ signals to have a 7.5 IRE setup, and the original NTSC equations assumed this form of input to an encoder. Today, digital R´G´B´ or YCbCr signals typically do not include the 7.5 IRE setup, and the 7.5 IRE setup is added within the encoder.

The different color bar signals are described by four amplitudes, expressed in percent, separated by oblique strokes. 100% saturation is implied, so saturation is not specified. The first and second numbers are the white and black amplitudes, respectively. The third and fourth numbers are the white and black amplitudes from which the color bars are derived.

For example, 100/7.5/75/7.5 color bars would be 75% color bars with 7.5% setup in which the white bar has been set to 100% and the black bar to 7.5%. Since NTSC systems usually have the 7.5% setup, the two common color bars are 75/7.5/75/7.5 and 100/7.5/100/7.5, which are usually shortened to 75% and 100%, respectively. The 75% bars are most commonly used. Television transmitters do not pass information with an amplitude greater than about 120 IRE. Therefore, the 75% color bars are used for transmission testing. The 100% color bars may be used for testing in situations where a direct connection between equipment is possible. The 75/7.5/75/7.5 color bars are a part of the Electronic Industries Association EIA-189-A Encoded Color Bar Standard.

Figure 8.27 shows a typical vectorscope display for full-screen 75% NTSC color bars.
Figure 8.28 illustrates the video waveform for 75% color bars.

Tables 8.10 and 8.11 list the luminance and chrominance levels for the two common color bar formats for NTSC.

For reference, the RGB and YCbCr values to generate the standard NTSC color bars are shown in Tables 8.12 and 8.13. RGB is assumed to have a range of 0–255; YCbCr is assumed to have a range of 16–235 for Y and 16–240 for Cb and Cr. It is assumed any 7.5 IRE setup is implemented within the encoder.

PAL Color Bars

Unlike NTSC, PAL does not support a 7.5 IRE setup; the black and blank levels are the same. The different color bar signals are usually described by four amplitudes, expressed in percent, separated by oblique strokes. The first and second numbers are the maximum and minimum percentages, respectively, of R´G´B´ values for an uncolored bar. The third and fourth numbers are the maximum and minimum percentages, respectively, of R´G´B´ values for a colored bar.

Since PAL systems have a 0% setup, the two common color bars are 100/0/75/0 and 100/0/100/0, which are usually shortened to 75% and 100%, respectively. The 75% color bars are used for transmission testing. The 100% color bars may be used for testing in situations where a direct connection between equipment is possible.

The 100/0/75/0 color bars also are referred to as EBU (European Broadcast Union) color bars. All of the color bars discussed in this section are also a part of Specification of Television Standards for 625-line System-I Transmissions (1971) published by the Independent Television Authority (ITA) and the British Broadcasting Corporation (BBC), and ITU-R BT.471.

Figure 8.27. Typical Vectorscope Display for 75% NTSC Color Bars.
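The amplitude and saturation definitions from the color bars overview can be evaluated directly. The Python sketch below is a minimal illustration with hypothetical function names; the sample inputs (191 for a 75% bar, and 44 as the minimum R´G´B´ value of the 98% bars) are the 8-bit values that appear in the RGB tables of this section.

```python
def amplitude_percent(max_colored, max_white):
    """Color bar amplitude: max R'G'B' during colored bars vs. reference white."""
    return max_colored / max_white * 100.0


def saturation_percent(min_rgb, max_rgb, gamma=1 / 0.45):
    """Color bar saturation from the min and max R'G'B' values during color bars."""
    return (1.0 - (min_rgb / max_rgb) ** gamma) * 100.0


if __name__ == "__main__":
    print(amplitude_percent(191, 255))    # 75% bars on a 0-255 scale
    print(saturation_percent(0, 191))     # minimum component is zero
    print(saturation_percent(44, 255))    # minimum component raised to 44
```

A zero minimum component always gives 100% saturation regardless of gamma, which is why the common 75% and 100% bars do not state saturation explicitly.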
Figure 8.28. IRE Values for 75% NTSC Color Bars.

Table 8.10. 75/7.5/75/7.5 (75%) NTSC Color Bars.

 | white | yellow | cyan | green | magenta | red | blue | black
Luminance (IRE) | 76.9 | 69.0 | 56.1 | 48.2 | 36.2 | 28.2 | 15.4 | 7.5
Chrominance Level (IRE) | 0 | 62.1 | 87.7 | 81.9 | 81.9 | 87.7 | 62.1 | 0
Minimum Chrominance Excursion (IRE) | – | 37.9 | 12.3 | 7.3 | –4.8 | –15.6 | –15.6 | –
Maximum Chrominance Excursion (IRE) | – | 100.0 | 100.0 | 89.2 | 77.1 | 72.1 | 46.4 | –
Chrominance Phase (degrees) | – | 167.1 | 283.5 | 240.7 | 60.7 | 103.5 | 347.1 | –

Table 8.11. 100/7.5/100/7.5 (100%) NTSC Color Bars.

 | white | yellow | cyan | green | magenta | red | blue | black
Luminance (IRE) | 100.0 | 89.5 | 72.3 | 61.8 | 45.7 | 35.2 | 18.0 | 7.5
Chrominance Level (IRE) | 0 | 82.8 | 117.0 | 109.2 | 109.2 | 117.0 | 82.8 | 0
Minimum Chrominance Excursion (IRE) | – | 48.1 | 13.9 | 7.2 | –8.9 | –23.3 | –23.3 | –
Maximum Chrominance Excursion (IRE) | – | 130.8 | 130.8 | 116.4 | 100.3 | 93.6 | 59.4 | –
Chrominance Phase (degrees) | – | 167.1 | 283.5 | 240.7 | 60.7 | 103.5 | 347.1 | –

Table 8.12. RGB and YCbCr Values for 75% NTSC Color Bars.

 | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black
Gamma-corrected RGB (gamma = 1/0.45):
R´ | 191 | 191 | 0 | 0 | 191 | 191 | 0 | 0
G´ | 191 | 191 | 191 | 191 | 0 | 0 | 0 | 0
B´ | 191 | 0 | 191 | 0 | 191 | 0 | 191 | 0
Linear RGB:
R | 135 | 135 | 0 | 0 | 135 | 135 | 0 | 0
G | 135 | 135 | 135 | 135 | 0 | 0 | 0 | 0
B | 135 | 0 | 135 | 0 | 135 | 0 | 135 | 0
YCbCr:
Y | 180 | 162 | 131 | 112 | 84 | 65 | 35 | 16
Cb | 128 | 44 | 156 | 72 | 184 | 100 | 212 | 128
Cr | 128 | 142 | 44 | 58 | 198 | 212 | 114 | 128
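The YCbCr rows of the color bar tables can be reproduced from the luminance equation. The Python sketch below assumes the usual 8-bit scaling for the stated ranges (Y = 16 + 219·Y´, with Cb and Cr centered on 128 and spanning ±112); that scaling and the helper names are assumptions made for illustration, not something spelled out in this section.

```python
BAR_ORDER = ("white", "yellow", "cyan", "green", "magenta", "red", "blue", "black")

# Which of R', G', B' are switched on for each bar.
BAR_RGB = {
    "white": (1, 1, 1), "yellow": (1, 1, 0), "cyan": (0, 1, 1), "green": (0, 1, 0),
    "magenta": (1, 0, 1), "red": (1, 0, 0), "blue": (0, 0, 1), "black": (0, 0, 0),
}


def to_ycbcr(r, g, b):
    """Convert normalized (0..1) gamma-corrected R'G'B' to 8-bit (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / (1.0 - 0.114)   # normalized to the range -1..+1
    cr = (r - y) / (1.0 - 0.299)   # normalized to the range -1..+1
    return round(16 + 219 * y), round(128 + 112 * cb), round(128 + 112 * cr)


def color_bars(amplitude, white_amplitude=None):
    """Return [(Y, Cb, Cr), ...] in BAR_ORDER for the given bar amplitude (0..1)."""
    if white_amplitude is None:
        white_amplitude = amplitude
    bars = []
    for name in BAR_ORDER:
        level = white_amplitude if name == "white" else amplitude
        r, g, b = (level * c for c in BAR_RGB[name])
        bars.append(to_ycbcr(r, g, b))
    return bars


if __name__ == "__main__":
    for name, values in zip(BAR_ORDER, color_bars(0.75)):
        print(f"{name:8s} {values}")
```

Calling `color_bars(0.75)` corresponds to 75% bars with a 75% white bar, `color_bars(1.0)` to 100% bars, and `color_bars(0.75, white_amplitude=1.0)` to 75% bars with a 100% white bar.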
Table 8.13. RGB and YCbCr Values for 100% NTSC Color Bars.

 | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black
Gamma-corrected RGB (gamma = 1/0.45):
R´ | 255 | 255 | 0 | 0 | 255 | 255 | 0 | 0
G´ | 255 | 255 | 255 | 255 | 0 | 0 | 0 | 0
B´ | 255 | 0 | 255 | 0 | 255 | 0 | 255 | 0
Linear RGB:
R | 255 | 255 | 0 | 0 | 255 | 255 | 0 | 0
G | 255 | 255 | 255 | 255 | 0 | 0 | 0 | 0
B | 255 | 0 | 255 | 0 | 255 | 0 | 255 | 0
YCbCr:
Y | 235 | 210 | 170 | 145 | 106 | 81 | 41 | 16
Cb | 128 | 16 | 166 | 54 | 202 | 90 | 240 | 128
Cr | 128 | 146 | 16 | 34 | 222 | 240 | 110 | 128

Figure 8.29 illustrates the video waveform for 75% color bars. Figure 8.30 shows a typical vectorscope display for full-screen 75% PAL color bars.

Tables 8.14, 8.15, and 8.16 list the luminance and chrominance levels for the three common color bar formats for PAL.

For reference, the RGB and YCbCr values to generate the standard PAL color bars are shown in Tables 8.17, 8.18, and 8.19. RGB is assumed to have a range of 0–255; YCbCr is assumed to have a range of 16–235 for Y and 16–240 for Cb and Cr.

EIA Color Bars (NTSC)

The EIA color bars (Figure 8.28 and Table 8.10) are a part of the EIA-189-A standard. The seven bars (gray, yellow, cyan, green, magenta, red, and blue) are at 75% amplitude, 100% saturation. The duration of each color bar is 1/7 of the active portion of the scan line. Note that the black bar in Figure 8.28 and Table 8.10 is not part of the standard and is shown for reference only. The color bar test signal allows checking for hue and color saturation accuracy.

Table 8.14. 100/0/75/0 (75%) PAL Color Bars.

 | white | yellow | cyan | green | magenta | red | blue | black
Luminance (volts) | 0.700 | 0.465 | 0.368 | 0.308 | 0.217 | 0.157 | 0.060 | 0
Peak-to-peak chrominance:
U axis (volts) | 0 | 0.459 | 0.155 | 0.304 | 0.304 | 0.155 | 0.459 | 0
V axis (volts) | – | 0.105 | 0.646 | 0.541 | 0.541 | 0.646 | 0.105 | 0
Total (volts) | – | 0.470 | 0.664 | 0.620 | 0.620 | 0.664 | 0.470 | 0
Chrominance phase (degrees):
Line n (135° burst) | – | 167 | 283.5 | 240.5 | 60.5 | 103.5 | 347 | –
Line n + 1 (225° burst) | – | 193 | 76.5 | 119.5 | 299.5 | 256.5 | 13.0 | –
Table 8.15. 100/0/100/0 (100%) PAL Color Bars.

 | white | yellow | cyan | green | magenta | red | blue | black
Luminance (volts) | 0.700 | 0.620 | 0.491 | 0.411 | 0.289 | 0.209 | 0.080 | 0
Peak-to-peak chrominance:
U axis (volts) | 0 | 0.612 | 0.206 | 0.405 | 0.405 | 0.206 | 0.612 | 0
V axis (volts) | – | 0.140 | 0.861 | 0.721 | 0.721 | 0.861 | 0.140 | 0
Total (volts) | – | 0.627 | 0.885 | 0.827 | 0.827 | 0.885 | 0.627 | 0
Chrominance phase (degrees):
Line n (135° burst) | – | 167 | 283.5 | 240.5 | 60.5 | 103.5 | 347 | –
Line n + 1 (225° burst) | – | 193 | 76.5 | 119.5 | 299.5 | 256.5 | 13.0 | –

Table 8.16. 100/0/100/25 (98%) PAL Color Bars.

 | white | yellow | cyan | green | magenta | red | blue | black
Luminance (volts) | 0.700 | 0.640 | 0.543 | 0.483 | 0.392 | 0.332 | 0.235 | 0
Peak-to-peak chrominance:
U axis (volts) | 0 | 0.459 | 0.155 | 0.304 | 0.304 | 0.155 | 0.459 | 0
V axis (volts) | – | 0.105 | 0.646 | 0.541 | 0.541 | 0.646 | 0.105 | 0
Total (volts) | – | 0.470 | 0.664 | 0.620 | 0.620 | 0.664 | 0.470 | 0
Chrominance phase (degrees):
Line n (135° burst) | – | 167 | 283.5 | 240.5 | 60.5 | 103.5 | 347 | –
Line n + 1 (225° burst) | – | 193 | 76.5 | 119.5 | 299.5 | 256.5 | 13.0 | –

Figure 8.29. IRE Values for 75% PAL Color Bars.

Figure 8.30. Typical Vectorscope Display for 75% PAL Color Bars.

Table 8.17. RGB and YCbCr Values for 75% PAL Color Bars.

 | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black
Gamma-corrected RGB (gamma = 1/0.45):
R´ | 255 | 191 | 0 | 0 | 191 | 191 | 0 | 0
G´ | 255 | 191 | 191 | 191 | 0 | 0 | 0 | 0
B´ | 255 | 0 | 191 | 0 | 191 | 0 | 191 | 0
Linear RGB:
R | 255 | 135 | 0 | 0 | 135 | 135 | 0 | 0
G | 255 | 135 | 135 | 135 | 0 | 0 | 0 | 0
B | 255 | 0 | 135 | 0 | 135 | 0 | 135 | 0
YCbCr:
Y | 235 | 162 | 131 | 112 | 84 | 65 | 35 | 16
Cb | 128 | 44 | 156 | 72 | 184 | 100 | 212 | 128
Cr | 128 | 142 | 44 | 58 | 198 | 212 | 114 | 128
Table 8.18. RGB and YCbCr Values for 100% PAL Color Bars.

 | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black
Gamma-corrected RGB (gamma = 1/0.45):
R´ | 255 | 255 | 0 | 0 | 255 | 255 | 0 | 0
G´ | 255 | 255 | 255 | 255 | 0 | 0 | 0 | 0
B´ | 255 | 0 | 255 | 0 | 255 | 0 | 255 | 0
Linear RGB:
R | 255 | 255 | 0 | 0 | 255 | 255 | 0 | 0
G | 255 | 255 | 255 | 255 | 0 | 0 | 0 | 0
B | 255 | 0 | 255 | 0 | 255 | 0 | 255 | 0
YCbCr:
Y | 235 | 210 | 170 | 145 | 106 | 81 | 41 | 16
Cb | 128 | 16 | 166 | 54 | 202 | 90 | 240 | 128
Cr | 128 | 146 | 16 | 34 | 222 | 240 | 110 | 128

Table 8.19. RGB and YCbCr Values for 98% PAL Color Bars.

 | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black
Gamma-corrected RGB (gamma = 1/0.45):
R´ | 255 | 255 | 44 | 44 | 255 | 255 | 44 | 44
G´ | 255 | 255 | 255 | 255 | 44 | 44 | 44 | 44
B´ | 255 | 44 | 255 | 44 | 255 | 44 | 255 | 44
Linear RGB:
R | 255 | 255 | 5 | 5 | 255 | 255 | 5 | 5
G | 255 | 255 | 255 | 255 | 5 | 5 | 5 | 5
B | 255 | 5 | 255 | 5 | 255 | 5 | 255 | 5
YCbCr:
Y | 235 | 216 | 186 | 167 | 139 | 120 | 90 | 16
Cb | 128 | 44 | 156 | 72 | 184 | 100 | 212 | 128
Cr | 128 | 142 | 44 | 58 | 198 | 212 | 114 | 128

EBU Color Bars (PAL)

The EBU color bars are similar to the EIA color bars, except a 100 IRE white level is used (see Figure 8.29 and Table 8.14). The six colored bars (yellow, cyan, green, magenta, red, and blue) are at 75% amplitude, 100% saturation, while the white bar is at 100% amplitude. The duration of each color bar is 1/7 of the active portion of the scan line. Note that the black bar in Figure 8.29 and Table 8.14 is not part of the standard and is shown for reference only. The color bar test signal allows checking for hue and color saturation accuracy.

SMPTE Bars (NTSC)

This split-field test signal is composed of the EIA color bars for the first 2/3 of the field, the reverse blue bars for the next 1/12 of the field, and the PLUGE test signal for the remainder of the field.

Reverse Blue Bars

The reverse blue bars are composed of the blue, magenta, and cyan color bars from the EIA/EBU color bars, but are arranged in a different order: blue, black, magenta, black, cyan, black, and white. The duration of each color bar is 1/7 of the active portion of the scan line.
Typically, reverse blue bars are used with the EIA or EBU color bar signal in a split-field arrangement, with the EIA/EBU color bars comprising the first 3/4 of the field and the reverse blue bars comprising the remainder of the field. This split-field arrangement eases adjustment of chrominance and hue on a color monitor.

PLUGE

PLUGE (Picture Line-Up Generating Equipment) is a visual black reference, with one area blacker-than-black, one area at black, and one area lighter-than-black. The brightness of the monitor is adjusted so that the black and blacker-than-black areas are indistinguishable from each other and the lighter-than-black area is slightly lighter (the contrast should be at the normal setting). Additional test signals, such as a white pulse and modulated IQ signals, are usually added to facilitate testing and monitor alignment.

The NTSC PLUGE test signal (shown in Figure 8.31) is composed of a 7.5 IRE (black level) pedestal with a 40 IRE “–I” phase modulation, a 100 IRE white pulse, a 40 IRE “+Q” phase modulation, and 3.5 IRE, 7.5 IRE, and 11.5 IRE pedestals. Typically, PLUGE is used as part of the SMPTE bars.

Figure 8.31. PLUGE Test Signal for NTSC. IRE values are indicated.

For PAL, each country has its own slightly different PLUGE configuration, with most differences being in the black pedestal level used, and work is being done on a standard test signal. Figure 8.32 illustrates a typical PAL PLUGE test signal. Usually used as a full-screen test signal, it is composed of a 0 IRE pedestal with PLUGE (–2 IRE, 0 IRE, and 2 IRE pedestals) and a white pulse.
The white pulse may have five levels of brightness (0, 25, 50, 75, and 100 IRE), depending on the scan line number, as shown in Figure 8.32. The PLUGE is displayed on scan lines that have non-zero IRE white pulses. ITU-R BT.1221 discusses considerations for various PAL systems.

Figure 8.32. PLUGE Test Signal for PAL. IRE values are indicated.
White level (IRE) and the full-display line numbers on which it starts: 100 IRE at lines 63 / 375; 75 IRE at lines 115 / 427; 50 IRE at lines 167 / 479; 25 IRE at lines 219 / 531; 0 IRE at lines 271 / 583.

Y Bars

The Y bars consist of the luminance-only levels of the EIA/EBU color bars; however, the black level (7.5 IRE for NTSC and 0 IRE for PAL) is included and the color burst is still present. The duration of each luminance bar is therefore 1/8 of the active portion of the scan line. Y bars are useful for color monitor adjustment and measuring luminance nonlinearity.

Typically, the Y bars signal is used with the EIA or EBU color bar signal in a split-field arrangement, with the EIA/EBU color bars comprising the first 3/4 of the field and the Y bars signal comprising the remainder of the field.

Red Field

The red field signal consists of a 75% amplitude, 100% saturation red chrominance signal. This is useful as the human eye is sensitive to static noise intermixed in a red field. Distortions that cause small errors in picture quality can be examined visually for the effect on the picture.

Typically, the red field signal is used with the EIA/EBU color bars signal in a split-field arrangement, with the EIA/EBU color bars comprising the first 3/4 of the field, and the red field signal comprising the remainder of the field.

10-Step Staircase

This test signal is composed of ten unmodulated luminance steps of 10 IRE each, ranging from 0 IRE to 100 IRE, shown in Figure 8.33. This test signal may be used to measure luminance nonlinearity.
Modulated Ramp

The modulated ramp test signal, shown in Figure 8.34, is composed of a luminance ramp from 0 IRE to either 80 or 100 IRE, superimposed with modulated chrominance that has a phase of 0° ±1° relative to the burst. The 80 IRE ramp provides testing of the normal operating range of the system; a 100 IRE ramp may optionally be used to test the entire operating range. The peak-to-peak modulated chrominance is 40 ±0.5 IRE for (M) NTSC and 42.86 ±0.5 IRE for (B, D, G, H, I) PAL.

Note a 0 IRE setup is used. The rise and fall times at the start and end of the modulated ramp envelope are 400 ±25 ns (NTSC systems) or approximately 1 μs (PAL systems).

This test signal may be used to measure differential gain. The modulated ramp signal is preferred over a 5-step or 10-step modulated staircase signal when testing digital systems.

Figure 8.33. Ten-Step Staircase Test Signal for NTSC and PAL.

Modulated Staircase

The 5-step modulated staircase signal (a 10-step version is also used), shown in Figure 8.35, consists of 5 luminance steps, superimposed with modulated chrominance that has a phase of 0° ±1° relative to the burst. The peak-to-peak modulated chrominance amplitude is 40 ±0.5 IRE for (M) NTSC and 42.86 ±0.5 IRE for (B, D, G, H, I) PAL.

Note that a 0 IRE setup is used. The rise and fall times of each modulation packet envelope are 400 ±25 ns (NTSC systems) or approximately 1 μs (PAL systems).

The luminance IRE levels for the 5-step modulated staircase signal are shown in Figure 8.35.

This test signal may be used to measure differential gain. The modulated ramp signal is preferred over a 5-step or 10-step modulated staircase signal when testing digital systems.
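To make the modulated staircase description concrete, the Python sketch below computes the nominal excursions of each chrominance packet and a simple differential gain figure from measured packet amplitudes. The luminance levels are the ones indicated in Figure 8.35 (0, 18, 36, 54, 72, and 90 IRE). The differential gain formula used here, the largest deviation from the blanking-level packet expressed as a percentage of that packet, is one common convention and is an assumption; this section does not define it, and the function names are illustrative.

```python
STEP_LEVELS_IRE = (0, 18, 36, 54, 72, 90)  # luminance levels indicated in Figure 8.35


def packet_excursions(levels_ire, chroma_pp_ire):
    """Return (min, max) IRE excursion of the modulated packet on each step."""
    half = chroma_pp_ire / 2.0
    return [(level - half, level + half) for level in levels_ire]


def differential_gain_percent(measured_pp_ire):
    """Largest packet amplitude deviation relative to the first (blanking-level) packet.

    Assumed convention; other definitions reference the largest packet instead.
    """
    reference = measured_pp_ire[0]
    worst = max(abs(amplitude - reference) for amplitude in measured_pp_ire)
    return worst / reference * 100.0


if __name__ == "__main__":
    # Nominal (M) NTSC staircase: 40 IRE peak-to-peak chrominance on every step.
    for level, (low, high) in zip(STEP_LEVELS_IRE, packet_excursions(STEP_LEVELS_IRE, 40.0)):
        print(f"step {level:2d} IRE: {low:6.1f} to {high:6.1f} IRE")
    # Hypothetical measured amplitudes after passing through a system under test.
    print(differential_gain_percent([40.0, 39.0, 38.0, 40.0, 41.0, 40.0]))
```

An ideal system returns every packet at its original amplitude, giving 0% differential gain; any dependence of chrominance amplitude on luminance level shows up as a nonzero result.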
Modulated Pedestal

The modulated pedestal test signal (also called a three-level chrominance bar), shown in Figure 8.36, is composed of a 50 IRE luminance pedestal, superimposed with three amplitudes of modulated chrominance that has a phase relative to the burst of –90° ±1°. The peak-to-peak amplitudes of the modulated chrominance are 20 ±0.5, 40 ±0.5, and 80 ±0.5 IRE for (M) NTSC and 20 ±0.5, 60 ±0.5, and 100 ±0.5 IRE for (B, D, G, H, I) PAL.

Note a 0 IRE setup is used. The rise and fall times of each modulation packet envelope are 400 ±25 ns (NTSC systems) or approximately 1 μs (PAL systems).

This test signal may be used to measure chrominance-to-luminance intermodulation and chrominance nonlinear gain.

Figure 8.34. 80 IRE Modulated Ramp Test Signal for NTSC and PAL.

Figure 8.35. Five-Step Modulated Staircase Test Signal for NTSC and PAL. Luminance levels: 0, 18, 36, 54, 72, and 90 IRE.

Figure 8.36. Modulated Pedestal Test Signal for NTSC and PAL. PAL IRE values are shown in parentheses. Chrominance on the 50 IRE pedestal: ±10 IRE (±10), ±20 IRE (±30), ±40 IRE (±50).

Multiburst

The multiburst test signal for (M) NTSC, shown in Figure 8.37, consists of a white flag with a peak amplitude of 100 ±1 IRE and six frequency packets, each a specific frequency. The packets have a 40 ±1 IRE pedestal with peak-to-peak amplitudes of 60 ±0.5 IRE. Note a 0 IRE setup is used and the starting and ending point of each packet is at zero phase.

The ITU multiburst test signal for (B, D, G, H, I) PAL, shown in Figure 8.38, consists of a 4 μs white flag with a peak amplitude of 80 ±1 IRE and six frequency packets, each a specific frequency. The packets have a 50 ±1 IRE pedestal with peak-to-peak amplitudes of 60 ±0.5 IRE.
Note the starting and ending points of each packet are at zero phase. The gaps between packets are 0.4–2.0 μs. The ITU multiburst test signal may be present on line 18.

The multiburst signals are used to test the frequency response of the system by measuring the peak-to-peak amplitudes of the packets.

Line Bar

The line bar is a single 100 ±0.5 IRE (reference white) pulse of 10 μs (PAL), 18 μs (NTSC), or 25 μs (PAL) that occurs anywhere within the active scan line time (rise and fall times are ≤ 1 μs). Note that the color burst is not present, and a 0 IRE setup is used. This test signal is used to measure line time distortion (line tilt or H tilt). A digital encoder or decoder does not generate line time distortion; the distortion is generated primarily by the analog filters and transmission channel.

Multipulse

The (M) NTSC multipulse contains a 2T pulse and 25T and 12.5T pulses with various high-frequency components, as shown in Figure 8.39. The (B, D, G, H, I) PAL multipulse is similar, except 20T and 10T pulses are used, and there is no 7.5 IRE setup. This test signal is typically used to measure the frequency response of the transmission channel.

Figure 8.37. Multiburst Test Signal for NTSC. Packet frequencies (cycles per packet): 0.5 MHz (4), 1.25 MHz (8), 2 MHz (10), 3 MHz (14), 3.58 MHz (16), and 4.2 MHz (18).

Figure 8.38. ITU Multiburst Test Signal for PAL. Packet frequencies: 0.5, 1, 2, 4, 4.8, and 5.8 MHz.

Figure 8.39. Multipulse Test Signal for NTSC and PAL. PAL values are shown in parentheses.

Field Square Wave

The field square wave contains 100 ±0.5 IRE pulses for the entire active line time for Field 1 and blanked scan lines for Field 2.
Note that the color burst is not present and a 0 IRE setup is used. This test signal is used to measure field time distortion (field tilt or V tilt). A digital encoder or decoder does not generate field time distortion; the distortion is generated primarily by the analog filters and transmission channel.

Composite Test Signal

NTC-7 Version for NTSC

The NTC (U.S. Network Transmission Committee) has developed a composite test signal that may be used to test several video parameters, rather than using multiple test signals. The NTC-7 composite test signal for NTSC systems (shown in Figure 8.40) consists of a 100 IRE line bar, a 2T pulse, a 12.5T chrominance pulse, and a 5-step modulated staircase signal.

Figure 8.40. NTC-7 Composite Test Signal for NTSC, with Corresponding IRE Values. Staircase luminance levels: 0, 18, 36, 54, 72, and 90 IRE.

The line bar has a peak amplitude of 100 ±0.5 IRE, and 10–90% rise and fall times of 125 ±5 ns with an integrated sine-squared shape. It has a width at the 60 IRE level of 18 μs.

The 2T pulse has a peak amplitude of 100 ±0.5 IRE, with a half-amplitude width of 250 ±10 ns.

The 12.5T chrominance pulse has a peak amplitude of 100 ±0.5 IRE, with a half-amplitude width of 1562.5 ±50 ns.

The 5-step modulated staircase signal consists of 5 luminance steps superimposed with a 40 ±0.5 IRE subcarrier that has a phase of 0° ±1° relative to the burst. The rise and fall times of each modulation packet envelope are 400 ±25 ns.

The NTC-7 composite test signal may be present on line 17.

ITU Version for PAL

The ITU (BT.628 and BT.473) has developed a composite test signal that may be used to test several video parameters, rather than using multiple test signals. The ITU composite test signal for PAL systems (shown in Figure 8.41) consists of a white flag, a 2T pulse, and a 5-step modulated staircase signal.
The white flag has a peak amplitude of 100 ±1 IRE and a width of 10 μs.

The 2T pulse has a peak amplitude of 100 ±0.5 IRE, with a half-amplitude width of 200 ±10 ns.

The 5-step modulated staircase signal consists of 5 luminance steps (whose IRE values are shown in Figure 8.41) superimposed with a 42.86 ±0.5 IRE subcarrier that has a phase of 60° ±1° relative to the U axis. The rise and fall times of each modulation packet envelope are approximately 1 μs.

Figure 8.41. ITU Composite Test Signal for PAL, with Corresponding IRE Values. Staircase luminance levels: 0, 20, 40, 60, 80, and 100 IRE.

The ITU composite test signal may be present on line 330.

U.K. Version

The United Kingdom allows the use of a slightly different test signal since the 10T pulse is more sensitive to delay errors than the 20T pulse (at the expense of occupying less chrominance bandwidth). Selection of an appropriate pulse width is a trade-off between occupying the PAL chrominance bandwidth as fully as possible and obtaining a pulse with sufficient sensitivity to delay errors. Thus, the national test signal (developed by the British Broadcasting Corporation and the Independent Television Authority) in Figure 8.42 may be present on lines 19 and 332 for (I) PAL systems in the United Kingdom.

The white flag has a peak amplitude of 100 ±1 IRE and a width of 10 μs.

The 2T pulse has a peak amplitude of 100 ±0.5 IRE, with a half-amplitude width of 200 ±10 ns.

The 10T chrominance pulse has a peak amplitude of 100 ±0.5 IRE.

The 5-step modulated staircase signal consists of 5 luminance steps (whose IRE values are shown in Figure 8.42) superimposed with a 21.43 ±0.5 IRE subcarrier that has a phase of 60° ±1° relative to the U axis. The rise and fall times of each modulation packet envelope are approximately 1 μs.
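The pulse widths quoted for the composite test signals imply T = 125 ns for NTSC (2T = 250 ns, 12.5T = 1562.5 ns) and T = 100 ns for PAL (2T = 200 ns). The Python sketch below generates the luminance envelope of such a pulse, assuming the conventional sine-squared shape; the text gives only amplitudes and half-amplitude widths, so the shape and the function name are assumptions. For the 12.5T, 10T, and 20T chrominance pulses this models only the envelope, not the subcarrier inside it.

```python
import math


def sine_squared_pulse(t_ns, half_amplitude_width_ns, peak_ire=100.0):
    """Amplitude (IRE) at time t_ns, measured from the start of the pulse.

    The pulse spans 0 .. 2 * half_amplitude_width_ns and peaks at its center,
    so its width at 50% amplitude equals half_amplitude_width_ns.
    """
    total_width = 2.0 * half_amplitude_width_ns
    if t_ns < 0.0 or t_ns > total_width:
        return 0.0
    return peak_ire * math.sin(math.pi * t_ns / total_width) ** 2


if __name__ == "__main__":
    ntsc_2t_ns = 250.0   # half-amplitude width of the NTSC 2T pulse
    pal_2t_ns = 200.0    # half-amplitude width of the PAL 2T pulse
    for t in range(0, 501, 125):
        print(t, round(sine_squared_pulse(t, ntsc_2t_ns), 2))
    print(round(sine_squared_pulse(pal_2t_ns, pal_2t_ns), 2))
```

Sampling the function on a grid matched to the encoder's sample clock gives a lookup table for the pulse; a finer grid is useful for checking that the half-amplitude points fall where the specification puts them.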
Figure 8.42. United Kingdom (I) PAL National Test Signal #1, with Corresponding IRE Values. (Staircase levels: 20, 40, 60, 80, and 100 IRE; blank level at 0 IRE; sync level at –43 IRE; 10-cycle 4.43 MHz color burst.)

Combination Test Signal

NTC-7 Version for NTSC

The NTC (U.S. Network Transmission Committee) has also developed a combination test signal that may be used to test several video parameters, rather than using multiple test signals. The NTC-7 combination test signal for NTSC systems (shown in Figure 8.43) consists of a white flag, a multiburst, and a modulated pedestal signal.

The white flag has a peak amplitude of 100 ±1 IRE and a width of 4 μs.

The multiburst has a 50 ±1 IRE pedestal with peak-to-peak amplitudes of 50 ±0.5 IRE. The starting point of each frequency packet is at zero phase. The width of the 0.5 MHz packet is 5 μs; the width of the remaining packets is 3 μs.

The 3-step modulated pedestal is composed of a 50 IRE luminance pedestal, superimposed with three amplitudes of modulated chrominance (20 ±0.5, 40 ±0.5, and 80 ±0.5 IRE peak-to-peak) that have a phase of –90° ±1° relative to the burst. The rise and fall times of each modulation packet envelope are 400 ±25 ns.

The NTC-7 combination test signal may be present on line 280.

Figure 8.43. NTC-7 Combination Test Signal for NTSC. (Multiburst packets at 0.5, 1, 2, 3, 3.58, and 4.2 MHz on a 50 IRE pedestal.)

ITU Version for PAL

The ITU (BT.473) has developed a combination test signal that may be used to test several video parameters, rather than using multiple test signals. The ITU combination test signal for PAL systems (shown in Figure 8.44) consists of a white flag, a 2T pulse, a 20T modulated chrominance pulse, and a 5-step luminance staircase signal.

The line bar has a peak amplitude of 100 ±1 IRE and a width of 10 μs.
The 2T pulse has a peak amplitude of 100 ±0.5 IRE, with a half-amplitude width of 200 ±10 ns.

The 20T chrominance pulse has a peak amplitude of 100 ±0.5 IRE, with a half-amplitude width of 2.0 ±0.06 μs.

The 5-step luminance staircase signal consists of 5 luminance steps, at 20, 40, 60, 80, and 100 ±0.5 IRE.

The ITU combination test signal may be present on line 17.

ITU ITS Version for PAL

The ITU (BT.473) has developed a combination ITS (insertion test signal) that may be used to test several PAL video parameters, rather than using multiple test signals. The ITU combination ITS for PAL systems (shown in Figure 8.45) consists of a 3-step modulated pedestal with peak-to-peak amplitudes of 20, 60, and 100 ±1 IRE, and an extended subcarrier packet with a peak-to-peak amplitude of 60 ±1 IRE. The rise and fall times of each subcarrier packet envelope are approximately 1 μs. The phase of each subcarrier packet is 60° ±1° relative to the U axis. The tolerance on the 50 IRE level is ±1 IRE.

The ITU combination ITS may be present on line 331.

U.K. Version

The United Kingdom allows the use of a slightly different test signal, as shown in Figure 8.46. It may be present on lines 20 and 333 for (I) PAL systems in the United Kingdom.

The test signal consists of a 50 IRE luminance bar, part of which has a 100 IRE subcarrier superimposed that has a phase of 60° ±1° relative to the U axis, and an extended burst of subcarrier on the second half of the scan line.

Figure 8.44. ITU Combination Test Signal for PAL.

Figure 8.45. ITU Combination ITS Test Signal for PAL.
Figure 8.46. United Kingdom (I) PAL National Test Signal #2.

T Pulse

Square waves with fast rise times cannot be used for testing video systems, since attenuation and phase shift of out-of-band components cause ringing in the output signal, obscuring the in-band distortions being measured. T, or sin², pulses are bandwidth-limited, so they are used for testing video systems instead.

The 2T pulse is shown in Figure 8.47 and, like the T pulse, is obtained mathematically by squaring a half-cycle of a sine wave. T pulses are specified in terms of half amplitude duration (HAD), which is the pulse width measured at 50% of the pulse amplitude. Pulses with HADs that are multiples of the time interval T are used to test video systems. As seen in Figures 8.39 through 8.44, T, 2T, 12.5T, and 25T pulses are common when testing NTSC video systems, whereas T, 2T, 10T, and 20T pulses are common for PAL video systems.

T is the Nyquist interval, or 1/(2 × FC), where FC is the cutoff frequency of the video system. For NTSC, FC is 4 MHz, whereas FC for PAL systems is 5 MHz. Therefore, T for NTSC systems is 125 ns and for PAL systems it is 100 ns. For a T pulse with a HAD of 125 ns, a 2T pulse has a HAD of 250 ns, and so on. The frequency spectrum of the 2T pulse is shown in Figure 8.47 and is representative of the energy content in a typical character generator waveform.

To generate smooth rising and falling edges of most video signals, a T step (generated by integrating a T pulse) is typically used.

Figure 8.47. The T Pulse. (a) 2T pulse. (b) 2T step. (c) Frequency spectra of the 2T pulse.

T steps have 10–90% rise/fall times of 0.964T and a well-defined bandwidth.
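The relationship between the cutoff frequency, T, and the sine-squared pulse shape can be sketched numerically. The following is a minimal illustration, not a reference generator: the function names are hypothetical, and the pulse is modeled as a squared cosine centered on zero, which is one way of expressing "a squared half-cycle of a sine wave" so that the width at 50% amplitude equals the HAD.

```python
import math

def nyquist_interval_ns(cutoff_hz):
    """Return T = 1 / (2 * FC) in nanoseconds."""
    return 1e9 / (2.0 * cutoff_hz)

def sin2_pulse(t_ns, had_ns, amplitude=1.0):
    """Sine-squared pulse centered on t = 0.

    The pulse spans -HAD..+HAD, peaks at t = 0, and crosses
    half amplitude at t = -HAD/2 and t = +HAD/2.
    """
    if abs(t_ns) >= had_ns:
        return 0.0
    return amplitude * math.cos(math.pi * t_ns / (2.0 * had_ns)) ** 2

t_ntsc = nyquist_interval_ns(4e6)   # 125 ns for a 4 MHz cutoff
t_pal = nyquist_interval_ns(5e6)    # 100 ns for a 5 MHz cutoff
had_2t = 2 * t_ntsc                 # HAD of the NTSC 2T pulse
```

Sampling `sin2_pulse` across –250 ns to +250 ns with `had_ns=250` traces the shape of the 2T pulse in panel (a) of Figure 8.47.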
The 2T step generated from a 2T pulse is shown in Figure 8.47.

The 12.5T chrominance pulse, illustrated in Figure 8.48, is a good test signal to measure any chrominance-to-luminance timing error since its energy spectral distribution is bunched in two relatively narrow bands. Using this signal detects differences in the luminance and chrominance phase distortion, but not between other frequency groups.

Figure 8.48. The 12.5T Chrominance Pulse. (a) Luma component. (b) Chroma component. (c) Addition of (a) and (b).

VBI Data

VBI (vertical blanking interval) data may be inserted up to about five scan lines into the active picture region to ensure it won't be deleted by equipment replacing the VBI, by DSS MPEG which deletes the VBI, or by cable systems inserting their own VBI data. This is common practice by Nielsen and others to ensure their programming and commercial tracking data gets through the distribution systems to the receivers. In most cases, this will be unseen since it is masked by the TV's overscan.

Timecode

Two types of timecoding are commonly used, as defined by ANSI/SMPTE 12M and IEC 461: longitudinal timecode (LTC) and vertical interval timecode (VITC).

The LTC is recorded on a separate audio track; as a result, the analog VCR must use high-bandwidth amplifiers and audio heads. This is due to the timecode frequency increasing as tape speed increases, until the point that the frequency response of the system results in a distorted timecode signal that may not be read reliably. At slower tape speeds, the timecode frequency decreases, until at very low tape speeds or still pictures, the timecode information is no longer recoverable.

The VITC is recorded as part of the video signal; as a result, the timecode information is always available, regardless of the tape speed.
However, the LTC allows the timecode signal to be written without writing a video signal; the VITC requires the video signal to be changed if a change in timecode information is required. The LTC therefore is useful for synchronizing multiple audio or audio/video sources.

Frame Dropping

If the field rate is 60/1.001 fields per second, straight counting at 60 fields per second yields an error of about 108 frames for each hour of running time. This may be handled in one of three ways:

Nondrop frame: During a continuous recording, each time count increases by 1 frame. In this mode, the drop frame flag will be a "0."

Drop frame: To minimize the timing error, the first two frame numbers (00 and 01) at the start of each minute, except for minutes 00, 10, 20, 30, 40, and 50, are omitted from the count. In this mode, the drop frame flag will be a "1."

Drop frame for (M) PAL: To minimize the timing error, the first four frame numbers (00 to 03) at the start of every second minute (even minute numbers) are omitted from the count, except for minutes 00, 20, and 40. In this mode, the drop frame flag will be a "1."

Even with drop framing, there is a long-term error of about 2.26 frames per 24 hours. This error accumulation is the reason timecode generators must be periodically reset if they are to maintain any correlation to the correct time-of-day. Typically, this "reset-to-realtime" is referred to as a "jam sync" procedure. Some jam sync implementations reset the timecode to 00:00:00.00 and, therefore, must occur at midnight; others allow a true re-sync to the correct time-of-day.

One inherent problem with jam sync correction is the interruption of the timecode. Although this discontinuity may be brief, it may cause timecode readers to hiccup due to the interruption.
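The drop-frame rule for 525-line systems (omit frame numbers 00 and 01 at the start of every minute except minutes 00, 10, 20, 30, 40, and 50) can be expressed as a conversion from a running frame count to a timecode label. The sketch below is illustrative only: the function name and the `HH:MM:SS:FF` output format are assumptions, and the (M) PAL variant is not covered.

```python
def frames_to_dropframe(frame_count):
    """Convert a running frame count to a drop-frame label (30-frame numbering).

    Frame numbers 00 and 01 are skipped at the start of each minute,
    except minutes 00, 10, 20, 30, 40, and 50.
    """
    frames_per_min = 60 * 30 - 2               # 1798 frames in a minute that drops two numbers
    frames_per_10min = 10 * 60 * 30 - 9 * 2    # 17982 frames in each ten-minute block

    blocks, rem = divmod(frame_count, frames_per_10min)
    skipped = 18 * blocks                      # nine dropping minutes per complete block
    if rem >= 2:
        skipped += 2 * ((rem - 2) // frames_per_min)

    n = frame_count + skipped                  # label index in plain 30-frame counting
    ff = n % 30
    ss = (n // 30) % 60
    mm = (n // 1800) % 60
    hh = (n // 108000) % 24
    return "%02d:%02d:%02d:%02d" % (hh, mm, ss, ff)

# 2 numbers x 54 dropping minutes = 108 numbers omitted per hour
dropped_per_hour = 2 * (60 - 6)
```

For instance, frame 1799 is labeled `00:00:59:29` and the next frame is labeled `00:01:00:02`, since labels 00 and 01 of minute 01 are omitted.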
Longitudinal Timecode (LTC)

The LTC information is transferred using a separate serial interface, using the same electrical interface as the AES/EBU digital audio interface standard, and is recorded on a separate track. The basic structure of the time data is based on the BCD system. Tables 8.20 and 8.21 list the LTC bit assignments and arrangement. Note that the 24-hour clock system is used.

Table 8.20. LTC Bit Assignments.

Bit(s)    Function            Note
0–3       units of frames
4–7       user group 1
8–9       tens of frames
10        flag 1              note 1
11        flag 2              note 2
12–15     user group 2
16–19     units of seconds
20–23     user group 3
24–26     tens of seconds
27        flag 3              note 3
28–31     user group 4
32–35     units of minutes
36–39     user group 5
40–42     tens of minutes
43        flag 4              note 4
44–47     user group 6
48–51     units of hours
52–55     user group 7
56–57     tens of hours
58        flag 5              note 5
59        flag 6              note 6
60–63     user group 8
64–65     sync bits           fixed "0"
66–77     sync bits           fixed "1"
78        sync bit            fixed "0"
79        sync bit            fixed "1"

Notes:
1. Drop frame flag. 525-line and 1125-line systems: "1" if frame numbers are being dropped, "0" if no frame dropping is done. 625-line systems: "0."
2. Color frame flag. 525-line systems: "1" if even units of frame numbers identify fields 1 and 2 and odd units of frame numbers identify fields 3 and 4. 625-line systems: "1" if timecode is locked to the video signal in accordance with 8-field sequence and the video signal has the "preferred subcarrier-to-line-sync phase." 1125-line systems: "0."
3. 525-line and 1125-line systems: Phase correction. This bit shall be put in a state so that every 80-bit word contains an even number of "0"s. 625-line systems: Binary group flag 0.
4. 525-line and 1125-line systems: Binary group flag 0. 625-line systems: Binary group flag 2.
5. Binary group flag 1.
6. 525-line and 1125-line systems: Binary group flag 2. 625-line systems: Phase correction. This bit shall be put in a state so that every 80-bit word contains an even number of "0"s.

Table 8.21. LTC Bit Arrangement.

Frames (count 0–29 for 525-line and 1125-line systems, 0–24 for 625-line systems)
  units of frames (bits 0–3)       4-bit BCD (count 0–9); bit 0 is LSB
  tens of frames (bits 8–9)        2-bit BCD (count 0–2); bit 8 is LSB
Seconds
  units of seconds (bits 16–19)    4-bit BCD (count 0–9); bit 16 is LSB
  tens of seconds (bits 24–26)     3-bit BCD (count 0–5); bit 24 is LSB
Minutes
  units of minutes (bits 32–35)    4-bit BCD (count 0–9); bit 32 is LSB
  tens of minutes (bits 40–42)     3-bit BCD (count 0–5); bit 40 is LSB
Hours
  units of hours (bits 48–51)      4-bit BCD (count 0–9); bit 48 is LSB
  tens of hours (bits 56–57)       2-bit BCD (count 0–2); bit 56 is LSB

LTC Timing

The modulation technique is such that a transition occurs at the beginning of every bit period. "1" is represented by a second transition one-half a bit period from the start of the bit. "0" is represented when there is no transition within the bit period (see Figure 8.49). The signal has a peak-to-peak amplitude of 0.5–4.5 V, with rise and fall times of 40 ±10 μs (10% to 90% amplitude points).

Figure 8.49. LTC Data Bit Transition Format.

Because the entire frame time is used to generate the 80-bit LTC information, the bit rate (in bits per second) is determined by:

FC = 80 × FV

where FV is the vertical frame rate in frames per second. The 80 bits of timecode information are output serially, with bit 0 being first. The LTC word occupies the entire frame time, and the data must be evenly spaced throughout this time.
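The bit layout of Tables 8.20 and 8.21, the bit-rate formula, and the transition rule of Figure 8.49 can be combined in a short sketch. This is an illustration under stated assumptions rather than a compliant generator: the function names are hypothetical, all user bits and flags are left at zero, and bit 27 is treated as the phase-correction bit, which applies to 525-line and 1125-line systems only.

```python
SYNC_WORD = [0, 0] + [1] * 12 + [0, 1]   # bits 64-79 per Table 8.20

def bcd_bits(digit, width):
    """Return the bits of one BCD digit, LSB first."""
    return [(digit >> i) & 1 for i in range(width)]

def build_ltc_word(hours, minutes, seconds, frames):
    """Assemble an 80-bit LTC word (list of bits, bit 0 first)."""
    word = [0] * 80
    fields = [
        (0, 4, frames % 10), (8, 2, frames // 10),
        (16, 4, seconds % 10), (24, 3, seconds // 10),
        (32, 4, minutes % 10), (40, 3, minutes // 10),
        (48, 4, hours % 10), (56, 2, hours // 10),
    ]
    for start, width, digit in fields:
        word[start:start + width] = bcd_bits(digit, width)
    word[64:80] = SYNC_WORD
    # Phase correction (assumed 525-line case): force an even number of zeros.
    if word.count(0) % 2:
        word[27] = 1
    return word

def ltc_bit_rate(frame_rate):
    """FC = 80 * FV, in bits per second."""
    return 80 * frame_rate

def biphase_mark(bits, level=0):
    """Encode bits as two half-bit levels each.

    Every bit starts with a transition; a "1" adds a mid-bit transition.
    """
    halves = []
    for bit in bits:
        level ^= 1
        halves.append(level)
        if bit:
            level ^= 1
        halves.append(level)
    return halves
```

With a 25 frames-per-second system, `ltc_bit_rate(25)` gives 2000 bits per second; `biphase_mark(build_ltc_word(12, 34, 56, 7))` yields 160 half-bit levels for one frame.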
The start of the LTC word occurs at the beginning of line 5 ±1.5 lines for 525-line systems, at the beginning of line 2 ±1.5 lines for 625-line systems, and at the vertical sync timing reference of the frame ±1 line for 1125-line systems.

Vertical Interval Timecode (VITC)

The VITC is recorded during the vertical blanking interval of the video signal in both fields. Since it is recorded with the video, it can be read in still mode. However, it cannot be rerecorded (or restriped). Restriping requires dubbing down a generation, deleting, and inserting a new timecode. For YPbPr and S-video interfaces, VITC is present on the Y signal. For analog RGB interfaces, VITC is present on all three signals.

As with the LTC, the basic structure of the time data is based on the BCD system. Tables 8.22 and 8.23 list the VITC bit assignments and arrangement. Note that the 24-hour clock system is used.

VITC Cyclic Redundancy Check

Eight bits (82–89) are reserved for the code word for error detection by means of cyclic redundancy checking. The generating polynomial, x^8 + 1, applies to all bits from 0 to 81, inclusive. Figure 8.50 illustrates implementing the polynomial using a shift register. During passage of timecode data, the multiplexer is in position 0 and the data is output while the CRC calculation is done simultaneously by the shift register. After all the timecode data has been output, the shift register contains the CRC value, and switching the multiplexer to position 1 enables the CRC value to be output. Repeating the process on decoding, the shift register contains all zeros if no errors exist.

VITC Timing

The modulation technique is such that each state corresponds to a binary state, and a transition occurs only when there is a change in the data between adjacent bits from a "1" to "0" or "0" to "1." No transitions occur when adjacent bits contain the same data. This is commonly referred to as "non-return to zero" (NRZ).
Synchronization bit pairs are inserted throughout the VITC data to assist the receiver in maintaining the correct frequency lock.

The bit rate (FC) is defined to be:

FC = 115 × FH ± 2%

where FH is the horizontal line frequency. The 90 bits of timecode information are output serially, with bit 0 being first. For 625i (576i) systems, lines 19 and 332 (or 21 and 334) are commonly used for the VITC. For 525i (480i) systems, lines 14 and 277 are commonly used. For 1125i (1080i) systems, lines 9 and 571 are commonly used. To protect the VITC against drop-outs, it may also be present two scan lines later, although any two nonconsecutive scan lines per field may be used.

Figure 8.51 illustrates the timing of the VITC data on the scan line. The data must be evenly spaced throughout the VITC word. The 10% to 90% rise and fall times of the VITC bit data should be 200 ±50 ns (525-line and 625-line systems) or 100 ±25 ns (1125-line systems) before adding it to the video signal to avoid possible distortion of the VITC signal by downstream chrominance circuits. In most circumstances, the analog lowpass filters after the video D/A converters should suffice for the filtering.
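The shift-register CRC described for the VITC (generating polynomial x^8 + 1 over bits 0 to 81, register all zeros after a clean decode) can be modeled in a few lines. This is a sketch under assumptions, not a statement of the exact circuit in Figure 8.50: the function names are hypothetical, the register is assumed to start at zero, and the CRC bits are assumed to be appended in the order they would shift out of the last stage.

```python
def vitc_crc_register(bits):
    """Run bits through an 8-stage shift register for x^8 + 1.

    The input is XORed with the output of the last stage and fed
    to the first stage. Returns the register contents (stage 0 first).
    """
    reg = [0] * 8
    for bit in bits:
        feedback = bit ^ reg[7]
        reg = [feedback] + reg[:7]
    return reg

def append_vitc_crc(data_bits):
    """Return the 90-bit word: 82 data bits followed by 8 CRC bits."""
    if len(data_bits) != 82:
        raise ValueError("VITC CRC covers bits 0 to 81")
    reg = vitc_crc_register(data_bits)
    crc_bits = list(reversed(reg))   # assumed shift-out order: last stage first
    return list(data_bits) + crc_bits

def vitc_crc_ok(word_bits):
    """True if the register is all zeros after all 90 bits."""
    return vitc_crc_register(word_bits) == [0] * 8
```

A decoder built this way runs all 90 received bits through the same register and accepts the word only when the register ends at zero; any single-bit error leaves exactly one stage set.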
Table 8.22. VITC Bit Assignments.

Bit(s)    Function            Note
0         sync bit            fixed "1"
1         sync bit            fixed "0"
2–5       units of frames
6–9       user group 1
10        sync bit            fixed "1"
11        sync bit            fixed "0"
12–13     tens of frames
14        flag 1              note 1
15        flag 2              note 2
16–19     user group 2
20        sync bit            fixed "1"
21        sync bit            fixed "0"
22–25     units of seconds
26–29     user group 3
30        sync bit            fixed "1"
31        sync bit            fixed "0"
32–34     tens of seconds
35        flag 3              note 3
36–39     user group 4
40        sync bit            fixed "1"
41        sync bit            fixed "0"
42–45     units of minutes
46–49     user group 5
50        sync bit            fixed "1"
51        sync bit            fixed "0"
52–54     tens of minutes
55        flag 4              note 4
56–59     user group 6
60        sync bit            fixed "1"
61        sync bit            fixed "0"
62–65     units of hours
66–69     user group 7
70        sync bit            fixed "1"
71        sync bit            fixed "0"
72–73     tens of hours
74        flag 5              note 5
75        flag 6              note 6
76–79     user group 8
80        sync bit            fixed "1"
81        sync bit            fixed "0"
82–89     CRC group

Notes:
1. Drop frame flag. 525-line and 1125-line systems: "1" if frame numbers are being dropped, "0" if no frame dropping is done. 625-line systems: "0."
2. Color frame flag. 525-line systems: "1" if even units of frame numbers identify fields 1 and 2 and odd units of frame numbers identify fields 3 and 4. 625-line systems: "1" if timecode is locked to the video signal in accordance with 8-field sequence and the video signal has the "preferred subcarrier-to-line-sync phase." 1125-line systems: "0."
3. 525-line systems: Field flag. "0" during fields 1 and 3, "1" during fields 2 and 4. 625-line systems: Binary group flag 0. 1125-line systems: Field flag. "0" during field 1, "1" during field 2.
4. 525-line and 1125-line systems: Binary group flag 0. 625-line systems: Binary group flag 2.
5. Binary group flag 1.
6. 525-line and 1125-line systems: Binary group flag 2. 625-line systems: Field flag. "0" during fields 1, 3, 5, and 7, "1" during fields 2, 4, 6, and 8.
Table 8.23. VITC Bit Arrangement.

Frames (count 0–29 for 525-line and 1125-line systems, 0–24 for 625-line systems)
  units of frames (bits 2–5)       4-bit BCD (count 0–9); bit 2 is LSB
  tens of frames (bits 12–13)      2-bit BCD (count 0–2); bit 12 is LSB
Seconds
  units of seconds (bits 22–25)    4-bit BCD (count 0–9); bit 22 is LSB
  tens of seconds (bits 32–34)     3-bit BCD (count 0–5); bit 32 is LSB
Minutes
  units of minutes (bits 42–45)    4-bit BCD (count 0–9); bit 42 is LSB
  tens of minutes (bits 52–54)     3-bit BCD (count 0–5); bit 52 is LSB
Hours
  units of hours (bits 62–65)      4-bit BCD (count 0–9); bit 62 is LSB
  tens of hours (bits 72–73)       2-bit BCD (count 0–2); bit 72 is LSB

Figure 8.50. VITC CRC Generation. (Data in feeds an XOR gate followed by eight D flip-flops; a multiplexer selects the data in position 0 or the CRC in position 1 for the "data + CRC" output.)

Figure 8.51. VITC Position and Timing. (525/59.94 systems: 63.556 μs line of 115 bits, 50.286 μs VITC word of 90 bits, minimum intervals of 10 μs (19 bits) and 2.1 μs, 80 ±10 IRE. 625/50 systems: 64 μs line, 49.655 μs VITC word, minimum intervals of 11.2 μs (21 bits) and 1.9 μs, 78 ±7 IRE. 1125/59.94 systems: 29.63 μs line, 23.18 μs VITC word, minimum intervals of 2.7 μs (10.5 bits) and 1.5 μs, 78 ±7 IRE.)

Table 8.24. LTC and VITC Binary Group Flag (BGF) Bit Definitions.

BGF2  BGF1  BGF0    User Bits Content            Timecode Referenced to External Clock
0     0     0       user defined                 no
0     0     1       8-bit character set (1)      no
0     1     0       user defined                 yes
0     1     1       reserved                     unassigned
1     0     0       date and time zone (3)       no
1     0     1       page / line (2)              no
1     1     0       date and time zone (3)       yes
1     1     1       page / line (2)              yes

Notes:
1. Conforming to ISO/IEC 646 or 2022.
2. Described in SMPTE 262M.
3. Described in SMPTE 309M. See Tables 8.25 through 8.27.

Figure 8.52. Use of Binary Groups to Describe ISO Characters Coded with 7 or 8 Bits. (One ISO character occupies a pair of user groups. 7-bit ISO: B1–B4 in the first group, then B5–B7 followed by 0. 8-bit ISO: A1–A4 in the first group, then A5–A8.)
Table 8.25. Date and Time Zone Format Coding.

User Group 8                          User Group 7
Bit 3      Bit 2   Bit 1   Bit 0      Bit 3   Bit 2   Bit 1   Bit 0
MJD flag   0       time zone offset code 0x00–0x3F (six bits)

Notes:
1. MJD flag: "0" = YYMMDD format, "1" = MJD format.

Table 8.26. YYMMDD Date Format.

User Group    Assignment    Value    Description
1             D             0–9      day units
2             D             0–3      day tens
3             M             0–9      month units
4             M             0, 1     month tens
5             Y             0–9      year units
6             Y             0–9      year tens

User Bits

The binary group flag (BGF) bits shown in Table 8.24 specify the content of the 32 user bits. The 32 user bits are organized as eight groups of four bits each.

The user bits are intended for storage of data by users. The 32 bits may be assigned in any manner without restriction, if indicated as user-defined by the binary group flags.

If an 8-bit character set conforming to ISO/IEC 646 or 2022 is indicated by the binary group flags, the characters are to be inserted as shown in Figure 8.52. Note that some user bits will be decoded before the binary group flags are decoded; therefore, the decoder must store the early user data before any processing is done.

When the user groups are used to transfer time zone and date information, user groups 7 and 8 specify the time zone and the format of the date in the remaining six user groups, as shown in Tables 8.25 and 8.27. The date may be either a six-digit YYMMDD format (Table 8.26) or a six-digit modified Julian date (MJD), as indicated by the MJD flag.

CEA-608 Closed Captioning

This section reviews CEA-608 closed captioning for the hearing impaired in the United States. Closed captioning and text are transmitted during the blanked active line-time portion of lines 21 and 284. However, due to video editing they may occasionally reside on any line between 21–25 and 284–289.
Table 8.27. Time Zone Offset Codes.

Code  Hours          Code  Hours          Code  Hours
00    UTC            16    UTC + 10.00    2C    UTC + 09.30
01    UTC – 01.00    17    UTC + 09.00    2D    UTC + 08.30
02    UTC – 02.00    18    UTC + 08.00    2E    UTC + 07.30
03    UTC – 03.00    19    UTC + 07.00    2F    UTC + 06.30
04    UTC – 04.00    1A    UTC – 06.30    30    TP–1
05    UTC – 05.00    1B    UTC – 07.30    31    TP–0
06    UTC – 06.00    1C    UTC – 08.30    32    UTC + 12.45
07    UTC – 07.00    1D    UTC – 09.30    33    reserved
08    UTC – 08.00    1E    UTC – 10.30    34    reserved
09    UTC – 09.00    1F    UTC – 11.30    35    reserved
0A    UTC – 00.30    20    UTC + 06.00    36    reserved
0B    UTC – 01.30    21    UTC + 05.00    37    reserved
0C    UTC – 02.30    22    UTC + 04.00    38    user defined
0D    UTC – 03.30    23    UTC + 03.00    39    unknown
0E    UTC – 04.30    24    UTC + 02.00    3A    UTC + 05.30
0F    UTC – 05.30    25    UTC + 01.00    3B    UTC + 04.30
10    UTC – 10.00    26    reserved       3C    UTC + 03.30
11    UTC – 11.00    27    reserved       3D    UTC + 02.30
12    UTC – 12.00    28    TP–3           3E    UTC + 01.30
13    UTC + 13.00    29    TP–2           3F    UTC + 00.30
14    UTC + 12.00    2A    UTC + 11.30
15    UTC + 11.00    2B    UTC + 10.30

Extended data service (XDS) packets also may be transmitted during the blanked active line-time portion of line 284. XDS packets may indicate the program name, time into the show, time remaining to the end, and so on.

Note that due to editing before transmission, it may be possible that the caption information is occasionally moved down a scan line or two. Therefore, caption decoders should monitor more than just lines 21 and 284 for caption information.

Waveform

The data format for both lines consists of a clock run-in signal, a start bit, and two 7-bit plus parity words of ASCII data (per X3.4-1967). For YPbPr and S-video interfaces, captioning is present on the Y signal. For analog RGB interfaces, captioning is present on the green channel, at an amplitude 1.7× that used for composite or Y.

Figure 8.53 illustrates the waveform and timing for transmitting the closed captioning and XDS information and conforms to CEA-608.
The clock run-in is a 7-cycle sinusoidal burst that is frequency-locked and phase-locked to the caption data and is used to provide synchronization for the decoder. The nominal data rate is 32× FH. However, decoders should not rely on this timing relationship due to possible horizontal timing variations introduced by video processing circuitry and VCRs. After the clock run-in signal, the blanking level is maintained for a two data bit duration, followed by a "1" start bit. The start bit is followed by 16 bits of data, composed of two 7-bit + odd parity ASCII characters. Caption data is transmitted using a non-return-to-zero (NRZ) code; a "1" corresponds to the 50 ±2 IRE level and a "0" corresponds to the blanking level (0–2 IRE). The negative-going crossings of the clock are coherent with the data bit transitions.

Figure 8.53. 525-Line Lines 21 and 284 Closed Captioning Timing. (Clock run-in: 7 cycles of 0.5035 MHz at 50 ±2 IRE; data: start bit followed by two 7-bit + parity ASCII characters; rise/fall times of 240–288 ns with 2T bar shaping; timing labels in the figure are 10.5 ±0.25 μs, 12.91 μs, 10.003 ±0.25 μs, 27.382 μs, and 33.764 μs.)

Typical decoders specify the time between the 50% points of sync and clock run-in to be 10.5 ±0.5 μs, with a ±3% tolerance on FH, 50 ±12 IRE for a "1" bit, and –2 to +12 IRE for a "0" bit. Decoders must also handle bit rise/fall times of 240–480 ns.

NUL characters (0x00) should be sent when no display or control characters are being transmitted. This, in combination with the clock run-in, enables the decoder to determine whether captioning or text transmission is being implemented.

If using only line 21, the clock run-in and data do not need to be present on line 284. However, if using only line 284, the clock run-in and data should be present on both lines 21 and 284; data for line 21 would consist of NUL characters.
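Each caption byte carries seven data bits plus an odd parity bit, so a decoder can reject a corrupted byte before interpreting it. The sketch below shows one way to add and check that parity. The function names are hypothetical, and storing the parity bit in bit 7 of the byte is an assumption of this sketch, chosen because the parity bit is transmitted after the seven LSB-first data bits.

```python
def add_odd_parity(ch7):
    """Return an 8-bit value: 7 data bits plus an odd parity bit in bit 7."""
    ch7 &= 0x7F
    ones = bin(ch7).count("1")
    parity = 0 if ones % 2 else 1      # make the total number of ones odd
    return ch7 | (parity << 7)

def has_odd_parity(byte):
    """True if the 8-bit value contains an odd number of ones."""
    return bin(byte & 0xFF).count("1") % 2 == 1

def strip_parity(byte):
    """Return the 7 data bits."""
    return byte & 0x7F

nul_with_parity = add_odd_parity(0x00)   # NUL filler character as transmitted
```

A decoder would typically call `has_odd_parity` on both bytes of a line before passing `strip_parity` results to the control-code or character logic.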
At the decoder, as shown in Figure 8.54, the display area of a 525-line 4:3 interlaced display is typically 15 rows high and 34 columns wide. The vertical display area begins on lines 43 and 306 and ends on lines 237 and 500. The horizontal display area begins 13 μs and ends 58 μs after the leading edge of horizontal sync.

In text mode, all rows are used to display text; each row contains a maximum of 32 characters, with at least a one-column wide space on the left and right of the text. The only transparent area is around the outside of the text area.

In caption mode, text usually appears only on rows 1–4 or 12–15; the remaining rows are usually transparent. Each row contains a maximum of 32 characters, with at least a one-column wide space on the left and right of the text.

Some caption decoders support up to 48 columns per row, and up to 16 rows, allowing some customization for the display of caption data.

Basic Services

There are two types of basic services: text mode (a data service generally not program related) and captioning. In understanding the operation of the decoder, it is easier to visualize an invisible cursor that marks the position where the next character will be displayed. Note that if you are designing a decoder, you should obtain the latest CEA-608 specification to ensure correct operation, as this section is only a summary.

Text Mode

Text mode, based on real-time scrolling, uses 7–15 rows of the display and is enabled upon receipt of the Resume Text Display or Text Restart code. When text mode has been selected, and the text memory is empty, the cursor starts at the top-most row, character 1 position. Once all the rows of text are displayed, scrolling is enabled.

With each carriage return received, the top-most row of text is erased, the remaining text is smoothly rolled up one row (using 6–13 uniform steps over 12–26 fields), the bottom row is erased, and the cursor is moved to the bottom row, character 1 position.
If new text is received while scrolling, it is seen scrolling up from the bottom of the display area. If a carriage return is received while scrolling, the rows are immediately moved up one row to their final position.

Once the cursor moves to the character 32 position on any row, any text received before a carriage return, preamble address code, or backspace will be displayed at the character 32 position, replacing any previous character at that position. The Text Restart command erases all characters on the display and moves the cursor to the top row, character 1 position.

Additional real-time display methods can be optionally implemented by the decoder and used under viewer control.

Captioning Mode

Captioning has several modes available, including roll-up, pop-on, and paint-on.

Roll-up captioning is enabled by receiving one of the miscellaneous control codes to select the number of rows displayed. "Roll-up captions, 2 rows" enables rows 14 and 15; "roll-up captions, 3 rows" enables rows 13–15; "roll-up captions, 4 rows" enables rows 12–15. Regardless of the number of rows enabled, the cursor remains on row 15. Once row 15 is full, the rows are scrolled up one row (at the rate of one dot per frame), and the cursor is moved back to row 15, character 1.

Pop-on captioning may use rows 1–4 or 12–15, and is initiated by the Resume Caption Loading command. The display memory is essentially double-buffered. While memory buffer 1 is displayed, memory buffer 2 is being loaded with caption data. At the receipt of an End of Caption code, memory buffer 2 is displayed while memory buffer 1 is being loaded with new caption data.
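The double-buffered behavior of pop-on captioning can be pictured with a small model. This is a simplified sketch, not decoder firmware: the class and method names are hypothetical, each memory is reduced to a list of strings, and row addressing, attributes, and timing are ignored.

```python
class PopOnMemories:
    """Toy model of the displayed and nondisplayed caption memories."""

    def __init__(self):
        self.displayed = []       # memory currently shown on screen
        self.nondisplayed = []    # memory being loaded in the background

    def load(self, text):
        """Caption data received after Resume Caption Loading."""
        self.nondisplayed.append(text)

    def end_of_caption(self):
        """End of Caption: flip the two memories."""
        self.displayed, self.nondisplayed = self.nondisplayed, self.displayed

    def erase_nondisplayed(self):
        """Erase Nondisplayed Memory command."""
        self.nondisplayed = []

mem = PopOnMemories()
mem.load("FIRST CAPTION")
visible_before_flip = list(mem.displayed)   # nothing is visible yet
mem.end_of_caption()
visible_after_flip = list(mem.displayed)    # the loaded caption pops on
```

The viewer never sees a caption being assembled, because loading only ever touches the memory that is not on screen.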
Paint-on captioning, enabled by the Resume Direct Captioning command, is similar to pop-on captioning, but no double-buffering is used; caption data is loaded directly into display memory.

Figure 8.54. Closed Captioning Display Format. (Rows 1–15 start on lines 43, 56, 69, 82, 95, 108, 121, 134, 147, 160, 173, 186, 199, 212, and 225, ending on line 237; each row is 45.02 μs, or 34 characters, wide. Rows 1–4 and 12–15: captions or infotext. Rows 5–11: infotext only.)

Three types of control codes (preamble address codes, midrow codes, and miscellaneous control codes) are used to specify the format, location, and attributes of the characters. Each control code consists of two bytes, transmitted together on line 21 or line 284. On line 21, they are normally transmitted twice in succession to help ensure correct reception. They are not transmitted twice on line 284 to minimize bandwidth used for captioning.

The first byte is a nondisplay control byte with a range of 0x10 to 0x1F; the second byte is a display control byte in the range of 0x20 to 0x7F. At the beginning of each row, a control code is sent to initialize the row. Caption roll-up and text modes allow either a preamble address code or midrow control code at the start of a row; the other caption modes use a preamble address code to initialize a row. The preamble address codes are illustrated in Figure 8.55 and Table 8.28.

The midrow codes are typically used within a row to change the color, italics, underline, and flashing attributes and should occur only between words. Color, italics, and underline are controlled by the preamble address and midrow codes; flash on is controlled by a miscellaneous control code. An attribute remains in effect until another control code is received or the end of row is reached.

Each row starts with a control code to set the color and underline attributes (white nonunderlined is the default if no control code is received before the first character on an empty row).
The color attribute can be changed only by the midrow code of another color; the italics attribute does not change the color attribute. However, a color attribute turns off the italics attribute. The flash on command does not alter the status of the color, italics, or underline attributes. However, a color or italics midrow control code turns off the flash.

Note that the underline color is the same color as the character being underlined; the underline resides on dot row 11 and covers the entire width of the character column.

Figure 8.55. Closed Captioning Preamble Address Code Format. (The preamble control code, transmitted twice, consists of a start bit, a non-display control character and a display control character, each 7 bits LSB first with an odd parity bit; it carries the identification code, row position, indent, and display condition instructions. It is followed by the caption text, up to 32 characters per row, sent as a start bit and two text characters, each 7 bits LSB first with an odd parity bit.)

Table 8.28. Closed Captioning Preamble Address Codes.

Non-display control byte (D6–D0): 0 0 1 CH, followed by the three row-select bits D2–D0
Display control byte (D6–D0):     1, the row-select bit D5, then A B C D U

Non-display D2 D1 D0    Display D6 D5 = 10    Display D6 D5 = 11
0 0 1                   row 1                 row 2
0 1 0                   row 3                 row 4
1 0 1                   row 5                 row 6
1 1 0                   row 7                 row 8
1 1 1                   row 9                 row 10
0 0 0                   row 11
0 1 1                   row 12                row 13
1 0 0                   row 14                row 15

A B C D    Attribute
0 0 0 0    white
0 0 0 1    green
0 0 1 0    blue
0 0 1 1    cyan
0 1 0 0    red
0 1 0 1    yellow
0 1 1 0    magenta
0 1 1 1    white italics
1 0 0 0    indent 0, white
1 0 0 1    indent 4, white
1 0 1 0    indent 8, white
1 0 1 1    indent 12, white
1 1 0 0    indent 16, white
1 1 0 1    indent 20, white
1 1 1 0    indent 24, white
1 1 1 1    indent 28, white

Notes:
1. U: "0" = no underline, "1" = underline.
2. CH: "0" = data channel 1, "1" = data channel 2.

In text mode, the indent codes may be used to perform indentation; in this instance, the row information is ignored.

Table 8.29, Figure 8.56, and Table 8.30 illustrate the midrow and miscellaneous control code operation.
For example, if it were the end of a caption, the control code could be End of Caption (transmitted twice). It could be followed by a preamble address code (transmitted twice) to start another line of captioning.

Table 8.29. Closed Captioning Midrow Codes.

Non-display control byte (D6–D0): 0 0 1 CH 0 0 1
Display control byte (D6–D0): 0 1 0 c2 c1 c0 U, where c2–c0 select the attribute:

c2 c1 c0   Attribute
000        white
001        green
010        blue
011        cyan
100        red
101        yellow
110        magenta
111        italics

Notes:
1. U: "0" = no underline, "1" = underline.
2. CH: "0" = data channel 1, "1" = data channel 2.
3. Italics is implemented as a two-dot slant to the right over the vertical range of the character. Some decoders implement a one-dot slant for every four scan lines.
4. Underline resides on dot rows 22 and 23, and covers the entire column width.

[Figure 8.56. Closed Captioning Midrow Code Format. Text characters (each 7 bits LSB first with an odd parity bit, preceded by a start bit) are followed by the midrow control code (transmitted twice): a start bit, a non-display control character, an odd parity bit, a display control character, and an odd parity bit. Miscellaneous control codes may also be transmitted in place of the midrow control code.]

Table 8.30. Closed Captioning Miscellaneous Control Codes.

Non-display control byte (D6–D0): 0 0 1 CH 1 0 F
Display control byte (D6–D0): 0 1 0 followed by D3–D0 as listed:

D3–D0   Command
0000    resume caption loading
0001    backspace
0010    reserved
0011    reserved
0100    delete to end of row
0101    roll-up captions, 2 rows
0110    roll-up captions, 3 rows
0111    roll-up captions, 4 rows
1000    flash on
1001    resume direct captioning
1010    text restart
1011    resume text display
1100    erase displayed memory
1101    carriage return
1110    erase nondisplayed memory
1111    end of caption (flip memories)

Non-display control byte (D6–D0): 0 0 1 CH 1 1 1
Display control byte (D6–D0): 0 1 0 followed by D3–D0 as listed:

D3–D0   Command
0001    tab offset (1 column)
0010    tab offset (2 columns)
0011    tab offset (3 columns)

Notes:
1. F: "0" = line 21, "1" = line 284. CH: "0" = data channel 1, "1" = data channel 2.
2. "Flash on" blanks associated characters for 0.25 seconds once per second.

Characters are displayed using a dot matrix format. Each character cell is typically 16 samples wide and 26 samples high (16 × 26), as shown in Figure 8.57. Dot rows 2–19 are usually used for actual character outlines. Dot rows 0, 1, 20, 21, 24, and 25 are usually blanked to provide vertical spacing between characters, and underlining is typically done on dot rows 22 and 23. Dot columns 0, 1, 14, and 15 are blanked to provide horizontal spacing between characters, except on dot rows 22 and 23 when the underline is displayed. This results in 12 × 18 characters stored in character ROM. Table 8.31 shows the basic character set.

Some caption decoders support multiple character sizes within the 16 × 26 region, including 13 × 16, 13 × 24, 12 × 20, and 12 × 26. Not all combinations generate a sensible result due to the limited display area available.

[Figure 8.57. Typical 16×26 Closed Captioning Character Cell Format for Row 1. The cell occupies lines 43–55 of one field and lines 306–318 of the other, giving dot rows 0–25; the legend distinguishes blank dots, character dots, and underline dots.]

Table 8.31. Closed Captioning Basic Character Set.

Special characters
Non-display control byte (D6–D0): 0 0 1 CH 0 0 1
Display control byte (D6–D0): 0 1 1 followed by D3–D0 as listed:

D3–D0   Character
0000    ®
0001    °
0010    1/2
0011    ¿
0100    ™
0101    ¢
0110    £
0111    music note
1000    à
1001    transparent space
1010    è
1011    â
1100    ê
1101    î
1110    ô
1111    û

Basic characters (columns are D6 D5 D4 D3; rows are D2 D1 D0)

D2–D0   0100     0101  0110  0111            1000  1001  1010  1011  1100  1101  1110  1111
000     (space)  (     0     8               @     H     P     X     ú     h     p     x
001     !        )     1     9               A     I     Q     Y     a     i     q     y
010     "        á     2     :               B     J     R     Z     b     j     r     z
011     #        +     3     ;               C     K     S     [     c     k     s     ç
100     $        ,     4     (less-than)     D     L     T     é     d     l     t     ÷
101     %        -     5     =               E     M     U     ]     e     m     u     Ñ
110     &        .     6     (greater-than)  F     N     V     í     f     n     v     ñ
111     '        /     7     ?               G     O     W     ó     g     o     w     (solid block)

Optional Captioning Features

Three sets of optional features are available for advanced captioning decoders.

Optional Attributes

Additional color choices are available for advanced captioning decoders, as shown in Table 8.32. If a decoder doesn't support semitransparent colors, the opaque colors may be used instead. If a specific background color isn't supported by a decoder, it should default to the black background color. However, if the black foreground color is supported in a decoder, all the background colors should be implemented.

Table 8.32. Closed Captioning Optional Attribute Codes.

Background attributes
Non-display control byte (D6–D0): 0 0 1 CH 0 0 0
Display control byte (D6–D0): 0 1 0 c2 c1 c0 T, where c2–c0 select the color:

c2 c1 c0   Background Attribute
000        white
001        green
010        blue
011        cyan
100        red
101        yellow
110        magenta
111        black

Non-display control byte (D6–D0): 0 0 1 CH 1 1 1
Display control byte (D6–D0): 0 1 0 1 1 0 1 = transparent background

Foreground attributes
Non-display control byte (D6–D0): 0 0 1 CH 1 1 1
Display control byte (D6–D0): 0 1 0 1 1 1 0 = black
Display control byte (D6–D0): 0 1 0 1 1 1 1 = black underline

Notes:
1. T: "0" = opaque, "1" = semi-transparent.
2. CH: "0" = data channel 1, "1" = data channel 2.
3. Underline resides on dot rows 22 and 23, and covers the entire column width.

A background attribute appears as a standard space on the display, and the attribute remains in effect until the end of the row or until another background attribute is received.

The foreground attributes provide an eighth color (black) as a character color. As with midrow codes, a foreground attribute code turns off italics and blinking, and the least significant bit controls underlining.

Background and foreground attribute codes have an automatic backspace for backward compatibility with current decoders. Thus, an attribute must be preceded by a standard space character. Standard decoders display the space and ignore the attribute.
Extended decoders display the space, and on receiving the attribute, backspace, then display a space that changes the color and opacity. Thus, text formatting remains the same regardless of the type of decoder.

Optional Closed Group Extensions

To support new features and characters not defined by the current standard, the CEA maintains a set of code assignments requested by various caption providers and decoder manufacturers. These code assignments (currently used to select various Asian character sets) are not compatible with caption decoders in the United States, and videos using them should not be distributed in the U.S. market.

Closed group extensions require two bytes. Table 8.33 lists the currently assigned closed group extensions to support captioning in the Asian languages.

Optional Extended Characters

An additional 64 accented characters (eight character sets of eight characters each) may be supported by decoders, permitting the display of other languages such as Spanish, French, Portuguese, German, Danish, Italian, Finnish, and Swedish. If supported, these accented characters are available in all caption and text modes.

Each of the extended characters incorporates an automatic backspace for backward compatibility with current decoders. Thus, an extended character must be preceded by the standard ASCII version of the character. Standard decoders display the ASCII character and ignore the accented character. Extended decoders display the ASCII character, and on receiving the accented character, backspace, then display the accented character. Thus, text formatting remains the same regardless of the type of decoder.

Extended characters require two bytes. The first byte is 0x12 or 0x13 for data channel one (0x1A or 0x1B for data channel two), followed by a value of 0x20–0x3F.
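The automatic backspace can be sketched as follows. This is an illustration only: parity is assumed to be stripped already, the glyph lookup table is supplied by the caller (the single entry in the usage note is a placeholder, not taken from the text), and basic characters are rendered with chr() as a simplification. All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def extended_char_channel(first: int, second: int) -> Optional[int]:
    """Return the data channel (1 or 2) if the pair is an extended character."""
    if not 0x20 <= second <= 0x3F:
        return None
    if first in (0x12, 0x13):
        return 1
    if first in (0x1A, 0x1B):
        return 2
    return None


def render_row(pairs: Iterable[Tuple[int, int]],
               glyphs: Dict[Tuple[int, int], str]) -> str:
    """Render byte pairs the way an extended decoder would."""
    row = []
    for first, second in pairs:
        if extended_char_channel(first, second) is not None:
            if row:
                row.pop()  # automatic backspace over the ASCII stand-in
            row.append(glyphs.get((first, second), "?"))
        else:
            # Simplification: treat non-null bytes as printable characters.
            row.extend(chr(b) for b in (first, second) if b)
    return "".join(row)
```

With a placeholder table such as {(0x12, 0x20): "Á"}, the pairs (0x41, 0x00), (0x12, 0x20) render as a single "Á": the stand-in "A" is displayed first and then overwritten. A standard decoder would simply ignore the second pair and keep the "A".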
Extended Data Services

Line 284 may contain extended data service information, interleaved with the caption and text information, as bandwidth is available. In this case, control codes are not transmitted twice, as they may be for the caption and text services.

Table 8.33. Closed Captioning Optional Closed Group Extensions.

Non-display control byte (D6–D0): 0 0 1 CH 1 1 1
Display control byte (D6–D0): 0 1 0 followed by D3–D0 as listed:

D3–D0   Character Set
0100    standard character set (normal size)
0101    standard character set (double size)
0110    first private character set
0111    second private character set
1000    People's Republic of China character set (GB 2312)
1001    Korean Standard character set (KSC 5601-1987)
1010    first registered character set

Notes:
1. CH: "0" = data channel 1, "1" = data channel 2.

Information is transmitted as packets and operates as a separate unique data channel. Data for each packet may or may not be contiguous and may be separated into subpackets that can be inserted anywhere space is available in the line 284 information stream.

There are four types of extended data characters:

Control: Control characters are used as a mode switch to enable the extended data mode. They are the first character of two and have a value of 0x01 to 0x0F.

Type: Type characters follow the control character (thus, they are the second character of two) and identify the packet type. They have a value of 0x01 to 0x0F.

Checksum: Checksum characters always follow the "end of packet" control character. Thus, they are the second character of two and have a value of 0x00 to 0x7F.

Informational: These characters may be ASCII or non-ASCII data. They are transmitted in pairs up to and including 32 characters. A NUL character (0x00) is used to ensure pairs of characters are always sent.

Control Characters

Table 8.34 lists the control codes. Current class describes a program currently being transmitted.
Future programming describes a program to be transmitted later. It contains the same information and formats as the current class. Channel class describes non-program-specific information about the channel. Miscellaneous describes miscellaneous information. Public service class transmits data or messages of a public service nature. Private data class is used in proprietary systems for whatever that system wishes.

Type Definitions (Current Class and Future Programming)

Program Identification Number (0x01)

This packet uses four characters to specify a scheduled start time and date relative to Coordinated Universal Time (UTC). The format is shown in Table 8.35. Minutes have a range of 0–59. Hours have a range of 0–23. Dates have a range of 1–31. Months have a range of 1–12. "T" indicates if a program is routinely tape delayed for the Mountain and Pacific time zones. The "D," "L," and "Z" bits are ignored by the decoder. When all characters are a "1," it indicates the end of the current program.

Length / Time-in-Show (0x02)

This packet has 2, 4, or 6 characters and indicates the scheduled length of the program and elapsed time for the program. The format is shown in Table 8.36. Minutes and seconds have a range of 0–59. Hours have a range of 0–63.

Program Name (0x03)

This packet contains 2–32 ASCII characters that specify the title of the program.

Program Type (0x04)

This packet contains 2–32 characters that specify the type of program. Each character is coded to a keyword, as shown in Table 8.37.

Content Advisory (0x05)

This packet, commonly associated with the "V-chip," contains the information shown in Table 8.38 to indicate the program rating. FV indicates if fantasy violence is present. V indicates if violence is present. S indicates if sexual situations are present.
L indicates if adult language is present. D indicates if sexually suggestive dialog is present.

Table 8.34. CEA-608 Control Codes.

Control Code   Function   Class
0x01           start      current
0x02           continue   current
0x03           start      future
0x04           continue   future
0x05           start      channel information
0x06           continue   channel information
0x07           start      miscellaneous
0x08           continue   miscellaneous
0x09           start      public service
0x0A           continue   public service
0x0B           start      reserved
0x0C           continue   reserved
0x0D           start      private data
0x0E           continue   private data
0x0F           end        all

Table 8.35. CEA-608 Program Identification Number Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   m5  m4  m3  m2  m1  m0   minute
1   D   h4  h3  h2  h1  h0   hour
1   L   d4  d3  d2  d1  d0   date
1   Z   T   m3  m2  m1  m0   month

Table 8.36. CEA-608 Length / Time-in-Show Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   m5  m4  m3  m2  m1  m0   length, minute
1   h5  h4  h3  h2  h1  h0   length, hour
1   m5  m4  m3  m2  m1  m0   elapsed time, minute
1   h5  h4  h3  h2  h1  h0   elapsed time, hour
1   s5  s4  s3  s2  s1  s0   elapsed time, second
0   0   0   0   0   0   0    null character

Table 8.37. CEA-608 Program Types.

Code (hex)   Keywords, in code order
20–27        education, entertainment, movie, news, religious, sports, other, action
28–2F        advertisement, animated, anthology, automobile, awards, baseball, basketball, bulletin
30–37        business, classical, college, combat, comedy, commentary, concert, consumer
38–3F        contemporary, crime, dance, documentary, drama, elementary, erotica, exercise
40–47        fantasy, farm, fashion, fiction, food, football, foreign, fund raiser
48–4F        game/quiz, garden, golf, government, health, high school, history, hobby
50–57        hockey, home, horror, information, instruction, international, interview, language
58–5F        legal, live, local, math, medical, meeting, military, miniseries
60–67        music, mystery, national, nature, police, politics, premiere, prerecorded
68–6F        product, professional, public, racing, reading, repair, repeat, review
70–77        romance, science, series, service, shopping, soap opera, special, suspense
78–7F        talk, technical, tennis, travel, variety, video, weather, western

Audio Services (0x06)

This packet contains two characters as shown in Table 8.39 to indicate the audio language and type available.

Caption Services (0x07)

This packet contains 2–8 characters as shown in Table 8.40 to indicate the program caption services available. L2–L0 are coded as shown in Table 8.39.

Copy and Redistribution Control Packet (0x08)

This CGMS-A (Copy Generation Management System, Analog) and Redistribution Control Descriptor (RCD) packet contains 2 characters as shown in Table 8.41. In the case where either B3 or B4 is a "0," there is no Analog Protection Service (B1 and B2 are "0"). B0 is the analog source bit. When RCD is a "1," control of consumer redistribution has been signaled in some manner, such as the presence of the ATSC Redistribution Control Descriptor.

Composite Packet-1 (0x0C)

This packet is a way of conveying several packets as a single group. It contains the Program Type (5 characters), Content Advisory (1 character), Length (2 characters), Time-in-Show (2 characters), and Program Name (0–22 characters).

Composite Packet-2 (0x0D)

This packet is a way of conveying several packets as a single group. It contains the Program ID (4 characters), Audio Services (2 characters), Caption Services (2 characters), Call Letters (4 characters), Native Channel (2 characters), and Network Name (0–18 characters).

Program Description Row 1 to Row 8 (0x10–0x17)

This packet contains 1–8 packet rows, with each packet row containing 0–32 ASCII characters. A packet row corresponds to a line of text on the display.
Each packet is used in numerical sequence, and if a packet contains no ASCII characters, a blank line will be displayed.

Type Definitions (Channel Information Class)

Network Name (0x01)

This packet uses 2–32 ASCII characters to specify the network name.

Call Letters and Native Channel (0x02)

This packet uses four or six ASCII characters to specify the call letters of the channel. When six characters are used, the additional two characters specify the over-the-air channel number (2–69) assigned by the FCC. Single-digit channel numbers are preceded by a zero or a null character.

Tape Delay (0x03)

This packet uses two characters to specify the number of hours and minutes the local station typically delays network programs. The format of this packet is shown in Table 8.42. Minutes have a range of 0–59. Hours have a range of 0–23. This delay applies to all programs on the channel that have the "T" bit set in their Program ID packet (Table 8.35).

Transmission Signal Identifier (0x04)

This packet contains four characters that convey the unique 16-bit Transmission Signal Identifier (TSID) assigned to the originating analog licensee. The format of this packet is shown in Table 8.43.

Table 8.38. CEA-608 Content Advisory Format.

D6  D5      D4  D3      D2  D1  D0
1   D / a2  a1  a0      r2  r1  r0
1   V / FV  S   L / a3  g2  g1  g0

a3–a0: rating system
xxx0   MPA rating
LD01   U.S. TV parental guidelines
0011   Canada English language rating
0111   Canada French language rating
1011   reserved
1111   reserved

r2–r0: MPA rating
000   not applicable
001   G
010   PG
011   PG-13
100   R
101   NC-17
110   X
111   not rated

g2–g0: U.S. TV rating
000   not rated
001   TV-Y
010   TV-Y7
011   TV-G
100   TV-PG
101   TV-14
110   TV-MA
111   not rated

g2–g0: Canada English language rating
000   E
001   C
010   C8 +
011   G
100   PG
101   14 +
110   18 +
111   reserved

g2–g0: Canada French language rating
000   E
001   G
010   8 ans +
011   13 ans +
100   16 ans +
101   18 ans +
110   reserved
111   reserved
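To make the bit sharing in Table 8.38 concrete, the two content advisory characters can be decoded roughly as follows. This is a sketch that assumes the characters have already had parity removed; the function and dictionary key names are hypothetical, and the shared V / FV bit is reported without interpreting which of the two meanings applies.

```python
MPA = ["not applicable", "G", "PG", "PG-13", "R", "NC-17", "X", "not rated"]
US_TV = ["not rated", "TV-Y", "TV-Y7", "TV-G", "TV-PG", "TV-14", "TV-MA", "not rated"]
CA_EN = ["E", "C", "C8 +", "G", "PG", "14 +", "18 +", "reserved"]
CA_FR = ["E", "G", "8 ans +", "13 ans +", "16 ans +", "18 ans +", "reserved", "reserved"]


def decode_content_advisory(c1: int, c2: int) -> dict:
    """Decode the two Content Advisory characters laid out in Table 8.38."""
    a0 = (c1 >> 3) & 1
    a1 = (c1 >> 4) & 1
    a2 = (c1 >> 5) & 1      # doubles as the D flag for U.S. TV ratings
    a3 = (c2 >> 3) & 1      # doubles as the L flag for U.S. TV ratings
    r = c1 & 0x07
    g = c2 & 0x07
    if a0 == 0:             # xxx0: MPA rating
        return {"system": "MPA", "rating": MPA[r]}
    if a1 == 0:             # LD01: U.S. TV parental guidelines
        return {
            "system": "US TV",
            "rating": US_TV[g],
            "D": bool(a2),
            "L": bool(a3),
            "S": bool((c2 >> 4) & 1),
            "V/FV": bool((c2 >> 5) & 1),
        }
    if a3 == 0:             # 0011 or 0111: Canadian systems
        if a2 == 0:
            return {"system": "Canada English", "rating": CA_EN[g]}
        return {"system": "Canada French", "rating": CA_FR[g]}
    return {"system": "reserved"}   # 1011 or 1111
```

For example, the characters 0x68 and 0x4D decode to a U.S. TV rating of TV-14 with the D and L flags set.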
Table 8.39. CEA-608 Audio Services Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   L2  L1  L0  T2  T1  T0   main audio program
1   L2  L1  L0  S2  S1  S0   second audio program (SAP)

L2–L0: language
000   unknown
001   English
010   Spanish
011   French
100   German
101   Italian
110   other
111   none

T2–T0: main audio type
000   unknown
001   mono
010   simulated stereo
011   true stereo
100   stereo surround
101   data service
110   other
111   none

S2–S0: second audio type
000   unknown
001   mono
010   video descriptions
011   non-program audio
100   special effects
101   data service
110   other
111   none

Table 8.40. CEA-608 Caption Services Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   L2  L1  L0  F   C   T    service code

FCT:
000   line 21, data channel 1 captioning
001   line 21, data channel 1 text
010   line 21, data channel 2 captioning
011   line 21, data channel 2 text
100   line 284, data channel 1 captioning
101   line 284, data channel 1 text
110   line 284, data channel 2 captioning
111   line 284, data channel 2 text

Table 8.41. CEA-608 Copy and Redistribution Control Packet Format.

D6  D5  D4  D3  D2  D1  D0
1   0   B4  B3  B2  B1  B0
1   0   0   0   0   0   RCD

B4–B3: CGMS-A Services
00   copying permitted without restriction
01   no more copies
10   one generation copy allowed
11   no copying permitted

B2–B1: Analog Protection Services (APS)
00   no Analog Protection Service
01   PSP on, color striping off
10   PSP on, 2-line color striping on
11   PSP on, 4-line color striping on

Table 8.42. CEA-608 Tape Delay Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   m5  m4  m3  m2  m1  m0   minute
1   –   h4  h3  h2  h1  h0   hour

Table 8.43. CEA-608 Transmission Signal Identifier (TSID) Format.

D6  D5  D4  D3   D2   D1   D0    Character
1   –   –   t3   t2   t1   t0    TSID 0
1   –   –   t7   t6   t5   t4    TSID 1
1   –   –   t11  t10  t9   t8    TSID 2
1   –   –   t15  t14  t13  t12   TSID 3

Table 8.44. CEA-608 Time of Day Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   m5  m4  m3  m2  m1  m0   minute
1   D   h4  h3  h2  h1  h0   hour
1   L   d4  d3  d2  d1  d0   date
1   Z   T   m3  m2  m1  m0   month
1   –   –   –   D2  D1  D0   day
1   Y5  Y4  Y3  Y2  Y1  Y0   year
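The six characters of the Time of Day packet (Table 8.44) can be unpacked with a few masks. The sketch below assumes parity has been removed and that the year field is an offset from 1990; the function and key names are hypothetical.

```python
def decode_time_of_day(chars):
    """Unpack the six Time of Day characters of Table 8.44 into named fields."""
    # D6 is always 1 in each character, so only D5-D0 carry information.
    minute, hour, date, month, day, year = (c & 0x3F for c in chars)
    return {
        "minute": minute,                    # m5-m0
        "hour": hour & 0x1F,                 # h4-h0
        "dst": bool(hour & 0x20),            # D bit
        "date": date & 0x1F,                 # d4-d0
        "leap_day": bool(date & 0x20),       # L bit
        "month": month & 0x0F,               # m3-m0
        "tape_delay": bool(month & 0x10),    # T bit
        "zero_seconds": bool(month & 0x20),  # Z bit
        "weekday": day & 0x07,               # 1 = Sunday ... 7 = Saturday
        "year": 1990 + year,                 # Y5-Y0 offset from 1990
    }
```

For instance, the characters 0x5E, 0x6E, 0x4F, 0x47, 0x43, 0x51 describe 14:30 UTC on the 15th of month 7, weekday 3, in 2007, with the D bit set.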
Type Definitions (Miscellaneous)

Time of Day (0x01)

This packet uses six characters to specify the current time of day, month, and date relative to Coordinated Universal Time (UTC). The format is shown in Table 8.44. Minutes have a range of 0–59. Hours have a range of 0–23. Dates have a range of 1–31. Months have a range of 1–12. Days have a range of 1 (Sunday) to 7 (Saturday). Years have a range of 0–63 (added to 1990).

"T" indicates if a program is routinely tape delayed for the Mountain and Pacific time zones. "D" indicates whether daylight saving time currently is being observed. "L" indicates whether the local day is February 28th or 29th when it is March 1st UTC. "Z" indicates whether the seconds should be set to zero (to allow calibration without having to transmit the full 6 bits of seconds data).

Impulse Capture ID (0x02)

This packet carries the program start time and length, and can be used to tell a VCR to record this program. The format is shown in Table 8.45. Start and length minutes have a range of 0–59. Start hours have a range of 0–23; length hours have a range of 0–63. Dates have a range of 1–31. Months have a range of 1–12. "T" indicates if a program is routinely tape delayed for the Mountain and Pacific time zones. The "D," "L," and "Z" bits are ignored by the decoder.

Supplemental Data Location (0x03)

This packet uses 2–32 characters to specify other lines where additional VBI data may be found. Table 8.46 shows the format. "F" indicates field one ("0") or field two ("1"). N may have a value of 7–31, and indicates a specific line number.

Local Time Zone and DST Use (0x04)

This packet uses two characters to specify the viewer time zone and whether the locality observes daylight saving time. The format is shown in Table 8.47. Hours have a range of 0–23. This is the nominal time zone offset, in hours, relative to UTC. "D" is a "1" when the area is using daylight saving time.
Out-of-Band Channel Number (0x40)

This packet uses two characters to specify the CATV channel number to which all subsequent out-of-band packets refer. The format is shown in Table 8.48.

Channel Map Pointer (0x41)

This packet uses two characters to specify the channel number containing the Channel Map Header and Channel Map packets.

Channel Map Header Packet (0x42)

This packet uses four characters to specify the number of channels in the channel map and the version number of the current map.

Channel Map Packet (0x43)

This packet uses two or four characters to specify the user channel number and its corresponding tuner channel number. Up to 6 optional closed caption characters are included to convey the user channel's call letters or network ID.

Table 8.45. CEA-608 Impulse Capture ID Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   m5  m4  m3  m2  m1  m0   start, minute
1   D   h4  h3  h2  h1  h0   start, hour
1   L   d4  d3  d2  d1  d0   start, date
1   Z   T   m3  m2  m1  m0   start, month
1   m5  m4  m3  m2  m1  m0   length, minute
1   h5  h4  h3  h2  h1  h0   length, hour

Table 8.46. CEA-608 Supplemental Data Location Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   F   N4  N3  N2  N1  N0   location

Table 8.47. CEA-608 Local Time Zone and DST Use Format.

D6  D5  D4  D3  D2  D1  D0   Character
1   D   h4  h3  h2  h1  h0   hour
0   0   0   0   0   0   0    null

Table 8.48. CEA-608 Out-of-Band Channel Number Format.

D6  D5   D4   D3  D2  D1  D0   Character
1   c5   c4   c3  c2  c1  c0   channel low
1   c11  c10  c9  c8  c7  c6   channel high

Type Definitions (Public Service Class)

National Weather Service Code (0x01)

This packet conveys a weather-related emergency broadcast message that indicates the category, affected counties, and expiration time.

National Weather Service Message (0x02)

This packet conveys up to 32 characters of an actual text message as delivered by the National Weather Service.

Caption (CC) and Text (T) Channels

CC1, CC2, T1, and T2 are on line 21.
CC3, CC4, T3, and T4 are on line 284. A fifth channel on line 284 carries the Extended Data Services. T1–T4 are similar to CC1–CC4, but take over all or half of the screen to display scrolling text information. CC1 is usually the main caption channel. CC2 or CC3 is occasionally used for supporting a second language version.

Closed Captioning for PAL

For (M) PAL, caption data may be present on lines 18 and 281; however, it may occasionally reside on any line between 18–22 and 281–285 due to editing.

For (B, D, G, H, I, N, NC) PAL videotapes, caption data may be present on lines 22 and 335; however, it may occasionally reside on any line between 22–26 and 335–339 due to editing.

The data format, amplitudes, and rise and fall times match those used in the United States. The timing, as shown in Figure 8.58, is slightly different due to the 625-line horizontal timing.

[Figure 8.58. 625-Line Lines 22 and 335 Closed Captioning Timing. The waveform shows the sync level (43 IRE), blank level, 4.43 MHz color burst (10 cycles), 7 cycles of 0.500 MHz clock run-in, and two 7-bit + parity ASCII data characters (start bit, D0–D6, parity, D0–D6, parity) at 50 ±2 IRE. Marked intervals are 10.5 ±0.25 µs, 13.0 µs, 10.00 ±0.25 µs, 27.5 µs, and 34.0 µs; rise and fall times are 240–288 ns (2T bar shaping).]

Widescreen Signaling and CGMS

To facilitate the handling of various aspect ratios of program material received by TVs, a widescreen signaling (WSS) system has been developed. This standard allows a WSS-enhanced 16:9 TV to display programs in their correct aspect ratio.

625i Systems

625i (576i) systems are based on ITU-R BT.1119 and ETSI EN 300 294. For YPbPr and S-video interfaces, WSS is present on the Y signal. For analog RGB interfaces, WSS is present on all three signals.

The Analog Copy Generation Management System (CGMS-A) is also supported by the WSS signal.

Data Timing

For (B, D, G, H, I, N, NC) PAL, WSS data is normally on line 23, as shown in Figure 8.59.
However, due to video editing, WSS data may reside on any line between 23–27.

The clock frequency is 5 MHz (±100 Hz). The signal waveform should be a sine-squared pulse, with a half-amplitude duration of 200 ±10 ns. The signal amplitude is 500 mV ±5%.

The NRZ data bits are processed by a biphase code modulator, such that one data period equals 6 elements at 5 MHz.

[Figure 8.59. 625-Line Line 23 WSS Timing. The waveform shows the sync level (43 IRE), blank level, color burst, run-in (29 5 MHz elements), start code (24 5 MHz elements), and data b0–b13 (84 5 MHz elements) at 500 mV ±5%. Marked intervals are 11.00 ±0.25 µs and 27.4 µs; rise and fall times are 190–210 ns (2T bar shaping).]

Data Content

The WSS consists of a run-in code, a start code, and 14 bits of data, as shown in Table 8.49.

Run-In

The run-in consists of 29 elements at 5 MHz of a specific sequence, shown in Table 8.49.

Start Code

The start code consists of 24 elements at 5 MHz of a specific sequence, shown in Table 8.49.

Group A Data

The group A data consists of 4 data bits that specify the aspect ratio. Each data bit generates 6 elements at 5 MHz. b0 is the LSB.

Table 8.49. 625-Line WSS Information.

run-in: 29 elements at 5 MHz
    1 1111 0001 1100 0111 0001 1100 0111 (0x1F1C 71C7)
start code: 24 elements at 5 MHz
    0001 1110 0011 1100 0001 1111 (0x1E 3C1F)
group A (aspect ratio): 24 elements at 5 MHz, "0" = 000 111, "1" = 111 000
    b0, b1, b2, b3
group B (enhanced services): 24 elements at 5 MHz, "0" = 000 111, "1" = 111 000
    b4, b5, b6, b7 (b7 = "0" since reserved)
group C (subtitles): 18 elements at 5 MHz, "0" = 000 111, "1" = 111 000
    b8, b9, b10
group D (reserved): 18 elements at 5 MHz, "0" = 000 111, "1" = 111 000
    b11, b12, b13
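As a worked illustration of Table 8.49, the element sequence for one WSS line can be assembled from the run-in, the start code, and the biphase-coded data bits. This is a sketch only: it assumes the run-in and start code are sent most significant element first as printed, and that the data bits are sent b0 first. The function name is hypothetical.

```python
RUN_IN = format(0x1F1C71C7, "029b")      # 29 elements
START_CODE = format(0x1E3C1F, "024b")    # 24 elements
BIPHASE = {0: "000111", 1: "111000"}     # one data bit = 6 elements at 5 MHz


def wss_elements(data_bits):
    """Return the 5 MHz element string for data bits b0..b13."""
    if len(data_bits) != 14:
        raise ValueError("625-line WSS carries exactly 14 data bits")
    return RUN_IN + START_CODE + "".join(BIPHASE[b] for b in data_bits)


elements = wss_elements([1, 1, 1, 0] + [0] * 10)
duration_us = len(elements) / 5.0        # 5 elements per microsecond
```

The sequence is 29 + 24 + 84 = 137 elements long, which at 5 MHz lasts 27.4 µs, consistent with the interval marked in Figure 8.59.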
Table 8.50. 625-Line WSS Group A (Aspect Ratio) Data Bit Assignments and Usage.

b0, b1, b2, b3   Aspect Ratio Label   Format                     Position on 4:3 Display   Active Lines   Minimum Requirements
0001             4:3                  full format                –                         576            case 1
1000             14:9                 letterbox                  center                    504            case 2
0100             14:9                 letterbox                  top                       504            case 2
1101             16:9                 letterbox                  center                    430            case 3
0010             16:9                 letterbox                  top                       430            case 3
1011             > 16:9               letterbox                  center                    –              case 4
0111             14:9                 full format                center                    576            –
1110             16:9                 full format (anamorphic)   –                         576            –

Table 8.50 lists the data bit assignments and usage. The number of active lines listed in Table 8.50 are for the exact aspect ratio (a = 1.33, 1.56, or 1.78).

The aspect ratio label indicates a range of possible aspect ratios (a) and number of active lines:

Label    Aspect Ratio (a)    Active Lines
4:3      a ≤ 1.46            527–576
14:9     1.46 < a ≤ 1.66     463–526
16:9     1.66 < a ≤ 1.90     405–462
> 16:9   a > 1.90            < 405

To allow automatic selection of the display mode, a 16:9 receiver should support the following minimum requirements:

Case 1: The 4:3 aspect ratio picture should be centered on the display, with black bars on the left and right sides.

Case 2: The 14:9 aspect ratio picture should be centered on the display, with black bars on the left and right sides. Alternately, the picture may be displayed using the full display width by using a small (typically 8%) horizontal geometrical error.

Case 3: The 16:9 aspect ratio picture should be displayed using the full width of the display.

Case 4: The >16:9 aspect ratio picture should be displayed as in Case 3 or use the full height of the display by zooming in.

Group B Data

The group B data consists of four data bits that specify enhanced services. Each data bit generates six elements at 5 MHz. Data bit b4 is the LSB. Bits b5 and b6 are used for PALplus.

b4: mode
    0   camera mode
    1   film mode
b5: color encoding
    0   normal PAL
    1   Motion Adaptive ColorPlus
b6: helper signals
    0   not present
    1   present

Group C Data

The group C data consists of three data bits that specify subtitles.
Each data bit generates six elements at 5 MHz. Data bit b8 is the LSB.

b8: teletext subtitles
    0   no
    1   yes
b9, b10: open subtitles
    00   no
    01   outside active picture
    10   inside active picture
    11   reserved

Group D Data

The group D data consists of three data bits that specify surround sound and copy protection. Each data bit generates six elements at 5 MHz. Data bit b11 is the LSB.

b11: surround sound
    0   no
    1   yes
b12: copyright
    0   no copyright asserted or unknown
    1   copyright asserted
b13: copy protection
    0   copying not restricted
    1   copying restricted

525i Systems

EIA-J CPR-1204 and IEC 61880 define a widescreen signaling standard for 525i (480i) systems. For YPbPr and S-video interfaces, WSS is present on the Y signal. For analog RGB interfaces, WSS is present on all three signals.

Data Timing

Lines 20 and 283 are used to transmit the WSS information, as shown in Figure 8.60. However, due to video editing, it may reside on any line between 20–24 and 283–287.

The clock frequency is FSC/8 or about 447.443 kHz; FSC is the color subcarrier frequency of 3.579545 MHz. The signal waveform should be a sine-squared pulse, with a half-amplitude duration of 2.235 µs ±50 ns. The signal amplitude is 70 ±10 IRE for a "1," and 0 ±5 IRE for a "0."

Data Content

The WSS consists of 2 bits of start code, 14 bits of data, and 6 bits of CRC, as shown in Table 8.51. The CRC used is X^6 + X + 1, with all bits preset to "1."

Start Code

The start code consists of a "1" data bit followed by a "0" data bit, as shown in Table 8.51.
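The CRC described under Data Content can be sketched as a 6-bit shift register. Several details here are assumptions rather than statements from the text: that the CRC covers the 14 data bits only, that the bits enter the register in transmission order (b0 first), and that the register contents are sent most significant bit first. The function name is hypothetical.

```python
def wss_crc6(bits):
    """Bitwise CRC with generator X^6 + X + 1 and the register preset to all ones."""
    reg = 0x3F                      # six register bits preset to "1"
    for bit in bits:
        feedback = ((reg >> 5) & 1) ^ bit
        reg = (reg << 1) & 0x3F
        if feedback:
            reg ^= 0x03             # the X + 1 terms of the generator
    return reg


data = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]   # example values for b0..b13
crc = wss_crc6(data)
crc_bits = [(crc >> i) & 1 for i in range(5, -1, -1)]
```

A receiver built the same way can check a line by running all 20 bits through the register: with this construction, wss_crc6(data + crc_bits) returns 0 when no bit errors occurred.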
Word 0 Data

Word 0 data consists of 2 data bits:

b0, b1:
    00   4:3 aspect ratio, normal
    01   4:3 aspect ratio, letterbox
    10   16:9 aspect ratio, anamorphic
    11   reserved

Word 1 Data

Word 1 data consists of 4 data bits:

b2, b3, b4, b5:
    0000   copy control information
    1111   default

Copy control information is transmitted in Word 2 data when Word 1 data is "0000." When copy control information is not to be transferred, Word 1 data must be set to the default value "1111."

Word 2 Data

Word 2 data consists of 8 data bits (b6–b13, as shown in Table 8.51). When Word 1 data is "0000," Word 2 data consists of copy control information. Word 2 copy control data must be transferred at the rate of two or more frames per two seconds.

Bits b6 and b7 specify the copy generation management system in an analog signal (CGMS-A). CGMS-A consists of two bits of digital information:

b6, b7:
    00   copying permitted
    01   reserved
    10   one copy permitted
    11   no copying permitted

This CGMS-A information must also usually be conveyed via the line 284 Extended Data Services Copy and Redistribution Control packet discussed in the closed captioning section.

Bits b8 and b9 specify the Analog Protection Service (APS) added to the analog NTSC or PAL video signal:

b8, b9:
    00   no Analog Protection Service
    01   PSP on, color striping off
    10   PSP on, 2-line color striping on
    11   PSP on, 4-line color striping on

[Figure 8.60. 525-Line Lines 20 and 283 WSS Timing. The waveform shows the sync level (40 IRE), blank level, color burst, start code "1," start code "0," and data b0–b19 at 70 ±10 IRE. Marked intervals are 11.20 ±0.30 µs and 49.1 ±0.44 µs; rise and fall times are 2235 ±50 ns (2T bar shaping).]

Table 8.51. 525-Line WSS Data Bit Assignments and Usage.

start code   "1"
start code   "0"
word 0       b0, b1
word 1       b2, b3, b4, b5
word 2       b6, b7, b8, b9, b10, b11, b12, b13
CRC          b14, b15, b16, b17, b18, b19

PSP is a pseudo-sync pulse operation that, if on, will be present on the composite, S-video, and Y (of YPbPr) analog video outputs.
Color striping operation inverts the normal phase of the first half of the color burst signal on certain scan lines on the composite and S-video analog video outputs.

This Analog Protection Service (APS) information usually must also be conveyed via the line 284 Extended Data Services Copy and Redistribution Control packet discussed in the closed captioning section.

Bit b10 specifies whether the source originated from an analog pre-recorded medium.

b10:
    0   not analog pre-recorded medium
    1   analog pre-recorded medium

Bits b11, b12, and b13 are reserved and are “000.”

Teletext

Teletext allows the transmission of text, graphics, and data. Data may be transmitted on any line, although the VBI is most commonly used. The teletext standards are specified by ETSI EN 300 706, ITU-R BT.653, and EIA-516.

For YPbPr and S-video interfaces, teletext is present on the Y signal. For analog RGB interfaces, teletext is present on all three signals.

There are many systems that use the teletext physical layer to transmit proprietary information. The advantage is that teletext has already been approved in many countries for broadcast, so certification for a new transmission technique is not required.

The data rate for teletext is much higher than that used for closed captioning, approaching 7 Mbps in some cases. Therefore, ghost cancellation is needed to recover the transmitted data reliably.

There are seven teletext systems defined, as shown in Table 8.52. System B (also known as World System Teletext, or WST) has become the de facto standard and most widely adopted solution.

Parameter           System A    System B    System C    System D

625-Line Video Systems
bit rate (Mbps)     6.203125    6.9375      5.734375    5.6427875
data amplitude      67 IRE      66 IRE      70 IRE      70 IRE
data per line       40 bytes    45 bytes    36 bytes    37 bytes

525-Line Video Systems
bit rate (Mbps)     –           5.727272    5.727272    5.727272
data amplitude      –           70 IRE      70 IRE      70 IRE
data per line       –           37 bytes    36 bytes    37 bytes

Table 8.52.
Summary of Teletext Systems and Parameters.

EIA-516, also referred to as NABTS (North American Broadcast Teletext Specification), saw limited use in the United States, and was an expansion of the BT.653 525-line system C standard.

Figure 8.61 illustrates the teletext data on a scan line. If a line normally contains a color burst signal, it will still be present if teletext data is present. The 16 bits of clock run-in (or clock sync) consist of alternating “1”s and “0”s.

Figures 8.62 and 8.63 illustrate the structure of teletext systems B and C, respectively.

System B Teletext Overview

Since teletext System B is the de facto teletext standard, a basic overview is presented here.

A teletext service typically consists of pages, with each page corresponding to a screen of information. The pages are transmitted one at a time, and after all pages have been transmitted, the cycle repeats, with a typical cycle time of about 30 seconds. However, the broadcaster may transmit some pages more frequently than others, if desired.

The teletext service is usually based on up to eight magazines (allowing up to eight independent teletext services), with each magazine containing up to 100 pages. Magazine 1 uses page numbers 100–199, magazine 2 uses page numbers 200–299, etc. Each page may also have sub-pages, used to extend the number of pages within a magazine.

Each page contains 24 rows, with up to 40 characters per row. A character may be a letter, number, symbol, or simple graphic. There are also control codes to select colors and other attributes such as blinking and double height.

In addition to teletext information, the teletext protocol may be used to transmit other information, such as subtitling, program delivery control (PDC), and private data.

Subtitling

Subtitling is similar to the closed captioning used in the United States. Open subtitles are the insertion of text directly into the picture prior to transmission.
Closed subtitles are transmitted separately from the picture. The transmission of closed subtitles in the UK uses teletext page 888. In the case where multiple languages are transmitted using teletext, separate pages are used for each language.

[Figure 8.61. Teletext Line Format. Labeled items: blank level, sync level, color burst, clock run-in, data and address.]

[Figure 8.62. Teletext System B Structure. Layers shown: application, presentation, session, transport, network, link, and physical. Labeled items: header packet, packets 26, 27, and 28, last packet of page, next header packet, page, page address (1 byte), data group, magazine/packet address (2 bytes), data block, byte sync (1 byte), data packet, clock sync (2 bytes), data unit.]

[Figure 8.63. Teletext System C Structure. Layers shown: application, presentation, session, transport, network, link, and physical. Labeled items: ITU-T T.101 Annex D, record header, records 1 through N, data group header (8 bytes), data groups 1 through N, P header (5 bytes), byte sync (1 byte), clock sync (2 bytes), data block, S (1 byte), data packet, data unit.]

Program Delivery Control (PDC)

Program Delivery Control (defined by ETSI EN 300 231 and ITU-R BT.809) is a system that controls VCR recording using teletext information. The VCR can be programmed to look for and record various types of programs or a specific program. Programs are recorded even if the transmission time changes for any reason.

There are two methods of transmitting PDC information via teletext: methods A and B.

Method A places the data on a viewable teletext page, and is usually transmitted on scan line 16. This method is also known as the Video Programming System (VPS).

Method B places the data on a hidden packet (packet 26) in the teletext signal.
This packet 26 data contains the data on each program, including channel, program date, and start time.

Data Broadcasting

Data broadcasting may be used to transmit information to private receivers. Typical applications include real-time financial information, airport flight schedules for hotels and travel agents, passenger information for railroads, software upgrades, etc.

Packets 0–23

A typical teletext page uses 24 packets, numbered 0–23, that correspond to the 24 rows on a displayed page. Packet 24 can add a status row at the bottom for user prompting.

For each packet, three bits specify the magazine address (1–8), and five bits specify the row address (0–23). The magazine and row address bits are Hamming error protected to permit single-bit errors to be corrected.

To save bandwidth, the whole address isn’t sent with all packets. Only packet 0 (also called the header packet) has all the address information such as row, page, and magazine address data. Packets 1–28 contain information that is part of the page identified by the most recent packet 0 of the same magazine.

The transmission of a page starts with a header packet. Subsequent packets with the same magazine address provide additional data for that page. These packets may be transmitted in any order, and interleaved with packets from other magazines. A page is considered complete when the next header packet for that magazine is received.

The general format for packet 0 is:

    clock run-in                 2 bytes
    framing code                 1 byte
    magazine and row address     2 bytes
    page number                  2 bytes
    subcode                      4 bytes
    control codes                2 bytes
    display data                 32 bytes

The general format for packets 1–23 is:

    clock run-in                 2 bytes
    framing code                 1 byte
    magazine and row address     2 bytes
    display data                 40 bytes

Packet 24

This packet defines an additional row for user prompting. Teletext decoders may use the data in packet 27 to react to prompts in the packet 24 display row.

Packet 25

This packet defines a replacement header line.
If present, the 40 bytes of data are displayed instead of the channel, page, time, and date from packet 8.30.

Packet 26

Packet 26 consists of:

    clock run-in                 2 bytes
    framing code                 1 byte
    magazine and row address     2 bytes
    designation code             1 byte
    13 3-byte data groups, each consisting of:
        7 data bits
        6 address bits
        5 mode bits
        6 Hamming bits

There are 15 variations of packet 26, defined by the designation code. Each of the 13 data groups specifies a display location and data relating to that location.

This packet is also used to extend the addressable range of the basic character set in order to support other languages, such as Arabic, Spanish, Hungarian, Chinese, etc.

For PDC, packet 26 contains data for each program, identifying the channel, program date, start time, and the cursor position of the program information on the page. When the user selects a program, the cursor position is linked to the appropriate packet 26 preselection data. This data is then used to program the VCR. When the program is transmitted, the program information is transmitted using packet 8.30 format 2. A match between the preselection data and the packet 8.30 data turns the VCR record mode on.

Packet 27

Packet 27 tells the teletext decoder how to respond to user selections for packet 24. There may be up to four packet 27s (packets 27/0 through 27/3), allowing up to 24 links. Packet 27 consists of:

    clock run-in                 2 bytes
    framing code                 1 byte
    magazine and row address     2 bytes
    designation code             1 byte
    link 1 (red)                 6 bytes
    link 2 (green)               6 bytes
    link 3 (yellow)              6 bytes
    link 4 (cyan)                6 bytes
    link 5 (next page)           6 bytes
    link 6 (index)               6 bytes
    link control data            1 byte
    page check digit             2 bytes

Each link consists of:

    7 data bits
    6 address bits
    5 mode bits
    6 Hamming bits

This packet contains information linking the current page to six page numbers (links). The four colored links correspond to the four colored Fastext page request keys on the remote.
Typically, these four keys correspond to four colored menu selections at the bottom of the display using packet 24. Selection of one of the colored page request keys results in the selection of the corresponding linked page.

The fifth link is used for specifying a page the user might want to see after the current page, such as the next page in a sequence.

The sixth link corresponds to the Fastext index key on the remote, and specifies the page address to go to when the index is selected.

Packets 28 and 29

These are used to define level 2 and level 3 pages to support higher resolution graphics, additional colors, alternate character sets, etc. They are similar in structure to packet 26.

Packet 8.30 Format 1

Packet 8.30 (magazine 8, packet 30) isn’t associated with any page, but is sent once per second. This packet is also known as the Television Service Data Packet, or TSDP. It contains data that notifies the teletext decoder about the transmission in general and the time.

    clock run-in                 2 bytes
    framing code                 1 byte
    magazine and row address     2 bytes
    designation code             1 byte
    initial teletext page        6 bytes
    network ID                   2 bytes
    time offset from UTC         1 byte
    date (Modified Julian Day)   3 bytes
    UTC time                     3 bytes
    TV program label             4 bytes
    status display               20 bytes

The Designation Code indicates whether the transmission is during the VBI or full-field.

Initial Teletext Page tells the decoder which page should be captured and stored on power-up. This is usually an index or menu page.

The Network Identification code identifies the transmitting network.

The TV Program Label indicates the program label for the current program.

Status Display is used to display a transmission status message.

Packet 8.30 Format 2

This format is used for PDC recorder control, and is transmitted once per second per stream.
It contains a program label indicating the start of each program, usually transmitted about 30 seconds before the start of the program to allow the VCR to detect it and get ready to record.

    clock run-in                 2 bytes
    framing code                 1 byte
    magazine and row address     2 bytes
    designation code             1 byte
    initial teletext page        6 bytes
    label channel ID             1 byte
    program control status       1 byte
    country and network ID       2 bytes
    program ID label             5 bytes
    country and network ID       2 bytes
    program type                 2 bytes
    status display               20 bytes

The content is the same as for Format 1, except for the 13 bytes of information before the status display information.

Label channel ID (LCI) identifies each of up to four PDC streams that may be transmitted simultaneously.

The Program Control Status (PCS) indicates real-time status information, such as the type of analog sound transmission.

The Country and Network ID (CNI) is split into two groups. The first part specifies the country and the second part specifies the network.

Program ID Label (PIL) specifies the month, day, and local time of the start of the program.

Program Type (PTY) is a code that indicates an intended audience or a particular series. Examples are “adult,” “children,” “music,” “drama,” etc.

Packet 31

Packet 31 is used for the transmission of data to private receivers. It consists of:

    clock run-in                 2 bytes
    framing code                 1 byte
    data channel group           1 byte
    message bits                 1 byte
    format type                  1 byte
    address length               1 byte
    address                      0–6 bytes
    repeat indicator             0–1 byte
    continuity indicator         0–1 byte
    data length                  0–1 byte
    user data                    28–36 bytes
    CRC                          2 bytes

AMOL (Automated Measurement of Lineups)

AMOL I

Lines 20, 22, 283, and/or 284 are used to transmit the AMOL I information, as shown in Figure 8.64. However, it may reside on any 480i VBI line, due to unintentional shifting caused by editing, compression, etc.

The 1 Mbps payload may change as often as every frame. Each of the 48 data bits is 1000 ±100 ns wide with a maximum rise and fall time of 300 ns.
A logical “1” has an amplitude of 55 ±5 IRE; a logical “0” has an amplitude of 0–10 IRE.

AMOL II

Lines 20, 22, 283, and/or 284 are used to transmit the AMOL II information, as shown in Figure 8.65. However, it may reside on any 480i VBI line, due to unintentional shifting caused by editing, compression, etc.

The 2 Mbps payload may change as often as every frame. Each of the 96 data bits is 500 ±50 ns wide with a maximum rise and fall time of 150 ns. A logical “1” has an amplitude of 55 ±5 IRE; a logical “0” has an amplitude of 0–10 IRE.

Raw VBI Data

Raw, or oversampled, VBI data is simply digitized VBI data. It is typically oversampled using a 2× video sample clock, such as 27 MHz for 480i and 54 MHz for 480p video. Use of the 2× video sample clock enables transferring the raw VBI data over a standard 8-bit BT.656 interface. VBI data may be present on any scan line, except during the serration and equalization intervals.

The raw VBI data is then converted to binary (or sliced) data and processed and/or passed through to the composite, S-video, and YPbPr analog video outputs so it may be decoded by the TV.

In the conversion from raw to sliced VBI data, the VBI decoders must compensate for varying DC offsets, amplitude variations, ghosting, and timing variations.

Hysteresis must also be used to prevent the VBI decoders from turning on and off rapidly due to noise and transmission errors. Once the desired VBI signal is found for (typically) 15 consecutive frames, VBI decoding should commence. When the desired VBI signal is not found on the appropriate scan lines for (typically) 45 consecutive frames, VBI decoding should stop.

Sliced VBI Data

Sliced, or binary, VBI data is commonly available from NTSC/PAL video decoders. This has the advantage of lower data rates since binary, rather than oversampled, data is present.
The primary disadvantage is the variety of techniques NTSC/PAL video decoder chip manufacturers use to transfer the sliced VBI data over the video interface.

[Figure 8.64. 525-Line Lines 20, 22, 283, and 284 AMOL I Timing. Labeled items: color burst, data (b0–b47), blank level, sync level (40 IRE), 70 ±10 IRE data amplitude, 12.00 ±1.00 μs timing interval, 250 ±50 ns rise/fall times (2T bar shaping).]

[Figure 8.65. 525-Line Lines 20, 22, 283, and 284 AMOL II Timing. Labeled items: color burst, data (b0–b95), blank level, sync level (40 IRE), 70 ±10 IRE data amplitude, 12.00 ±1.00 μs timing interval, 125 ±25 ns rise/fall times (2T bar shaping).]

NTSC/PAL Decoder Considerations

Closed Captioning

In addition to caption and text commands that clear the display, five other events typically force the display to be cleared:

(1) A change in the caption display mode, such as switching from CC1 to T1.

(2) A loss of video lock, such as on a channel change, forces the display to be cleared. The currently active display mode does not change. For example, if CC1 was selected before loss of video lock, it remains selected.

(3) Activation of autoblanking. If the caption signal has not been detected for (typically) 15 consecutive frames, or no new data for the selected channel has been received for (typically) 45 frames, the display memory is cleared. Once the caption signal has been detected for (typically) 15 consecutive frames, or new data has been received, it is displayed.

(4) A clear command (from the remote control, for example) forces the display to be cleared.

(5) Disabling caption decoding also forces the display to be cleared.

Widescreen Signaling

The decoder must be able to handle a variety of WSS inputs including:

(1) PAL or NTSC WSS signal on composite, S-video, or Y (of YPbPr).

(2) SCART analog inputs (DC offset indicator)

(3) S-video analog inputs (DC offset indicator)

In addition to automatically processing the video signal to fit a 4:3 or 16:9 display based on the WSS data, the decoder should also support manual overrides in case the user wishes a specific mode of operation due to personal preferences.

Software uses this aspect ratio information, user preferences, and display format to assist in properly processing the program for display.

Ghost Cancellation

Ghost cancellation (the removal of undesired reflections present in the signal) is required due to the high data rate of some services, such as teletext. Ghosting greater than 100 ns and –12 dB corrupts teletext data. Ghosting greater than –3 dB is difficult to remove cost-effectively in hardware or software, while ghosting less than –12 dB need not be removed. Ghost cancellation for VBI data is not as complex as ghost cancellation for active video.

Unfortunately, the GCR (ghost cancellation reference) signal is not usually present. Thus, a ghost cancellation algorithm must determine the amount of ghosting using other available signals, such as the serration and equalization pulses.

The NTSC GCR signal is specified in ATSC A/49 and ITU-R BT.1124. If present, it occupies lines 19 and 282. The GCR permits the detection of ghosting from –3 to +45 μs, and follows an 8-field sequence.

The PAL GCR signal is specified in BT.1124 and ETSI ETS 300 732. If present, it occupies line 318. The GCR permits the detection of ghosting from –3 to +45 μs, and follows a 4-frame sequence.

Enhanced Television Programming

The enhanced television programming standard (SMPTE 363M) is used for creating and delivering enhanced and interactive programs. The enhanced content can be delivered over a variety of mediums, including analog and digital television broadcasts, using terrestrial, cable, and satellite networks.

In defining how to create enhanced content, the specification defines the minimum receiver functionality. To minimize the creation of new specifications, it leverages Internet technologies such as HTML and Javascript.
The benefits of doing this are that there are already millions of pages of potential content, and the ability to use existing web-authoring tools. The specification mandates that receivers support, as a minimum, HTML 4.0, Javascript 1.1, and Cascading Style Sheets. Supporting additional capabilities, such as Java and VRML, is optional. This ensures content is available to the maximum number of viewers. For increased capability, a new “tv:” attribute is added to the HTML. This attribute enables the insertion of the television program into the content, and may be used in an HTML document anywhere that a regular image may be placed. Creating an enhanced content page that displays the current television channel anywhere on the display is as easy as inserting an image in an HTML document. The specification also defines how the receivers obtain the content and how they are informed that enhancements are available. The latter task is accomplished with triggers. Triggers Triggers alert receivers to content enhancements, and contain information about the enhancements. Among other things, triggers contain a universal resource locator (URL) that defines the location of the enhanced content. Content may reside locally—such as when delivered over the network and cached to a local hard drive—or it may reside on the Internet or another network. Triggers may also contain a human-readable description of the content. For example, it may contain the description “Press ORDER to order this product,” which can be displayed for the viewer. Triggers also may contain expiration information, indicating how long the enhancement should be offered to the viewer. Lastly, triggers may contain scripts that trigger the execution of Javascript within the associated HTML page, to support synchronization of the enhanced content with the video signal and updating of dynamic screen data. The processing of triggers is defined in SMPTE 363M and is independent of the method used to carry them. 
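The trigger fields listed above can be pictured as a small record. The sketch below is only an in-memory model of those fields, assuming a receiver has already parsed a trigger; the class name `Trigger`, its attribute names, and the rule that a trigger without an expiration never lapses are all hypothetical and are not taken from SMPTE 363M, which defines the actual trigger syntax.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Trigger:
    """Hypothetical in-memory model of the trigger fields named in the text."""
    url: str                            # location of the enhanced content
    description: Optional[str] = None   # human-readable text shown to the viewer
    expires: Optional[datetime] = None  # when the enhancement stops being offered
    script: Optional[str] = None        # script fragment for the associated page

    def is_active(self, now: datetime) -> bool:
        """Assumption: a trigger with no expiration is treated as always active."""
        return self.expires is None or now < self.expires


# Example trigger using the description quoted in the text; URL and date are made up.
order = Trigger(
    url="http://example.com/order.html",
    description="Press ORDER to order this product",
    expires=datetime(2007, 1, 1, tzinfo=timezone.utc),
)
```

A receiver built along these lines would call `is_active` with the current time before offering the enhancement to the viewer.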
Transports

Besides defining how content is displayed and how the receiver is notified of new content, the specification also defines how content is delivered. Because a receiver may not have an Internet connection, the specification describes two models for delivering content. These two models are called transports, and the two transports are referred to as Transport Type A and Transport Type B.

If the receiver has a back-channel (or return path) to the Internet, Transport Type A will broadcast the trigger and the content will be pulled over the Internet.

If the receiver does not have an Internet connection, Transport Type B provides for delivery of both triggers and content via the broadcast medium. Announcements are sent over the network to associate triggers with content streams. An announcement describes the content, and may include information regarding bandwidth, storage requirements, and language.

Delivery Protocols

For traditional bi-directional Internet communication, the Hypertext Transfer Protocol (HTTP) defines how data is transferred at the application level. For uni-directional broadcasts where a two-way connection is not available, SMPTE 364M defines a uni-directional application-level protocol for data delivery: Uni-directional Hypertext Transfer Protocol (UHTTP).

Like HTTP, UHTTP uses traditional URL naming schemes to reference content. Content can reference enhancement pages using the standard “http:” and “ftp:” naming schemes. A “lid:,” or local identifier, URL is also available to allow reference to content that exists locally (such as on the receiver’s hard drive) as opposed to on the Internet or other network.

Bindings

How data is delivered over a specific network is called “binding.” Bindings have been defined for NTSC and PAL.

NTSC Bindings

Transport Type A triggers are broadcast on data channel 2 of the CEA-608 captioning signal.
Transport Type B binding also includes a mechanism for delivering IP multicast packets over the vertical blanking interval (VBI), otherwise known as IP over VBI (IP/VBI). At the lowest level, the television signal transports NABTS (North American Basic Teletext Standard) packets during the VBI. These NABTS packets are recovered to form a sequential data stream (encapsulated in a SLIP-like protocol) that is unframed to produce IP packets.

PAL Bindings

Both transport types are based on carriage of IP multicast packets in VBI lines of a PAL system by means of teletext packets 30 or 31.

Transport Type A triggers are carried in UDP/IP multicast packets, delivered to address 224.0.23.13 and port 2670.

Transport Type B (described in SMPTE 357M) carries a single trigger in a single UDP/IP multicast packet, delivered on the address and port defined in the SDP announcement for the enhanced television program. The trigger protocol is very lightweight in order to provide quick synchronization.

References

1. Advanced Television Enhancement Forum, Enhanced Content Specification, 1999.
2. ATSC A/49, 13 May 1993, Ghost Canceling Reference Signal for NTSC.
3. BBC Technical Requirements for Digital Television Services, Version 1.0, February 3, 1999, BBC Broadcast.
4. CEA-608, August 2005, Line 21 Data Service.
5. EIA-189–A, July 1976, Encoded Color Bar Signal.
6. EIA-516, May 1988, North American Basic Teletext Specification (NABTS).
7. EIA-J CPR–1204, Transfer Method of Video ID Information Using Vertical Blanking Interval (525-Line System), March 1997.
8. ETSI EN 300 163, Television Systems: NICAM 728: Transmission of Two Channel Digital Sound with Terrestrial Television Systems B, G, H, I, K1, and L, March 1998.
9. ETSI EN 300 231, Television Systems: Specification of the Domestic Video Programme Delivery Control System (PDC), April 2003.
10. ETSI EN 300 294, Television Systems: 625-Line Television Widescreen Signaling (WSS), April 2003.
11. ETSI EN 300 706, Enhanced Teletext Specification, April 2003.
12. ETSI EN 300 708, Television Systems: Data Transmission within Teletext, April 2003.
13. ETSI ETS 300 731, Television Systems: Enhanced 625-Line Phased Alternate Line (PAL) Television: PALplus, March 1997.
14. ETSI ETS 300 732, Television Systems: Enhanced 625-Line PAL/SECAM Television; Ghost Cancellation Reference (GCR) Signals, January 1997.
15. Faroudja, Yves Charles, NTSC and Beyond, IEEE Transactions on Consumer Electronics, Vol. 34, No. 1, February 1988.
16. IEC 61880, 1998–1, Video Systems (525/60)—Video and Accompanied Data Using the Vertical Blanking Interval—Analog Interface.
17. ITU-R BS.707–3, 1998, Transmission of Multisound in Terrestrial Television Systems PAL B, G, H, and I and SECAM D, K, K1, and L.
18. ITU-R BT.470–6, 1998, Conventional Television Systems.
19. ITU-R BT.471–1, 1986, Nomenclature and Description of Colour Bar Signals.
20. ITU-R BT.472–3, 1990, Video Frequency Characteristics of a Television System to Be Used for the International Exchange of Programmes Between Countries That Have Adopted 625-Line Colour or Monochrome Systems.
21. ITU-R BT.473–5, 1990, Insertion of Test Signals in the Field-Blanking Interval of Monochrome and Colour Television Signals.
22. ITU-R BT.569–2, 1986, Definition of Parameters for Simplified Automatic Measurement of Television Insertion Test Signals.
23. ITU-R BT.653–3, 1998, Teletext Systems.
24. ITU-R BT.809, 1992, Programme Delivery Control (PDC) System for Video Recording.
25. ITU-R BT.1118, 1994, Enhanced Compatible Widescreen Television Based on Conventional Television Systems.
26. ITU-R BT.1119–2, 1998, Wide-Screen Signaling for Broadcasting.
27. ITU-R BT.1124, 1994, Reference Signals for Ghost Canceling in Analogue Television Systems.
28. ITU-R BT.1197–1, 1998, Enhanced Widescreen PAL TV Transmission System (the PALplus System).
29. ITU-R BT.1298, 1997, Enhanced Wide-Screen NTSC TV Transmission System.
30. Multichannel TV Sound System BTSC System Recommended Practices, EIA Television Systems Bulletin No. 5 (TVSB5).
31. NTSC Video Measurements, Tektronix, Inc., 1997.
32. SMPTE 12M–1999, Television, Audio and Film—Time and Control Code.
33. SMPTE 170M–2004, Television—Composite Analog Video Signal—NTSC for Studio Applications.
34. SMPTE 262M–1995, Television, Audio and Film—Binary Groups of Time and Control Codes—Storage and Transmission of Data.
35. SMPTE 309M–1999, Television—Transmission of Date and Time Zone Information in Binary Groups of Time and Control Code.
36. SMPTE 357M–2002, Television—Declarative Data Essence—Internet Protocol Multicast Encapsulation.
37. SMPTE 361M–2002, Television—NTSC IP and Trigger Binding to VBI.
38. SMPTE 363M–2002, Television—Declarative Data Essence—Content Level 1.
39. SMPTE 364M–2001, Declarative Data Essence—Unidirectional Hypertext Transport Protocol.
40. SMPTE RP-164–1996, Location of Vertical Interval Time Code.
41. SMPTE RP-186–1995, Video Index Information Coding for 525- and 625-Line Television Systems.
42. SMPTE RP-201–1999, Encoding Film Transfer Information Using Vertical Interval Time Code.
43. Specification of Television Standards for 625-Line System-I Transmissions, 1971, Independent Television Authority (ITA) and British Broadcasting Corporation (BBC).
44. Television Measurements, NTSC Systems, Tektronix, Inc., 1998.
45. Television Measurements, PAL Systems, Tektronix, Inc., 1990.

Chapter 9: NTSC and PAL Digital Encoding and Decoding

Although not exactly digital video, the NTSC and PAL composite color video formats are currently the most common formats for video. Although the video signals themselves are analog, they can be encoded and decoded almost entirely digitally.
Analog NTSC and PAL encoders and decoders have been available for some time. However, they have been difficult to use, required adjustment, and offered limited video quality.

Using digital techniques to implement NTSC and PAL encoding and decoding offers many advantages such as ease of use, minimum analog adjustments, and excellent video quality.

In addition to composite video, S-video is supported by consumer and pro-video equipment, and should also be implemented. S-video uses separate luminance (Y) and chrominance (C) analog video signals so higher quality may be maintained by eliminating the Y/C separation process.

This chapter discusses the design of a digital encoder (Figure 9.1) and decoder (Figure 9.21) that support composite and S-video (M) NTSC and (B, D, G, H, I, NC) PAL video signals. (M) and (N) PAL are easily accommodated with some slight modifications.

NTSC encoders and decoders are usually based on the YCbCr, YUV, or YIQ color space. PAL encoders and decoders are usually based on the YCbCr or YUV color space.

Video Standard               Sample Clock   Applications    Active           Total         Field Rate
                             Rate                           Resolution       Resolution    (per second)

(M) NTSC, (M) PAL            9 MHz          SVCD            480 × 480i       572 × 525i    59.94 interlaced
                             13.5 MHz       BT.601          720 (1) × 480i   858 × 525i
                                            MPEG-2          704 × 480i
                                            DV              720 × 480i
                             12.27 MHz      square pixels   640 × 480i       780 × 525i

(B, D, G, H, I, N, NC) PAL   9 MHz          SVCD            480 × 576i       576 × 625i    50 interlaced
                             14.75 MHz      square pixels   768 × 576i       944 × 625i
                             13.5 MHz       BT.601          720 (2) × 576i   864 × 625i
                                            MPEG-2          704 × 576i
                                            DV              720 × 576i

Table 9.1. Common NTSC/PAL Sample Rates and Resolutions.
(1) Typically 716 true active samples between 10% blanking points.
(2) Typically 702 true active samples between 50% blanking points.

NTSC and PAL Encoding

YCbCr input data has a nominal range of 16–235 for Y and 16–240 for Cb and Cr. RGB input data has a range of 0–255; pro-video applications may use a nominal range of 16–235.
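The nominal ranges above are 8-bit values, while the conversion equations later in this chapter subtract 10-bit offsets of 64 and 512. The sketch below shows one simple way to relate the two and to test whether a sample is within its nominal range. The function names are hypothetical, and the left shift by two bits is only an assumption for illustration; a real design may obtain the two extra bits from the oversampling filter instead.

```python
# Nominal 8-bit ranges quoted in the text.
Y_RANGE_8BIT = (16, 235)
CBCR_RANGE_8BIT = (16, 240)


def to_10bit(value_8bit):
    """Scale an 8-bit code to 10 bits by appending two zero LSBs (an assumption)."""
    return value_8bit << 2


def in_nominal_range(y, cb, cr):
    """Return True if an 8-bit YCbCr sample lies inside the nominal ranges."""
    y_ok = Y_RANGE_8BIT[0] <= y <= Y_RANGE_8BIT[1]
    cb_ok = CBCR_RANGE_8BIT[0] <= cb <= CBCR_RANGE_8BIT[1]
    cr_ok = CBCR_RANGE_8BIT[0] <= cr <= CBCR_RANGE_8BIT[1]
    return y_ok and cb_ok and cr_ok


# 8-bit black (16) and the chroma zero point (128) map to the 10-bit offsets 64 and 512.
black_10bit = to_10bit(16)
chroma_zero_10bit = to_10bit(128)
```

With this scaling, the 8-bit Y range of 16–235 becomes 64–940 and the Cb/Cr range of 16–240 becomes 64–960.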
As YCbCr values outside these ranges result in overflowing the standard YIQ or YUV ranges for some color combinations, one of three things may be done, in order of preference: (a) allow the video signal to be generated using the extended YIQ or YUV ranges; (b) limit the color saturation to ensure a legal video signal is generated; or (c) clip the YIQ or YUV levels to the valid ranges.

4:1:1, 4:2:0, or 4:2:2 YCbCr data must be converted to 4:4:4 YCbCr data before being converted to YIQ or YUV data. The chrominance lowpass filters will not perform the interpolation properly.

Table 9.1 lists some of the common sample rates and resolutions.

2× Oversampling

2× oversampling generates 8:8:8 YCbCr or RGB data, simplifying the analog output filters. The oversampler is also a convenient place to convert from 8-bit to 10-bit data, providing an increase in video quality.

Color Space Conversion

Choosing the 10-bit video levels to be white = 800 and sync = 16, and knowing that the sync-to-white amplitude is 1V, the full-scale output of the D/A converters (DACs) is therefore set to 1.305V.

(M) NTSC, (M, N) PAL

Since (M) NTSC and (M, N) PAL have a 7.5 IRE blanking pedestal and a 40 IRE sync amplitude, the color space conversion equations are derived so as to generate 0.660V of active video.

[Figure 9.1. Typical NTSC/PAL Digital Encoder Implementation. Blocks shown: video timing and genlock control (HSYNC#, VSYNC#, BLANK#, FIELD_0, FIELD_1, CLOCK), blank pedestal, blank rise/fall expander, sync rise/fall expander, 2× oversampling for Y, Cr, and Cb, 1.3 MHz lowpass filters, multiplexers, burst control, sin ROM, cos ROM, DTO, and DACs for the Y, NTSC/PAL, and C outputs.]

YUV Color Space Processing

Modern encoder designs are now based on the YUV color space.
For these encoders, the YCbCr to YUV equations are:

Y = 0.591(Y601 – 64)
U = 0.504(Cb – 512)
V = 0.711(Cr – 512)

The R´G´B´ to YUV equations are:

Y = 0.151R´ + 0.297G´ + 0.058B´
U = –0.074R´ – 0.147G´ + 0.221B´
V = 0.312R´ – 0.261G´ – 0.051B´

For pro-video applications using a 10-bit nominal range of 64–940 for R´G´B´, the R´G´B´ to YUV equations are:

Y = 0.177(R´ – 64) + 0.347(G´ – 64) + 0.067(B´ – 64)
U = –0.087(R´ – 64) – 0.171(G´ – 64) + 0.258(B´ – 64)
V = 0.364(R´ – 64) – 0.305(G´ – 64) – 0.059(B´ – 64)

Y has a nominal range of 0 to 518, U a nominal range of 0 to ±226, and V a nominal range of 0 to ±319. Negative values of Y should be supported to allow test signals, keying information, and real-world video to be passed through the encoder with minimum corruption.

YIQ Color Space Processing

For older NTSC encoder designs based on the YIQ color space, the YCbCr to YIQ equations are:

Y = 0.591(Y601 – 64)
I = 0.596(Cr – 512) – 0.274(Cb – 512)
Q = 0.387(Cr – 512) + 0.423(Cb – 512)

The R´G´B´ to YIQ equations are:

Y = 0.151R´ + 0.297G´ + 0.058B´
I = 0.302R´ – 0.139G´ – 0.163B´
Q = 0.107R´ – 0.265G´ + 0.158B´

For pro-video applications using a 10-bit nominal range of 64–940 for R´G´B´, the R´G´B´ to YIQ equations are:

Y = 0.177(R´ – 64) + 0.347(G´ – 64) + 0.067(B´ – 64)
I = 0.352(R´ – 64) – 0.162(G´ – 64) – 0.190(B´ – 64)
Q = 0.125(R´ – 64) – 0.309(G´ – 64) + 0.184(B´ – 64)

Y has a nominal range of 0 to 518, I a nominal range of 0 to ±309, and Q a nominal range of 0 to ±271. Negative values of Y should be supported to allow test signals, keying information, and real-world video to be passed through the encoder with minimum corruption.

YCbCr Color Space Processing

If the design is based on the YUV color space, the Cb and Cr conversion to U and V may be avoided by scaling the sin and cos values during the modulation process or scaling the color difference lowpass filter coefficients. This has the advantage of reducing data path processing.
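As an illustration of the (M) NTSC equations above, the following Python sketch applies the YCbCr to YUV conversion to 10-bit samples and shows one way to arrive at the 0.591 luminance gain and the 1.305V DAC full scale. The names `ycbcr_to_yuv_ntsc`, `y_gain`, and `dac_full_scale_v` are hypothetical. The black level of 282 is the (M) NTSC value given in Table 9.2, and the Cb/Cr maximum of 960 is an assumption (the 10-bit equivalent of the 8-bit limit of 240).

```python
# Levels taken from the text: 10-bit white = 800, sync = 16.
# BLACK = 282 is the (M) NTSC black level (blank 240 plus the pedestal of 42).
WHITE, BLACK, SYNC = 800, 282, 16

# Sync-to-white spans 1 V, so the 1023 steps of a 10-bit DAC span this voltage.
dac_full_scale_v = 1023 / (WHITE - SYNC)   # about 1.305 V

# Y601 spans 64..940 (876 codes) and must map onto black..white (518 codes).
y_gain = (WHITE - BLACK) / (940 - 64)      # about 0.591


def ycbcr_to_yuv_ntsc(y601, cb, cr):
    """Convert 10-bit YCbCr to (M) NTSC YUV using the coefficients in the text."""
    y = 0.591 * (y601 - 64)
    u = 0.504 * (cb - 512)
    v = 0.711 * (cr - 512)
    return y, u, v


if __name__ == "__main__":
    print(round(dac_full_scale_v, 3), round(y_gain, 3))
    # Nominal maxima: assumes Cb and Cr peak at 960 in 10-bit form.
    print(ycbcr_to_yuv_ntsc(940, 960, 960))
```

With nominal maximum inputs the results round to 518, 226, and 319, which agree with the nominal Y, U, and V ranges quoted above.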
NTSC–J

Since the version of (M) NTSC used in Japan has a 0 IRE blanking pedestal, the color space conversion equations are derived so as to generate 0.714V of active video.

YUV Color Space Processing

The YCbCr to YUV equations are:

Y = 0.639(Y601 – 64)
U = 0.545(Cb – 512)
V = 0.769(Cr – 512)

The R´G´B´ to YUV equations are:

Y = 0.164R´ + 0.321G´ + 0.062B´
U = –0.080R´ – 0.159G´ + 0.239B´
V = 0.337R´ – 0.282G´ – 0.055B´

For pro-video applications using a 10-bit nominal range of 64–940 for R´G´B´, the R´G´B´ to YUV equations are:

Y = 0.191(R´ – 64) + 0.375(G´ – 64) + 0.073(B´ – 64)
U = –0.094(R´ – 64) – 0.185(G´ – 64) + 0.279(B´ – 64)
V = 0.393(R´ – 64) – 0.329(G´ – 64) – 0.064(B´ – 64)

Y has a nominal range of 0 to 560, U a nominal range of 0 to ±244, and V a nominal range of 0 to ±344. Negative values of Y should be supported to allow test signals, keying information, and real-world video to be passed through the encoder with minimum corruption.

YIQ Color Space Processing

For older encoder designs based on the YIQ color space, the YCbCr to YIQ equations are:

Y = 0.639(Y601 – 64)
I = 0.645(Cr – 512) – 0.297(Cb – 512)
Q = 0.419(Cr – 512) + 0.457(Cb – 512)

The R´G´B´ to YIQ equations are:

Y = 0.164R´ + 0.321G´ + 0.062B´
I = 0.326R´ – 0.150G´ – 0.176B´
Q = 0.116R´ – 0.286G´ + 0.170B´

For pro-video applications using a 10-bit nominal range of 64–940 for R´G´B´, the R´G´B´ to YIQ equations are:

Y = 0.191(R´ – 64) + 0.375(G´ – 64) + 0.073(B´ – 64)
I = 0.381(R´ – 64) – 0.176(G´ – 64) – 0.205(B´ – 64)
Q = 0.135(R´ – 64) – 0.334(G´ – 64) + 0.199(B´ – 64)

Y has a nominal range of 0 to 560, I a nominal range of 0 to ±334, and Q a nominal range of 0 to ±293. Negative values of Y should be supported to allow test signals, keying information, and real-world video to be passed through the encoder with minimum corruption.

YCbCr Color Space Processing

If the design is based on the YUV color space, the Cb and Cr conversion to U and V may be avoided by scaling the sin and cos values during the modulation process or scaling the color difference lowpass filter coefficients. This has the advantage of reducing data path processing.

(B, D, G, H, I, NC) PAL

Since these PAL standards have a 0 IRE blanking pedestal and a 43 IRE sync amplitude, the color space conversion equations are derived so as to generate 0.7V of active video.

YUV Color Space Processing

The YCbCr to YUV equations are:

Y = 0.625(Y601 – 64)
U = 0.533(Cb – 512)
V = 0.752(Cr – 512)

The R´G´B´ to YUV equations are:

Y = 0.160R´ + 0.314G´ + 0.061B´
U = –0.079R´ – 0.155G´ + 0.234B´
V = 0.329R´ – 0.275G´ – 0.054B´

For pro-video applications using a 10-bit nominal range of 64–940 for R´G´B´, the R´G´B´ to YUV equations are:

Y = 0.187(R´ – 64) + 0.367(G´ – 64) + 0.071(B´ – 64)
U = –0.092(R´ – 64) – 0.181(G´ – 64) + 0.273(B´ – 64)
V = 0.385(R´ – 64) – 0.322(G´ – 64) – 0.063(B´ – 64)

Y has a nominal range of 0 to 548, U a nominal range of 0 to ±239, and V a nominal range of 0 to ±337. Negative values of Y should be supported to allow test signals, keying information, and real-world video to be passed through the encoder with minimum corruption.

YCbCr Color Space Processing

If the design is based on the YUV color space, the Cb and Cr conversion to U and V may be avoided by scaling the sin and cos values during the modulation process or scaling the color difference lowpass filter coefficients. This has the advantage of reducing data path processing.

Luminance (Y) Processing

Lowpass filtering to about 6 MHz must be done to remove high-frequency components generated as a result of the 2× oversampling process.
An optional notch filter may also be used to remove the color subcarrier frequency from the luminance information. This improves decoded video quality for decoders that use simple Y/C separation. The notch filter should be disabled when generating S-video, RGB, or YPbPr video signals.

Next, any blanking pedestal is added during active video, and the blanking and sync information is added.

(M) NTSC, (M, N) PAL

As (M) NTSC and (M, N) PAL have a 7.5 IRE blanking pedestal, a value of 42 is added to the luminance data during active video. 0 is added during the blank time.

After the blanking pedestal is added, the luminance data is clamped by a blanking signal that has a raised cosine distribution to slow the slew rate of the start and end of the video signal. Typical blank rise and fall times are 140 ±20 ns for NTSC and 300 ±100 ns for PAL.

Digital composite sync information is added to the luminance data after the blank processing has been performed. Values of 16 (sync present) or 240 (no sync) are assigned. The sync rise and fall times should be processed to generate a raised cosine distribution (between 16 and 240) to slow the slew rate of the sync signal. Typical sync rise and fall times are 140 ±20 ns for NTSC and 250 ±50 ns for PAL, although the encoder should generate sync edges of about 130 ns (NTSC) or 240 ns (PAL) to compensate for the analog output filters slowing the sync edges.

At this point, we have digital luminance with sync and blanking information, as shown in Table 9.2.

NTSC–J

When generating NTSC–J video, there is a 0 IRE blanking pedestal. Thus, no blanking pedestal is added to the luminance data during active video. Otherwise, the processing is the same as for (M) NTSC.

(B, D, G, H, I, NC) PAL

When generating (B, D, G, H, I, NC) PAL video, there is a 0 IRE blanking pedestal. Thus, no blanking pedestal is added to the luminance data during active video.
Blanking information is inserted using the same technique as used for (M) NTSC. However, typical blank rise and fall times are 300 ±100 ns.

Composite sync information is added using the same technique as used for (M) NTSC, except values of 16 (sync present) or 252 (no sync) are used. Typical sync rise and fall times are 250 ±50 ns, although the encoder should generate sync edges of about 240 ns to compensate for the analog output filters slowing the sync edges.

At this point, we have digital luminance with sync and blanking information, as shown in Table 9.2.

Analog Luminance (Y) Generation

The digital luminance data may drive a 10-bit DAC that generates a 0–1.305V output to generate the Y video signal of an S-video (Y/C) interface.

Figures 9.2 and 9.3 show the luminance video waveforms for 75% color bars. The numbers on the luminance levels indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V. The video signal at the connector should have a source impedance of 75 Ω.

As the sample-and-hold action of the DAC introduces a (sin x)/x characteristic, the video data may be digitally filtered by a [(sin x)/x]⁻¹ filter to compensate. Alternately, as an analog lowpass filter is usually present after the DAC, the correction may take place in the analog filter.

As an option, the ability to delay the digital Y information a programmable number of clock cycles before driving the DAC may be useful. If the analog luminance video is lowpass filtered after the DAC, and the analog chrominance video is bandpass filtered after its DAC, the chrominance video path may have a longer delay (typically up to about 400 ns) than the luminance video path. By adjusting the delay of the Y data, the analog luminance and chrominance video will be aligned more closely after filtering, simplifying the analog design.

| Video Level | (M) NTSC | NTSC–J | (B, D, G, H, I, NC) PAL | (M, N) PAL |
|---|---|---|---|---|
| white | 800 | 800 | 800 | 800 |
| black | 282 | 240 | 252 | 282 |
| blank | 240 | 240 | 252 | 240 |
| sync | 16 | 16 | 16 | 16 |

Table 9.2. 10-Bit Digital Luminance Values.

Figure 9.2. (M) NTSC Luminance (Y) Video Signal for 75% Color Bars. Indicated luminance levels are 10-bit values. (Waveform: white level 800 at 1.020 V, 100 IRE; bar levels from white through blue of 671, 626, 554, 510, 442, 398, and 326; black level 282 at 0.357 V, 7.5 IRE above blank; blank level 240 at 0.306 V; sync level 16 at 0.020 V, 40 IRE below blank.)

Figure 9.3. (B, D, G, H, I) PAL Luminance (Y) Video Signal for 75% Color Bars. Indicated luminance levels are 10-bit values. (Waveform: white level 800 at 1.020 V, 100 IRE; bar levels from white through blue of 800, 616, 540, 493, 422, 375, and 299; black/blank level 252 at 0.321 V; sync level 16 at 0.020 V, 43 IRE below blank.)

Color Difference Processing

Lowpass Filtering

The color difference signals (CbCr, UV, or IQ) should be lowpass filtered using a Gaussian filter. This filter type minimizes ringing and overshoot, avoiding the generation of visual artifacts on sharp edges.

If the encoder is used in a video editing application, the filters should have a maximum ripple of ±0.1 dB in the passband. This minimizes the accumulation of gain and loss artifacts due to the filters, especially when multiple passes through the encoding and decoding processes are done. At the final encoding point, Gaussian filters may be used.

YCbCr and YUV Color Space

Cb and Cr, or U and V, are lowpass filtered to about 1.3 MHz. Typical filter characteristics are <2 dB attenuation at 1.3 MHz and >20 dB attenuation at 3.6 MHz. The filter characteristics are shown in Figure 9.4.

YIQ Color Space

Q is lowpass filtered to about 0.6 MHz. Typical filter characteristics are <2 dB attenuation at 0.4 MHz, <6 dB attenuation at 0.5 MHz, and >6 dB attenuation at 0.6 MHz. The filter characteristics are shown in Figure 9.5.

Typical filter characteristics for I are the same as for U and V.

Filter Considerations

The modulation process is shown in spectral terms in Figures 9.6 through 9.9.
The frequency spectra of the modulation process are the same as those if the modulation process were analog, but are repeated at harmonics of the sample rate.

Using wide-band (1.3 MHz) filters, the modulated chrominance spectra overlap near the zero frequency regions, resulting in aliasing. Also, there may be considerable aliasing just above the subcarrier frequency. For these reasons, the use of narrower-band lowpass filters (0.6 MHz) may be more appropriate.

Wide-band Gaussian filters ensure optimum compatibility with monochrome displays by minimizing the artifacts at the edges of colored objects. A narrower, sharper-cut lowpass filter would emphasize the subcarrier signal at these edges, resulting in ringing. If monochrome compatibility can be ignored, a beneficial effect of narrower filters would be to reduce the spread of the chrominance into the low-frequency luminance (resulting in low-frequency cross-luminance), which is difficult to suppress in a decoder.

Also, although the encoder may maintain a wide chrominance bandwidth, the bandwidth of the color difference signals in a decoder is usually much narrower. In the decoder, loss of the chrominance upper sidebands (due to lowpass filtering the video signal to 4.2–5.5 MHz) contributes to ringing and color difference crosstalk on color transitions. Any increase in the decoder chrominance bandwidth causes a proportionate increase in cross-color.

Figure 9.4. Typical 1.3 MHz Lowpass Digital Filter Characteristics. (Plot: amplitude, 0.0 to 1.0, versus frequency, 0 to 6 MHz.)

Figure 9.5. Typical 0.6 MHz Lowpass Digital Filter Characteristics. (Plot: amplitude, 0.0 to 1.0, versus frequency, 0.0 to 2.0 MHz.)

Figure 9.6. Frequency Spectra for NTSC Digital Chrominance Modulation (FS = 13.5 MHz, FSC = 3.58 MHz).
(A) Lowpass filtered U and V signals. (B) Color subcarrier. (C) Modulated chrominance spectrum produced by convolving (A) and (B). (Three panels of |A| versus frequency, –10 to 10 MHz; panel (B) marks –FS + FSC, –FSC, FSC, and FS – FSC.)

Figure 9.7. Frequency Spectra for NTSC Digital Chrominance Modulation (FS = 12.27 MHz, FSC = 3.58 MHz). (A) Lowpass filtered U and V signals. (B) Color subcarrier. (C) Modulated chrominance spectrum produced by convolving (A) and (B).

Figure 9.8. Frequency Spectra for PAL Digital Chrominance Modulation (FS = 13.5 MHz, FSC = 4.43 MHz). (A) Lowpass filtered U and V signals. (B) Color subcarrier. (C) Modulated chrominance spectrum produced by convolving (A) and (B).

Figure 9.9. Frequency Spectra for PAL Digital Chrominance Modulation (FS = 14.75 MHz, FSC = 4.43 MHz). (A) Lowpass filtered U and V signals. (B) Color subcarrier. (C) Modulated chrominance spectrum produced by convolving (A) and (B).

Chrominance (C) Modulation

(M) NTSC, NTSC–J

During active video, the CbCr, UV, or IQ data modulate sin and cos subcarriers, as shown in Figure 9.1, resulting in digital chrominance (C) data. For this design, the 11-bit reference subcarrier phase (see Figure 9.17) and the burst phase are the same (180°).

For YUV and YCbCr processing, 180° must be added to the 11-bit reference subcarrier phase during active video time so the output of the sin and cos ROMs have the proper subcarrier phases (0° and 90°, respectively).

For YIQ processing, 213° must be added to the 11-bit reference subcarrier phase during active video time so the output of the sin and cos ROMs have the proper subcarrier phases (33° and 123°, respectively).

For the following equations,

ω = 2πFSC
FSC = 3.579545 MHz (±10 Hz)

YUV Color Space

As discussed in Chapter 8, the chrominance signal may be represented by:

(U sin ωt) + (V cos ωt)

Chrominance amplitudes are ±sqrt(U² + V²).

YCbCr Color Space

If the encoder is based on the YCbCr color space, the chrominance signal may be represented by:

(Cb – 512)(0.504)(sin ωt) + (Cr – 512)(0.711)(cos ωt)

For NTSC–J systems, the equations are:

(Cb – 512)(0.545)(sin ωt) + (Cr – 512)(0.769)(cos ωt)

In these cases, the values in the sin and cos ROMs are scaled by the indicated values to allow the modulator multipliers to accept Cb and Cr data directly, instead of U and V data.

YIQ Color Space

As discussed in Chapter 8, the chrominance signal may also be represented by:

(Q sin (ωt + 33°)) + (I cos (ωt + 33°))

Chrominance amplitudes are ±sqrt(I² + Q²).

(B, D, G, H, I, M, N, NC) PAL

During active video, the CbCr or UV data modulate sin and cos subcarriers, as shown in Figure 9.1, resulting in digital chrominance (C) data. For this design, the 11-bit reference subcarrier phase (see Figure 9.17) is 135°.

For the following equations,

ω = 2πFSC
FSC = 4.43361875 MHz (±5 Hz) for (B, D, G, H, I, N) PAL
FSC = 3.58205625 MHz (±5 Hz) for (NC) PAL
FSC = 3.57561149 MHz (±5 Hz) for (M) PAL

PAL Switch

In theory, since the [sin ωt] and [cos ωt] subcarriers are orthogonal, the U and V signals can be perfectly separated from each other in the decoder. However, if the video signal is subjected to distortion, such as asymmetrical attenuation of the sidebands due to lowpass filtering, the orthogonality is degraded, resulting in crosstalk between the U and V signals.

PAL uses alternate line switching of the V signal to provide a frequency offset between the U and V subcarriers, in addition to the 90° subcarrier phase offset. When decoded, crosstalk components appear modulated onto the alternate line carrier frequency, in solid color areas producing a moving pattern known as Hanover bars.
This pattern may be suppressed in the decoder by a comb filter that averages equal contributions from switched and unswitched lines.

When PAL switch = 0, the 11-bit reference subcarrier phase (see Figure 9.17) and the burst phase are the same (135°). Thus, 225° must be added to the 11-bit reference subcarrier phase during active video so the output of the sin and cos ROMs have the proper subcarrier phases (0° and 90°, respectively).

When PAL switch = 1, 90° is added to the 11-bit reference subcarrier phase, resulting in a 225° burst phase. Thus, an additional 135° must be added to the 11-bit reference subcarrier phase during active video so the output of the sin and cos ROMs have the proper phases (0° and 90°, respectively).

Note that in Figure 9.17, while PAL switch = 1, the –V subcarrier is generated, implementing the –V component.

YUV Color Space

As discussed in Chapter 8, the chrominance signal is represented by:

(U sin ωt) ± (V cos ωt)

with the sign of V alternating from one line to the next (known as the PAL switch). Chrominance amplitudes are ±sqrt(U² + V²).

YCbCr Color Space

If the encoder is based on the YCbCr color space, the chrominance signal for (B, D, G, H, I, NC) PAL may be represented by:

(Cb – 512)(0.533)(sin ωt) ± (Cr – 512)(0.752)(cos ωt)

The chrominance signal for (M, N) PAL may be represented by:

(Cb – 512)(0.504)(sin ωt) ± (Cr – 512)(0.711)(cos ωt)

In these cases, the values in the sin and cos ROMs are scaled by the indicated values to allow the modulator multipliers to accept Cb and Cr data directly, instead of U and V data.

General Processing

The subcarrier sin and cos values should have a minimum of nine bits plus sign of accuracy. The modulation multipliers must have saturation logic on the outputs to ensure overflow and underflow conditions are saturated to the maximum and minimum values, respectively.

After the modulated color difference signals are added together, the result is rounded to nine bits plus sign.
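The modulation, PAL switch, saturation, and rounding steps described above can be sketched in Python as follows. This is only a floating-point illustration, not the hardware data path: the function names `saturate` and `modulate_chroma` and their parameters are hypothetical, the subcarrier phase is passed in directly rather than taken from the DTO, and a real design would use fixed-point ROM values.

```python
import math


def saturate(value, bits=9):
    """Clamp to the range of a signed value with `bits` magnitude bits."""
    limit = (1 << bits) - 1          # 511 for nine bits plus sign
    return max(-limit, min(limit, value))


def modulate_chroma(cb, cr, phase_deg, k_u=0.533, k_v=0.752, pal_switch=0):
    """Return one digital chrominance sample.

    cb, cr     : 10-bit color difference samples (512 = zero).
    phase_deg  : subcarrier phase (wt) in degrees.
    k_u, k_v   : sin/cos ROM scale factors; the defaults are the
                 (B, D, G, H, I, NC) PAL values from the text.
    pal_switch : 1 inverts the V component on this line, 0 leaves it as is.
    """
    wt = math.radians(phase_deg)
    # Each multiplier output is saturated before the two products are summed.
    u_term = saturate((cb - 512) * k_u * math.sin(wt))
    v_term = saturate((cr - 512) * k_v * math.cos(wt))
    if pal_switch:
        v_term = -v_term
    # The sum is rounded to nine bits plus sign.
    return saturate(round(u_term + v_term))


if __name__ == "__main__":
    print(modulate_chroma(512, 612, 0))                # V only, unswitched line
    print(modulate_chroma(512, 612, 0, pal_switch=1))  # same sample, switched line
    print(modulate_chroma(612, 512, 90, 0.504, 0.711)) # U only, (M) NTSC scaling
```

Passing `k_u=0.504, k_v=0.711` with `pal_switch=0` gives the (M) NTSC case; the V sign inversion is the only PAL-specific step in this sketch.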
At this point, the digital modulated chrominance has the ranges shown in Table 9.3. The resulting digital chrominance data is clamped by a blanking signal that has the same raised cosine values and timing as the signal used to blank the luminance data.

Burst Generation

As shown in Figure 9.1, the lowpass filtered color difference data are multiplexed with the color burst envelope information. During the color burst time, the color difference data should be ignored and the burst envelope signal inserted on the Cb, U, or Q channel (the Cr, V, or I channel is forced to zero).

The burst envelope rise and fall times should generate a raised cosine distribution to slow the slew rate of the burst envelope. Typical burst envelope rise and fall times are 300 ±100 ns.

The burst envelope should be wide enough to generate nine or ten cycles of burst information with an amplitude of 50% or greater. When the burst envelope signal is multiplied by the output of the sin ROM, the color burst is generated and will have the range shown in Table 9.3.

For pro-video applications, the phase of the color burst should be programmable over a 0° to 360° range to provide optional system phase matching with external video signals. This can be done by adding a programmable value to the 11-bit subcarrier reference phase during the burst time (see Figure 9.17).

Analog Chrominance (C) Generation

The digital chrominance data may drive a 10-bit DAC that generates a 0–1.305V output to generate the C video signal of an S-video (Y/C) interface. The video signal at the connector should have a source impedance of 75 Ω.

Figures 9.10 and 9.11 show the modulated chrominance video waveforms for 75% color bars. The numbers in parentheses indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V.
If the DAC can’t handle the generation of bipolar video signals, an offset must be added to the chrominance data (and the sign information dropped) before driving the DAC. In this instance, an offset of +512 was used, positioning the blanking level at the midpoint of the 10-bit DAC output level.

As the sample-and-hold action of the DAC introduces a (sin x)/x characteristic, the video data may be digitally filtered by a [(sin x)/x]⁻¹ filter to compensate. Alternately, as an analog lowpass filter is usually present after the DAC, the correction may take place in the analog filter.

| Video Level | (M) NTSC | NTSC–J | (B, D, G, H, I, NC) PAL | (M, N) PAL |
|---|---|---|---|---|
| peak chroma | 328 | 354 | 347 | 328 |
| peak burst | 112 | 112 | 117 | 117 |
| blank | 0 | 0 | 0 | 0 |
| peak burst | –112 | –112 | –117 | –117 |
| peak chroma | –328 | –354 | –347 | –328 |

Table 9.3. 10-Bit Digital Chrominance Values.

Figure 9.10. (M) NTSC Chrominance (C) Video Signal for 75% Color Bars. Indicated video levels are 10-bit values. (Waveform: chroma amplitudes of ±174 for yellow, ±246 for cyan, ±229 for green, ±229 for magenta, ±246 for red, and ±174 for blue, none for white and black; marked voltages 0.966 V, 0.796 V, 0.653 V, 0.510 V, and 0.341 V; 3.58 MHz color burst of 9 cycles, ±20 IRE; blank level 512.)

Figure 9.11. (B, D, G, H, I) PAL Chrominance (C) Video Signal for 75% Color Bars. Indicated video levels are 10-bit values. (Waveform: chroma amplitudes of ±184 for yellow, ±260 for cyan, ±243 for green, ±243 for magenta, ±260 for red, and ±184 for blue, none for white and black; marked voltages 0.985 V, 0.803 V, 0.653 V, 0.504 V, and 0.321 V; 4.43 MHz color burst of 10 cycles, ±21.43 IRE; blank level 512.)

Figure 9.12. (M) NTSC Composite Video Signal for 75% Color Bars. Indicated video levels are 10-bit values. (Waveform: the luminance levels of Figure 9.2, white level 800, bar levels 671, 626, 554, 510, 442, 398, and 326, black level 282, blank level 240, and sync level 16, with the chroma amplitudes of Figure 9.10 superimposed; marked voltages 1.020 V, 0.449 V, 0.357 V, 0.306 V, 0.163 V, and 0.020 V; 3.58 MHz color burst of 9 cycles, ±20 IRE; 7.5 IRE pedestal; 40 IRE sync.)
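The bar levels quoted in the figure captions for (M) NTSC can be cross-checked against the pro-video R´G´B´ to YUV equations given earlier in the chapter. The Python sketch below does this for 75% color bars; the names `BARS` and `bar_levels` are hypothetical, and the results are only expected to agree with the figure values to within about one code, because the published coefficients are rounded to three decimals.

```python
import math

BLACK = 282                  # (M) NTSC 10-bit black level from Table 9.2
BAR = 0.75 * (940 - 64)      # 75% amplitude on the 64-940 scale, offset removed

# Which of R', G', B' are on for each bar.
BARS = {
    "yellow": (1, 1, 0), "cyan": (0, 1, 1), "green": (0, 1, 0),
    "magenta": (1, 0, 1), "red": (1, 0, 0), "blue": (0, 0, 1),
}


def bar_levels(name):
    """Return (luminance code, chroma amplitude) for one 75% color bar."""
    r, g, b = (BAR * on for on in BARS[name])
    # (M) NTSC equations for 64-940 R'G'B'; the "- 64" is already applied.
    y = 0.177 * r + 0.347 * g + 0.067 * b
    u = -0.087 * r - 0.171 * g + 0.258 * b
    v = 0.364 * r - 0.305 * g - 0.059 * b
    return BLACK + y, math.hypot(u, v)   # amplitude is sqrt(U^2 + V^2)


if __name__ == "__main__":
    for name in BARS:
        luma, chroma = bar_levels(name)
        print(f"{name:8s} Y = {luma:6.1f}   C = +/-{chroma:6.1f}")
```

Adding the chroma amplitude to, and subtracting it from, the luminance code gives the envelope of the composite signal for that bar.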
Analog Composite Video

The digital luminance (Y) data and the digital chrominance (C) data are added together, generating digital composite color video with the levels shown in Table 9.4.

The result may drive a 10-bit DAC that generates a 0–1.305V output to generate the composite video signal. The video signal at the connector should have a source impedance of 75 Ω.

Figures 9.12 and 9.13 show the video waveforms for 75% color bars. The numbers in parentheses indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V.

As the sample-and-hold action of the DAC introduces a (sin x)/x characteristic, the video data may be digitally filtered by a [(sin x)/x]⁻¹ filter to compensate. Alternately, as an analog lowpass filter is usually present after the DAC, the correction may take place in the analog filter.

Figure 9.13. (B, D, G, H, I) PAL Composite Video Signal for 75% Color Bars. Indicated video levels are 10-bit values. (Waveform: the luminance levels of Figure 9.3, white level 800, bar levels 616, 540, 493, 422, 375, and 299, black/blank level 252, and sync level 16, with chroma amplitudes of ±184 for yellow, ±260 for cyan, ±243 for green, ±243 for magenta, ±260 for red, and ±184 for blue superimposed; marked voltages 1.020 V, 0.471 V, 0.321 V, 0.172 V, and 0.020 V; 4.43 MHz color burst of 10 cycles, ±21.43 IRE; 43 IRE sync.)

| Video Level | (M) NTSC | NTSC–J | (B, D, G, H, I, NC) PAL | (M, N) PAL |
|---|---|---|---|---|
| peak chroma | 973 | 987 | 983 | 973 |
| white | 800 | 800 | 800 | 800 |
| peak burst | 352 | 352 | 369 | 357 |
| black | 282 | 240 | 252 | 282 |
| blank | 240 | 240 | 252 | 240 |
| peak burst | 128 | 128 | 135 | 123 |
| peak chroma | 109 | 53 | 69 | 109 |
| sync | 16 | 16 | 16 | 16 |

Table 9.4. 10-Bit Digital Composite Video Levels.

Black Burst Video Signal

As an option, the encoder can generate a black burst (or house sync) video signal that can be used to synchronize multiple video sources. Figures 9.14 and 9.15 illustrate the black burst video signals. Note that these are the same as analog composite, but do not contain any active video information.
The numbers in parentheses indicate the data value for a 10-bit DAC with a full-scale output value of 1.305V.

Figure 9.14. (M) NTSC Black Burst Video Signal. Indicated video levels are 10-bit values. (Waveform: black level 282, blank level 240, sync level 16; marked voltages 0.449 V, 0.357 V, 0.306 V, 0.163 V, and 0.020 V; 3.58 MHz color burst of 9 cycles, ±20 IRE; 7.5 IRE pedestal; 40 IRE sync.)

Figure 9.15. (B, D, G, H, I) PAL Black Burst Video Signal. Indicated video levels are 10-bit values. (Waveform: black/blank level 252, sync level 16; marked voltages 0.471 V, 0.321 V, 0.172 V, and 0.020 V; 4.43 MHz color burst of 10 cycles, ±21.43 IRE; 43 IRE sync.)

Color Subcarrier Generation

The color subcarrier can be generated from the sample clock using a discrete time oscillator (DTO).

When generating video that may be used for editing, it is important to maintain the phase relationship between the color subcarrier and sync information. Unless the subcarrier phase relative to the sync phase is properly maintained, an edit may result in a momentary color shift. PAL also requires the addition of a PAL switch, which is used to invert the polarity of the V data every other scan line. Note that the polarity of the PAL switch should be maintained through the encoding and decoding process.

Since in this design the color subcarrier is derived from the sample clock, any jitter in the sample clock will result in a corresponding subcarrier frequency jitter. In some PCs, the sample clock is generated using a phase-lock loop (PLL), which may not have the necessary clock stability to keep the subcarrier phase jitter below 2°–3°.
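To put the 2°–3° figure in perspective, the short Python sketch below converts a subcarrier phase error into the equivalent timing error, on the simplifying assumption that a clock edge displaced by Δt shifts the generated subcarrier by 360° × FSC × Δt. The function name `jitter_ns_for_phase` is hypothetical, and the result is an order-of-magnitude guide rather than a clock specification.

```python
def jitter_ns_for_phase(phase_deg, fsc_hz):
    """Timing error (ns) that shifts a subcarrier of fsc_hz by phase_deg."""
    period_ns = 1e9 / fsc_hz             # one subcarrier cycle in nanoseconds
    return period_ns * phase_deg / 360.0


if __name__ == "__main__":
    # Subcarrier frequencies as given in the chrominance modulation section.
    for name, fsc in (("NTSC", 3579545.0), ("PAL", 4433618.75)):
        limits = [round(jitter_ns_for_phase(deg, fsc), 2) for deg in (2, 3)]
        print(name, limits, "ns for 2 and 3 degrees")
```

Under this assumption, 2°–3° of subcarrier phase corresponds to a timing error of roughly 1.3 ns to 2.3 ns across the two standards.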
Frequency Relationships

(M) NTSC, NTSC–J

As shown in Chapter 8, there is a defined relationship between the subcarrier frequency (FSC) and the line frequency (FH):

FSC/FH = 910/4

Assuming (for example only) a 13.5 MHz sample clock rate (FS):

FS = 858 FH

Combining these equations produces the relationship between FSC and FS:

FSC/FS = 35/132

which may also be expressed in terms of the sample clock period (TS) and the subcarrier period (TSC):

TS/TSC = 35/132

The color subcarrier phase must be advanced by this fraction of a subcarrier cycle each sample clock.

(B, D, G, H, I, N) PAL

As shown in Chapter 8, there is a defined relationship between the subcarrier frequency (FSC) and the line frequency (FH):

FSC/FH = (1135/4) + (1/625)

Assuming (for example only) a 13.5 MHz sample clock rate (FS):

FS = 864 FH

Combining these equations produces the relationship between FSC and FS:

FSC/FS = 709379/2160000

which may also be expressed in terms of the sample clock period (TS) and the subcarrier period (TSC):

TS/TSC = 709379/2160000

The color subcarrier phase must be advanced by this fraction of a subcarrier cycle each sample clock.

(NC) PAL

In the (NC) PAL video standard used in Argentina, there is a different relationship between the subcarrier frequency (FSC) and the line frequency (FH):

FSC/FH = (917/4) + (1/625)

Assuming (for example only) a 13.5 MHz sample clock rate (FS):

FS = 864 FH

Combining these equations produces the relationship between FSC and FS:

FSC/FS = 573129/2160000

which may also be expressed in terms of the sample clock period (TS) and the subcarrier period (TSC):

TS/TSC = 573129/2160000

The color subcarrier phase must be advanced by this fraction of a subcarrier cycle each sample clock.

Quadrature Subcarrier Generation

A DTO consists of an accumulator in which a smaller number [p] is repeatedly added, modulo a larger number [q]. The counter consists of an adder and a register as shown in Figure 9.16.
The contents of the register are constrained so that if they exceed or equal [q], [q] is subtracted from the contents. The output signal (XN) of the adder is:

XN = (XN–1 + p) modulo q

With each clock cycle, [p] is added to produce a linearly increasing series of digital values. It is important that [q] not be an integer multiple of [p] so that the generated values are continuously different and the remainder changes from one cycle to the next.

Figure 9.16. Single Stage DTO. (Block diagram: an adder sums P with the register output, modulo Q; the register output is the DTO output.)

The DTO is used to reduce the sample clock frequency, FS, to the color subcarrier frequency, FSC:

FSC = (p/q) FS

Since [p] is of finite word length, the DTO output frequency can be varied only in steps. With a [p] word length of [w], the lowest [p] step is 0.5^w and the lowest DTO frequency step is:

FSC = FS/2^w

Note that the output frequency cannot be greater than half the input frequency. This means that the output frequency FSC can only be varied by the increment [p] and within the range:

0 < FSC < FS/2

In this application, an overflow corresponds to the completion of a full cycle of the subcarrier. Since only the remainder (which represents the subcarrier phase) is required, the number of whole cycles completed is of no interest. During each clock cycle, the output of the [q] register shows the relative phase of a subcarrier frequency in qths of a subcarrier period. By using the [q] register contents to address a ROM containing a sine wave characteristic, a numerical representation of the sampled subcarrier sine wave can be generated.

Single-Stage DTO

A single 24-bit or 32-bit modulo [q] register may be used, with the 11 most significant bits providing the subcarrier reference phase. An example of this architecture is shown in Figure 9.16.
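The ratios from the Frequency Relationships section and the accumulator behavior described above can be checked with exact rational arithmetic. The Python sketch below is an idealized model in which [p] and [q] are taken directly from the reduced FSC/FS fraction, which is an assumption made for clarity; a practical single-stage DTO would use a 24-bit or 32-bit binary register instead. The names `dto_ratio` and `run_dto` are hypothetical.

```python
from fractions import Fraction


def dto_ratio(fsc_per_line, samples_per_line):
    """Return FSC/FS as an exact fraction p/q."""
    return Fraction(fsc_per_line) / samples_per_line


def run_dto(p, q, clocks):
    """Simulate X[n] = (X[n-1] + p) modulo q.

    Returns (final register contents, number of overflows).
    """
    x = 0
    overflows = 0
    for _ in range(clocks):
        x += p
        if x >= q:           # contents exceed or equal q
            x -= q           # one full subcarrier cycle completed
            overflows += 1
    return x, overflows


# 13.5 MHz examples from the text: 858 and 864 samples per line.
ntsc = dto_ratio(Fraction(910, 4), 858)
pal = dto_ratio(Fraction(1135, 4) + Fraction(1, 625), 864)

if __name__ == "__main__":
    print(ntsc, pal)
    # After q clocks the phase returns to zero having completed p cycles.
    print(run_dto(ntsc.numerator, ntsc.denominator, ntsc.denominator))
```

For the NTSC case the register returns to zero after 132 clocks, having overflowed 35 times, which is the 35/132 relationship expressed as cycles per sample.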
Multi-Stage DTO

More long-term accuracy may be achieved if the ratio is partitioned into two or three fractions, the more significant of which provides the subcarrier reference phase, as shown in Figure 9.17.

To use the full capacity of the ROM and make the overflow automatic, the denominator of the most significant fraction is made a power of two. The 4× HCOUNT denominator of the least significant fraction is used to simplify hardware calculations.

Subdividing the subcarrier period into 2048 phase steps, and using the total number of samples per scan line (HCOUNT), the ratio may be partitioned as follows:

$$\frac{F_{SC}}{F_S} = \frac{P1 + \dfrac{P2}{(4)(HCOUNT)}}{2048}$$

P1 and P2 are programmed to generate the desired color subcarrier frequency (FSC). The modulo 4× HCOUNT and modulo 2048 counters should be reset at the beginning of each vertical sync of field one to ensure the generation of the correct subcarrier reference (as shown in Figures 8.5 and 8.16).

The less significant stage produces a sequence of carry bits which correct the approximate ratio of the upper stage by altering the counting step by one: from P1 to P1 + 1. The upper stage produces an 11-bit subcarrier phase used to address the sine and cosine ROMs.

Although the upper stage adder automatically overflows to provide modulo 2048 operation, the lower stage requires additional circuitry because 4× HCOUNT may not be (and usually isn’t) an integer power of two. In this case, the 16-bit register has a maximum capacity of 65535 and the adder generates a carry for any value greater than this. To produce the correct carry sequence, it is necessary, each time the adder overflows, to adjust the next number added to make up the difference between 65535 and 4× HCOUNT.
This requires:

$$P3 = 65536 - (4)(HCOUNT) + P2$$

Although this changes the contents of the lower stage register, the sequence of carry bits is unchanged, ensuring that the correct phase values are generated.

The P1 and P2 values are determined for (M) NTSC operation using the following equation:

$$\frac{F_{SC}}{F_S} = \frac{P1 + \dfrac{P2}{(4)(HCOUNT)}}{2048} = \left(\frac{910}{4}\right)\left(\frac{1}{HCOUNT}\right)$$

Figure 9.17. 3-Stage DTO Chrominance Subcarrier Generation. [Block diagram: an upper stage (11-bit adder and modulo 2048 register, step P1); a lower stage (16-bit adder and modulo 4× HCOUNT register, with a multiplexer selecting P2 or P3 as the step); and a 625-line lower stage (10-bit adder and modulo 625 register, step 67), whose carry is forced to 0 for 525-line operation. The 11-bit reference subcarrier phase (1 LSB = 0.1757813°) passes through subcarrier phase adjust adders, controlled by the PAL switch, and addresses the sine and cosine ROMs, which output sin ωt and cos ωt together with sign bits (1 = negative).]
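As a worked check of the (M) NTSC equation, the following Python sketch solves it for P1, P2, and P3 using exact rational arithmetic, then models the upper and lower stages to count subcarrier cycles. The function names are hypothetical, HCOUNT = 858 (13.5 MHz sampling) is assumed for the example, and the lower stage is modelled with a plain modulo 4× HCOUNT subtraction rather than the 16-bit register and P3 substitution used in hardware.

```python
from fractions import Fraction
from math import floor


def dto_parameters(cycles_per_line, hcount, phase_steps=2048):
    """Solve (P1 + P2/(4*HCOUNT))/2048 = cycles_per_line/HCOUNT for P1, P2."""
    scaled = Fraction(cycles_per_line) / hcount * phase_steps
    p1 = floor(scaled)                  # integer part drives the upper stage
    lower = (scaled - p1) * 4 * hcount  # fractional part, in 1/(4*HCOUNT) units
    p2 = floor(lower)
    residual = lower - p2               # what P1 and P2 cannot represent
    p3 = 65536 - 4 * hcount + p2        # step used after a lower-stage overflow
    return p1, p2, p3, residual


def simulate_cycles(p1, p2, hcount, n_samples, phase_steps=2048):
    """Count whole subcarrier cycles produced over n_samples sample clocks."""
    modulus = 4 * hcount
    lower = upper = cycles = 0
    for _ in range(n_samples):
        lower += p2
        carry = 1 if lower >= modulus else 0
        if carry:
            lower -= modulus
        upper += p1 + carry             # step is P1, or P1 + 1 on a carry
        if upper >= phase_steps:        # upper stage wraps once per cycle
            upper -= phase_steps
            cycles += 1
    return cycles, upper


if __name__ == "__main__":
    p1, p2, p3, residual = dto_parameters(Fraction(910, 4), 858)
    print(p1, p2, p3, residual)
    # Four scan lines (4 * 858 samples) should contain 910 subcarrier cycles.
    print(simulate_cycles(p1, p2, 858, 4 * 858))
```

A zero residual means P1 and P2 represent the ratio exactly; a nonzero residual is what an additional stage would have to absorb.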
The P1 and P2 values are determined for (B, D, G, H, I, N) PAL operation using the following equation:

$$\frac{F_{SC}}{F_S} = \frac{P1 + \dfrac{P2}{(4)(HCOUNT)}}{2048} = \left(\frac{1135}{4} + \frac{1}{625}\right)\left(\frac{1}{HCOUNT}\right)$$

The P1 and P2 values are determined for the version of (NC) PAL used in Argentina using the following equation:

$$\frac{F_{SC}}{F_S} = \frac{P1 + \dfrac{P2}{(4)(HCOUNT)}}{2048} = \left(\frac{917}{4} + \frac{1}{625}\right)\left(\frac{1}{HCOUNT}\right)$$

The modulo 625 counter, with a [p] value of 67, is used during 625-line operation to more accurately adjust subcarrier generation due to the 0.1072 remainder after calculating the P1 and P2 values. During 525-line operation, the carry signal should always be forced to be zero.

Table 9.5 lists some of the common horizontal resolutions, sample clock rates, and their corresponding HCOUNT, P1, and P2 values.

Sine and Cosine Generation

Regardless of the type of DTO used, each value of the 11-bit subcarrier phase corresponds to one of 2048 waveform values taken at a particular point in the subcarrier cycle period and stored in ROM. The sample points are taken at odd multiples of one 4096th of the total period to avoid end-effects when the sample values are read out in reverse order.

Note that only one quadrant of the subcarrier wave shape is stored in ROM, as shown in Figure 9.18. The values for the other quadrants are produced using the symmetrical properties of the sinusoidal waveform. The maximum phase error using this technique is ±0.09° (half of 360/2048), which corresponds to a maximum amplitude error of ±0.08%, relative to the peak-to-peak amplitude, at the steepest part of the sine wave signal.

Figure 9.17 also shows a technique for generating quadrature subcarriers from an 11-bit subcarrier phase signal.
It uses two ROMs to store quadrants of sine and cosine waveforms. XOR gates invert the addresses for generating time-reversed portions of the waveforms and to invert the output polarity to make negative portions of the waveforms. An additional gate is provided in the sign bit for the V subcarrier to allow injection of a PAL switch square wave to implement phase inversion of the V signal on alternate scan lines.

Figure 9.18. Positions of the 512 Stored Sample Values in the sin and cos ROMs for One Quadrant of a Subcarrier Cycle. Samples for other quadrants are generated by inverting the addresses and/or sign values. [Horizontal axis: 4096ths of a subcarrier period, with sample points at 1, 3, 5, ..., 1021, 1023 spanning 0° to 90°.]

Table 9.5. Typical HCOUNT, P1, and P2 Values for the 3-Stage DTO in Figure 9.17.

Typical Application | Total Samples per Scan Line (HCOUNT) | 4× HCOUNT | P1 | P2
13.5 MHz (M) NTSC | 858 | 3432 | 543 | 104
13.5 MHz (B, D, G, H, I) PAL | 864 | 3456 | 672 | 2061
12.27 MHz (M) NTSC | 780 | 3120 | 597 | 1040
14.75 MHz (B, D, G, H, I) PAL | 944 | 3776 | 615 | 2253

Horizontal and Vertical Timing

Vertical and horizontal counters are used to control the video timing.

Timing Control

To control the horizontal and vertical counters, separate horizontal sync (HSYNC#) and vertical sync (VSYNC#) signals are commonly used. A BLANK# control signal is usually used to indicate when to generate active video.

If HSYNC#, VSYNC#, and BLANK# are inputs, controlling the horizontal and vertical counters, this is referred to as “slave” timing. HSYNC#, VSYNC#, and BLANK# are generated by another device in the system, and used by the encoder to generate the video.

The horizontal and vertical counters may also be used to generate the basic video timing. In this case, referred to as “master” timing, HSYNC#, VSYNC#, and BLANK# are outputs from the encoder, and used elsewhere in the system.
For a BT.656 video interface, horizontal blanking (H), vertical blanking (V), and field (F) information are used. In this application, the encoder would use the H, V, and F timing bits directly, rather than depending on HSYNC#, VSYNC#, and BLANK# control signals.

Table 9.6 lists the typical horizontal blank timing for common sample clock rates. A blanking control signal (BLANK#) is used to specify when to generate active video.

Horizontal Timing

An 11-bit horizontal counter is incremented on each rising edge of the sample clock, and reset by HSYNC#. The counter value is monitored to determine when to assert and negate various control signals each scan line, such as the start of burst envelope, end of burst envelope, etc.

During slave timing operation, if there is no HSYNC# pulse at the end of a line, the counter can either continue incrementing (recommended) or automatically reset (not recommended).

Vertical Timing

A 10-bit vertical counter is incremented on each leading edge of HSYNC#, and reset when coincident leading edges of VSYNC# and HSYNC# occur. Rather than exactly coincident falling edges, a coincident window of about ±64 clock cycles should be used to ease interfacing to some video timing controllers. If both the HSYNC# and VSYNC# leading edges are detected within 64 clock cycles of each other, it is assumed to be the beginning of Field 1. The counter value is monitored to determine which scan line is being generated.

For interlaced (M) NTSC, color burst information should be disabled on scan lines 1–9 and 264–272, inclusive. On the remaining scan lines, color burst information should be enabled and disabled at the appropriate horizontal count values.

Table 9.6. Typical BLANK# Horizontal Timing.

Typical Application | Sync + Back Porch Blanking (Samples) | Front Porch Blanking (Samples)
13.5 MHz (M) NTSC | 122 | 16
13.5 MHz (B, D, G, H, I) PAL | 132 | 12
12.27 MHz (M) NTSC | 126 | 14
14.75 MHz (B, D, G, H, I) PAL | 163 | 13
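The vertical-timing rules given above for interlaced (M) NTSC can be expressed as two small predicates. This is a minimal sketch with hypothetical function names; it assumes the HSYNC# and VSYNC# leading edges are timestamped in sample-clock counts, and it covers only the interlaced (M) NTSC burst rule.

```python
def is_field1_start(hsync_edge_clock, vsync_edge_clock, window=64):
    """True if the HSYNC# and VSYNC# leading edges fall within the window.

    Both arguments are sample-clock counts at which the edges were seen.
    """
    return abs(hsync_edge_clock - vsync_edge_clock) <= window


def ntsc_burst_allowed(scan_line):
    """Interlaced (M) NTSC: burst is disabled on lines 1-9 and 264-272."""
    if not 1 <= scan_line <= 525:
        raise ValueError("scan line must be in the range 1..525")
    return not (1 <= scan_line <= 9 or 264 <= scan_line <= 272)


if __name__ == "__main__":
    print(is_field1_start(1000, 1040))                   # edges 40 clocks apart
    print([ntsc_burst_allowed(n) for n in (9, 10, 264, 273)])
```

On lines where the predicate is true, the burst envelope would still be gated by the horizontal counter.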
For noninterlaced (M) NTSC, color burst information should be disabled on scan lines 1–9, inclusive. A 29.97 Hz (30/1.001) offset may be added to the color subcarrier frequency so the subcarrier phase will be inverted from field to field. On the remaining scan lines, color burst information should be enabled and disabled at the appropriate horizontal count values.

For interlaced (B, D, G, H, I, N, NC) PAL, during fields 1, 2, 5, and 6, color burst information should be disabled on scan lines 1–6, 310–318, and 623–625, inclusive. During fields 3, 4, 7, and 8, color burst information should be disabled on scan lines 1–5, 311–319, and 622–625, inclusive. On the remaining scan lines, color burst information should be enabled and disabled at the appropriate horizontal count values.

For noninterlaced (B, D, G, H, I, N, NC) PAL, color burst information should be disabled on scan lines 1–6 and 310–312, inclusive. On the remaining scan lines, color burst information should be enabled and disabled at the appropriate horizontal count values.

For interlaced (M) PAL, during fields 1, 2, 5, and 6, color burst information should be disabled on scan lines 1–8, 260–270, and 523–525, inclusive. During fields 3, 4, 7, and 8, color burst information should be disabled on scan lines 1–7, 259–269, and 522–525, inclusive. On the remaining scan lines, color burst information should be enabled and disabled at the appropriate horizontal count values.

For noninterlaced (M) PAL, color burst information should be disabled on scan lines 1–8 and 260–262, inclusive. On the remaining scan lines, color burst information should be enabled and disabled at the appropriate horizontal count values.

Early PAL receivers produced colored twitter at the top of the picture due to the swinging burst. To fix this, Bruch blanking was implemented to ensure that the phase of the first burst is the same following each vertical sync pulse.
Analog encoders used a meander gate to control the burst reinsertion time by shifting one line at the vertical field rate. A digital encoder simply keeps track of the scan line and field number. Modern receivers do not require Bruch blanking, but it is useful for determining which field is being processed.

During slave timing operation, if there is no VSYNC# pulse at the end of a frame, the counter can either continue incrementing (recommended) or automatically reset (not recommended).

During master timing operation, for pro-video applications, it may be desirable to generate 2.5 scan line VSYNC# pulses during 625-line operation. However, this may cause Field 1 vs. Field 2 detection problems in some commercially available video chips.

Field ID Signals

Although the timing relationship between HSYNC# and VSYNC#, or the BT.656 F bit, is used to specify Field 1 or Field 2, additional signals may be used to specify which one of four or eight fields to generate, as shown in Table 9.7.

FIELD_0 should change state coincident with the leading edge of VSYNC# during Fields 1, 3, 5, and 7. FIELD_1 should change state coincident with the leading edge of VSYNC# during Fields 1 and 5.

For a BT.656 video interface, FIELD_0 and FIELD_1 may be transmitted using ancillary data.

Clean Encoding

Typically, the only filters present in a conventional encoder are the color difference lowpass filters. This results in considerable spectral overlap between the luminance and chrominance components, making it impossible to separate the signals completely at the decoder. However, additional processing at the encoder can be used to reduce cross-color (luminance-to-chrominance crosstalk) and cross-luminance (chrominance-to-luminance crosstalk) decoder artifacts. Cross-color appears as a coarse rainbow pattern or random colors in regions of fine detail. Cross-luminance appears as a fine pattern on chrominance edges.
Cross-color in a decoder may be reduced by removing some of the high-frequency luminance data in the encoder, using a notch filter at FSC. However, while reducing the cross-color, luminance detail is lost.

A better method is to pre-comb filter the luminance and chrominance information in the encoder (see Figure 9.19). High-frequency luminance information is pre-combed to minimize interference with chrominance frequencies in that spectrum. Chrominance information also is pre-combed by averaging over a number of lines, reducing cross-luminance or the hanging dot pattern.

This technique allows fine, moving luminance (which tends to generate cross-color at the decoder) to be removed while retaining full resolution for static luminance. However, there is a small loss of diagonal luminance resolution due to its being averaged over multiple lines. This is offset by an improvement in the chrominance signal-to-noise ratio (SNR).

Table 9.7. Field Numbering.

FIELD_1 Signal | FIELD_0 Signal | HSYNC# and VSYNC# Timing Relationship or BT.656 F Bit | NTSC Field Number | PAL Field Number
0 | 0 | field 1 | 1 (odd field) | 1 (even field)
0 | 0 | field 2 | 2 (even field) | 2 (odd field)
0 | 1 | field 1 | 3 (odd field) | 3 (even field)
0 | 1 | field 2 | 4 (even field) | 4 (odd field)
1 | 0 | field 1 | – | 5 (even field)
1 | 0 | field 2 | – | 6 (odd field)
1 | 1 | field 1 | – | 7 (even field)
1 | 1 | field 2 | – | 8 (odd field)

Figure 9.19. Clean Encoding Example. [Block diagram: Y passes through complementary lowpass and highpass filters (NTSC = 2.3 MHz, PAL = 3.1 MHz) and a luminance comb filter; I/U and Q/V pass through the modulator and a chrominance comb filter; adders combine the paths to form the NTSC/PAL output.]

Bandwidth-Limited Edge Generation

Smooth sync and blank edges may be generated by integrating a T, or raised cosine, pulse to generate a T step (Figure 9.20). NTSC systems use a T pulse with T = 125 ns; therefore, the 2T step has little signal energy beyond 4 MHz. PAL systems use a T pulse with T = 100 ns; in this instance, the 2T step has little signal energy beyond 5 MHz.
The T step provides a fast risetime, without ringing, within a well-defined bandwidth. The risetime of the edge between the 10% and 90% points is 0.964T. By choosing appropriate sample values for the sync edges, blanking edges, and burst envelope, these values can be stored in a small ROM, which is triggered at the appropriate horizontal count. By reading the contents of the ROM forward and backward, both rising and falling edges may be generated.

Figure 9.20. Bandwidth-Limited Edge Generation. (A) NTSC T pulse. (B) The T step, the result of integrating the T pulse. [Both panels are plotted against time in ns from –125 to 125. (A) shows a pulse of peak amplitude 1.0 that is 125 ns wide at the 50% level; (B) shows a step passing through 0.5 at time 0.]

Level Limiting

Certain highly saturated colors produce composite video levels that may cause problems in downstream equipment.

Invalid video levels greater than 100 IRE or less than –20 IRE (relative to the blank level) may be transmitted, but may cause distortion in VCRs or demodulators and cause sync separation problems.

Illegal video levels greater than 120 IRE (NTSC) or 133 IRE (PAL), or below the sync tip level, may not be transmitted.

Although usually not a problem in a conventional video application, computer systems commonly use highly saturated colors, which may generate invalid or illegal video levels. It may be desirable to optionally limit these signal levels to around 110 IRE, compromising between limiting the available colors and generating legal video levels.

One method of correction is to adjust the luminance or saturation of invalid and illegal pixels until the desired peak limits are attained. Alternately, the frame buffer contents may be scanned, and pixels flagged that would generate an invalid or illegal video level (using a separate overlay plane or color change). The user then may change the color to a more suitable one.
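A minimal sketch of the saturation-adjustment approach follows. It assumes a simplified model in which the composite signal swings between the luminance level plus and minus the chrominance amplitude, all in IRE relative to blanking. The function names are hypothetical, the 110 IRE upper limit and –20 IRE lower limit come from the text above, and the –40 IRE sync tip default is an assumption for (M) NTSC.

```python
def classify_level(level_ire, illegal_high=120.0, sync_tip=-40.0):
    """Classify a composite level; the defaults assume (M) NTSC."""
    if level_ire > illegal_high or level_ire < sync_tip:
        return "illegal"
    if level_ire > 100.0 or level_ire < -20.0:
        return "invalid"
    return "valid"


def limit_chroma(luma_ire, chroma_amp_ire, upper=110.0, lower=-20.0):
    """Reduce the chroma amplitude so luma +/- chroma stays within limits.

    Only saturation is adjusted; luminance is left untouched.
    """
    headroom = max(upper - luma_ire, 0.0)  # room above the luminance level
    footroom = max(luma_ire - lower, 0.0)  # room below the luminance level
    return min(chroma_amp_ire, headroom, footroom)


if __name__ == "__main__":
    # Hypothetical saturated pixel: 90 IRE luminance, 45 IRE chroma amplitude.
    print(classify_level(90.0 + 45.0))
    limited = limit_chroma(90.0, 45.0)
    print(limited, classify_level(90.0 + limited))
```

Clamping the amplitude rather than scaling every pixel leaves in-range colors unchanged, which is the compromise between available colors and legal levels that the text describes.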
In a professional editing application, the option of transmitting all the video information (including invalid and illegal levels) between equipment is required to minimize editing and processing artifacts.

Encoder Video Parameters

Many industry-standard video parameters have been defined to specify the relative quality of NTSC and PAL encoders. To measure these parameters, the output of the encoder (while generating various video test signals such as those described in Chapter 8) is monitored using video test equipment. Along with a description of several of these parameters, typical AC parameter values for both consumer and studio-quality encoders are shown in Table 9.8.

Several AC parameters, such as group delay and K factors, are dependent on the quality of the output filters and are not discussed here. In addition to the AC parameters discussed in this section, there are several others that should be included in an encoder specification, such as burst frequency and tolerance, horizontal frequency, horizontal blanking time, sync rise and fall times, burst envelope rise and fall times, video blanking rise and fall times, and the bandwidths of the YIQ or YUV components.

There are also several DC parameters (such as white level and tolerance, blanking level and tolerance, sync height and tolerance, peak-to-peak burst amplitude and tolerance) that should be specified, as shown in Table 9.9.

Differential Phase

Differential phase distortion, commonly referred to as differential phase, specifies how much the chrominance phase is affected by the luminance level; in other words, how much hue shift occurs when the luminance level changes. Both positive and negative phase errors may be present, so differential phase is expressed as a peak-to-peak measurement, expressed in degrees of subcarrier phase.
This parameter is measured using chroma of uniform phase and amplitude superimposed on different luminance levels, such as the modulated ramp test signal or the modulated 5-step portion of the composite test signal. The differential phase parameter for a studio-quality encoder may approach 0.2° or less.

Table 9.8. Typical AC Video Parameters for (M) NTSC and (B, D, G, H, I) PAL Encoders.

Parameter | Consumer Quality NTSC | Consumer Quality PAL | Studio Quality NTSC | Studio Quality PAL | Units
differential phase | 4 | 4 | ≤1 | ≤1 | degrees
differential gain | 4 | 4 | ≤1 | ≤1 | %
luminance nonlinearity | 2 | 2 | ≤1 | ≤1 | %
hue accuracy | 3 | 3 | ≤1 | ≤1 | degrees
color saturation accuracy | 3 | 3 | ≤1 | ≤1 | %
residual subcarrier | 0.5 | 0.5 | 0.1 | 0.1 | IRE
SNR (per EIA-250-C) | 48 | 48 | > 60 | > 60 | dB
SCH phase | 0 ±40 | 0 ±20 | 0 ±2 | 0 ±2 | degrees
analog Y/C output skew | 5 | 5 | ≤2 | ≤2 | ns
H tilt | <1 | <1 | <1 | <1 | %
V tilt | <1 | <1 | <1 | <1 | %
subcarrier tolerance | 10 | 5 | 10 | 5 | Hz

Table 9.9. Typical DC Video Parameters for (M) NTSC and (B, D, G, H, I) PAL Encoders.

Parameter | Consumer Quality NTSC | Consumer Quality PAL | Studio Quality NTSC | Studio Quality PAL | Units
white relative to blank | 714 ±70 | 700 ±70 | 714 ±7 | 700 ±7 | mV
black relative to blank | 54 ±5 | 0 | 54 ±0.5 | 0 | mV
sync relative to blank | –286 ±30 | –300 ±30 | –286 ±3 | –300 ±3 | mV
burst amplitude | 286 ±30 | 300 ±30 | 286 ±3 | 300 ±3 | mV

Differential Gain

Differential gain distortion, commonly referred to as differential gain, specifies how much the chrominance gain is affected by the luminance level; in other words, how much color saturation shift occurs when the luminance level changes. Both attenuation and amplification may occur, so differential gain is expressed as the largest amplitude change between any two levels, expressed as a percentage of the largest chrominance amplitude.

This parameter is measured using chroma of uniform phase and amplitude superimposed on different luminance levels, such as the modulated ramp test signal or the modulated 5-step portion of the composite test signal.
The differential gain parameter for a studio-quality encoder may approach 0.2% or less.

Luminance Nonlinearity

Luminance nonlinearity, also referred to as differential luminance and luminance nonlinear distortion, specifies how much the luminance gain is affected by the luminance level; in other words, a nonlinear relationship between the generated and ideal luminance levels.

Using an unmodulated 5-step or 10-step staircase test signal, the difference between the largest and smallest steps, expressed as a percentage of the largest step, is used to specify the luminance nonlinearity. Although this parameter is included within the differential gain and phase parameters, it is traditionally specified independently.

Chrominance Nonlinear Phase Distortion

Chrominance nonlinear phase distortion specifies how much the chrominance phase (hue) is affected by the chrominance amplitude (saturation); in other words, how much hue shift occurs when the saturation changes.

Using a modulated pedestal test signal, or the modulated pedestal portion of the combination test signal, the phase differences between each chrominance packet and the burst are measured. The difference between the largest and the smallest measurements is the peak-to-peak value, expressed in degrees of subcarrier phase. This parameter is usually not independently specified, but is included within the differential gain and phase parameters.

Chrominance Nonlinear Gain Distortion

Chrominance nonlinear gain distortion specifies how much the chrominance gain is affected by the chrominance amplitude (saturation); in other words, a nonlinear relationship between the generated and ideal chrominance amplitude levels, usually seen as an attenuation of highly saturated chrominance signals.

Using a modulated pedestal test signal, or the modulated pedestal portion of the combination test signal, the test equipment is adjusted so that the middle chrominance packet is 40 IRE.
The largest difference between the measured and nominal values of the amplitudes of the other two chrominance packets specifies the chrominance nonlinear gain distortion, expressed in IRE or as a percentage of the nominal amplitude of the worst-case packet. This parameter is usually not independently specified, but is included within the differential gain and phase parameters.

Chrominance-to-Luminance Intermodulation

Chrominance-to-luminance intermodulation, commonly referred to as cross-modulation, specifies how much the luminance level is affected by the chrominance. This may be the result of clipping highly saturated chrominance levels or quadrature distortion and may show up as irregular brightness variations due to changes in color saturation.

Using a modulated pedestal test signal, or the modulated pedestal portion of the combination test signal, the largest difference between the ideal 50 IRE pedestal level and the measured luminance levels (after removal of chrominance information) specifies the chrominance-to-luminance intermodulation, expressed in IRE or as a percentage. This parameter is usually not independently specified, but is included within the differential gain and phase parameters.

Hue Accuracy

Hue accuracy specifies how close the generated hue is to the ideal hue value. Both positive and negative phase errors may be present, so hue accuracy is the difference between the worst-case positive and worst-case negative measurements from nominal, expressed in degrees of subcarrier phase. This parameter is measured using EIA or EBU 75% color bars as a test signal.

Color Saturation Accuracy

Color saturation accuracy specifies how close the generated saturation is to the ideal saturation value, using EIA or EBU 75% color bars as a test signal.
Both gain and attenuation may be present, so color saturation accuracy is the difference between the worst-case gain and worst-case attenuation measurements from nominal, expressed as a percentage of nominal.

Residual Subcarrier

The residual subcarrier parameter specifies how much subcarrier information is present during white or gray (note that, ideally, none should be present). Excessive residual subcarrier is visible as noise during white or gray portions of the picture.

Using an unmodulated 5-step or 10-step staircase test signal, the maximum peak-to-peak measurement of the subcarrier (expressed in IRE) during active video is used to specify the residual subcarrier relative to the burst amplitude.

SCH Phase

SCH (subcarrier to horizontal) phase refers to the phase relationship between the leading edge of horizontal sync (at the 50% amplitude point) and the zero crossings of the color burst (by extrapolating the color burst to the leading edge of sync). The error is referred to as SCH phase and is expressed in degrees of subcarrier phase.

For PAL, the definition of SCH phase is slightly different due to the more complicated relationship between the sync and subcarrier frequencies; the SCH phase relationship for a given line repeats only once every eight fields. Therefore, PAL SCH phase is defined, per EBU Technical Statement D 23-1984 (E), as “the phase of the +U component of the color burst extrapolated to the half-amplitude point of the leading edge of the synchronizing pulse of line 1 of field 1.”

SCH phase is important when merging two or more video signals. To avoid color shifts or picture jumps, the video signals must have the same horizontal, vertical, and subcarrier timing and the phases must be closely matched. To achieve these timing constraints, the video signals must have the same SCH phase relationship since the horizontal sync and subcarrier are continuous signals with a defined relationship.
It is common for an encoder to allow adjustment of the SCH phase to simplify merging two or more video signals. Maintaining proper SCH phase is also important since NTSC and PAL decoders may monitor the SCH phase to determine which color field is being decoded.

Analog Y/C Video Output Skew

The output skew between the analog luminance (Y) and chrominance (C) video signals should be minimized to avoid phase shift errors between the luminance and chrominance information. Excessive output skew is visible as artifacts along sharp vertical edges when viewed on a monitor.

H Tilt

H tilt, also known as line tilt and line time distortion, causes a tilt in line-rate signals, predominantly white bars. This type of distortion causes variations in brightness between the left and right edges of an image. For a digital encoder, such as that described in this chapter, H tilt is primarily an artifact of the analog output filters and the transmission medium.

H tilt is measured using a line bar (such as the one in the NTC-7 NTSC composite test signal) and measuring the peak-to-peak deviation of the tilt (in IRE or percent of white bar amplitude), ignoring the first and last microsecond of the white bar.

V Tilt

V tilt, also known as field tilt and field time distortion, causes a tilt in field-rate signals, predominantly white bars. This type of distortion causes variations in brightness between the top and bottom edges of an image. For a digital encoder, such as that described in this chapter, V tilt is primarily an artifact of the analog output filters and the transmission medium.

V tilt is measured using an 18 μs, 100 IRE white bar in the center of 130 lines in the center of the field or using a field square wave. The peak-to-peak deviation of the tilt is measured (in IRE or percent of white bar amplitude), ignoring the first three and last three lines.

Genlocking Support

In many instances, it is desirable to be able to genlock the output (align the timing signals) of an encoder to another composite analog video signal to facilitate downstream video processing. This requires locking the horizontal, vertical, and color subcarrier frequencies and phases together, as discussed in the NTSC/PAL decoder section of this chapter. In addition, the luminance and chrominance amplitudes must be matched. A major problem in genlocking is that the regenerated sample clock may have excessive jitter, resulting in color artifacts.

One genlocking variation is to send an advance house sync (also known as black burst or advance sync) to the encoder. The advancement compensates for the delay from the house sync generator to the encoder output being used in the downstream processor, such as a mixer. Each video source has its own advanced house sync signal, so each video source is time-aligned at the mixing or processing point.

Another genlocking option allows adjustment of the subcarrier phase so it can be matched with other video sources at the mixing or processing point. The subcarrier phase must be adjustable from 0° to 360°. Either zero SCH phase is always maintained or another adjustment is allowed to independently position the sync and luminance information in about 10 ns steps.

The output delay variation between products should be within about ±0.8 ns to allow video signals from different genlocked devices to be mixed properly. Mixers usually assume the two video signals are perfectly genlocked, and excessive time skew between the two video signals results in poor mixing performance.

Alpha Channel Support

An encoder designed for pro-video editing applications may support an alpha channel.
Eight or ten bits of digital alpha data are input, pipelined to match the pipeline of the encoding process, and converted to an analog alpha signal (discussed in Chapter 7). Alpha is usually linear, with the data generating an analog alpha signal (also called a key) with a range of 0–100 IRE. There is no blanking pedestal or sync information present.

In computer systems that support 32-bit pixels, 8 bits are typically available for alpha information.

NTSC and PAL Digital Decoding

Although the luminance and chrominance components in an NTSC/PAL encoder are usually combined by simply adding the two signals together, separating them in a decoder is much more difficult. Analog NTSC and PAL decoders have been around for some time. However, they have been difficult to use, required adjustment, and offered limited video quality.

Using digital techniques to implement NTSC and PAL decoding offers many advantages, such as ease of use, minimum analog adjustments, and excellent video quality. The use of digital circuitry also enables the design of much more robust and sophisticated Y/C separator and genlock implementations.

A general block diagram of an NTSC/PAL digital decoder is shown in Figure 9.21.

Digitizing the Analog Video

The first step in digital decoding of composite video signals is to digitize the entire composite video signal using an A/D converter (ADC). For our example, 10-bit ADCs are used; therefore, indicated values are 10-bit values.

The composite and S-video signals are illustrated in Figures 9.2, 9.3, 9.10, 9.11, 9.12, and 9.13.

Video inputs are usually AC-coupled and have a 75 Ω AC and DC input impedance. As a result, the video signal must be DC restored every scan line during horizontal sync to position the sync tips at a known voltage level.

The video signal must also be lowpass filtered (typically to about 6 MHz) to remove any high-frequency components that may result in aliasing.
Although the video bandwidth for broadcast is rigidly defined, there is no standard for consumer equipment. The video source generates as much bandwidth as it can; the receiving equipment accepts as much bandwidth as it can process.

Video signals with amplitudes of 0.25× to 2× ideal are common in the consumer market. The active video and/or sync signal may change amplitude, especially in editing situations where the video signal may be composed of several different video sources merged together.

In addition, the decoder should be able to handle 100% colors. Although only 75% colors may be broadcast, there is no such limitation for baseband video. With the frequent use of computer-generated text and graphics, highly saturated colors are becoming more common.

Figure 9.21. Typical NTSC/PAL Digital Decoder Implementation. [Block diagram: the NTSC/PAL composite, Y, and C inputs feed a multiplexer and two A/D converters with gain adjust; the digitized video passes through the Y/C separator, chrominance demodulation, and brightness, contrast, saturation, and hue adjustment with display enhancement processing to produce the Y, CB, and CR outputs. A genlock and video timing control block generates BLANK#, VSYNC#, HSYNC#, FIELD_0, FIELD_1, and CLOCK.]

DC Restoration

To remove any DC offset that may be present in the video signal, and position it at a known level, DC restoration (also called clamping) is done.

For composite or luminance (Y) video signals, the analog video signal is DC restored to the REF– voltage of the ADC during each horizontal sync time. Thus, the ADC generates a code of 0 during the sync level.

For chrominance (C) video signals, the analog video signal is DC restored to the midpoint of the ADC during the horizontal sync time. Thus, the ADC generates a code of 512 during the blanking level.

Automatic Gain Control

An automatic gain control (AGC) is used to ensure that a constant value for the blanking level is generated by the ADC.
If the blanking level is low or high, the video signal is amplified or attenuated until the blanking level is correct.

In S-video applications, the same amount of gain that is applied to the luminance video signal should also be applied to the chrominance video signal.

After DC restoration and AGC processing, an offset of 16 is added to the digitized composite and luminance signals to match the levels used by the encoder.

Tables 9.2, 9.3, and 9.4 show the ideal ADC values for composite and S-video sources after DC restoration and automatic gain control have been done.

Blanking Level Determination

The most common method of determining the blanking level is to digitally lowpass filter the video signal to about 0.5 MHz to remove subcarrier information and noise. The back porch is then sampled multiple times to determine an average blanking level value.

To limit line-to-line variations and clamp streaking (the result of quantizing errors), the result should be averaged over 3–32 consecutive scan lines. Alternately, the back porch level may be determined during the vertical blanking interval and the result used for the entire field.

Video Gain Options

The difference from the ideal blanking level is processed and used in one of several ways to generate the correct blanking level:

(a) controlling a voltage-controlled amplifier
(b) adjusting the REF+ voltage of the ADC
(c) multiplying the outputs of the ADC

In (a) and (b), an analog signal for controlling the gain may be generated by either a DAC or a charge pump. If a DAC is used, it should have twice the resolution of the ADC to avoid quantizing noise. For this reason, a charge pump implementation may be more suitable.

Option (b) is dependent on the ADC being able to operate over a wide range of reference voltages, and is therefore rarely implemented.

Option (c) is rarely used due to the resulting quantization errors from processing in the digital domain.
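The blanking level measurement and gain decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ideal 10-bit codes (sync tip 16, blanking 240) are taken from the NTSC values labelled in Figure 9.30, the function names are hypothetical, and the input is assumed to be back-porch samples that have already been lowpass filtered to about 0.5 MHz.

```python
# Assumed ideal 10-bit NTSC composite codes (values labelled in Figure 9.30).
IDEAL_SYNC_TIP = 16
IDEAL_BLANK = 240


def blanking_level(back_porch_lines):
    """Average the back-porch samples of each line, then average across lines.

    back_porch_lines is a list of per-line sample lists; the text suggests
    averaging over 3 to 32 consecutive scan lines.
    """
    per_line = [sum(line) / len(line) for line in back_porch_lines]
    return sum(per_line) / len(per_line)


def required_gain(measured_blank, measured_sync_tip=IDEAL_SYNC_TIP):
    """Gain that restores the ideal sync-tip-to-blanking height.

    DC restoration pins the sync tip at a known code, so only the height
    of the blanking level above the sync tip needs correcting.
    """
    ideal_height = IDEAL_BLANK - IDEAL_SYNC_TIP
    return ideal_height / (measured_blank - measured_sync_tip)
```

A half-amplitude source whose back porch averages 128 would call for a gain of 2, which could then drive any of the options (a) to (c).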
Sync Amplitude AGC

This is the most common mode of AGC, and is used where the characteristics of the video signal are not known. The difference between the measured and the ideal blanking level is used to determine how much to increase or decrease the gain of the entire video signal.

Burst Amplitude AGC

Another method of AGC is based on the color burst amplitude. This is commonly used in pro-video applications when the sync amplitude may not be related to the active video amplitude.

First, the blanking level is adjusted to the ideal value, regardless of the sync tip position. This may be done by adding or subtracting a DC offset to the video signal.

Next, the burst amplitude is determined. To limit line-to-line variations, the burst amplitude may be averaged over 3–32 consecutive scan lines.

The difference between the measured and the ideal burst amplitude is used to determine how much to increase or decrease the gain of the entire video signal. During the gain adjustment, the blanking value should not change.

AGC Options

In some pro-video applications (for example, when the video signal levels are known to be correct, when all the video levels except the sync height are correct, or when there is excessive noise in the video signal), it may be desirable to disable the automatic gain control.

The AGC value to use may be specified by the user, or the AGC value may be frozen once determined.

Y/C Separation

When decoding composite video, the luminance (Y) and chrominance (C) must be separated. The many techniques for doing this are discussed in detail later in the chapter.

After Y/C separation, Y has the nominal values shown in Table 9.2. Note that the luminance still contains sync and blanking information. Modulated chrominance has the nominal values shown in Table 9.3.

The quality of Y/C separation is a major factor in the overall video quality generated by the decoder.
Color Difference Processing

Chrominance (C) Demodulation

The chrominance demodulator (Figure 9.22) accepts modulated chroma data from either the Y/C separator or the chroma ADC. It generates CbCr, UV, or IQ color difference data.

(M) NTSC, NTSC–J

During active video, the chrominance data is demodulated using sin and cos subcarrier data, as shown in Figure 9.22, resulting in CbCr, UV, or IQ data.

For this design, the 11-bit reference subcarrier phase (see Figure 9.32) and the burst phase are the same (180°).

For YUV or YCbCr processing, 180° must be added to the 11-bit reference subcarrier phase during active video time so the output of the sin and cos ROMs have the proper subcarrier phases (0° and 90°, respectively).

For YIQ processing, 213° must be added to the 11-bit reference subcarrier phase during active video time so the output of the sin and cos ROMs have the proper subcarrier phases (33° and 123°, respectively).

For all the equations,

ω = 2πFSC
FSC = 3.579545 MHz

Figure 9.22. Chrominance Demodulation Example That Generates CbCr Directly. (A MUX selects digitized S-video chrominance, with 512 subtracted, or chrominance from the Y/C separator. The Cr path multiplies by NTSC = (2)(1.406)(cos ωt) or PAL = ±(2)(1.329)(cos ωt); the Cb path multiplies by NTSC = (2)(1.984)(sin ωt) or PAL = (2)(1.876)(sin ωt). Each path is followed by a 0.6 MHz lowpass filter and the addition of 512.)

YUV Color Space Processing

As shown in Chapter 8, the chrominance signal processed by the demodulator may be represented by:

(U sin ωt) + (V cos ωt)

U is obtained by multiplying the chrominance data by [2 sin ωt], and V is obtained by multiplying by [2 cos ωt]:

((U sin ωt) + (V cos ωt)) (2 sin ωt) = U – (U cos 2ωt) + (V sin 2ωt)
((U sin ωt) + (V cos ωt)) (2 cos ωt) = V + (V cos 2ωt) + (U sin 2ωt)

The 2ωt components are removed by lowpass filtering, resulting in the U and V signals being recovered.
The demodulator multipliers should ensure overflow and underflow conditions are saturated to the maximum and minimum values, respectively. The UV signals are then rounded to nine bits plus sign and lowpass filtered.

For (M) NTSC, U has a nominal range of 0 to ±226, and V has a nominal range of 0 to ±319.

For NTSC–J used in Japan, U has a nominal range of 0 to ±244, and V has a nominal range of 0 to ±344.

YIQ Color Space Processing

As shown in Chapter 8, for older decoders, the chrominance signal processed by the demodulator may be represented by:

(Q sin (ωt + 33°)) + (I cos (ωt + 33°))

The subcarrier generator of the decoder provides a 33° phase offset during active video, canceling the 33° phase terms in the equation.

Q is obtained by multiplying the chrominance data by [2 sin ωt], and I is obtained by multiplying by [2 cos ωt]:

((Q sin ωt) + (I cos ωt)) (2 sin ωt) = Q – (Q cos 2ωt) + (I sin 2ωt)
((Q sin ωt) + (I cos ωt)) (2 cos ωt) = I + (I cos 2ωt) + (Q sin 2ωt)

The 2ωt components are removed by lowpass filtering, resulting in the I and Q signals being recovered. The demodulator multipliers should ensure overflow and underflow conditions are saturated to the maximum and minimum values, respectively. The IQ signals are then rounded to nine bits plus sign and lowpass filtered.

For (M) NTSC, I has a nominal range of 0 to ±309, and Q has a nominal range of 0 to ±271.

For NTSC–J used in Japan, I has a nominal range of 0 to ±334, and Q has a nominal range of 0 to ±293.

YCbCr Color Space Processing

If the decoder is based on the YCbCr color space, the chrominance signal may be represented by:

(Cb – 512)(0.504)(sin ωt) + (Cr – 512)(0.711)(cos ωt)

For NTSC–J systems, the equations are:

(Cb – 512)(0.545)(sin ωt) + (Cr – 512)(0.769)(cos ωt)

In these cases, the values in the sin and cos ROMs are scaled by the reciprocal of the indicated values to allow the demodulator to generate Cb and Cr data directly, instead of U and V data.
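The demodulation steps above (multiply by 2 sin ωt and 2 cos ωt, lowpass filter, round, saturate) can be tried out with a small sketch. It rests on several assumptions that are not from the text: the sample clock is taken to be exactly four times FSC, the lowpass filter is replaced by a plain average over one subcarrier cycle, saturation is applied after filtering instead of at the multipliers, and all function names are hypothetical. The optional gains stand in for the ROM scaling by 1/0.504 and 1/0.711 that yields Cb and Cr directly.

```python
import math

SPC = 4  # samples per subcarrier cycle (assumption: FS = 4 x FSC)


def modulate(sin_amp, cos_amp, n):
    """Test source: sin_amp*sin(wt) + cos_amp*cos(wt) for n samples."""
    return [sin_amp * math.sin(2.0 * math.pi * k / SPC)
            + cos_amp * math.cos(2.0 * math.pi * k / SPC) for k in range(n)]


def saturate(x, lo=-511, hi=511):
    """Clamp to nine bits plus sign."""
    return max(lo, min(hi, x))


def demodulate(chroma, sin_gain=1.0, cos_gain=1.0):
    """Return one (sin-axis, cos-axis) pair per whole subcarrier cycle."""
    out = []
    for start in range(0, len(chroma) - SPC + 1, SPC):
        s_acc = c_acc = 0.0
        for k in range(start, start + SPC):
            wt = 2.0 * math.pi * k / SPC
            s_acc += chroma[k] * 2.0 * sin_gain * math.sin(wt)
            c_acc += chroma[k] * 2.0 * cos_gain * math.cos(wt)
        # Averaging over a full cycle is a crude lowpass: the 2wt terms cancel.
        out.append((saturate(round(s_acc / SPC)),
                    saturate(round(c_acc / SPC))))
    return out


# UV demodulation: unity gains recover U and V.
uv = demodulate(modulate(100, -50, 16))

# Direct CbCr demodulation for (M) NTSC: scale by the reciprocals, add 512.
cbcr = [(cb + 512, cr + 512) for cb, cr in
        demodulate(modulate((600 - 512) * 0.504, (400 - 512) * 0.711, 8),
                   sin_gain=1 / 0.504, cos_gain=1 / 0.711)]
```

The reciprocals 1/0.504 ≈ 1.984 and 1/0.711 ≈ 1.406 agree with the NTSC multiplier labels in Figure 9.22.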
(B, D, G, H, I, M, N, NC) PAL

During active video, the digital chrominance (C) data is demodulated using sin and cos subcarrier data, as shown in Figure 9.22, resulting in CbCr or UV data.

For this design, the 11-bit reference subcarrier phase (see Figure 9.32) and the burst phase are the same (135°).

For all the equations,

ω = 2πFSC
FSC = 4.43361875 MHz for (B, D, G, H, I, N) PAL
FSC = 3.58205625 MHz for (NC) PAL
FSC = 3.57561149 MHz for (M) PAL

Using a switched subcarrier waveform in the Cr or V channel also removes the PAL switch modulation. Thus, [+2 cos ωt] is used while the PAL switch is a logical zero (burst phase = +135°) and [–2 cos ωt] is used while the PAL switch is a logical one (burst phase = 225°).

YUV Color Space

As shown in Chapter 8, the chrominance signal is represented by:

(U sin ωt) ± (V cos ωt)

U is obtained by multiplying the chrominance data by [2 sin ωt] and V is obtained by multiplying by [±2 cos ωt]:

((U sin ωt) ± (V cos ωt)) (2 sin ωt) = U – (U cos 2ωt) ± (V sin 2ωt)
((U sin ωt) ± (V cos ωt)) (± 2 cos ωt) = V ± (U sin 2ωt) + (V cos 2ωt)

The 2ωt components are removed by lowpass filtering, resulting in the U and V signals being recovered. The demodulation multipliers should ensure overflow and underflow conditions are saturated to the maximum and minimum values, respectively. The UV signals are then rounded to nine bits plus sign and lowpass filtered.

For (B, D, G, H, I, NC) PAL, U has a nominal range of 0 to ±239, and V has a nominal range of 0 to ±337.

For (M, N) PAL, U has a nominal range of 0 to ±226, and V has a nominal range of 0 to ±319.
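A short sketch may help show why the switched [±2 cos ωt] waveform removes the PAL switch. The code assumes a sample clock of four times FSC and uses a plain average over whole subcarrier cycles in place of the lowpass filter; the function name and the `phase_error_deg` parameter are hypothetical. The parameter models a local subcarrier that is off by a fixed angle, so the sensitivity of each line to phase error can be explored.

```python
import math

SAMPLES_PER_CYCLE = 4  # assumption: sample clock = 4 x FSC


def demodulate_pal_line(u, v, pal_switch, phase_error_deg=0.0, cycles=8):
    """Demodulate one line of constant color; returns (U, V) estimates.

    pal_switch = 0 uses +2 cos(wt); pal_switch = 1 uses -2 cos(wt),
    matching the sign of the V term transmitted on that line.
    """
    sign = -1.0 if pal_switch else 1.0
    theta = math.radians(phase_error_deg)
    n = SAMPLES_PER_CYCLE * cycles
    u_sum = v_sum = 0.0
    for k in range(n):
        wt = 2.0 * math.pi * k / SAMPLES_PER_CYCLE
        chroma = u * math.sin(wt) + sign * v * math.cos(wt)
        u_sum += chroma * 2.0 * math.sin(wt - theta)
        v_sum += chroma * sign * 2.0 * math.cos(wt - theta)
    # Averaging over whole cycles removes the 2wt components.
    return u_sum / n, v_sum / n
```

With no phase error, both line types return the original (U, V). With a phase error, the two line types disagree in opposite directions, and averaging them leaves each component scaled by cos θ.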
YCbCr Color Space

If the decoder is based on the YCbCr color space, the chrominance signal for (B, D, G, H, I, NC) PAL may be represented by:

(Cb – 512)(0.533)(sin ωt) ± (Cr – 512)(0.752)(cos ωt)

The chrominance signal for (M, N) PAL may be represented by:

(Cb – 512)(0.504)(sin ωt) ± (Cr – 512)(0.711)(cos ωt)

In these cases, the values in the sin and cos ROMs are scaled by the reciprocal of the indicated values to allow the demodulator to generate Cb and Cr data directly, instead of U and V data.

Hanover Bars

If the locally generated subcarrier phase is incorrect, a line-to-line pattern known as Hanover bars results, in which pairs of adjacent lines have a real and complementary hue error. As shown in Figure 9.23 with an ideal color of green, two adjacent lines of the display have a hue error (towards yellow), the next two have the complementary hue error (towards cyan), and so on.

This can be shown by introducing a phase error (θ) in the locally generated subcarrier:

((U sin ωt) ± (V cos ωt)) (2 sin (ωt – θ)) = (U cos θ) –/+ (V sin θ)
((U sin ωt) ± (V cos ωt)) (±2 cos (ωt – θ)) = (V cos θ) +/– (U sin θ)

In areas of constant color, averaging equal contributions from even and odd lines (either visually or using a delay line) cancels the alternating crosstalk component, leaving only a desaturation of the true component by [cos θ].

Lowpass Filtering

The decoder requires sharper roll-off filters than the encoder to ensure adequate suppression of the sampling alias components. Note that with a 13.5 MHz sampling frequency, they start to become significant above 3 MHz.

The demodulation process for (M) NTSC is shown spectrally in Figures 9.24 and 9.25; the process is similar for PAL. In both figures, (a) represents the spectrum of the video signal and (b) represents the spectrum of the subcarrier used for demodulation.
Convolution of (a) and (b), equivalent to multiplication in the time domain, produces the spectrum shown in (c), in which the baseband spectrum has been shifted to be centered about FSC and –FSC. The chrominance is now a baseband signal, which may be separated from the low-frequency luminance, centered at FSC, by a lowpass filter.

Figure 9.23. Example Display of Hanover Bars. Green is the ideal color. (Lines 25, 338, 26, and 339 are shown, with alternating pairs shifted towards cyan and towards yellow.)

The lowpass filters after the demodulator are a compromise between several factors. Simply using a 1.3 MHz filter, such as the one shown in Figure 9.26, increases the amount of cross-color since a greater number of luminance frequencies are included. When using lowpass filters with a passband greater than about 0.6 MHz for NTSC (4.2 – 3.58) or 1.07 MHz for PAL (5.5 – 4.43), the loss of the upper sidebands of chrominance also introduces ringing and color difference crosstalk. If a 1.3 MHz lowpass filter is used, it may include some gain for frequencies between 0.6 MHz and 1.3 MHz to compensate for the loss of part of the upper sideband.

Filters with a sharp cutoff accentuate chrominance edge ringing; for these reasons slow roll-off 0.6 MHz filters, such as the one shown in Figure 9.27, are usually used. These result in poorer color resolution but minimize cross-color, ringing, and color difference crosstalk on edges.

If the decoder is to be used in a pro-video editing environment, the filters should have a maximum ripple of ±0.1 dB in the passband. This is needed to minimize the cumulation of gain and loss artifacts due to the filters, especially when multiple passes through the encoding and decoding processes are required.

Luminance (Y) Processing

To remove the sync and blanking information, Y data from either the Y/C separator or the luma ADC has the black level subtracted from it.
At this point, negative Y values should be supported to allow test signals, keying information, and real-world video to pass through without corruption.

A notch filter, with a center frequency of FSC, is usually optional. It may be used to remove any remaining chroma information from the Y data. The notch filter is especially useful to help clean up the Y data when comb filtering Y/C separation is used for PAL, due to the closeness of the PAL frequency packets.

Figure 9.24. Frequency Spectra for NTSC Digital Chrominance Demodulation (FS = 13.5 MHz, FSC = 3.58 MHz). (A) Modulated chrominance. (B) Color subcarrier. (C) U and V spectrum produced by convolving (A) and (B).

Figure 9.25. Frequency Spectra for NTSC Digital Chrominance Demodulation (FS = 12.27 MHz, FSC = 3.58 MHz). (A) Modulated chrominance. (B) Color subcarrier. (C) U and V spectrum produced by convolving (A) and (B).

Figure 9.26. Typical 1.3 MHz Lowpass Digital Filter Characteristics. (Amplitude versus frequency, 0–4 MHz.)

Figure 9.27. Typical 0.6 MHz Lowpass Digital Filter Characteristics. (Amplitude versus frequency, 0–2 MHz.)

User Adjustments

Contrast, Brightness, and Sharpness

Programmable contrast, brightness, and sharpness adjustments may be implemented, as discussed in Chapter 7. In addition, color transient improvement may be used to improve the image quality.

Hue

A programmable hue adjustment may be implemented, as discussed in Chapter 7.
Alternately, to reduce circuitry in the data path, the hue adjustment is usually implemented as a subcarrier phase offset that is added to the 11-bit reference subcarrier phase during the active video time (see Figure 9.32). The result is to shift the phase of the sin and cos subcarriers by a constant amount. An 11-bit hue adjustment allows adjustments in hue from 0° to 360°, in increments of 0.176°.

Due to the alternating sign of the V component in PAL decoders, the sign of the phase offset (θ) is set to be the opposite of the V component. A negative sign of the phase offset (θ) is equivalent to adding 180° to the desired phase shift. PAL decoders do not usually have a hue adjustment feature.

Saturation

A programmable saturation adjustment may be implemented, as discussed in Chapter 7.

Alternately, to reduce circuitry in the data path, the saturation adjustment may be done on the sin and cos values in the demodulator.

In either case, a burst level error signal and the user-programmable saturation value are multiplied together, and the result is used to adjust the gain or attenuation of the color difference signals. The intent here is to minimize the amount of circuitry in the color difference signal path. The burst level error signal is used in the event the burst (and thus the modulated chrominance information) is not at the correct amplitude and adjusts the saturation of the color difference signals appropriately.

For more information on the burst level error signal, please see the Color Killer section.

Automatic Skin Tone Correction

Skin tone correction may be used in NTSC decoders since the eye is very sensitive to skin tones, and the actual colors may become slightly corrupted during the broadcast process. If the grass is not quite the proper color of green, it is not noticeable; however, a skin tone that has a green or orange tint is unacceptable.
Since the skin tones are located close to the +I axis, a typical skin tone corrector looks for colors in a specific area (Figure 9.28), and any colors within that area are made a color that is closer to the skin tone.

A simple skin tone corrector may halve the Q value for all colors that have a corresponding +I value. However, this implementation also changes nonskin tone colors. A more sophisticated implementation halves Q only if the color has a value between 25% and 75% of full-scale and is within ±30° of the +I axis. This moves any colors within the skin tone region closer to ideal skin tone.

It should be noted that the phase angle for skin tone varies between companies. Phase angles from 116° to 126° are used; however, using 123° (the +I axis) simplifies the processing.

Figure 9.28. Typical Skin Tone Color Range. (Vector diagram with amplitude rings at 20, 40, 60, 80, and 100%. Labelled phases: +(B – Y) 0°, +Q 33°, magenta 61°, +(R – Y) 90°, red 103°, +I 123°, yellow 167°, green 241°, cyan 283°, blue 347°.)

Color Killer

If a color burst of 12.5% or less of ideal amplitude is detected for 128 consecutive scan lines, the color difference signals should be forced to zero. Once a color burst of 25% or more of ideal amplitude is detected for 128 consecutive scan lines, the color difference signals may again be enabled. This hysteresis prevents wandering back and forth between enabling and disabling the color information in the event the burst amplitude is borderline.

The burst level may be determined by forcing all burst samples positive and sampling the result multiple times to determine an average value. This should be averaged over three scan lines to limit line-to-line variations.

The burst level error is the ideal amplitude divided by the average result. If no burst is detected, this should be used to force the color difference signals to zero and to disable any filtering in the luminance path, allowing maximum resolution luminance to be output.
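The color killer hysteresis can be expressed as a small state machine. The sketch below is an illustration only: the class and function names are hypothetical, burst samples are assumed to have the blanking level already subtracted, and `burst_fraction` is assumed to be the measured burst level divided by the ideal level. The thresholds (12.5%, 25%) and the 128-line requirement come from the text.

```python
def burst_level(burst_samples):
    """Force all burst samples positive and average them."""
    return sum(abs(s) for s in burst_samples) / len(burst_samples)


class ColorKiller:
    """Enable or disable color with hysteresis on the burst amplitude."""

    def __init__(self, kill_below=0.125, enable_above=0.25, lines_required=128):
        self.kill_below = kill_below
        self.enable_above = enable_above
        self.lines_required = lines_required
        self.color_enabled = True
        self._run = 0  # consecutive lines that argue for changing state

    def update(self, burst_fraction):
        """Feed one scan line's burst fraction; returns True if color is on."""
        if self.color_enabled:
            qualifies = burst_fraction <= self.kill_below
        else:
            qualifies = burst_fraction >= self.enable_above
        self._run = self._run + 1 if qualifies else 0
        if self._run >= self.lines_required:
            self.color_enabled = not self.color_enabled
            self._run = 0
        return self.color_enabled
```

Because a single non-qualifying line resets the run counter, a burst hovering between 12.5% and 25% never toggles the state.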
Optionally providing the ability to force the color decoding on or off is useful in some applications, such as video editing.

Color Space Conversion

YUV or YIQ data is usually converted to YCbCr or R´G´B´ data before being output from the decoder. If converting to R´G´B´ data, the R´G´B´ data must be clipped at the 0 and 1023 values to prevent wrap-around errors.

(M) NTSC, (M, N) PAL

YUV Color Space Processing

Modern decoder designs are now based on the YUV color space. For these decoders, the YUV to YCbCr equations are:

Y601 = 1.691Y + 64
Cb = [1.984U cos θB] + [1.984V sin θB] + 512
Cr = [1.406U cos θR] + [1.406V sin θR] + 512

To generate R´G´B´ data with a range of 0–1023, the YUV to R´G´B´ equations are:

R´ = 1.975Y + [2.251U cos θR] + [2.251V sin θR]
G´ = 1.975Y – 0.779U – 1.146V
B´ = 1.975Y + [4.013U cos θB] + [4.013V sin θB]

To generate R´G´B´ data with a nominal range of 64–940 for pro-video applications, the YUV to R´G´B´ equations are:

R´ = 1.691Y + 1.928V + 64
G´ = 1.691Y – 0.667U – 0.982V + 64
B´ = 1.691Y + 3.436U + 64

The ideal values for θR and θB are 90° and 0°, respectively. However, for consumer televisions sold in the United States, θR and θB usually have values of 110° and 0°, respectively, or 100° and –10°, respectively, to reduce the visibility of differential phase errors, at the cost of color accuracy.
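As a worked example, the 0–1023 range YUV to R´G´B´ equations for (M) NTSC can be coded directly, including the clipping that prevents wrap-around. This is a minimal sketch: the function names are hypothetical, Y is assumed to have the black level already subtracted, and the demodulation angles default to the ideal 90° and 0°.

```python
import math


def clip10(x):
    """Round and clip to the 10-bit range 0..1023."""
    return max(0, min(1023, round(x)))


def yuv_to_rgb_0_1023(y, u, v, theta_r_deg=90.0, theta_b_deg=0.0):
    """(M) NTSC YUV to R'G'B' using the 0-1023 range equations above."""
    tr = math.radians(theta_r_deg)
    tb = math.radians(theta_b_deg)
    r = 1.975 * y + 2.251 * u * math.cos(tr) + 2.251 * v * math.sin(tr)
    g = 1.975 * y - 0.779 * u - 1.146 * v
    b = 1.975 * y + 4.013 * u * math.cos(tb) + 4.013 * v * math.sin(tb)
    return clip10(r), clip10(g), clip10(b)
```

Passing `theta_r_deg=110.0` would mimic the consumer-television setting mentioned above; the clipping step matters for inputs such as an over-range Y, where the unclipped result would exceed 1023.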
YIQ Color Space Processing

For older NTSC decoder designs based on the YIQ color space, the YIQ to YCbCr equations are:

Y601 = 1.692Y + 64
Cb = –1.081I + 1.664Q + 512
Cr = 1.181I + 0.765Q + 512

To generate R´G´B´ data with a range of 0–1023, the YIQ to R´G´B´ equations are:

R´ = 1.975Y + 1.887I + 1.224Q
G´ = 1.975Y – 0.536I – 1.278Q
B´ = 1.975Y – 2.189I + 3.367Q

To generate R´G´B´ data with a nominal range of 64–940 for pro-video applications, the YIQ to R´G´B´ equations are:

R´ = 1.691Y + 1.616I + 1.048Q + 64
G´ = 1.691Y – 0.459I – 1.094Q + 64
B´ = 1.691Y – 1.874I + 2.883Q + 64

YCbCr Color Space Processing

If the design is based on the YUV color space, the UV to CbCr conversion may be avoided by scaling the sin and cos values during the demodulation process, or scaling the color difference lowpass filter coefficients.

NTSC–J

Since the version of (M) NTSC used in Japan has a 0 IRE blanking pedestal, the color space conversion equations are slightly different from those for standard (M) NTSC.

YUV Color Space Processing

Modern decoder designs are now based on the YUV color space. For these decoders, the YUV to YCbCr equations are:

Y601 = 1.564Y + 64
Cb = 1.835U + 512
Cr = [1.301U cos θR] + [1.301V sin θR] + 512

To generate R´G´B´ data with a range of 0–1023, the YUV to R´G´B´ equations are:

R´ = 1.827Y + [2.082U cos θR] + [2.082V sin θR]
G´ = 1.827Y – 0.721U – 1.060V
B´ = 1.827Y + 3.712U

To generate R´G´B´ data with a nominal range of 64–940 for pro-video applications, the YUV to R´G´B´ equations are:

R´ = 1.564Y + 1.783V + 64
G´ = 1.564Y – 0.617U – 0.908V + 64
B´ = 1.564Y + 3.179U + 64

The ideal value for θR is 90°. However, for televisions sold in Japan, θR usually has a value of 95° to reduce the visibility of differential phase errors, at the cost of color accuracy.
YIQ Color Space Processing

For older NTSC decoder designs based on the YIQ color space, the YIQ to YCbCr equations are:

Y601 = 1.565Y + 64
Cb = –1.000I + 1.539Q + 512
Cr = 1.090I + 0.708Q + 512

To generate R´G´B´ data with a range of 0–1023, the YIQ to R´G´B´ equations are:

R´ = 1.827Y + 1.746I + 1.132Q
G´ = 1.827Y – 0.496I – 1.182Q
B´ = 1.827Y – 2.024I + 3.115Q

To generate R´G´B´ data with a nominal range of 64–940 for pro-video applications, the YIQ to R´G´B´ equations are:

R´ = 1.564Y + 1.495I + 0.970Q + 64
G´ = 1.564Y – 0.425I – 1.012Q + 64
B´ = 1.564Y – 1.734I + 2.667Q + 64

YCbCr Color Space Processing

If the design is based on the YUV color space, the UV to CbCr conversion may be avoided by scaling the sin and cos values during the demodulation process, or scaling the color difference lowpass filter coefficients.

(B, D, G, H, I, NC) PAL

YUV Color Space Processing

The YUV to YCbCr equations are:

Y601 = 1.599Y + 64
Cb = 1.875U + 512
Cr = 1.329V + 512

To generate R´G´B´ data with a range of 0–1023, the YUV to R´G´B´ equations are:

R´ = 1.867Y + 2.128V
G´ = 1.867Y – 0.737U – 1.084V
B´ = 1.867Y + 3.793U

To generate R´G´B´ data with a nominal range of 64–940 for pro-video applications, the YUV to R´G´B´ equations are:

R´ = 1.599Y + 1.822V + 64
G´ = 1.599Y – 0.631U – 0.928V + 64
B´ = 1.599Y + 3.248U + 64

YCbCr Color Space Processing

The UV to CbCr conversion may be avoided by scaling the sin and cos values during the demodulation process, or scaling the color difference lowpass filter coefficients.

Genlocking

The purpose of the genlock circuitry is to recover a sample clock and the timing control signals (such as horizontal sync, vertical sync, and the color subcarrier) from the video signal. Since the original sample clock is not available, it is usually generated by multiplying the horizontal line frequency, FH, by the desired number of samples per line, using a phase-lock loop (PLL).
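The relationship between line frequency, samples per line, and sample clock is easy to check numerically. In the sketch below the samples-per-line values are the HCOUNT entries of Table 9.10; the line frequencies (4.5 MHz / 286 for (M) NTSC and 15,625 Hz for 625-line PAL) are the standard nominal values and are not stated in this section, and the function name is hypothetical.

```python
# Nominal line frequencies (standard values, assumed here).
FH_NTSC = 4.5e6 / 286   # about 15,734.27 Hz for (M) NTSC
FH_PAL = 15625.0        # Hz for (B, D, G, H, I) PAL


def sample_clock_hz(fh_hz, samples_per_line):
    """Line-locked sample clock: line frequency times samples per line."""
    return fh_hz * samples_per_line


for label, fh, hcount in [
    ("13.5 MHz (M) NTSC", FH_NTSC, 858),
    ("13.5 MHz PAL", FH_PAL, 864),
    ("12.27 MHz (M) NTSC", FH_NTSC, 780),
    ("14.75 MHz PAL", FH_PAL, 944),
]:
    print(f"{label}: {sample_clock_hz(fh, hcount) / 1e6:.4f} MHz")
```

A line-locked PLL in effect performs this multiplication continuously, so any variation in FH carries through to the sample clock.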
Also, the color subcarrier must be regenerated and locked to the color subcarrier of the video signal being decoded.

There are, however, several problems. Video signals may contain noise, making the determination of sync edges unreliable. The amount of time between horizontal sync edges may vary slightly each line, particularly in analog videotape recorders (VCRs) due to mechanical limitations. For analog VCRs, instantaneous line-to-line variations are up to ±100 ns; line variations between the beginning and end of a field are up to ±5 μs. When analog VCRs are in a special feature mode, such as fast-forwarding or still-picture, the amount of time between horizontal sync signals may vary up to ±20% from nominal.

Vertical sync, as well as horizontal sync, information must be recovered. Unfortunately, analog VCRs, in addition to destroying the SCH phase relationship, perform head switching at field boundaries, usually somewhere between the end of active video and the start of vertical sync. When head switching occurs, one video signal (field n) is replaced by another video signal (field n + 1) which has an unknown time offset from the first video signal. There may be up to a ±1/2 line variation in vertical timing each field. As a result, longer-than-normal horizontal or vertical syncs may be generated.

By monitoring the horizontal line timing, it is possible to determine automatically whether the video source is in the normal or special feature mode. During normal operation, the horizontal line time typically varies by no more than ±5 μs over an entire field. Line timing outside this ±5 μs window may be used to enable special feature mode timing. Hysteresis should be used in the detection algorithms to prevent wandering back and forth between the normal and special feature operations in the event the video timing is borderline between the two modes. A typical circuit for performing the horizontal and vertical sync detection is shown in Figure 9.29.
In the absence of a video signal, the decoder should be designed to optionally free-run, continually generating the video timing to the system, without missing a beat. During the loss of an input signal, any automatic gain circuits should be disabled and the decoder should provide the option either to be transparent (so the input source can be monitored), to auto-freeze the output data (to compensate for short duration dropouts), or to auto-black the output data (to avoid potential problems driving a mixer or VCR).

Figure 9.29. Sync Detection and Phase Comparator Circuitry. (Blocks shown: 0.5 MHz lowpass filter on the digitized video, subtraction of a midpoint value labelled NTSC = 128, PAL = 176, coarse HSYNC detect, HSYNC noise gate, fine sync detect with weighting factors ROM, 11-bit horizontal counter, 10-bit vertical counter, and VSYNC and field detect. Outputs: HSYNC phase error, CLAMP, HSYNC#, BURST GATE, HBLANK (H), VSYNC#, FIELD (F), VBLANK (V).)

Horizontal Sync Detection

Early decoders typically used analog sync slicing techniques to determine the midpoint of the leading edge of the sync pulse and used a PLL to multiply the horizontal frequency rate up to the sample clock rate. However, the lack of accuracy of the analog sync slicer, combined with the limited stability of the PLL, resulted in sample clock jitter and noise amplification. When using comb filters for Y/C separation, the long delay between writing and reading the video data means that even a small sample clock frequency error results in a delay that is a significant percentage of the subcarrier period, negating the effectiveness of the comb filter.

Coarse Horizontal Sync Locking

The coarse sync locking enables a faster lock-up time to be achieved. Digitized video is lowpass filtered to about 0.5 MHz to remove high-frequency information, such as noise and color subcarrier information.
Performing the sync detection on lowpass filtered data also provides edge shaping in the event that fast sync edges (rise and fall times less than one clock cycle) are present.

An 11-bit horizontal counter is incremented each sample clock cycle, resetting to 0x001 after counting up to the HCOUNT value, where HCOUNT specifies the total number of samples per line. A value of 0x001 indicates that the beginning of a horizontal sync is expected. When the horizontal counter value is (HCOUNT – 64), a sync gate is enabled, allowing recovered sync information to be detected.

Up to five consecutive missing sync pulses should be detected before any correction to the clock frequency or other adjustments is done. Once sync information has been detected, the sync gate is disabled until the next time the horizontal counter value is (HCOUNT – 64). This helps filter out noise, serration, and equalization pulses. If the leading edge of recovered horizontal sync is not within ±64 clock cycles (approximately ±5 μs) of where it is expected to be, the horizontal counter is reset to 0x001 to realign the edges more closely.

Additional circuitry may be included to monitor the width of the recovered horizontal sync pulse. If the horizontal sync pulse is not approximately the correct pulse width, ignore it and treat it as a missing sync pulse.

If the leading edge of recovered horizontal sync is within ±64 sample clock cycles (approximately ±5 μs) of where it is expected to be, the fine horizontal sync locking circuitry is used to fine-tune the timing.

Fine Horizontal Sync Locking

One-half the sync amplitude is subtracted from the 0.5 MHz lowpass-filtered video data so the sync timing reference point (50% sync amplitude) is at zero.

The leading horizontal sync edge may be determined by summing a series of weighted samples from the region of the sync edge. To perform the filtering, the weighting factors are read from a ROM by a counter triggered by the horizontal counter.
When the central weighting factor (A0) is coincident with the 50% amplitude point of the leading edge of sync, the result integrates to zero. Typical weighting factors are:

A0 = 102/4096
A1 = 90/4096
A2 = 63/4096
A3 = 34/4096
A4 = 14/4096
A5 = 5/4096
A6 = 2/4096

This arrangement uses more of the timing information from the sync edge and suppresses noise. Note that circuitry should be included to avoid processing the trailing edge of horizontal sync.

Figure 9.30 shows the operation of the fine sync phase comparator. Figure 9.30a shows the leading sync edge for NTSC. Figure 9.30b shows the weighting factors being generated, which, when multiplied by the sync information, produce the waveform shown in Figure 9.30c. When the A0 coefficient is coincident with the 50% amplitude point of sync, the waveform integrates to zero. Distortion of sync edges, resulting in the locking point being slightly shifted, is minimized by the lowpass filtering, effectively shaping the sync edges prior to processing.

Sample Clock Generation

The horizontal sync phase error signal from Figure 9.29 is used to adjust the frequency of a line-locked PLL, as shown in Figure 9.31. A line-locked PLL always generates a constant number of clock cycles per line, regardless of any line time variations. The free-running frequency of the PLL should be the nominal sample clock frequency required (for example, 13.5 MHz).

Using a VCO-based PLL has the advantage of a wider range of sample clock frequency adjustments, useful for handling video timing variations outside the normal video specifications. A disadvantage is that, due to jitter in the sample clock, there may be visible hue artifacts and poor Y/C separation.

A VCXO-based PLL has the advantage of minimal sample clock jitter. However, the sample clock frequency range may be adjusted only a small amount, limiting the ability of the decoder to handle nonstandard video timing.
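The fine sync phase comparator built from the weighting factors A0 to A6 listed earlier can be sketched as a weighted sum around the expected edge position. The sketch assumes the weights are applied symmetrically on both sides of A0 (13 taps in total), uses the NTSC midpoint code of 128 shown in Figure 9.29, and models the lowpass-filtered leading edge as a linear ramp from blanking (240) to sync tip (16); the function names and the ramp shape are hypothetical.

```python
WEIGHTS = [102, 90, 63, 34, 14, 5, 2]  # A0..A6 numerators, each over 4096


def phase_error(samples, expected_center, half_level=128):
    """Weighted sum of (sample - 50% level) around the expected sync edge.

    Returns 0 when the 50% point of the leading edge sits exactly on
    expected_center; the sign shows whether the edge is late or early.
    """
    total = 0
    for offset in range(-6, 7):
        value = samples[expected_center + offset] - half_level
        total += WEIGHTS[abs(offset)] * value
    return total / 4096


def leading_edge(center, length=40, slope=28, blank=240, sync_tip=16):
    """Hypothetical filtered sync edge: a ramp passing through 128 at center."""
    return [max(sync_tip, min(blank, 128 - slope * (i - center)))
            for i in range(length)]


edge = leading_edge(center=20)
aligned = phase_error(edge, 20)   # edge exactly where expected
late = phase_error(edge, 19)      # edge arrives one sample late
early = phase_error(edge, 21)     # edge arrives one sample early
```

In a decoder, the resulting error would feed the loop filter of the line-locked PLL (or the interpolator in an asynchronous design) instead of being inspected directly.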
Ideally, with either design, the rising edge of the sample clock is aligned with the half-amplitude point of the leading edge of horizontal sync, and a fixed number of sample clock cycles per line (HCOUNT) are always generated.

An alternate method is to asynchronously sample the video signal with a fixed-frequency clock (for example, 13.5 MHz). Since in this case the sample clock is not aligned with horizontal sync, there is a phase difference between the actual sample position and the ideal sample position. As with the conventional genlock solution, this phase difference is determined by the difference between the recovered and expected horizontal syncs.

The ideal sample position is defined to be aligned with a sample clock generated by a line-locked PLL. Rather than controlling the sample clock frequency, the horizontal sync phase error signal is used to control interpolation between two samples of data to generate the ideal sample value. If using comb filtering for Y/C separation, the digitized composite video may be interpolated to generate the ideal sample points, providing better Y/C separation by aligning the samples more precisely.

Figure 9.30. Fine Lock Phase Comparator Waveforms. (A) The NTSC sync leading edge. (B) The series of weighting factors. (C) The weighted leading edge samples. (Digital values marked in (A): blank = 240, line timing reference = 128, sync tip = 16.)

Figure 9.31. Typical Line-Locked Sample Clock Generation. (HSYNC phase error, DAC, loop filter, VCO or VCXO, sample clock.)

Vertical Sync Detection

Digitized video is lowpass filtered to about 0.5 MHz to remove high-frequency information, such as noise and color subcarrier information. The 10-bit vertical counter is incremented by each expected horizontal sync, resetting to 0x001 after counting up to 525 or 625.
A value of 0x001 indicates that the beginning of a vertical sync for Field 1 is expected. The end of vertical sync intervals is detected and used to set the value of the vertical counter according to the mode of operation. By monitoring the relationship of recovered vertical and horizontal syncs, Field 1 vs. Field 2 information is detected.

If a recovered horizontal sync occurs more than 64, but less than (HCOUNT/2), clock cycles after the expected horizontal sync, the vertical counter is not adjusted, to avoid double incrementing it. If a recovered horizontal sync occurs (HCOUNT/2) or more clock cycles after the vertical counter has been incremented, the vertical counter is again incremented.

During special feature operation, there is no longer any correlation between the vertical and horizontal timing information, so Field 1 vs. Field 2 detection cannot be done. Thus, every other detection of the end of vertical sync should set the vertical counter accordingly in order to synthesize Field 1 and Field 2 timing.

Subcarrier Generation

As with the encoder, the color subcarrier is generated from the sample clock using a DTO (Figure 9.32), and the same frequency relationships apply as those discussed in the encoder section.

Unlike the encoder, the phase of the generated subcarrier must be continuously adjusted to match that of the video signal being decoded. The subcarrier locking circuitry phase compares the generated subcarrier and the incoming subcarrier, resulting in an FSC error signal indicating the amount of phase error. This FSC error signal is added to the [p] value to continually adjust the step size of the DTO, adjusting the phase of the generated subcarrier to match that of the video signal being decoded.
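As an illustration of how the DTO step size relates to the sample clock, here is a small Python sketch. It assumes the single-stage 22-bit DTO of Figure 9.32, which accumulates modulo 4194303 (2^22 − 1), so that FSC/FS = P/4194303, and it uses the standard line-locked relationships of 227.5 subcarrier cycles per line for (M) NTSC and 283.7516 for (B, D, G, H, I) PAL. The function names are hypothetical.

```python
DTO_MODULUS = 4194303  # 2**22 - 1, the "modulo 4194303" adder in Figure 9.32

NTSC_CYCLES_PER_LINE = 227.5     # (M) NTSC subcarrier cycles per scan line
PAL_CYCLES_PER_LINE = 283.7516   # (B, D, G, H, I) PAL subcarrier cycles per line


def dto_p_value(subcarrier_cycles_per_line, hcount):
    """Nominal step size P for a line-locked sample clock.

    FSC/FS equals (cycles per line)/HCOUNT, so P = modulus * that ratio.
    """
    return round(DTO_MODULUS * subcarrier_cycles_per_line / hcount)


def dto_step(phase, p, fsc_error=0):
    """Advance the phase accumulator by one sample clock.

    The FSC error from the subcarrier locking loop is added to P,
    nudging the generated subcarrier toward the incoming burst.
    """
    return (phase + p + fsc_error) % DTO_MODULUS


print(dto_p_value(NTSC_CYCLES_PER_LINE, 858))  # 13.5 MHz (M) NTSC
print(dto_p_value(PAL_CYCLES_PER_LINE, 864))   # 13.5 MHz PAL
print(dto_p_value(NTSC_CYCLES_PER_LINE, 780))  # 12.27 MHz (M) NTSC
print(dto_p_value(PAL_CYCLES_PER_LINE, 944))   # 14.75 MHz PAL
```

Computing P from cycles per line rather than from absolute frequencies keeps the sketch independent of the exact clock values; the results can be compared against Table 9.10.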
As a 22-bit single-stage DTO is used to divide down the sample clock to generate the subcarrier in Figure 9.32, the [p] value is determined as follows:

$F_{SC}/F_S = P/4194303 = P/(2^{22} - 1)$

where $F_{SC}$ = the desired subcarrier frequency and $F_S$ = the sample clock rate. Some values of [p] for popular sample clock rates are shown in Table 9.10.

Figure 9.32. Chrominance Subcarrier Generator. (The 11-bit reference subcarrier phase has 1 LSB = 0.1757813°; the NTSC reference phase is 180° and the PAL reference phase is 135°.)

Typical Application             Total Samples per Scan Line (HCOUNT)   P
13.5 MHz (M) NTSC               858                                    1,112,126
13.5 MHz (B, D, G, H, I) PAL    864                                    1,377,477
12.27 MHz (M) NTSC              780                                    1,223,338
14.75 MHz (B, D, G, H, I) PAL   944                                    1,260,742

Table 9.10. Typical HCOUNT and P Values for the 1-Stage 22-Bit DTO in Figure 9.32.

Subcarrier Locking

The purpose of the subcarrier locking circuitry (Figure 9.33) is to phase lock the generated color subcarrier to the color subcarrier of the video signal being decoded.

Figure 9.33. Subcarrier Phase Comparator Circuitry.

Digital composite video (or digital chrominance video) has the blanking level subtracted from it. It is also gated with a burst gate to ensure that the data has a value of zero outside the burst time. The burst gate signal should be timed to eliminate the edges of the burst, which may have transient distortions that will reduce the accuracy of the phase measurement.

The color burst data is phase compared to the locally generated burst. Note that the sign information must also be compared so lock will not occur on 180° out-of-phase signals. The burst accumulator averages the sixteen samples, and the accumulated values from two adjacent lines are averaged to produce the error signal. When the local subcarrier is correctly phased, the accumulated values from alternate lines cancel, and the phase error signal is zero. The error signal is sampled at the line rate and processed by the loop filter, which should be designed to achieve a lock-up time of about 10 lines (50 or more lines may be required for noisy video signals). It is desirable to avoid updating the error signal during vertical intervals due to the lack of burst. The resulting FSC error signal is used to adjust the DTO that generates the local subcarrier (Figure 9.32).

During PAL operation, the phase detector also recovers the PAL switch information used in generating the switched V subcarrier. The PAL switch D flip-flop is synchronized to the incoming signal by comparing the local switch sense with the sign of the accumulated burst values. If the sense is consistently incorrect for sixteen lines, then the flip-flop is reset.

Note the subcarrier locking circuit should be able to handle short-term frequency variations (over a few frames) of ±200 Hz, long-term frequency variations of ±500 Hz, and color burst amplitudes of 25–200% of normal with short-term amplitude variations (over a few frames) of up to 5%.
The lock-up time of ten lines is desirable to accommodate video signals that may have been incorrectly edited (i.e., without attention to the SCH phase relationship) or nonstandard video signals due to freeze-framing, special effects, and so on. The ten lines enable the subcarrier to be locked before the active video time, ensuring correct color representation at the beginning of the picture.

Video Timing Generation

HSYNC# (Horizontal Sync) Generation

An 11-bit horizontal counter is incremented on each rising edge of the sample clock. The count is monitored to determine when to generate the burst gate, HSYNC# output, horizontal blanking, etc. Typically, each time the counter is reset to 0x001, the HSYNC# output is asserted. The exact timing of HSYNC# is dependent on the video interface used, as discussed in Chapter 6.

H (Horizontal Blanking) Generation

A horizontal blanking signal, H, may be implemented to specify when the horizontal blanking interval occurs. The timing of H is dependent on the video interface used, as discussed in Chapter 6.

The horizontal blank timing may be user programmable by incorporating start and stop blank registers. The values of these registers are compared to the horizontal counter value, and used to assert and negate the H control signal.

VSYNC# (Vertical Sync) Generation

A 10-bit vertical counter is incremented on each rising edge of HSYNC#. Typically, each time the counter is reset to 0x001, the VSYNC# output is asserted. The exact timing of VSYNC# is dependent on the video interface used, as discussed in Chapter 6.

F (Field) Generation

A field signal, F, may be implemented to specify whether Field 1 or Field 2 is being decoded. The exact timing of F is dependent on the video interface used, as discussed in Chapter 6.
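The counter-and-register scheme described for H can be sketched in a few lines of Python. This is only an illustration: the register values 843 and 123 are hypothetical (chosen to leave 720 active samples on an 858-sample line), and the real values depend on the video interface, as noted above.

```python
def horizontal_blank_line(hcount, start_blank, stop_blank, h=True):
    """Return the H state for every count of one scan line.

    The horizontal counter runs from 0x001 to HCOUNT. H is asserted when
    the count matches the start-blank register and negated when it matches
    the stop-blank register; `h` is the state carried in from the end of
    the previous line.
    """
    states = []
    for count in range(1, hcount + 1):
        if count == start_blank:
            h = True
        elif count == stop_blank:
            h = False
        states.append(h)
    return states


# Hypothetical register values for an 858-sample line.
line = horizontal_blank_line(858, start_blank=843, stop_blank=123)
print(line.count(False))  # number of unblanked (active) samples
```

The same compare-and-latch structure would apply to V, using the vertical counter and its own start and stop registers.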
In instances where the output of an analog VCR is being decoded, and the VCR is in a special effects mode (such as still or fast-forward), there is no longer enough timing information to determine Field 1 vs. Field 2 timing. Thus, the Field 1 and Field 2 timing as specified by the VSYNC#/HSYNC# relationship (or the F signal) should be synthesized and may not reflect the true field timing of the video signal being decoded.

V (Vertical Blanking) Generation

A vertical blanking signal, V, may be implemented to specify when the vertical blanking interval occurs. The exact timing of V is dependent on the video interface used, as discussed in Chapter 6.

The vertical blank timing may be user programmable by incorporating start and stop blank registers. The values of these registers are compared to the vertical counter value, and used to assert and negate the V control signal.

BLANK# Generation

The composite blanking signal, BLANK#, is the logical NOR of the H and V signals.

While BLANK# is asserted, RGB data may be forced to a value of 0. YCbCr data may be forced to an 8-bit value of 16 for Y and 128 for Cb and Cr. Alternatively, the RGB or YCbCr data outputs may be left unblanked, allowing vertical blanking interval (VBI) data, such as closed captioning, teletext, widescreen signaling, and other information to be output.

Field Identification

Although the timing relationship between the horizontal sync (HSYNC#) and vertical sync (VSYNC#) signals, or the F signal, may be used to specify whether Field 1 or Field 2 is being decoded, one or two additional signals may be used to specify which one of four or eight fields is being decoded, as shown in Table 9.7. We refer to these additional control signals as FIELD_0 and FIELD_1.

FIELD_0 should change state at the beginning of VSYNC#, or coincident with F, during fields 1, 3, 5, and 7. FIELD_1 should change state at the beginning of VSYNC#, or coincident with F, during fields 1 and 5.
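The toggling rule for FIELD_0 and FIELD_1 can be checked with a short Python sketch. The starting polarity of both signals is an assumption here (Table 9.7 defines the actual values), and the function name is hypothetical; only the "change state during fields 1, 3, 5, 7" and "during fields 1 and 5" rules come from the text.

```python
def field_id_signals(num_fields, field_0=0, field_1=0):
    """Walk through successive fields and toggle FIELD_0 and FIELD_1.

    FIELD_0 changes state at the start of fields 1, 3, 5, and 7.
    FIELD_1 changes state at the start of fields 1 and 5.
    The initial polarities are assumed, not taken from Table 9.7.
    """
    states = []
    for n in range(num_fields):
        field = n % 8 + 1
        if field in (1, 3, 5, 7):
            field_0 ^= 1
        if field in (1, 5):
            field_1 ^= 1
        states.append((field, field_0, field_1))
    return states


for field, f0, f1 in field_id_signals(8):
    print(field, f0, f1)
```

Each frame (pair of fields) gets its own (FIELD_0, FIELD_1) combination, so the two signals together distinguish four frames, that is, eight fields; FIELD_0 alone distinguishes two frames, or four fields.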

NTSC Field Identification

The beginning of fields 1 and 3 may be determined by monitoring the relationship of the subcarrier phase relative to sync. As shown in Figure 8.5, at the beginning of field 1, the subcarrier phase is ideally 0° relative to sync; at the beginning of field 3, the subcarrier phase is ideally 180° relative to sync.

In the real world, there is a tolerance in the SCH phase relationship. For example, although the ideal SCH phase relationship may be perfect at the source, transmitting the video signal over a coaxial cable may result in a shift of the SCH phase relationship due to cable characteristics. Thus, the ideal phase plus or minus a tolerance should be used. Although ±40° (NTSC) or ±20° (PAL) is specified as an acceptable tolerance by the video standards, many decoder designs use a tolerance of up to ±80°.

In the event that a SCH phase relationship not within the proper tolerance is detected, the decoder should proceed as if nothing were wrong. If the condition persists for several frames, indicating that the video source may no longer be a stable video source, operation should change to that for an unstable video source.

For unstable video sources that do not maintain the proper SCH relationship (such as analog VCRs), synthesized FIELD_0 and FIELD_1 outputs should be generated (for example, by dividing the F output signal by two and four) in the event the signal is required for memory addressing or downstream processing.

PAL Field Identification

The beginning of fields 1 and 5 may be determined by monitoring the relationship of the –U component of the extrapolated burst relative to sync. As shown in Figure 8.16, at the beginning of field 1, the phase is ideally 0° relative to sync; at the beginning of field 5, the phase is ideally 180° relative to sync.
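The "ideal phase plus or minus a tolerance" test can be sketched as follows. This is a hedged Python illustration with hypothetical function names; it returns the nominal phase (0° or 180°) that a measured SCH phase falls within, which corresponds to field 1 vs. field 3 for NTSC or field 1 vs. field 5 for PAL, and returns None when neither window matches.

```python
def phase_difference(a_deg, b_deg):
    """Signed difference a - b wrapped into the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def classify_sch_phase(measured_deg, tolerance_deg=80.0):
    """Return 0 or 180 if the measured phase is within tolerance of that
    nominal value, or None if it matches neither window.

    The default of 80 degrees reflects the wider tolerance many decoder
    designs use; the standards specify 40 (NTSC) or 20 (PAL).
    """
    for nominal in (0.0, 180.0):
        if abs(phase_difference(measured_deg, nominal)) <= tolerance_deg:
            return int(nominal)
    return None


print(classify_sch_phase(350.0))                      # near 0 degrees
print(classify_sch_phase(170.0))                      # near 180 degrees
print(classify_sch_phase(45.0, tolerance_deg=40.0))   # outside both windows
```

A decoder following the text would not react to a single None result; it would only switch to unstable-source operation if the out-of-tolerance condition persisted for several frames.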
Either the burst blanking sequence or the subcarrier phase may be used to differentiate between fields 1 and 3, fields 2 and 4, fields 5 and 7, and fields 6 and 8. All of the considerations discussed for NTSC in the previous section also apply for PAL.

Auto-Detection of Video Signal Type

If the decoder can automatically detect the type of video signal being decoded, and configure itself automatically, the user will not have to guess at the type of video signal being processed. This information can be passed via status information to the rest of the system.

If the decoder detects fewer than 575 lines per frame for at least 16 consecutive frames, the decoder can assume the video signal is (M) NTSC or (M) PAL. First, assume the video signal is (M) NTSC as that is much more popular. If the vertical and horizontal timing remains locked, but the decoder is unable to maintain subcarrier locking, the video signal may be (M) PAL. In that case, try (M) PAL operation and verify the burst timing.

If the decoder detects more than 575 lines per frame for at least 16 consecutive frames, it can assume the video signal is (B, D, G, H, I, N, NC) PAL or a version of SECAM. First, assume the video signal is (B, D, G, H, I, N) PAL. If the vertical and horizontal timing remain locked, but the decoder is unable to maintain a subcarrier lock, it may mean the video signal is (NC) PAL or SECAM. In that case, try SECAM operation (as that is much more popular), and if that does not achieve subcarrier lock, try (NC) PAL operation.

If the decoder detects a video signal format to which it cannot lock, this should be indicated so the user can be notified.

Note that auto-detection cannot be performed during special feature modes of analog VCRs, such as fast-forwarding. If the decoder detects a special feature mode of operation, it should disable the auto-detection circuitry. Auto-detection should only be done when a video signal has been detected after the loss of an input video signal.

Y/C Separation Techniques

The encoder typically combines the luminance and chrominance signals by simply adding them together; the result is that chrominance and high-frequency luminance signals occupy the same portion of the frequency spectrum. As a result, separating them in the decoder is difficult. When the signals are decoded, some luminance information is decoded as color information (referred to as cross-color), and some chrominance information remains in the luminance signal (referred to as cross-luminance). Due to the stable performance of digital decoders, much more complex separation techniques can be used than is possible with analog decoders.

The presence of crosstalk is bad news in editing situations; crosstalk components from the first decoding are encoded, possibly causing new or additional artifacts when decoded the next time. In addition, when a still frame is captured from a decoded signal, the frozen residual subcarrier on edges may beat with the subcarrier of any following encoding process, resulting in edge flicker in colored areas. Although the crosstalk problem cannot be solved entirely at the decoder, more elaborate Y/C separation minimizes the problem.

If the decoder is used in an editing environment, the suppression of cross-luminance and cross-chrominance is more important than the appearance of the decoded picture. When a picture is decoded, processed, encoded, and again decoded, cross-effects can introduce substantial artifacts. It may be better to limit the luminance bandwidth (to reduce cross-luminance), producing softer pictures. Also, limiting the chrominance bandwidth to less than 1 MHz reduces cross-color, at the expense of losing chrominance definition.

Complementary Y/C separation preserves all of the input signal.
If the separated chrominance and luminance signals are added together again, the original composite video signal is generated.

Noncomplementary Y/C separation introduces some irretrievable loss, resulting in gaps in the frequency spectrum if the separated chrominance and luminance signals are again added together to generate a composite video signal. The loss is due to the use of narrower filters to reduce cross-color and cross-luminance. Therefore, noncomplementary filtering is usually unsuitable when multiple encoding and decoding operations must be performed, as the frequency spectrum gaps continually increase as the number of decoding operations increases. It does, however, enable the tweaking of luminance and chrominance response for optimum viewing.

Simple Y/C Separation

With all of these implementations, there is no loss of vertical chrominance resolution, but there is also no suppression of cross-color. For PAL, line-to-line errors due to differential phase distortion are not suppressed, resulting in the vertical pattern known as Hanover bars.

Noticeable artifacts of simple Y/C separators are color artifacts on vertical edges. These include color ringing, color smearing, and the display of color rainbows in place of high-frequency gray-scale information.

Lowpass and Highpass Filtering

The most basic Y/C separator assumes frequencies below a certain point are luminance and above this point are chrominance. An example of this simple Y/C separator is shown in Figure 9.34.

Frequencies below 3.0 MHz (NTSC) or 3.8 MHz (PAL) are assumed to be luminance. Frequencies above these are assumed to be chrominance. Not only is high-frequency luminance information lost, but it is assumed to be chrominance information, resulting in cross-color.

Notch Filtering

Although broadcast NTSC and PAL systems are strictly bandwidth-limited, this may not be true of other video sources. Luminance information may be present all the way out to 6 or 7 MHz or even higher.
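The complementary property described above (Y plus C reproduces the composite input) can be demonstrated with a toy filter. The sketch below makes several assumptions that are not from the text: the sample clock is four times the subcarrier frequency, so the subcarrier sits at one quarter of the sample rate; the chrominance bandpass is a hypothetical 5-tap filter with taps (−1, 0, 2, 0, −1)/4; and luminance is formed by subtracting the chrominance from the delay-matched input.

```python
def split_complementary(composite):
    """Split composite samples into (luma, chroma) so that luma + chroma
    equals the input delayed by two samples.

    Chroma comes from a toy bandpass centered on fs/4; luma is the
    delay-matched input minus chroma, which makes the pair complementary.
    """
    luma, chroma = [], []
    for n in range(4, len(composite)):
        c = (-composite[n] + 2 * composite[n - 2] - composite[n - 4]) / 4
        y = composite[n - 2] - c
        chroma.append(c)
        luma.append(y)
    return luma, chroma


# Flat luminance of 100 plus a subcarrier of amplitude 20 at fs/4.
subcarrier = [0, 1, 0, -1] * 6
composite = [100 + 20 * s for s in subcarrier]
luma, chroma = split_complementary(composite)
print(luma[:4], chroma[:4])
```

A noncomplementary design would instead filter the luminance path separately with a narrower response, so that luma plus chroma no longer reconstructs the input exactly.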
Because luminance may extend well beyond the broadcast bandwidth, the notch filter designs in Figure 9.35 are usually more appropriate, as they allow high-frequency luminance to pass, resulting in a sharper picture.

Figure 9.34. Typical Simple Y/C Separator. (Lowpass filter for Y and highpass filter for C, both at 3.0 MHz for NTSC or 3.8 MHz for PAL.)

Figure 9.35. Typical Simple Y/C Separator. (A) Complementary filtering. (B) Noncomplementary filtering. (Notch filters at 3.58 ± 1.3 MHz for NTSC or 4.43 ± 1.3 MHz for PAL.)

Many designs based on the notch filter also incorporate comb filters in the Y and color difference data paths to reduce cross-color and cross-luma artifacts. However, the notch filter still limits the overall Y/C separation quality.

PAL Considerations

As mentioned before, PAL uses normal and inverted scan lines, referring to whether the V component is normal or inverted, to help correct color shifting effects due to differential phase distortions.

For example, differential phase distortion may cause the green vector angle on normal scan lines to lag by 45° from the ideal 241° shown in Figure 8.11. This results in a vector at 196°, effectively shifting the resulting color towards yellow. On inverted scan lines, the vector angle also will lag by 45° from the ideal 120° shown in Figure 8.12. This results in a vector at 75°, effectively shifting the resulting color towards cyan.

PAL Delay Line

Figure 9.36, made by flipping Figure 8.12 180° about the U axis and overlaying the result onto Figure 8.11, illustrates the cancellation of the phase errors. The average phase of the two errors, 196° on normal scan lines and 286° on inverted scan lines, is 241°, which is the correct phase for green.
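The cancellation can be checked numerically. The sketch below is an illustration under a simple model, not the decoder circuit: on an inverted line the transmitted angle is taken as the mirror image of the ideal angle about the U axis (−241°, i.e. 119°, which the text rounds to 120°), the distortion subtracts the same lag on both lines, and the decoder mirrors the inverted line back before averaging. The function names are hypothetical.

```python
import cmath
import math


def decoded_phase(ideal_deg, lag_deg, inverted_line):
    """Chroma phase seen by the decoder after it undoes the V inversion."""
    transmitted = -ideal_deg if inverted_line else ideal_deg  # V inverted on alternate lines
    received = transmitted - lag_deg                          # differential phase error (lag)
    restored = -received if inverted_line else received       # decoder re-inverts V
    return restored % 360


def average_two_lines(phase_a_deg, phase_b_deg):
    """Average two unit chroma vectors; returns (phase in degrees, relative amplitude)."""
    total = (cmath.exp(1j * math.radians(phase_a_deg))
             + cmath.exp(1j * math.radians(phase_b_deg)))
    return math.degrees(cmath.phase(total)) % 360, abs(total) / 2


normal = decoded_phase(241, 45, inverted_line=False)    # 196
inverted = decoded_phase(241, 45, inverted_line=True)   # 286
phase, amplitude = average_two_lines(normal, inverted)
print(normal, inverted, phase, amplitude)
```

The averaged phase returns to the ideal 241°. In this vector model the averaged amplitude is reduced by the cosine of the phase error, so the hue error is traded for a loss of saturation.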
Because averaging two scan lines cancels these phase errors, simple PAL decoders usually use a delay line (or line store) to facilitate averaging between two scan lines.

Using delay lines in PAL Y/C separators has unique problems. The subcarrier reference changes by –90° (or 270°) over one line period, and the V subcarrier is inverted on alternate lines. Thus, there is a 270° phase difference between the input and output of a line delay. If we want to do a simple addition or subtraction between the input and output of the delay line to recover chrominance information, the phase difference must be 0° or 180°. In addition, the switching V must still be accounted for. Thus, we would like to find a way to align the subcarrier phases between lines and compensate for the switching V.

Simple circuits, such as the noncomplementary Y/C separator shown in Figure 9.37, use a delay line that is not a whole line (283.75 subcarrier periods), but rather 284 subcarrier periods. This small difference acts as a 90° phase shift at the subcarrier frequency.

Since there are an integral number of subcarrier periods in the delay, the U subcarriers at the input and output of the 284 TSC delay line are in phase, and they can simply be added together to recover the U subcarrier. The V subcarriers are 180° out of phase at the input and output of the 284 TSC delay line, due to the switching V, so the adder cancels them out. Any remaining high-frequency vertical V components are rejected by the U demodulator.

Due to the switching V, subtracting the input and output of the 284 TSC delay line recovers the V subcarrier while canceling the U subcarrier. Any remaining high-frequency vertical U components are rejected by the V demodulator.

Since the phase shift through the 284 TSC delay line is a function of frequency, the subcarrier sidebands are not phase shifted exactly 90°, resulting in hue errors on vertical chrominance transitions.
Also, the chrominance and luminance are not vertically aligned since the chrominance is shifted down by one-half line.

Figure 9.36. Phase Error Correction for PAL. (Vector diagram with +U at 0° and +V at 90°; magenta 61°, red 103°, yellow 167°, green 241°, cyan 283°, blue 347°; error vectors at 196° = 241° – 45° and 286° = 241° + 45°.)

Figure 9.37. Single Delay Line PAL Y/C Separator.

PAL Modifier

Although the performance of the circuit in Figure 9.37 usually is adequate, the 284 TSC delay line may be replaced by a line delay followed by a PAL modifier, as shown in Figure 9.38. The PAL modifier provides a 90° phase shift and inversion of the V subcarrier. Chrominance from the PAL modifier is now in phase with the line delay input, allowing the two to be combined using a single adder and share a common path to the demodulators. The averaging sacrifices some vertical resolution; however, Hanover bars are suppressed.

Figure 9.38. Single-Line Delay PAL Y/C Separator Using a PAL Modifier.

Since the chrominance at the demodulator input is in phase with the composite video, it can be used to cancel the chrominance in the composite signal to leave luminance. However, the chrominance and luminance are still not vertically aligned since the chrominance is shifted down by one-half line.

The PAL modifier produces a luminance alias centered at twice the subcarrier frequency. Without the bandpass filter before the PAL modifier and the averaging between lines, mixing the original and aliased luminance components would result in a 12.5 Hz beat frequency, noticeable in high-contrast areas of the picture.
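As a rough numerical check of the add-and-subtract principle used by the single delay line separator of Figure 9.37, the Python sketch below models two vertically adjacent lines with identical U and V. It assumes the delay is an exact whole number of subcarrier periods (so both lines see the same subcarrier angle), that the delayed line carries +V and the current line −V, and it replaces the 0.6 MHz lowpass filters with an average over whole subcarrier cycles. Luminance and the bandpass filter are omitted, and the function name is hypothetical.

```python
import math


def separate_uv(u, v, samples_per_cycle=8, cycles=4):
    """Recover U and V from two lines of PAL chroma by adding and subtracting."""
    n = samples_per_cycle * cycles
    u_acc = 0.0
    v_acc = 0.0
    for i in range(n):
        theta = 2 * math.pi * i / samples_per_cycle
        delayed = u * math.sin(theta) + v * math.cos(theta)  # delay line output, +V line
        current = u * math.sin(theta) - v * math.cos(theta)  # delay line input, -V line
        u_path = 0.5 * (current + delayed)   # V cancels, leaving U sin(wt)
        v_path = 0.5 * (delayed - current)   # U cancels, leaving V cos(wt)
        u_acc += u_path * 2 * math.sin(theta)  # demodulate with 2 sin(wt)
        v_acc += v_path * 2 * math.cos(theta)  # demodulate with 2 cos(wt)
    return u_acc / n, v_acc / n


print(separate_uv(0.3, -0.2))
```

Because U and V are held constant between the two lines, the sketch does not show the hue errors on vertical chrominance transitions or the half-line vertical shift described in the text.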

2D Comb Filtering

In the previous Y/C separators, high-frequency luminance information is treated as chrominance information; no attempt is made to differentiate between the two. As a result, the luminance information is interpreted as chrominance information (cross-color) and passed on to the chroma demodulator to recover color information. The demodulator cannot differentiate between chrominance and high-frequency luminance, so it generates color where color should not exist. Thus, occasional display artifacts are generated.

2D (or intra-field) comb filtering attempts to improve the separation of chrominance and luminance at the expense of reduced vertical resolution. Comb filters get their name by having luminance and chrominance frequency responses that look like a comb. Ideally, these frequency responses would match the comb-like frequency responses of the interleaved luminance and chrominance signals shown in Figures 8.4 and 8.15.

Modern 3-line comb filters typically use two line delays for storing the last two lines of video information (there is a 1-line delay in decoding using this method). Using more than two line delays usually results in excessive vertical filtering, reducing vertical resolution.

Two-Line Delay Comb Filters

The BBC has done research (Reference 4) on various PAL comb filtering implementations (Figures 9.39 through 9.42). Each was evaluated for artifacts and frequency response. The vertical frequency response for each comb filter is shown in Figure 9.43.

In the comb filter design of Figure 9.39, the chrominance phase is inverted over two lines of delay. A subtracter cancels most of the luminance, leaving double-amplitude, vertically filtered chrominance. A PAL modifier provides a 90° phase shift and removal of the PAL switch inversion to phase align the chrominance with the 1-line-delayed composite video signal. Subtracting the chrominance from the composite signal leaves luminance.
This design has the advantage of vertical alignment of the chrominance and luminance. However, there is a loss of vertical resolution and no suppression of Hanover bars. In addition, it is possible under some circumstances to generate double-amplitude luminance due to the aliased luminance components produced by the PAL modifier.

The comb filter design of Figure 9.40 is similar to the one in Figure 9.39. However, the chrominance after the PAL modifier and the 1-line-delayed composite video signal are added to generate double-amplitude chrominance (since the subcarriers are in phase). Again, subtracting the chrominance from the composite signal leaves luminance. In this design, luminance over-ranging is avoided since both the true and aliased luminance signals are halved. There is less loss of vertical resolution and Hanover bars are suppressed, at the expense of increased cross-color.

The comb filter design in Figure 9.41 has the advantage of not using a PAL modifier. Since the chrominance phase is inverted over two lines of delay, adding them together cancels most of the chrominance, leaving double-amplitude luminance. This is subtracted from the 1-line-delayed composite video signal to generate chrominance. Chrominance is then subtracted from the 1-line-delayed composite video signal to generate luminance (this is to maintain vertical luminance resolution). UV crosstalk is present as a 12.5 Hz flicker on horizontal chrominance edges, due to the chrominance signals not canceling in the adder since the line-to-line subcarrier phases are not aligned. Since there is no PAL modifier, there is no luminance aliasing or luminance over-ranging.

The comb filter design in Figure 9.42 is a combination of Figures 9.39 and 9.41. The chrominance phase is inverted over two lines of delay. An adder cancels most of the chrominance, leaving double-amplitude luminance. This is subtracted from the 1-line-delayed composite video signal to generate chrominance signal (A).
In a parallel path, a subtracter cancels most of the luminance, leaving double-amplitude, vertically filtered chrominance. A PAL modifier provides a 90° phase shift and removal of the PAL switch inversion to phase align to the (A) chrominance signal. These are added together, generating double-amplitude chrominance. Chrominance then is subtracted from the 1-line-delayed composite signal to generate luminance. The chrominance and luminance vertical frequency responses are the average of those for Figures 9.39 and 9.41. UV crosstalk is similar to that for Figure 9.41, but has half the amplitude. The luminance alias is also half that of Figure 9.39, and Hanover bars are suppressed.

Figure 9.39. Two-Line Roe PAL Y/C Separator.

Figure 9.40. Two-Line –6 dB Roe PAL Y/C Separator.

Figure 9.41. Two-Line Cosine PAL Y/C Separator.

Figure 9.42. Two-Line Weston PAL Y/C Separator.

Figure 9.43. Vertical Frequency Characteristics of the Comb Filters in Figures 9.39 Through 9.42. (For each design, one panel shows the demodulated U, V vertical frequency response and one shows the luminance vertical frequency response; the horizontal axis runs from 0 to 312 c/p.h.)
From these comb filter designs, the BBC has derived designs optimized for general viewing (Figure 9.44) and standards conversion (Figure 9.45).

For PAL applications, the best luminance processing (Figure 9.41) was combined with the optimum chrominance processing (Figure 9.40). The difference between the two designs is the chrominance recovery. For standards conversion (Figure 9.45), the chrominance signal is just the full-bandwidth composite video signal. Standards conversion uses vertical interpolation, which tends to reduce moving and high vertical frequency components, including cross-luminance and cross-color. Thus, vertical chrominance resolution after processing usually will be better than that obtained from the circuits for general viewing. The circuit for general viewing (Figure 9.44) recovers chrominance with a goal of reducing cross-effects, at the expense of chrominance vertical resolution.

Figure 9.44. Two-Line Delay PAL Y/C Separator Optimized for General Viewing.

Figure 9.45. Two-Line Delay PAL Y/C Separator Optimized for Standards Conversion and Video Processing.

For NTSC applications, the design of comb filters is easier. There are no switched subcarriers to worry about, and the chrominance phases are 180° per line, rather than 270°. In addition, there is greater separation between the luminance and chrominance frequency bands than in PAL, simplifying the separation requirements.

In Figures 9.46 and 9.47, the adder generates a double-amplitude composite video signal since the subcarriers are in phase.
There is a 180° subcarrier phase difference between the output of the adder and the 1-line-delayed composite video signal, so subtracting the two cancels most of the luminance, leaving double-amplitude chrominance.

The main disadvantage of the design in Figure 9.46 is the unsuppressed cross-luminance on vertical color transitions. However, this is offset by the increased luminance resolution over simple lowpass filtering. The reasons for processing chrominance in Figure 9.47 are the same as for PAL in Figure 9.45.

Figure 9.46. Two-Line Delay NTSC Y/C Separator for General Viewing.

Figure 9.47. Two-Line Delay NTSC Y/C Separator for Standards Conversion and Video Processing.

Adaptive Comb Filtering

Conventional comb filters still have problems with diagonal lines and vertical color changes since only vertically aligned samples are used for processing.

With diagonal lines, after standard comb filtering, the chrominance information also includes the difference between adjacent luminance values, which may also be interpreted as chrominance information. This shows up as cross-color artifacts, such as a rainbow appearance along the edge of the line.

Sharp vertical color transitions generate the hanging dot pattern commonly seen on the scan line between the two color changes. After standard comb filtering, the luminance information contains the color subcarrier. The amplitude of the color subcarrier is determined by the difference between the two colors. Thus, different colors modulate the luminance intensity differently, creating a dot pattern on the scan line between two colors. To eliminate these hanging dots, a chroma trap filter is sometimes used after the comb filter.
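The arithmetic of the two-line-delay NTSC comb filter, and the origin of the artifacts just described, can be sketched in Python. This is a simplified illustration of the structure described for Figure 9.46: the 2.3–4.9 MHz bandpass filter is omitted, the sample values are hypothetical, and the function name is an assumption.

```python
def ntsc_two_line_comb(top, middle, bottom):
    """Separate Y and C for the middle line using the lines above and below.

    The subcarrier is in phase on the top and bottom lines and inverted on
    the middle line, so (top + bottom) has double-amplitude luminance with
    chroma opposite in phase to the middle line.
    """
    luma, chroma = [], []
    for a, b, c in zip(top, middle, bottom):
        summed = a + c                    # double-amplitude composite
        double_chroma = b - 0.5 * summed  # luminance cancels, chroma doubles
        ch = 0.5 * double_chroma
        chroma.append(ch)
        luma.append(b - ch)               # composite minus chroma leaves luminance
    return luma, chroma


# Uniform area: luminance 120 with the same chroma on all three lines.
sub = [30, 0, -30, 0]
middle = [120 + s for s in sub]
outer = [120 - s for s in sub]            # subcarrier inverted line to line
luma, chroma = ntsc_two_line_comb(outer, middle, outer)
print(luma, chroma)

# Vertical color transition: no chroma on the bottom line.
luma_edge, chroma_edge = ntsc_two_line_comb(outer, middle, [120] * 4)
print(luma_edge)  # residual subcarrier in luminance (hanging dots)
```

In the uniform case the separation is exact. In the second case the chroma no longer matches between lines, so part of the subcarrier remains in the luminance output, which is the hanging dot pattern.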
The adaptive comb filter attempts to solve these problems by processing a 3 × 3, 5 × 5, or larger block of samples. The values of the samples are used to determine which Y/C separation algorithm to use for the center sample. As many as 32 or more algorithms may be available. By looking for sharp vertical transitions of luminance, or sharp color subcarrier phase changes, the operation of the comb filter is changed to avoid generating artifacts.

Due to the cost of integrated line stores, the consumer market commonly uses 3-line adaptive comb filtering, with the next level of improvement being 3D motion adaptive comb filtering.

3D Comb Filtering

This method (also called inter-field Y/C separation) uses composite video data from the current field and from two fields (NTSC) or four fields (PAL) earlier. Adding the two cancels the chrominance (since it is 180° out of phase), leaving luminance. Subtracting the two cancels the luminance, leaving chrominance. For PAL, an adequate design may be obtained by replacing the line delays in Figure 9.42 with frame delays.

This technique provides nearly perfect Y/C separation for stationary pictures. However, if there is any change between fields, the resulting Y/C separation is erroneous. For this reason, inter-field Y/C separators usually are not used, unless as part of a 3D motion adaptive comb filter.

3D Motion Adaptive Comb Filter

A typical implementation that uses 3D (inter-field) comb filtering for still areas, and 2D (intra-field) comb filtering for areas of the picture that contain motion, is shown in Figure 9.48. The motion detector generates a value (K) of 0–1, allowing the luminance and chrominance signals from the two comb filters to be proportionally mixed. Hard switching between algorithms is usually visible.

Figure 9.49 illustrates a simple motion detector block diagram. The concept is to compare frame-to-frame changes in the low-frequency luminance signal.
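The inter-field sum and difference, and the proportional mixing controlled by K, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the results are halved to return them to nominal amplitude, and the mixing polarity (K = 1 selects the 2D result, K = 0 the 3D result) is an assumption, since the text states only that the two outputs are mixed proportionally.

```python
def inter_field_separate(current, one_frame_earlier):
    """Inter-field (3D) Y/C separation for NTSC samples one frame apart.

    The sum cancels chrominance and the difference cancels luminance;
    both are halved to restore nominal amplitude.
    """
    luma = [0.5 * (a + b) for a, b in zip(current, one_frame_earlier)]
    chroma = [0.5 * (a - b) for a, b in zip(current, one_frame_earlier)]
    return luma, chroma


def mix(k, value_2d, value_3d):
    """Blend 2D and 3D comb filter outputs with the motion value K (0-1).

    Assumed polarity: K = 1 means full motion (use the 2D result).
    """
    return k * value_2d + (1.0 - k) * value_3d


# Stationary picture: luminance 0.5, chrominance inverted after one frame.
y_3d, c_3d = inter_field_separate([0.75, 0.25], [0.25, 0.75])
```

A soft blend such as `mix(0.25, y_2d, y_3d)` avoids the visible hard switching between algorithms mentioned above.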
The performance of the motion detector determines, to a large degree, the quality of the image. The motion signal (K) is usually rectified, smoothed by averaging horizontally and vertically over a few samples, multiplied by a gain factor, and clipped before being used. The only error the motion detector should make is to use the 2D comb filter on stationary areas of the image.

Figure 9.48. 3D Motion Adaptive Y/C Separator.

Figure 9.49. Simple Motion Detector Block Diagram for NTSC.

Alpha Channel Support

By incorporating an additional ADC within the NTSC/PAL decoder, an analog alpha signal (also called a key) may be digitized, and pipelined with the video data to maintain synchronization. This allows the designer to change decoders (which may have different pipeline delays) to fit specific applications without worrying about the alpha channel pipeline delay. Alpha is usually linear, with an analog range of 0–100 IRE. There is no blanking pedestal or sync information present.

Decoder Video Parameters

Many industry-standard video parameters have been defined to specify the relative quality of NTSC/PAL decoders. To measure these parameters, the output of the NTSC/PAL decoder (while decoding various video test signals such as those described in Chapter 8) is monitored using video test equipment. Along with a description of several of these parameters, typical AC parameter values for both consumer and studio-quality decoders are shown in Table 9.11.
Several AC parameters, such as short-time waveform distortion, group delay, and K factors, are dependent on the quality of the analog video filters and are not discussed here. In addition to the AC parameters discussed in this section, there are several others that should be included in a decoder specification, such as burst capture and lock frequency range, and the bandwidths of the decoded YIQ or YUV video signals. There are also several DC parameters that should be specified, as shown in Table 9.12. Although genlock capabilities are not usually specified, except for clock jitter, we have attempted to generate a list of genlock parameters, shown in Table 9.13. Differential Phase Differential phase distortion, commonly referred to as differential phase, specifies how much the chrominance phase is affected by the luminance level—in other words, how much hue shift occurs when the luminance level changes. Both positive and negative phase errors may be present, so differential phase is expressed as a peak-to-peak measurement, expressed in degrees of subcarrier phase. This parameter is measured using a test signal of uniform phase and amplitude chrominance superimposed on different luminance levels, such as the modulated ramp test signal, or the modulated five-step portion of the composite test signal. The differential phase parameter for a studio-quality decoder may approach 1° or less. Differential Gain Differential gain distortion, commonly referred to as differential gain, specifies how much the chrominance gain is affected by the luminance level—in other words, how much color saturation shift occurs when the luminance level changes. Both attenuation and amplification may occur, so differential gain is expressed as the largest amplitude change between any two levels, expressed as a percentage of the largest chrominance amplitude. 
This parameter is measured using a test signal of uniform phase and amplitude chrominance superimposed on different luminance levels, such as the modulated ramp test signal, or the modulated five-step portion of the composite test signal. The differential gain parameter for a studio-quality decoder may approach 1% or less.

Table 9.11. Typical AC Video Parameters for NTSC and PAL Decoders.

Parameter                             Consumer Quality   Studio Quality   Units
differential phase                    4                  ≤ 1              degrees
differential gain                     4                  ≤ 1              %
luminance nonlinearity                2                  ≤ 1              %
hue accuracy                          3                  ≤ 1              degrees
color saturation accuracy             3                  ≤ 1              %
SNR (per EIA/TIA RS-250-C)            48                 > 60             dB
chrominance-to-luminance crosstalk    < –40              < –50            dB
luminance-to-chrominance crosstalk    < –40              < –50            dB
H tilt                                < 1                < 1              %
V tilt                                < 1                < 1              %
Y/C sampling skew                     < 5                < 2              ns
demodulation quadrature               90 ±2              90 ±0.5          degrees

Table 9.12. Typical DC Video Parameters for NTSC and PAL Decoders.

Parameter                             (M) NTSC      (B, D, G, H, I) PAL   Units
sync input amplitude                  40 ±20        43 ±22                IRE
burst input amplitude                 40 ±20        42.86 ±22             IRE
video input amplitude (1 V nominal)   0.5 to 2.0    0.5 to 2.0            volts

Table 9.13. Typical Genlock Parameters for NTSC and PAL Decoders. Parameters assume a video signal with ≥ 30 dB SNR and over the range of DC parameters in Table 9.12.

Parameter                                          Min     Max    Units
sync locking time (note 1)                         -       2      fields
sync recovery time (note 2)                        -       2      fields
short-term sync lock range (note 3)                ±100    -      ns
long-term sync lock range (note 4)                 ±5      -      μs
number of consecutive missing horizontal
  sync pulses before any correction                5       -      sync pulses
vertical correlation (note 5)                      -       ±5     ns
short-term subcarrier locking range (note 6)       ±200    -      Hz
long-term subcarrier locking range (note 7)        ±500    -      Hz
subcarrier locking time (note 8)                   -       10     lines
subcarrier accuracy                                -       ±2     degrees

Notes:
1. Time from start of genlock process to vertical correlation specification is achieved.
2. Time from loss of genlock to vertical correlation specification is achieved.
3. Range over which vertical correlation specification is maintained. Short-term range assumes line time changes by amount indicated slowly between two consecutive lines.
4. Range over which vertical correlation specification is maintained. Long-term range assumes line time changes by amount indicated slowly over one field.
5. Indicates vertical sample accuracy. For a genlock system that uses a VCO or VCXO, this specification is the same as sample clock jitter.
6. Range over which subcarrier locking time and accuracy specifications are maintained. Short-term time assumes subcarrier frequency changes by amount indicated slowly over 2 frames.
7. Range over which subcarrier locking time and accuracy specifications are maintained. Long-term time assumes subcarrier frequency changes by amount indicated slowly over 24 hours.
8. After instantaneous 180° phase shift of subcarrier, time to lock to within ±2°. Subcarrier frequency is nominal ±500 Hz.

Luminance Nonlinearity

Luminance nonlinearity, also referred to as differential luminance and luminance nonlinear distortion, specifies how much the luminance gain is affected by the luminance level. In other words, there is a nonlinear relationship between the decoded luminance level and the ideal luminance level.

Using an unmodulated five-step or ten-step staircase test signal, or the modulated five-step portion of the composite test signal, the difference between the largest and smallest steps, expressed as a percentage of the largest step, is used to specify the luminance nonlinearity. Although this parameter is included within the differential gain and phase parameters, it is traditionally specified independently.
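The definitions of differential phase, differential gain, and luminance nonlinearity reduce to simple arithmetic on the measured values. The following Python sketch shows that arithmetic; the function names and the sample measurements are hypothetical and are not taken from any test instrument.

```python
def differential_phase(phases_deg):
    """Peak-to-peak chrominance phase variation, in degrees."""
    return max(phases_deg) - min(phases_deg)


def differential_gain(chroma_amplitudes):
    """Largest amplitude change between any two levels,
    as a percentage of the largest chrominance amplitude."""
    largest = max(chroma_amplitudes)
    return 100.0 * (largest - min(chroma_amplitudes)) / largest


def luminance_nonlinearity(step_heights):
    """Difference between the largest and smallest staircase steps,
    as a percentage of the largest step."""
    largest = max(step_heights)
    return 100.0 * (largest - min(step_heights)) / largest


# Hypothetical measurements from a five-step test signal.
dp = differential_phase([0.5, -0.25, 0.0, 0.25, 0.0])
dg = differential_gain([40.0, 39.0, 40.0, 39.5, 40.0])
ln = luminance_nonlinearity([20.0, 19.0, 20.0, 19.5, 20.0])
```

For these sample values the results are 0.75 degrees, 2.5%, and 5%, respectively.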
Chrominance Nonlinear Gain Distortion

Chrominance nonlinear gain distortion specifies how much the chrominance gain is affected by the chrominance amplitude (saturation). In other words, there is a nonlinear relationship between the decoded chrominance amplitude levels and the ideal chrominance amplitude levels; this is usually seen as an attenuation of highly saturated chrominance signals.

Using a modulated pedestal test signal, or the modulated pedestal portion of the combination test signal, the decoder is adjusted so that the middle chrominance packet (40 IRE) is decoded properly. The largest difference between the measured and nominal values of the amplitudes of the other two decoded chrominance packets specifies the chrominance nonlinear gain distortion, expressed in IRE or as a percentage of the nominal amplitude of the worst-case packet. This parameter is usually not specified independently, but is included within the differential gain and phase parameters.

Chrominance Nonlinear Phase Distortion

Chrominance nonlinear phase distortion specifies how much the chrominance phase (hue) is affected by the chrominance amplitude (saturation). In other words, it specifies how much hue shift occurs when the saturation changes.

Using a modulated pedestal test signal, or the modulated pedestal portion of the combination test signal, the decoder output for each chrominance packet is measured. The difference between the largest and the smallest hue measurements is the peak-to-peak value. This parameter is usually not specified independently, but is included within the differential gain and phase parameters.

Chrominance-to-Luminance Intermodulation

Chrominance-to-luminance intermodulation, commonly referred to as cross-modulation, specifies how much the luminance level is affected by the chrominance. This may be the result of clipping highly saturated chrominance levels or quadrature distortion and may show up as irregular brightness variations due to changes in color saturation.
Using a modulated pedestal test signal, or the modulated pedestal portion of the combination test signal, the largest difference between the decoded 50 IRE luminance level and the decoded luminance levels specifies the chrominance-to-luminance intermodulation, expressed in IRE or as a percentage. This parameter is usually not specified independently, but is included within the differential gain and phase parameters.

Hue Accuracy

Hue accuracy specifies how close the decoded hue is to the ideal hue value. Both positive and negative phase errors may be present, so hue accuracy is the difference between the worst-case positive and worst-case negative measurements from nominal, expressed in degrees of subcarrier phase. This parameter is measured using EIA or EBU 75% color bars as a test signal.

Color Saturation Accuracy

Color saturation accuracy specifies how close the decoded saturation is to the ideal saturation value, using EIA or EBU 75% color bars as a test signal. Both gain and attenuation may be present, so color saturation accuracy is the difference between the worst-case gain and worst-case attenuation measurements from nominal, expressed as a percentage of nominal.

H Tilt

H tilt, also known as line tilt and line time distortion, causes a tilt in line-rate signals, predominantly white bars. This type of distortion causes variations in brightness between the left and right edges of an image. For a digital decoder, H tilt is primarily an artifact of the analog input filters and the transmission medium. H tilt is measured using a line bar (such as the one in the NTC-7 NTSC composite test signal) and measuring the peak-to-peak deviation of the tilt (in IRE or percentage of white bar amplitude), ignoring the first and last microsecond of the white bar.

V Tilt

V tilt, also known as field tilt and field time distortion, causes a tilt in field-rate signals, predominantly white bars.
This type of distortion causes variations in brightness between the top and bottom edges of an image. For a digital decoder, V tilt is primarily an artifact of the analog input filters and the transmission medium. V tilt is measured using an 18 μs, 100 IRE white bar in the center of 130 lines in the center of the field or using a field square wave. The peak-to-peak deviation of the tilt is measured (in IRE or percentage of white bar amplitude), ignoring the first and last three lines.
Chapter 10

H.261 and H.263

There are several standards for video conferencing, as shown in Table 10.1. Figures 10.1 through 10.3 illustrate the block diagrams of several common video conferencing systems.

H.261

ITU-T H.261 was the first video compression and decompression standard developed for video conferencing. Originally designed for bit-rates of p × 64 kbps, where p is in the range 1–30, H.261 is now the minimum requirement of all video conferencing standards, as shown in Table 10.1.

A typical H.261 encoder block diagram is shown in Figure 10.4. The video encoder provides a self-contained digital video bitstream which is multiplexed with other signals, such as control and audio. The video decoder performs the reverse process.

H.261 video data uses the 4:2:0 YCbCr format shown in Figure 3.7, with the primary specifications listed in Table 10.2. The maximum picture rate may be restricted by having 0, 1, 2, or 3 non-transmitted pictures between transmitted ones.

Two picture (or frame) types are supported:

Intra or I Frame: A frame having no reference frame for prediction.
Inter or P Frame: A frame based on a previous frame.

Video Coding Layer

As shown in Figure 10.4, the basic functions are prediction, block transformation, and quantization.

The prediction error (inter mode) or the input picture (intra mode) is subdivided into 8 sample × 8 line blocks that are segmented as transmitted or non-transmitted. Four luminance blocks and the two spatially corresponding color difference blocks are combined to form a 16 sample × 16 line macroblock as shown in Figure 10.5.

The criteria for choice of mode and transmitting a block are not specified by the recommendation and may be varied dynamically as part of the coding strategy. Transmitted blocks are transformed and the resulting coefficients quantized and variable-length coded.
466 H.261 467 H.310 H.320 H.321 H.322 H.323 network Broadband ISDN ATM LAN Narrowband Switched Digital ISDN Broadband ISDN ATM LAN Guaranteed Bandwidth Packet Switched Networks Non-guaranteed Bandwidth Packet Switched Networks (Ethernet) video codec audio codec multiplexing MPEG-2 H.261 MPEG-2 G.711 G.722 G.728 H.222.0 H.222.1 H.261 H.263 H.264 G.711 G.722 G.722.1 G.728 H.221 H.261 H.263 G.711 G.722 G.728 H.221 H.261 H.263 G.711 G.722 G.728 H.221 H.261 H.263 H.264 G.711 G.722 G.722.1 G.723.1 G.728 G.729 H.225.0 control H.245 H.231 H.242 H.243 H.242 H.230 H.242 H.245 multipoint data T.120 communications interface AAL I.363 AJM I.361 PHY I.432 H.231 H.239 T.120 I.400 H.231 T.120 AAL I.363 AJM I.361 PHY I.400 H.231 T.120 I.400 and TCP/IP H.323 H.239 T.120 TCP/IP H.324 PSTN or POTS H.261 H.263 G.723 H.223 H.245 T.120 V.34 modem 3G-324M Mobile MPEG-4.2 G.722.2 G.723.1 H.223A/B H.245 T.120 Mobile Radio Table 10.1. Video Conferencing Family of Standards. 468 Chapter 10: H.261 and H.263 H.320 VIDEO CODEC (H.261, H.263) AUDIO CODEC (G.7XX) DATA INTERFACE (T.120) CONTROL (H.242, H.230) MUX ----DEMUX (H.221) NETWORK INTERFACE (I.400 SERIES) Figure 10.1. Typical H.320 System. H.323 VIDEO CODEC (H.261, H.263) AUDIO CODEC (G.7XX) DATA PROTOCOLS (T.120, LAPM, ETC.) CONTROL (H.245) LAPM XID PROCEDURES MUX -----DMUX (H.225) NETWORK INTERFACE TCP/IP Figure 10.2. Typical H.323 System. H.261 469 H.324 VIDEO CODEC (H.261, H.263) AUDIO CODEC (G.723) DATA PROTOCOLS (T.120, LAPM, ETC.) CONTROL (H.245) LAPM XID PROCEDURES MUX -----DMUX (H.223) MODEM (V.34 / V.8) MODEM CONTROL (V.25TER) Figure 10.3. Typical H.324 System. Prediction The prediction is inter-picture and may include motion compensation and a spatial filter. The coding mode using prediction is called inter; the coding mode using no prediction is called intra. Motion Compensation Motion compensation is optional in the encoder. The decoder must support the acceptance of one motion vector per macroblock. 
Motion vectors are restricted: all samples referenced by them must be within the coded picture area. The horizontal and vertical components of motion vectors have integer values not exceeding ±15. The motion vector is used for all four Y blocks in the macroblock. The motion vector for both the Cb and Cr blocks is derived by halving the values of the macroblock vector.

A positive value of the horizontal or vertical component of the motion vector indicates that the prediction is formed from samples in the previous picture that are spatially to the right or below the samples being predicted.

Loop Filter

The prediction process may use a 2D spatial filter that operates on samples within a predicted 8 × 8 block. The filter is separated into horizontal and vertical functions. Both are non-recursive with coefficients of 0.25, 0.5, 0.25 except at block edges where one of the taps falls outside the block. In such cases, the filter coefficients are changed to 0, 1, 0.

The filter is switched on or off for all six blocks in a macroblock according to the macroblock type.

Figure 10.4. Typical H.261 Encoder.

Table 10.2. H.261 YCbCr Parameters.

Parameters                  CIF                                          QCIF
active resolution (Y)       352 × 288                                    176 × 144
frame rate                  29.97 Hz                                     29.97 Hz
YCbCr sampling structure    4:2:0                                        4:2:0
form of YCbCr coding        Uniformly quantized PCM, 8 bits per sample   Uniformly quantized PCM, 8 bits per sample

DCT, IDCT

Transmitted blocks are first processed by an 8 × 8 DCT (discrete cosine transform). The output from the IDCT (inverse DCT) ranges from –256 to +255 after clipping, represented using 9 bits. The procedures for computing the transforms are not defined, but the inverse transform must meet the specified error tolerance.
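Returning to the loop filter described above, its separable structure can be sketched in Python using integer arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, the taps are scaled by 4 in each direction to avoid fractions, and the final rounding (add 8, then divide by 16, so that halves round up) is an assumption not stated in the text.

```python
def filter_1d(samples):
    """Apply taps (1, 2, 1)/4 inside the block and (0, 1, 0) at the two
    ends. The result is left scaled by 4 so that no precision is lost."""
    last = len(samples) - 1
    out = []
    for i, value in enumerate(samples):
        if i == 0 or i == last:
            out.append(4 * value)
        else:
            out.append(samples[i - 1] + 2 * value + samples[i + 1])
    return out


def loop_filter(block):
    """Filter one 8 x 8 block (a list of 8 rows of 8 integer samples)."""
    rows = [filter_1d(row) for row in block]            # horizontal pass
    cols = [filter_1d(list(col)) for col in zip(*rows)]  # vertical pass
    # cols[x][y] is scaled by 16; the rounding below is an assumption.
    return [[(cols[x][y] + 8) >> 4 for x in range(8)] for y in range(8)]


flat = [[100] * 8 for _ in range(8)]
impulse = [[0] * 8 for _ in range(8)]
impulse[3][3] = 160
smoothed = loop_filter(impulse)
```

A flat block passes through unchanged, while the single bright sample in `impulse` is spread over its 3 × 3 neighborhood with weights 1/16, 1/8, and 1/4.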
Quantization

Within a macroblock, the same quantizer is used for all coefficients, except the one for intra-DC. The intra-DC coefficient is usually linearly quantized with a step size of 8 and no dead zone. The other coefficients use one of 31 possible linear quantizers, but with a central dead zone about 0 and a step size of an even value in the range of 2–62.

Clipping of Reconstructed Picture

Clipping functions are used to prevent quantization distortion of transform coefficient amplitudes from causing arithmetic overflows in the encoder and decoder loops. The clipping function is applied to the reconstructed picture, formed by summing the prediction and the prediction error. Clippers force sample values less than 0 to be 0 and values greater than 255 to be 255.

Coding Control

Although not included as part of H.261, several parameters may be varied to control the rate of coded video data. These include processing prior to coding, the quantizer, block significance criterion, and temporal subsampling. Temporal subsampling is performed by discarding complete pictures.

Figure 10.5. H.261 Arrangement of Group of Blocks, Macroblocks, and Blocks. (A 352 sample × 288 line picture contains groups of blocks 1–12; each group of blocks contains macroblocks 1–33; each macroblock contains Y blocks 0–3, Cb block 4, and Cr block 5.)

Forced Updating

This is achieved by forcing the use of the intra mode of the coding algorithm. To control the accumulation of inverse transform mismatch errors, a macroblock should be forcibly updated at least once every 132 times it is transmitted.

Video Bitstream

Unless specified otherwise, the most significant bits are transmitted first. This is bit 1 and is the leftmost bit in the code tables. Unless specified otherwise, all unused or spare bits are set to “1.”

The video bitstream is a hierarchical structure with four layers.
From top to bottom the layers are:

Picture
Group of Blocks (GOB)
Macroblock (MB)
Block

Picture Layer

Data for each picture consists of a picture header followed by data for the groups of blocks (GOBs). The structure is shown in Figure 10.6. Picture headers for dropped pictures are not transmitted.

Figure 10.6. H.261 Video Bitstream Layer Structures.
Picture layer: PSC, TR, PTYPE, PEI, PSPARE, PEI, GOB, GOB, ..., GOB
GOB layer: GBSC, GN, GQUANT, GEI, GSPARE, GEI, MB, MB, ..., MB
Macroblock layer: MBA, MTYPE, MQUANT, MVD, CBP, BLOCK 0, ..., BLOCK 5
Block layer: TCOEFF, EOB

Picture Start Code (PSC)

PSC is a 20-bit word with a value of 0000 0000 0000 0001 0000.

Temporal Reference (TR)

TR is a 5-bit binary number representing 32 possible values. It is generated by incrementing the value in the previous picture header by one plus the number of non-transmitted pictures (at 29.97 Hz). The arithmetic is performed with only the five LSBs.

Type Information (PTYPE)

Six bits of information about the picture are:

Bit 1   Split screen indicator       “0” = off, “1” = on
Bit 2   Document camera indicator    “0” = off, “1” = on
Bit 3   Freeze picture release       “0” = off, “1” = on
Bit 4   Source format                “0” = QCIF, “1” = CIF
Bit 5   Optional still image mode    “0” = on, “1” = off
Bit 6   Spare

Extra Insertion Information (PEI)

PEI is a bit which when set to “1” indicates the presence of the following optional data field.

Spare Information (PSPARE)

If PEI is set to “1,” then these 9 bits follow consisting of 8 bits of data (PSPARE) and another PEI bit to indicate if a further 9 bits follow, and so on.

Group of Blocks (GOB) Layer

Each picture is divided into groups of blocks (GOB). A GOB comprises one-twelfth of the CIF picture area or one-third of the QCIF picture area (see Figure 10.5). A GOB relates to 176 samples × 48 lines of Y and the corresponding 88 × 24 array of Cb and Cr data.

Data for each GOB consists of a GOB header followed by macroblock data, as shown in Figure 10.6.
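The picture-layer fields described above (PSC, TR, PTYPE, PEI, and PSPARE) can be read with a few lines of code. The following Python sketch is illustrative only: it treats the bitstream as a string of "0" and "1" characters for readability, and the function names and dictionary keys are hypothetical.

```python
PSC = "00000000000000010000"  # 20-bit picture start code


def parse_picture_header(bits):
    """Parse a picture header from a string of '0'/'1' characters.

    Returns (fields, index of the first bit after the header).
    """
    if not bits.startswith(PSC):
        raise ValueError("picture start code not found")
    pos = len(PSC)
    tr = int(bits[pos:pos + 5], 2)          # 5-bit temporal reference
    pos += 5
    ptype = bits[pos:pos + 6]               # bits 1-6, leftmost is bit 1
    pos += 6
    fields = {
        "tr": tr,
        "split_screen": ptype[0] == "1",
        "document_camera": ptype[1] == "1",
        "freeze_picture_release": ptype[2] == "1",
        "source_format": "CIF" if ptype[3] == "1" else "QCIF",
        "still_image_mode": ptype[4] == "0",  # note: "0" means on
        "pspare": [],
    }
    # Each PEI = 1 announces 8 bits of PSPARE followed by another PEI bit.
    while bits[pos] == "1":
        fields["pspare"].append(int(bits[pos + 1:pos + 9], 2))
        pos += 9
    pos += 1  # skip the terminating PEI = 0
    return fields, pos


def next_tr(previous_tr, non_transmitted_pictures):
    """Temporal reference of the next transmitted picture (5 LSBs only)."""
    return (previous_tr + 1 + non_transmitted_pictures) % 32


header = PSC + "00101" + "000111" + "1" + "10100101" + "0"
fields, end = parse_picture_header(header)
```

Here the example header describes a CIF picture with TR = 5 and one PSPARE byte; `next_tr(30, 3)` wraps around to 2 because only the five LSBs are kept.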
Each GOB header is transmitted once between picture start codes in the CIF or QCIF sequence numbered in Figure 10.5, even if no macroblock data is present in that GOB.

Group of Blocks Start Code (GBSC)

GBSC is a 16-bit word with a value of 0000 0000 0000 0001.

Group Number (GN)

GN is a 4-bit binary value indicating the position of the group of blocks. The bits are the binary representation of the number in Figure 10.5. Numbers 13, 14, and 15 are reserved for future use.

Quantizer Information (GQUANT)

GQUANT is a 5-bit binary value that indicates the quantizer used for the group of blocks until overridden by any subsequent MQUANT. Values of 1–31 are allowed.

Extra Insertion Information (GEI)

GEI is a bit which, when set to “1,” indicates the presence of the following optional data field.

Spare Information (GSPARE)

If GEI is set to “1,” then these 9 bits follow consisting of 8 bits of data (GSPARE) and then another GEI bit to indicate if a further 9 bits follow, and so on.

Macroblock (MB) Layer

Each GOB is divided into 33 macroblocks as shown in Figure 10.5. A macroblock relates to 16 samples × 16 lines of Y and the corresponding 8 × 8 array of Cb and Cr data.

Data for a macroblock consists of a macroblock header followed by data for blocks (see Figure 10.6).

Macroblock Address (MBA)

MBA is a variable-length codeword indicating the position of a macroblock within a group of blocks. The transmission order is shown in Figure 10.5. For the first macroblock in a GOB, MBA is the absolute address in Figure 10.5. For subsequent macroblocks, MBA is the difference between the absolute addresses of the macroblock and the last transmitted macroblock.

The code table for MBA is given in Table 10.3. A codeword is available for bit stuffing immediately after a GOB header or a coded macroblock (called MBA stuffing). This codeword is discarded by decoders.

The codeword for the start code is also shown in Table 10.3.
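The GOB and macroblock geometry, together with the differential addressing used by MBA, can be sketched as follows. This Python sketch rests on assumptions about the layout in Figure 10.5: GOBs are numbered left to right and then top to bottom (two per row in CIF), QCIF is assumed to use GOB numbers 1, 3, and 5, and macroblocks are numbered row by row with 11 per row. The function names are hypothetical.

```python
def locate_macroblock(x, y, cif=True):
    """Map a luminance sample position to (GOB number, macroblock number)."""
    gob_col, gob_row = x // 176, y // 48        # a GOB is 176 x 48 Y samples
    if cif:
        gob = 2 * gob_row + gob_col + 1          # GOBs 1..12, two per row
    else:
        gob = 2 * gob_row + 1                    # assumed QCIF numbering 1, 3, 5
    # Within the GOB: 3 rows of 11 macroblocks, each 16 x 16 Y samples.
    mb = 11 * ((y % 48) // 16) + (x % 176) // 16 + 1
    return gob, mb


def mba_values(transmitted_addresses):
    """Convert ascending absolute macroblock addresses within one GOB into
    the values coded by MBA: absolute for the first, differences after."""
    values, previous = [], None
    for address in transmitted_addresses:
        values.append(address if previous is None else address - previous)
        previous = address
    return values


first = locate_macroblock(0, 0)
last = locate_macroblock(351, 287)
coded = mba_values([3, 4, 10])
```

In the last example, macroblocks 3, 4, and 10 of a GOB are transmitted, so the coded MBA values are 3, 1, and 6.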
MBA is always included in transmitted macroblocks. Macroblocks are not transmitted when they contain no information for that part of the picture.

Type Information (MTYPE)

MTYPE is a variable-length codeword containing information about the macroblock and data elements that are present. Macroblock types, included elements, and variable-length codewords are listed in Table 10.4. MTYPE is always included in transmitted macroblocks.

Quantizer (MQUANT)

MQUANT is present only if indicated by MTYPE. It is a 5-bit codeword indicating the quantizer to use for this and any following blocks in the group of blocks, until overridden by any subsequent MQUANT. Codewords for MQUANT are the same as for GQUANT.

Motion Vector Data (MVD)

Motion vector data is included for all motion-compensated (MC) macroblocks, as indicated by MTYPE. MVD is obtained from the macroblock vector by subtracting the vector of the preceding macroblock. The vector of the previous macroblock is regarded as zero for the following situations:

(a) Evaluating MVD for macroblocks 1, 12, and 23.
(b) Evaluating MVD for macroblocks where MBA does not represent a difference of 1.
(c) MTYPE of the previous macroblock was not motion-compensated.

Motion vector data consists of a variable-length codeword for the horizontal component, followed by a variable-length codeword for the vertical component. The variable-length codes are listed in Table 10.5.

Coded Block Pattern (CBP)

The variable-length CBP is present if indicated by MTYPE. It indicates which blocks in the macroblock have at least one transform coefficient transmitted. The pattern number is represented as:

P0 P1 P2 P3 P4 P5

where Pn = “1” for any coefficient present for block [n], else Pn = “0.” Block numbering (decimal format) is given in Figure 10.5.

The codewords for the CBP number are given in Table 10.6.
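Two of the macroblock-layer rules above lend themselves to a short worked example: forming the CBP pattern number from the bits P0 to P5, and recovering a motion vector component from the transmitted difference. The Python sketch below uses hypothetical function names. The way it resolves the paired entries of Table 10.5 (choosing whichever value of the pair keeps the component within ±15) is an assumption inferred from the pairs in the table and the stated vector range.

```python
def cbp_number(coded_blocks):
    """Pattern number for the blocks (numbered 0-5) that carry coefficients.

    P0 is the most significant of the six bits and P5 the least.
    """
    number = 0
    for n in coded_blocks:
        number |= 1 << (5 - n)
    return number


def blocks_from_cbp(number):
    """Inverse of cbp_number: list the block numbers flagged as coded."""
    return [n for n in range(6) if number & (1 << (5 - n))]


def reconstruct_vector(previous, difference):
    """Recover a vector component from the preceding macroblock's component
    and a decoded difference. Each code stands for a pair of differences
    that are 32 apart; only one of them gives a result within +/-15."""
    alternate = difference - 32 if difference > 0 else difference + 32
    for candidate in (difference, alternate):
        if -15 <= previous + candidate <= 15:
            return previous + candidate
    raise ValueError("no candidate gives a legal vector component")


all_luma = cbp_number([0, 1, 2, 3])
wrapped = reconstruct_vector(10, 10)
```

With all four Y blocks coded and no chrominance blocks, the pattern number is 60, which has the shortest codeword in Table 10.6. In the vector example, a previous component of 10 and the code for "10 & –22" give –12, since 10 + 10 would exceed the ±15 range.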
Table 10.3. H.261 Variable-Length Code Table for MBA.

MBA             Code
1               1
2               011
3               010
4               0011
5               0010
6               0001 1
7               0001 0
8               0000 111
9               0000 110
10              0000 1011
11              0000 1010
12              0000 1001
13              0000 1000
14              0000 0111
15              0000 0110
16              0000 0101 11
17              0000 0101 10
18              0000 0101 01
19              0000 0101 00
20              0000 0100 11
21              0000 0100 10
22              0000 0100 011
23              0000 0100 010
24              0000 0100 001
25              0000 0100 000
26              0000 0011 111
27              0000 0011 110
28              0000 0011 101
29              0000 0011 100
30              0000 0011 011
31              0000 0011 010
32              0000 0011 001
33              0000 0011 000
MBA stuffing    0000 0001 111
start code      0000 0000 0000 0001

Table 10.4. H.261 Variable-Length Code Table for MTYPE. (× = element present, - = element absent.)

Prediction          MQUANT   MVD   CBP   TCOEFF   Code
intra               -        -     -     ×        0001
intra               ×        -     -     ×        0000 001
inter               -        -     ×     ×        1
inter               ×        -     ×     ×        0000 1
inter + MC          -        ×     -     -        0000 0000 1
inter + MC          -        ×     ×     ×        0000 0001
inter + MC          ×        ×     ×     ×        0000 0000 01
inter + MC + FIL    -        ×     -     -        001
inter + MC + FIL    -        ×     ×     ×        01
inter + MC + FIL    ×        ×     ×     ×        0000 01

Block Layer

A macroblock is made up of four Y blocks, a Cb block, and a Cr block (see Figure 10.5). Data for an 8 sample × 8 line block consists of codewords for the transform coefficients followed by an end of block (EOB) marker as shown in Figure 10.6. The order of block transmission is shown in Figure 10.5.

Transform Coefficients (TCOEFF)

When MTYPE indicates intra, transform coefficient data is present for all six blocks in a macroblock. Otherwise, MTYPE and CBP signal which blocks have coefficient data transmitted for them. The quantized DCT coefficients are transmitted in the order shown in Figure 7.59.
Table 10.5. H.261 Variable-Length Code Table for MVD.

Vector Difference    Code
–16 & 16             0000 0011 001
–15 & 17             0000 0011 011
–14 & 18             0000 0011 101
–13 & 19             0000 0011 111
–12 & 20             0000 0100 001
–11 & 21             0000 0100 011
–10 & 22             0000 0100 11
–9 & 23              0000 0101 01
–8 & 24              0000 0101 11
–7 & 25              0000 0111
–6 & 26              0000 1001
–5 & 27              0000 1011
–4 & 28              0000 111
–3 & 29              0001 1
–2 & 30              0011
–1                   011
0                    1
1                    010
2 & –30              0010
3 & –29              0001 0
4 & –28              0000 110
5 & –27              0000 1010
6 & –26              0000 1000
7 & –25              0000 0110
8 & –24              0000 0101 10
9 & –23              0000 0101 00
10 & –22             0000 0100 10
11 & –21             0000 0100 010
12 & –20             0000 0100 000
13 & –19             0000 0011 110
14 & –18             0000 0011 100
15 & –17             0000 0011 010

Table 10.6a. H.261 Variable-Length Code Table for CBP.

CBP    Code
60     111
4      1101
8      1100
16     1011
32     1010
12     1001 1
48     1001 0
20     1000 1
40     1000 0
28     0111 1
44     0111 0
52     0110 1
56     0110 0
1      0101 1
61     0101 0
2      0100 1
62     0100 0
24     0011 11
36     0011 10
3      0011 01
63     0011 00
5      0010 111
9      0010 110
17     0010 101
33     0010 100
6      0010 011
10     0010 010
18     0010 001
34     0010 000
7      0001 1111
11     0001 1110
19     0001 1101

Table 10.6b. H.261 Variable-Length Code Table for CBP.

CBP    Code
35     0001 1100
13     0001 1011
49     0001 1010
21     0001 1001
41     0001 1000
14     0001 0111
50     0001 0110
22     0001 0101
42     0001 0100
15     0001 0011
51     0001 0010
23     0001 0001
43     0001 0000
25     0000 1111
37     0000 1110
26     0000 1101
38     0000 1100
29     0000 1011
45     0000 1010
53     0000 1001
57     0000 1000
30     0000 0111
46     0000 0110
54     0000 0101
58     0000 0100
31     0000 0011 1
47     0000 0011 0
55     0000 0010 1
59     0000 0010 0
27     0000 0001 1
39     0000 0001 0

Figure 10.7. Typical H.261 Decoded Sequence. (An intra (I) frame followed by predicted (P) frames.)

The most common combinations of successive zeros (RUN) and the following value (LEVEL) are encoded using variable-length codes, listed in Table 10.7. Since CBP indicates blocks with no coefficient data, EOB cannot occur as the first coefficient.
The last bit “s” denotes the sign of the level: “0” = positive, “1” = negative. Other combinations of (RUN, LEVEL) are encoded using a 20-bit word: six bits of escape (ESC), six bits of RUN, and eight bits of LEVEL, as shown in Table 10.8.

Two code tables are used for the variable-length coding: one is used for the first transmitted LEVEL in inter, inter + MC, and inter + MC + FIL blocks; another is used for all other LEVELs, except for the first one in intra blocks, which is fixed-length coded with eight bits.

All coefficients, except for intra-DC, have reconstruction levels (REC) in the range –2048 to 2047. Reconstruction levels are recovered by the following equations, and the results are clipped to that range. QUANT ranges from 1 to 31 and is transmitted by either GQUANT or MQUANT.

QUANT = odd:
for LEVEL > 0    REC = QUANT × (2 × LEVEL + 1)
for LEVEL < 0    REC = QUANT × (2 × LEVEL – 1)

QUANT = even:
for LEVEL > 0    REC = (QUANT × (2 × LEVEL + 1)) – 1
for LEVEL < 0    REC = (QUANT × (2 × LEVEL – 1)) + 1

In both cases:
for LEVEL = 0    REC = 0

For intra-DC blocks, the first coefficient is typically the transform value quantized with a step size of 8 and no dead zone, resulting in an 8-bit coded value, n. Black has a coded value of 0001 0000 (16), and white has a coded value of 1110 1011 (235). A transform value of 1024 is coded as 1111 1111. Coded values of 0000 0000 and 1000 0000 are not used. The decoded value is 8n, except that an n value of 255 results in a reconstructed transform value of 1024.
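The reconstruction rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names h261_reconstruct and h261_intra_dc are hypothetical and are not part of H.261 or of any library.

```python
def h261_reconstruct(level: int, quant: int) -> int:
    """Reconstruct a non-intra-DC coefficient from LEVEL and QUANT (1..31)."""
    if not 1 <= quant <= 31:
        raise ValueError("QUANT must be in the range 1..31")
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    # Odd QUANT: QUANT * (2*LEVEL + 1) for LEVEL > 0, QUANT * (2*LEVEL - 1) for LEVEL < 0.
    rec = quant * (2 * level + sign)
    if quant % 2 == 0:
        # Even QUANT: the magnitude is reduced by one.
        rec -= sign
    # Results are clipped to the range -2048..2047.
    return max(-2048, min(2047, rec))


def h261_intra_dc(n: int) -> int:
    """Decode the 8-bit intra-DC code n into a transform value."""
    if n in (0x00, 0x80):
        raise ValueError("codes 0000 0000 and 1000 0000 are not used")
    # The decoded value is 8n, except that 1111 1111 stands for 1024.
    return 1024 if n == 0xFF else 8 * n
```

For example, LEVEL = 3 gives 35 with QUANT = 5 and 27 with QUANT = 4, while the intra-DC code for black (16) decodes to 128.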
Run   Level   Code
EOB           10
0     1       1s (if first coefficient in block*)
0     1       11s (if not first coefficient in block)
0     2       0100 s
0     3       0010 1s
0     4       0000 110s
0     5       0010 0110 s
0     6       0010 0001 s
0     7       0000 0010 10s
0     8       0000 0001 1101 s
0     9       0000 0001 1000 s
0     10      0000 0001 0011 s
0     11      0000 0001 0000 s
0     12      0000 0000 1101 0s
0     13      0000 0000 1100 1s
0     14      0000 0000 1100 0s
0     15      0000 0000 1011 1s
1     1       011s
1     2       0001 10s
1     3       0010 0101 s
1     4       0000 0011 00s
1     5       0000 0001 1011 s
1     6       0000 0000 1011 0s
1     7       0000 0000 1010 1s
2     1       0101 s
2     2       0000 100s
2     3       0000 0010 11s
2     4       0000 0001 0100 s
2     5       0000 0000 1010 0s
3     1       0011 1s
3     2       0010 0100 s
3     3       0000 0001 1100 s
3     4       0000 0000 1001 1s

Table 10.7a. H.261 Variable-Length Code Table for TCOEFF. *Never used in intra macroblocks.

Run   Level   Code
4     1       0011 0s
4     2       0000 0011 11s
4     3       0000 0001 0010 s
5     1       0001 11s
5     2       0000 0010 01s
5     3       0000 0000 1001 0s
6     1       0001 01s
6     2       0000 0001 1110 s
7     1       0001 00s
7     2       0000 0001 0101 s
8     1       0000 111s
8     2       0000 0001 0001 s
9     1       0000 101s
9     2       0000 0000 1000 1s
10    1       0010 0111 s
10    2       0000 0000 1000 0s
11    1       0010 0011 s
12    1       0010 0010 s
13    1       0010 0000 s
14    1       0000 0011 10s
15    1       0000 0011 01s
16    1       0000 0010 00s
17    1       0000 0001 1111 s
18    1       0000 0001 1010 s
19    1       0000 0001 1001 s
20    1       0000 0001 0111 s
21    1       0000 0001 0110 s
22    1       0000 0000 1111 1s
23    1       0000 0000 1111 0s
24    1       0000 0000 1110 1s
25    1       0000 0000 1110 0s
26    1       0000 0000 1101 1s
ESC           0000 01

Table 10.7b. H.261 Variable-Length Code Table for TCOEFF.

Run   Code
0     0000 00
1     0000 01
:     :
63    1111 11

Level   Code
–128    forbidden
–127    1000 0001
:       :
–2      1111 1110
–1      1111 1111
0       forbidden
1       0000 0001
2       0000 0010
:       :
127     0111 1111

Table 10.8. H.261 Run, Level Codes.

Still Image Transmission
H.261 allows the transmission of a still image of four times the resolution of the currently selected video format.
If the video format is QCIF, a still image of CIF resolution may be transmitted; if the video format is CIF, a still image of 704 × 576 resolution may be transmitted.

H.263
ITU-T H.263 improves on H.261 by providing improved video quality at lower bit-rates. The video encoder provides a self-contained digital bitstream which is combined with other signals (such as H.223). The video decoder performs the reverse process.

The primary specifications of H.263 regarding YCbCr video data are listed in Table 10.9. It is also possible to negotiate a custom picture size. The 4:2:0 YCbCr sampling is shown in Figure 3.7.

Seven frame (or picture) types are supported, with the first two being mandatory (baseline H.263):

Intra or I Frame: A frame having no reference frame for prediction.
Inter or P Frame: A frame based on a previous frame.
PB Frame and Improved PB Frame: A frame representing two frames and based on a previous frame.
B Frame: A frame based on two reference frames, one previous and one afterwards.
EI Frame: A frame having a temporally simultaneous frame which has either the same or smaller frame size.
EP Frame: A frame having two reference frames, one previous and one simultaneous.

Video Coding Layer
A typical encoder block diagram is shown in Figure 10.8. The basic functions are prediction, block transformation, and quantization.

The prediction error or the input picture is subdivided into 8 × 8 blocks which are segmented as transmitted or non-transmitted. Four luminance blocks and the two spatially corresponding color difference blocks are combined to form a macroblock as shown in Figure 10.9.

The criteria for choosing the mode and for transmitting a block are not specified by H.263 and may be varied dynamically as part of the coding strategy. Transmitted blocks are transformed and the resulting coefficients are quantized and variable-length coded.

Prediction
The prediction is interpicture and may include motion compensation.
The coding mode using prediction is called inter; the coding mode using no prediction is called intra. Intra-coding is signaled at the picture level (I frame for intra or P frame for inter) or at the macroblock level in P frames. In the optional PB frame mode, B frames always use the inter mode.

Motion Compensation
Motion compensation is optional in the encoder. The decoder must accept one motion vector per macroblock (one or four motion vectors per macroblock in the optional advanced prediction or deblocking filter modes).

In the optional PB frame mode, each macroblock may have an additional vector. In the optional improved PB frame mode, each macroblock can include an additional forward motion vector. In the optional B frame mode, macroblocks can be transmitted with both a forward and backward motion vector.

For baseline H.263, motion vectors are restricted such that all samples referenced by them are within the coded picture area. Many of the optional modes remove this restriction.

The horizontal and vertical components of motion vectors have integer or half-integer values in the range –16 to +15.5. Several of the optional modes increase the range to [–31.5, +31.5] or [–31.5, +30.5].

A positive value of the horizontal or vertical component of the motion vector typically indicates that the prediction is formed from samples in the previous frame which are spatially to the right or below the samples being predicted. However, for backward motion vectors in B frames, a positive value of the horizontal or vertical component of the motion vector indicates that the prediction is formed from samples in the next frame which are spatially to the left or above the samples being predicted.

Quantization
The number of quantizers is 1 for the first intra coefficient and 31 for all other coefficients. Within a macroblock, the same quantizer is used for all coefficients except the first one of intra-blocks.
The first intra-coefficient is usually the transform DC value linearly quantized with a step size of 8 and no dead zone. Each of the other 31 quantizers is also linear, but with a central dead zone around zero and an even step size in the range of 2–62.

Coding Control
Although not a part of H.263, several parameters may be varied to control the rate of coded video data. These include processing prior to coding, the quantizer, block significance criterion, and temporal subsampling.

[Figure 10.8. Typical Baseline H.263 Encoder. Blocks shown: coding control, DCT, quantizer, inverse quantizer, IDCT, loop filter, and picture memory with motion-compensated variable delay. Outputs shown: intra/inter flag, transmit flag, quantizer indication, quantizing index, motion vector, and loop filter on/off.]

Parameters                  16CIF         4CIF        CIF         QCIF        SQCIF
active resolution (Y)       1408 × 1152   704 × 576   352 × 288   176 × 144   128 × 96
frame rate                  29.97 Hz (all formats)
YCbCr sampling structure    4:2:0 (all formats)
form of YCbCr coding        Uniformly quantized PCM, 8 bits per sample (all formats)

Table 10.9. Baseline H.263 YCbCr Parameters.

Forced Updating
This is achieved by forcing the use of the intra mode. To control the accumulation of inverse transform mismatch errors, a macroblock should be forcibly updated at least once every 132 times it is transmitted.

Video Bitstream
Unless specified otherwise, the most significant bits are transmitted first. Bit 1, the leftmost bit in the code tables, is the most significant. Unless specified otherwise, all unused or spare bits are set to “1.”

The video multiplexer is arranged in a hierarchical structure with four layers. From top to bottom the layers are:

Picture
Group of Blocks (GOB) or Slice
Macroblock (MB)
Block

Picture Layer
Data for each picture consists of a picture header followed by data for the groups of blocks (GOBs), followed by an end-of-sequence (EOS) code and stuffing bits (PSTUF). The baseline structure is shown in Figure 10.10.
Picture headers for dropped pictures are not transmitted.

[Figure 10.9. H.263 Arrangement of Group of Blocks, Macroblocks, and Blocks. Shown for a picture of 352 samples × 288 lines: groups of blocks 0–17, macroblocks 1–22, and the block arrangement within a macroblock (Y: blocks 0–3, Cb: block 4, Cr: block 5).]

Picture Start Code (PSC)
PSC is a 22-bit word with a value of 0000 0000 0000 0000 1 00000. It must be byte-aligned; therefore, 0–7 zero bits are added before the start code to ensure the first bit of the start code is the first, and most significant, bit of a byte.

Temporal Reference (TR)
TR is an 8-bit binary number representing 256 possible values. It is generated by incrementing its value in the previously transmitted picture header by one and adding the number of non-transmitted 29.97 Hz pictures since the last transmitted one. The arithmetic is performed with only the eight LSBs.

If a custom picture clock frequency (PCF) is indicated, extended TR (ETR) and TR form a 10-bit number where TR stores the eight LSBs and ETR stores the two MSBs. The arithmetic in this case is performed with the 10 LSBs.

In the PB frame and improved PB frame mode, TR only addresses P frames.

[Figure 10.10. Baseline H.263 Video Bitstream Layer Structures (Without Optional PLUSPTYPE Related Fields in the Picture Layer). Field order in each layer:
Picture layer: PSC, TR, PTYPE, PQUANT, CPM, PSBI, TRB, DBQUANT, PEI, PSUPP, PEI, GOB, GOB, ..., GOB, EOS, PSTUF
GOB layer: GBSC, GN, GSBI, GFID, GQUANT, MB, MB, ..., MB
Macroblock layer: COD, MCBPC, MODB, CBPB, CBPY, DQUANT, MVD, MVD2, MVD3, MVD4, MVDB, BLOCK 0, ..., BLOCK 5
Block layer: INTRADC, TCOEF]
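The temporal reference arithmetic described above can be sketched as follows. The function names next_tr and split_tr and the parameter names are hypothetical, chosen only for illustration.

```python
def next_tr(prev_tr: int, skipped: int, custom_pcf: bool = False) -> int:
    """Temporal reference for the next transmitted picture.

    prev_tr is the value from the previously transmitted picture header and
    skipped is the number of non-transmitted pictures since then. Only the
    8 LSBs are kept, or the 10 LSBs when a custom picture clock frequency is used.
    """
    bits = 10 if custom_pcf else 8
    return (prev_tr + 1 + skipped) % (1 << bits)


def split_tr(tr10: int):
    """Split a 10-bit temporal reference into (ETR, TR): two MSBs and eight LSBs."""
    return (tr10 >> 8) & 0x3, tr10 & 0xFF
```

With prev_tr = 254 and three skipped pictures, the 8-bit arithmetic wraps to 2, while the 10-bit arithmetic gives 258, which is carried as ETR = 1 and TR = 2.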
Type Information (PTYPE)
PTYPE contains 13 bits of information about the picture:

Bit 1      “1”
Bit 2      “0”
Bit 3      Split screen indicator: “0” = off, “1” = on
Bit 4      Document camera indicator: “0” = off, “1” = on
Bit 5      Freeze picture release: “0” = off, “1” = on
Bit 6–8    Source format:
           “000” = reserved
           “001” = SQCIF
           “010” = QCIF
           “011” = CIF
           “100” = 4CIF
           “101” = 16CIF
           “110” = reserved
           “111” = extended PTYPE

If bits 6–8 are not “111,” the following 5 bits are present in PTYPE:

Bit 9      Picture coding type: “0” = intra, “1” = inter
Bit 10     Optional unrestricted motion vector mode: “0” = off, “1” = on
Bit 11     Optional syntax-based arithmetic coding mode: “0” = off, “1” = on
Bit 12     Optional advanced prediction mode: “0” = off, “1” = on
Bit 13     Optional PB frames mode: “0” = normal picture, “1” = PB frame

If bit 9 is set to “0,” bit 13 must be set to “0.” Bits 10–13 are optional modes that are negotiated between the encoder and decoder.

Quantizer Information (PQUANT)
PQUANT is a 5-bit binary number (value of 1–31) representing the quantizer to be used until updated by a subsequent GQUANT or DQUANT.

Continuous Presence Multipoint (CPM)
CPM is a 1-bit value that signals the use of the optional continuous presence multipoint and video multiplex mode; “0” = off, “1” = on. CPM immediately follows PQUANT if PLUSPTYPE is not present, and immediately follows PLUSPTYPE if PLUSPTYPE is present.

Picture Sub-Bitstream Indicator (PSBI)
PSBI is an optional 2-bit binary number that is only present if the optional continuous presence multipoint and video multiplex mode is indicated by CPM.

Temporal Reference of B Frames in PB Frames (TRB)
TRB is present if PTYPE or PLUSPTYPE indicate a PB frame or improved PB frame.
TRB is a 3-bit or 5-bit binary number giving one plus the number of non-transmitted pictures (at 29.97 Hz or the custom picture clock frequency indicated in CPCFC) since the last I or P frame or the P-part of a PB frame or improved PB frame and before the B-part of the PB frame or improved PB frame. The value of TRB is extended to 5 bits when a custom picture clock frequency is in use. The maximum number of non-transmitted pictures is six for 29.97 Hz, or thirty when a custom picture clock frequency is used.

Quantizer Information for B Frames in PB Frames (DBQUANT)
DBQUANT is present if PTYPE or PLUSPTYPE indicate a PB frame or improved PB frame. DBQUANT is a 2-bit codeword indicating the relationship between QUANT and BQUANT as shown in Table 10.10. The division is done using truncation. BQUANT has a range of 1–31. If the result is less than 1 or greater than 31, BQUANT is clipped to 1 and 31, respectively.

DBQUANT   BQUANT
00        (5 * QUANT) / 4
01        (6 * QUANT) / 4
10        (7 * QUANT) / 4
11        (8 * QUANT) / 4

Table 10.10. Baseline H.263 DBQUANT Codes and QUANT/BQUANT Relationship.

Extra Insertion Information (PEI)
PEI is a bit which, when set to “1,” signals the presence of the PSUPP data field.

Supplemental Enhancement Information (PSUPP)
If PEI is set to “1,” then 9 bits follow consisting of 8 bits of data (PSUPP) and another PEI bit to indicate if a further 9 bits follow, and so on.

End of Sequence (EOS)
EOS is a 22-bit word with a value of 0000 0000 0000 0000 1 11111. EOS must be byte-aligned by inserting 0–7 zero bits before the code so that the first bit of the EOS code is the first, and most significant, bit of a byte.

Stuffing (PSTUF)
PSTUF is a variable-length word of zero bits. The last bit of PSTUF must be the last, and least significant, bit of a byte.

Group of Blocks (GOB) Layer
As shown in Figure 10.9, each picture is divided into groups of blocks (GOBs).
A GOB comprises 16 lines for the SQCIF, QCIF, and CIF resolutions, 32 lines for the 4CIF resolution, and 64 lines for the 16CIF resolution.

Thus, an SQCIF picture contains six GOBs (96/16), each with one row of macroblock data. QCIF pictures have nine GOBs (144/16), each with one row of macroblock data. A CIF picture contains eighteen GOBs (288/16), each with one row of macroblock data. 4CIF pictures have eighteen GOBs (576/32), each with two rows of macroblock data. A 16CIF picture has eighteen GOBs (1152/64), each with four rows of macroblock data.

GOB numbering starts with 0 at the top of the picture, and increases going down vertically.

Data for each GOB consists of a GOB header followed by macroblock data, as shown in Figure 10.10. Macroblock data is transmitted in increasing macroblock number order. For GOB number 0 in each picture, no GOB header is transmitted. A decoder can signal an encoder to transmit only non-empty GOB headers.

Group of Blocks Start Code (GBSC)
GBSC is a 17-bit word with a value of 0000 0000 0000 0000 1. It must be byte-aligned; therefore, 0–7 zero bits are added before the start code to ensure the first bit of the start code is the first, and most significant, bit of a byte.

Group Number (GN)
GN is a 5-bit binary number indicating the number of the GOB. Group numbers 1–17 are used with the standard picture formats. Group numbers 1–24 are used with custom picture formats. Group numbers 16–29 are emulated in the slice header. Group number 30 is used in the end of sub-bitstream indicators (EOSBS) code and group number 31 is used in the end of sequence (EOS) code.

GOB Sub-Bitstream Indicator (GSBI)
GSBI is a 2-bit binary number representing the sub-bitstream number until the next picture or GOB start code. GSBI is present only if continuous presence multipoint and video multiplex (CPM) mode is enabled.

GOB Frame ID (GFID)
GFID is a 2-bit value indicating the frame ID.
It must have the same value in every GOB (or slice) header of a given frame. In general, if PTYPE is the same as for the previous picture header, the GFID value must be the same as the previous frame. If PTYPE has changed from the previous picture header, GFID must have a different value from the previous frame.

Quantizer Information (GQUANT)
GQUANT is a 5-bit binary number that indicates the quantizer to be used in the group of blocks until overridden by any subsequent GQUANT or DQUANT. The codewords are the binary representations of the values 1–31.

Macroblock (MB) Layer
Each GOB is divided into macroblocks, as shown in Figure 10.9. A macroblock relates to 16 samples × 16 lines of Y and the corresponding 8 samples × 8 lines of Cb and Cr. Macroblock numbering increases left-to-right and top-to-bottom. Macroblock data is transmitted in increasing macroblock numbering order.

Data for a macroblock consists of an MB header followed by block data (Figure 10.10).

Coded Macroblock Indication (COD)
COD is a single bit that indicates whether or not the macroblock is coded. “0” indicates coded; “1” indicates not coded, and the rest of the macroblock layer is empty. COD is present only in pictures that are not intra.

If not coded, the decoder processes the macroblock as an inter-block with motion vectors equal to zero for the whole block and no coefficient data.

Macroblock Type and Coded Block Pattern for Chrominance (MCBPC)
MCBPC is a variable-length codeword indicating the macroblock type and the coded block pattern for Cb and Cr.

Codewords for MCBPC are listed in Tables 10.11 and 10.12. A codeword is available for bit stuffing, and should be discarded by decoders. In some cases, bit stuffing must not occur before the first macroblock of the picture to avoid start code emulation. The macroblock types (MB type) are listed in Tables 10.13 and 10.14.

The coded block pattern for chrominance (CBPC) signifies when a non-intra-DC transform coefficient is transmitted for Cb or Cr. A “1” indicates a non-intra-DC coefficient is present in that block.

MB Type    CBPC (Cb, Cr)   Code
3          0, 0            1
3          0, 1            001
3          1, 0            010
3          1, 1            011
4          0, 0            0001
4          0, 1            0000 01
4          1, 0            0000 10
4          1, 1            0000 11
stuffing                   0000 0000 1

Table 10.11. Baseline H.263 Variable-Length Code Table for MCBPC for I Frames.

Macroblock Mode for B Blocks (MODB)
MODB is present for macroblock types 0–4 if PTYPE indicates PB frame. It is a variable-length codeword indicating whether B coefficients and/or motion vectors are transmitted for this macroblock. Table 10.15 lists the codewords for MODB. MODB is coded differently for improved PB frames.

Coded Block Pattern for B Blocks (CBPB)
The 6-bit CBPB is present if indicated by MODB. It indicates which blocks in the macroblock have at least one transform coefficient transmitted. The pattern number is represented as:

P0P1P2P3P4P5

where Pn = “1” for any coefficient present for block [n], else Pn = “0.” Block numbering (decimal format) is given in Figure 10.9.

Coded Block Pattern for Luminance (CBPY)
CBPY is a variable-length codeword specifying the Y blocks in the macroblock for which at least one non-intra-DC transform coefficient is transmitted. However, in the advanced intra-coding mode, intra-DC is indicated in the same manner as the other coefficients.

Table 10.16 lists the codes for CBPY. YN is a “1” if any non-intra-DC coefficient is present for that Y block. Y block numbering (decimal format) is as shown in Figure 10.9.

Quantizer Information (DQUANT)
DQUANT is a 2-bit codeword signifying the change in QUANT. Table 10.17 lists the differential values for the codewords.

QUANT has a range of 1–31. If the value of QUANT as a result of the indicated change is less than 1 or greater than 31, it is made 1 and 31, respectively.

MB Type    CBPC (Cb, Cr)   Code
0          0, 0            1
0          0, 1            0011
0          1, 0            0010
0          1, 1            0001 01
1          0, 0            011
1          0, 1            0000 111
1          1, 0            0000 110
1          1, 1            0000 0010 1
2          0, 0            010
2          0, 1            0000 101
2          1, 0            0000 100
2          1, 1            0000 0101
3          0, 0            0001 1
3          0, 1            0000 0100
3          1, 0            0000 0011
3          1, 1            0000 011
4          0, 0            0001 00
4          0, 1            0000 0010 0
4          1, 0            0000 0001 1
4          1, 1            0000 0001 0
stuffing                   0000 0000 1
5          0, 0            0000 0000 010
5          0, 1            0000 0000 0110 0
5          1, 0            0000 0000 0111 0
5          1, 1            0000 0000 0111 1

Table 10.12. Baseline H.263 Variable-Length Code Table for MCBPC for P Frames.

Frame Type   MB Type     Name          Included Data
inter        not coded   –             COD
inter        0           inter         COD, MCBPC, CBPY, MVD
inter        1           inter + q     COD, MCBPC, CBPY, DQUANT, MVD
inter        2           inter4v       COD, MCBPC, CBPY, MVD, MVD2–4
inter        3           intra         COD, MCBPC, CBPY
inter        4           intra + q     COD, MCBPC, CBPY, DQUANT
inter        5           inter4v + q   COD, MCBPC, CBPY, DQUANT, MVD, MVD2–4
inter        stuffing    –             COD, MCBPC
intra        3           intra         MCBPC, CBPY
intra        4           intra + q     MCBPC, CBPY, DQUANT
intra        stuffing    –             MCBPC

Table 10.13. Baseline H.263 Macroblock Types and Included Data for Normal Frames.

Frame Type   MB Type     Name          Included Data
inter        not coded   –             COD
inter        0           inter         COD, MCBPC, MODB, CBPY
inter        1           inter + q     COD, MCBPC, MODB, CBPY
inter        2           inter4v       COD, MCBPC, MODB, CBPY
inter        3           intra         COD, MCBPC, MODB, CBPY
inter        4           intra + q     COD, MCBPC, MODB, CBPY
inter        5           inter4v + q   COD, MCBPC, MODB, CBPY
inter        stuffing    –             COD, MCBPC

Table 10.14a. Baseline H.263 Macroblock Types and Included Data for PB Frames.

Frame Type   MB Type     Name          Included Data
inter        not coded   –             (none)
inter        0           inter         CBPB, MVD, MVDB
inter        1           inter + q     CBPB, DQUANT, MVD, MVDB
inter        2           inter4v       CBPB, MVD, MVDB, MVD2–4
inter        3           intra         CBPB, MVD, MVDB
inter        4           intra + q     CBPB, DQUANT, MVD, MVDB
inter        5           inter4v + q   CBPB, DQUANT, MVD, MVDB, MVD2–4
inter        stuffing    –             (none)

Table 10.14b. Baseline H.263 Macroblock Types and Included Data for PB Frames.

Code   Data Present
0      neither CBPB nor MVDB
10     MVDB only
11     CBPB and MVDB

Table 10.15. Baseline H.263 Variable-Length Code Table for MODB.
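The quantizer bookkeeping for DQUANT (Table 10.17) and DBQUANT (Table 10.10) can be sketched as below. The names DQUANT_DELTA, apply_dquant, and bquant_from_dbquant are hypothetical; the code assumes the 2-bit codewords are passed in as integers 0–3.

```python
# Differential values of QUANT for each 2-bit DQUANT codeword (Table 10.17).
DQUANT_DELTA = {0b00: -1, 0b01: -2, 0b10: 1, 0b11: 2}


def clip_quant(value: int) -> int:
    """Clip a quantizer value to the legal range 1..31."""
    return max(1, min(31, value))


def apply_dquant(quant: int, dquant_code: int) -> int:
    """Update QUANT with a DQUANT codeword, clipping the result to 1..31."""
    return clip_quant(quant + DQUANT_DELTA[dquant_code])


def bquant_from_dbquant(quant: int, dbquant_code: int) -> int:
    """Derive BQUANT from QUANT and a DBQUANT codeword (Table 10.10).

    Codes 00, 01, 10, 11 select the multipliers 5, 6, 7, 8; the division
    by 4 truncates, and the result is clipped to 1..31.
    """
    return clip_quant(((5 + dbquant_code) * quant) // 4)
```

For instance, QUANT = 30 with DQUANT = 11 yields 31 after clipping, and QUANT = 10 with DBQUANT = 00 yields BQUANT = 12.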
CBPY (Y0, Y1, Y2, Y3)
Intra         Inter         Code
0, 0, 0, 0    1, 1, 1, 1    0011
0, 0, 0, 1    1, 1, 1, 0    0010 1
0, 0, 1, 0    1, 1, 0, 1    0010 0
0, 0, 1, 1    1, 1, 0, 0    1001
0, 1, 0, 0    1, 0, 1, 1    0001 1
0, 1, 0, 1    1, 0, 1, 0    0111
0, 1, 1, 0    1, 0, 0, 1    0000 10
0, 1, 1, 1    1, 0, 0, 0    1011
1, 0, 0, 0    0, 1, 1, 1    0001 0
1, 0, 0, 1    0, 1, 1, 0    0000 11
1, 0, 1, 0    0, 1, 0, 1    0101
1, 0, 1, 1    0, 1, 0, 0    1010
1, 1, 0, 0    0, 0, 1, 1    0100
1, 1, 0, 1    0, 0, 1, 0    1000
1, 1, 1, 0    0, 0, 0, 1    0110
1, 1, 1, 1    0, 0, 0, 0    11

Table 10.16. Baseline H.263 Variable-Length Code Table for CBPY.

DQUANT   Differential Value of QUANT
00       –1
01       –2
10       1
11       2

Table 10.17. Baseline H.263 DQUANT Codes for QUANT Differential Values.

Motion Vector Data (MVD)
Motion vector data is included for all inter-macroblocks, and for intra-blocks when in PB frame mode.

Motion vector data consists of a variable-length codeword for the horizontal component, followed by a variable-length codeword for the vertical component. The variable-length codes are listed in Table 10.18. For the unrestricted motion vector mode, other motion vector coding may be used.

Motion Vector Data (MVD2–4)
The three codewords MVD2, MVD3, and MVD4 are present if indicated by PTYPE and MCBPC during the advanced prediction or deblocking filter modes. Each consists of a variable-length codeword for the horizontal component followed by a variable-length codeword for the vertical component. The variable-length codes are listed in Table 10.18.

Motion Vector Data for B Macroblock (MVDB)
MVDB is present if indicated by MODB during the PB frame and improved PB frame modes. It consists of a variable-length codeword for the horizontal component followed by a variable-length codeword for the vertical component of each vector. The variable-length codes are listed in Table 10.18.

Block Layer
If not in PB frames mode, a macroblock is made up of four Y blocks, a Cb block, and a Cr block (see Figure 10.9). Data for an 8 sample × 8 line block consists of codewords for the intra-DC coefficient and transform coefficients as shown in Figure 10.10. The order of block transmission is shown in Figure 10.9.

In PB frames mode, a macroblock is made up of four Y blocks, a Cb block, a Cr block, and data for six B blocks.

The quantized DCT coefficients are transmitted in the order shown in Figure 7.59. In the modified quantization mode, quantized DCT coefficients are transmitted in the order shown in Figure 7.60.

DC Coefficient for Intra-Blocks (Intra-DC)
Intra-DC is an 8-bit codeword. The values and their corresponding reconstruction levels are listed in Table 10.19.

If not in PB frames mode, the intra-DC coefficient is present for every block of the macroblock if MCBPC indicates macroblock type 3 or 4. In PB frames mode, the intra-DC coefficient is present for every P block if MCBPC indicates macroblock type 3 or 4 (the intra-DC coefficient is not present for B blocks).

Transform Coefficient (TCOEF)
If not in PB frames mode, TCOEF is present if indicated by MCBPC or CBPY. In PB frames mode, TCOEF is present for B blocks if indicated by CBPB.

An event is a combination of a last non-zero coefficient indication (LAST = “0” if there are more non-zero coefficients in the block; LAST = “1” if there are no more non-zero coefficients in the block), the number of successive zeros preceding the coefficient (RUN), and the non-zero coefficient (LEVEL).

The most common events are coded using a variable-length code, shown in Table 10.20. The “s” bit indicates the sign of the level; “0” for positive, and “1” for negative.
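To illustrate how (LAST, RUN, LEVEL) events map back to a block, the sketch below expands a list of already-decoded events into 64 coefficients in transmission (scan) order. The function name expand_events and the start parameter are assumptions made for illustration; start = 1 would be used when an intra-DC coefficient already occupies the first position.

```python
def expand_events(events, start=0, size=64):
    """Expand (LAST, RUN, LEVEL) events into a list of coefficients in scan order.

    events is an iterable of (last, run, level) tuples with signed, non-zero levels.
    """
    coeffs = [0] * size
    pos = start
    for last, run, level in events:
        pos += run  # skip RUN zero coefficients
        if pos >= size:
            raise ValueError("event runs past the end of the block")
        coeffs[pos] = level
        pos += 1
        if last:  # LAST = 1: no more non-zero coefficients in this block
            break
    return coeffs
```

The events (0, 0, 5), (0, 2, -1), (1, 1, 3) produce a block that begins 5, 0, 0, -1, 0, 3, with all remaining coefficients zero. The result is still in scan order and would need to be reordered into an 8 × 8 array before the inverse transform.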
Vector Difference   Code
–16     16          0000 0000 0010 1
–15.5   16.5        0000 0000 0011 1
–15     17          0000 0000 0101
–14.5   17.5        0000 0000 0111
–14     18          0000 0000 1001
–13.5   18.5        0000 0000 1011
–13     19          0000 0000 1101
–12.5   19.5        0000 0000 1111
–12     20          0000 0001 001
–11.5   20.5        0000 0001 011
–11     21          0000 0001 101
–10.5   21.5        0000 0001 111
–10     22          0000 0010 001
–9.5    22.5        0000 0010 011
–9      23          0000 0010 101
–8.5    23.5        0000 0010 111
–8      24          0000 0011 001
–7.5    24.5        0000 0011 011
–7      25          0000 0011 101
–6.5    25.5        0000 0011 111
–6      26          0000 0100 001
–5.5    26.5        0000 0100 011
–5      27          0000 0100 11
–4.5    27.5        0000 0101 01
–4      28          0000 0101 11
–3.5    28.5        0000 0111
–3      29          0000 1001
–2.5    29.5        0000 1011
–2      30          0000 111
–1.5    30.5        0001 1
–1      31          0011
–0.5    31.5        011
0                   1

Table 10.18a. Baseline H.263 Variable-Length Code Table for MVD, MVD2–4, and MVDB.

Vector Difference   Code
0.5     –31.5       010
1       –31         0010
1.5     –30.5       0001 0
2       –30         0000 110
2.5     –29.5       0000 1010
3       –29         0000 1000
3.5     –28.5       0000 0110
4       –28         0000 0101 10
4.5     –27.5       0000 0101 00
5       –27         0000 0100 10
5.5     –26.5       0000 0100 010
6       –26         0000 0100 000
6.5     –25.5       0000 0011 110
7       –25         0000 0011 100
7.5     –24.5       0000 0011 010
8       –24         0000 0011 000
8.5     –23.5       0000 0010 110
9       –23         0000 0010 100
9.5     –22.5       0000 0010 010
10      –22         0000 0010 000
10.5    –21.5       0000 0001 110
11      –21         0000 0001 100
11.5    –20.5       0000 0001 010
12      –20         0000 0001 000
12.5    –19.5       0000 0000 1110
13      –19         0000 0000 1100
13.5    –18.5       0000 0000 1010
14      –18         0000 0000 1000
14.5    –17.5       0000 0000 0110
15      –17         0000 0000 0100
15.5    –16.5       0000 0000 0011 0

Table 10.18b. Baseline H.263 Variable-Length Code Table for MVD, MVD2–4, and MVDB.

Intra DC Value   Reconstruction Level
0000 0000        not used
0000 0001        8
0000 0010        16
0000 0011        24
:                :
0111 1111        1016
1111 1111        1024
1000 0001        1032
:                :
1111 1101        2024
1111 1110        2032

Table 10.19. Baseline H.263 Reconstruction Levels for Intra DC.

Other combinations of (LAST, RUN, LEVEL) are encoded using a 22-bit word: 7 bits of escape (ESC), 1 bit of LAST, 6 bits of RUN, and 8 bits of LEVEL.
The codes for RUN and LEVEL are shown in Table 10.21. Code 1000 0000 is forbidden unless in the modified quantization mode.

All coefficients, except for intra-DC, have reconstruction levels (REC) in the range –2048 to 2047. Reconstruction levels are recovered by the following equations, and the results are clipped to that range.

if LEVEL = 0:       REC = 0
if QUANT = odd:     |REC| = QUANT × (2 × |LEVEL| + 1)
if QUANT = even:    |REC| = QUANT × (2 × |LEVEL| + 1) – 1

After calculation of |REC|, the sign is added to obtain REC. Sign(LEVEL) is specified by the “s” bit in the TCOEF code in Table 10.20.

REC = sign(LEVEL) × |REC|

For intra-DC blocks, the reconstruction level is:

REC = 8 × LEVEL

Last   Run   |Level|   Code
0      0     1         10s
0      0     2         1111 s
0      0     3         0101 01s
0      0     4         0010 111s
0      0     5         0001 1111 s
0      0     6         0001 0010 1s
0      0     7         0001 0010 0s
0      0     8         0000 1000 01s
0      0     9         0000 1000 00s
0      0     10        0000 0000 111s
0      0     11        0000 0000 110s
0      0     12        0000 0100 000s
0      1     1         110s
0      1     2         0101 00s
0      1     3         0001 1110 s
0      1     4         0000 0011 11s
0      1     5         0000 0100 001s
0      1     6         0000 0101 0000 s
0      2     1         1110 s
0      2     2         0001 1101 s
0      2     3         0000 0011 10s
0      2     4         0000 0101 0001 s
0      3     1         0110 1s
0      3     2         0001 0001 1s
0      3     3         0000 0011 01s
0      4     1         0110 0s
0      4     2         0001 0001 0s
0      4     3         0000 0101 0010 s
0      5     1         0101 1s
0      5     2         0000 0011 00s
0      5     3         0000 0101 0011 s
0      6     1         0100 11s
0      6     2         0000 0010 11s
0      6     3         0000 0101 0100 s
0      7     1         0100 10s

Table 10.20a. Baseline H.263 Variable-Length Code Table for TCOEF.
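The 22-bit escape word and the reconstruction equations can be combined in a short sketch. It assumes the escape word has already been extracted from the bitstream as an integer with ESC in the most significant bits, and that LEVEL uses the 8-bit two's-complement codes of Table 10.21; the names parse_escape and h263_reconstruct are hypothetical, and only baseline behavior (no modified quantization mode) is covered.

```python
ESC = 0b0000011  # 7-bit escape code from the TCOEF table


def parse_escape(word: int):
    """Split a 22-bit escape word into (LAST, RUN, LEVEL)."""
    if word >> 15 != ESC:
        raise ValueError("not an escape word")
    last = (word >> 14) & 0x1
    run = (word >> 8) & 0x3F
    code = word & 0xFF
    if code in (0x00, 0x80):
        # LEVEL codes for 0 and -128 are forbidden in baseline H.263.
        raise ValueError("forbidden LEVEL code")
    level = code - 256 if code & 0x80 else code  # two's complement
    return last, run, level


def h263_reconstruct(level: int, quant: int) -> int:
    """Reconstruct a non-intra-DC coefficient from LEVEL and QUANT (1..31)."""
    if level == 0:
        return 0
    magnitude = quant * (2 * abs(level) + 1)
    if quant % 2 == 0:
        magnitude -= 1
    rec = magnitude if level > 0 else -magnitude
    return max(-2048, min(2047, rec))
```

An escape word carrying LAST = 1, RUN = 5, and LEVEL code 1111 1110 parses to (1, 5, -2), which reconstructs to -19 when QUANT = 4.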
Last   Run   |Level|   Code
0      7     2         0000 0010 10s
0      8     1         0100 01s
0      8     2         0000 0010 01s
0      9     1         0100 00s
0      9     2         0000 0010 00s
0      10    1         0010 110s
0      10    2         0000 0101 0101 s
0      11    1         0010 101s
0      12    1         0010 100s
0      13    1         0001 1100 s
0      14    1         0001 1011 s
0      15    1         0001 0000 1s
0      16    1         0001 0000 0s
0      17    1         0000 1111 1s
0      18    1         0000 1111 0s
0      19    1         0000 1110 1s
0      20    1         0000 1110 0s
0      21    1         0000 1101 1s
0      22    1         0000 1101 0s
0      23    1         0000 0100 010s
0      24    1         0000 0100 011s
0      25    1         0000 0101 0110 s
0      26    1         0000 0101 0111 s
1      0     1         0111 s
1      0     2         0000 1100 1s
1      0     3         0000 0000 101s
1      1     1         0011 11s
1      1     2         0000 0000 100s
1      2     1         0011 10s
1      3     1         0011 01s
1      4     1         0011 00s
1      5     1         0010 011s
1      6     1         0010 010s
1      7     1         0010 001s

Table 10.20b. Baseline H.263 Variable-Length Code Table for TCOEF.

Last   Run   |Level|   Code
1      8     1         0010 000s
1      9     1         0001 1010 s
1      10    1         0001 1001 s
1      11    1         0001 1000 s
1      12    1         0001 0111 s
1      13    1         0001 0110 s
1      14    1         0001 0101 s
1      15    1         0001 0100 s
1      16    1         0001 0011 s
1      17    1         0000 1100 0s
1      18    1         0000 1011 1s
1      19    1         0000 1011 0s
1      20    1         0000 1010 1s
1      21    1         0000 1010 0s
1      22    1         0000 1001 1s
1      23    1         0000 1001 0s
1      24    1         0000 1000 1s
1      25    1         0000 0001 11s
1      26    1         0000 0001 10s
1      27    1         0000 0001 01s
1      28    1         0000 0001 00s
1      29    1         0000 0100 100s
1      30    1         0000 0100 101s
1      31    1         0000 0100 110s
1      32    1         0000 0100 111s
1      33    1         0000 0101 1000 s
1      34    1         0000 0101 1001 s
1      35    1         0000 0101 1010 s
1      36    1         0000 0101 1011 s
1      37    1         0000 0101 1100 s
1      38    1         0000 0101 1101 s
1      39    1         0000 0101 1110 s
1      40    1         0000 0101 1111 s
ESC                    0000 011

Table 10.20c. Baseline H.263 Variable-Length Code Table for TCOEF.

Run   Code
0     0000 00
1     0000 01
:     :
63    1111 11

Level   Code
–128    forbidden
–127    1000 0001
:       :
–2      1111 1110
–1      1111 1111
0       forbidden
1       0000 0001
2       0000 0010
:       :
127     0111 1111

Table 10.21. Baseline H.263 Run, Level Codes.

PLUSPTYPE Picture Layer Option
PLUSPTYPE is present when indicated by bits 6–8 of PTYPE, and is used to enable the H.263 version 2 options.
When present, the PLUSPTYPE and related fields immediately follow PTYPE, preceding PQUANT.

If PLUSPTYPE is present, then CPM immediately follows PLUSPTYPE. If PLUSPTYPE is not present, then CPM immediately follows PQUANT. PSBI always immediately follows CPM (if CPM = “1”).

PLUSPTYPE is a 12- or 30-bit codeword, comprising up to three subfields: UFEP, OPPTYPE, and MPPTYPE. The PLUSPTYPE and related fields are illustrated in Figure 10.11.

Update Full Extended PTYPE (UFEP)
UFEP is a 3-bit codeword present if “extended PTYPE” is indicated by PTYPE.

A value of “000” indicates that only MPPTYPE is included in the picture header.

A value of “001” indicates that both OPPTYPE and MPPTYPE are included in the picture header. If the picture type is intra or EI, this field must be set to “001.”

In addition, if PLUSPTYPE is present in each of a continuing sequence of pictures, this field shall be set to “001” every 5 seconds or every five frames, whichever is larger. UFEP should be set to “001” more often in error-prone environments.

Values other than “000” and “001” are reserved.

Optional Part of PLUSPTYPE (OPPTYPE)
This field contains features that are not likely to be changed from one frame to another.
If UFEP is “001,” the following bits are present in OPPTYPE:

Bit 1–3   Source format
          “000” = reserved
          “001” = SQCIF
          “010” = QCIF
          “011” = CIF
          “100” = 4CIF
          “101” = 16CIF
          “110” = custom source format
          “111” = reserved
Bit 4     Custom picture clock frequency: “0” = standard, “1” = custom
Bit 5     Unrestricted motion vector (UMV) mode: “0” = off, “1” = on
Bit 6     Syntax-based arithmetic coding (SAC) mode: “0” = off, “1” = on
Bit 7     Advanced prediction (AP) mode: “0” = off, “1” = on
Bit 8     Advanced intra-coding (AIC) mode: “0” = off, “1” = on
Bit 9     Deblocking filter (DF) mode: “0” = off, “1” = on
Bit 10    Slice-structured (SS) mode: “0” = off, “1” = on
Bit 11    Reference picture selection (RPS) mode: “0” = off, “1” = on
Bit 12    Independent segment decoding (ISD) mode: “0” = off, “1” = on
Bit 13    Alternative Inter-VLC (AIV) mode: “0” = off, “1” = on
Bit 14    Modified quantization (MQ) mode: “0” = off, “1” = on
Bit 15    “1”
Bit 16    “0”
Bit 17    “0”
Bit 18    “0”

Figure 10.11. H.263 PLUSPTYPE and Related Fields. (Field order shown in the figure: PLUSPTYPE, CPM, PSBI, CPFMT, EPAR, CPCFC, ETR, UUI, SSS, ELNUM, RLNUM, RPSMF, TRPI, TRP, BCI, BCM, RPRP.)
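The OPPTYPE layout above can be sketched as a small decoder. This is only an illustration, not code from the standard: the function name `parse_opptype`, the flag names, and the choice to pass the field as an 18-character string (bit 1 first) are all assumptions made for the example, as is treating the fixed bits 15–18 as a validity check.

```python
# Hypothetical helper: decodes the 18 OPPTYPE bits described above.
# Input is a string of '0'/'1' characters, bit 1 first (an assumption
# made for readability; a real decoder would read from a bitstream).

SOURCE_FORMATS = {
    0b001: "SQCIF",
    0b010: "QCIF",
    0b011: "CIF",
    0b100: "4CIF",
    0b101: "16CIF",
    0b110: "custom",
}

# Mode flags carried in bits 4-14, in bitstream order (names are illustrative).
MODE_FLAGS = (
    "custom_pcf", "umv", "sac", "ap", "aic",
    "df", "ss", "rps", "isd", "aiv", "mq",
)


def parse_opptype(bits: str) -> dict:
    """Return the source format and mode flags encoded in OPPTYPE."""
    if len(bits) != 18 or set(bits) - {"0", "1"}:
        raise ValueError("OPPTYPE must be exactly 18 bits")
    if bits[14:] != "1000":
        # Bits 15-18 are listed as the fixed values "1", "0", "0", "0".
        raise ValueError("bits 15-18 must be 1, 0, 0, 0")

    format_code = int(bits[0:3], 2)
    if format_code not in SOURCE_FORMATS:
        raise ValueError("reserved source format code")

    fields = {"source_format": SOURCE_FORMATS[format_code]}
    for name, bit in zip(MODE_FLAGS, bits[3:14]):
        fields[name] = bit == "1"
    return fields


# Example: CIF source with UMV and the deblocking filter enabled.
example = parse_opptype("011" + "01000100000" + "1000")
```

Keeping the flag names in a tuple ordered like the bitstream means the table above and the decoder can be compared line by line.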
Mandatory Part of PLUSPTYPE (MPPTYPE)

Regardless of the value of UFEP, the following 9 bits are also present in MPPTYPE:

Bit 1–3   Picture code type
          “000” = I frame (intra)
          “001” = P frame (inter)
          “010” = Improved PB frame
          “011” = B frame
          “100” = EI frame
          “101” = EP frame
          “110” = reserved
          “111” = reserved
Bit 4     Reference picture resampling (RPR) mode: “0” = off, “1” = on
Bit 5     Reduced resolution update (RRU) mode: “0” = off, “1” = on
Bit 6     Rounding type (RTYPE) mode: “0” = off, “1” = on
Bit 7     “0”
Bit 8     “0”
Bit 9     “1”

Custom Picture Format (CPFMT)

CPFMT is a 23-bit value that is present if the use of a custom picture format is specified by PLUSPTYPE and UFEP is “001.”

Bit 1–4    Pixel aspect ratio code
           “0000” = reserved
           “0001” = 1:1
           “0010” = 12:11
           “0011” = 10:11
           “0100” = 16:11
           “0101” = 40:33
           “0110” – “1110” = reserved
           “1111” = extended PAR
Bit 5–13   Picture width indication (PWI); number of samples per line = (PWI + 1) × 4
Bit 14     “1”
Bit 15–23  Picture height indication (PHI); number of lines per frame = (PHI + 1) × 4

Extended Pixel Aspect Ratio (EPAR)

EPAR is a 16-bit value present if CPFMT is present and “extended PAR” is indicated by CPFMT.

Bit 1–8    PAR width
Bit 9–16   PAR height

Custom Picture Clock Frequency Code (CPCFC)

CPCFC is an 8-bit value present only if PLUSPTYPE is present, UFEP is “001,” and PLUSPTYPE indicates a custom picture clock frequency. The custom picture clock frequency (in Hz) is:

1,800,000 / (clock divisor × clock conversion factor)

Bit 1      Clock conversion factor code: “0” = 1000, “1” = 1001
Bit 2–8    Clock divisor

Extended Temporal Reference (ETR)

ETR is a 2-bit value present if a custom picture clock frequency is in use. It is the two MSBs of the 10-bit TR value.

Unlimited Unrestricted Motion Vectors Indicator (UUI)

UUI is a 1- or 2-bit variable-length value indicating the effective range limit of motion vectors.
It is present if the optional unrestricted motion vector mode is indicated in PLUSPTYPE and UFEP is “001.” A value of “1” indicates the motion vector range is limited according to Tables 10.22 and 10.23. A value of “01” indicates the motion vector range is not limited except by the picture size.

Picture Width   Horizontal Motion Vector Range
4–352           –32, +31.5
356–704         –64, +63.5
708–1408        –128, +127.5
1412–2048       –256, +255.5

Table 10.22. Optional Horizontal Motion Range.

Picture Height  Vertical Motion Vector Range
4–288           –32, +31.5
292–576         –64, +63.5
580–1152        –128, +127.5

Table 10.23. Optional Vertical Motion Range.

Slice Structured Submode Bits (SSS)

SSS is a 2-bit value present only if the optional slice structured mode is indicated in PLUSPTYPE and UFEP is “001.” If the slice structured mode is in use but UFEP is not “001,” the last SSS value remains in effect.

Bit 1   Rectangular slices: “0” = no, “1” = yes
Bit 2   Arbitrary slice ordering: “0” = sequential, “1” = arbitrary

Enhancement Layer Number (ELNUM)

ELNUM is a 4-bit value present only during the temporal, SNR, and spatial scalability mode. It identifies a specific enhancement layer. The first enhancement layer above the base layer is designated as enhancement layer number 2, and the base layer is number 1.

Reference Layer Number (RLNUM)

RLNUM is a 4-bit value present only during the temporal, SNR, and spatial scalability mode and when UFEP is “001.” RLNUM identifies the layer number of the frames used as reference anchors.

Reference Picture Selection Mode Flags (RPSMF)

RPSMF is a 3-bit codeword present only during the reference picture selection mode and when UFEP is “001.” When present, it indicates which back-channel messages are needed by the encoder. If the reference picture selection mode is in use but RPSMF is not present, the last value of RPSMF that was sent remains in effect.
“000” – “011” = reserved
“100” = neither ACK nor NACK needed
“101” = need ACK
“110” = need NACK
“111” = need both ACK and NACK

Temporal Reference for Prediction Indication (TRPI)

TRPI is a 1-bit value present only during the reference picture selection mode. When present, it indicates the presence of the following TRP field. “0” = TRP field not present; “1” = TRP field present. TRPI is “0” whenever the picture header indicates an I frame or EI frame.

Temporal Reference for Prediction (TRP)

TRP is a 10-bit value indicating the temporal reference used for encoding prediction, except in the case of B frames. For B frames, the frame having the temporal reference specified by TRP is used for the prediction in the forward direction.

If the custom picture clock frequency is not being used, the two MSBs of TRP are zero and the LSBs contain the 8-bit TR value in the picture header of the reference picture. If a custom picture clock frequency is being used, TRP is a 10-bit number consisting of the concatenation of ETR and TR from the reference picture header.

If TRP is not present, the previous anchor picture is used for prediction, as when not in the reference picture selection mode. TRP is valid until the next PSC, GSC, or SSC.

Back-Channel Message Indication (BCI)

BCI is a 1- or 2-bit variable-length codeword present only during the optional reference picture selection mode. “1” indicates the presence of the optional back-channel message (BCM) field. “01” indicates the absence or the end of the back-channel message field. BCM and BCI may be repeated when present.

Back-Channel Message (BCM)

The variable-length back-channel message is present if the preceding BCI field is set to “1.”

Reference Picture Resampling Parameters (RPRP)

A variable-length field present only during the optional reference picture resampling mode. This field carries the parameters of the reference picture resampling mode.
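Several of the picture-header fields described above involve a little arithmetic: the CPCFC clock formula, the way TRP is built from ETR and TR, and the motion vector limits of Table 10.22 that apply when UUI is “1.” A minimal sketch of that arithmetic follows. The function names are hypothetical, the code assumes that "bit 1" of a field is its most significant bit, and the zero-divisor check is only there to avoid a division by zero in the example.

```python
# Illustrative helpers only; names and input conventions are assumptions.

def custom_picture_clock_hz(cpcfc: int) -> float:
    """Picture clock frequency from the 8-bit CPCFC value.

    Bit 1 (assumed MSB) selects the conversion factor (1000 or 1001);
    bits 2-8 hold the clock divisor.
    """
    if not 0 <= cpcfc <= 0xFF:
        raise ValueError("CPCFC is an 8-bit value")
    factor = 1001 if (cpcfc >> 7) & 1 else 1000
    divisor = cpcfc & 0x7F
    if divisor == 0:
        raise ValueError("a clock divisor of zero cannot be used here")
    return 1_800_000 / (divisor * factor)


def trp_value(tr: int, etr: int = 0, custom_clock: bool = False) -> int:
    """10-bit TRP: ETR (2 MSBs) concatenated with the 8-bit TR.

    Without a custom picture clock the two MSBs are zero.
    """
    if not 0 <= tr <= 0xFF or not 0 <= etr <= 0b11:
        raise ValueError("TR is 8 bits and ETR is 2 bits")
    return ((etr << 8) | tr) if custom_clock else tr


# Table 10.22: (largest picture width, magnitude of the negative limit).
_HORIZONTAL_LIMITS = ((352, 32.0), (704, 64.0), (1408, 128.0), (2048, 256.0))


def horizontal_mv_range(picture_width: int) -> tuple:
    """Horizontal motion vector range (min, max) for UUI = '1'."""
    if picture_width < 4 or picture_width > 2048:
        raise ValueError("picture width outside Table 10.22")
    for max_width, limit in _HORIZONTAL_LIMITS:
        if picture_width <= max_width:
            return (-limit, limit - 0.5)
    raise AssertionError("unreachable")


# A divisor of 60 with the 1001 factor gives the familiar 30000/1001 Hz rate.
ntsc_rate = custom_picture_clock_hz(0b1_0111100)
```

The vertical limits of Table 10.23 could be handled the same way with a second lookup table keyed on picture height.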
Optional H.263 Modes

Unrestricted Motion Vector Mode

In this optional mode, motion vectors are allowed to point outside the picture. The edge samples are used as prediction for the “nonexisting” samples. The edge sample is found by limiting the motion vector to the last full sample position within the picture area. Motion vector limiting is done separately for the horizontal and vertical components.

Additionally, this mode includes an extension of the motion vector range so that larger motion vectors can be used (Tables 10.22 and 10.23). These longer motion vectors improve the coding efficiency for the larger picture formats, such as 4CIF or 16CIF. A significant gain is also achieved for the other picture formats if there is movement along the picture edges, camera movement, or background movement.

When this mode is employed within H.263 version 2, new reversible variable-length codes (RVLCs) are used for encoding the motion vectors, as shown in Table 10.24. These codes are single-valued, as opposed to the baseline double-valued VLCs. The double-valued codes were not popular due to limitations in their extensibility and their high cost of implementation. The RVLCs are also easier to implement.

Each row in Table 10.24 represents a motion vector difference in half-pixel units. “…x1x0” denotes all bits following the leading “1” in the binary representation of the absolute value of the motion vector difference.
The “s” bit denotes the sign of the motion vector difference: “0” for positive and “1” for negative. The binary representation of the motion vector difference is interleaved with bits that indicate if the code continues or ends. The “0” in the last position indicates the end of the code.

Absolute Value of Motion Vector Difference
in Half-Pixel Units                              Code
0                                                1
1                                                0 s 0
“x0” + 2 (2–3)                                   0 x0 1 s 0
“x1x0” + 4 (4–7)                                 0 x1 1 x0 1 s 0
“x2x1x0” + 8 (8–15)                              0 x2 1 x1 1 x0 1 s 0
“x3x2x1x0” + 16 (16–31)                          0 x3 1 x2 1 x1 1 x0 1 s 0
“x4x3x2x1x0” + 32 (32–63)                        0 x4 1 x3 1 x2 1 x1 1 x0 1 s 0
“x5x4x3x2x1x0” + 64 (64–127)                     0 x5 1 x4 1 x3 1 x2 1 x1 1 x0 1 s 0
“x6x5x4x3x2x1x0” + 128 (128–255)                 0 x6 1 x5 1 x4 1 x3 1 x2 1 x1 1 x0 1 s 0
“x7x6x5x4x3x2x1x0” + 256 (256–511)               0 x7 1 x6 1 x5 1 x4 1 x3 1 x2 1 x1 1 x0 1 s 0
“x8x7x6x5x4x3x2x1x0” + 512 (512–1023)            0 x8 1 x7 1 x6 1 x5 1 x4 1 x3 1 x2 1 x1 1 x0 1 s 0
“x9x8x7x6x5x4x3x2x1x0” + 1024 (1024–2047)        0 x9 1 x8 1 x7 1 x6 1 x5 1 x4 1 x3 1 x2 1 x1 1 x0 1 s 0
“x10x9x8x7x6x5x4x3x2x1x0” + 2048 (2048–4095)     0 x10 1 x9 1 x8 1 x7 1 x6 1 x5 1 x4 1 x3 1 x2 1 x1 1 x0 1 s 0

Table 10.24. H.263 Reversible Variable-Length Codes for Motion Vectors.

RVLCs can also be used to increase resilience to channel errors. Decoding can be performed by processing the motion vectors in the forward and reverse directions. If an error is detected while decoding in one direction, the decoder can proceed in the reverse direction, improving the error resilience of the bitstream. In addition, the motion vector range is extended up to [–256, +255.5], depending on the picture size.

Syntax-Based Arithmetic Coding Mode

In this optional mode, the variable-length coding is replaced with arithmetic coding. The SNR and reconstructed pictures will be the same, but the bit-rate can be reduced by about 5% since the requirement of a fixed number of bits for information is removed.

The syntax of the picture, group of blocks, and macroblock layers remains exactly the same. The syntax of the block layer changes slightly in that any number of TCOEF entries may be present.
Use of the syntax-based arithmetic coding mode is not widespread.

Advanced Prediction Mode

In this optional mode, four motion vectors per macroblock (one for each Y block) are used instead of one. In addition, overlapped block motion compensation (OBMC) is used for the Y blocks of P frames.

If one motion vector is used for a macroblock, it is defined as four motion vectors with the same value. If four motion vectors are used for a macroblock, the first motion vector is the MVD codeword and applies to Y1 in Figure 10.9. The second motion vector is the MVD2 codeword that applies to Y2, the third motion vector is the MVD3 codeword that applies to Y3, and the fourth motion vector is the MVD4 codeword that applies to Y4. The motion vector for Cb and Cr of the macroblock is derived from the four Y motion vectors.

The encoder has to decide which type of vector to use. Four motion vectors use more bits, but provide improved prediction. This mode improves inter-picture prediction and yields a significant improvement in picture quality for the same bit-rate by reducing blocking artifacts.

PB Frames Mode

Like MPEG, H.263 optionally supports PB frames. A PB frame consists of one P frame (predicted from the previous P frame) and one B frame (bi-directionally predicted from the previous and current P frame), as shown in Figure 10.12.

With this coding option, the picture rate can be increased without substantially increasing the bit-rate. However, an improved PB frames mode is supported in Annex M. The original PB frames mode is retained only for compatibility with systems made prior to the adoption of Annex M.

Continuous Presence Multipoint and Video Multiplex Mode

In this optional mode, up to four independent H.263 bitstreams can be multiplexed into a single bitstream. The sub-bitstream with the lowest identifier number (sent via the SBI field) is considered to have the highest priority unless a different priority convention is established by external means.
This feature is designed for use in continuous presence multipoint applications or other situations in which separate logical channels are not available, but the use of multiple video bitstreams is desired. It is not to be used with H.324.

Forward Error Correction Mode

This optional mode provides forward error correction (code and framing) for transmission of H.263 video data. It is not to be used with H.324. Both the framing and the forward error correction code are the same as in H.261.

Advanced Intra-Coding Mode

This optional mode improves compression for intra-macroblocks. It uses intra-block prediction from neighboring intra-blocks, a modified inverse quantization of intra-DCT coefficients, and a separate VLC table for intra-coefficients. This mode significantly improves the compression performance over the intra-coding of baseline H.263.

An additional 1- or 2-bit variable-length codeword, INTRA_MODE, is added to the macroblock layer immediately following the MCBPC field to indicate the prediction mode:

“0” = DC only
“10” = Vertical DC and AC
“11” = Horizontal DC and AC

For intra-coded blocks, if the prediction mode is DC only, the zig-zag scan order in Figure 7.59 is used. If the prediction mode is vertical DC and AC, the alternate-vertical scanning order in Figure 7.60 is used. If the prediction mode is horizontal DC and AC, the alternate-horizontal scanning order in Figure 7.61 is used. For non-intra-blocks, the zig-zag scan order in Figure 7.59 is used.

Figure 10.12. Baseline H.263 PB Frames. (Diagram labels: PB frame, bi-directional (B) frame, predicted (P) frame, forward prediction, bi-directional prediction.)

Deblocking Filter Mode

This optional mode introduces a deblocking filter inside the coding loop. The filter is applied to the edge boundaries of 8 × 8 blocks to reduce blocking artifacts. The filter coefficients depend on the macroblock’s quantizer step size, with larger coefficients used for a coarser quantizer.
This mode also allows the use of four motion vectors per macroblock, as specified in the advanced prediction mode, and also allows motion vectors to point outside the picture, as in the unrestricted motion vector mode. The computationally expensive overlapping motion compensation operation of the advanced prediction mode is not used so as to keep the complexity of this mode minimal. The result is better prediction and a reduction in blocking artifacts.

Slice Structured Mode

In this optional mode, a slice layer is substituted for the GOB layer. This mode provides error resilience, makes the bitstream easier to use with a packet transport delivery scheme, and minimizes video delay.

The slice layer consists of a slice header followed by consecutive complete macroblocks. Two additional modes can be signaled to reflect the order of transmission (sequential or arbitrary) and the shape of the slices (rectangular or not). These add flexibility to the slice structure so that it can be designed for different applications.

Supplemental Enhancement Information

With this optional mode, additional supplemental information may be included in the bitstream to signal enhanced display capability. Typical enhancement information can signal full- or partial-picture freezes, picture freeze releases, or chroma keying for video compositing.

The supplemental information may be present in the bitstream even though the decoder may not be capable of using it. The decoder simply discards the supplemental information, unless a requirement to support the capability has been negotiated by external means.

Improved PB Frames Mode

This optional mode represents an improvement compared to the baseline H.263 PB frames option. This mode permits forward, backward, and bi-directional prediction for B frames in a PB frame. The changes to the operation of the MODB field are shown in Table 10.25.
Bi-directional prediction methods are the same in both PB frame modes except that, in the improved PB frame mode, no delta vector is transmitted. In forward prediction, the B macroblock is predicted from the previous P macroblock, and a separate motion vector is then transmitted. In backwards prediction, the predicted macroblock is equal to the future P macroblock, and therefore no motion vector is transmitted.

Improved PB frames are less susceptible to changes that may occur between frames, such as when there is a scene cut between the previous P frame and the PB frame.

Reference Picture Selection Mode

In baseline H.263, a frame may be predicted from the previous frame. If a portion of the reference frame is lost due to errors or packet loss, the quality of future frames is degraded. Using this optional mode, it is possible to select which reference frame to use for prediction, minimizing error propagation.

Four back-channel messaging signals (NEITHER, ACK, NACK, and ACK+NACK) are used by the encoder and decoder to specify which picture segment will be used for prediction. For example, a NACK sent to the encoder from the decoder indicates that a given frame has been degraded by errors. Thus, the encoder may choose not to use this frame for future prediction, and instead use a different, unaffected, reference frame. This reduces error propagation, maintaining improved picture quality in error-prone environments.

Temporal, SNR, and Spatial Scalability Mode

In this optional mode, there is support for temporal, SNR, and spatial scalability. Scalability allows for the decoding of a sequence at more than one quality level. This is done by using a hierarchy of pictures and enhancement pictures partitioned into one or more layers. The lowest layer is called the base layer. The base layer is a separately decodable bitstream.
The enhancement layers can be decoded in conjunction with the base layer to increase the picture rate, increase the picture quality, or increase the picture size.

Temporal scalability is achieved using bi-directionally predicted pictures, or B frames. They allow prediction from either or both a previous and subsequent picture in the base layer. This results in improved compression as compared to that of P frames. These B frames differ from the B-picture part of a PB frame or improved PB frame in that they are separate entities in the bitstream.

SNR scalability refers to enhancement information that increases the picture quality without increasing resolution. Since compression introduces artifacts, the difference between a decoded picture and the original is the coding error. Normally, the coding error is lost at the encoder and never recovered. With SNR scalability, the coding errors are sent to the decoder, enabling an enhancement to the decoded picture. The extra data serves to increase the signal-to-noise ratio (SNR) of the picture, hence the term “SNR scalability.”

CBPB  MVDB  Code    Coding Mode
            0       bi-directional prediction
×           10      bi-directional prediction
      ×     110     forward prediction
×     ×     1110    forward prediction
            11110   backward prediction
×           111111  backward prediction

Table 10.25. H.263 Variable-Length Code Table for MODB for Improved PB Frame Mode.

Spatial scalability is closely related to SNR scalability. The only difference is that before the picture in the reference layer is used to predict the picture in the spatial enhancement layer, it is interpolated by a factor of two either horizontally or vertically (1D spatial scalability), or both horizontally and vertically (2D spatial scalability). Other than the upsampling process, the processing and syntax for a spatial scalability picture is the same as for an SNR scalability picture.
Since there is very little syntactical distinction between frames using SNR scalability and frames using spatial scalability, the frames used for either purpose are called EI frames and EP frames.

The frame in the base layer which is used for upward prediction in an EI or EP frame may be an I frame, a P frame, the P-part of a PB frame, or the P-part of an improved PB frame (but not a B frame, the B-part of a PB frame, or the B-part of an improved PB frame).

This mode can be useful for networks having varying bandwidth capacity.

Reference Picture Resampling Mode

In this optional mode, the reference frame is resampled to a different size prior to using it for prediction. This allows the reference frame to have a different source format than the frame being predicted. It can also be used for global motion estimation, or estimation of rotating motion, by warping the shape, size, and location of the reference frame.

Reduced Resolution Update Mode

An optional mode is provided which allows the encoder to send update information for a frame encoded at a lower resolution, while still maintaining a higher resolution for the reference frame, to create a final frame at the higher resolution. This mode is best used when encoding a highly active scene, allowing an encoder to increase the frame rate for moving parts of a scene, while maintaining a higher resolution in more static areas of the scene.

The syntax is the same as baseline H.263, but interpretation of the semantics is different. The dimensions of the macroblocks are doubled, so the macroblock data size is one-quarter of what it would have been without this mode enabled. Therefore, motion vectors must be doubled in both dimensions. To produce the final picture, the macroblock is upsampled to the intended resolution. After upsampling, the result is added to the motion-compensated frame to create the full resolution frame for future reference.
Independent Segment Decoding Mode

In this optional mode, picture segment boundaries are treated as picture boundaries: no data dependencies across segment boundaries are allowed. Use of this mode prevents the propagation of errors, providing error resilience and recovery. This mode is best used with slice layers, where, for example, the slices can be sized to match a specific packet size.

Alternative Inter-VLC Mode

The intra-VLC table used in the advanced intra-coding mode can also be used for inter-block coding when this optional mode is enabled.

Large quantized coefficients and small runs of zeros, typically present in intra-blocks, become more frequent in inter-blocks when small quantizer step sizes are used. When bit savings are obtained, and the use of the intra quantized DCT coefficient table can be detected at the decoder, the encoder will use the intra-table. The decoder will first try to decode the quantized coefficients using the inter-table. If this results in addressing coefficients beyond the 64 coefficients of the 8 × 8 block, the decoder will use the intra-table.

Modified Quantization Mode

This optional mode improves the bit-rate control for encoding, reduces CbCr quantization error, expands the range of DCT coefficients, and places certain restrictions on coefficient values.

In baseline H.263, the quantizer value may be modified at the macroblock level. However, only a small adjustment (±1 or ±2) in the value of the most recent quantizer is permitted. The modified quantization mode allows the modification of the quantizer to any value.

In baseline H.263, the Y and CbCr quantizers are the same. The modified quantization mode also increases CbCr picture quality by using a smaller quantizer step size for the Cb and Cr blocks relative to the Y blocks.

In baseline H.263, when a quantizer smaller than eight is employed, quantized coefficients exceeding the range of [–127, +127] are clipped.
The modified quantization mode also allows coefficients that are outside the range of [–127, +127] to be represented. Therefore, when a very fine quantizer step size is selected, an increase in Y quality is obtained.

Enhanced Reference Picture Selection Mode

An optional Enhanced Reference Picture Selection (ERPS) mode offers enhanced coding efficiency and error resilience. It manages a multi-picture buffer of stored pictures.

Data-Partitioned Slice Mode

An optional Data-Partitioned Slice (DPS) mode offers enhanced error resilience. It separates the header and motion vector data from the DCT coefficient data and protects the motion vector data by using a reversible representation.

Additional Supplemental Enhancement Information Specification

An optional Additional Supplemental Enhancement Information Specification provides backward-compatible enhancements, such as:

(a) Indication of using a specific fixed-point IDCT
(b) Picture Messages, including message types:
• Arbitrary binary data
• Text (arbitrary, copyright, caption, video description or Uniform Resource Identifier)
• Picture header repetition (current, previous, next with reliable temporal reference or next with unreliable temporal reference)
• Interlaced field indications (top or bottom)
• Spare reference picture identification

Profiles

Profiles specify the syntax (i.e., algorithms) for common application-specific configurations.

Profile 0

The Baseline Profile, or Profile 0, uses no optional modes of operation.

Profile 1

Profile 1 (H.320 coding efficiency version 2 backward-compatibility profile) provides compatibility with H.242 and H.320. It is comprised of Profile 0 plus the following modes:

• Advanced Intra-Coding
• Deblocking Filter
• Supplemental Enhancement Information: full-picture freeze
• Modified Quantization

Profile 2

Profile 2 (version 1 backward-compatibility profile) provides enhanced coding efficiency for the first version of H.263. It is comprised of Profile 0 plus the following modes:

• Advanced Prediction

Profile 3

Profile 3 (version 2 interactive and streaming wireless profile) provides enhanced coding efficiency performance and enhanced error resilience for wireless devices. It is comprised of Profile 0 plus the following modes:

• Advanced Intra-Coding
• Deblocking Filter
• Slice Structured
• Modified Quantization

Profile 4

Profile 4 (version 3 interactive and streaming wireless profile) provides enhanced coding efficiency performance and enhanced error resilience for wireless devices. It is comprised of Profiles 0 and 3 plus the following modes:

• Data-Partitioned Slice
• Supplemental Enhancement Information: previous picture header repetition

Profile 5

Profile 5 (conversational high compression profile) provides enhanced coding efficiency without adding the delay associated with the use of B pictures and without adding error resilience features. It is comprised of Profiles 0, 1, and 2 plus the following modes:

• Unrestricted Motion Vectors: UUI = “1”
• Enhanced Reference Picture Selection

Profile 6

Profile 6 (conversational Internet profile) provides enhanced coding efficiency performance without adding the delay associated with the use of B pictures, and adds some error resilience suitable for use on Internet Protocol (IP) networks. It is comprised of Profiles 0 and 5 plus the following modes:

• Slice Structured with Arbitrary Slice Ordering (ASO)
• Supplemental Enhancement Information: previous picture header repetition

Profile 7

Profile 7 (conversational interlace profile) provides enhanced coding efficiency performance for low-delay applications, plus support of interlaced video sources. It is comprised of Profiles 0 and 5 plus the following modes:

• Supplemental Enhancement Information: interlaced field indications for 240-line and 288-line pictures

Profile 8

Profile 8 (high latency profile) provides enhanced coding efficiency performance for applications without critical delay constraints. It is comprised of Profiles 0 and 6 plus the following modes:

• Reference Picture Resampling
• Temporal Scalability: B pictures

Levels

Levels specify various parameters (resolution, frame rate, bit-rate, etc.) within a profile.

Level 10: Support up to 176×144 resolution, up to 64 kbps.
Level 20: Support up to 352×288 resolution, up to 128 kbps.
Level 30: Support up to 352×288 resolution, up to 384 kbps.
Level 40: Support up to 352×288 resolution, up to 2 Mbps.
Level 45: Support up to 176×144 resolution, up to 128 kbps.
Level 50: Support up to 352×288 resolution, up to 4 Mbps.
Level 60: Support up to 720×288 resolution, up to 8 Mbps.
Level 70: Support up to 720×576 resolution, up to 16 Mbps.

References

1. Efficient Motion Vector Estimation and Coding for H.263-Based Very Low Bit-Rate Video Compression, by Guy Cote, Michael Gallant, and Faouzi Kossentini, Department of Electrical and Computer Engineering, University of British Columbia.
2. H.263+: Video Coding at Low Bit-Rates, by Guy Cote, Berna Erol, Michael Gallant, and Faouzi Kossentini, Department of Electrical and Computer Engineering, University of British Columbia.
3. ITU-T H.261, Video Codec for Audiovisual Services at p × 64 kbits, 3/93.
4. ITU-T H.263, Video Coding for Low Bit-Rate Communication, 01/2005.

Chapter 11: Consumer DV

The DV (digital video) format is used by tape-based digital camcorders, and is based on IEC 61834 (25 Mbps bit-rate) and the newer SMPTE 314M and 370M specifications (25, 50, or 100 Mbps bit-rate).
The compression algorithm used is neither motion-JPEG nor MPEG, although it shares much in common with MPEG I frames. A proprietary compression algorithm is used; because it is an intra-frame technique, the compressed video can be edited.

The digitized video is stored in memory before compression is done. The correlation between the two fields stored in the buffer is measured. If the correlation is low, indicating inter-field motion, the two fields are individually compressed. Normally, the entire frame is compressed. In either case, DCT-based compression is used.

To achieve a constant 25, 50, or 100 Mbps bit-rate, DV uses adaptive quantization, which uses the appropriate DCT quantization table for each frame.

Figure 11.1 illustrates the contents of one track as written on tape. The ITI sector (insert and track information) contains information on track status and serves in place of a conventional control track during video editing.

The audio sector, shown in Figure 11.2, contains both audio data and auxiliary audio data (AAUX).

The video sector, shown in Figure 11.3, contains video data and auxiliary video data (VAUX). VAUX data includes recording date and time, lens aperture, shutter speed, color balance, and other camera settings.

The subcode sector stores a variety of information, including timecode, teletext, closed captioning in multiple languages, subtitles and karaoke lyrics in multiple languages, titles, table of contents, chapters, etc.

The subcode sector, AAUX data, and VAUX data use 5-byte blocks of data called packs.

Sector or gap        Size
ITI sector
Edit gap             625 bits
Audio preamble       500 bits
Audio sector         14 sync blocks
Audio postamble      550 bits
Edit gap             700 bits
Video preamble       500 bits
Video sector         149 sync blocks
Video postamble      975 bits
Edit gap             1550 bits
Subcode preamble     1200 bits
Subcode sector       12 sync blocks
Subcode postamble    1325 bits (1200 bits)
Overwrite margin     1250 bits

Figure 11.1. Sector Arrangement for One Track for a 480i System.
The total bits per track, excluding the overwrite margin, is 134,975 (134,850). There are 10 (12) of these tracks per video frame. 576i system parameters (if different) are shown in parentheses.

Sync block number   Contents
0–8                 Sync (2 bytes), ID (3 bytes), AAUX data (5 bytes), audio data (72 bytes), inner parity (8 bytes)
9–13                Sync (2 bytes), ID (3 bytes), outer parity, inner parity (8 bytes)

Figure 11.2. Structure of Sync Blocks in an Audio Sector.

Sync block number   Contents
0–1                 Sync (2 bytes), ID (3 bytes), VAUX data (77 bytes), inner parity (8 bytes)
2–136               Sync (2 bytes), ID (3 bytes), video data (77 bytes), inner parity (8 bytes)
137                 Sync (2 bytes), ID (3 bytes), VAUX data (77 bytes), inner parity (8 bytes)
138–148             Sync (2 bytes), ID (3 bytes), outer parity, inner parity (8 bytes)

Figure 11.3. Structure of Sync Blocks in a Video Sector.

Audio

An audio frame starts with an audio sample within –50 samples of the beginning of line 1 (480i systems) or the middle of line 623 (576i systems).

Each track contains nine audio sync blocks, with each audio sync block containing 5 bytes of audio auxiliary data (AAUX) and 72 bytes of audio data, as illustrated in Figure 11.2. Audio samples are shuffled over tracks and data-sync blocks within a frame. The remaining five audio sync blocks are used for error correction.

Two 44.1 kHz, 16-bit channels require a data rate of about 1.41 Mbps. Four 32 kHz, 12-bit channels require a data rate of about 1.536 Mbps. Two 48 kHz, 16-bit channels require a data rate of about 1.536 Mbps.
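The audio data rates quoted above are simply sample rate × bits per sample × number of channels. The short sketch below works through that arithmetic, along with the audio payload carried per frame by the nine 72-byte audio sync blocks in each track. The function names are made up for this illustration. By this raw calculation the two-channel 44.1 kHz, 16-bit case comes to about 1.41 Mbps.

```python
# Illustrative arithmetic only; function names are hypothetical.

AUDIO_SYNC_BLOCKS_PER_TRACK = 9   # audio sync blocks carrying audio data
AUDIO_BYTES_PER_SYNC_BLOCK = 72   # audio payload bytes in each such block


def audio_bit_rate_mbps(sample_rate_hz: int, bits_per_sample: int,
                        channels: int) -> float:
    """Raw PCM data rate in Mbps (1 Mbps = 1,000,000 bits per second)."""
    return sample_rate_hz * bits_per_sample * channels / 1e6


def audio_payload_bytes_per_frame(tracks_per_frame: int) -> int:
    """Audio payload bytes per video frame, AAUX and parity excluded."""
    return (tracks_per_frame
            * AUDIO_SYNC_BLOCKS_PER_TRACK
            * AUDIO_BYTES_PER_SYNC_BLOCK)


if __name__ == "__main__":
    for rate, bits, channels in ((48_000, 16, 2),
                                 (44_100, 16, 2),
                                 (32_000, 12, 4)):
        mbps = audio_bit_rate_mbps(rate, bits, channels)
        print(f"{channels} ch x {bits} bit @ {rate} Hz: {mbps:.4f} Mbps")
    # 10 tracks per frame for 480i systems, 12 for 576i systems.
    print(audio_payload_bytes_per_frame(10), "bytes per 480i frame")
    print(audio_payload_bytes_per_frame(12), "bytes per 576i frame")
```

The per-frame payload figure only counts the 72-byte audio areas; how samples are shuffled into them is not modeled here.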
IEC 61834

IEC 61834 supports a variety of audio sampling rates:
48 kHz (16 bits, 2 channels)
44.1 kHz (16 bits, 2 channels)
32 kHz (16 bits, 2 channels)
32 kHz (12 bits, 4 channels)

Audio sampling may be either locked or unlocked to the video frame frequency.

Audio data is processed in frames. At a locked 48 kHz sample rate, each frame contains either 1600 or 1602 audio samples (480i system) or 1920 audio samples (576i system). For the 480i system, the number of audio samples per frame follows a five-frame sequence:
1600, 1602, 1602, 1602, 1602, 1600, ...
Five frames therefore carry 8008 samples, which is exactly 48,000 samples per second over five frame periods of 1001/30,000 second each.

With a locked 32 kHz sample rate, each frame contains either 1066 or 1068 audio samples (480i system) or 1280 audio samples (576i system). For the 480i system, the number of audio samples per frame follows a fifteen-frame sequence:
1066, 1068, 1068, 1068, 1068, 1068, 1068, 1066, 1068, 1068, 1068, 1068, 1068, 1068, 1068, ...
Fifteen frames therefore carry 16,016 samples, again matching the 32 kHz sample rate exactly.

For unlocked audio sampling, there is no exact number of audio samples per frame, although minimum and maximum values are specified.

SMPTE 314M/370M

SMPTE 314M and 370M support a more limited set of options, with audio sampling locked to the video frame frequency:
48 kHz (16 bits, 2 channels) for 25 Mbps
48 kHz (16 bits, 4 channels) for 50 Mbps
48 kHz (16 bits, 8 channels) for 100 Mbps

Audio data is processed in frames. At a locked 48 kHz sample rate, each frame contains either 1600 or 1602 audio samples (60-field/frame system) or 1920 audio samples (50-field/frame system). For the 60-field/frame system, the number of audio samples per frame follows a five-frame sequence:
1600, 1602, 1602, 1602, 1602, 1600, ...

The audio capacity is 1620 samples per frame for the 60-field/frame system or 1944 samples per frame for the 50-field/frame system. The unused space at the end of each frame is filled with arbitrary data.

Audio Auxiliary Data (AAUX)

AAUX information is added to the shuffled audio data as shown in Figure 11.2.
The AAUX pack includes a 1-byte pack header and four bytes of data (payload), resulting in a 5-byte AAUX pack. Since there are nine of them per video frame, they are numbered from 0 to 8. An AAUX source (AS) pack and an AAUX source control (ASC) pack must be included in the compressed stream. Only the AS and ASC packs are currently supported by SMPTE 314M and 370M, although IEC 61834 supports many other pack formats.

AAUX Source (AS) Pack

The format for this pack is shown in Table 11.1.

IEC 61834 (fields listed from D7 to D0)
PC0: 0 1 0 1 0 0 0 0
PC1: LF, 1, AF
PC2: SM, CHN, PA, AM
PC3: 1, ML, 50/60, STYPE
PC4: EF, TC, SMP, QU

SMPTE 314M/370M (fields listed from D7 to D0)
PC0: 0 1 0 1 0 0 0 0
PC1: LF, 1, AF
PC2: 0, CHN, 1, AM
PC3: 1, 1, 50/60, STYPE
PC4: 1, 1, SMP, QU

Table 11.1. AAUX Source (AS) Pack.

LF  Locked audio sample rate
    “0” = locked to video
    “1” = unlocked to video
AF  Audio frame size. Specifies the number of audio samples per frame
SM  Stereo mode
    “0” = multi-stereo audio
    “1” = lumped audio
PA  Specifies if the audio signals recorded in CH1 (CH3) are related to the audio signals recorded in CH2 (CH4)
    “0” = one of pair channels
    “1” = independent channels
CHN  Number of audio channels within an audio block
    “00” = one channel per block
    “01” = two channels per block
    “10” = reserved
    “11” = reserved
AM  Specifies the content of the audio signal on each channel
ML  Multi-language flag
    “0” = recorded in multi-language
    “1” = not recorded in multi-language
50/60  50- or 60-field system
    “0” = 60-field system
    “1” = 50-field system
STYPE  For SMPTE 314M/370M, specifies the number of audio blocks per frame
    “00000” = 2 audio blocks
    “00001” = reserved
    “00010” = 4 audio blocks
    “00011” = 8 audio blocks
    “00100” to “11111” = reserved
  For IEC 61834, specifies the video system
    “00000” = standard-definition
    “00001” = reserved
    “00010” = high-definition
    “00011” to “11111” = reserved
EF  Audio emphasis flag
    “0” = on
    “1” = off
TC  Emphasis time constant
    “1” = 50/15 µs
    “0” = reserved
SMP  Audio sampling frequency
    “000” = 48 kHz
    “001” = 44.1 kHz
    “010” = 32 kHz
    “011” to “111” = reserved
QU  Audio quantization
    “000” = 16 bits linear
    “001” = 12 bits nonlinear
    “010” = 20 bits linear
    “011” to “111” = reserved

AAUX Source Control (ASC) Pack

The format for this pack is shown in Table 11.2.

IEC 61834 (fields listed from D7 to D0)
PC0: 0 1 0 1 0 0 0 1
PC1: CGMS, ISR, CMP, SS
PC2: REC S, REC E, REC M, ICH
PC3: DRF, SPD
PC4: 1, GEN

SMPTE 314M/370M (fields listed from D7 to D0)
PC0: 0 1 0 1 0 0 0 1
PC1: CGMS, 1 1 1 1, EFC
PC2: REC S, REC E, FADE S, FADE E, 1 1 1 1
PC3: DRF, SPD
PC4: 1 1 1 1 1 1 1 1

Table 11.2. AAUX Source Control (ASC) Pack.

CGMS  Copy generation management system
    “00” = copying permitted without restriction
    “01” = reserved
    “10” = one copy permitted
    “11” = no copy permitted
ISR  Previous input source
    “00” = analog input
    “01” = digital input
    “10” = reserved
    “11” = no information
CMP  Number of times of compression
    “00” = once
    “01” = twice
    “10” = three or more
    “11” = no information
SS  Source and recorded situation
    “00” = scrambled source with audience restrictions and recorded without descrambling
    “01” = scrambled source without audience restrictions and recorded without descrambling
    “10” = source with audience restrictions or descrambled source with audience restrictions
    “11” = no information
EFC  Audio emphasis flags
    “00” = emphasis off
    “01” = emphasis on
    “10” = reserved
    “11” = reserved
REC S  Recording start point
    “0” = at recording start point
    “1” = not at recording start point
REC E  Recording end point
    “0” = at recording end point
    “1” = not at recording end point
REC M  Recording mode
    “001” = original
    “011” = one CH insert
    “100” = four CHs insert
    “101” = two CHs insert
    “111” = invalid recording
FADE S  Fading of recording start point
    “0” = fading off
    “1” = fading on
FADE E  Fading of recording end point
    “0” = fading off
    “1” = fading on
ICH  Insert audio channel
    “000” = CH1
    “001” = CH2
    “010” = CH3
    “011” = CH4
    “100” = CH1, CH2
    “101” = CH3, CH4
    “110” = CH1, CH2, CH3, CH4
    “111” = no information
DRF  Direction flag
    “0” = reverse direction
    “1” = forward direction
SPD  Playback speed
GEN  Indicates the category of the audio source

Video

As shown in Table 11.3, IEC 61834 uses 4:1:1 YCbCr for 720 × 480i video (Figure 3.5) and 4:2:0 YCbCr for 720 × 576i video (Figure 3.11).

SMPTE 314M uses 4:1:1 YCbCr (Figure 3.5) for both video standards for the 25 Mbps implementation. 4:2:2 YCbCr (Figure 3.3) is used for the 50 and 100 Mbps implementations.

DCT Blocks

The Y, Cb, and Cr samples for one frame are divided into 8 × 8 blocks, called DCT blocks. Each DCT block, with the exception of the right-most DCT blocks for Cb and Cr during 4:1:1 mode, transforms 8 samples × 8 lines of video data.

Rows 1, 3, 5, and 7 of the DCT block process field 1, while rows 0, 2, 4, and 6 process field 2.

For 480i systems, there are either 10,800 (4:2:2) or 8100 (4:1:1) DCT blocks per video frame. For 576i systems, there are either 12,960 (4:2:2) or 9720 (4:1:1, 4:2:0) DCT blocks per video frame.

Macroblocks

As shown in Figure 11.4, each macroblock in the 4:2:2 mode consists of four DCT blocks. As shown in Figures 11.5 and 11.6, each macroblock in the 4:1:1 and 4:2:0 modes consists of six DCT blocks.

For 480i systems, the macroblock arrangement for one frame of 4:1:1 and 4:2:2 YCbCr data is shown in Figures 11.7 and 11.8, respectively. There are either 2700 (4:2:2) or 1350 (4:1:1) macroblocks per video frame.

For 576i systems, the macroblock arrangement for one frame of 4:2:0, 4:1:1, and 4:2:2 YCbCr data is shown in Figures 11.9, 11.10, and 11.11, respectively. There are either 3240 (4:2:2) or 1620 (4:1:1, 4:2:0) macroblocks per video frame.

Superblocks

Each superblock consists of 27 macroblocks.
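The DCT block and macroblock counts quoted above follow directly from the frame size and the chroma sampling structure. The Python sketch below reproduces them, and derives the number of superblocks from the 27-macroblocks-per-superblock figure. The function and table names are illustrative. Counting blocks as total samples divided by 64 is a simplifying assumption: it ignores the special handling of the right-most Cb and Cr blocks in 4:1:1 mode, although the totals still agree.

```python
# Horizontal and vertical chroma subsampling divisors relative to luma.
CHROMA_DIVISORS = {"4:2:2": (2, 1), "4:1:1": (4, 1), "4:2:0": (2, 2)}

# DCT blocks per macroblock, as stated in the text.
DCT_BLOCKS_PER_MACROBLOCK = {"4:2:2": 4, "4:1:1": 6, "4:2:0": 6}

MACROBLOCKS_PER_SUPERBLOCK = 27
SAMPLES_PER_DCT_BLOCK = 8 * 8


def block_counts(width: int, height: int, sampling: str) -> tuple[int, int, int]:
    """Return (DCT blocks, macroblocks, superblocks) for one video frame."""
    h_div, v_div = CHROMA_DIVISORS[sampling]
    luma_samples = width * height
    chroma_samples = 2 * (width // h_div) * (height // v_div)  # Cb plus Cr
    dct_blocks = (luma_samples + chroma_samples) // SAMPLES_PER_DCT_BLOCK
    macroblocks = dct_blocks // DCT_BLOCKS_PER_MACROBLOCK[sampling]
    superblocks = macroblocks // MACROBLOCKS_PER_SUPERBLOCK
    return dct_blocks, macroblocks, superblocks


for width, height, sampling in [
    (720, 480, "4:1:1"),
    (720, 480, "4:2:2"),
    (720, 576, "4:2:0"),
    (720, 576, "4:1:1"),
    (720, 576, "4:2:2"),
]:
    print(width, height, sampling, block_counts(width, height, sampling))
```

For 720 × 480 4:1:1, for example, this gives 8100 DCT blocks, 1350 macroblocks, and 50 superblocks per frame.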
For 480i systems, the superblock arrangement for one frame of 4:1:1 and 4:2:2 YCbCr data is shown in Figures 11.7 and 11.8, respectively. There are either 100 (4:2:2) or 50 (4:1:1) superblocks per video frame.

For 576i systems, the superblock arrangement for one frame of 4:2:0, 4:1:1, and 4:2:2 YCbCr data is shown in Figures 11.9, 11.10, and 11.11, respectively. There are either 120 (4:2:2) or 60 (4:1:1, 4:2:0) superblocks per video frame.

Compression

Like MPEG and H.263, DV uses DCT-based video compression. However, in this case, DCT blocks are composed of samples from two fields, with each field providing samples from four scan lines and eight horizontal samples.

Two DCT modes, called 8-8-DCT and 2-4-8-DCT, are available for the transform process, depending upon the degree of content variation between the two fields of a video frame. The 8-8-DCT is the conventional 8 × 8 DCT, and is used when there is a high degree of correlation (little motion) between the two fields. The 2-4-8-DCT uses two 4 × 8 DCTs (one for each field), and is used when there is a low degree of correlation (lots of motion) between the two fields. The DCT mode used is stored in the DC coefficient area using a single bit.

The DCT coefficients are quantized to 9 bits, then divided by a quantization number so as to limit the amount of data in one video segment to five compressed macroblocks.

Each DCT block is classified into one of four classes based on quantization noise and maximum absolute values of the AC coefficients. The 2-bit class number is stored in the DC coefficient area.

An area number is used for the selection of the quantization step. The area number, of which there are four, is based on the horizontal and vertical frequencies.

The quantization step is decided by the class number, area number, and quantization number (QNO). Quantization information is passed in the DIF header of video blocks.

Variable-length coding converts the quantized AC coefficients to variable-length codes.
Figures 11.12 and 11.13 illustrate the arrangement of compressed macroblocks.

Video Auxiliary Data (VAUX)

VAUX information is added to the shuffled video data as shown in Figure 11.3. The VAUX pack includes a 1-byte pack header and 4 bytes of data (payload), resulting in a 5-byte VAUX pack. Since there are 45 of them per video frame, they are numbered from 0 to 44. A VAUX source (VS) pack and a VAUX source control (VSC) pack must be included in the compressed stream. Only the VS and VSC packs are currently supported by SMPTE 314M, although IEC 61834 supports many other pack formats.

VAUX Source (VS) Pack

The format for this pack is shown in Table 11.4.

TVCH  The number of the television channel, from 0–999. A value of 0xEEE is reserved for prerecorded tape or a line input. A value of 0xFFF is reserved for “no information.”
B/W  Black and white flag
    “0” = black and white video
    “1” = color video

480i System
    active resolution (Y): 720 × 480i
    frame rate: 29.97 Hz
    YCbCr sampling structure: 4:1:1 (IEC 61834); 4:1:1, 4:2:2 (SMPTE 314M)
    form of YCbCr coding: uniformly quantized PCM, 8 bits per sample
    active line numbers: 23–262, 285–524
576i System
    active resolution (Y): 720 × 576i
    frame rate: 25 Hz
    YCbCr sampling structure: 4:2:0 (IEC 61834); 4:1:1, 4:2:2 (SMPTE 314M)
    form of YCbCr coding: uniformly quantized PCM, 8 bits per sample
    active line numbers: 23–310, 335–622
720p System
    active resolution (Y): 1280 × 720p
    frame rate: 50 Hz, 59.94 Hz
    YCbCr sampling structure: 4:2:2 (SMPTE 370M)
    form of YCbCr coding: uniformly quantized PCM, 10 bits per sample
    active line numbers: 26–745
1080i System
    active resolution (Y): 1920 × 1080i
    frame rate: 25 Hz, 29.97 Hz
    YCbCr sampling structure: 4:2:2 (SMPTE 370M)
    form of YCbCr coding: uniformly quantized PCM, 10 bits per sample
    active line numbers: 21–560, 584–1123

Table 11.3. IEC 61834, SMPTE 314M, and SMPTE 370M YCbCr Parameters.

Figure 11.4. 4:2:2 Macroblock Arrangement. (Y: DCT 0 and DCT 1, left to right; Cr: DCT 2; Cb: DCT 3.)

Figure 11.5. 4:1:1 Macroblock Arrangement. (Except for the right-most macroblock, Y: DCT 0 to DCT 3, left to right; Cr: DCT 4; Cb: DCT 5. For the right-most macroblock, the Y blocks are arranged two by two: DCT 0 and DCT 1 on top, DCT 2 and DCT 3 on the bottom.)

Figure 11.6. 4:2:0 Macroblock Arrangement. (Y blocks arranged two by two: DCT 0 and DCT 1 on top, DCT 2 and DCT 3 on the bottom; Cr: DCT 4; Cb: DCT 5.)
Figure 11.7. Relationship Between Superblocks and Macroblocks (4:1:1 YCbCr, 720 × 480i). (720 samples × 480 lines; superblocks S0,0 through S9,4, arranged as 10 rows × 5 columns; macroblocks 0–26 within each superblock.)

Figure 11.8. Relationship Between Superblocks and Macroblocks (4:2:2 YCbCr, 720 × 480i). (720 samples × 480 lines; superblocks S0,0 through S19,4, arranged as 20 rows × 5 columns; macroblocks 0–26 within each superblock.)

Figure 11.9. Relationship Between Superblocks and Macroblocks (4:2:0 YCbCr, 720 × 576i). (720 samples × 576 lines; superblocks S0,0 through S11,4, arranged as 12 rows × 5 columns; macroblocks 0–26 within each superblock.)

Figure 11.10. Relationship Between Superblocks and Macroblocks (4:1:1 YCbCr, 720 × 576i). (720 samples × 576 lines; superblocks S0,0 through S11,4, arranged as 12 rows × 5 columns; macroblocks 0–26 within each superblock.)

Figure 11.11. Relationship Between Superblocks and Macroblocks (4:2:2 YCbCr, 720 × 576i). (720 samples × 576 lines; superblocks S0,0 through S23,4, arranged as 24 rows × 5 columns; macroblocks 0–26 within each superblock.)

Figure 11.12. 4:2:2 Compressed Macroblock Arrangement. (Labels: Y0 (14 bytes), Y1 (14 bytes), two 12-byte areas, CR (10 bytes), CB (10 bytes); DC0 through DC3, each followed by AC data; X0 and X1 data areas of 2 bytes each; the DC area carries the DCT mode (1 bit) and class number (2 bits).)

Figure 11.13. 4:2:0 and 4:1:1 Compressed Macroblock Arrangement. (Labels: Y0, Y1, Y2, and Y3 (14 bytes each), CR (10 bytes), CB (10 bytes); DC0 through DC5, each followed by AC data; the DC area carries the DCT mode (1 bit) and class number (2 bits).)

IEC 61834 (fields listed from D7 to D0)
PC0: 0 1 1 0 0 0 0 0
PC1: TVCH (tens of units, 0–9), TVCH (units, 0–9)
PC2: B/W, EN, CLF, TVCH (hundreds of units, 0–9)
PC3: SRC, 50/60, STYPE
PC4: TUN

SMPTE 314M (fields listed from D7 to D0)
PC0: 0 1 1 0 0 0 0 0
PC1: 1 1 1 1 1 1 1 1
PC2: B/W, EN, CLF, 1 1 1 1
PC3: 1 1, 50/60, STYPE
PC4: VISC

SMPTE 370M (fields listed from D7 to D0)
PC0: 0 1 1 0 0 0 0 0
PC1: 1 1 1 1 1 1 1 1
PC2: 1 1 1 1 1 1 1 1
PC3: 1 1, 50/60, STYPE
PC4: 0 1 1 1 1 1 1 1

Table 11.4. VAUX Source (VS) Pack.
EN  CLF valid flag
    “0” = CLF is valid
    “1” = CLF is invalid
CLF  Color frames identification code
    For 480i systems:
    “00” = color frame A
    “01” = color frame B
    “10” = reserved
    “11” = reserved
    For 576i systems:
    “00” = 1st, 2nd field
    “01” = 3rd, 4th field
    “10” = 5th, 6th field
    “11” = 7th, 8th field
SRC  Defines the input source of the video signal
50/60  Same as for AAUX
STYPE  For SMPTE 314M, specifies the video signal type
    “00000” = 4:1:1 compression
    “00001” to “00011” = reserved
    “00100” = 4:2:2 compression
    “00101” to “11111” = reserved
  For SMPTE 370M, specifies the video signal type
    “00000” to “10011” = reserved
    “10100” = 1080i30 or 1080i25
    “10101” = 1035i30
    “10110” = reserved
    “10111” = reserved
    “11000” = 720p60 or 720p50
    “11001” to “11111” = reserved
  For IEC 61834, specifies the video system
    “00000” = standard-definition
    “00001” = reserved
    “00010” = high-definition
    “00011” to “11111” = reserved
TUN  Tuner Category consists of a 3-bit area number and a 5-bit satellite number. “11111111” indicates no information is available.
VISC
    “10001000” = –180
    :
    “00000000” = 0
    :
    “01111000” = 180
    “01111111” = no information
    other values = reserved

VAUX Source Control (VSC) Pack

The format for this pack is shown in Table 11.5.

IEC 61834 (fields listed from D7 to D0)
PC0: 0 1 1 0 0 0 0 1
PC1: CGMS, ISR, CMP, SS
PC2: REC S, 1, REC M, 1, DISP
PC3: FF, FS, FC, IL, SF, SC, BCS
PC4: 1, GEN

SMPTE 314M/370M (fields listed from D7 to D0)
PC0: 0 1 1 0 0 0 0 1
PC1: CGMS, 1 1 1 1 1 1
PC2: 1 1 0 0 1, DISP
PC3: FF, FS, FC, IL, 1 1 0 0
PC4: 1 1 1 1 1 1 1 1

Table 11.5. VAUX Source Control (VSC) Pack.

CGMS  Same as for AAUX
ISR  Same as for AAUX
CMP  Same as for AAUX
SS  Same as for AAUX
REC S  Same as for AAUX
REC M  Recording mode
    “00” = original
    “01” = reserved
    “10” = insert
    “11” = invalid recording
BCS  Broadcast system. Indicates the type of information of display format with DISP
    “00” = type 0 (IEC 61880, CEA-608)
    “01” = type 1 (ETS 300 294)
    “10” = reserved
    “11” = reserved
DISP  Aspect ratio information
FF  Frame/Field flag. Indicates whether both fields/frames are output in order or only one of them is output twice during one/two frame period.
    “0” = one field/frame output twice
    “1” = both fields/frames output in order
FS  First/Second flag. Indicates which field/frame should be output during field/frame 1 period.
    “0” = field/frame 2
    “1” = field/frame 1
FC  Frame change flag. Indicates if the picture of the current frame is the same picture of the immediate previous one/two frames.
    “0” = same picture
    “1” = different picture
IL  Interlace flag. Indicates if the data of two fields which construct one frame are interlaced or noninterlaced.
    “0” = noninterlaced
    “1” = interlaced or unrecognized
SF  Still-field picture flag. Indicates the time difference between the two fields within a frame.
    “0” = 0 seconds
    “1” = 1001/60 or 1/50 second
SC  Still camera picture flag
    “0” = still camera picture
    “1” = not still camera picture
GEN  Indicates the category of the video source

Digital Interfaces

IEC 61834 and SMPTE 314M both specify the data format for a generic digital interface. This data format may be sent via IEEE 1394 or SDTI, for example. Figure 11.14 illustrates the frame data structure.

Each 720 × 480i 4:1:1 YCbCr frame is compressed to 103,950 bytes. Including overhead and audio increases the amount of data to 120,000 bytes.

The compressed 720 × 480i frame is divided into ten DIF (data in frame) sequences. Each DIF sequence contains 150 DIF blocks of 80 bytes each, used as follows:
135 DIF blocks for video
9 DIF blocks for audio
6 DIF blocks used for Header, Subcode, and Video Auxiliary (VAUX) information

Figure 11.15 illustrates the DIF sequence structure in detail.
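The frame sizes quoted above can be reproduced from the DIF structure. The Python sketch below multiplies out the counts for a given number of DIF sequences and converts the result to an approximate bit-rate. The constant and function names are illustrative. Taking 77 bytes of compressed video per video DIF block (the 80-byte block less a 3-byte ID) is an assumption chosen because it matches the 77-byte video data field of Figure 11.3 and reproduces the 103,950-byte figure.

```python
DIF_BLOCK_BYTES = 80
DIF_BLOCKS_PER_SEQUENCE = 150
VIDEO_BLOCKS_PER_SEQUENCE = 135
VIDEO_BYTES_PER_BLOCK = 77  # assumption: 80-byte DIF block minus a 3-byte ID


def frame_sizes(dif_sequences: int) -> tuple[int, int]:
    """Return (total bytes per frame, compressed video bytes per frame)."""
    total = dif_sequences * DIF_BLOCKS_PER_SEQUENCE * DIF_BLOCK_BYTES
    video = dif_sequences * VIDEO_BLOCKS_PER_SEQUENCE * VIDEO_BYTES_PER_BLOCK
    return total, video


def rate_mbps(bytes_per_frame: int, frames_per_second: float) -> float:
    """Convert a per-frame byte count to Mbps."""
    return bytes_per_frame * 8 * frames_per_second / 1_000_000


total_480i, video_480i = frame_sizes(10)  # ten DIF sequences per 480i frame
print(total_480i, video_480i)             # 120000 103950

ntsc_frame_rate = 30_000 / 1001           # about 29.97 frames per second
print(f"{rate_mbps(video_480i, ntsc_frame_rate):.2f} Mbps video")
print(f"{rate_mbps(total_480i, ntsc_frame_rate):.2f} Mbps including overhead and audio")
```

The video-only rate comes out just under 25 Mbps, which is where the nominal "25 Mbps" name comes from, while the full DIF stream runs at a little under 29 Mbps. Calling `frame_sizes(12)` gives the corresponding figures for a twelve-sequence 576i frame.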
Each video DIF block contains 80 bytes of compressed macroblock data:
3 bytes for DIF block ID information
1 byte for the header that includes the quantization number (QNO) and block status (STA)
14 bytes each for Y0, Y1, Y2, and Y3
10 bytes each for Cb and Cr

720 × 576i frames may use either the 4:2:0 YCbCr format (IEC 61834) or the 4:1:1 YCbCr format (SMPTE 314M), and require 12 DIF sequences. Each 720 × 576i frame is compressed to 124,740 bytes. Including overhead and audio increases the amount of data to 144,000 bytes, requiring 300 packets to transfer.

Note that the organization of data transferred over the interface differs from the actual DV recording format since error correction is not required for digital transmission. In addition, although the video blocks are numbered in sequence in Figure 11.15, the sequence does not correspond to the left-to-right, top-to-bottom transmission of blocks of video data. Compressed macroblocks are shuffled to minimize the effect of errors and aid in error concealment. Audio data is also shuffled. Data is transmitted in the same shuffled order as recorded.

To illustrate the video data shuffling, DV video frames are organized as superblocks, with each superblock being composed of 27 compressed macroblocks, as shown in Figures 11.7 through 11.11. A group of 5 superblocks (one from each superblock column) makes up one DIF sequence. Tables 11.6 and 11.7 illustrate the transmission order of the DIF blocks.

Structure shown in Figure 11.14:
1 frame in 1.001/30 second: 10 DIF sequences (DIFS0 through DIFS9)
1 DIF sequence in 1.001/300 second: 150 DIF blocks (DIF0 through DIF149), consisting of header (1 DIF block), subcode (2 DIF blocks), VAUX (3 DIF blocks), and 135 video and 9 audio DIF blocks
1 DIF block in 1.001/45000 second: ID (3 bytes), header (1 byte), data (76 bytes)
Compressed macroblock (data area): Y0, Y1, Y2, and Y3 (14 bytes each), CR (10 bytes), CB (10 bytes); DC0 through DC5, each followed by AC data

Figure 11.14. Packet Formatting for 25 Mbps 4:1:1 YCbCr 720 × 480i Systems.

For the 50 Mbps SMPTE 314M format, each compressed 720 × 480i or 720 × 576i frame is divided into two channels. Each channel uses either ten (480i systems) or twelve DIF sequences (576i systems).

IEEE 1394

Using the IEEE 1394 interface for transferring DV information is discussed in Chapter 6.

SDTI

The general concept of SDTI is discussed in Chapter 6.

SMPTE 314M Data

SMPTE 321M details how to transfer SMPTE 314M DV data over SDTI.

IEC 61834 Data

SMPTE 322M details how to transfer IEC 61834 DV data over SDTI.

DIF block order shown in Figure 11.15 (DIF block number in parentheses):
H (0), SC0 (1), SC1 (2), VA0 (3), VA1 (4), VA2 (5)
A0 (6), V0 to V14 (7–21)
A1 (22), V15 to V29 (23–37)
:
A8 (134), V120 to V134 (135–149)
H = header section; SC0, SC1 = subcode section; VA0, VA1, VA2 = VAUX section; A0–A8 = audio section; V0–V134 = video section

Figure 11.15. DIF Sequence Detail.

100 Mbps DV Differences

The 100 Mbps SMPTE 370M format supports 1920 × 1080i and 1280 × 720p sources. 1920 × 1080i sources are scaled to 1280 × 1080i. 1280 × 720p sources are scaled to 960 × 720p. 4:2:2 YCbCr sampling is used.

Each compressed frame is divided into four channels. Each channel uses either ten (1080i30 or 720p60 systems) or twelve DIF sequences (1080i25 or 720p50 systems).

HDV Format

Developed by Canon, Sharp, Sony and JVC, HDV supports recording 1920 × 1080 and 1280 × 720 content, using a standard DV tape. Based on 25 Mbps MPEG-2 packetized elementary streams and 19 Mbps MPEG-2 transport streams, video compression uses MPEG-2 and audio compression uses MPEG-1 Layer II.
AVCHD Format

Developed by Panasonic and Sony, AVCHD supports recording 1920 × 1080, 1280 × 720, 720 × 576 and 720 × 480 content, using 8 cm DVD-RW, 8 cm BD-R/RE, SD Memory Card or HDD instead of tape. Based on 24 Mbps MPEG-2 transport streams, video compression uses MPEG-4 Part 10 (H.264) and audio compression uses Dolby® Digital or LPCM.

DIF Sequence   Video DIF Block   Superblock   Macroblock
Number         Number            Number       Number
0              0                 2, 2         0
0              1                 6, 1         0
0              2                 8, 3         0
0              3                 0, 0         0
0              4                 4, 4         0
0              :                 :            :
0              133               0, 0         26
0              134               4, 4         26
1              0                 3, 2         0
1              1                 7, 1         0
1              2                 9, 3         0
1              3                 1, 0         0
1              4                 5, 4         0
1              :                 :            :
1              133               1, 0         26
1              134               5, 4         26
:              :                 :            :
n–1            0                 1, 2         0
n–1            1                 5, 1         0
n–1            2                 7, 3         0
n–1            3                 n–1, 0       0
n–1            4                 3, 4         0
n–1            :                 :            :
n–1            133               n–1, 0       26
n–1            134               3, 4         26

Note:
1. n = 10 for 480i systems, n = 12 for 576i systems.

Table 11.6. Video DIF Blocks and Compressed Macroblocks for 25 Mbps (4:1:1 or 4:2:0 YCbCr).

DIF Sequence   Video DIF Block   Superblock   Macroblock
Number         Number            Number       Number
0              0, 0              4, 2         0
0              0, 1              5, 2         0
0              1, 0              12, 1        0
0              1, 1              13, 1        0
0              2, 0              16, 3        0
0              :                 :            :
0              134, 0            8, 4         26
0              134, 1            9, 4         26
1              0, 0              6, 2         0
1              0, 1              7, 2         0
1              1, 0              14, 1        0
1              1, 1              15, 1        0
1              2, 0              18, 3        0
1              :                 :            :
1              134, 0            10, 4        26
1              134, 1            11, 4        26
:              :                 :            :
n–1            0, 0              2, 2         0
n–1            0, 1              3, 2         0
n–1            1, 0              10, 1        0
n–1            1, 1              11, 1        0
n–1            2, 0              14, 3        0
n–1            :                 :            :
n–1            134, 0            6, 4         26
n–1            134, 1            7, 4         26

Note:
1. n = 10 for 480i systems, n = 12 for 576i systems.

Table 11.7. Video DIF Blocks and Compressed Macroblocks for 50 Mbps (4:2:2 YCbCr).

References

1. IEC 61834–1, Recording: Helical-scan digital video cassette recording system using 6.35mm magnetic tape for consumer use (525-60, 625-50, 1125-60 and 1250-50 systems)–Part 1: General specifications.
2. IEC 61834–2, Recording: Helical-scan digital video cassette recording system using 6.35mm magnetic tape for consumer use (525-60, 625-50, 1125-60 and 1250-50 systems)–Part 2: SD format for 525-60 and 625-50 systems.
3. IEC 61834–4, Recording: Helical-scan digital video cassette recording system using 6.35mm magnetic tape for consumer use (525-60, 625-50, 1125-60 and 1250-50 systems)–Part 4: Pack header table and contents.
4. SMPTE 314M–2005, Television: Data Structure for DV-Based Audio, Data and Compressed Video–25 and 50 Mbps.
5. SMPTE 321M–2002, Television: Data Stream Format for the Exchange of DV-Based Audio, Data and Compressed Video Over a Serial Data Transport Interface.
6. SMPTE 322M–2004, Television: Format for Transmission of DV Compressed Video, Audio and Data Over a Serial Data Transport Interface.
7. SMPTE 370M–2006, Television: Data Structure for DV-Based Audio, Data and Compressed Video at 100 Mb/s 1080/60i, 1080/50i, 720/60p, 720/50p.

Chapter 12
MPEG-1

MPEG-1 audio and video compression was developed for storing and distributing digital audio and video. Features include random access, fast forward, and reverse playback. MPEG-1 is used as the basis for the original video CDs (VCD).

The channel bandwidth and image resolution were set by the available media at the time (CDs). The goal was playback of digital audio and video using a standard compact disc with a bit-rate of 1.416 Mbps (1.15 Mbps of this is for video).

MPEG-1 is an ISO standard (ISO/IEC 11172), and consists of six parts:
system: ISO/IEC 11172–1
video: ISO/IEC 11172–2
audio: ISO/IEC 11172–3
low bit-rate audio: ISO/IEC 13818–3
conformance testing: ISO/IEC 11172–4
simulation software: ISO/IEC 11172–5

The bitstreams implicitly define the decompression algorithms. The compression algorithms are up to the individual manufacturers, allowing a proprietary advantage to be obtained within the scope of an international standard.

MPEG vs. JPEG

JPEG (ISO/IEC 10918) was designed for still continuous-tone grayscale and color images.
It doesn’t handle bi-level (black and white) images efficiently, and pseudo-color images have to be expanded into the unmapped color representation prior to processing. JPEG images may be of any resolution and color space, with both lossy and lossless algorithms available.

Since JPEG is such a general purpose standard, it has many features and capabilities. By adjusting the various parameters, compressed image size can be traded against reconstructed image quality over a wide range. Image quality ranges from “browsing” (100:1 compression ratio) to “indistinguishable from the source” (about 3:1 compression ratio). Typically, the threshold of visible difference between the source and reconstructed images is somewhere between a 10:1 and a 20:1 compression ratio.

JPEG does not use a single algorithm, but rather a family of four, each designed for a certain application. The most familiar lossy algorithm is sequential DCT. Either Huffman encoding (baseline JPEG) or arithmetic encoding may be used. When the image is decoded, it is decoded left-to-right, top-to-bottom.

Progressive DCT is another lossy algorithm, requiring multiple scans of the image. When the image is decoded, a coarse approximation of the full image is available right away, with the quality progressively improving until complete. This makes it ideal for applications such as image database browsing. Either spectral selection, successive approximation, or both may be used. The spectral selection option encodes the lower-frequency DCT coefficients first (to obtain an image quickly), followed by the higher-frequency ones (to add more detail). The successive approximation option encodes the more significant bits of the DCT coefficients first, followed by the less significant bits.

The hierarchical mode represents an image at multiple resolutions. For example, there could be 512 × 512, 1024 × 1024, and 2048 × 2048 versions of the image.
Higher-resolution images are coded as differences from the next smaller image, requiring fewer bits than they would if stored independently. Of course, the total number of bits is greater than that needed to store just the highest-resolution image. Note that the individual images in a hierarchical sequence may be coded progressively if desired.

Also supported is a lossless spatial algorithm that operates in the pixel domain as opposed to the transform domain. A prediction is made of a sample value using up to three neighboring samples. This prediction then is subtracted from the actual value and the difference is losslessly coded using either Huffman or arithmetic coding. Lossless operation achieves about a 2:1 compression ratio.

Since video is just a series of still images, and baseline JPEG encoders and decoders were readily available, people used baseline JPEG to compress real-time video (also called motion JPEG or MJPEG). However, this technique does not take advantage of the frame-to-frame redundancies to improve compression, as does MPEG.

Perhaps most important, JPEG is symmetrical, meaning the cost of encoding and decoding is roughly the same. MPEG, on the other hand, was designed primarily for mastering a video once and playing it back many times on many platforms. To minimize the cost of MPEG hardware decoders, MPEG was designed to be asymmetrical, with the encoding process requiring about 100× the computing power of the decoding process.

Since MPEG is targeted for specific applications, the hardware usually supports only a few specific resolutions. Also, only one color space (YCbCr) is supported using 8-bit samples. MPEG is also optimized for a limited range of compression ratios.

If capturing video for editing, you can use either baseline JPEG or I-frame-only (intraframe) MPEG to compress to disc in real-time.
Using JPEG requires that the system be able to transfer data and access the hard disk at bit-rates of about 4 Mbps for SIF (Standard Input Format) resolution. Once the editing is done, the result can be converted into MPEG for maximum compression.

Quality Issues

At bit-rates of about 3–4 Mbps, “broadcast quality” is achievable with MPEG-1. However, sequences with complex spatial-temporal activity (such as sports) may require up to 5–6 Mbps due to the frame-based processing of MPEG-1. MPEG-2 allows similar “broadcast quality” at bit-rates of about 4–6 Mbps by supporting field-based processing.

Several factors affect the quality of MPEG-compressed video:
• the resolution of the original video source
• the bit-rate (channel bandwidth) allowed after compression
• motion estimator effectiveness

One limit on the quality of the compressed video is the resolution of the original video source. If the original resolution was too low, there will be a general lack of detail.

Motion estimator effectiveness determines motion artifacts, such as a reduction in video quality when movement starts or when the amount of movement is above a certain threshold. Poor motion estimation will contribute to a general degradation of video quality.

Most importantly, the higher the bit-rate (channel bandwidth), the more information that can be transmitted, allowing fewer motion artifacts to be present or a higher-resolution image to be displayed. Generally speaking, decreasing the bit-rate does not result in a graceful degradation of the decoded video quality. The video quality rapidly degrades, with the 8 × 8 blocks becoming clearly visible once the bit-rate drops below a given threshold.

Audio Overview

MPEG-1 uses a family of three audio coding schemes, called Layer I, Layer II, and Layer III, with increasing complexity and sound quality.
The three layers are hierarchical: a Layer III decoder handles Layers I, II, and III; a Layer II decoder handles only Layers I and II; a Layer I decoder handles only Layer I. All layers support 16-bit audio using 16, 22.05, 24, 32, 44.1, or 48 kHz sampling rates.

For each layer, the bitstream format and the decoder are specified. The encoder is not specified, to allow for future improvements. All layers work with similar bit-rates:

Layer I: 32–448 kbps
Layer II: 8–384 kbps
Layer III: 8–320 kbps

Two audio channels are supported with four modes of operation:

normal stereo
joint (intensity and/or ms) stereo
dual channel mono
single channel mono

For normal stereo, one channel carries the left audio signal and one channel carries the right audio signal. For intensity stereo (supported by all layers), high frequencies (above 2 kHz) are combined. The stereo image is preserved but only the temporal envelope is transmitted. For ms stereo (supported by Layer III only), one channel carries the sum signal (L+R) and the other the difference (L–R) signal.

In addition, pre-emphasis, copyright marks, and original/copy indication are supported.

Sound Quality

To determine which layer should be used for a specific application, look at the available bit-rate, as each layer was designed to support certain bit-rates with a minimum degradation of sound quality.

Layer I, a simplified version of Layer II, has a target bit-rate of 192 kbps per channel or higher.

Layer II is identical to MUSICAM, and has a target bit-rate of 128 kbps per channel. It was designed as a trade-off between sound quality and encoder complexity. It is most useful for bit-rates around 96–128 kbps per channel.

Layer III (also known as mp3) merges the best ideas of MUSICAM and ASPEC and has a target bit-rate of about 64 kbps per channel.
The Layer III format specifies a set of advanced features that all address a single goal: to preserve as much sound quality as possible, even at relatively low bit-rates.

Background Theory

All layers use a coding scheme based on psychoacoustic principles, in particular masking effects where, for example, a loud tone at one frequency prevents another, quieter, tone at a nearby frequency from being heard.

Suppose you have a strong tone with a frequency of 1000 Hz, and a second tone at 1100 Hz that is 18 dB lower in intensity. The 1100 Hz tone will not be heard; it is masked by the 1000 Hz tone. However, a tone at 2000 Hz 18 dB below the 1000 Hz tone will be heard. In order to have the 1000 Hz tone mask it, the 2000 Hz tone will have to be about 45 dB down. Any relatively weak frequency near a strong frequency is masked; the further you get from a frequency, the smaller the masking effect.

Curves have been developed that plot the relative energy versus frequency that is masked (concurrent masking). Masking effects also occur before (premasking) and after (postmasking) a strong frequency if there is a significant (30–40 dB) shift in level. The reason is believed to be that the brain needs processing time. Premasking time is about 2–5 ms; postmasking can last up to 100 ms.

Adjusting the noise floor reduces the amount of needed data, enabling further compression. CDs use 16 bits of resolution to achieve a signal-to-noise ratio (SNR) of about 96 dB, which just happens to match the dynamic range of hearing pretty well (meaning most people will not hear noise during silence). If 8-bit resolution were used, there would be a noticeable noise during silent moments in the music or between words. However, noise isn’t noticed during loud passages due to the masking effect, which means that around a strong sound you can raise the noise floor since the noise will be masked anyway.

For a stereo signal, there usually is redundancy between channels.
All layers may exploit these stereo effects by using a joint stereo mode, with the most flexible approach being used by Layer III.

Video Coding Layer

MPEG-1 permits resolutions up to 4095 × 4095 at 60 frames per second (progressive scan). What many people think of as MPEG-1 is a subset known as Constrained Parameters Bitstream (CPB). The CPB is a limited set of sampling and bit-rate parameters designed to standardize buffer sizes and memory bandwidths, allowing a nominal guarantee of interoperability for decoders and encoders, while still addressing the widest possible range of applications. Devices not capable of handling these are not considered to be true MPEG-1. Table 12.1 lists some of the constrained parameters.

The CPB limits video to 396 macroblocks (101,376 pixels). Therefore, MPEG-1 video is typically coded at SIF resolutions of 352 × 240p or 352 × 288p. During encoding, the original BT.601 resolution of 704 × 480i or 704 × 576i is scaled down to SIF resolution. This is usually done by ignoring Field 2 and scaling down Field 1 horizontally. During decoding, the SIF resolution is scaled up to the 704 × 480i or 704 × 576i resolution. Note that some entire active scan lines and samples on a scan line are ignored to ensure the number of Y samples can be evenly divided by 16. Table 12.2 lists some of the more common MPEG-1 resolutions.

The coded video rate is limited to 1.856 Mbps. However, the bit-rate is the most often waived parameter, with some applications using up to 6 Mbps or higher.

MPEG-1 video data uses the 4:2:0 YCbCr format shown in Figure 3.7.

Interlaced Video

MPEG-1 was designed to handle progressive (also referred to as noninterlaced) video. Early on, in an effort to improve video quality, several schemes were devised to enable the use of both fields of an interlaced picture.

For example, both fields can be combined into a single frame of 704 × 480p or 704 × 576p resolution and encoded.
During decoding, the fields are separated. This, however, results in motion artifacts due to a moving object being in slightly different places in the two fields. Coding the two fields separately avoids motion artifacts, but reduces the compression ratio since the redundancy between fields isn’t used.

There were many other schemes for handling interlaced video, so MPEG-2 defined a standard way of handling it (covered in Chapter 13).

Encode Preprocessing

Better images can be obtained by preprocessing the video stream prior to MPEG encoding.

To avoid serious artifacts during encoding of a particular picture, prefiltering can be applied over the entire picture or just in specific problem areas. Prefiltering before compression processing is analogous to anti-alias filtering prior to A/D conversion. Prefiltering may take into account texture patterns, motion, and edges, and may be applied at the picture, slice, macroblock, or block level.

MPEG encoding works best on scenes with little fast or random movement and good lighting. For best results, foreground lighting should be clear and background lighting diffused. Foreground contrast and detail should be normal, but low-contrast backgrounds containing soft edges are preferred. Editing tools typically allow you to preprocess potential problem areas.

The MPEG-1 specification has example filters for scaling down from BT.601 to SIF resolution. In this instance, field 2 is ignored, throwing away half the vertical resolution, and a decimation filter is used to reduce the horizontal resolution of the remaining scan lines by a factor of two. Appropriate decimation of the Cb and Cr components must still be carried out.

Better video quality may be obtained by deinterlacing prior to scaling down to SIF resolution. When working on macroblocks (defined later), if the difference between the corresponding macroblocks of the two fields is small, average both to generate a new macroblock.
Otherwise, use the macroblock area from the field of the same parity to avoid motion artifacts.

Parameter               Limit
horizontal resolution   ≤ 768 samples
vertical resolution     ≤ 576 scan lines
picture area            ≤ 396 macroblocks
pel rate                ≤ 396 × 25 macroblocks per second
picture rate            ≤ 30 frames per second
bit-rate                ≤ 1.856 Mbps

Table 12.1. Some of the Constrained Parameters for MPEG-1.

Resolution      Frames per Second
352 × 240p      29.97
352 × 240p      23.976
352 × 288p      25
320 × 240p¹     29.97
384 × 288p¹     25

Notes: 1. Square pixel format.

Table 12.2. Common MPEG-1 Resolutions.

Coded Frame Types

There are four types of coded frames.

I (intra) frames (~1 bit/pixel) are frames coded as a stand-alone still image. They allow random access points within the video stream. As such, I frames should occur about two times a second. I frames should also be used where scene cuts occur.

P (predicted) frames (~0.1 bit/pixel) are coded relative to the nearest previous I or P frame, resulting in forward prediction processing, as shown in Figure 12.1. P frames provide more compression than I frames, through the use of motion compensation, and are also a reference for B frames and future P frames.

B (bi-directional) frames (~0.015 bit/pixel) use the closest past and future I or P frame as a reference, resulting in bi-directional prediction, as shown in Figure 12.1. B frames provide the most compression and decrease noise by averaging two frames. Typically, there are two B frames separating I or P frames.

D (DC) frames are frames coded as a stand-alone still image, using only the DC component of the DCTs. D frames may not be in a sequence containing any other frame types and are rarely used.

A group of pictures (GOP) is a series of one or more coded frames intended to assist in random accessing and editing. The GOP value is configurable during the encoding process. The smaller the GOP value, the better the response to movement (since the I frames are closer together), but the lower the compression.
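Because a B frame uses both a past and a future I or P frame as references, the future reference has to reach the decoder before the B frames that depend on it. The following is a minimal sketch of that reordering, assuming the frame types are given as a display-order string such as "IBBPBBP". The function name `coded_order` is hypothetical and is not part of any MPEG API.

```python
def coded_order(display_types):
    """Return 1-based display indices in the order the frames would be coded.

    display_types is a string of frame types in display order, e.g. "IBBPBBP".
    Each I or P frame is emitted before the B frames that precede it in
    display order, because those B frames use it as their future reference.
    """
    order = []
    pending_b = []  # B frames waiting for their future reference
    for index, frame_type in enumerate(display_types, start=1):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # I or P frame: send the reference first, then the held B frames.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames would need a reference from the next GOP (open GOP);
    # this sketch simply appends them.
    order.extend(pending_b)
    return order


def transmit_positions(display_types):
    """For each frame in display order, give its 1-based position when sent."""
    order = coded_order(display_types)
    return [order.index(i) + 1 for i in range(1, len(display_types) + 1)]


if __name__ == "__main__":
    print(coded_order("IBBPBBP"))
    print(transmit_positions("IBBPBBP"))
```

For the pattern I B B P B B P, the frames are sent as display frames 1, 4, 2, 3, 7, 5, 6. Read the other way round, display frames 1 to 7 occupy transmit positions 1, 3, 4, 2, 6, 7, 5, which is the sequence given for Figure 12.1.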
[Figure 12.1: intra (I), predicted (P), and bi-directional (B) frames, with arrows for forward prediction and bi-directional prediction. Frame display order: 1 2 3 4 5 6 7. Frame transmit order: 1 3 4 2 6 7 5.]

Figure 12.1. MPEG-1 I, P, and B Frames. Some frames are transmitted out of display sequence, complicating the interpolation process, and requiring frame reordering by the MPEG decoder. Arrows show inter-frame dependencies.

In the coded bitstream, a GOP must start with an I frame and may be followed by any number of I, P, or B frames in any order. In display order, a GOP must start with an I or B frame and end with an I or P frame. Thus, the smallest GOP size is a single I frame, with the largest size unlimited.

Originally, each GOP was to be coded and displayed independently of any other GOP. However, this is not possible unless no B frames precede I frames, or if they do, they use only backward motion compensation. This results in both open and closed GOP formats. A closed GOP is a GOP that can be decoded without using frames of the previous GOP for motion compensation. An open GOP requires that they be available.

Motion Compensation

Motion compensation improves compression of P and B frames by removing temporal redundancies between frames. It works at the macroblock level (defined later).

The technique relies on the fact that within a short sequence of the same general image, most objects remain in the same location, while others move only a short distance. The motion is described as a two-dimensional motion vector that specifies where to retrieve a macroblock from a previously decoded frame to predict the sample values of the current macroblock.

After a macroblock has been compressed using motion compensation, it contains both the spatial difference (motion vectors) and content difference (error terms) between the reference macroblock and macroblock being coded.
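The relationship between the motion vector, the prediction, and the error terms can be sketched as follows. This is a simplified illustration only: it assumes integer-sample motion vectors, frames stored as lists of rows, and no bounds checking, and the function names (`predict_macroblock`, `error_terms`, `reconstruct`, `crop`) are hypothetical. A real encoder would go on to transform and quantize the error terms.

```python
def crop(frame, top, left, size=16):
    """Cut a size x size macroblock out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def predict_macroblock(reference, top, left, mv_x, mv_y, size=16):
    """Fetch the block in the reference frame that the motion vector points to."""
    return crop(reference, top + mv_y, left + mv_x, size)


def error_terms(current, prediction):
    """Content difference: current samples minus the prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


def reconstruct(prediction, errors):
    """Decoder side: prediction plus error terms restores the macroblock."""
    return [[p + e for p, e in zip(pred_row, err_row)]
            for pred_row, err_row in zip(prediction, errors)]


if __name__ == "__main__":
    # Synthetic 32 x 32 reference frame, and a current frame whose content
    # has moved two samples to the right.
    reference = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
    current = [[reference[y][x - 2] if x >= 2 else 0 for x in range(32)]
               for y in range(32)]

    block = crop(current, 8, 8)
    prediction = predict_macroblock(reference, 8, 8, mv_x=-2, mv_y=0)
    errors = error_terms(block, prediction)
    print(max(abs(e) for row in errors for e in row))
    print(reconstruct(prediction, errors) == block)
```

In this synthetic case the motion vector (-2, 0) describes the movement exactly, so every error term is zero and only the vector would need to be sent. When the match is imperfect, the error terms carry the remaining difference.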
Note that there are cases where information in a scene cannot be predicted from the previous scene, such as when a door opens. The previous scene doesn’t contain the details of the area behind the door. In cases such as this, when a macroblock in a P frame cannot be represented by motion compensation, it is coded the same way as a macroblock in an I frame (using intra-picture coding).

Macroblocks in B frames are coded using either the closest previous or future I or P frames as a reference, resulting in four possible codings:

• intra-coding: no motion compensation
• forward prediction: closest previous I or P frame is the reference
• backward prediction: closest future I or P frame is the reference
• bi-directional prediction: two frames are used as the reference, the closest previous I or P frame and the closest future I or P frame

Backward prediction is used to predict uncovered areas that do not appear in previous frames.

I Frames

Image blocks and prediction error blocks have a high spatial redundancy. Several steps are used to remove this redundancy within a frame to improve the compression. The inverse of these steps is used by the decoder to recover the data.

Macroblock

A macroblock (shown in Figure 7.55) consists of a 16-sample × 16-line set of Y components and the corresponding two 8-sample × 8-line Cb and Cr components.

A block is an 8-sample × 8-line set of Y, Cb, or Cr values. Note that a Y block covers one-fourth of the image area covered by the corresponding Cb or Cr block. Thus, a macroblock contains four Y blocks, one Cb block, and one Cr block, as seen in Figure 12.2.

There are two types of macroblocks in I frames, both using intra-coding, as shown in Table 12.9. One (called intra-d) uses the current quantizer scale; the other (called intra-q) defines a new value for the quantizer scale.

If the macroblock type is intra-q, the macroblock header specifies a 5-bit quantizer scale factor.
The decoder uses this to calculate the DCT coefficients from the transmitted quantized coefficients. Quantizer scale factors may range from 1–31, with zero not allowed.

If the macroblock type is intra-d, no quantizer scale is sent, and the decoder uses the current one.

DCT

Each 8 × 8 block (of input samples or prediction error terms) is processed by an 8 × 8 DCT (discrete cosine transform), resulting in an 8 × 8 block of horizontal and vertical frequency coefficients, as shown in Figure 7.56.

Input sample values are 0–255, resulting in a range of 0–2040 for the DC coefficient and a range of about –1000 to +1000 for the AC coefficients.

[Figure 12.2: each macroblock is 16 samples by 16 lines (4 Y blocks); each Y block is 8 samples by 8 lines. Block arrangement within a macroblock: Y is blocks 0, 1, 2, and 3; Cb is block 4; Cr is block 5.]

Figure 12.2. MPEG-1 Macroblocks and Blocks.

Quantizing

The 8 × 8 block of frequency coefficients is uniformly quantized, limiting the number of allowed values. The quantizer step size is derived from the quantization matrix and quantizer scale and may be different for different coefficients and may change between macroblocks.

The quantizer step size of the DC coefficients is fixed at eight. The DC quantized coefficient is determined by dividing the DC coefficient by eight and rounding to the nearest integer. AC coefficients are quantized using the intra-quantization matrix.

Zig-Zag Scan

Zig-zag scanning, starting with the DC component, generates a linear stream of quantized frequency coefficients arranged in order of increasing frequency, as shown in Figure 7.59. This produces long runs of zero coefficients.

Coding of Quantized DC Coefficients

After the DC coefficients have been quantized, they are losslessly coded.

Coding of Y blocks within a macroblock follows the order shown in Figure 12.2. The DC value of the last Y block of a macroblock (block 3 in Figure 12.2) is the DC predictor for the first Y block (block 0) of the next macroblock.
At the beginning of each slice, the DC predictor is set to 1024.

The DC values of each Cb and Cr block are coded using the DC value of the corresponding block of the previous macroblock as a predictor. At the beginning of each slice, both DC predictors are set to 1024.

The DCT DC differential values are organized by their absolute value as shown in Table 12.16. [size], which specifies the number of additional bits to define the level uniquely, is transmitted by a variable-length code, and is different for Y and CbCr since the statistics are different. For example, a size of four is followed by four additional bits. The decoder reverses the procedure to recover the quantized DC coefficients.

Coding of Quantized AC Coefficients

After the AC coefficients have been quantized, they are scanned in the zig-zag order shown in Figure 7.59 and coded using run-length and level. The scan starts in position 1, as shown in Figure 7.59, as the DC coefficient in position 0 is coded separately.

The run-lengths and levels are coded as shown in Table 12.18. The “s” bit denotes the sign of the level; “0” is positive and “1” is negative.

For run-level combinations not shown in Table 12.18, an escape sequence is used, consisting of the escape code (ESC), followed by the run-length and level codes from Table 12.19.

After the last DCT coefficient has been coded, an EOB code is added to tell the decoder that there are no more quantized coefficients in this 8 × 8 block.

P Frames

Macroblocks

There are eight types of macroblocks in P frames, as shown in Table 12.10, due to the additional complexity of motion compensation.

Skipped macroblocks are predicted macroblocks with a zero motion vector. Thus, no correction is available; the decoder copies skipped macroblocks from the previous frame into the current frame. The advantage of skipped macroblocks is that they require very few bits to transmit.
They have no code; they are coded by having the macroblock address increment code skip over them.

If the [macroblock quant] column in Table 12.10 has a “1,” the quantizer scale is transmitted. For the remaining macroblock types, the DCT correction is coded using the previous value for quantizer scale.

If the [motion forward] column in Table 12.10 has a “1,” horizontal and vertical forward motion vectors are successively transmitted.

If the [coded pattern] column in Table 12.10 has a “1,” the 6-bit coded block pattern is transmitted as a variable-length code. This tells the decoder which of the six blocks in the macroblock are coded (“1”) and which are not coded (“0”). Table 12.14 lists the codewords assigned to the 63 possible combinations. There is no code for when none of the blocks is coded; it is indicated by the macroblock type. For macroblocks in I frames and for intra-coded macroblocks in P and B frames, the coded block pattern is not transmitted, but is assumed to be a value of 63 (all blocks are coded).

To determine which type of macroblock to use, the encoder typically makes a series of decisions, as shown in Figure 12.3.

DCT

Intra-block AC coefficients are transformed in the same manner as they are for I frames. Intra-block DC coefficients are transformed differently; the predicted values are set to 1024, unless the previous block was intra coded.

Non-intra-block coefficients represent differences between sample values rather than actual sample values. They are obtained by subtracting the motion-compensated values of the previous frame from the values in the current macroblock. There is no prediction of the DC value.

Input sample values are –255 to +255, resulting in a range of about –2000 to +2000 for the AC coefficients.
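The coefficient ranges quoted for intra blocks (DC of 0–2040) and non-intra blocks (AC of about –2000 to +2000) can be checked numerically. The sketch below uses a direct, unoptimized 8 × 8 DCT with the conventional scaling (a factor of 1/4 and 1/√2 for the zero-frequency terms); the function name `dct_8x8` and the test blocks are illustrative assumptions, not code from the specification.

```python
import math


def dct_8x8(block):
    """Direct (slow) two-dimensional 8 x 8 DCT of a list of 8 rows of 8 values."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical frequency
        for u in range(8):      # horizontal frequency
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * scale(u) * scale(v) * total
    return out


if __name__ == "__main__":
    # Intra block at full white: every sample is 255, so all energy is in DC.
    flat = [[255] * 8 for _ in range(8)]
    print(round(dct_8x8(flat)[0][0]))

    # Non-intra block of difference values: +255 on the left half and -255 on
    # the right half, which excites the lowest horizontal AC coefficient.
    edge = [[255] * 4 + [-255] * 4 for _ in range(8)]
    print(round(dct_8x8(edge)[0][1]))
```

A flat block of 255s gives a DC coefficient of 8 × 255 = 2040, the upper end of the intra DC range. The difference block drives the first horizontal AC coefficient to roughly 1850, which is consistent with the approximate ±2000 range for non-intra blocks.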
Figure 12.3. MPEG-1 P Frame Macroblock Type Selection. The decision tree is:

motion compensation
    non-intra, coded, quant: PRED-MCQ
    non-intra, coded, no quant: PRED-MC
    non-intra, not coded: PRED-M
    intra, quant: INTRA-Q
    intra, no quant: INTRA-D
no motion compensation
    non-intra, coded, quant: PRED-CQ
    non-intra, coded, no quant: PRED-C
    non-intra, not coded: SKIPPED
    intra, quant: INTRA-Q
    intra, no quant: INTRA-D

Quantizing

Intra-blocks are quantized in the same manner as they are for I frames.

Non-intra-blocks are quantized using the quantizer scale and the non-intra quantization matrix. The AC and DC coefficients are quantized in the same manner.

Coding of Intra-Blocks

Intra-blocks are coded the same way as I frame intra blocks. There is a difference in the handling of the DC coefficients in that the predicted value is 128 (equivalent to 1024 after scaling by the DC quantizer step size of eight), unless the previous block was intra coded.

Coding of Non-Intra-Blocks

The coded block pattern (CBP) is used to specify which blocks have coefficient data. These are coded similarly to the coding of intra blocks, except the DC coefficient is coded in the same manner as the AC coefficients.

B Frames

Macroblocks

There are twelve types of macroblocks in B frames, as shown in Table 12.11, due to the additional complexity of backward motion compensation.

Skipped macroblocks are macroblocks having the same motion vector and macroblock type as the previous macroblock, which cannot be intra coded. The advantage of skipped macroblocks is that they require very few bits to transmit. They have no code; they are coded by having the macroblock address increment code skip over them.

If the [macroblock quant] column in Table 12.11 has a “1,” the quantizer scale is transmitted. For the rest of the macroblock types, the DCT correction is coded using the previous value for the quantizer scale.

Figure 12.4. MPEG-1 B Frame Macroblock Type Selection. Forward MC, backward MC, and interpolated MC each lead to the same decision tree:

non-intra, coded, quant: PRED-*CQ
non-intra, coded, no quant: PRED-*C
non-intra, not coded: PRED-* or SKIPPED
intra, quant: INTRA-Q
intra, no quant: INTRA-D
If the [motion forward] column in Table 12.11 has a “1,” horizontal and vertical forward motion vectors are successively transmitted. If the [motion backward] column in Table 12.11 has a “1,” horizontal and vertical backward motion vectors are successively transmitted. If both forward and backward motion types are present, the vectors are transmitted in this order:

horizontal forward
vertical forward
horizontal backward
vertical backward

If the [coded pattern] column in Table 12.11 has a “1,” the 6-bit coded block pattern is transmitted as a variable-length code. This tells the decoder which of the six blocks in the macroblock are coded (“1”) and which are not coded (“0”). Table 12.14 lists the codewords assigned to the 63 possible combinations. There is no code for when none of the blocks is coded; this is indicated by the macroblock type. For macroblocks in I frames and for intra-coded macroblocks in P and B frames, the coded block pattern is not transmitted, but is assumed to be a value of 63 (all blocks are coded).

To determine which type of macroblock to use, the encoder typically makes a series of decisions, shown in Figure 12.4.

Coding

DCT coefficients of blocks are transformed into quantized coefficients and coded in the same way they are for P frames.

D Frames

D frames contain only DC-frequency data and are intended to be used for fast visible search applications. The data contained in a D frame should be just sufficient for the user to locate the desired video.

Video Bitstream

Figure 12.5 illustrates the video bitstream, a hierarchical structure with seven layers. From top to bottom the layers are:

Video Sequence
Sequence Header
Group of Pictures (GOP)
Picture
Slice
Macroblock (MB)
Block

Note that start codes (0x000001xx) must be byte aligned by inserting 0–7 “0” bits before the start code.

Video Sequence

Sequence_end_code
This 32-bit field has a value of 0x000001B7 and terminates a video sequence.

Sequence Header

Data for each sequence consists of a sequence header followed by data for one or more groups of pictures (GOPs). The structure is shown in Figure 12.5.

Sequence_header_code
This 32-bit field has a value of 0x000001B3 and indicates the beginning of a sequence header.

Horizontal_size
This 12-bit binary value specifies the width of the viewable portion of the Y component. The width in macroblocks is defined as (horizontal_size + 15)/16.

Vertical_size
This 12-bit binary value specifies the height of the viewable portion of the Y component. The height in macroblocks is defined as (vertical_size + 15)/16.

Pel_aspect_ratio
This 4-bit codeword indicates the pixel aspect ratio, as shown in Table 12.3.

Picture_rate
This 4-bit codeword indicates the frame rate, as shown in Table 12.4.

Bit_rate
An 18-bit binary value specifying the bitstream bit-rate, measured in units of 400 bps rounded upwards. A zero value is not allowed; a value of 0x3FFFF specifies variable bit-rate operation. If constrained_parameters_flag is a “1,” the bit-rate must be ≤1.856 Mbps.

Marker_bit
Always a “1.”

Vbv_buffer_size
This 10-bit binary number specifies the minimum size of the video buffering verifier needed by the decoder to properly decode the sequence. It is defined as:

B = 16 × 1024 × vbv_buffer_size

If the constrained_parameters_flag bit is a “1,” the vbv_buffer_size must be ≤40 kB.
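The fixed-length fields described above can be read with a simple bit reader. The following is a sketch under stated assumptions rather than a complete parser: the field widths and order follow the descriptions above, the division in the macroblock formulas is taken to be integer division, B is taken to be in bits, and the example bytes were constructed by hand for illustration. `BitReader` and `parse_sequence_header` are hypothetical names.

```python
SEQUENCE_HEADER_CODE = 0x000001B3
VARIABLE_BIT_RATE = 0x3FFFF


class BitReader:
    """Reads most-significant-bit-first fields from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, count):
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def parse_sequence_header(data):
    """Decode the fixed-length fields at the start of a sequence header."""
    bits = BitReader(data)
    if bits.read(32) != SEQUENCE_HEADER_CODE:
        raise ValueError("sequence_header_code not found")
    fields = {
        "horizontal_size": bits.read(12),
        "vertical_size": bits.read(12),
        "pel_aspect_ratio": bits.read(4),
        "picture_rate": bits.read(4),
        "bit_rate": bits.read(18),
    }
    if bits.read(1) != 1:
        raise ValueError("marker_bit must be 1")
    fields["vbv_buffer_size"] = bits.read(10)
    fields["constrained_parameters_flag"] = bits.read(1)

    # Derived values (assumption: integer division in the macroblock formulas).
    fields["mb_width"] = (fields["horizontal_size"] + 15) // 16
    fields["mb_height"] = (fields["vertical_size"] + 15) // 16
    if fields["bit_rate"] == VARIABLE_BIT_RATE:
        fields["bit_rate_bps"] = None  # variable bit-rate operation
    else:
        fields["bit_rate_bps"] = fields["bit_rate"] * 400
    fields["vbv_buffer_bits"] = 16 * 1024 * fields["vbv_buffer_size"]
    return fields


if __name__ == "__main__":
    # Hand-built example: 352 x 240, aspect code 1100, picture rate code 0100,
    # bit_rate 2875, vbv_buffer_size 20, constrained_parameters_flag 1.
    header = bytes.fromhex("000001B31600F0C402CEE0A4")
    for name, value in parse_sequence_header(header).items():
        print(name, value)
```

With these example bytes, the bit_rate field of 2875 corresponds to 2875 × 400 = 1,150,000 bps, and a vbv_buffer_size of 20 gives 327,680 bits, which is 40 kB. The quantizer matrix flags and any extension or user data that follow are not handled here.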
[Figure 12.5: field order within each layer.
Sequence header: sequence header code, horizontal size, vertical size, aspect ratio, picture rate, bit rate, VBV buffer size, constrained parameters flag, intra quantizer matrix, non-intra quantizer matrix, extension start code, sequence extension data, user data start code, user data, then one or more GOPs.
GOP layer: group start code, time code, closed GOP, broken link, extension start code, group extension data, user data start code, user data, then one or more pictures.
Picture layer: picture start code, temporal reference, picture coding type, VBV delay, full pel forward vector, forward f code, full pel backward vector, backward f code, extra information picture, extension start code, picture extension data, user data start code, user data, then one or more slices.
Slice layer: slice start code, quantizer scale, extra information slice, then one or more macroblocks (MB).
MB layer: MB stuffing, MB escape, MB address increment, MB type, quantizer scale, motion horizontal forward, motion vertical forward, motion horizontal backward, motion vertical backward, coded block pattern, blocks B0 through B5, end of MB.
Block layer: DCT DC size luminance, DCT DC size differential, DCT DC size chrominance, DCT DC size differential, DCT DC coefficient first, DCT DC coefficient next, end of block.]

Figure 12.5. MPEG-1 Video Bitstream Layer Structures. Marker and reserved bits not shown.

Height / Width   Example          Aspect Ratio Code
forbidden                         0000
1.0000           square pixel     0001
0.6735                            0010
0.7031           576-line 16:9    0011
0.7615                            0100
0.8055                            0101
0.8437           480-line 16:9    0110
0.8935                            0111
0.9157           576-line 4:3     1000
0.9815                            1001
1.0255                            1010
1.0695                            1011
1.0950           480-line 4:3     1100
1.1575                            1101
1.2015                            1110
reserved                          1111

Table 12.3. MPEG-1 pel_aspect_ratio Codewords.

Frames per Second   Picture Rate Code
forbidden           0000
24/1.001            0001
24                  0010
25                  0011
30/1.001            0100
30                  0101
50                  0110
60/1.001            0111
60                  1000
reserved            1001
reserved            1010
reserved            1011
reserved            1100
reserved            1101
reserved            1110
reserved            1111

Table 12.4. MPEG-1 picture_rate Codewords.
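As a rough illustration of how a decoder might use Tables 12.3 and 12.4, the sketch below maps the two 4-bit codewords to values and derives the displayed picture shape. It assumes the Height / Width column is the height-to-width ratio of a single pixel, so the display aspect ratio is width / (height × ratio). The names `PICTURE_RATE`, `PEL_ASPECT_RATIO`, `frames_per_second`, and `display_aspect_ratio` are hypothetical.

```python
from fractions import Fraction

# Table 12.4: picture_rate codeword -> frames per second (kept exact).
PICTURE_RATE = {
    0b0001: Fraction(24000, 1001),
    0b0010: Fraction(24),
    0b0011: Fraction(25),
    0b0100: Fraction(30000, 1001),
    0b0101: Fraction(30),
    0b0110: Fraction(50),
    0b0111: Fraction(60000, 1001),
    0b1000: Fraction(60),
}

# Table 12.3: pel_aspect_ratio codeword -> pixel height / width.
PEL_ASPECT_RATIO = {
    0b0001: 1.0000, 0b0010: 0.6735, 0b0011: 0.7031, 0b0100: 0.7615,
    0b0101: 0.8055, 0b0110: 0.8437, 0b0111: 0.8935, 0b1000: 0.9157,
    0b1001: 0.9815, 0b1010: 1.0255, 0b1011: 1.0695, 0b1100: 1.0950,
    0b1101: 1.1575, 0b1110: 1.2015,
}


def frames_per_second(code):
    """Look up a picture_rate codeword; forbidden and reserved codes fail."""
    try:
        return PICTURE_RATE[code]
    except KeyError:
        raise ValueError(f"picture_rate code {code:04b} is forbidden or reserved") from None


def display_aspect_ratio(width, height, code):
    """Displayed width / height for a picture of width x height samples."""
    try:
        pel_height_over_width = PEL_ASPECT_RATIO[code]
    except KeyError:
        raise ValueError(f"pel_aspect_ratio code {code:04b} is forbidden or reserved") from None
    return width / (height * pel_height_over_width)


if __name__ == "__main__":
    print(float(frames_per_second(0b0100)))
    print(display_aspect_ratio(704, 480, 0b1100))
    print(display_aspect_ratio(704, 576, 0b1000))
```

Under this reading, a 704 × 480 picture with code 1100 and a 704 × 576 picture with code 1000 both come out close to 4:3, which agrees with the examples listed in Table 12.3.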
Constrained_parameters_flag
This bit is set to a “1” if the following constraints are met:

horizontal_size ≤ 768 samples
vertical_size ≤ 576 lines
((horizontal_size + 15)/16) × ((vertical_size + 15)/16) ≤ 396
((horizontal_size + 15)/16) × ((vertical_size + 15)/16) × picture_rate ≤ 396 × 25
picture_rate ≤ 30 frames per second
forward_f_code ≤ 4
backward_f_code ≤ 4

Load_intra_quantizer_matrix
This bit is set to a “1” if intra_quantizer_matrix follows. If set to a “0,” the default values below are used until the next occurrence of a sequence header.

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83

Intra_quantizer_matrix
An optional list of 64 8-bit values that replace the current intra quantizer values. A value of zero is not allowed. The value for intra_quant[0, 0] is always 8. These values take effect until the next occurrence of a sequence header.

Load_non_intra_quantizer_matrix
This bit is set to a “1” if non_intra_quantizer_matrix follows. If set to a “0,” the default values below are used until the next occurrence of a sequence header.

16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16

Non_intra_quantizer_matrix
An optional list of 64 8-bit values that replace the current non-intra quantizer values. A value of zero is not allowed. These values take effect until the next occurrence of a sequence header.

Extension_start_code
This optional 32-bit string of 0x000001B5 indicates the beginning of sequence_extension_data. sequence_extension_data continues until the detection of another start code.

Sequence_extension_data
These n × 8 bits are present only if extension_start_code is present.
User_data_start_code
This optional 32-bit string of 0x000001B2 indicates the beginning of user_data. user_data continues until the detection of another start code.

User_data
These n × 8 bits are present only if user_data_start_code is present. user_data must not contain a string of 23 or more consecutive zero bits.

Group of Pictures (GOP) Layer

Data for each group of pictures consists of a GOP header followed by picture data. The structure is shown in Figure 12.5.

Group_start_code
This 32-bit value of 0x000001B8 indicates the beginning of a group of pictures.

Time_code
These 25 bits indicate timecode information, as shown in Table 12.5. [drop_frame_flag] may be set to “1” only if the picture rate is 30/1.001 (29.97) Hz.

Timecode             Range of Value   Number of Bits
drop_frame_flag                       1
time_code_hours      0–23             5
time_code_minutes    0–59             6
marker_bit           1                1
time_code_seconds    0–59             6
time_code_pictures   0–59             6

Table 12.5. MPEG-1 time_code Field.

Closed_gop
This 1-bit flag is set to “1” if the group of pictures has been encoded without motion vectors referencing the previous group of pictures. This bit allows support of editing the compressed bitstream.

Broken_link
This 1-bit flag is set to a “0” during encoding. It is set to a “1” during editing when the B frames following the first I frame of a group of pictures cannot be correctly decoded.

Extension_start_code
This optional 32-bit string of 0x000001B5 indicates the beginning of group_extension_data. group_extension_data continues until the detection of another start code.

Group_extension_data
These n × 8 bits are present only if extension_start_code is present.

User_data_start_code
This optional 32-bit string of 0x000001B2 indicates the beginning of user_data. user_data continues until the detection of another start code.

User_data
These n × 8 bits are present only if user_data_start_code is present. user_data must not contain a string of 23 or more consecutive zero bits.
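The 25-bit time_code field of the GOP header can be unpacked with shifts and masks. The sketch below assumes the sub-fields are packed in the order listed in Table 12.5, with drop_frame_flag in the most significant bit; `pack_time_code` and `parse_time_code` are hypothetical helper names used only for this illustration.

```python
def pack_time_code(drop_frame_flag, hours, minutes, seconds, pictures):
    """Build the 25-bit time_code value (marker_bit is always 1)."""
    return ((drop_frame_flag << 24) | (hours << 19) | (minutes << 13)
            | (1 << 12) | (seconds << 6) | pictures)


def parse_time_code(value):
    """Split a 25-bit time_code value into its sub-fields (Table 12.5)."""
    if not 0 <= value < (1 << 25):
        raise ValueError("time_code must fit in 25 bits")
    fields = {
        "drop_frame_flag": (value >> 24) & 0x01,  # 1 bit
        "hours": (value >> 19) & 0x1F,            # 5 bits, 0-23
        "minutes": (value >> 13) & 0x3F,          # 6 bits, 0-59
        "marker_bit": (value >> 12) & 0x01,       # 1 bit, always 1
        "seconds": (value >> 6) & 0x3F,           # 6 bits, 0-59
        "pictures": value & 0x3F,                 # 6 bits, 0-59
    }
    if fields["marker_bit"] != 1:
        raise ValueError("marker_bit must be 1")
    if fields["hours"] > 23 or max(fields["minutes"], fields["seconds"],
                                   fields["pictures"]) > 59:
        raise ValueError("time_code sub-field out of range")
    return fields


if __name__ == "__main__":
    value = pack_time_code(0, 1, 23, 45, 12)
    print(value, parse_time_code(value))
```

Packing 01:23:45 with picture 12 and then parsing the result returns the same sub-fields, which is a quick way to check that the bit positions add up to 25 bits.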
Picture Layer

Data for each picture layer consists of a picture header followed by slice data. The structure is shown in Figure 12.5.

Picture_start_code
This has a 32-bit value of 0x00000100.

Temporal_reference
For the first frame in display order of each group of pictures, the temporal_reference value is zero. This 10-bit binary value then increments by one, modulo 1024 for each frame in display order.

Picture_coding_type
This 3-bit codeword indicates the frame type (I frame, P frame, B frame, or D frame), as shown in Table 12.6. D frames are not to be used in the same video sequence as other frames.

Coding Type   Code
forbidden     000
I frame       001
P frame       010
B frame       011
D frame       100
reserved      101
reserved      110
reserved      111

Table 12.6. MPEG-1 picture_coding_type Code.

Vbv_delay
For constant bit-rates, the 16-bit vbv_delay binary value sets the initial occupancy of the decoding buffer at the start of decoding a picture so that it doesn’t overflow or underflow. For variable bit-rates, vbv_delay has a value of 0xFFFF.

Full_pel_forward_vector
This 1-bit flag is present if picture_coding_type is “010” (P frames) or “011” (B frames). If a “1,” the forward motion vectors are based on integer samples, rather than half-samples.

Forward_f_code
This 3-bit binary number is present if picture_coding_type is “010” (P frames) or “011” (B frames). Values of “001” to “111” are used; a value of “000” is forbidden.

Two parameters used by the decoder to decode the forward motion vectors are derived from this field: forward_r_size and forward_f. forward_r_size is one less than forward_f_code. forward_f is defined in Table 12.7.

Forward F Code   Forward F Value
001              1
010              2
011              4
100              8
101              16
110              32
111              64

Table 12.7. MPEG-1 forward_f_code Values.

Full_pel_backward_vector
This 1-bit flag is present if picture_coding_type is “011” (B frames). If a “1,” the backward motion vectors are based on integer samples, rather than half-samples.
Backward_f_code
This 3-bit binary number is present if picture_coding_type is “011” (B frames). Values of “001” to “111” are used; a value of “000” is forbidden. Two parameters used by the decoder to decode the backward motion vectors are derived from this field: backward_r_size and backward_f. backward_r_size is one less than backward_f_code. backward_f is defined the same as forward_f.

Extra_bit_picture
A bit which, when set to “1,” indicates that extra_information_picture follows.

Extra_information_picture
If extra_bit_picture = “1,” then these 9 bits follow consisting of 8 bits of data (extra_information_picture) and then another extra_bit_picture to indicate if a further 9 bits follow, and so on.

Extension_start_code
This optional 32-bit string of 0x000001B5 indicates the beginning of picture_extension_data. picture_extension_data continues until the detection of another start code.

Picture_extension_data
These n × 8 bits are present only if extension_start_code is present.

User_data_start_code
This optional 32-bit string of 0x000001B2 indicates the beginning of user_data. user_data continues until the detection of another start code.

User_data
These n × 8 bits are present only if user_data_start_code is present. user_data must not contain a string of 23 or more consecutive zero bits.

Slice Layer

Data for each slice layer consists of a slice header followed by macroblock data. The structure is shown in Figure 12.5.

Slice_start_code
The first 24 bits of this 32-bit field have a value of 0x000001. The last 8 bits are the slice_vertical_position, and have a value of 0x01–0xAF. slice_vertical_position specifies the vertical position in macroblock units of the first macroblock in the slice. The value of the first row of macroblocks is one.

Quantizer_scale
This 5-bit binary number has a value of 1–31 (a value of 0 is forbidden). It specifies the scale factor of the reconstruction level of the DCT coefficients.
The decoder uses this value until another quantizer_scale is received at either the slice or macroblock layer.

Extra_bit_slice
A bit which, when set to “1,” indicates that extra_information_slice follows.

Extra_information_slice
If extra_bit_slice = “1,” then these 9 bits follow consisting of 8 bits of data (extra_information_slice) and then another extra_bit_slice to indicate if a further 9 bits follow, and so on.

Macroblock (MB) Layer

Data for each macroblock layer consists of a macroblock header followed by motion vectors and block data. The structure is shown in Figure 12.5.

Macroblock_stuffing
This optional 11-bit field is a fixed bit string of “0000 0001 111” and may be used to increase the bit-rate to match the storage or transmission requirements. Any number of consecutive macroblock_stuffing fields may be used.

Macroblock_escape
This optional 11-bit field is a fixed bit string of “0000 0001 000” and is used when the difference between the current macroblock address and the previous macroblock address is greater than 33. It forces the value of macroblock_address_increment to be increased by 33. Any number of consecutive macroblock_escape fields may be used.

Macroblock_address_increment
This is a variable-length codeword that specifies the difference between the current macroblock address and the previous macroblock address. It has a maximum value of 33. Values greater than 33 are encoded using the macroblock_escape field. The variable-length codes are listed in Table 12.8.

Macroblock_type
This is a variable-length codeword that specifies the coding method and macroblock content. The variable-length codes are listed in Tables 12.9 through 12.12.

Quantizer_scale
This optional 5-bit binary number has a value of 1–31 (a value of 0 is forbidden). It specifies the scale factor of the reconstruction level of the received DCT coefficients.
The decoder uses this value until another quantizer_scale is received at either the slice or macroblock layer. The quantizer_scale field is present only when [macroblock quant] = “1” in Tables 12.9 through 12.12.

Motion_horizontal_forward_code
This optional variable-length codeword contains forward motion vector information as defined in Table 12.13. It is present only when [motion forward] = “1” in Tables 12.9 through 12.12.

Motion_horizontal_forward_r
This optional binary number (of forward_r_size bits) is used to help decode the forward motion vectors. It is present only when [motion forward] = “1” in Tables 12.9 through 12.12, forward_f_code ≠ “001,” and motion_horizontal_forward_code ≠ “0.”

Motion_vertical_forward_code
This optional variable-length codeword contains forward motion vector information as defined in Table 12.13. It is present only when [motion forward] = “1” in Tables 12.9 through 12.12.

Increment Value    Code              Increment Value    Code
1                  1                 17                 0000 0101 10
2                  011               18                 0000 0101 01
3                  010               19                 0000 0101 00
4                  0011              20                 0000 0100 11
5                  0010              21                 0000 0100 10
6                  0001 1            22                 0000 0100 011
7                  0001 0            23                 0000 0100 010
8                  0000 111          24                 0000 0100 001
9                  0000 110          25                 0000 0100 000
10                 0000 1011         26                 0000 0011 111
11                 0000 1010         27                 0000 0011 110
12                 0000 1001         28                 0000 0011 101
13                 0000 1000         29                 0000 0011 100
14                 0000 0111         30                 0000 0011 011
15                 0000 0110         31                 0000 0011 010
16                 0000 0101 11      32                 0000 0011 001
                                     33                 0000 0011 000

Table 12.8. MPEG-1 Variable-Length Code Table for macroblock_address_increment.

Macroblock Type    Macroblock Quant    Motion Forward    Motion Backward    Coded Pattern    Intra Macroblock    Code
intra-d            0                   0                 0                  0                1                   1
intra-q            1                   0                 0                  0                1                   01

Table 12.9. MPEG-1 Variable-Length Code Table for macroblock_type for I Frames.
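The interaction of macroblock_stuffing, macroblock_escape, and the codes of Table 12.8 can be sketched as below. This is an illustrative sketch, not a production parser: the names MBA_INCREMENT and read_address_increment are hypothetical, and the bitstream is assumed to be available as a Python string of “0” and “1” characters for readability.

```python
# Table 12.8 codes in order of increment value 1..33 (spaces removed).
_CODES = """1 011 010 0011 0010 00011 00010 0000111 0000110
00001011 00001010 00001001 00001000 00000111 00000110
0000010111 0000010110 0000010101 0000010100 0000010011 0000010010
00000100011 00000100010 00000100001 00000100000
00000011111 00000011110 00000011101 00000011100
00000011011 00000011010 00000011001 00000011000""".split()
MBA_INCREMENT = {code: i + 1 for i, code in enumerate(_CODES)}

MACROBLOCK_STUFFING = "00000001111"  # 11 bits, ignored by the decoder
MACROBLOCK_ESCAPE = "00000001000"    # 11 bits, adds 33 to the increment


def read_address_increment(bits, pos=0):
    """Return (increment, new_pos) decoded from a '0'/'1' string at pos."""
    total = 0
    while True:
        if bits.startswith(MACROBLOCK_STUFFING, pos):
            pos += 11                # skip stuffing
            continue
        if bits.startswith(MACROBLOCK_ESCAPE, pos):
            total += 33              # each escape adds 33
            pos += 11
            continue
        # The codes are prefix-free, so the first match is the codeword.
        for length in range(1, 12):
            value = MBA_INCREMENT.get(bits[pos:pos + length])
            if value is not None:
                return total + value, pos + length
        raise ValueError("invalid macroblock_address_increment code")
```

For example, one escape followed by the code “1” decodes to an increment of 34.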
Macroblock Type    Macroblock Quant    Motion Forward    Motion Backward    Coded Pattern    Intra Macroblock    Code
pred-mc            0                   1                 0                  1                0                   1
pred-c             0                   0                 0                  1                0                   01
pred-m             0                   1                 0                  0                0                   001
intra-d            0                   0                 0                  0                1                   0001 1
pred-mcq           1                   1                 0                  1                0                   0001 0
pred-cq            1                   0                 0                  1                0                   0000 1
intra-q            1                   0                 0                  0                1                   0000 01
skipped

Table 12.10. MPEG-1 Variable-Length Code Table for macroblock_type for P Frames.

Macroblock Type    Macroblock Quant    Motion Forward    Motion Backward    Coded Pattern    Intra Macroblock    Code
pred-i             0                   1                 1                  0                0                   10
pred-ic            0                   1                 1                  1                0                   11
pred-b             0                   0                 1                  0                0                   010
pred-bc            0                   0                 1                  1                0                   011
pred-f             0                   1                 0                  0                0                   0010
pred-fc            0                   1                 0                  1                0                   0011
intra-d            0                   0                 0                  0                1                   0001 1
pred-icq           1                   1                 1                  1                0                   0001 0
pred-fcq           1                   1                 0                  1                0                   0000 11
pred-bcq           1                   0                 1                  1                0                   0000 10
intra-q            1                   0                 0                  0                1                   0000 01
skipped

Table 12.11. MPEG-1 Variable-Length Code Table for macroblock_type for B Frames.

Macroblock Quant    Motion Forward    Motion Backward    Coded Pattern    Intra Macroblock    Code
0                   0                 0                  0                1                   1

Table 12.12. MPEG-1 Variable-Length Code Table for macroblock_type for D Frames.

Motion Vector Difference    Code              Motion Vector Difference    Code
–16                         0000 0011 001     1                           010
–15                         0000 0011 011     2                           0010
–14                         0000 0011 101     3                           0001 0
–13                         0000 0011 111     4                           0000 110
–12                         0000 0100 001     5                           0000 1010
–11                         0000 0100 011     6                           0000 1000
–10                         0000 0100 11      7                           0000 0110
–9                          0000 0101 01      8                           0000 0101 10
–8                          0000 0101 11      9                           0000 0101 00
–7                          0000 0111         10                          0000 0100 10
–6                          0000 1001         11                          0000 0100 010
–5                          0000 1011         12                          0000 0100 000
–4                          0000 111          13                          0000 0011 110
–3                          0001 1            14                          0000 0011 100
–2                          0011              15                          0000 0011 010
–1                          011               16                          0000 0011 000
0                           1

Table 12.13. MPEG-1 Variable-Length Code Table for motion_horizontal_forward_code, motion_vertical_forward_code, motion_horizontal_backward_code, and motion_vertical_backward_code.

Motion_vertical_forward_r
This optional binary number (of forward_r_size bits) is used to help decode the forward motion vectors.
It is present only when [motion forward] = “1” in Tables 12.9 through 12.12, forward_f_code ≠ “001,” and motion_vertical_forward_code ≠ “0.”

Motion_horizontal_backward_code
This optional variable-length codeword contains backward motion vector information as defined in Table 12.13. It is present only when [motion backward] = “1” in Tables 12.9 through 12.12.

Motion_horizontal_backward_r
This optional binary number (of backward_r_size bits) is used to help decode the backward motion vectors. It is present only when [motion backward] = “1” in Tables 12.9 through 12.12, backward_f_code ≠ “001,” and motion_horizontal_backward_code ≠ “0.”

Motion_vertical_backward_code
This optional variable-length codeword contains backward motion vector information as defined in Table 12.13. The decoded value helps decide if motion_vertical_backward_r appears in the bitstream. This parameter is present only when [motion backward] = “1” in Tables 12.9 through 12.12.

Motion_vertical_backward_r
This optional binary number (of backward_r_size bits) is used to help decode the backward motion vectors. It is present only when [motion backward] = “1” in Tables 12.9 through 12.12, backward_f_code ≠ “001,” and motion_vertical_backward_code ≠ “0.”

Coded_block_pattern
This optional variable-length codeword is used to derive the coded block pattern (CBP) as shown in Table 12.14. It is present only if [coded pattern] = “1” in Tables 12.9 through 12.12, and indicates which blocks in the macroblock have at least one transform coefficient transmitted. The coded block pattern binary number is represented as:

P0P1P2P3P4P5

where Pn = “1” for any coefficient present for block [n], else Pn = “0.” Block numbering (decimal format) is given in Figure 12.2.

End_of_macroblock
This optional 1-bit field has a value of “1.” It is present only for D frames.

Block Layer

Data for each block layer consists of coefficient data. The structure is shown in Figure 12.5.
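The coded block pattern described under Coded_block_pattern in the macroblock layer can be unpacked with a few bit tests. The sketch below is illustrative only: the function name coded_blocks is hypothetical, and it assumes the pattern has already been decoded from its variable-length code into the 6-bit number P0P1P2P3P4P5, with P0 as the most significant bit.

```python
def coded_blocks(cbp):
    """Return the block numbers n (0-5) for which Pn = 1.

    cbp is the 6-bit coded block pattern P0P1P2P3P4P5,
    so P0 corresponds to the value 32 and P5 to the value 1.
    """
    if not 0 <= cbp <= 63:
        raise ValueError("coded block pattern must fit in 6 bits")
    return [n for n in range(6) if cbp & (32 >> n)]
```

For example, a pattern of 60 (binary 111100) marks blocks 0 through 3 as coded.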
Dct_dc_size_luminance
This optional variable-length codeword is used with intra-coded Y blocks. It specifies the number of bits used for dct_dc_differential. The variable-length codewords are shown in Table 12.15.

Coded Block Pattern    Code         Coded Block Pattern    Code         Coded Block Pattern    Code
60                     111          9                      0010 110     43                     0001 0000
4                      1101         17                     0010 101     25                     0000 1111
8                      1100         33                     0010 100     37                     0000 1110
16                     1011         6                      0010 011     26                     0000 1101
32                     1010         10                     0010 010     38                     0000 1100
12                     1001 1       18                     0010 001     29                     0000 1011
48                     1001 0       34                     0010 000     45                     0000 1010
20                     1000 1       7                      0001 1111    53                     0000 1001
40                     1000 0       11                     0001 1110    57                     0000 1000
28                     0111 1       19                     0001 1101    30                     0000 0111
44                     0111 0       35                     0001 1100    46                     0000 0110
52                     0110 1       13                     0001 1011    54                     0000 0101
56                     0110 0       49                     0001 1010    58                     0000 0100
1                      0101 1       21                     0001 1001    31                     0000 0011 1
61                     0101 0       41                     0001 1000    47                     0000 0011 0
2                      0100 1       14                     0001 0111    55                     0000 0010 1
62                     0100 0       50                     0001 0110    59                     0000 0010 0
24                     0011 11      22                     0001 0101    27                     0000 0001 1
36                     0011 10      42                     0001 0100    39                     0000 0001 0
3                      0011 01      15                     0001 0011
63                     0011 00      51                     0001 0010
5                      0010 111     23                     0001 0001

Table 12.14. MPEG-1 Variable-Length Code Table for coded_block_pattern.

DCT DC Size Luminance    Code      DCT DC Size Luminance    Code
0                        100       5                        1110
1                        00        6                        1111 0
2                        01        7                        1111 10
3                        101       8                        1111 110
4                        110

Table 12.15. MPEG-1 Variable-Length Code Table for dct_dc_size_luminance.

Dct_dc_differential
This optional variable-length codeword is present after dct_dc_size_luminance if dct_dc_size_luminance ≠ “0.” The values are shown in Table 12.16.

Dct_coefficient_first
This optional variable-length codeword is used for the first DCT coefficient in non-intracoded blocks, and is defined in Tables 12.18 and 12.19.

Dct_dc_size_chrominance
This optional variable-length codeword is used with intra-coded Cb and Cr blocks. It specifies the number of bits used for dct_dc_differential. The variable-length codewords are shown in Table 12.17.
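The mapping from the size value and the additional code bits to a signed dct_dc_differential, as tabulated in Table 12.16, follows a simple rule: an additional code that starts with “1” is the positive value itself, and one that starts with “0” is a negative value offset by 2^size – 1. The sketch below illustrates this; the function name dc_differential is hypothetical, and the additional code is assumed to have been read as an unsigned integer of size bits.

```python
def dc_differential(size, additional):
    """Convert (size, additional code) to a signed DC differential.

    size is the decoded dct_dc_size value (0-8); additional is the
    following size-bit field read as an unsigned integer.
    """
    if not 0 <= size <= 8:
        raise ValueError("size must be 0-8")
    if size == 0:
        return 0                      # no additional bits are sent
    if not 0 <= additional < (1 << size):
        raise ValueError("additional code does not fit in size bits")
    if additional >> (size - 1):      # leading bit 1: positive value
        return additional
    return additional - ((1 << size) - 1)  # leading bit 0: negative value
```

With size 3, for instance, the codes 000 through 011 give –7 through –4 and the codes 100 through 111 give 4 through 7.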
Dct_dc_differential
This optional variable-length codeword is present after dct_dc_size_chrominance if dct_dc_size_chrominance ≠ “0.” The values are shown in Table 12.16.

Dct_coefficient_next
Up to 63 optional variable-length codewords present only for I, P, and B frames. They are the DCT coefficients after the first one, and are defined in Tables 12.18 and 12.19.

End_of_block
This 2-bit value (present only for I, P, and B frames) is used to indicate that no additional non-zero coefficients are present. The value of this parameter is “10.”

DCT DC Differential    Size    Code (Y)    Code (CbCr)    Additional Code
–255 to –128           8       1111110     11111110       00000000 to 01111111
–127 to –64            7       111110      1111110        0000000 to 0111111
–63 to –32             6       11110       111110         000000 to 011111
–31 to –16             5       1110        11110          00000 to 01111
–15 to –8              4       110         1110           0000 to 0111
–7 to –4               3       101         110            000 to 011
–3 to –2               2       01          10             00 to 01
–1                     1       00          01             0
0                      0       100         00
1                      1       00          01             1
2 to 3                 2       01          10             10 to 11
4 to 7                 3       101         110            100 to 111
8 to 15                4       110         1110           1000 to 1111
16 to 31               5       1110        11110          10000 to 11111
32 to 63               6       11110       111110         100000 to 111111
64 to 127              7       111110      1111110        1000000 to 1111111
128 to 255             8       1111110     11111110       10000000 to 11111111

Table 12.16. MPEG-1 Variable-Length Code Table for dct_dc_differential.

DCT DC Size Chrominance    Code      DCT DC Size Chrominance    Code
0                          00        5                          1111 0
1                          01        6                          1111 10
2                          10        7                          1111 110
3                          110       8                          1111 1110
4                          1110

Table 12.17. MPEG-1 Variable-Length Code Table for dct_dc_size_chrominance.

Run             Level    Code
end_of_block             10
0 (note 2)      1        1 s
0 (note 3)      1        11 s
1               1        011 s
0               2        0100 s
2               1        0101 s
0               3        0010 1 s
3               1        0011 1 s
4               1        0011 0 s
1               2        0001 10 s
5               1        0001 11 s
6               1        0001 01 s
7               1        0001 00 s
0               4        0000 110 s
2               2        0000 100 s
8               1        0000 111 s
9               1        0000 101 s
escape                   0000 01
0               5        0010 0110 s
0               6        0010 0001 s
1               3        0010 0101 s
3               2        0010 0100 s
10              1        0010 0111 s
11              1        0010 0011 s
12              1        0010 0010 s
13              1        0010 0000 s
0               7        0000 0010 10 s
1               4        0000 0011 00 s
2               3        0000 0010 11 s
4               2        0000 0011 11 s
5               2        0000 0010 01 s
14              1        0000 0011 10 s
15              1        0000 0011 01 s
16              1        0000 0010 00 s

Notes:
1. s = sign of level; “0” for positive; “1” for negative.
2. Used for dct_coefficient_first.
3. Used for dct_coefficient_next.

Table 12.18a. MPEG-1 Variable-Length Code Table for dct_coefficient_first and dct_coefficient_next.

Run    Level    Code                   Run    Level    Code
0      8        0000 0001 1101 s       0      12       0000 0000 1101 0 s
0      9        0000 0001 1000 s       0      13       0000 0000 1100 1 s
0      10       0000 0001 0011 s       0      14       0000 0000 1100 0 s
0      11       0000 0001 0000 s       0      15       0000 0000 1011 1 s
1      5        0000 0001 1011 s       1      6        0000 0000 1011 0 s
2      4        0000 0001 0100 s       1      7        0000 0000 1010 1 s
3      3        0000 0001 1100 s       2      5        0000 0000 1010 0 s
4      3        0000 0001 0010 s       3      4        0000 0000 1001 1 s
6      2        0000 0001 1110 s       5      3        0000 0000 1001 0 s
7      2        0000 0001 0101 s       9      2        0000 0000 1000 1 s
8      2        0000 0001 0001 s       10     2        0000 0000 1000 0 s
17     1        0000 0001 1111 s       22     1        0000 0000 1111 1 s
18     1        0000 0001 1010 s       23     1        0000 0000 1111 0 s
19     1        0000 0001 1001 s       24     1        0000 0000 1110 1 s
20     1        0000 0001 0111 s       25     1        0000 0000 1110 0 s
21     1        0000 0001 0110 s       26     1        0000 0000 1101 1 s

Note:
1. s = sign of level; “0” for positive; “1” for negative.

Table 12.18b. MPEG-1 Variable-Length Code Table for dct_coefficient_first and dct_coefficient_next.
Run    Level    Code                      Run    Level    Code
0      16       0000 0000 0111 11 s       0      40       0000 0000 0010 000 s
0      17       0000 0000 0111 10 s       1      8        0000 0000 0011 111 s
0      18       0000 0000 0111 01 s       1      9        0000 0000 0011 110 s
0      19       0000 0000 0111 00 s       1      10       0000 0000 0011 101 s
0      20       0000 0000 0110 11 s       1      11       0000 0000 0011 100 s
0      21       0000 0000 0110 10 s       1      12       0000 0000 0011 011 s
0      22       0000 0000 0110 01 s       1      13       0000 0000 0011 010 s
0      23       0000 0000 0110 00 s       1      14       0000 0000 0011 001 s
0      24       0000 0000 0101 11 s       1      15       0000 0000 0001 0011 s
0      25       0000 0000 0101 10 s       1      16       0000 0000 0001 0010 s
0      26       0000 0000 0101 01 s       1      17       0000 0000 0001 0001 s
0      27       0000 0000 0101 00 s       1      18       0000 0000 0001 0000 s
0      28       0000 0000 0100 11 s       6      3        0000 0000 0001 0100 s
0      29       0000 0000 0100 10 s       11     2        0000 0000 0001 1010 s
0      30       0000 0000 0100 01 s       12     2        0000 0000 0001 1001 s
0      31       0000 0000 0100 00 s       13     2        0000 0000 0001 1000 s
0      32       0000 0000 0011 000 s      14     2        0000 0000 0001 0111 s
0      33       0000 0000 0010 111 s      15     2        0000 0000 0001 0110 s
0      34       0000 0000 0010 110 s      16     2        0000 0000 0001 0101 s
0      35       0000 0000 0010 101 s      27     1        0000 0000 0001 1111 s
0      36       0000 0000 0010 100 s      28     1        0000 0000 0001 1110 s
0      37       0000 0000 0010 011 s      29     1        0000 0000 0001 1101 s
0      38       0000 0000 0010 010 s      30     1        0000 0000 0001 1100 s
0      39       0000 0000 0010 001 s      31     1        0000 0000 0001 1011 s

Note:
1. s = sign of level; “0” for positive; “1” for negative.

Table 12.18c. MPEG-1 Variable-Length Code Table for dct_coefficient_first and dct_coefficient_next.

Run    Fixed Length Code
0      0000 00
1      0000 01
2      0000 10
:      :
63     1111 11

Level    Fixed Length Code
–256     forbidden
–255     1000 0000 0000 0001
–254     1000 0000 0000 0010
:        :
–129     1000 0000 0111 1111
–128     1000 0000 1000 0000
–127     1000 0001
–126     1000 0010
:        :
–2       1111 1110
–1       1111 1111
0        forbidden
1        0000 0001
:        :
127      0111 1111
128      0000 0000 1000 0000
129      0000 0000 1000 0001
:        :
255      0000 0000 1111 1111

Table 12.19. Run, Level Encoding Following an Escape Code for dct_coefficient_first and dct_coefficient_next.
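The fixed-length run and level fields of Table 12.19 can be decoded as sketched below. This is an illustration under stated assumptions: the function name read_escape is hypothetical, the bitstream is represented as a string of “0” and “1” characters, and pos is assumed to point just past the escape code. Treating out-of-table values as errors is a choice made for this sketch.

```python
def read_escape(bits, pos=0):
    """Decode (run, level, new_pos) from the fields that follow an escape."""
    def take(n):
        nonlocal pos
        chunk = bits[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("bitstream truncated")
        pos += n
        return int(chunk, 2)

    run = take(6)                      # 6-bit run, 0-63
    first = take(8)
    if first == 0x00:                  # 16-bit form, levels 128 to 255
        level = take(8)
        if level < 128:
            raise ValueError("forbidden level code")
    elif first == 0x80:                # 16-bit form, levels -255 to -128
        level = take(8) - 256
        if not -255 <= level <= -128:
            raise ValueError("forbidden level code")
    elif first < 0x80:                 # 8-bit form, levels 1 to 127
        level = first
    else:                              # 8-bit form, levels -127 to -1
        level = first - 256
    return run, level, pos
```

The 8-bit form is an ordinary two's complement byte; the values 0x00 and 0x80 act as prefixes that select the 16-bit form.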
System Bitstream

The system bitstream multiplexes the audio and video bitstreams into a single bitstream, and formats it with control information into a specific protocol as defined by MPEG-1.

Packet data may contain either audio or video information. Up to 32 audio and 16 video streams may be multiplexed together. Two types of private data streams are also supported. One type is completely private; the other is used to support synchronization and buffer management.

Maximum packet sizes usually are about 2048 bytes, although much larger sizes are supported. When stored on CD-ROM, the length of the packs coincides with the sectors. Typically, there is one audio packet for every six or seven video packets.

Figure 12.6 illustrates the system bitstream, a hierarchical structure with three layers. From top to bottom the layers are:

ISO/IEC 11172 Layer
Pack
Packet

Note that start codes (0x000001xx) must be byte aligned by inserting 0–7 “0” bits before the start code.

[Figure 12.6. MPEG-1 System Bitstream Layer Structures. Marker and reserved bits not shown. ISO 11172 layer: a series of pack start code and pack pairs, ending with the ISO 11172 end code. Pack layer: system clock reference, mux rate, system header, packet N, packet N+1. Packet layer: packet start code, stream ID, packet length, STD data, PTS data, DTS data, packet N data.]

ISO/IEC 11172 Layer

ISO_11172_end_code
This 32-bit field has a value of 0x000001B9 and terminates a system bitstream.

Pack Layer

Data for each pack consists of a pack header followed by a system header (optional) and packet data. The structure is shown in Figure 12.6.

Pack_start_code
This 32-bit field has a value of 0x000001BA and identifies the start of a pack.

Fixed_bits
These four bits always have a value of “0010.”

System_clock_reference_32–30
The system_clock_reference (SCR) is a 33-bit number coded using three fields separated by marker bits. System_clock_reference indicates the intended time of arrival of the last byte of the system_clock_reference field at the input of the decoder. The value of system_clock_reference is the number of 90 kHz clock periods.

Marker_bit
This bit always has a value of “1.”

System_clock_reference_29–15

Marker_bit
This bit always has a value of “1.”

System_clock_reference_14–0

Marker_bit
This bit always has a value of “1.”

Marker_bit
This bit always has a value of “1.”

Mux_rate
This 22-bit binary number specifies the rate at which the decoder receives the bitstream. It specifies units of 50 bytes per second, rounded upwards. A value of zero is not allowed.

Marker_bit
This bit always has a value of “1.”

System Header

System_header_start_code
This 32-bit field has a value of 0x000001BB and identifies the start of a system header.

Header_length
This 16-bit binary number specifies the number of bytes in the system header following header_length.

Marker_bit
This bit always has a value of “1.”

Rate_bound
This 22-bit binary number specifies an integer value greater than or equal to the maximum value of mux_rate. It may be used by the decoder to determine if it is capable of decoding the entire bitstream.

Marker_bit
This bit always has a value of “1.”

Audio_bound
This 6-bit binary number, with a range of 0–32, specifies an integer value greater than or equal to the maximum number of simultaneously active audio streams.

Fixed_flag
This bit specifies fixed bit-rate (“1”) or variable bit-rate (“0”) operation.

CSPS_flag
This bit specifies whether the bitstream is a constrained system parameter stream (“1”) or not (“0”).

System_audio_lock_flag
This bit has a value of “1” if there is a constant relationship between the audio sampling rate and the decoder’s system clock frequency.
System_video_lock_flag
This bit has a value of “1” if there is a constant relationship between the video picture rate and the decoder’s system clock frequency.

Marker_bit
This bit always has a value of “1.”

Video_bound
This 5-bit binary number, with a range of 0–16, specifies an integer value greater than or equal to the maximum number of simultaneously active video streams.

Reserved_byte
These eight bits always have a value of “1111 1111.”

Stream_ID
This optional 8-bit field, as defined in Table 12.20, indicates the type and stream number to which the following STD_buffer_bound_scale and STD_buffer_size_bound fields refer. Each audio and video stream present in the system bitstream must be specified only once in each system header.

Stream Type                             Stream ID
all audio streams                       1011 1000
all video streams                       1011 1001
reserved stream                         1011 1100
private stream 1                        1011 1101
padding stream                          1011 1110
private stream 2                        1011 1111
audio stream number xxxxx               110x xxxx
video stream number xxxx                1110 xxxx
reserved data stream number xxxx        1111 xxxx

Table 12.20. MPEG-1 stream_ID Code.

STD_buffer_bound_scale
This optional 1-bit field specifies the scaling factor used to interpret STD_buffer_size_bound. For an audio stream, it has a value of “0.” For a video stream, it has a value of “1.” For other stream types, it can be either a “0” or a “1.” It is present only if stream_ID is present.

STD_buffer_size_bound
This optional 13-bit binary number specifies a value greater than or equal to the maximum decoder input buffer size. If STD_buffer_bound_scale = “0,” then STD_buffer_size_bound measures the size in units of 128 bytes. If STD_buffer_bound_scale = “1,” then STD_buffer_size_bound measures the size in units of 1024 bytes. It is present only if stream_ID is present.

Fixed_bits
This optional 2-bit field has a value of “11.” It is present only if stream_ID is present.

Packet Layer

Packet_start_code_prefix
This 24-bit field has a value of 0x000001.
Together with the stream_ID that follows, it indicates the start of a packet.

Stream_ID
This 8-bit binary number specifies the type and number of the bitstream present, as defined in Table 12.20.

Packet_length
This 16-bit binary number specifies the number of bytes in the packet after the packet_length field.

Stuffing_byte
This optional parameter has a value of “1111 1111.” Up to 16 consecutive stuffing_bytes may be used to meet the requirements of the storage medium. It is present only if stream_ID ≠ private stream 2.

STD_bits
These optional two bits have a value of “01” and indicate that the STD_buffer_scale and STD_buffer_size fields follow. This field may be present only if stream_ID ≠ private stream 2.

STD_buffer_scale
This optional 1-bit field specifies the scaling factor used to interpret STD_buffer_size. For an audio stream, it has a value of “0.” For a video stream, it has a value of “1.” For other stream types, it can be either a “0” or a “1.” This field is present only if STD_bits is present and stream_ID ≠ private stream 2.

STD_buffer_size
This optional 13-bit binary number specifies the size of the decoder input buffer. If STD_buffer_scale = “0,” then STD_buffer_size measures the size in units of 128 bytes. If STD_buffer_scale = “1,” then STD_buffer_size measures the size in units of 1024 bytes. This field is present only if STD_bits is present and stream_ID ≠ private stream 2.

PTS_bits
These optional 4 bits have a value of “0010” and indicate the following presentation time stamps are present. This field may be present only if stream_ID ≠ private stream 2.

Presentation_time_stamp_32–30
The optional presentation_time_stamp (PTS) is a 33-bit number coded using three fields, separated by marker bits. PTS indicates the intended time of display by the decoder. The value of PTS is the number of periods of a 90 kHz system clock. This field is present only if PTS_bits is present and stream_ID ≠ private stream 2.
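Reassembling the 33-bit PTS from its three fields can be sketched as follows. This is a minimal illustration, not a complete packet parser: the names parse_pts and pts_seconds are hypothetical, and the sketch assumes the five bytes passed in begin with PTS_bits (“0010”) and contain the three PTS fields, each followed by a marker bit.

```python
def parse_pts(data):
    """Decode a 33-bit PTS from 5 bytes laid out as:
    '0010', PTS[32..30], marker, PTS[29..15], marker, PTS[14..0], marker.
    """
    if len(data) != 5:
        raise ValueError("expected exactly 5 bytes")
    value = int.from_bytes(data, "big")          # 40 bits, MSB first
    if value >> 36 != 0b0010:
        raise ValueError("PTS_bits must be 0010")
    markers = ((value >> 32) & 1, (value >> 16) & 1, value & 1)
    if markers != (1, 1, 1):
        raise ValueError("marker bits must all be 1")
    high = (value >> 33) & 0x7                   # PTS bits 32-30
    mid = (value >> 17) & 0x7FFF                 # PTS bits 29-15
    low = (value >> 1) & 0x7FFF                  # PTS bits 14-0
    return (high << 30) | (mid << 15) | low


def pts_seconds(pts):
    """Convert a PTS in 90 kHz clock periods to seconds."""
    return pts / 90000
```

The system_clock_reference in the pack header uses the same three-field arrangement with the same “0010” prefix, so a similar helper could be written for it.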
Marker_bit
This optional bit always has a value of “1.” It is present only if PTS_bits is present and stream_ID ≠ private stream 2.

Presentation_time_stamp_29–15
This optional field is present only if PTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional bit always has a value of “1.” It is present only if PTS_bits is present and stream_ID ≠ private stream 2.

Presentation_time_stamp_14–0
This optional field is present only if PTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if PTS_bits is present and stream_ID ≠ private stream 2.

DTS_bits
These optional 4 bits have a value of “0011” and indicate the following presentation and decoding time stamps are present. This field may be present only if stream_ID ≠ private stream 2.

Presentation_time_stamp_32–30
The optional presentation_time_stamp (PTS) is a 33-bit number coded using three fields, separated by marker bits. PTS indicates the intended time of display by the decoder. The value of PTS is the number of periods of a 90 kHz system clock. This field is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Presentation_time_stamp_29–15
This optional field is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Presentation_time_stamp_14–0
This optional field is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.
Fixed_bits
This optional 4-bit field has a value of “0001.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Decoding_time_stamp_32–30
The optional decoding_time_stamp (DTS) is a 33-bit number coded using three fields, separated by marker bits. DTS indicates the intended time of decoding by the decoder of the first access unit that commences in the packet. The value of DTS is the number of periods of a 90 kHz system clock. It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Decoding_time_stamp_29–15
This optional field is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Decoding_time_stamp_14–0
This optional field is present only if DTS_bits is present and stream_ID ≠ private stream 2.

Marker_bit
This optional 1-bit field always has a value of “1.” It is present only if DTS_bits is present and stream_ID ≠ private stream 2.

NonPTS_nonDTS_bits
These optional 8 bits have a value of “0000 1111” and are present if the STD_bits field, PTS_bits field, or DTS_bits field (and their corresponding following fields) are not present.

Packet_data_byte
This is [n] bytes of data from the bitstream specified by the packet layer stream_ID. The number of data bytes may be determined from the packet_length parameter.

Video Decoding

A system demultiplexer parses the system bitstream, demultiplexing the audio and video bitstreams.

The video decoder essentially performs the inverse of the encoder. From the coded video bitstream, it reconstructs the I frames. Using I frames, additional coded data, and motion vectors, the P and B frames are generated. Finally, the frames are output in the proper order.

Fast Playback Considerations

Fast forward operation can be implemented by using D frames or the decoding only of I frames. However, decoding only I frames at the faster rate places a major burden on the transmission medium and the decoder.

Alternately, the source may be able to sort out the desired I frames and transmit just those frames, allowing the bit-rate to remain constant.

Pause Mode Considerations

This requires the decoder to be able to control the incoming bitstream. If it doesn’t, when playback resumes there may be a delay and skipped frames.

Reverse Playback Considerations

This requires the decoder to be able to decode each group of pictures in the forward direction, store them, and display them in reverse order. To minimize the storage requirements of the decoder, groups of pictures should be small or the frames may be reordered. Reordering can be done by transmitting frames in another order or by reordering the coded pictures in the decoder buffer.

Decode Postprocessing

The SIF data usually is converted to 720 × 480i or 720 × 576i. Suggested upsampling filters are discussed in the MPEG-1 specification. The original decoded lines correspond to Field 1. Field 2 uses interpolated lines.

Real-World Issues

System Bitstream Termination

A common error is the improper placement of sequence_end_code in the system bitstream. When this happens, some decoders may not know that the end of the video occurred, and output garbage.

Another problem occurs when a system bitstream is shortened just by eliminating trailing frames, removing sequence_end_code altogether. In this case, the decoder may be unsure when to stop.

Timecodes

Since some decoders rely on the timecode information, it should be implemented. To minimize problems, the video bitstream should start with a timecode of zero and increment by one each frame.

Variable Bit-Rates

Although variable bit-rates are supported, a constant bit-rate should be used if possible.
Since vbv_delay doesn’t make sense for a variable bit-rate, the MPEG-1 standard specifies that it be set to the maximum value. However, some decoders use vbv_delay with variable bit-rates. This could result in a 2–3 second delay before starting video, causing the first 60–90 frames to be skipped.

Constrained Bitstreams

Most MPEG-1 decoders can handle only the constrained parameters subset of MPEG-1. To ensure maximum compatibility, only the constrained parameters subset should be used.

Source Sample Clock

Good compression with few artifacts requires a video source that generates or uses a very stable sample clock. This ensures that the vertical alignment of samples over the entire picture is maintained. With poorly designed sample clock generation, the artifacts usually get worse towards the right side of the picture.

References

1. Digital Video Magazine, “Not All MPEGs Are Created Equal,” by John Toebes, Doug Walker, and Paul Kaiser, August 1995.
2. Digital Video Magazine, “Squeeze the Most From MPEG,” by Mark Magel, August 1995.
3. ISO/IEC 11172–1, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 1: Systems.
4. ISO/IEC 11172–2, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 2: Video.
5. ISO/IEC 11172–3, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 3: Audio.
6. ISO/IEC 11172–4, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 4: Compliance testing.
7. ISO/IEC 11172–5, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 5: Software simulation.
8. Watkinson, John, The Engineer’s Guide to Compression, Snell and Wilcox Handbook Series.

Chapter 13: MPEG-2

MPEG-2 extends MPEG-1 to cover a wider range of applications.
The MPEG-1 chapter should be reviewed to become familiar with the basics of MPEG before reading this chapter.

The primary application targeted during the definition process was all-digital transmission of broadcast-quality video at bit-rates of 4–9 Mbps. However, MPEG-2 is useful for many other applications, such as HDTV, and now supports bit-rates of 1.5–60 Mbps.

MPEG-2 is an ISO standard (ISO/IEC 13818), and consists of eleven parts:

systems                  ISO/IEC 13818–1
video                    ISO/IEC 13818–2
audio                    ISO/IEC 13818–3
conformance testing      ISO/IEC 13818–4
software simulation      ISO/IEC 13818–5
DSM-CC extensions        ISO/IEC 13818–6
advanced audio coding    ISO/IEC 13818–7
RTI extension            ISO/IEC 13818–9
DSM-CC conformance       ISO/IEC 13818–10
IPMP                     ISO/IEC 13818–11

As with MPEG-1, the compressed bitstreams implicitly define the decompression algorithms. The compression algorithms are up to the individual manufacturers, within the scope of an international standard.

The Digital Storage Media Command and Control (DSM-CC) extension (ISO/IEC 13818–6) is a toolkit for developing control channels associated with MPEG-2 streams. In addition to providing VCR-type features such as fast-forward, rewind, and pause, it may be used for a wide variety of other purposes, such as packet data transport. DSM-CC works in conjunction with next-generation packet networks, alongside Internet protocols such as RSVP, RTSP, RTP, and SCP.

The Real Time Interface (RTI) extension (ISO/IEC 13818–9) defines a common interface point to which terminal equipment manufacturers and network operators can design. RTI specifies a delivery model for the bytes of an MPEG-2 System stream at the input of a real decoder, whereas MPEG-2 System defines an idealized byte delivery schedule.

IPMP (Intellectual Property Management and Protection) is a digital rights management (DRM) standard, adapted from the MPEG-4 IPMP extension specification. Rather than a complete system, a variety of functions are provided within a framework.
Audio Overview

In addition to the non-backwards-compatible audio extension (ISO/IEC 13818–7), MPEG-2 supports up to five full-bandwidth channels compatible with MPEG-1 audio coding. It also extends the coding of MPEG-1 audio to half sampling rates (16 kHz, 22.05 kHz, and 24 kHz) for improved quality at bit-rates at or below 64 kbps per channel.

MPEG-2.5 is an unofficial, yet common, extension to the audio capabilities of MPEG-2. It adds sampling rates of 8 kHz, 11.025 kHz, and 12 kHz.

Video Overview

With MPEG-2, profiles specify the syntax (i.e., algorithms) and levels specify various parameters (resolution, frame rate, bit-rate, etc.). Main Profile@Main Level is targeted for SDTV applications, while Main Profile@High Level is targeted for HDTV applications.

Levels

MPEG-2 supports four levels, which specify resolution, frame rate, coded bit-rate, and so on for a given profile.

Low Level (LL)
MPEG-1 Constrained Parameters Bitstream (CPB), supporting up to 352 × 288 at up to 30 frames per second. Maximum bit-rate is 4 Mbps.

Main Level (ML)
MPEG-2 Constrained Parameters Bitstream (CPB), supporting up to 720 × 576 at up to 30 frames per second. It is intended for SDTV applications. Maximum bit-rate is 15–20 Mbps.

High 1440 Level
This level supports up to 1440 × 1088 at up to 60 frames per second and is intended for HDTV applications. Maximum bit-rate is 60–80 Mbps.

High Level (HL)
High Level supports up to 1920 × 1088 at up to 60 frames per second and is intended for HDTV applications. Maximum bit-rate is 80–100 Mbps.

Profiles

MPEG-2 supports seven profiles, which specify which coding syntax (algorithms) is used. Tables 13.1 through 13.8 illustrate the various combinations of levels and profiles allowed.

Simple Profile (SP)
Main profile without the B frames, intended for software applications and perhaps digital cable TV.
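The four levels described earlier in this overview amount to a small table of limits, so a quick check of which level a given video format needs can be sketched in a few lines. The following Python fragment is an illustrative sketch only: the names Level, LEVELS, and smallest_level are hypothetical, and the bit-rate limit uses the lower end of each quoted range as a conservative assumption, since the exact maximum depends on the profile. Real conformance also involves limits not listed here, such as sample rate and buffer size.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    name: str
    max_width: int      # samples per line
    max_height: int     # lines per frame
    max_fps: int        # frames per second
    max_mbps: float     # ASSUMPTION: lower end of the quoted bit-rate range


# Limits copied from the level descriptions in the text, smallest first.
LEVELS = [
    Level("Low", 352, 288, 30, 4),
    Level("Main", 720, 576, 30, 15),
    Level("High 1440", 1440, 1088, 60, 60),
    Level("High", 1920, 1088, 60, 80),
]


def smallest_level(width: int, height: int, fps: float, mbps: float) -> Optional[str]:
    """Return the name of the first level whose limits cover the format, or None."""
    for level in LEVELS:
        if (width <= level.max_width and height <= level.max_height
                and fps <= level.max_fps and mbps <= level.max_mbps):
            return level.name
    return None


if __name__ == "__main__":
    print(smallest_level(352, 240, 30, 1.5))     # Low
    print(smallest_level(720, 480, 30, 6))       # Main
    print(smallest_level(1920, 1080, 30, 19.4))  # High
```

A format that exceeds every level, for example 1920 × 1080 at 120 frames per second, returns None. Whether a given profile is actually permitted at the resulting level is a separate question, answered by the profile and level combination tables.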
Main Profile (MP)
Supported by most MPEG-2 decoder chips, it should satisfy 90% of the consumer SDTV and HDTV applications. Typical resolutions are shown in Table 13.6.

Multiview Profile (MVP)
By using existing MPEG-2 tools, it is possible to encode video from two cameras shooting the same scene with a small angle difference.

4:2:2 Profile (422P)
Previously known as “studio profile,” this profile uses 4:2:2 YCbCr instead of 4:2:0, and with main level, increases the maximum bitrate up to 50 Mbps (300 Mbps with high level). It was added to support pro-video SDTV and HDTV requirements.

             Nonscalable Profiles                   Scalable Profiles
Level        Simple   Main   Multiview   4:2:2      SNR    Spatial   High
high         –        yes    –           yes        –      –         yes
high 1440    –        yes    –           –          –      yes       yes
main         yes      yes    yes         yes        yes    –         yes
low          –        yes    –           –          yes    –         –

Table 13.1. MPEG-2 Acceptable Combinations of Levels and Profiles.

Nonscalable Profiles
Constraint                             Simple     Main       Multiview   4:2:2
chroma format                          4:2:0      4:2:0      4:2:0       4:2:0 or 4:2:2
picture types                          I, P       I, P, B    I, P, B     I, P, B
scalable modes                         –          –          Temporal    –
intra dc precision (bits)              8, 9, 10   8, 9, 10   8, 9, 10    8, 9, 10, 11
sequence scalable extension            no         no         yes         no
picture spatial scalable extension     no         no         no          no
picture temporal scalable extension    no         no         yes         no

Scalable Profiles
Constraint                             SNR        Spatial          High
chroma format                          4:2:0      4:2:0            4:2:0 or 4:2:2
picture types                          I, P, B    I, P, B          I, P, B
scalable modes                         SNR        SNR or Spatial   SNR or Spatial
intra dc precision (bits)              8, 9, 10   8, 9, 10         8, 9, 10, 11
sequence scalable extension            yes        yes              yes
picture spatial scalable extension     no         yes              yes
picture temporal scalable extension    no         no               no

repeat first field (merged cells, left to right across the seven profiles): constrained, unconstrained, constrained, unconstrained

Table 13.2. Some MPEG-2 Profile C