DIGITAL COMMUNICATION
Second Edition

Edward A. Lee, University of California at Berkeley
David G. Messerschmitt, University of California at Berkeley

Kluwer Academic Publishers
Boston / Dordrecht / London

Distributors for North America: Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061 USA

Distributors for all other countries: Kluwer Academic Publishers Group, Distribution Centre, Post Office Box 322, 3300 AH Dordrecht, THE NETHERLANDS

Library of Congress Cataloging-in-Publication Data

Lee, Edward A., 1957-
Digital communication / Edward A. Lee and David G. Messerschmitt. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-7923-9391-0 (acid-free paper)
1. Digital communications. I. Messerschmitt, David G. II. Title.
TK5103.7.L44 1994
621.382--dc20
93-26197
CIP

Copyright © 1994 by Kluwer Academic Publishers. Fifth Printing 1999.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photo-copying, recording, or otherwise, without the prior written permission of the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061.

CONTENTS

PREFACE
NOTES TO THE INSTRUCTOR

PART I: THE BASICS

1 INTRODUCTION 1
1.1 APPLICATIONS OF DIGITAL COMMUNICATION 2
1.2 DIGITAL vs. ANALOG COMMUNICATIONS 5
1.3 PLAN OF THE BOOK 7
1.4 FURTHER READING 8

2 DETERMINISTIC SIGNAL PROCESSING 11
2.1 SIGNALS 11
2.2 LTI SYSTEMS AND FOURIER TRANSFORMS 13
2.3 THE NYQUIST SAMPLING THEOREM 15
2.4 PASSBAND SIGNALS and MODULATION 17
2.5 Z TRANSFORMS AND RATIONAL TRANSFER FUNCTIONS 21
2.6 SIGNAL SPACE REPRESENTATIONS 31
2.7 FURTHER READING 39
2-A SUMMARY OF FOURIER TRANSFORM PROPERTIES 39
2-B SPECTRAL FACTORIZATION 41

3 STOCHASTIC SIGNAL PROCESSING 48
3.1 RANDOM VARIABLES 48
3.2 RANDOM PROCESSES 57
3.3 MARKOV CHAINS 68
3.4 THE POISSON PROCESS AND QUEUEING 75
3.5 FURTHER READING 85
3-A POWER SPECTRUM OF A CYCLOSTATIONARY PROCESS 86
3-B POWER SPECTRUM OF A MARKOV CHAIN 87
3-C DERIVATION OF POISSON PROCESS 90
3-D MOMENT GENERATING FUNCTION OF SHOT NOISE 91

4 LIMITS OF COMMUNICATION 97
4.1 JUST ENOUGH INFORMATION ABOUT ENTROPY 99
4.2 CAPACITY OF DISCRETE-TIME CHANNELS 102
4.3 FURTHER READING 110
4-A ASYMPTOTIC EQUIPARTITION THEOREM 110

5 PHYSICAL MEDIA AND CHANNELS 115
5.1 COMPOSITE CHANNELS 116
5.2 TRANSMISSION LINES 119
5.3 OPTICAL FIBER 127
5.4 MICROWAVE RADIO 142
5.5 TELEPHONE CHANNELS 160
5.6 MAGNETIC RECORDING CHANNELS 167
5.7 FURTHER READING 171

PART II: MODULATION AND DETECTION

6 MODULATION 178
6.1 AN OVERVIEW OF BASIC PAM TECHNIQUES 179
6.2 PULSE SHAPES 187
6.3 BASEBAND PAM 191
6.4 PASSBAND PAM 199
6.5 ALPHABET DESIGN 213
6.6 THE MATCHED FILTER - ISOLATED PULSE CASE 224
6.7 SPREAD SPECTRUM 229
6.8 ORTHOGONAL MULTIPULSE MODULATION 230
6.9 COMBINED PAM AND MULTIPULSE MODULATION 249
6.10 OPTICAL FIBER RECEPTION 261
6.11 MAGNETIC RECORDING 262
6.12 FURTHER READING 263
6-A MODULATING RANDOM PROCESSES 263
6-B THE GENERALIZED NYQUIST CRITERION 266

7 SIGNAL and RECEIVER DESIGN 279
7.1 SIGNAL MODEL 282
7.2 SPECIFIC MODULATION TECHNIQUES 286
7.3 PAM WITH INTERSYMBOL INTERFERENCE 294
7.4 BANDWIDTH and SIGNAL DIMENSIONALITY 304
7.5 FURTHER READING 307

8 NOISE 311
8.1 COMPLEX-VALUED GAUSSIAN PROCESSES 311
8.2 FUNDAMENTAL RESULTS 316
8.3 PERFORMANCE of PAM 320
8.4 PERFORMANCE of MINIMUM-DISTANCE RECEIVERS 329
8.5 PAM with ISI 334
8.6 SPREAD SPECTRUM 337
8.7 CAPACITY AND MODULATION 344
8.8 QUANTUM NOISE in OPTICAL SYSTEMS 360
8.9 FURTHER READING 371

9 DETECTION 378
9.1 DETECTION OF A SINGLE REAL-VALUED SYMBOL 380
9.2 DETECTION OF A SIGNAL VECTOR 385
9.3 KNOWN SIGNALS IN GAUSSIAN NOISE 390
9.4 OPTIMAL INCOHERENT DETECTION 402
9.5 OPTIMAL DETECTORS for PAM WITH ISI 406
9.6 SEQUENCE DETECTION: THE VITERBI ALGORITHM 409
9.7 SHOT NOISE SIGNAL WITH KNOWN INTENSITY 424
9.8 FURTHER READING 427
9-A KARHUNEN-LOEVE EXPANSION 428
9-B GENERAL ML AND MAP SEQUENCE DETECTORS 430
9-C BIT ERROR PROBABILITY FOR SEQUENCE DETECTORS 432

10 EQUALIZATION 442
10.1 OPTIMAL ZERO-FORCING EQUALIZATION 445
10.2 GENERALIZED EQUALIZATION METHODS 464
10.3 FRACTIONALLY SPACED EQUALIZER 482
10.4 TRANSVERSAL FILTER EQUALIZERS 486
10.5 ISI and CHANNEL CAPACITY 487
10.6 FURTHER READING 511
10-A DFE ERROR PROPAGATION 511

11 ADAPTIVE EQUALIZATION 517
11.1 CONSTRAINED-COMPLEXITY EQUALIZERS 519
11.2 ADAPTIVE LINEAR EQUALIZER 532
11.3 ADAPTIVE DFE 541
11.4 FRACTIONALLY SPACED EQUALIZER 543
11.5 PASSBAND EQUALIZATION 546
11.6 FURTHER READING 549
11-A SG ALGORITHM ERROR VECTOR NORM 549

PART III: CODING

12 SPECTRUM CONTROL 555
12.1 GOALS OF LINE CODES 556
12.2 LINE CODE OPTIONS 558
12.3 FILTERING FOR SPECTRUM CONTROL 573
12.4 CONTINUOUS-PHASE MODULATION 589
12.5 SCRAMBLING 591
12.6 FURTHER READING 597
12-A MAXIMAL-LENGTH FEEDBACK SHIFT REGISTERS 598

13 ERROR CONTROL 609
13.1 BLOCK CODES 613
13.2 CONVOLUTIONAL CODES 626
13.3 HISTORICAL NOTES AND FURTHER READING 636
13-A LINEARITY OF CODES 637
13-B PATH ENUMERATORS 642

14 SIGNAL-SPACE CODING 650
14.1 MULTIDIMENSIONAL SIGNAL CONSTELLATIONS 652
14.2 TRELLIS CODES 668
14.3 COSET CODES 684
14.4 SIGNAL-SPACE CODING AND ISI 688
14.5 FURTHER READING 694

PART IV: SYNCHRONIZATION

15 PHASE-LOCKED LOOPS
15.1 IDEAL CONTINUOUS-TIME PLL 702
15.2 DISCRETE-TIME PLLs
15.3 PHASE DETECTORS 713
15.4 VARIATIONS ON A THEME: VCOs 718
15.5 FURTHER READING 720

16 CARRIER RECOVERY
16.1 DECISION-DIRECTED CARRIER RECOVERY 726
16.2 POWER OF N CARRIER RECOVERY 733
16.3 FURTHER READING 734

17 TIMING RECOVERY
17.1 TIMING RECOVERY PERFOR.

Figure 1-3. A chain of regenerative repeaters reduces the effect of cascaded degradations by regenerating the digital signal at intermediate points in the transmission.

SEC. 1.2 DIGITAL vs. ANALOG COMMUNICATIONS 7

• The multiplexing and switching of digital signals is much lower in cost than for analog signals. This is particularly true for multiplexing, because frequency-division multiplexing of analog signals requires complicated filters (Chapter 18).
• Some economical media, such as optical fiber and laser disks, are better suited to digital transmission than analog.
• Digital communication of analog waveforms such as voice and video requires an additional step of sampling and analog-to-digital conversion. The cost of this conversion was initially an impediment to the widespread use of digital communication, but with integrated circuit technology this cost is rapidly becoming insignificant, particularly for voiceband signals.
• Regenerative repeaters for digital communication are considerably more complicated than their analog counterparts (which are just amplifiers). However, the capacity of these systems is so large that this added cost is insignificant to individual users.
• With modern compression and transmission technology (the latter being the subject of this book), PCM transmission of analog signals can be accomplished with less bandwidth than analog transmission of the same signal. This characteristic is critical in radio transmission, because the radio spectrum is in short supply. It can also have important economic advantages for other media, such as the telephone subscriber loop and coaxial cable television.
• Digital communication raises complicated synchronization issues (Chapters 15-19) that are largely avoided in analog communication.
The bottom line is that it took a while, about 20 years, for digital communication to almost completely supplant its analog competitors, but that revolution is now nearly complete. This is the result of a combination of economic factors, technological advances, and demands for new services.

1.3. PLAN OF THE BOOK

This book concentrates on the techniques used to design a digital communication system starting with any of the common physical media. Our concern is thus with how to get a bit stream from one location to another, and not so much with how this bit stream is used. In the particular context of a computer network, this aspect of the system is called the physical layer. We also address the problem of multiple bit streams sharing a common medium, called multiple-access.

Chapters 2-4 cover some basics required for the understanding of later material. Many readers will have a prior background in many of these basics, in which case only a superficial reading to pick up notation is required. We have also covered some basic topics with which the reader may not be so familiar. These include spectral factorization of rational transfer functions (Chapter 2), signal space (Chapter 2), Markov chains and Poisson processes (Chapter 3), and information-theoretic bounds (Chapter 4). The characteristics of the physical media commonly encountered are covered in Chapter 5.

Chapters 6-14 cover the theory of modulation, detection, and coding that is necessary to understand how a single bit stream is transported over a physical medium. The need for this theory arises because all physical media are analog and continuous-time in nature. It is ironic that much of the design of a digital communication system is inevitably related to the analog and continuous-time nature of the medium, even though this is not evident at the abstracted interface to the user.
The design of a digital communication system or network raises many difficult synchronization issues that are covered in Chapters 15-17. Often a large part of the effort in the design of a digital communication system involves phase-locked loops, timing, and carrier recovery.

A complete telecommunication network requires that many bit streams originating with many different users be transported simultaneously while sharing facilities and media. This leads to the important topic of multiple access, in which more than one user shares a single physical medium for transmission. This is covered in Chapters 18-19.

1.4. FURTHER READING

There are a number of excellent books on digital communication. While these books have a somewhat different emphasis from this one, they provide very useful supplementary material. The books by Roden [1], Benedetto, Biglieri, and Castellani [2], and Gitlin, Hayes, and Weinstein [3] cover similar material to this one, perhaps with a bit less practical emphasis. The books by Blahut [4] and Bingham [5] are valued for their practical orientation. Two texts provide additional detail on topics in this book: the recent book by Proakis [6] is an excellent treatise on applied information theory and advanced topics such as coding, spread spectrum, and multipath channels; the book by Viterbi and Omura [7] gives a detailed treatment of source and channel coding as applied to digital communication, as does Biglieri, Divsalar, McLane, and Simon [8]. An excellent treatment of statistical communication theory as applied to digital communication is given by Schwartz [9]. On the topics of modulation, equalization, and coding, the book by Lucky, Salz, and Weldon is somewhat dated but still recommended reading [10]. The same applies to the book by Wozencraft and Jacobs, which emphasizes principles of detection [11]. Books by Keiser and Strange [12] and Bellamy [13] give broad coverage of digital transmission at a descriptive level.
Practical details of digital transmission can be found in a book published by AT&T Technologies [14], in the book by Bylanski and Ingram [15], and, for the particular case of PCM encoding, in the book by Cattermole [16]. A couple of books expand on our brief description of digital switching, including McDonald [17] and Pearce [18]. For the information theory background that gives a solid theoretical foundation for digital communication, the books by Gallager [19], Cover and Thomas [20], Blahut [21], and McEliece [22] are recommended. Schwartz [23] and Bertsekas and Gallager [24] are recommended comprehensive texts on computer networks. There are also many elementary texts that cover both digital and analog communication, as well as the basic systems, transforms, and random process theory. Simulation techniques for communication systems are covered comprehensively in Jeruchim, Balaban, and Shanmugan [25].

PROBLEMS

1-1. For an A/D converter, define a signal-to-error ratio as the signal power divided by the quantization error power, expressed in dB. A uniform quantizer, which has equally spaced thresholds, has two parameters: the number of bits n and the step size d.
(a) If we were to increase n by one, to n + 1, for the same input signal, what would be the appropriate change to d?
(b) Without doing a detailed analysis, what do you suppose would be the effect on signal-to-error ratio of increasing from n to n + r bits/sample?
(c) What effect will this same change have on the bit rate?
(d) Using the prior results, what is the form of the relationship between signal-to-error ratio and the bit rate? (You may have unknown constants in your equation.)

1-2. An analog signal is transmitted using a PCM system. Discuss qualitatively the effects of bit errors on the recovered analog signal.

1-3. Discuss qualitatively the sources of delay that you would expect in a PCM system.

1-4.
Suppose you have a source of data that outputs a bit stream with a bit rate that varies with time, but also has a peak or maximum bit rate. Describe qualitatively how you might transmit this bit stream over a link that provides a constant bit rate.

REFERENCES

1. M. S. Roden, Digital and Data Communication Systems, Prentice-Hall, Englewood Cliffs, NJ (1982).
2. S. Benedetto, E. Biglieri, and V. Castellani, Digital Transmission Theory, Prentice-Hall, Inc., Englewood Cliffs, NJ (1987).
3. R. D. Gitlin, J. F. Hayes, and S. B. Weinstein, Data Communications Principles, Plenum Press, New York and London (1992).
4. R. E. Blahut, Digital Transmission of Information, Addison-Wesley (1990).
5. J. A. C. Bingham, The Theory and Practice of Modem Design, John Wiley & Sons, New York (1988).
6. J. G. Proakis, Digital Communications, Second Edition, McGraw-Hill Book Co., New York (1989).
7. A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, McGraw-Hill (1979).
8. E. Biglieri, D. Divsalar, P. J. McLane, and M. K. Simon, Introduction to Trellis-Coded Modulation with Applications, Macmillan, New York (1991).
9. M. Schwartz, Information Transmission, Modulation, and Noise, McGraw-Hill, New York (1980).
10. R. W. Lucky, J. Salz, and E. J. Weldon, Jr., Principles of Data Communication, McGraw-Hill Book Co., New York (1968).
11. J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering, Wiley, New York (1965).
12. B. E. Keiser and E. Strange, Digital Telephony and Network Integration, Van Nostrand Reinhold, New York (1985).
13. J. Bellamy, Digital Telephony, John Wiley, New York (1982).
14. Bell Laboratories Members of Technical Staff, Transmission Systems for Communications, Western Electric Co., Winston-Salem, NC (1970).
15. P. Bylanski and D. G. W. Ingram, Digital Transmission Systems, Peter Peregrinus Ltd., Stevenage, England (1976).
16. K. W.
Cattermole, Principles of Pulse Code Modulation, Iliffe Books Ltd., London, England (1969).
17. J. C. McDonald, Fundamentals of Digital Switching, Plenum Press, New York (1983).
18. J. G. Pearce, Telecommunications Switching, Plenum, New York (1981).
19. R. Gallager, Information Theory and Reliable Communication, John Wiley and Sons, Inc., New York (1968).
20. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley (1991).
21. R. E. Blahut, Principles and Practice of Information Theory, Addison-Wesley (1987).
22. R. J. McEliece, The Theory of Information and Coding, Addison-Wesley Pub. Co. (1977).
23. M. Schwartz, Telecommunication Networks: Protocols, Modeling, and Analysis, Addison-Wesley, Reading, Mass. (1987).
24. D. Bertsekas and R. Gallager, Data Networks, Prentice-Hall (1987).
25. M. C. Jeruchim, P. Balaban, and K. S. Shanmugan, Simulation of Communication Systems, Plenum Press, New York (1992).

DETERMINISTIC SIGNAL PROCESSING

In this chapter we review some basic concepts in order to establish the notation used in the remainder of the book. In addition, we cover in more detail several specific topics that some readers may not be familiar with, including complex signals and systems, the convergence of bilateral Z-transforms, and signal space geometry. The latter allows simple geometric interpretation of many signal processing operations, and demonstrates relationships among many seemingly disparate topics.

2.1. SIGNALS

A continuous-time signal is a function x(t) of the real-valued variable t, usually denoting time. A discrete-time signal is a sequence {x_k}, where k usually indexes a discrete progression in time. Throughout this book we will see systems containing both continuous-time and discrete-time signals. Often a discrete-time signal results from sampling a continuous-time signal; this is written x_k = x(kT), where T is the sampling interval, and 2π/T is the sampling frequency, in radians per second.
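As a concrete illustration of the sampling relation x_k = x(kT) (a sketch, not from the text), the snippet below samples a sinusoid; the signal, its frequency, and the sampling interval T are arbitrary illustrative assumptions:

```python
import math

# Illustrative continuous-time signal: x(t) = cos(2*pi*f0*t), with f0 = 50 Hz (arbitrary choice).
f0 = 50.0

def x(t):
    return math.cos(2.0 * math.pi * f0 * t)

# Sampling interval T = 1 ms, i.e. sampling frequency 2*pi/T rad/s (1/T = 1000 Hz, well above 2*f0).
T = 1.0e-3

# The discrete-time signal x_k = x(kT), for the first few values of k.
xk = [x(k * T) for k in range(8)]
print(xk)
```

Because 1/T here exceeds twice the signal frequency, these samples determine x(t) completely, a point made precise by the sampling theorem of Section 2.3.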
The sampling operation can be represented as

$$x_k = x(kT) = \int_{-\infty}^{\infty} x(t)\,\delta(t - kT)\,dt\,, \qquad (2.1)$$

where δ(t) is the Dirac delta function, or continuous-time impulse. The discrete-time signal x_k has a continuous-time pulse amplitude modulation (PAM) representation

$$\tilde{x}(t) = \sum_{k=-\infty}^{\infty} x_k\,\delta(t - kT)\,, \qquad (2.2)$$

in terms of impulses. A continuous-time signal can be constructed from a discrete-time signal as represented symbolically in Figure 2-1. A discrete-time input to a continuous-time system implies first the generation of the continuous-time impulse train in (2.2), and then its application to a continuous-time filter F(jω), yielding

$$y(t) = \sum_{k=-\infty}^{\infty} x_k\,f(t - kT)\,. \qquad (2.3)$$

2.1.1. Complex-Valued Signals

In digital communication systems, complex-valued signals are often a convenient mathematical representation for a pair of real-valued signals. A complex-valued signal consists of a real signal and an imaginary signal, which may be visualized as two voltages induced across two resistors or two sequences of numbers.

Example 2-1. A complex-valued signal we encounter frequently is the complex exponential,

$$x_k = e^{-j\omega kT} = \cos(\omega kT) - j\sin(\omega kT)\,, \qquad x(t) = e^{-j\omega t} = \cos(\omega t) - j\sin(\omega t)\,. \qquad (2.4)$$

We consistently use j to represent √−1. □

Complex-valued signals are processed just as real-valued signals are, except that the rules of complex arithmetic are followed.

Exercise 2-1. Draw diagrams specifying the addition and multiplication of two complex-valued continuous-time signals in terms of real-valued additions and multiplications. □

Figure 2-1. Construction of a continuous-time signal from a discrete-time signal. When we show a discrete-time input to a continuous-time system, we imply first the generation of the impulse train in (2.2). An example is shown above the system.

The real part of the signal x(t) is written Re{x(t)} and the imaginary part Im{x(t)}.
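The complex exponential of Example 2-1 is easy to check numerically with Python's complex arithmetic; in the sketch below, the values of ω and T are arbitrary assumptions chosen only for illustration:

```python
import cmath
import math

# Illustrative parameters (not from the text): omega = 2*pi*100 rad/s, T = 1 ms.
omega = 2.0 * math.pi * 100.0
T = 1.0e-3

# Verify (2.4): e^{-j*omega*k*T} = cos(omega*k*T) - j*sin(omega*k*T), for several k.
for k in range(8):
    xk = cmath.exp(-1j * omega * k * T)
    rhs = math.cos(omega * k * T) - 1j * math.sin(omega * k * T)
    assert abs(xk - rhs) < 1e-12
    assert abs(abs(xk) - 1.0) < 1e-12  # a complex exponential has unit modulus
print("Euler's identity holds at the sample points")
```

The unit-modulus check is the numerical counterpart of |e^{−jωkT}| = 1, which is why the complex exponential carries phase information only.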
In addition, we write the complex conjugate of a signal x(t) as x*(t), and the squared modulus as |x(t)|². We don't use any special notation to distinguish real-valued from complex-valued signals because it will generally be clear from context. Complex signals will often be represented in block diagrams using double lines, as shown in Figure 2-2b.

2.1.2. Energy and Average Power

The energy of a signal x(t) or {x_k} is defined to be

$$\int_{-\infty}^{\infty} |x(t)|^2\,dt \qquad\text{or}\qquad \sum_{k=-\infty}^{\infty} |x_k|^2\,. \qquad (2.5)$$

The average power is

$$\lim_{\tau\to\infty} \frac{1}{2\tau}\int_{-\tau}^{\tau} |x(t)|^2\,dt \qquad\text{or}\qquad \lim_{K\to\infty} \frac{1}{(2K+1)T}\sum_{k=-K}^{K} |x_k|^2\,. \qquad (2.6)$$

2.2. LTI SYSTEMS AND FOURIER TRANSFORMS

The Fourier transform is valuable in the analysis of modulation systems and linear time-invariant systems. For the convenience of the reader, the properties of both discrete- and continuous-time Fourier transforms are summarized in Appendix 2-A. In this section we establish notation and review a few basic facts.

2.2.1. Linear Time-Invariant (LTI) Systems

If a system is linear and time-invariant (LTI), then it is characterized by its impulse response h_k (for a discrete-time system) or h(t) (for a continuous-time system). The output of the LTI system can be expressed in terms of the input and impulse response as a convolution; for the discrete-time case,

$$y_k = x_k * h_k = \sum_{m=-\infty}^{\infty} x_m\,h_{k-m}\,, \qquad (2.7)$$

and the continuous-time case,

$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t - \tau)\,d\tau\,. \qquad (2.8)$$

An LTI system is real if its impulse response is real-valued, and complex if its impulse response is complex-valued. A complex system can be represented, using the rules of complex arithmetic, as a set of four real systems, as shown in Figure 2-2.

Figure 2-2. A complex-valued LTI system with a complex-valued input and output.

Exercise 2-2. Show that if a complex system has a real-valued input it can be implemented using two real systems and sketch the configuration.
Show that the same is true if a real system has a complex-valued input, and again sketch the configuration. □

Exercise 2-3. The notion of linearity extends to complex LTI systems. Demonstrate that if the four real systems required to implement a complex system are linear, then the resulting complex system is linear. It follows immediately that real-valued LTI systems are linear with respect to complex-valued inputs. □

2.2.2. The Fourier Transform

The Fourier transform pair for a continuous-time signal x(t) is

$$X(j\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt\,, \qquad x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(j\omega)\,e^{j\omega t}\,d\omega\,, \qquad (2.9)$$

while the discrete-time Fourier transform (DTFT) pair for x_k is

$$X(e^{j\omega T}) = \sum_{k=-\infty}^{\infty} x_k\,e^{-j\omega kT}\,, \qquad x_k = \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} X(e^{j\omega T})\,e^{j\omega kT}\,d\omega\,. \qquad (2.10)$$

The notation X(e^{jωT}) deserves some explanation. X(e^{jωT}) is the Z-transform X(z), defined as

$$X(z) = \sum_{k=-\infty}^{\infty} x_k\,z^{-k}\,, \qquad (2.11)$$

evaluated at z = e^{jωT}. Furthermore, the argument of the function, e^{jωT}, is periodic in ω, emphasizing that the DTFT itself is periodic in ω with period equal to the sampling rate 2π/T. The j in X(jω) comes from the observation that X(jω) is the Laplace transform X(s) evaluated at s = jω.

If h(t) is the impulse response of a continuous-time system, then the Laplace transform H(s) is called the transfer function, and the Fourier transform H(jω) is called the frequency response. Correspondingly, for a discrete-time impulse response h_k, the transfer function is H(z) and the frequency response is H(e^{jωT}). Discrete-time and continuous-time systems will often be distinguished only by the form of the argument of their transfer function or frequency response.

Exercise 2-4. Starting with the convolution, show that the Fourier transform of the output of an LTI system is

$$Y(j\omega) = H(j\omega)\,X(j\omega)\,, \qquad Y(e^{j\omega T}) = H(e^{j\omega T})\,X(e^{j\omega T}) \qquad (2.12)$$

for the continuous-time and discrete-time cases, where X(jω) and X(e^{jωT}) are the Fourier transforms of the input signals.
□

The magnitude of the frequency response |H(jω)| or |H(e^{jωT})| is called the magnitude response. The argument of the frequency response arg(H(jω)) or arg(H(e^{jωT})) is called the phase response. The reason for these terms is explored in Problem 2-2.

A fundamental result allows us to analyze any system with a combination of continuous-time and discrete-time signals.

Exercise 2-5. Given the definition (2.2) of a continuous-time PAM signal x̃(t) derived from a discrete-time signal x_k, show that for all ω

$$\tilde{X}(j\omega) = X(e^{j\omega T})\,. \qquad (2.13)$$

In words, the Fourier transform of a PAM representation of a discrete-time signal is equal to the DTFT of the discrete-time signal for all ω. □

2.3. THE NYQUIST SAMPLING THEOREM

Suppose that we sample a continuous-time signal x(t) to get

$$x_k = x(kT)\,. \qquad (2.14)$$

From (2.2) we obtain

$$\tilde{x}(t) = x(t)\sum_{m=-\infty}^{\infty} \delta(t - mT)\,. \qquad (2.15)$$

Multiplication in the time domain corresponds to convolution in the frequency domain, so

$$\tilde{X}(j\omega) = \frac{1}{2\pi}\,X(j\omega) * \left[\frac{2\pi}{T}\sum_{m=-\infty}^{\infty}\delta\!\left(\omega - \frac{2\pi m}{T}\right)\right] = \frac{1}{T}\int_{-\infty}^{\infty} X(j\Omega)\sum_{m=-\infty}^{\infty}\delta\!\left(\omega - \Omega - \frac{2\pi m}{T}\right)d\Omega = \frac{1}{T}\sum_{m=-\infty}^{\infty} X\!\left(j\!\left(\omega - \frac{2\pi m}{T}\right)\right). \qquad (2.16)$$

Combining this with (2.13) we get the very important relation

$$X(e^{j\omega T}) = \frac{1}{T}\sum_{m=-\infty}^{\infty} X\!\left(j\!\left(\omega - \frac{2\pi m}{T}\right)\right). \qquad (2.17)$$

This fundamental sampling theorem relates the signals x(t) and x_k in the frequency domain. Systems with both discrete- and continuous-time signals can now be handled easily.

Exercise 2-6. Use (2.17) to show that the frequency response of a completely discrete-time system equivalent to that in Figure 2-3 is

$$F(e^{j\omega T}) = \frac{1}{T}\sum_{m=-\infty}^{\infty} F\!\left(j\!\left(\omega + \frac{2\pi m}{T}\right)\right). \qquad (2.18)$$

□

Notice that in (2.17) a component of X(jω) at any ω = ω₀ is indistinguishable in the sampled version from a component at ω = ω₀ + 2πm/T for any integer m ≠ 0. This phenomenon is called aliasing.

Example 2-2. Given a signal x(t) with Fourier transform X(jω) shown in Figure 2-4a, the Fourier transform of the sampled signal X̃(jω) = X(e^{jωT}) is shown in Figure 2-4b.
The overlap evident in Figure 2-4b, called aliasing distortion, makes it very difficult to recover x(t) from its samples. □

Figure 2-3. A discrete-time system using a continuous-time filter.

Figure 2-4. The Fourier transform of a continuous-time signal (a) and its sampled version (b), where the sample rate is 2π/T.

Exercise 2-7. (Nyquist sampling theorem.) Show that, from (2.17), a continuous-time signal can be reconstructed from its samples if it is sampled at a rate at least twice its highest frequency component. More precisely, if a signal x(t) with Fourier transform X(jω) is sampled at frequency 2π/T (radians per second), then x(t) can be reconstructed from the samples if X(jω) = 0 for all |ω| > π/T. □

The sampling theorem gives a sufficient but not necessary condition for reconstructing a signal from its samples. In the absence of aliasing distortion, a lowpass signal can be reconstructed from its samples using an ideal lowpass filter with cutoff frequency π/T,

$$x(t) = \tilde{x}(t) * \frac{\sin(\pi t/T)}{\pi t/T} = \sum_{m=-\infty}^{\infty} x_m\,\frac{\sin[\pi(t - mT)/T]}{\pi(t - mT)/T}\,. \qquad (2.19)$$

2.4. PASSBAND SIGNALS and MODULATION

Passband signals are fundamentally important for digital communication over channels, such as radio, where the signal spectrum must be confined to a narrow band of frequencies. In this section we will first define a useful building block called a phase splitter, then develop a complex baseband representation for any passband signal, and finally describe several useful modulation techniques for translating the frequency spectrum of a signal.

2.4.1. Phase Splitter and Analytic Signal

A phase splitter is a filter with impulse response φ(t) and transfer function Φ(jω), where
(2.20) The filter passes only positive frequencies, and rejects negative frequencies. Clearly, since c't>(j ro) does not display complex-conjugate symmetry, $(t) is a complex-valued impulse response. Regardless of the input to a phase splitter, the output must have only positive frequency components. A signal with only positive frequency components is called an analytic signal. Obviously, any analytic signal is complex-valued in the time domain. Closely related to the phase splitter is the Hilbert transform, a filter with transfer function H (j ro) = - j sgn(ro) . (2.21) It has a real-valued impulse response, since its transfer function has complexconjugate symmetry. A Hilbert transform does not modify the amplitude spectrum of the input, but does give a - 1t phase shift at all frequencies. If the input to H (j ro) is x (t), then the output, the Hilbert transform of x (t), is denoted by f(t). Exercise 2-8. If the real-valued input to a phase splitter is x (t), then show that the output is 1h{x (t) + j :f(t)}. Thus, the real part of the output is half the input, and the imaginary part is half the Hilbert transform of the input. 0 The phase splitter and Hilbert transform filter both have a discontinuity in either amplitude or phase at d.c. They are therefore very difficult to implement at baseband frequencies. However, in many applications, we will apply a phase splitter to a passband signal, which makes the implementation much easier because the transfer function in the region of d.c. actually does not matter! 2.4.2. Complex Baseband Representation of Passband Signals Suppose y (t) is a real-valued passband signal that happens to have a spectrum centered at ro =roc' We can develop a representation of y (t) in terms of a complex- valued baseband signal u (t); that is, a signal with its spectrum concentrated at d.c. Consider the system shown in Figure 2-5a. 
Since y(t) is real-valued, it has a spectrum concentrated at −ωc as well as ωc. If we pass y(t) first through a phase splitter, then the output analytic signal is missing the negative frequency terms. The remaining positive frequency terms can be shifted to d.c. by multiplying by a complex exponential e^{−jωct}, yielding the complex baseband representation u(t). We add the strange-looking factor of √2 for a good reason. Mathematically, the complex baseband signal can be represented as

$$u(t) = \frac{1}{\sqrt{2}}\left(y(t) + j\hat{y}(t)\right)e^{-j\omega_c t}\,, \qquad (2.22)$$

where U(jω) is a replica of the positive-frequency components of Y(jω) shifted to d.c.

Figure 2-5. Derivation of the complex baseband representation u(t) from a passband signal y(t). (a) Obtaining u(t) from y(t). (b) Recovering y(t) from u(t). Also shown are typical spectra of the two signals.

Exercise 2-9. Show that u(t) has the same energy as y(t), because of the factor of √2. This equal-energy property is important when we deal with noise and signal-to-noise ratios (Chapter 6). □

As shown in Figure 2-5b, the original passband signal can be recovered from the complex baseband representation through the equation

$$y(t) = \sqrt{2}\cdot\mathrm{Re}\{\,u(t)\,e^{j\omega_c t}\,\}\,. \qquad (2.23)$$

This can also be easily verified by substituting (2.22) into (2.23). (2.23) is called the canonical representation of a passband signal in terms of a complex baseband signal. Any real-valued passband signal can be represented in this canonical form, where u(t) can be determined from (2.22) or Figure 2-5a.

2.4.3. Modulation

It is often useful to shift the spectrum of a signal, a process known as modulation.

Example 2-3. A telephone channel passes only frequencies in the range from about 300 Hz to about 3300 Hz. Any signal transmitted over such a channel must be bandlimited to the same range, or it will not get through intact.
Similarly, commercial broadcast AM radio occupies electromagnetic frequencies from 550 kHz to 1.6 MHz. An audio signal is limited to below 20 kHz. Modulation is necessary to translate an audio signal into a frequency band suitable for AM transmission. □

Actually, modulation is the opposite of the canonical representation; rather than deriving an equivalent baseband representation of a passband signal, modulation generates an equivalent passband representation of a baseband signal. The canonical representation teaches us that a real-valued passband signal (necessary to transmit over a physical medium, Chapter 5) corresponds in general to a complex-valued baseband signal. Thus, assume that the baseband signal u(t) is complex-valued, and generate the modulated signal (2.23). The resulting modulator and demodulator are shown in Figure 2-6, which is actually the same as Figure 2-5 reversed. Again, the factor √2 ensures that the modulated signal y(t) has the same energy as the baseband signal u(t).

This representation of modulation is very general. All of the commonly used modulation techniques can be represented in this form. These techniques are distinguished by how they map an information-bearing signal into the complex baseband signal u(t). We will illustrate this with three important modulation techniques: AM-DSB, AM-SSB, and QAM.

In amplitude modulation double sideband (AM-DSB), a real-valued information-bearing signal a(t) is mapped into a passband signal by letting u(t) = a(t). The baseband signal u(t) is therefore real-valued, and has a spectrum that is symmetric about the carrier frequency. The passband signal is represented mathematically as

$$y(t) = \sqrt{2}\cdot\mathrm{Re}\{\,a(t)\,e^{j\omega_c t}\,\} = \sqrt{2}\cdot a(t)\cdot\cos(\omega_c t)\,. \qquad (2.24)$$

The passband signal is complex-conjugate symmetric about the carrier frequency ωc, because the baseband signal is real-valued.
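As a numerical sanity check (a sketch, not from the text), the snippet below builds an AM-DSB signal both from the general passband rule y(t) = √2·Re{u(t)e^{jωct}} with u(t) = a(t), and from the direct cosine form, and confirms the two agree; the message waveform and carrier frequency are arbitrary illustrative choices:

```python
import cmath
import math

WC = 2.0 * math.pi * 10.0e3  # illustrative carrier frequency: 10 kHz (assumption)

def a(t):
    """Illustrative real-valued information-bearing signal (assumption)."""
    return math.cos(2.0 * math.pi * 100.0 * t)

def y_general(t):
    # General passband rule with u(t) = a(t) for AM-DSB.
    u = a(t)
    return math.sqrt(2.0) * (u * cmath.exp(1j * WC * t)).real

def y_direct(t):
    # Direct AM-DSB form: sqrt(2) * a(t) * cos(wc*t).
    return math.sqrt(2.0) * a(t) * math.cos(WC * t)

# The two forms agree at every time instant (up to floating-point rounding).
for n in range(200):
    t = n * 1.0e-6
    assert abs(y_general(t) - y_direct(t)) < 1e-9
print("AM-DSB: general and direct forms agree")
```

The same `y_general` skeleton covers AM-SSB and QAM as well; only the mapping from the information-bearing signal to the complex baseband signal u(t) changes.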
In amplitude modulation single sideband (AM-SSB) we again have a real-valued information-bearing signal a(t), but we let the complex baseband signal be an analytic signal obtained by passing a(t) through a phase splitter, u(t) = (1/√2)·(a(t) + jâ(t)). Since the complex baseband signal is analytic, the passband signal has only upper-sideband frequency components above the carrier frequency. The advantage of AM-SSB over AM-DSB is that for the same a(t), the bandwidth of the AM-SSB passband signal is half that of the AM-DSB signal, basically because the upper and lower sidebands of AM-DSB are complex-conjugate duplicates of one another. A disadvantage of AM-SSB is the baseband phase splitter, which can be difficult to realize because of the phase discontinuity at d.c., unless the baseband signal happens to be missing frequencies near d.c. (as is true of telephone speech).

The third modulation technique is quadrature amplitude modulation (QAM). In this case, we have two real-valued information-bearing signals a(t) and b(t), and we modulate them simultaneously by letting u(t) = a(t) + jb(t). The complex baseband signal is neither analytic nor complex-conjugate symmetric about d.c.; the passband signal has in general both upper and lower sidebands and no particular symmetry about the carrier frequency.

[Figure 2-6. A modulator turns a complex baseband signal u(t) into a real-valued passband signal y(t), by simply reversing Figure 2-5.]

The term "QAM" arises from the representation

y(t) = √2·Re{ (a(t) + jb(t))·e^{jωct} } = √2·a(t)·cos(ωct) − √2·b(t)·sin(ωct). (2.25)

In other words, a QAM signal consists of two independently modulated carrier signals with a π/2 relative phase shift.
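The fact that both messages can be recovered from the one passband signal can be checked numerically. The following sketch (hypothetical carrier and messages, NumPy and SciPy assumed available) modulates per (2.25) and then demodulates by reversing Figure 2-6:

```python
import numpy as np
from scipy.signal import hilbert

fs, fc = 8_000.0, 1_000.0           # hypothetical sample rate and carrier, Hz
t = np.arange(0, 0.5, 1 / fs)
a = np.cos(2 * np.pi * 40 * t)      # in-phase message a(t)
b = np.sin(2 * np.pi * 60 * t)      # quadrature message b(t)

# (2.25): y(t) = sqrt(2) a(t) cos(wc t) - sqrt(2) b(t) sin(wc t).
y = np.sqrt(2) * (a * np.cos(2 * np.pi * fc * t)
                  - b * np.sin(2 * np.pi * fc * t))

# Demodulate by reversing Figure 2-6: phase splitter, shift to d.c., rescale.
u = hilbert(y) * np.exp(-2j * np.pi * fc * t) / np.sqrt(2)

# The two messages ride in quadrature: u(t) = a(t) + j b(t).
assert np.allclose(u.real, a, atol=1e-6)
assert np.allclose(u.imag, b, atol=1e-6)
```

The separation works because the baseband bandwidth (60 Hz here) is far below the carrier, so the analytic signal of y(t) is exactly √2·(a(t) + jb(t))·e^{jωct}.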
For the same baseband signal bandwidth, QAM requires the same passband bandwidth as AM-DSB, which is double that required for AM-SSB; however, it transmits two real-valued information-bearing signals rather than one. Thus, it offers the same spectral efficiency as AM-SSB, but without the requirement for the difficult-to-implement baseband phase splitter. QAM and similar modulation techniques have therefore become the most widely used for digital communication (Chapter 6).

2.5. Z TRANSFORMS AND RATIONAL TRANSFER FUNCTIONS

The Z transform, which is closely related to the DTFT, is particularly useful in the study of rational transfer functions. The Z transform is defined as

H(z) = Σ_{k=−∞}^{∞} h_k·z^{−k}, (2.26)

where z is a complex variable. As pointed out before, the DTFT is the Z transform evaluated at z = e^{jωT}, that is, on the unit circle in the z plane, as shown in Figure 2-7. This justifies the notation H(e^{jωT}) for a DTFT. When {h_k} is the impulse response of a discrete-time LTI system, H(z) is called the transfer function of the system. The transfer function evaluated on the unit circle is called the frequency response.

2.5.1. One-Sided Sequences

A causal sequence {h_k} has h_k = 0 for k < 0. An anti-causal sequence has h_k = 0 for k > 0. A right-sided sequence is one for which h_k = 0 for k < K, for some K. A left-sided sequence correspondingly has h_k = 0 for k > K, for some K. When h_k is the impulse response of an LTI system, that system is obviously causal (anti-causal) if the impulse response is right-sided (left-sided) for K = 0. While physically realizable real-time LTI systems are causal, we will frequently find it useful to model systems as non-causal.

[Figure 2-7. The Fourier transform of a discrete-time signal is the Z transform evaluated on the unit circle.]

Example 2-4. Assume a communication channel has the impulse response shown in Figure 2-8a.
We can think of this channel as having a flat propagation delay of M samples plus the non-causal response {h_k} shown in Figure 2-8b. Often the flat delay will not be an essential feature of the channel, in which case we ignore it. □

Example 2-5. Suppose we come up with a non-causal filter H(z) in a theoretical development. This need not concern us too much, since such a filter can be approximated [...]

For a right-sided sequence, the ROC is of the form |z| > R for some constant R. In words, the ROC is the region outside a circle of radius R. If the sequence is also stable, then R < 1, as shown in Figure 2-9a. To see this, note that for a causal sequence, the summation in (2.30) becomes

Σ_{k=0}^{∞} |h_k|·|z|^{−k} < ∞. (2.31)

[Figure 2-9. The ROC of the Z transform of a stable sequence must include the unit circle. Three cases of stable sequences are illustrated: (a) a right-sided, (b) a left-sided, and (c) a two-sided sequence. The ROC includes |z| = ∞ in (a) if the sequence is causal. It includes z = 0 in (b) if the sequence is anti-causal.]

All the terms in the summation are positive powers of |z|^{−1}, and hence get smaller as |z| gets larger. Thus, if absolute convergence occurs for some |z₁| > R, it will occur for all z such that |z| ≥ |z₁|. If the sequence is right-sided but not causal, (2.30) becomes

Σ_{k=K}^{∞} |h_k|·|z|^{−k} < ∞ (2.32)

for some K < 0. The positive powers of |z| do not converge at |z| = ∞, but do converge at all other z. Thus, the ROC cannot include |z| = ∞, and should be written R < |z| < ∞. Similar results apply to left-sided sequences.

Exercise 2-11.
(a) Show that the ROC of a left-sided stable sequence is of the form 0 < |z| < R for some R > 1.
(b) Show that a left-sided sequence is anti-causal if and only if its ROC includes the origin, 0 ≤ |z| < R, as shown in Figure 2-9b. □

To summarize, a right-sided sequence has an ROC consisting of the region outside a circle. That region includes |z| = ∞ if and only if the sequence is causal.
A left-sided sequence has an ROC consisting of the inside of a circle. That region includes z = 0 if and only if the sequence is anti-causal. In all cases, the ROC includes the unit circle if and only if the sequence is stable.

2.5.2. Rational Transfer Functions

A rational transfer function can be written in any of the forms

H(z) = z^r · (Σ_{k=0}^{M} b_k·z^{−k}) / (Σ_{k=0}^{N} a_k·z^{−k}) = A·z^r · ∏_{k=1}^{M} (1 − c_k·z^{−1}) / ∏_{k=1}^{N} (1 − d_k·z^{−1}) = A·z^m · ∏_{k=1}^{M} (z − c_k) / ∏_{k=1}^{N} (z − d_k), (2.33)

where A = b₀/a₀ and m = N − M + r. Notice that in the middle form, the numerator and denominator polynomials are both monic. The ratio of two such monic polynomials is also monic (carry out the long division to verify this). The system has M zeros (roots of the numerator) at c_k, 1 ≤ k ≤ M, and N poles (roots of the denominator) at d_k, 1 ≤ k ≤ N. The factor z^m represents merely an advance or delay in the impulse response. If m > 0 this factor introduces m zeros at the origin and m poles at |z| = ∞ (conversely for m < 0). If h_k is real-valued, then H(z) in (2.33) has real-valued coefficients, and the zeros and poles are either real-valued or come in complex-conjugate pairs (Problem 2-20). Including poles and zeros at z = 0 and |z| = ∞, every rational transfer function has the same number of poles and zeros. This will be illustrated by two examples.

Example 2-6. The causal FIR transfer function H(z) = 1 − 0.5z^{−1} has one zero at z = 1/2 and one pole at z = 0. The only possible ROC is |z| > 0, which is a degenerate case of Figure 2-9a. □

Example 2-7. The anti-causal FIR transfer function H(z) = 1 − 0.5z has one zero at z = 2 and one pole at |z| = ∞. The only possible ROC is |z| < ∞, which is a degenerate case of Figure 2-9b. □

The ROC cannot include any of the poles, since H(z) is unbounded there. Moreover, for rational transfer functions, the ROC is bordered by poles.
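The pole-zero bookkeeping of (2.33) is easy to check numerically. The following sketch (hypothetical coefficients, NumPy assumed available) uses `numpy.roots` to locate the c_k and d_k, and applies the stability test for a causal system:

```python
import numpy as np

# Hypothetical H(z) = (1 - 0.25 z^-1) / (1 - 1.2 z^-1 + 0.35 z^-2).
b = [1.0, -0.25]
a = [1.0, -1.2, 0.35]

zeros = np.roots(b)   # roots of the numerator:   the c_k of (2.33)
poles = np.roots(a)   # roots of the denominator: the d_k of (2.33)

# A causal rational H(z) is stable iff every pole lies inside the unit circle.
causal_stable = bool(np.all(np.abs(poles) < 1))
assert causal_stable                 # poles at z = 0.7 and z = 0.5
assert np.allclose(zeros, [0.25])    # single zero at z = 1/4
```

Note that the roots of the polynomial in z^{−1} with coefficients [1, −1.2, 0.35] are the same as the roots of z² − 1.2z + 0.35, which is why `np.roots` can be applied directly to the coefficient lists.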
Referring to Figure 2-9, for a causal and stable H(z), all poles must be inside the unit circle. For an anti-causal and stable H(z), all poles must be outside the unit circle. No stable H(z) can have poles on the unit circle, although it can certainly have zeros on the unit circle.

Exercise 2-12. LTI systems that can actually be implemented with computational hardware can be represented by linear constant-coefficient difference equations with zero initial conditions. Show that the system represented by

y_k = (1/a₀)·( Σ_{l=0}^{M} b_l·x_{k−l} − Σ_{l=1}^{N} a_l·y_{k−l} ) (2.34)

has the transfer function given by (2.33) with r = 0. □

When the denominator in (2.33) is unity (N = 0), the system has a finite impulse response (FIR); otherwise it has an infinite impulse response (IIR). FIR systems are always stable, and are often a good approximation to physical systems. They can have poles only at z = 0 and |z| = ∞, and the ROC therefore includes the entire z plane with the possible exception of z = 0 and |z| = ∞. If an FIR system is causal, it has no poles at |z| = ∞. If it is anti-causal, it has no poles at z = 0.

Example 2-8. Physical channels, such as a coaxial cable (Chapter 5), usually do not have, strictly speaking, a rational transfer function. However, they can be adequately approximated by a rational transfer function. Often the simplest approximation is FIR, obtained by simply truncating the actual impulse response at a sufficiently large M. Alternatively, it may be possible to approximate the response with fewer parameters using an IIR transfer function. □

2.5.3. Allpass Transfer Functions

An allpass transfer function is any transfer function whose magnitude frequency response is unity for all ω,

|H_allpass(e^{jωT})| = 1. (2.35)

This can be written as

H_allpass(e^{jωT})·H*_allpass(e^{jωT}) = 1.
(2.36)

Applying the inverse DTFT, we get

h_k * h*_{−k} = δ_k, (2.37)

where h_k is the impulse response of H_allpass. [...]

X ↔ (x₁, x₂, …, x_n), (2.58)

where X is the vector and x₁, …, x_n are the n components of that vector. The notation "↔" means that X is the vector which corresponds to components x₁, …, x_n. There are rules for adding two vectors (sum the individual components) and multiplying a vector by a scalar (multiply each of the components by that scalar). □

Example 2-14. A space of some importance in this book is the Euclidean space of complex-valued vectors. Vectors in this space are identical to (2.58) except that the components x_k of the vector are complex-valued. Ordinary Euclidean space is of course a special case of this, where the imaginary parts of the vectors are zero. □

The addition rule produces a new vector X + Y that must be in the linear space. Addition must obey familiar rules of arithmetic, such as the commutative and associative laws,

X + Y = Y + X, X + (Y + Z) = (X + Y) + Z. (2.59)

The direct sum of two vectors has the interpretation illustrated in Figure 2-14a for the two-dimensional Euclidean space. A linear space must include a zero vector 0, and every vector must have an additive inverse, denoted −X, such that

0 + X = X, X + (−X) = 0. (2.60)

Multiplication by a scalar α produces a new vector α·X that must be in the vector space. Multiplication must obey the associative law,

α·(β·X) = (αβ)·X, (2.61)

and also follow the rules

1·X = X, 0·X = 0. (2.62)

The geometric interpretation of multiplying a vector by a scalar is shown in Figure 2-14b. Finally, addition and multiplication must obey the distributive laws,

α·(X + Y) = α·X + α·Y, (α + β)·X = α·X + β·X. (2.63)

Real linear spaces have real-valued scalars, while complex linear spaces have complex-valued scalars. We will encounter both types.
Euclidean space as defined earlier meets all of these requirements, and is therefore a linear space. There are two other examples of linear spaces of particular importance in communication theory: the space of discrete-time signals (which is a generalization of Euclidean space to infinite dimensions), and the space of continuous-time signals. Since these linear spaces model the two basic types of signals we encounter in digital communication systems, we call them signal spaces.

Example 2-15. Given a complex-valued discrete-time signal {y_k}, define a vector

Y ↔ (…, y₋₁, y₀, y₁, …). (2.64)

The set of all such vectors is similar to Euclidean space as defined in (2.58), the difference being that the number of components is infinite rather than finite. An additional assumption often made is that

Σ_{k=−∞}^{∞} |y_k|² < ∞, (2.65)

or, in words, that the total energy in the discrete-time signal is finite. This assumption is necessary for mathematical reasons that will become evident shortly. Scalar multiplication and vector addition are the same as for Euclidean spaces. □

Example 2-16. Define a vector Y to correspond to a continuous-time signal y(t),

Y ↔ y(t), −∞ < t < ∞, (2.66)

where as in (2.65), there is an assumption of finite energy,

∫_{−∞}^{∞} |y(t)|² dt < ∞. (2.67)

We can think of this space as a strange Euclidean space with a continuum of coordinates. The definitions of multiplication of a signal vector by a scalar and the summation of two signal vectors are the obvious ones,

α·Y ↔ α·y(t), X + Y ↔ x(t) + y(t), (2.68)

and the definition of the zero vector is the zero-valued signal. □

Exercise 2-13. Verify that the linear spaces given by Example 2-15 and Example 2-16 satisfy the properties of (2.59) through (2.63). □

The following example relates these somewhat abstract concepts to a simple digital communication system.

Example 2-17.
In a digital communication system, suppose that we want to transmit and receive a single data symbol A, where A assumes a small number of values, for example two values in a binary system. For maximum generality consider A to be complex-valued, although the physical meaning of this will not become evident until Chapter 6. In a form of modulation called pulse amplitude modulation (PAM) (covered in more detail in Chapter 6), the amplitude of a transmitted pulse h(t) is multiplied by the transmitted data symbol A. The transmitted signal is therefore of the form

x(t) = A·h(t). (2.69)

In accordance with our linear space notation, we can associate the transmitted pulse h(t) with a vector in signal space,

H ↔ h(t), (2.70)

in which case the transmitted signal corresponds to the vector

X = A·H. (2.71) □

2.6.2. Geometric Structure of Signal Space

The definition of a linear space does not capture the most important properties of Euclidean space; namely, its geometric structure. This structure includes such concepts as the length of a vector in the space, and the angle between two vectors. All these properties of Euclidean space can be deduced from the definition of the inner product of two vectors. The inner product is defined to be

⟨X, Y⟩ = Σ_{i=1}^{n} x_i·y_i* (2.72)

for an n-dimensional Euclidean space, where y_i* is the complex conjugate of y_i. It has the interpretation illustrated in Figure 2-15; namely, the inner product of two vectors is equal to the product of the length of the first vector, the length of the second vector, and the cosine of the angle between the vectors. In Figure 2-15, ‖X‖ denotes the length of a vector, which has not yet been defined. However, once the definition of inner product (2.72) has been given, we can deduce a reasonable definition for the length of a vector, since the inner product of a vector X with itself, ⟨X, X⟩, is the square of the length of the vector (the angle is zero).

Figure 2-14.
Elementary operations in a two-dimensional linear space. (a) Sum of two vectors. (b) Multiplication of a vector by a scalar.

[Figure 2-15. Geometrical interpretation of inner product: ⟨X, Y⟩ = ‖X‖·‖Y‖·cos(θ).]

A special notation is used for ⟨X, X⟩,

⟨X, X⟩ = ‖X‖² = Σ_{i=1}^{n} |x_i|², (2.73)

where ‖X‖ is called the norm of the vector X and geometrically is the length of the vector. This notation is used in Figure 2-15. Note that ‖Y‖·cos(θ) is the length of the component of Y in the direction of X. Hence we get a particularly useful interpretation of the inner product: ⟨X, Y⟩/‖X‖ is the length of the component of Y in the direction of X, and ⟨X, Y⟩/‖Y‖ is the length of the component of X in the direction of Y.

Two vectors X, Y are said to be orthogonal if

⟨X, Y⟩ = 0. (2.74)

This means that X has no component in the direction of Y and vice versa; they are at right angles! This concept is crucial to the understanding of optimum receiver design in digital communication systems.

The inner product as applied to Euclidean space can be generalized to the other linear spaces of interest. The important consequence is that the geometric concepts familiar in Euclidean space can be applied to these spaces as well. Let X and Y be vectors of a linear space on which an inner product is defined. The inner product is a scalar (complex-valued number), and must obey the rules

⟨X + Y, Z⟩ = ⟨X, Z⟩ + ⟨Y, Z⟩, (2.75)
⟨α·X, Y⟩ = α·⟨X, Y⟩, ⟨X, Y⟩ = ⟨Y, X⟩*, (2.76)
⟨X, X⟩ > 0 for X ≠ 0. (2.77)

These rules are all obeyed by the familiar Euclidean space inner product of (2.72), as can easily be verified. For the other linear spaces of interest, analogous definitions of the inner product satisfying these rules can be made.
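The Euclidean inner product (2.72), the norm (2.73), and orthogonality (2.74) can be exercised directly on small vectors. A sketch with hypothetical vectors (NumPy assumed available):

```python
import numpy as np

# Hypothetical vectors in a 4-dimensional Euclidean space.
x = np.array([1.0, 2.0, 0.0, -1.0])
y = np.array([1.0, 1.0, 3.0, 2.0])

inner = np.sum(x * np.conj(y))               # <X, Y> per (2.72)
norm_x = np.sqrt(np.sum(np.abs(x) ** 2))     # ||X|| per (2.73)
norm_y = np.sqrt(np.sum(np.abs(y) ** 2))

# Geometric interpretation of Figure 2-15: <X, Y> = ||X|| ||Y|| cos(theta).
cos_theta = inner.real / (norm_x * norm_y)
assert abs(cos_theta) <= 1.0                 # as the Schwarz inequality requires

# Orthogonality (2.74): a zero inner product means "at right angles."
u = np.array([1.0, -1.0, 0.0])
v = np.array([1.0, 1.0, 7.0])
assert np.sum(u * np.conj(v)) == 0.0
```

The same computations carry over unchanged to the discrete-time signal space, with the sums simply running over all k.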
In particular, define the inner product and norm (deduced from (2.73)) of two discrete-time signals as

⟨X, Y⟩ = Σ_{k=−∞}^{∞} x_k·y_k*, ‖X‖² = Σ_{k=−∞}^{∞} |x_k|², (2.78)

and of two continuous-time signals as

⟨X, Y⟩ = ∫_{−∞}^{∞} x(t)·y*(t) dt, ‖X‖² = ∫_{−∞}^{∞} |x(t)|² dt. (2.79)

These inner products can be given the same interpretation as in Euclidean space; namely, as the product of the lengths of the two vectors times the cosine of the angle between them. Thus, the inner product serves to define the "angle" between two vectors.

Exercise 2-14. Verify that the definitions of inner product of (2.78) and (2.79) satisfy the properties of (2.75) through (2.77). □

Note that conditions (2.65) and (2.67) imposed earlier correspond to the assumption that a vector has finite norm or length. This explains the need for these initial assumptions in the definitions of the linear spaces.

Example 2-18. Consider again the simple digital communication system of Example 2-17, in which a single transmitted pulse h(t) is multiplied by a data symbol A. Suppose that this transmitted waveform is corrupted by noise before arriving at the receiver, and we decide to implement in the receiver a filter which rejects as much of this noise as possible. In particular, as shown in Figure 2-16, we implement a filter with impulse response h*(−t), the conjugate mirror image of the transmitted pulse, with corresponding frequency response H*(jω). As we will see in Chapters 6-8, this is not as arbitrary as it may seem, since this particular filter is optimum for rejecting noise in a special sense, and is given the special name matched filter. The output of this filter is sampled at time t = 0, resulting in the value

y(0) = ∫_{−∞}^{∞} x(t)·h*(t) dt = ⟨X, H⟩. (2.80)

The inner product operation is interpreted geometrically as the component of the received signal X in the direction of the transmitted signal vector H (multiplied by the unimportant constant ‖H‖).
Intuitively this seems to be a reasonable approach, since components in directions other than that of the transmitted signal may be irrelevant. Thus the optimality of the matched filter is not surprising from a geometric point of view. □

The geometric properties are so important that the special name inner product space is given to a linear space on which an inner product is defined. Thus, both Example 2-15 and Example 2-16 defined earlier are inner product spaces. If the inner product space has the additional property of completeness, then it is called a Hilbert space. Intuitively, the notion of completeness means that there are no "missing" vectors that are arbitrarily close to vectors in the space but are not themselves in the space. Since the spaces used in this book are all complete, and hence formally Hilbert spaces, we will not dwell on this property further. In the sequel, all linear spaces considered will be Hilbert spaces.

[Figure 2-16. A matched filter.]

2.6.3. Subspaces of Signal Space

A subspace of a linear space is a subset of the linear space that is itself a linear space. Roughly speaking, this means that the sum of any two vectors in the subspace must also be in the subspace, and the product of any vector in the subspace by any scalar must also be in the subspace.

Example 2-19. An example of a subspace in three-dimensional Euclidean space is either a line or a plane in the space, where in either case the vector 0 must be in the subspace. □

Example 2-20. A more general subspace is the set of vectors obtained by forming all possible weighted linear combinations of n vectors X₁, …, X_n. The subspace so formed is said to be spanned by the set of n vectors. This is illustrated in Figure 2-17 for three-dimensional Euclidean space. In Figure 2-17a, the subspace spanned by X is the dashed line, which is infinite in length and co-linear with the vector X.
Any vector on this line can be obtained by multiplying X by the appropriate scalar. In Figure 2-17b, the subspace spanned by X and Y is the plane of infinite extent (depicted by the dashed lines) that is determined by the two vectors. Any vector in this plane can be formed as a linear combination of the two vectors multiplied by appropriate scalars. □

[Figure 2-17. Subspaces in three-dimensional Euclidean space.]

The projection theorem is an important result that can often be used to derive optimum filters and estimators. What follows is a statement of the projection theorem, which is proven in [1]:

(Projection Theorem) Given a subspace M of a Hilbert space H and a vector X in H, there is a unique vector P_M(X) in M, called the projection of X on M, which has the property that

⟨X − P_M(X), Y⟩ = 0 (2.81)

for every vector Y in M.

The notation P_M denotes a projection operator that maps one vector X into another vector P_M(X).

[Figure 2-18. Illustration of projection for three-dimensional Euclidean space.]

Example 2-21. A projection is illustrated in Figure 2-18 for three-dimensional Euclidean space, where the subspace M is the plane formed by the x-axis and y-axis and X is an arbitrary vector. The projection is the result of dropping a perpendicular line from X down to the plane (this is the dashed line in Figure 2-18). The resulting vector X − P_M(X) is shown parallel to the [...]

|⟨X, Y⟩| ≤ ‖X‖·‖Y‖, (2.83)

with equality if and only if X = K·Y for some scalar K. □

2.7. FURTHER READING

Many textbooks cover the topics of this chapter in a more introductory and complete fashion than we do here. McGillem and Cooper [2], Oppenheim and Willsky [3], and Ziemer, Tranter, and Fannin [4] are useful for techniques applicable to both continuous and discrete-time systems.
For discrete-time techniques only, the texts by Oppenheim and Schafer [5] and Jackson [6] are recommended. For continuous-time systems, with some discussion of discrete-time systems, we recommend Schwarz and Friedland [7]. To explore the Fourier transform in more mathematical depth, we recommend Papoulis [8] and Bracewell [9].

APPENDIX 2-A SUMMARY OF FOURIER TRANSFORM PROPERTIES

The properties of both discrete and continuous-time Fourier transforms are summarized in this appendix. We define the even part f_e(x) of a function f(x) to be

f_e(x) = [f(x) + f*(−x)]/2, (2.84)

and the odd part f_o(x) to be

f_o(x) = [f(x) − f*(−x)]/2, (2.85)

so, for example, X_e(e^{jωT}) = [X(e^{jωT}) + X*(e^{−jωT})]/2. (2.86) We define the rectangular function as follows,

rect(x, X) = 1 for |x| ≤ X; 0 for |x| > X, (2.87)

and the unit step function as

u(x) = 1 for x ≥ 0; 0 for x < 0. (2.88)

FOURIER TRANSFORM SYMMETRIES

Continuous time:
x(t) ↔ X(jω)
x(−t) ↔ X(−jω)
x*(t) ↔ X*(−jω)
x*(−t) ↔ X*(jω)
Re{x(t)} ↔ X_e(jω)
j·Im{x(t)} ↔ X_o(jω)
x_e(t) ↔ Re{X(jω)}
x_o(t) ↔ j·Im{X(jω)}

Discrete time:
x_k ↔ X(e^{jωT})
x_{−k} ↔ X(e^{−jωT})
x_k* ↔ X*(e^{−jωT})
x_{−k}* ↔ X*(e^{jωT})
Re{x_k} ↔ X_e(e^{jωT})
j·Im{x_k} ↔ X_o(e^{jωT})
x_{e,k} ↔ Re{X(e^{jωT})}
x_{o,k} ↔ j·Im{X(e^{jωT})}

FOURIER TRANSFORM PROPERTIES

Continuous time:
a·x(t) + b·y(t) ↔ a·X(jω) + b·Y(jω)
x(t) * y(t) ↔ X(jω)·Y(jω)
x(t)·y(t) ↔ (1/2π)·X(jω) * Y(jω)
x(at) ↔ (1/|a|)·X(jω/a)
x(t − τ) ↔ X(jω)·e^{−jωτ}
e^{jω₀t}·x(t) ↔ X(jω − jω₀)
cos(ω₀t)·x(t) ↔ ½·[X(jω − jω₀) + X(jω + jω₀)]
d^m x(t)/dt^m ↔ (jω)^m·X(jω)
(−jt)^m·x(t) ↔ d^m X(jω)/dω^m
∫_{−∞}^{t} x(τ) dτ ↔ X(jω)/(jω) + π·X(0)·δ(ω)

Discrete time:
a·x_k + b·y_k ↔ a·X(e^{jωT}) + b·Y(e^{jωT})
x_k * y_k ↔ X(e^{jωT})·Y(e^{jωT})
x_k·y_k ↔ (T/2π)·∫_{−π/T}^{π/T} X(e^{jΩT})·Y(e^{j(ω−Ω)T}) dΩ
x_{k−K} ↔ X(e^{jωT})·e^{−jωKT}
e^{jω₀kT}·x_k ↔ X(e^{j(ω−ω₀)T})
cos(ω₀kT)·x_k ↔ ½·[X(e^{j(ω−ω₀)T}) + X(e^{j(ω+ω₀)T})]
FOURIER TRANSFORM PAIRS¹

e^{jω₀t} ↔ 2π·δ(ω − ω₀)
e^{jω₀kT} ↔ (2π/T)·δ(ω − ω₀)
δ(t − τ) ↔ e^{−jωτ}
δ_{k−K} ↔ e^{−jωKT}
cos(ω₀t) ↔ π·[δ(ω − ω₀) + δ(ω + ω₀)]
cos(ω₀kT) ↔ (π/T)·[δ(ω − ω₀) + δ(ω + ω₀)]
sin(ω₀t) ↔ −jπ·[δ(ω − ω₀) − δ(ω + ω₀)]
sin(ω₀kT) ↔ −j(π/T)·[δ(ω − ω₀) − δ(ω + ω₀)]
sin(Wt)/(Wt) ↔ (π/W)·rect(ω, W)
sin(WkT)/(WkT) ↔ (π/(WT))·rect(ω, W)
e^{−at}·u(t) ↔ 1/(jω + a), Re{a} > 0
a^k·u_k ↔ 1/(1 − a·e^{−jωT}), |a| < 1
u(t) ↔ π·δ(ω) + 1/(jω)
Σ_k δ(t − kT) ↔ (2π/T)·Σ_m δ(ω − 2πm/T)
rect(t, T) ↔ 2T·sin(ωT)/(ωT)
1/(jt) ↔ −π·sgn(ω)

¹ The discrete-time Fourier transform expressions are valid in the range −π/T ≤ ω ≤ π/T. To extend this range, the given expression should be repeated periodically.

APPENDIX 2-B SPECTRAL FACTORIZATION

In this appendix we derive the spectral factorization (2.55) of a rational transfer function that is non-negative real on the unit circle, and also derive the geometric mean representation of A².

Exercise 2-17. The purpose of this exercise is to show that any transfer function S(z) that is real-valued (not necessarily non-negative) on the unit circle must have conjugate-reciprocal pole pairs and zero pairs.
(a) Show that if S(e^{jωT}) is real-valued for all ω, then the inverse Fourier transform s_k is conjugate symmetric, s_k = s*_{−k}.
(b) Show that the symmetry relationship in (a) implies (2.54). Hence, (2.54) is valid for any S(z) that is real-valued on the unit circle. □

We can now study how the general factorization of (2.44) is modified for the non-negative real transfer function. Equation (2.44) tells us that for any stable S(z) there exist monic strictly minimum-phase and strictly maximum-phase transfer functions
such that

S(z) = B·z^L·H_min(z)·H_max(z)·H_zero(z), (2.89)

where H_min(z) includes all zeros and poles inside the unit circle, H_max(z) includes all zeros and poles outside the unit circle, and H_zero(z) includes all zeros on the unit circle. Exercise 2-17 implies that H_min(z) = H*_max(1/z*), since poles and zeros come in conjugate-reciprocal pairs. Thus, the minimum-phase and maximum-phase parts are each reflected transfer functions of the other. Since they are reflected, we know that they are the complex conjugates of one another on the unit circle, and hence the contribution of H_min(z)·H_max(z) is real and non-negative on the unit circle.

Unfortunately, Exercise 2-17 does not tell us anything new about H_zero(z), since for |z| = 1 it is automatically true that z = 1/z*. We thus have to investigate further the nature of B·z^L·H_zero(z). In particular, we are interested in whatever restrictions there are on its zeros in order for it to be real-valued, or non-negative real-valued.

Exercise 2-18. Establish the following necessary and sufficient conditions for B·z^L·H_zero(z) to be real-valued on the unit circle: if its zeros are at z_i = e^{jθ_i}, 1 ≤ i ≤ K, then we must have K = 2L (the number of zeros on the unit circle must be even), and the constant coefficient B must be of the form

B = C·exp(−j·Σ_{i=1}^{2L} θ_i/2) (2.90)

for some real-valued constant C. □

The role of the z^L term is to force z^L·H_zero(z) to have the same number of terms in positive and negative powers of z (recall that H_zero(z) is by assumption causal and monic, and hence has only non-positive powers of z). Exercise 2-18 says that any transfer function with zeros on the unit circle is real-valued, as long as the number of zeros is even, the constant coefficient has the proper phase, and it is multiplied by the proper power of z. Exercise 2-18 still doesn't answer the question of when B·z^L·H_zero(z) is non-negative real-valued.

Exercise 2-19.
Show that when the conditions of Exercise 2-18 are satisfied, the resulting transfer function is non-negative real-valued on the unit circle if and only if it has L double zeros and the constant C is of the form C = (−1)^L·A², where A is real-valued. □

We can now assert that if S(z) is non-negative real,

B·z^L·H_zero(z) = (−1)^L·A²·∏_{i=1}^{L} e^{−jθ_i}·z^L·∏_{i=1}^{L} (1 − e^{jθ_i}·z^{−1})² = A²·∏_{i=1}^{L} (1 − e^{−jθ_i}·z)·(1 − e^{jθ_i}·z^{−1}), (2.91)

which is of the form of the product of a transfer function times its reflected transfer function. Hence, combining (2.91) with (2.89), we obtain the spectral factorization (2.55).

To study the multiplicative constant A², replace z by e^{jωT} in (2.56) to get

S(e^{jωT}) = A²·∏_{i=1}^{M} |1 − c_i·e^{−jωT}|² / ∏_{i=1}^{N} |1 − d_i·e^{−jωT}|², |c_i| ≤ 1, |d_i| < 1. (2.92)

Taking the logarithm of both sides (the base does not matter), and then integrating over the full Nyquist bandwidth,

(T/2π)·∫_{−π/T}^{π/T} log S(e^{jωT}) dω = log A² + Σ_{i=1}^{M} (T/2π)·∫_{−π/T}^{π/T} log|1 − c_i·e^{−jωT}|² dω − Σ_{i=1}^{N} (T/2π)·∫_{−π/T}^{π/T} log|1 − d_i·e^{−jωT}|² dω. (2.93)

Fortunately, the last two terms are zero. To see this, write c_i or d_i in polar form as a·e^{jθ}, where 0 < a ≤ 1. Then we wish to evaluate the integral

(T/2π)·∫_{−π/T}^{π/T} log|1 − a·e^{jθ}·e^{−jωT}|² dω. (2.94)

After some manipulation (see Problem 2-30), this becomes

(1/π)·∫_{0}^{π} log(1 + a² − 2a·cos ω) dω. (2.95)

Note that the angle θ of the pole or zero does not affect the integral. Integral (2.95) can be found in standard integral tables, which show that it evaluates to zero. Thus, we have established (2.57). While (2.57) was derived only for rational spectra, both the spectral factorization (2.55) and the geometric mean formula (2.57) apply to general (non-rational) spectra as well.

PROBLEMS

2-1. A system with a complex-valued input and output can be described in terms of systems with real-valued inputs and outputs, as shown in Figure 2-2.
Show that if the impulse response of the system is real-valued, then there is no crosstalk (or cross-coupling) between the real and imaginary parts, whereas if the impulse response is complex-valued then there is crosstalk.

2-2.
(a) Show that e^{jωt} is an eigenfunction of a continuous-time LTI system with impulse response h(t), meaning that the response to this input is the same complex exponential multiplied by a complex constant called the eigenvalue.
(b) Repeat for a discrete-time LTI system with impulse response h_k.
(c) Show that for a fixed ω the eigenvalue in (b) is the Fourier transform H(e^{jωT}) of the discrete-time impulse response h_k. Specifically, show that when the input is e^{jωkT}, the output can be written

y_k = H(e^{jωT})·e^{jωkT}. (2.96)

Hence the magnitude response |H(e^{jωT})| gives the gain of the system at each frequency, and the phase response arg(H(e^{jωT})) gives the phase change.

2-3. Consider the mixed discrete and continuous-time system in Figure 2-19.
(a) Find the Fourier transform of y(t).
(b) Is the system linear? Justify.
(c) Find conditions on G(jω), H(e^{jωT}), and/or F(jω) such that the system is time-invariant.

2-4. Derive the following Parseval's relationships for the energy of a signal:

∫_{−∞}^{∞} |x(t)|² dt = (1/2π)·∫_{−∞}^{∞} |X(jω)|² dω,
Σ_{k=−∞}^{∞} |x_k|² = (T/2π)·∫_{−π/T}^{π/T} |X(e^{jωT})|² dω. (2.97)

2-5. Given that a discrete-time signal x_k is obtained from a continuous-time signal x(t) by sampling, can you relate the energy of the discrete-time signal to the energy of the continuous-time signal? What if the continuous-time signal is known to be properly bandlimited?

2-6. Given a discrete-time system with impulse response h_k = δ_k + δ_{k−1}, what is its transfer function and frequency response? If the input is x_k = cos(ω₀kT), what is the output? Show that the system has a phase response that is piecewise linear in frequency.

2-7.
Show that the phase response θ(ω) = arg(H(jω)) of a real system is anti-symmetric.

2-8. What is the impulse response of a real system that produces a constant phase shift of θ and unity gain at all frequencies? Such a system is called a phase shifter.

2-9. Find the Fourier transform of Σ_{m=−∞}^{∞} x(t − mT).

2-10. Show that the output of an LTI system cannot contain frequencies not present in the input.

2-11. Sketch both a QAM modulator and demodulator for two information-bearing signals a(t) and b(t), where your sketch includes real-valued signals only.

2-12. (a) Find a way to implement a general demodulator without using a phase splitter. Hint: You will need a lowpass filter.

[Figure 2-19. A mixed continuous and discrete-time system: x(t) is filtered by G(jω), sampled at t = kT, filtered by H(e^{jωT}), applied to an impulse generator, and filtered by F(jω) to produce y(t).]

(b) Repeat Problem 2-11 for this demodulator representation.

2-13. (a) Using (2.7), show that for any complex number z, the sequence z^k is an eigenfunction of a discrete-time LTI system. That is, the response to this signal is

y_k = H(z) z^k .   (2.98)

(b) How is this related to the frequency response result discussed in Problem 2-2, part c?

2-14. Repeat Problem 2-13 using only the definition of a discrete-time LTI system and not using the convolution sum. (Hint: Note that z^{k−m} = z^k·z^{−m}.)

2-15. Calculate the Z transform of

u_k = { 1 ;  k ≥ 0
      { 0 ;  k < 0 ,   (2.99)

where u_k is the unit step function.

2-16. Show that for any complex number z, z^t is an eigenfunction of any continuous-time LTI system. Also show that for any z there exists an s such that e^{st} = z^t. Relate the eigenvalue of the system for a fixed z to the Laplace transform

H(s) = ∫ e^{−st} h(t) dt .   (2.100)

The Fourier transform is the Laplace transform evaluated at s = jω, which explains the notation H(jω) used in this book.

2-17. (a) Show that the signals

x_k = { a^k ;  k ≥ 0          y_k = { −a^k ;  k < 0
      { 0 ;    k < 0 ,              { 0 ;     k ≥ 0   (2.101)

have the same Z transform.
(b) What are the ROC for the two cases?
(c) Under what conditions are the two signals stable? Relate this to the ROC.

2-18. Let

X(z) = z / (z − a)   (2.102)

and find the time domain signals for both possible ROC. Do this directly without using the results of Problem 2-17.

2-19. Given (a) (b) (c)

2-20.

2-21. Given a transfer function in the middle form of (2.33) with r = 0, A = 1, zeros at 1.5·e^{±jπ/4} and ±j, and poles at 0.5·e^{±jπ/8}, find all the terms in the factorization (2.44). Write them in terms of polynomials with real-valued coefficients.

2-22. (a) Let h_k be a causal strictly minimum-phase sequence with a rational Z transform, and let g_k be another causal sequence obtained by taking a zero of H(z) at c and replacing it with a zero at 1/c*. Show that |H(e^{jωT})| = |G(e^{jωT})|. Hint: Find a transfer function A(z) that when multiplied by H(z) yields G(z).
(b) Show that

Σ_{k=0}^{N} |h_k|² ≥ Σ_{k=0}^{N} |g_k|²   (2.104)

for all N ≥ 0. Hint: Define F(z) = H(z)/(1 − cz^{−1}) and write g_k and h_k in terms of f_k.
(c) Show that for any two rational transfer functions H(z) and G(z) such that H(z) is minimum phase and |H(e^{jωT})| = |G(e^{jωT})|, (2.104) is true for all N ≥ 0. Thus, among all sequences with the same magnitude response, minimum-phase sequences are maximally concentrated near k = 0 in the mean-square sense. From Parseval's formula plus the unit magnitude of an allpass filter, clearly both sides of (2.104) approach one another as N → ∞.

2-23. Pass a causal input signal x_k through a first-order stable causal allpass filter such as that in Example 2-9 to yield a causal output signal y_k. Show that for any N ≥ 0

Σ_{k=0}^{N} |x_k|² ≥ Σ_{k=0}^{N} |y_k|²   (2.105)

and hence the allpass filter is dispersive in the sense that it reduces the signal energy in the first N samples while keeping the total signal energy the same (since it has unit magnitude frequency response). Hint: Consider a solution method similar to Problem 2-22.

2-24.
Use (2.44) and Example 2-11 to derive the factorization in (2.43).

2-25. What is the frequency response of the matched filter in Figure 2-16?

2-26. Given three signals s₁, s₂, and s₃, shown in the figure as pulse waveforms:
(a) Find the norm of s₁ and s₂ and the inner product of these two signals in signal space. What is the angle between the two signals?
(b) Find the norm of the signal s₁ + s₂.
(c) Find a signal that is orthogonal to both s₁ and s₂.
(d) Find a signal s₄ that is in the subspace spanned by s₁ and s₂ and is orthogonal to s₁.
(e) Find the signal in the subspace spanned by s₁ and s₂ that is closest to s₃.

2-27. Consider the space of all finite-energy continuous-time signals that are bandlimited to W radians/sec.
(a) Show that this set of signals B is a subspace of signal space.
(b) Characterize the subspace consisting of all signals orthogonal to every signal in B.
(c) Find the projection of the signal s₁ in Problem 2-26 on B for W = 1.

2-28. Given two subspaces M₁ and M₂ of a Hilbert space, they are orthogonal if every vector in M₁ is orthogonal to every vector in M₂. The sum of the two subspaces M₁ ⊕ M₂ is the subspace consisting of vectors that are the sum of a vector in M₁ and a vector in M₂. Given two orthogonal subspaces M₁ and M₂ of a Hilbert space H and an arbitrary vector X in H, show that the projection of X on M₁ ⊕ M₂ can be expressed uniquely as

P_{M₁⊕M₂}(X) = P_{M₁}(X) + P_{M₂}(X) ,   (2.106)

or in words, the sum of the projection on M₁ and the projection on M₂.

2-29. Given a transmitted pulse h(t), it is useful to define an autocorrelation function

ρ_h(k) = ∫ h(t) h*(t − kT) dt .   (2.107)

Show that

|ρ_h(k)| ≤ ρ_h(0) ,   (2.108)

or in words, the autocorrelation function of a pulse can never be larger than the energy of the pulse.

2-30. Show that (2.95) is equivalent to (2.94).

REFERENCES

1. A. W. Naylor and G. R. Sell, Linear Operator Theory in Engineering and Science, Holt, Rinehart and Winston, Inc., New York (1971).
2. C. D. McGillem and G. R.
Cooper, Continuous and Discrete Signal and System Analysis, Holt, Rinehart and Winston (1984).
3. A. V. Oppenheim, A. S. Willsky, and I. T. Young, Signals and Systems, Prentice-Hall (1983).
4. R. E. Ziemer, W. H. Tranter, and D. R. Fannin, Signals and Systems: Continuous and Discrete, Macmillan Publishing Co., New York (1983).
5. A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Inc. (1989).
6. L. Jackson, Digital Filters and Signal Processing, Kluwer Academic Publishers, Boston, MA (1985).
7. R. J. Schwarz and B. Friedland, Linear Systems, McGraw-Hill Book Co. (1965).
8. A. Papoulis, The Fourier Integral and its Applications, McGraw-Hill Book Co., New York (1962).
9. R. N. Bracewell, The Fourier Transform and its Applications, McGraw-Hill Book Co., New York (1965).

3 STOCHASTIC SIGNAL PROCESSING

Although modulation and demodulation are deterministic, the information to be transmitted over a communication system, as well as the noise encountered in the physical transmission medium, is random or stochastic. These phenomena cannot be predicted in advance, but they have certain predictable characteristics which can be summarized in a random process model. The design of a digital communication system heavily exploits these characteristics. In this chapter we review the notation that will be used for random variables and processes, and cover several topics in detail that may be new to some readers and are particularly important in the sequel. These include Chernoff bounding techniques, Bayes' rule, and mixtures of discrete-time and continuous-time random processes. Markov chains are discussed in Section 3.3, and will be used in a diverse set of applications in Chapters 9, 10, 12-14, and 19. Section 3.4, on Poisson processes, uses the Markov chain results to describe Poisson processes and shot noise, which will be important to the understanding of optical fiber systems in Chapters 5 and 8.

3.1.
RANDOM VARIABLES

Before reviewing the theory of the stochastic process, we review some theory and notation associated with random variables. In digital communication it is common to encounter combinations of discrete and continuous-valued random variables, so this will be emphasized. We denote a random variable by a capital letter, such as X, and an outcome of the random variable by a lower-case letter, such as x. The random variable is a real or complex-valued function defined on the sample space Ω of all possible outcomes. An event E is a set of possible outcomes and is assigned a probability, written Pr[E], where 0 ≤ Pr[E] ≤ 1. Since an event is a set, we can define the union of two events, E₁ ∪ E₂, or the intersection of events, E₁ ∩ E₂. The basic formula

Pr[E₁ ∪ E₂] = Pr[E₁] + Pr[E₂] − Pr[E₁ ∩ E₂]   (3.1)

leads to the very useful union bound,

Pr[E₁ ∪ E₂] ≤ Pr[E₁] + Pr[E₂] .   (3.2)

The cumulative distribution function (c.d.f.) of a real-valued random variable X is the probability of the event X ≤ x,

F_X(x) = Pr[X ≤ x] .   (3.3)

Where there can be no confusion, we often omit the subscript, writing the c.d.f. as F(x). For a complex-valued random variable Y,

F_Y(y) = Pr[Re{Y} ≤ Re{y}, Im{Y} ≤ Im{y}] .   (3.4)

For a continuous real-valued random variable, the probability density function (p.d.f.) f_X(x) is defined such that for any interval I ⊂ R,

Pr[X ∈ I] = ∫_I f_X(x) dx .   (3.5)

For a complex-valued random variable, I is a region in the complex plane. For a real-valued random variable X,

f_X(x) = (d/dx) F_X(x) ,   (3.6)

where the derivative exists. We will often use the generalized derivative, so that when the c.d.f. includes a step function the corresponding p.d.f. has a Dirac delta function.

Example 3-1. For the c.d.f. F(x) shown in the figure [a staircase c.d.f. with steps of height 0.5 at x = 0 and x = 1], the p.d.f. consists exclusively of Dirac delta functions,

f(x) = 0.5·δ(x) + 0.5·δ(x − 1) .   (3.7)

Such a density is characteristic of a discrete random variable. □
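As a small illustration (not from the text), the inclusion-exclusion formula (3.1) and the union bound (3.2) can be checked exactly on a finite sample space; the die-roll events below are arbitrary choices.

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}

def pr(event):
    # Probability of an event, as an exact fraction.
    return Fraction(len(event & omega), len(omega))

E1 = {2, 4, 6}   # roll is even
E2 = {4, 5, 6}   # roll is at least 4

# Inclusion-exclusion (3.1): Pr[E1 u E2] = Pr[E1] + Pr[E2] - Pr[E1 n E2].
lhs = pr(E1 | E2)
rhs = pr(E1) + pr(E2) - pr(E1 & E2)

# Union bound (3.2): Pr[E1 u E2] <= Pr[E1] + Pr[E2].
bound = pr(E1) + pr(E2)
```

Here Pr[E₁ ∪ E₂] = 2/3 exactly, while the union bound gives the looser value 1, illustrating that the bound is tight only when the events are disjoint.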
For a discrete-valued random variable X, we will denote the probability of an outcome x ∈ Ω as

p_X(x) = Pr[X = x] ,   (3.8)

where we will again omit the subscript where there can be no confusion. The p.d.f. can be written as

f_X(x) = Σ_{y ∈ Ω} p_X(y) δ(x − y) .   (3.9)

The expected value or mean of X is defined as

E[X] = ∫ x f_X(x) dx  or  E[X] = Σ_{x ∈ Ω} x·p_X(x) ,   (3.10)

for continuous-valued and discrete-valued random variables, respectively. For a complex-valued random variable Y, we integrate over the complex plane,

E[Y] = ∫∫ (x + jz) f_Y(x + jz) dx dz .   (3.11)

The fundamental theorem of expectation states that if g(·) is any function defined on the sample space of X, then

E[g(X)] = ∫ g(x) f_X(x) dx .   (3.12)

Especially important expectations are the mean and variance, defined as

μ = E[X] ,  σ_X² = E[(X − E[X])²] = E[X²] − E[X]² .   (3.13)

For complex-valued random variables, the variance is defined similarly as

σ_X² = E[|X|²] − |E[X]|² = E[XX*] − E[X]{E[X]}* .   (3.14)

The joint c.d.f. of two real-valued random variables X and Y is

F_{X,Y}(x, y) = Pr[X ≤ x, Y ≤ y] = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(α, β) dα dβ ,   (3.15)

where f_{X,Y}(x, y) is the joint p.d.f. The joint p.d.f. can be written in terms of the joint c.d.f. as

f(x, y) = ∂² F(x, y) / ∂x ∂y ,   (3.16)

where we have omitted the subscripts as before. The marginal density f_X(x) of a random variable X can be found from the joint p.d.f. from

f_X(x) = ∫ f_{X,Y}(x, y) dy .   (3.17)

The random variables X and Y are independent or statistically independent if for all intervals I and J,

Pr[X ∈ I ∩ Y ∈ J] = Pr[X ∈ I]·Pr[Y ∈ J] ,   (3.18)

which is equivalent to

f_{X,Y}(x, y) = f_X(x) f_Y(y)  or  F_{X,Y}(x, y) = F_X(x) F_Y(y) .   (3.19)

Independence implies that the cross-correlation is

E[XY] = E[X]·E[Y] .   (3.20)

When (3.20) is satisfied, the random variables are said to be uncorrelated. Two random variables can be uncorrelated and yet not be independent.

3.1.1.
Moment Generating Function and Chernoff Bound

The characteristic function of X is defined as

Φ_X(s) = E[e^{sX}]   (3.21)

for a complex variable s. This is the Laplace transform of f_X(x) evaluated at −s. When s is real-valued, which will suffice for applications in this book, (3.21) is called the moment generating function.

Exercise 3-1. Define Z = X + Y where X and Y are independent, and show that

Φ_Z(s) = Φ_X(s)·Φ_Y(s) .   (3.22)

□

Exercise 3-2. Show that

E[X^n] = (d^n/ds^n) Φ_X(s) |_{s=0} .   (3.23)

□

The Chernoff bound, based on the moment generating function, is very useful for bounding the tail probability for a random variable where an exact evaluation is intractable.

Exercise 3-3. (a) Show that the probability of the event X > x is bounded by

1 − F_X(x) = Pr[X > x] ≤ e^{−sx} Φ_X(s)   (3.24)

for any real-valued s ≥ 0. This establishes that the tail of the p.d.f. decreases at least exponentially for any distribution for which the moment generating function exists. (Hint: Write the probability as the integral against a step function, and bound the step function by an exponential.)
(b) Find the similar bound

F_X(x) ≤ e^{sx} Φ_X(−s)   (3.25)

for s ≥ 0.
(c) Show that the s that minimizes the bound (makes it tightest) in (a) and (b) must satisfy

x·Φ_X(s) = Φ_X′(s)  and  x·Φ_X(−s) = Φ_X′(−s) ,   (3.26)

respectively. □

3.1.2. Conditional Probabilities and Bayes' Rule

The conditional probability that a continuous-valued random variable X is in the interval I given that Y is in the interval J is defined, for all J such that Pr[Y ∈ J] ≠ 0, to be

Pr[X ∈ I | Y ∈ J] = Pr[X ∈ I ∩ Y ∈ J] / Pr[Y ∈ J] ,   (3.27)

where Pr[Y ∈ J] is called a marginal probability because it does not consider the possible effects of X on Y. For complex or vector-valued random variables, I and J are regions or volumes, rather than intervals. If X and Y are independent, then Pr[X ∈ I | Y ∈ J] = Pr[X ∈ I]. The joint probability can be written in terms of the conditional probabilities,

Pr[X ∈ I ∩ Y ∈ J] = Pr[X ∈ I | Y ∈ J]·Pr[Y ∈ J] .   (3.28)

Equivalently,
f_{X,Y}(x, y) = f_{X|Y}(x|y)·f_Y(y) ,   (3.29)

where f_Y(y) is called a marginal density. The conditional density f_{X|Y}(x|y) is well defined only for y such that f_Y(y) ≠ 0. Since f_{X,Y}(x, y) = f_{Y,X}(y, x), (3.29) implies that

f_{X|Y}(x|y)·f_Y(y) = f_{Y|X}(y|x)·f_X(x) ,   (3.30)

which is a form of Bayes' rule. It is common in digital communication systems to encounter both discrete-valued and continuous-valued random variables in the same system. In this case, (3.30) has Dirac delta functions.

Exercise 3-4. Suppose that Y is discrete-valued and X is continuous-valued. Show that by integrating (3.30) over small intervals about y, we get the mixed form of Bayes' rule,

f_{X|Y}(x|y)·p_Y(y) = p_{Y|X}(y|x)·f_X(x) .   (3.31)

This involves both probabilities and probability density functions. It has no delta functions as long as X is continuous-valued. If X is also discrete-valued, show that then

p_{X|Y}(x|y)·p_Y(y) = p_{Y|X}(y|x)·p_X(x) ,   (3.32)

which has only discrete probabilities. □

For discrete-valued distributions, the marginal probability can be written in terms of the conditional probabilities as

p_Y(y) = Σ_{x ∈ Ω} p_{Y|X}(y|x)·p_X(x) = Σ_{x ∈ Ω} p_{Y,X}(y, x) ,   (3.33)

where Ω is the countable sample space for X. This relation shows us how to obtain the marginal probabilities of a random variable given only joint probabilities, or given only conditional probabilities and the marginal probabilities of the other random variable. Using this relation, we can write the conditional probability of X given Y in terms of the conditional probability of Y given X and the marginal probability of X,

p_{X|Y}(x|y) = p_{Y|X}(y|x)·p_X(x) / Σ_{x′ ∈ Ω_X} p_{Y|X}(y|x′)·p_X(x′) .   (3.34)

This relation is known as Bayes' theorem. The analogous Bayes' theorem for continuous-valued random variables is

f_{X|Y}(x|y) = f_{Y|X}(y|x)·f_X(x) / ∫ f_{Y|X}(y|x′)·f_X(x′) dx′ .   (3.35)

3.1.3. Gaussian Random Variables and the Central Limit Theorem

A Gaussian or normal random variable has the p.d.f.
f_X(x) = (1/(σ√(2π))) e^{−(x−μ)²/2σ²} ,   (3.36)

where σ² is the variance and μ is the mean. The c.d.f. can be expressed only as an integral,

F_X(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(α−μ)²/2σ²} dα ,   (3.37)

for which there is no closed-form expression. The standard Gaussian random variable is a zero-mean Gaussian random variable X with variance σ² = 1. The complementary distribution function of this standard Gaussian is denoted by the special notation Q(x),

Q(x) = Pr[X > x] = 1 − F_X(x) = (1/√(2π)) ∫_{x}^{∞} e^{−α²/2} dα .   (3.38)

Q(x), therefore, is the integral of the tail of the Gaussian density. It is plotted in Figure 3-1 using a log scale for probability. The function is related to the well-tabulated error function (erf(x)) and the complementary error function (erfc(x)) by

Q(x) = ½·erfc(x/√2) = ½·(1 − erf(x/√2)) .   (3.39)

[Figure 3-1. The function Q(x) on a log probability scale, together with the bounds of Problem 3-3.]

Exercise 3-5. Show that for a Gaussian random variable X with mean μ and variance σ²,

Pr[X > x] = Q((x − μ)/σ) .   (3.40)

□

Although Q(·) can only be tabulated or numerically determined, a useful bound follows from the Chernoff bound of Exercise 3-3.

Exercise 3-6. (a) Show that the moment generating function of a Gaussian random variable with mean μ and variance σ² is

log Φ_X(s) = μs + σ²s²/2 .   (3.41)

(b) Show from the Chernoff bound (Exercise 3-3) that

1 − F_X(x) ≤ e^{−(x−μ)²/2σ²} ,   (3.42)

and thus that

Q(x) ≤ e^{−x²/2} .   (3.43)

□

Tighter bounds are derived in Problem 3-3 and plotted in Figure 3-1. Use of the Gaussian distribution for modeling noise phenomena can be justified on physical grounds by the central limit theorem. It states, roughly, that the Gaussian distribution is a good model for the cumulative effect of a large number of independent random variables, regardless of the nature of their individual distributions. More precisely, let {Y_i} for 1 ≤ i ≤ N denote a set of N statistically independent zero-mean random variables, each with the same p.d.f. f_{Y_i}(y) = f(y) and finite variance σ². That is, the random variables are independent and identically distributed (i.i.d.).
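As a quick numerical check (not from the text), Q(x) can be evaluated through the erfc relation (3.39) and compared against the Chernoff bound (3.43); the test points below are arbitrary.

```python
import math

def Q(x):
    # Tail of the standard Gaussian, Q(x) = Pr[X > x], via (3.39).
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def chernoff(x):
    # Chernoff bound (3.43) on the standard Gaussian tail.
    return math.exp(-x * x / 2.0)

points = [0.5, 1.0, 2.0, 3.0, 4.0]
tail = [Q(x) for x in points]
bound = [chernoff(x) for x in points]
```

The bound holds at every point but is loose by roughly the missing 1/(x√(2π)) factor, which is what the tighter bounds of Problem 3-3 recover.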
Define a random variable Z that is a normalized sum of the Y_i,

Z = (1/√N) Σ_{i=1}^{N} Y_i .   (3.44)

Then the distribution function of Z approaches Gaussian, (1 − Q(z/σ)), as N → ∞. If each random variable Y_i represents some individual physical phenomenon, and Z is the cumulative effect of these phenomena, then as N gets large, the distribution of Z becomes Gaussian, regardless of the distribution of each Y_i. In view of this theorem, it is hardly surprising that the sum of independent Gaussian random variables is Gaussian.

Exercise 3-7. For an arbitrary linear combination of N zero-mean independent Gaussian random variables X_i, each with variance σ²,

Z = a₁X₁ + … + a_N X_N ,   (3.45)

use the moment generating function to show that Z is itself zero-mean Gaussian with variance

σ_Z² = σ²(a₁² + … + a_N²) .   (3.46)

□

Two zero-mean Gaussian random variables with variance σ² are jointly Gaussian if their joint p.d.f. is

f_{X,Y}(x, y) = (1/(2πσ²√(1 − ρ²))) exp[ −(x² − 2ρxy + y²) / (2σ²(1 − ρ²)) ] ,   (3.47)

where ρ is called the correlation coefficient,

ρ = E[XY]/σ² .   (3.48)

Note that −1 ≤ ρ ≤ 1, and if X and Y are uncorrelated then ρ = 0.

Exercise 3-8. Show that two jointly Gaussian random variables are statistically independent if and only if they are uncorrelated. □

This definition can be extended to N > 2 jointly Gaussian random variables. If a random vector X has components that are jointly zero-mean independent Gaussian random variables with the same variance σ², then the joint p.d.f. is

f_X(x) = (1/(2πσ²)^{M/2}) exp[ −‖x‖²/2σ² ] ,   (3.49)

where M is the number of components in the vector and ‖x‖ is the Euclidean norm (2.73) of the vector. When X is complex-valued with independent real and imaginary parts, (3.49) still holds. Any linear combination of jointly Gaussian random variables is Gaussian (as we saw in Exercise 3-7 for independent zero-mean Gaussian random variables). This can be further generalized.
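The central limit theorem statement above is easy to see in simulation. The sketch below (an illustration with arbitrary choices of N, sample size, and test points) forms the normalized sum (3.44) of i.i.d. uniform random variables and compares its empirical distribution function with 1 − Q(z/σ).

```python
import bisect
import math
import random

random.seed(1)
N, trials = 50, 20000
sigma = math.sqrt(1.0 / 12.0)   # std dev of Uniform(-1/2, 1/2)

def z_sample():
    # One outcome of the normalized sum (3.44) of N i.i.d. zero-mean uniforms.
    return sum(random.uniform(-0.5, 0.5) for _ in range(N)) / math.sqrt(N)

zs = sorted(z_sample() for _ in range(trials))

def empirical_cdf(z):
    # Fraction of outcomes with Z <= z.
    return bisect.bisect_right(zs, z) / trials

def gaussian_cdf(z):
    # 1 - Q(z/sigma) for a zero-mean Gaussian with standard deviation sigma.
    return 0.5 * math.erfc(-z / (sigma * math.sqrt(2.0)))

errs = [abs(empirical_cdf(z) - gaussian_cdf(z)) for z in (-0.3, 0.0, 0.3)]
```

Even for moderate N the two distribution functions agree to within sampling noise, although the uniform distribution is nothing like a Gaussian.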
A vector X with M jointly Gaussian real-valued random variables has p.d.f.

f_X(x) = (1/((2π)^{M/2} |C_X|^{1/2})) exp[ −½ (x − m_X)ᵀ C_X^{−1} (x − m_X) ] ,   (3.50)

where

C_X = E[(X − m_X)(X − m_X)ᵀ]   (3.51)

is the covariance matrix, |C_X| is its determinant, and m_X = E[X] is the vector mean. In the special case that the vector mean is zero and the elements of the random vector are independent with equal variances, C_X becomes diagonal and (3.50) reduces to (3.49). An important observation from (3.50) is that the p.d.f. of a Gaussian random vector is completely specified by the vector mean and the pairwise covariances contained in the covariance matrix. Consequently, these two sets of parameters completely specify all the statistical properties of a Gaussian random vector.

3.1.4. Geometric Interpretation

Random variables can be interpreted geometrically using the approach of Section 2.6. In particular, consider the set of all complex-valued random variables X with bounded second moments, E[|X|²] < ∞, and associate a vector X with each random variable.   (3.52)

Exercise 3-9. Make reasonable definitions for the operations of addition of vectors, multiplication by a scalar, the vector 0, and the additive inverse. Show that the set of such vectors forms a linear space (Section 2.6.1). □

An inner product on this space can be defined as

⟨X, Y⟩ = E[XY*] .   (3.53)

Exercise 3-10. Show that (3.53) is a legitimate inner product (Section 2.6.2). □

This geometric interpretation pays dividends in understanding the results of linear prediction theory (Section 3.2.3).

3.2. RANDOM PROCESSES

A discrete-time random process {X_k} is a sequence of random variables indexed by integers k, while a continuous-time random process X(t) is indexed by a real variable t. We write an outcome of {X_k} or {X(t)} as the lower-case deterministic signal {x_k} or {x(t)}. When there can be no confusion between a signal and a sample of the signal, we omit the braces {·}.
Each random sample X_k or X(t) may be complex, vector-valued, or real-valued.

Example 3-2. A real-valued random process X(t) is a Gaussian random process if its samples {X(t₁), …, X(t_N)} are jointly Gaussian random variables for any N and for any {t₁, …, t_N}. □

The first and second moments of the random process are the mean

m_k = E[X_k] ,  m(t) = E[X(t)]   (3.54)

and the autocorrelation

R_XX(k, i) = E[X_k X_i*] ,  R_XX(t₁, t₂) = E[X(t₁) X*(t₂)] ,   (3.55)

where X* is the complex conjugate of X.

Example 3-3. Consider a real-valued, zero-mean Gaussian random process. A random vector X can be constructed from some arbitrary set of samples. For such a vector, the covariance matrix of (3.51) can be obtained from the autocorrelation function (3.55). Consequently, the joint p.d.f. (3.50) of any set of samples can be obtained from the autocorrelation function. Thus, the statistical properties of a zero-mean real-valued Gaussian random process are completely specified by its autocorrelation function. □

A random process is strict-sense stationary if the p.d.f. for any sample is independent of the time index of the sample, and the joint p.d.f. of any set of samples depends only on the time differences between samples, and not on the absolute time of any sample. It is wide-sense stationary (WSS) if its mean is independent of the time index, and its autocorrelation depends only on the time difference between samples, and not on the absolute time. In other words, m_k or m(t) must be constant, and R_XX(k, i) or R_XX(t₁, t₂) must be a function only of the difference k − i or t₁ − t₂. Strict-sense stationarity implies wide-sense stationarity, but not the reverse, unless the process is Gaussian.

Example 3-4. A real-valued WSS Gaussian random process is also strict-sense stationary. The autocorrelation function and mean of such a process can be used to construct the covariance matrix (3.51) for any set of samples.
Since the process is WSS, the entries in the matrix will be independent of the absolute time index of the samples, and will depend instead only on the time differences between samples. Consequently, the joint p.d.f. (3.50) of any set of samples will depend only on these time differences. Hence the process is strict-sense stationary. □

For a WSS random process the autocorrelation function can be written in terms of the time difference between samples, m = k − i or τ = t₁ − t₂, yielding the simpler notation

R_X(m) = E[X_{k+m} X_k*] ,  R_X(τ) = E[X(t + τ) X*(t)] .   (3.56)

R_X(0) is the second moment of the samples,

R_X(0) = E[|X_k|²] ,  R_X(0) = E[|X(t)|²] ,   (3.57)

and can be interpreted as the power of a random process. For a WSS random process, the power spectral density or power spectrum is the Fourier transform of the autocorrelation function,

S_X(e^{jωT}) = Σ_{m=−∞}^{∞} R_X(m) e^{−jωmT} ,  S_X(jω) = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ ,   (3.58)

where T is the sample interval of the discrete-time random process. The power therefore is the integral of the power spectrum,

R_X(0) = (T/2π) ∫_{−π/T}^{π/T} S_X(e^{jωT}) dω ,  R_X(0) = (1/2π) ∫_{−∞}^{∞} S_X(jω) dω .   (3.59)

The power spectrum is real-valued since the autocorrelation function is conjugate symmetric, R_X(m) = R_X*(−m) or R_X(τ) = R_X*(−τ). It is also non-negative (see Problem 3-9). Furthermore, if X_k or X(t) is real-valued, then the power spectrum is symmetric about ω = 0. We can also write the power spectrum as a Z transform or a Laplace transform,

S_X(z) = Σ_{m=−∞}^{∞} R_X(m) z^{−m} ,  S_X(s) = ∫_{−∞}^{∞} R_X(τ) e^{−sτ} dτ .   (3.60)

Evaluating S_X(z) on the unit circle or S_X(s) on the jω axis yields (3.58).

Example 3-5. Consider a zero-mean random process {X_k} where the samples X_k are all independent and identically distributed (i.i.d.) zero-mean random variables with variance σ_x². In this case R_X(k) = σ_x²·δ_k, and the power spectrum is a constant, S_X(e^{jωT}) = σ_x², independent of the frequency, with power R_X(0) = σ_x². □

Any zero-mean process with a constant power spectrum is said to be a white random process.
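Example 3-5 can be illustrated numerically (the sequence length and tolerances below are arbitrary choices): the estimated autocorrelation of an i.i.d. sequence is close to σ_x²·δ_m, i.e., the process is white.

```python
import random

random.seed(2)
n = 100000
x = [random.gauss(0.0, 1.0) for _ in range(n)]   # i.i.d. samples, variance 1

def autocorr(m):
    # Sample estimate of Rx(m) = E[X_{k+m} X_k] for a real-valued sequence.
    return sum(x[k + m] * x[k] for k in range(n - m)) / (n - m)

r0 = autocorr(0)                            # estimate of Rx(0), the power
r_lags = [autocorr(m) for m in (1, 2, 5)]   # estimates of Rx(m) for m != 0
```

The zero-lag estimate is near the variance while every nonzero lag is near zero, consistent with a flat power spectrum.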
This may or may not imply that the samples of the random process are independent, although for the important Gaussian case they are.

Example 3-6. As in Example 3-5, consider a continuous-time random process {X(t)} with the autocorrelation function

R_X(τ) = N₀·δ(τ) .   (3.61)

The power spectrum of this process is a constant, S_X(jω) = N₀, so {X(t)} is white. The power R_X(0) of this continuous-time white process is infinite. So we immediately run into mathematical difficulties for the continuous-time case that we did not encounter in the discrete-time case. □

Although the continuous-time white random process of Example 3-6 leads to the non-physical condition that the power is infinite (or undefined), it is an extremely important model. It would appear from the fact that R_X(τ) = 0 for all τ ≠ 0 that any two distinct samples of a continuous-time white random process are uncorrelated, but, unfortunately, this makes no mathematical or physical sense. Sampling a continuous-time white random process is an ill-defined concept. Roughly speaking, a continuous-time white random process varies so quickly that it is not possible to determine its characteristics at any instant in time. In spite of these mathematical difficulties, the continuous-time white random process is useful as a model for noise which has an approximately constant power spectrum over a bandwidth larger than the bandwidth of the system we are considering. In such a system we will always bandlimit the noise to eliminate any out-of-band component. In this event, it makes no difference if we start with white noise or a more accurate model; the result will be very nearly the same. But using the white noise model results in significantly simpler algebraic manipulation. In this book we will often use the white noise model, and take care to always bandlimit this noise process prior to other operations such as sampling. After bandlimiting, we obtain a well-behaved process with finite power.

Example 3-7.
Thermal or Johnson noise in electrical resistors has a power spectrum that is flat to more than 10¹² Hz, a bandwidth much greater than that of most systems of interest (see [1]). Thus, we can safely use white noise as a model for this thermal noise without compromising accuracy. The noise in the model at frequencies greater than 10¹² Hz will always be filtered out at the input to our system anyway. By contrast, in optical systems (Section 5.3), thermal noise is generally insignificant at optical frequencies. Thermal noise is modeled as a Gaussian random process, from the central limit theorem, since it is comprised of the superposition of many independent events (thermal fluctuations of individual electrons). □

3.2.1. Cross-Correlation and Complex Processes

Given two random processes X(t) and Y(t), we can define a cross-correlation function,

R_XY(t₁, t₂) = E[X(t₁) Y*(t₂)] .   (3.62)

If X(t) and Y(t) are each wide-sense stationary, then they are jointly wide-sense stationary if R_XY(t₁, t₂) is a function only of (t₁ − t₂). A complex-valued random process X(t) is defined as

X(t) = Re{X(t)} + j·Im{X(t)} ,   (3.63)

where Re{X(t)} and Im{X(t)} are real-valued random processes. The second-order statistics of such a process consist of the two autocorrelation functions of the real and imaginary parts, as well as their cross-correlation functions. Complex Gaussian random processes are very important in digital communication systems; they have some special properties that are considered in detail in Chapter 8.

3.2.2. Filtered Random Processes

A particular outcome x_k or x(t) of a random process is a signal, and therefore may be filtered or otherwise processed. We can also talk about filtering the random process X_k or X(t) itself, rather than an outcome. Then we get a new random process with a sample space that is obtained by applying every element of the sample space of the original random process to the input of the filter.

Example 3-8.
A filtered Gaussian random process is a Gaussian random process. Intuitively, this is true because filtering is linear, and any linear combination of jointly Gaussian random variables is a Gaussian random variable. □

Consider the two continuous-time LTI systems shown in Figure 3-2 with WSS continuous-time random process inputs.

[Figure 3-2. Two linear systems with WSS random process inputs: X(t) filtered by h(t) to produce W(t), and Y(t) filtered by g(t) to produce U(t).]

Exercise 3-11. Show that the output of the filter h(t) is WSS, and that its autocorrelation function and power spectrum are given by

R_W(τ) = h(τ) * h*(−τ) * R_X(τ) ,   (3.64)

S_W(jω) = S_X(jω) |H(jω)|² .   (3.65)

□

Exercise 3-12. Show that if a WSS discrete-time random process X_k is filtered by a filter that has impulse response h_k, and the result is W_k, then W_k is WSS and

R_W(m) = h_m * h*_{−m} * R_X(m) ,  S_W(e^{jωT}) = S_X(e^{jωT}) |H(e^{jωT})|² ,   (3.66)

S_W(z) = S_X(z) H(z) H*(1/z*) .   (3.67)

□

Example 3-9. A white random process X(t) has power spectrum S_X(jω) = N₀, a constant. If it is filtered by an ideal LPF with transfer function

H(jω) = rect(ω, 2π×10¹²) = { 1 ;  |ω| < 2π×10¹²
                            { 0 ;  otherwise ,   (3.68)

then the power spectrum of the output is

S_W(jω) = N₀·rect(ω, 2π×10¹²) = { N₀ ;  |ω| < 2π×10¹²
                                 { 0 ;   otherwise ,   (3.69)

which is a reasonable approximation to thermal noise in a resistor. Furthermore, since thermal noise is the cumulative effect of the random motion of a huge number of individual particles, we can apply the central limit theorem to argue that a sample of such thermal noise should be a Gaussian random variable. Thus we conclude that thermal noise is reasonably modeled as white Gaussian noise. □

The cross-spectral density of two jointly WSS random processes at the filter inputs in Figure 3-2 is defined as the Fourier transform of the cross-correlation function,

R_XY(τ) = E[X(t + τ) Y*(t)] .

[…] If S_X(e^{jωT}) > 0 for all ω, then G_X(z) is strictly minimum phase.
In this case, its inverse filter G_X^{−1}(z) is stable, and is also a monic minimum-phase causal filter. If we filter the process X_k with the filter G_X^{−1}(z), as shown in Figure 3-3, then from (3.67), the output I_k is a white random process with power spectrum S_I(z) = λ_X². The random process {I_k} is called the innovations process. Its power is λ_X². The innovations process and the filter G_X(z) can be used to generate the random process X_k, as shown in Figure 3-3. This helps to explain the terminology. Since I_k is white, each new sample is uncorrelated with previous samples. Thus each new sample brings new information (an "innovation") about the random process X_k. Viewed another way, the whitening filter G_X^{−1}(z) removes redundant information from X_k by removing correlated components in the samples. What is left has only uncorrelated samples. Thus we can think of X_k as having two components; the innovation is the new or "random" part, while the remainder is a linear combination of past innovations.

3.2.4. Linear Prediction

A linear predictor forms an estimate of the current sample of a discrete-time random process from a linear combination of the past samples. It uses the correlation between samples to construct an informed estimate of the current sample based on the past. If the transfer function of the predictor is F(z), it must be strictly causal,

F(z) = Σ_{k=1}^{∞} f_k z^{−k} .   (3.73)

This ensures that only past samples are used in constructing the prediction. The prediction error, formed by taking the difference between the current sample and the prediction, is generated by applying a filter with transfer function

E(z) = 1 − F(z)   (3.74)

to the random process. E(z) must be stable, causal, and monic to be a legitimate prediction error filter.

[Figure 3-3. Generation of the innovations I_k from X_k, and the recovery of X_k from its innovations: X_k → G_X^{−1}(z) → I_k → G_X(z) → X_k.]
The prediction error filter E(z) (or equivalently F(z)) should be designed to minimize the power of the prediction error sequence E_k. We will now show that E(z) = G_X^{−1}(z) is optimal, where G_X(z) results from the spectral factorization in (3.72). In Figure 3-4a we show the innovations representation of X_k, where it is generated by filtering the innovations process I_k with the filter G_X(z). Two prototype prediction error filters are shown: a general filter E(z), which is constrained to be causal and monic, and a specific filter G_X^{−1}(z), which is causal and monic. We will now demonstrate that the lower output E_k cannot have less power than the upper output I_k, so the upper filter G_X^{−1}(z) is an optimal prediction error filter. The prediction error for the optimal predictor is therefore precisely the innovation I_k, and the prediction error power is A_X².

To show this, consider Figure 3-4b, where E(z) is split into two filters, G_X^{−1}(z) and G_X(z)E(z). Such a split might be absurd in implementation, but mathematically it is perfectly reasonable. The output of the first filter is the innovations process I_k. We will now show that the output of the second filter cannot have lower power than that of I_k. Since both G_X(z) and E(z) are causal and monic filters (G_X(∞) = E(∞) = 1), it follows that G_X(z)E(z) must also be causal and monic (G_X(∞)E(∞) = 1). Let this filter have impulse response f_k, 0 ≤ k < ∞, where f₀ = 1. Since the input innovations process is white with variance A_X², the output variance is

    A_X² Σ_{k=0}^{∞} |f_k|² ≥ A_X² ,    (3.75)

with equality if and only if f_k = 0 for k ≥ 1, or in other words E(z) = G_X^{−1}(z).

Figure 3-4. Steps in the derivation of the optimal linear prediction error filter. (a) Comparison of two prediction error filters. (b) Decomposition of the general filter E(z).
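As a numerical sketch of this result (the AR(1) model and coefficients below are illustrative choices, not from the text), consider X_k generated by G_X(z) = 1/(1 − a z^{−1}) driven by unit-variance white innovations. The optimal prediction error filter G_X^{−1}(z) = 1 − a z^{−1} should recover the innovations, with error power A_X² = 1, while any other monic causal prediction error filter does worse:

```python
import numpy as np

# Sketch: for an AR(1) process with G_X(z) = 1/(1 - a z^{-1}), the optimal
# predictor is F(z) = a z^{-1}, so E(z) = G_X^{-1}(z) = 1 - a z^{-1} and the
# prediction error power equals the innovations power A_X^2 = 1.
rng = np.random.default_rng(1)
a, n = 0.8, 200_000
innov = rng.normal(0.0, 1.0, n)          # innovations with power A_X^2 = 1
x = np.empty(n)
x[0] = innov[0]
for k in range(1, n):                     # X_k generated by filtering I_k with G_X(z)
    x[k] = a * x[k - 1] + innov[k]

err_opt = x[1:] - a * x[:-1]              # E(z) = G_X^{-1}(z): error is the innovation
err_sub = x[1:] - 0.5 * x[:-1]            # any other monic causal E(z) does worse
print(np.var(err_opt), np.var(err_sub))   # about 1.0 versus about 1.25
```

The suboptimal error power 1.25 follows from the stationary variance of the AR(1) process, 1/(1 − a²), and its lag-one correlation.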
STOCHASTIC SIGNAL PROCESSING

Intuitively, since the predictor is exploiting the correlation of input samples, we would expect the prediction error to be white, since otherwise there would still be correlation to further exploit. Thus, the second filter in Figure 3-4b is counterproductive, since it introduces correlation. However, this intuitive explanation is incomplete, because G_X(z)E(z) could have a flat frequency response, in which case E_k would still be white even though it is not the innovations process for X_k! This case is addressed by the following exercise.

Exercise 3-14. Show that if H(z) is rational, causal, and monic, and has a flat frequency response |H(e^{jωT})| = K, then K > 1. Thus, a monic filter with a flat frequency response must have gain larger than unity, and thus its white output has a larger variance than its input. □

This exercise is instructive, because it shows that any causal and monic filter with a flat frequency response will amplify its input. The optimal prediction error filter G_X^{−1}(z) thus has two key properties: it is a whitening filter, resulting in a white prediction error, and it is minimum-phase. The whitening-filter property of the prediction error filter (if not the minimum-phase property) can also be demonstrated by orthogonality arguments (see Problem 3-5), and has a simple geometric interpretation (see Problem 3-6).

3.2.5. Sampling a Random Process

A finite-power continuous-time random process X(t) can be sampled, yielding a discrete-time random process Y_k = X(kT). Since we will be performing this sampling operation often in digital communication systems, it is important to relate the statistics of the continuous-time random process with those of the discrete-time random process obtained by sampling it.
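Returning briefly to Exercise 3-14, a concrete instance is easy to check numerically (the filter below is an illustrative choice, not from the text): H(z) = (1 − z^{−1}/a)/(1 − a z^{−1}) with 0 < a < 1 is rational, causal, stable, and monic, and its magnitude response is flat at K = 1/a > 1.

```python
import numpy as np

# A specific rational, causal, monic filter with a flat magnitude response:
# H(z) = (1 - z^{-1}/a)/(1 - a z^{-1}); for 0 < a < 1, |H(e^{jw})| = 1/a > 1.
a = 0.5
w = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
z = np.exp(1j * w)
H = (1.0 - z**-1 / a) / (1.0 - a * z**-1)
K = np.abs(H)
print(K.min(), K.max())    # flat at 1/a = 2 > 1
```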
Assuming X(t) is WSS,

    R_Y(k, i) = E[X(kT)X*(iT)] = R_X(mT) ,    (3.76)

where m = k − i, so the sampled process is WSS with autocorrelation equal to a sampled version of the autocorrelation R_X(τ) of the original continuous-time signal. From (2.17), the power spectrum of the continuous-time random process and its sampled discrete-time process are related by

    S_Y(e^{jωT}) = (1/T) Σ_{m=−∞}^{∞} S_X(j(ω − m·2π/T)) .    (3.77)

As in the deterministic case, aliasing distortion results when the bandwidth is greater than half the sampling rate, where bandwidth in this case is defined in terms of the power spectrum.

Example 3-10. Consider the approximation to thermal noise in Example 3-9. We wish to determine whether samples of such noise are uncorrelated; if they are, then sampled thermal noise is a discrete-time white Gaussian process. From (3.69), the autocorrelation function of the bandlimited noise W(t) is

    R_W(τ) = (N₀B/π) · sin(Bτ)/(Bτ) ,    (3.78)

where B = 2π×10¹², a bandwidth of 1,000 GHz. R_W(τ) has zero crossings at multiples of π/B, implying that samples of the random process taken at multiples of π/B will be uncorrelated. That is,

    R_W(mπ/B) = E[W(mπ/B)W(0)] = (N₀B/π) δ_m .    (3.79)

For these particular sampling rates, therefore, samples of the approximation to thermal noise are a discrete-time Gaussian white noise process. In practice, we are unlikely to sample any signal anywhere near the rate B/π, or 2,000 GHz. Since |R_W(τ)| decays as τ increases, samples at any reasonable sampling rate are nearly uncorrelated. □

Using the techniques discussed so far, we should have no difficulty considering systems that mix discrete-time and continuous-time random processes as well as deterministic signals. However, there are some subtleties. Consider a discrete-time random process X_k filtered by a continuous-time filter with impulse response h(t) in the sense defined in Section 2.1.
The output can be written

    Y(t) = Σ_{m=−∞}^{∞} X_m h(t − mT) .    (3.80)

This is pulse amplitude modulation (PAM), described in detail in Chapter 6.

Example 3-11. The transmission of a discrete-time sequence of data symbols X_m over a continuous-time channel often takes the form of the random process in (3.80). Suppose that h(t) is as shown in Figure 3-5a and that X_k is a random sequence with i.i.d. samples taking values ±1 with equal probability. A possible outcome is shown in Figure 3-5b. The first important observation is that the process Y(t) is not wide-sense stationary because E[Y(t + τ)Y(t)] is not independent of t. For example, E[Y(T/4)Y(0)] = E[X₀²] = 1 ≠ E[Y(T)Y(3T/4)] = 0. This process is actually cyclostationary, a weaker form of stationarity. Since this process is not wide-sense stationary, its power spectrum is not defined. □

Figure 3-5. a. An example of a pulse shape for transmitting bits. b. An example of a waveform using this pulse shape.

The fact that Y(t) in (3.80) is not wide-sense stationary is a major inconvenience. A common gimmick changes our random process into a wide-sense stationary process. Define the random variable Θ, called a random phase epoch, that is uniformly distributed on [0, T] and independent of {X_k}. Then define the new random process

    Z(t) = Y(t + Θ) = Σ_{m=−∞}^{∞} X_m h(t + Θ − mT) .    (3.81)

This process has a random phase which is constant over time but chosen randomly at the beginning of time. Physically, this new process reflects our uncertainty about the phase of the signal; the origin of the time axis is of course arbitrary. This redefined process is wide-sense stationary, as shown in Appendix 3-A, with power spectrum

    S_Z(jω) = (1/T) |H(jω)|² S_X(e^{jωT}) .    (3.82)

Note the dependence on the power spectrum of the discrete-time process and the magnitude-squared spectrum of the pulse h(t).

Example 3-12.
Consider transmission of a random sequence of uncorrelated random variables X_k with equally probable values ±1 using a pulse shape h(t). The sequence X_k is white and the variance is unity, so the power spectrum of the data sequence is

    S_X(e^{jωT}) = 1 ,    (3.83)

and the power spectrum of the random-phase transmitted signal is

    S_Z(jω) = (1/T) |H(jω)|² .    (3.84)

With a white data sequence, the power spectrum has the shape of the magnitude squared of the Fourier transform of the pulse. □

3.2.6. Reconstruction of Sampled Signal

It might appear that (3.77) establishes the conditions under which a random process can be recovered from its samples, just as (2.17) does for deterministic signals. However, this appearance is deceiving, because two random processes can have the same power spectrum and not be "equal" in any sense. The power spectrum is merely a second-order statistic, not a full characterization of the process. By a derivation similar to that in Appendix 3-A, we can investigate the recovery of the original continuous-time random process from its samples. A method of sampling and recovering a random process analogous to the deterministic case is shown in Figure 3-6. We first filter the random process using an anti-aliasing filter F(jω), then sample, and finally recover using the recovery filter H(jω) to yield the random process Y(t).

Figure 3-6. Sampling and recovery of a random process using anti-aliasing filter F(jω) and recovery filter H(jω).

To make Y(t) WSS we must again introduce a random phase. The way to tell whether the system recovers the input random process is not to calculate the output power spectrum, but rather to investigate the error signal between input and output. In particular, define

    E(t) = X(t + Θ) − Y(t + Θ) ,    (3.85)

where Θ is uniformly distributed over [0, T]. We would conclude that the recovery is exact (in a mean-square sense) if

    E[ |E(t)|² ] = 0 .
(3.86)

This is not the same as showing that E(t) = 0, which cannot be shown using second-order statistics only. However, (3.86) is just as good for engineering purposes. The conditions under which (3.86) is valid can be inferred from the following exercise, which can be solved using techniques similar to those used in Appendix 3-A.

Exercise 3-15. Show that the power spectrum of E(t) is

    S_E(jω) = (1/T) |H(jω)|² Σ_{m≠0} S_X(j(ω + m·2π/T)) |F(j(ω + m·2π/T))|²
              + |1 − (1/T) H(jω)F(jω)|² S_X(jω) .    (3.87)

□

Examining (3.87), the first term is aliasing distortion resulting from a signal at the output of the anti-aliasing filter, if it is not sufficiently bandlimited. In particular, if H(jω) = 0 and F(jω) = 0 for |ω| ≥ π/T, then this term is identically zero. The second term is in-band distortion due to an improper reconstruction filter H(jω), and also distortion due to bandlimiting of the input prior to sampling. For an ideal reconstruction filter,

    H(jω)F(jω) = T ,  |ω| < π/T ,    (3.88)

in which case the error signal has power spectrum

    S_E(jω) = { 0 ;        |ω| < π/T
              { S_X(jω) ;  |ω| ≥ π/T ,    (3.89)

and the total error power is

    E[ |E(t)|² ] = (2/2π) ∫_{π/T}^{∞} S_X(jω) dω .    (3.90)

The fact that the reconstruction error is just the error in initially bandlimiting X(t) is not surprising, and corresponds to the deterministic-signal case. The results of this subsection are important not only for their implications for the recovery of sampled random processes, but also for the techniques used. We will find the need for similar techniques in the optimization problems of Chapter 9.

3.3. MARKOV CHAINS

A discrete-time Markov process {Ψ_k} is a random process that satisfies

    P(ψ_{k+1} | ψ_k, ψ_{k−1}, …) = P(ψ_{k+1} | ψ_k) .    (3.91)

In words, the future sample Ψ_{k+1} is independent of the past samples ψ_{k−1}, ψ_{k−2}, … if the present sample Ψ_k = ψ_k is known.
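The defining property (3.91) can be checked empirically. The following sketch simulates a two-state chain (the transition probabilities 0.3 and 0.8 are arbitrary illustrative choices, not from the text) and estimates the conditional probability of the next state, with and without additional conditioning on the previous state; by (3.91) the extra conditioning should not change the estimate.

```python
import numpy as np

# Empirical check of the Markov property (3.91) for a two-state chain with
# P(next = 1 | current = 0) = 0.3 and P(next = 1 | current = 1) = 0.8.
rng = np.random.default_rng(10)
p = {0: 0.3, 1: 0.8}
n = 200_000
u = rng.random(n)
psi = np.empty(n, dtype=int)
psi[0] = 0
for k in range(1, n):
    psi[k] = 1 if u[k] < p[int(psi[k - 1])] else 0

cur, prev, nxt = psi[1:-1], psi[:-2], psi[2:]
p_cur = nxt[cur == 1].mean()                         # P(next=1 | current=1)
p_cur_prev0 = nxt[(cur == 1) & (prev == 0)].mean()   # also condition on previous=0
p_cur_prev1 = nxt[(cur == 1) & (prev == 1)].mean()   # also condition on previous=1
print(p_cur, p_cur_prev0, p_cur_prev1)               # all near 0.8
```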
The particular case of a Markov process where the samples take on values from a discrete and countable set Ω_Ψ is called a Markov chain. In this section, we will often take Ω_Ψ to be a set of integers. Markov chains are a useful model of a finite state machine with a random input, where the samples of the random input are statistically independent of one another. Since any digital circuit with internal memory (flip-flops, registers, or RAMs) is a finite state machine, most digital communication systems contain finite state machines. Markov chains are useful signal generation models for digital communication systems with intersymbol interference or convolutional coding (Chapters 9, 13, and 14). Markov chain theory is also useful in the analysis of error propagation in decision-feedback equalizers (Chapter 10) and in the calculation of the power spectrum of line codes (Chapter 12). The following treatment uses Z-transform techniques familiar to the readers of this book. Sections 3.3.2 through 3.3.4, as well as Appendix 3-B, can be skipped on a first reading, since the techniques are not used until Chapter 10.

3.3.1. State Transition Diagrams

Consider a random process Ψ_k (real, complex, or vector valued) whose sample outcomes are members of a finite or countably infinite set Ω_Ψ of values. The random process Ψ_k is a Markov chain if (3.91) is satisfied. The next sample Ψ_{k+1} of a Markov chain is independent of the past samples ψ_{k−1}, ψ_{k−2}, … given the present sample ψ_k. Furthermore, all future samples of the Markov chain are independent of the past given knowledge of the present, as shown in the following exercise.

Exercise 3-16. Let Ψ_k be a Markov chain and show that for any n > 0,

    P(ψ_{k+n} | ψ_k, ψ_{k−1}, …) = P(ψ_{k+n} | ψ_k) .    (3.92)

□

Since knowledge of the current sample ψ_k makes the past samples irrelevant, Ψ_k is all we need to predict the future behavior of the Markov chain.
For this reason, Ψ_k is said to be the state of the Markov chain at time k, and Ω_Ψ is the set of all possible states.

Figure 3-7. A shift register process with independent inputs X_k is a Markov chain with state Ψ_k.

Example 3-13. A shift register process is shown in Figure 3-7. If X_k is independent of X_{k−M−1}, X_{k−M−2}, …, and we define the vector

    Ψ_k = (X_{k−1}, X_{k−2}, …, X_{k−M})    (3.93)

(M is the memory of the system), then the Markov property (3.91) is satisfied. This follows since Ψ_{k+1} is a function of X_k and Ψ_k only. Hence {Ψ_k} is a vector-valued discrete-time Markov process. If the inputs X_k are discrete-valued, then it is also a Markov chain. □

A Markov chain can be described graphically by a state transition diagram. This graph displays each state of the Markov chain as a node, and also displays the input and output or some other relevant properties for the transitions between states.

Example 3-14. The parity of a bit stream X_k is defined to be the accumulated modulo-two summation of the bits, and is computed by the circuit in Figure 3-8a. It is sufficient for the input bits to be independent for the random process Ψ_k = Y_{k−1} to be Markov. It is easily seen from the diagram that Ψ_{k+1} depends only on the current state Ψ_k and the current input X_k. Ψ_k has a finite sample space Ω_Ψ = {0, 1}, so the parity checker can be represented by the state transition diagram of Figure 3-8b, where the arcs are labeled with the input that stimulates the state transition and the output resulting from the transition. The arcs of such a state diagram can alternatively be labeled with the transition probabilities, if the transition probabilities are independent of time. □

Figure 3-8. a. A circuit that computes the parity of the bit stream X_k. b. The state transition diagram of the corresponding Markov chain.
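The parity checker of Example 3-14 is small enough to sketch directly (the random seed and stream length below are arbitrary choices). The state evolves as Ψ_{k+1} = Ψ_k ⊕ X_k, so the next state depends only on the current state and the current input bit, which is exactly the Markov property.

```python
import numpy as np

# Sketch of the parity checker of Example 3-14 as a finite state machine:
# the state is the running modulo-2 sum of the input bits.
rng = np.random.default_rng(7)
bits = rng.integers(0, 2, 1000)

state = 0
trajectory = [state]
for x in bits:
    state ^= int(x)            # Psi_{k+1} = Psi_k XOR X_k
    trajectory.append(state)

print(trajectory[-1], bits.sum() % 2)   # final state equals the parity of the stream
```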
70 STOCHASTIC SIGNAL PROCESSING A Markov chain \l'k is called homogeneous if the conditional probability P ('lfk I'lfk-I) is not a function of k. Homogeneity is therefore a kind of stationarity or time invariance. A homogeneous Markov chain can be characterized by its state transition probabilities, which we write with. the shorthand p(jli) =P'I'hll'l'iVli) (3.94) for i E 0'1' and j E 0'1" Example 3-15. If in the previous example the incoming bits are not only independent but also identically distributed, then the Markov chain is homogeneous. If furthermore the incoming bits are equally likely to be one and zero, then the state transition ]JJObabilities are all 0.5. 0 It is often convenient to define a random process that is some real-valued function of the state trajectory of a Markov chain, Xk =f r-f\) . (3.95) This is encountered in the modeling of line coding (Chapter 12). The transmitted power spectrum is an important property of the line code, and thus we need to calculate the poYrrel" spe<:trum of (3.95). This proolem is ronsidered in Appendix 3-B. 3.3.2. Transient Response of a Markov Chain For a homogeneous Markov chain, we can find a relation for the evolution of the state probabilities with time. Using (3.33) we write Pk+IU)= L PUIi)Pk(i) i E n., (3.96) for all j E 0'1" where we have defined a notation for the probability of being in state i at time k, (3.97) The new notation emphasizes that Pk (i) is a discrete-time sequence. In applications we often want to determine the probability of being in a certain state j at a certain time k given a set ()f pmbabilitie5 fill being in th<>se states at initial time k =D. We can accomplish this by analyzing (3.96), a system of time-invariant difference equations, using Z-transform techniques. If we define PkU) = 0 for k < 0, then the Z- transform of the state probability for state j is Pj(z)= I.Pt/})z-k. k=O (3.98) Exercise 3-17. 
Take the Z-transform of both sides of (3.96) to show that

    P_j(z) = p_0(j) + Σ_{i ∈ Ω_Ψ} p(j|i) z^{−1} P_i(z) .    (3.99)

□

If there are N states, (3.99) gives us N equations in the N unknowns P_j(z). These equations can be solved and the inverse Z-transform calculated to determine the state probability p_k(j).

Example 3-16. Continuing Example 3-14, the parity check circuit, suppose that the initial state is equally likely to be either zero or one, so

    p_0(0) = p_0(1) = 0.5 .    (3.100)

Suppose further that the incoming bits X_k are equally likely to be zero or one, so the transition probabilities p(j|i) are all ½. Then (3.99) becomes

    P_0(z) = 0.5 + 0.5 z^{−1} P_0(z) + 0.5 z^{−1} P_1(z) ,
    P_1(z) = 0.5 + 0.5 z^{−1} P_1(z) + 0.5 z^{−1} P_0(z) .    (3.101)

Solving this set of two simultaneous equations, the Z-transforms of the state probabilities are equal,

    P_0(z) = P_1(z) = 0.5 / (1 − z^{−1}) .    (3.102)

Using the Z-transform pair in Problem 2-15 we can invert the Z-transform to get

    p_k(0) = p_k(1) = 0.5·u_k ,    (3.103)

where u_k is the unit step function. The chain is therefore equally likely to be in either state at any point in time beginning at k = 0. A Markov chain in which the state probabilities are independent of time is called stationary. □

3.3.3. Signal Flow Graph Analysis

Translation of a state diagram into a set of equations to be solved is often made easier using signal flow graphs. A signal flow graph is a graphical representation of a linear equation, and in particular can represent the system of equations given by (3.99). Its value lies in the fact that the state diagram can be directly translated into a topologically equivalent signal flow graph representing the equations. In fact, an experienced designer can write down the signal flow graph directly without ever generating a state diagram. The idea of a graph representing linear equations is illustrated by the following simple example.

Example 3-17. The equation w = au + x can be represented by the signal flow graph shown in Figure 3-9a.
The nodes of the graph represent the variables u, w, and x, while the two arcs represent the multiplication of the variables by constants, and also the addition. The signal flow graph in Figure 3-9b represents the recursive equation x = au + bw + cx. □

In general, a node in a signal flow graph represents a variable that is equal to the sum of the incoming arcs. A weight on an arc is a multiplicative factor. Our interest is in signal flow graphs in which the variables are all Z-transforms.

Figure 3-9. Several signal flow graphs representing linear equations.

Example 3-18. The signal flow graph in Figure 3-9c represents a dynamical system described by the equations X(z) = z^{−1}Y(z) + W(z) and Y(z) = z^{−1}W(z). □

From the last example, it is clear that the equations (3.99) can be represented using a signal flow graph for any given Markov chain, as shown in Figure 3-10. Shown are just two of the states, i and j. Each of the states is represented by two nodes of the graph: one for the Z-transform of the state probability sequence, P_j(z), and the other for the initial probability of that state, p_0(j) (the latter is not a variable in the equations, but a constant). In many cases the initial probability is zero, so the corresponding node can be omitted.

Example 3-19. Returning to the parity check example of Example 3-14, the equations (3.101) are represented by the signal flow graph in Figure 3-11. Note that this one figure takes the place of the state diagram of Figure 3-8 and the set of equations of (3.101). □

Figure 3-10. A signal flow graph representation of the Markov chain dynamical equations (3.99).

Figure 3-11. A signal flow graph representation of the system of state probabilities for the parity examples.

In retrospect, the signal flow graph is intuitive. Each state transition has a delay
operator z^{−1} corresponding to the time it takes for that transition to occur, as well as the probability of that transition. The arcs from the initial state probabilities have no such delay, since the initialization is instantaneous, and we can think of that transition as occurring only once, at k = 0. For Markov chains that start in a particular state, there will be only one such node, corresponding to the starting state. Once we have a signal flow graph, we can easily write down the set of equations and then solve them for the Z-transforms of the state probabilities. For some problems, a shortcut known as Mason's gain formula allows us to solve these equations directly by inspection of the signal flow graph [2,3,4,5].

3.3.4. First Passage Problem

When we use Markov chains to model the behavior of framing recovery circuits (Chapter 19) and error propagation (Chapter 10), we would like to calculate the average first passage time for an absorption state of the chain. An absorption state is defined as a state with an entry but no exit, so that the steady-state probability of that state is unity. This is illustrated in Figure 3-12 for the case where the absorption state is N. An absorption state must have a self-loop with gain z^{−1}, indicating that the chain stays in that state forever. The figure also assumes that there is only one way to get to the absorption state, from state N−1, although that is not necessary for the following analysis.

Figure 3-12. A part of a signal flow graph for a Markov chain in which state N is an absorption state, with only one entry from the outside, namely from state N−1.

What we are often interested in is the first passage time to state N, which is defined as the time index of the first time we enter that state. Define the probability of entering state N at time k as q_k(N).
Then we have that

    p_k(N) = p_{k−1}(N) + q_k(N) ,    (3.104)

or in words, the probability of being in state N at time k is equal to the probability of being in that state at time k−1 plus the probability of first entry into that state at time k. This relation follows from the fact that there are only two mutually exclusive ways to be in state N at time k: either we were there before, or else we entered the state at time k. From (3.104) we can relate the first passage probability to the state probability that has already been calculated. Assuming that p_0(N) = 0, taking the Z-transform of (3.104) we get

    Q_N(z) = (1 − z^{−1}) P_N(z) .    (3.105)

Since N is an absorption state, it turns out that P_N(z) will always have a factor of (1 − z^{−1}) in the denominator which will be canceled, resulting in a Q_N(z) which is simpler than the P_N(z) that we started with. If we define the average or expected time for first entry into state N as t̄_N, then it turns out that we can find this time without the need to take the inverse Z-transform of Q_N(z).

Exercise 3-18. Show that the mean first passage time is

    t̄_N = − dQ_N(z)/dz |_{z=1} .    (3.106)

□

Example 3-20. If we toss a fair coin, what is the average number of tosses until we have seen two heads in a row? The signal flow graph for this example is shown below, with each transition having gain 0.5z^{−1}. The numbering of the states is the number of heads in a row. We assume that we start with zero heads in a row. At each toss the number of heads in a row increases by one with probability ½, or goes back to zero with probability ½ (that is, we get a tail). We define state two (two heads in a row) as an absorption state so that we can calculate the first passage time. Solving the linear equations, we get

    Q_2(z) = 1 / (4z² − 2z − 1) .    (3.107)

Finally,

    t̄_2 = − (d/dz) [ 1 / (4z² − 2z − 1) ] |_{z=1} = 6 .    (3.108)

□

3.4. THE POISSON PROCESS AND QUEUEING

There was a time when no random process could challenge the Gaussian process for the attention of communication theorists.
However, the Poisson process, and its generalization, the birth and death process, can reasonably claim to hold that distinction. The question often arises in communications as to the distribution of the times of discrete events, such as the arrivals of messages at a digital communication multiplex, or the arrivals of photons in a light beam at an optical detector in an optical communication system. The Poisson process models the most random such distribution, and is an excellent model for many of these situations. To proceed, we need to define the notion of random points in time, where a point in time might denote the arrival of a message from a random source or a photon at a photodetector. Defining some notation, let the time of the k-th arrival be denoted by t_k, where of course t_k ≥ t_j for k > j. Further, define a continuous-time random process N(t) that equals the number of arrivals from some starting time t₀ to the current time t. We call N(t) a counting process since it counts the accumulated number of random points in time. Thus, N(t) assumes only non-negative integer values, has initial condition N(t₀) = 0, and at each random point in time t_k, N(t) increases by one. Such a counting process is pictured in Figure 3-13a, where the arrival times and the value of the counting process are pictured for one typical outcome.

Figure 3-13. Typical outcomes from a counting process N(t). a. A counting process which is monotone increasing. b. A counting process which has both arrivals and departures and hence can increase or decrease.

In some situations there are only arrivals, so that a counting process of the type pictured in Figure 3-13a is the appropriate model. In other situations, there are departures as well as arrivals. A typical situation is the queue pictured in Figure 3-14.
We can define a counting process N(t) to be the difference between the accumulated number of arrivals and the accumulated number of departures.

Example 3-21. Consider a computer communication system that stores arriving messages in a buffer before retransmitting them to some other location. N(t) gives a current count of the number of messages in the system at time t. A typical outcome of such a process is pictured in Figure 3-13b, where it should be noted that the process can never go below zero (since nothing can depart if there is nothing in the buffer). □

In many instances of practical importance, the count N(t) at time t is all we need to know to predict the future evolution of the system after time t. The manner in which the system reached N(t) is irrelevant in terms of predicting the future. For this case, the counting process denotes the state of the system in the same sense as the Markov chains of the last section. In particular, we say that the system is in state j at time t if N(t) = j. This is similar to a Markov chain with one important distinction: a Markov chain can only change states at discrete points in time, whereas we now allow the state to change at any continuous point in time. Like a Markov chain, a sample of the counting process N(t₀) is a discrete-valued random variable. Just as for Markov chains (3.97), we define a probability of being in state j at time t as

    q_j(t) = Pr[N(t) = j] = P_{N(t)}(j) .    (3.109)

This notation emphasizes that this probability is a continuous-time function. The only real distinction between (3.97) and (3.109) is that the latter is defined for continuous time and the former for discrete time. In the following subsections, we analyze a counting process under the specific conditions appropriate for optical communication (Section 3.4.3) and statistical multiplexing (Section 3.4.2).

3.4.1.
Birth and Death Process

The cases of interest to us are subsumed by a general process called a birth and death process, which is a mathematician's macabre terminology for a counting process with both arrivals and departures. The analysis is given in this section.

Figure 3-14. A queueing system, which models among other things the status of a buffer in a communication system.

We have to somehow model the evolution of the system from one state to another. The approach for the Markov chain in (3.94) is inappropriate, since the probability of a transition between any two states at any particular point in time t is most likely zero! While we cannot characterize the probability of transition, what we can characterize is the rate of transition between two states. Suppose for two particular states, the rate of transitions between one state and the other is a constant R. What we mean by this is that in a time δt we can expect an average of R·δt transitions. If δt is very small, then R·δt is a number much smaller than unity, and the probability of more than one transition in time δt is vanishingly small. Under these conditions, we can think of R·δt as the probability of one transition in time δt, and (1 − R·δt) as the probability of no transition. This logic leads us to a transition diagram and an associated set of differential equations. The transition diagram in Figure 3-15 associates a node with each state, and within that node we put the probability of being in that state at time t, which we denote q_i(t). Each transition in the diagram is labeled with the rate at which that transition occurs, where the rates in the general case are allowed to be time-varying (nonhomogeneous). Each rate is labeled with a subscript indicating the state in which it originates, where λ_j(t) is the rate for transitions corresponding to births or arrivals and μ_j(t) corresponds to deaths or departures.

Figure 3-15. State transition diagram for a birth and death process.
Reiterating, the interpretation of these rates is as follows: for a very small time interval δt, the probability of a particular transition is equal to the rate times the time interval. The set of differential equations which describes the evolution of the birth and death process is

    dq_j(t)/dt = λ_{j−1}(t) q_{j−1}(t) + μ_{j+1}(t) q_{j+1}(t) − (λ_j(t) + μ_j(t)) q_j(t) ,  j ≥ 0 ,
    q_{−1}(t) = 0 .    (3.110)

These equations can be derived rigorously from fundamental principles [6], but for our purposes they are evident from intuitive considerations. The equations say that the rate of increase of the probability with time for state j is equal to the rate at which transitions into that state from states j−1 and j+1 are occurring (times the current probability of those states) minus the rate at which transitions out of state j are occurring (times the current probability of state j).

We must also specify an initial condition, which for our purposes specifies that the process starts in state zero (no arrivals) at time t₀,

    q_j(t₀) = { 1 ,  j = 0
              { 0 ,  j > 0 .    (3.111)

The first-order differential equations can be solved for many special cases.

Example 3-22. Consider the important case of a pure birth process in which μ_j(t) = 0. Also assume the birth rates are all the same and constant with time, λ_j(t) = λ. The transition diagram for this model is shown in Figure 3-16. This corresponds to the important case where the arrival rate does not depend on the state of the system, the usual case in the problems that we will encounter. Then (3.110) becomes

    dq_j(t)/dt + λ q_j(t) = λ q_{j−1}(t) ,    (3.112)

which is a simple first-order differential equation with constant coefficients. Assume that the initial condition is

    q_0(0) = 1 ,    (3.113)

implying that the initial count at t = 0 is 0.
We can solve this using techniques very similar to our solution of the Markov chain, but using the Laplace transform in place of the Z-transform. In analogy to (3.98), define the Laplace transform of the state probability,

Q_j(s) = ∫_0^∞ q_j(t) e^{−st} dt.  (3.114)

Taking the Laplace transform of both sides of (3.112),

s Q_j(s) − q_j(0) + λ Q_j(s) = λ Q_{j−1}(s).  (3.115)

Using (3.111), with t_0 = 0, this becomes

Q_0(s) = 1/(s + λ),  Q_j(s) = λ/(s + λ) · Q_{j−1}(s),  j > 0.  (3.116)

This set of equations for the state probability Laplace transform is easily solved by iteration,

Q_j(s) = λ^j/(s + λ)^{j+1},  (3.117)

and taking the inverse Laplace transform, we find that for t ≥ 0 and j ≥ 0,

Pr[N(t) = j] = q_j(t) = (λt)^j e^{−λt}/j!.  (3.118)

Figure 3-16. State transition diagram for a constant-rate pure birth process.

This is the well-known Poisson distribution. A similar derivation, starting instead from state k at an arbitrary time t_0, yields for t ≥ t_0

Pr[N(t) = j | N(t_0) = k] = (λ(t − t_0))^{j−k} e^{−λ(t−t_0)}/(j − k)!,  j ≥ k.  (3.122)

This result implies that the number of counts starting at t = t_0 is a Poisson distribution with parameter λ(t − t_0), which is the expected number of arrivals since the start time. Furthermore, index j − k of the Poisson distribution is the number of counts since the start time. The important conclusion is that the number of arrivals in the interval starting at t = t_0 has a distribution which does not depend in any way on what happened prior to t_0. This is roughly the definition of a Markov process, and a Poisson counting process is in fact a Markov process. For such a process, the number of arrivals in the interval [t_0, t] is statistically independent of the number of arrivals in any other nonoverlapping interval of time. It is in this sense that the Poisson process is the most random among all monotone non-decreasing counting processes.

Exercise 3-21. (Pure death process.)
For λ_j(t) = 0, consider the case where departures from the system are proportional to the state index, μ_j(t) = jμ.
This is an appropriate model for a system in which the departure or death rate is proportional to the size of the population, as in a human population. Further, assume that the initial state at t = 0 is n. Draw the state transition diagram and show that the state probabilities obey a binomial distribution,

Pr[N(t) = j] = q_j(t) = C(n, j) p(t)^j (1 − p(t))^{n−j},  p(t) = e^{−μt}.  (3.123)

□

Now we give an example of a problem in which both births and deaths occur. This is an example of a queueing problem, and it is appropriate at this point to define some terminology used in queueing, particularly as it relates to digital communication. A queue is a buffer in memory which stores messages. There is some mechanism which clears messages from the queue, which is usually the transmission of the message to another location. This mechanism is called the server of the queue. We assume a server can process only one message at a time, so that if more than one message can be processed at once (there are multiple communication channels for transmission of messages), then there are an equivalent number of servers. Typically the buffer contains space for a maximum number of messages to wait for service, and the number of messages that can be waiting at any time is called the number of waiting positions. The state of the system, which naturally tracks a counting process, is the number of messages waiting for service plus the number of messages currently being served. Messages arrive at the queue (births) at random times, and they depart from the queue (deaths) due to the completion of service.

Exercise 3-22. (Queue with one server and no waiting positions.)
Assume that a queue has constant arrival rate λ, a single server which clears a message being served at rate μ, and no waiting positions. If a message arrives while the server is busy, then since there are no waiting positions, that arrival is lost and leaves the system permanently.
Draw the state transition diagram for the system and show that the probability that the server is not busy is

q_0(t) = μ/(λ + μ) + ( q_0(0) − μ/(λ + μ) ) e^{−(λ+μ)t}.  (3.124)

□

The differential equation approach we have described is capable of describing the transient response of a system starting with any initial condition. Often, however, it is sufficient to know what the state probabilities are in the steady state. There is no such steady-state distribution for a Poisson process, since the state grows without bound. However, for queueing systems where the service rate is always guaranteed to be higher than the arrival rate, and where all the rates are independent of time, there will be a steady-state distribution. This distribution can be obtained by letting t → ∞ in the transient solution we have obtained, or much more simply by setting the time derivatives in the differential equations to zero and solving for the resulting probabilities.

Example 3-23.
Continuing Exercise 3-22, letting t → ∞ in (3.124), the steady-state probability is

q_0(∞) = μ/(λ + μ).  (3.125)

We can get this same result without solving the differential equation by setting the derivative in (3.110) to zero. □

In the following two sections we will specialize the general birth and death process to two situations of particular interest to us.

3.4.2. M/M/1 Queue

Consider the following queueing model, which characterizes a single server queue with the most mathematically tractable assumptions. This model is actually a combination of the pure birth process of Example 3-22 and a pure death process (Exercise 3-21). Assume arrivals occur at a constant rate λ independent of the number of waiting positions occupied, there are an infinite number of waiting positions so that no arrival ever encounters a full buffer, arrivals wait indefinitely for service, and there is a single server with service rate μ.
The departure rate is independent of the number of messages waiting in the queue, as long as there is at least one. The state transition diagram for this queueing model is shown in Figure 3-17. As in most queueing problems, we are content to know the steady-state distribution of states. This distribution will only exist if the service rate μ is greater than the arrival rate λ, because otherwise the buffer size will grow to infinity. Making that assumption, the differential equations governing the queue are

dq_j(t)/dt = λ q_{j−1}(t) + μ q_{j+1}(t) − (λ + μ) q_j(t),  j > 0
dq_0(t)/dt = μ q_1(t) − λ q_0(t)  (3.126)

with initial condition (assuming there are no positions occupied at time t = 0)

q_j(0) = { 1, j = 0; 0, j > 0.  (3.127)

Figure 3-17. The state transition diagram for the single server queue with an infinite number of waiting positions.

We could attempt to solve this system of differential equations, but since we are content with the steady-state solution, set the derivatives to zero,

0 = λ q_{j−1} + μ q_{j+1} − (λ + μ) q_j,  j > 0
0 = μ q_1 − λ q_0,  (3.128)

where we have also taken the liberty of suppressing the time dependence since we are looking only at the steady state. These equations are easily solved.

Exercise 3-23.
Show that the solution to (3.128) is

q_j = (1 − ρ) ρ^j,  (3.129)

where ρ is called the offered load,

ρ = λ/μ,  (3.130)

and is less than unity by assumption. Note from (3.129) that the probability that the single server is busy is 1 − q_0 = ρ, which is obvious since the server has more "capacity" than the arrivals require by a factor of μ/λ. Thus, ρ is also called the server utilization. □

In many queueing problems the most critical parameter is the delay that a new arrival experiences before being served. This is also called the queueing delay, and represents a significant impairment in communication systems that utilize a buffer delay discipline to increase the capacity of a communication link (Chapter 18).
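The balance equations (3.128) are also easy to solve numerically. The sketch below (the rates λ = 0.5 and μ = 1 and the truncation of the chain at 200 states are assumed values of ours) uses the local-balance form λq_j = μq_{j+1}, which follows from (3.128) by induction starting from the j = 0 equation, and checks the result against the geometric solution of (3.129).

```python
lam, mu = 0.5, 1.0      # assumed arrival and service rates
rho = lam / mu          # offered load, as in (3.130)
N = 200                 # truncation level for the infinite chain

# Local balance lam*q[j] = mu*q[j+1], then normalize so the q[j] sum to one.
q = [1.0]
for j in range(N):
    q.append(q[-1] * lam / mu)
total = sum(q)
q = [x / total for x in q]

# Compare with the closed form q_j = (1 - rho) * rho**j of (3.129).
print(q[0], 1 - rho)    # probability the server is idle
print(1 - q[0], rho)    # server utilization
```

Because ρ < 1, the truncated tail is negligible and the numerical q_j agree with (1 − ρ)ρ^j to machine precision.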
A related parameter is the waiting time, which is defined to be the queueing delay plus the service time. The calculation of the delay is a little more complicated than what we have done heretofore, so we will simply state the result [6]. The mean queueing delay is given by

D = ρ/( μ(1 − ρ) ).  (3.131)

Note that as the offered load or server utilization approaches unity, the mean delay grows without bound; conversely, as the utilization approaches zero, the lightly loaded queue, the delay approaches zero. The mean queueing delay is equal to the average service time 1/μ for a utilization of ρ = 1/2.

3.4.3. Poisson Process With Time-Varying Rate

In optical communication systems, the counting process which gives the accumulated number of photon arrivals is a Poisson process (Section 5.3). The Poisson process is a pure birth process where the arrival rate is independent of the state of the system, and we have already been exposed to it in Example 3-22 for a constant arrival rate. In optical communication the arrival rate is actually signal dependent, so in this section we discuss that case. The Poisson process with time-varying rate is the pure birth process in which the arrival rate λ(t) is independent of the state of the system. Thus, the system is governed by a first-order differential equation with time-varying coefficients,

dq_j(t)/dt + λ(t) q_j(t) = λ(t) q_{j−1}(t),  (3.132)

and we assume the system starts at time t_0 in state j = 0. Because of the time-varying coefficients, the Laplace transform is of no help, and we must resort to solving the differential equation directly. This is straightforward (since it is a first order equation) but tedious, so the solution is relegated to Appendix 3-C. Define

Λ(t) = ∫_{t_0}^{t} λ(u) du,  (3.133)

which has the interpretation as the average total number of arrivals in the interval [t_0, t].
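The time-varying Poisson process and the role of Λ(t) can be explored by simulation. The sketch below (assuming, purely for illustration, λ(t) = 1 + sin t; the thinning method and all numerical values are our choices, not from the text) generates arrival times by thinning a constant-rate process and checks that the average number of arrivals over [t_0, t_1] is close to Λ(t_1), the average total number of arrivals.

```python
import math, random

def thinned_arrivals(rate, rate_max, t0, t1, rng):
    """Arrival times of a Poisson process with time-varying rate on [t0, t1],
    generated by thinning a constant-rate rate_max process (Lewis-Shedler)."""
    times, t = [], t0
    while True:
        t += rng.expovariate(rate_max)           # candidate arrival
        if t > t1:
            return times
        if rng.random() < rate(t) / rate_max:    # keep with prob rate(t)/rate_max
            times.append(t)

rate = lambda t: 1.0 + math.sin(t)   # illustrative rate, bounded by rate_max = 2
t0, t1 = 0.0, 10.0
rng = random.Random(1)
trials = 4000
mean_n = sum(len(thinned_arrivals(rate, 2.0, t0, t1, rng)) for _ in range(trials)) / trials

# Lambda(t1) = integral of rate over [t0, t1] = 10 + (1 - cos 10) for this rate
Lam = (t1 - t0) + (math.cos(t0) - math.cos(t1))
print(mean_n, Lam)
```

The thinning step requires only that rate_max bound λ(t) from above; the accepted points then have exactly the desired time-varying rate.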
Then the probability of n arrivals in the interval [t_0, t] is governed by a Poisson distribution with parameter Λ(t),

q_n(t) = ( Λ^n(t)/n! ) e^{−Λ(t)}.  (3.134)

This reduces to the solution given in Example 3-22 for the constant rate case. As a reminder, (3.134) specifies the number N(t) of arrivals during the time interval [t_0, t]. This random number of arrivals is Poisson distributed with parameter Λ(t), and hence has mean and variance

E[N(t)] = Var[N(t)] = Λ(t).  (3.135)

As in the constant rate case, it can be shown that the numbers of arrivals in any two non-overlapping intervals are statistically independent.

3.4.4. Shot Noise

In optical communication, a waveform is generated in the photodetector by generating impulses at times corresponding to random arrival times of photons and then filtering these impulses. This is known as a filtered Poisson process, or a shot noise process. If a Poisson process is characterized by a set of arrival times t_k for the k-th arrival, and given a filter with impulse response h(t), then a shot noise process is a continuous-time random process X(t) with outcome

x(t) = Σ_k h(t − t_k).  (3.136)

An outcome of this random process is illustrated in Figure 3-18 for a particular impulse response. In this figure, it is assumed that the duration of the impulse response is qualitatively short relative to the average time between arrivals. If the impulse response were long, it would have an averaging effect resulting in a much smoother outcome. It is shown in Appendix 3-D that the moment generating function of the shot noise process at time t is

log_e Φ_{X(t)}(s) = λ(t) * (e^{s h(t)} − 1).  (3.137)

Figure 3-18. Illustration of an outcome of a shot noise process. a. The average arrival rate vs. time. b. The random actual times of arrival, where arrivals occur at the average rate given in a. c. The impulse response of the filter. d.
The corresponding outcome.

The mean and variance of shot noise are easily derived from (3.137).

Exercise 3-24.
Show that the mean value of shot noise is the convolution of the filter impulse response with the arrival rate,

m_X(t) = E[X(t)] = λ(t) * h(t),  (3.138)

and that the variance is the convolution of the square of the filter impulse response with the arrival rate,

σ_X²(t) = E[X²(t)] − m_X²(t) = λ(t) * h²(t).  (3.139)

These relations are known as Campbell's theorem. □

3.4.5. High-Intensity Shot Noise

When the intensity of shot noise is high, its statistics become those of a Gaussian random process. The intuition behind this is that X(t) is the sum of a large number of independent events, and hence approaches a Gaussian by the central limit theorem. To demonstrate this more rigorously, we will show that the moment generating function of shot noise approaches a Gaussian moment generating function in the limit of high intensity. In order to avoid an infinitely large power of shot noise, as the intensity grows we need to scale the size of the impulse response h(t) also. Therefore, let us use a scaling constant β, which we will allow to grow to infinity, and let

λ(t) = β λ_0(t),  h(t) = (1/√β) h_0(t).  (3.140)

With this scaling, we get from Campbell's theorem that

m_X(t) = √β · λ_0(t) * h_0(t),  σ_X²(t) = λ_0(t) * h_0²(t).  (3.141)

Hence, as the scaling factor β grows, the variance of the process stays constant and the mean value grows without bound. We cannot help this, because as the intensity grows the variance becomes a smaller fraction of the mean. In this sense high-intensity shot noise approaches a deterministic signal m_X(t) as the intensity grows. Only two terms in the moment generating function are important as the scaling constant β grows.

Exercise 3-25.
Show that for large β the only significant terms in the moment generating function of (3.137) are

log_e Φ_{X(t)}(s) = s√β · λ_0(t) * h_0(t) + 0.5 s² · λ_0(t) * h_0²(t).  (3.142)

Comparing this with the Gaussian moment generating function of (3.41), we see that high-intensity shot noise is approximately Gaussian with mean and variance given by (3.141). □

3.4.6. Random-Multiplier Shot Noise

In optical communication systems, it is sometimes appropriate to introduce a random multiplier into the shot noise process, viz.

x(t) = Σ_k G_k h(t − t_k),  (3.143)

where G_k is a sequence of mutually statistically independent identically distributed random variables which are also statistically independent of the arrival times t_j for all j.

Exercise 3-26.
Use Campbell's theorem and the assumptions to show that the mean value of (3.143) is

m_X(t) = E[G] · λ(t) * h(t)  (3.144)

and the variance is

σ_X²(t) = E[G²] · λ(t) * h²(t),  (3.145)

where E[G] and E[G²] are the mean value and second moment of the random multiplier G_k for all k. □

3.5. FURTHER READING

For a general introduction to random variables and processes, Papoulis [7], Stark and Woods [8], and Ross [9] are recommended. Papoulis has more of an engineering perspective. These books have comprehensive treatments of Markov chains and Poisson and shot noise processes. An excellent introduction to Poisson processes can be found in Ross [10]. There are a number of books that give comprehensive treatment to the application of Poisson and birth and death processes to queueing models, such as Cooper [6], Hayes [11], and Kleinrock [12].

APPENDIX 3-A POWER SPECTRUM OF A CYCLOSTATIONARY PROCESS

In this appendix we determine the power spectrum of the PAM random process with a random phase epoch (3.81). Calculating the autocorrelation function of (3.81),

E[Z(t+τ)Z*(t)] = E[Y(t+Θ+τ)Y*(t+Θ)] = E[ Σ_{n=−∞}^{∞} Σ_{m=−∞}^{∞} X_m X_n* h(t + Θ − mT + τ) h*(t + Θ − nT) ].  (3.146)
Assuming we can interchange expectation and summation, we use the fact that Θ is independent of X_k to get

E[Z(t+τ)Z*(t)] = Σ_{n=−∞}^{∞} Σ_{m=−∞}^{∞} E[X_m X_n*] E[h(t + Θ − mT + τ) h*(t + Θ − nT)].

Changing variables again and defining α = t + Θ − nT, we get

E[Z(t+τ)Z*(t)] = (1/T) Σ_{i=−∞}^{∞} R_X(i) Σ_{n=−∞}^{∞} ∫_{t−nT}^{t−nT+T} h(α − iT + τ) h*(α) dα.  (3.150)

The second summation is the sum of integrals with adjoining limits, so it can be replaced with a single infinite integral,

E[Z(t+τ)Z*(t)] = (1/T) Σ_{i=−∞}^{∞} R_X(i) ∫_{−∞}^{∞} h(α − iT + τ) h*(α) dα,  (3.151)

which is independent of t, so the process Z(t) is wide-sense stationary. To get the power spectrum, we take the Fourier transform with τ as the time index,

S_Z(jω) = (1/T) Σ_{i=−∞}^{∞} R_X(i) ∫_{−∞}^{∞} h*(α) [ ∫_{−∞}^{∞} h(α − iT + τ) e^{−jωτ} dτ ] dα.  (3.152)

The expression in brackets is the Fourier transform of h(t) with a time shift of α − iT, so it equals e^{jω(α−iT)} H(jω). Therefore,

S_Z(jω) = (1/T) H(jω) Σ_{i=−∞}^{∞} R_X(i) [ ∫_{−∞}^{∞} h*(α) e^{jω(α−iT)} dα ].

The expression in brackets is e^{−jωiT} H*(jω), giving

S_Z(jω) = (1/T) H(jω) H*(jω) Σ_{i=−∞}^{∞} R_X(i) e^{−jωiT}.  (3.153)

The summation is simply the discrete-time Fourier transform S_X(e^{jωT}) of the autocorrelation function. The final result is

S_Z(jω) = (1/T) |H(jω)|² S_X(e^{jωT}).  (3.154)

APPENDIX 3-B POWER SPECTRUM OF A MARKOV CHAIN

In this appendix we solve the problem of finding the power spectrum of the random process (3.95). The power spectrum only exists if the random process is wide-sense stationary. Strictly speaking, this requires that the Markov chain be running over all time, although we can interpret the results as indicative of the power spectrum for a chain that was initialized but has been running long enough to be in the steady state. We approach this by assuming that the initial probability of each state is the same as its steady-state probability, so that the state probability is in fact constant with time (a stationary Markov chain).
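The steady-state probabilities required by this stationarity assumption are easy to compute by repeatedly applying the transition matrix, as in the sketch below. It is a minimal power-iteration sketch of ours; the two-state chain used as a check is the parity-check chain with equiprobable input bits, for which p(0) = p(1) = 1/2.

```python
def steady_state(P, iters=200):
    """Steady-state probabilities of a Markov chain: start from a uniform
    distribution and repeatedly apply pi <- pi * P until it converges."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Parity-check chain with equiprobable input bits: from either parity state,
# the next input bit flips the state with probability 1/2.
P = [[0.5, 0.5],
     [0.5, 0.5]]
pi = steady_state(P)
print(pi)  # [0.5, 0.5]
```

Initializing the chain with this distribution, as the appendix assumes, keeps the state probabilities constant for all time.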
We first determine the autocorrelation function of (3.95),

R_X(k, k+n) = E[ f(Ψ_{k+n}) f(Ψ_k) ],  (3.155)

assuming f(·) is a real-valued function. Assuming wide-sense stationarity, we can take k = 0 and this can be written

R_X(n) = Σ_{i∈Ω_Ψ} Σ_{j∈Ω_Ψ} f(i) f(j) P_{0,n}(i, j),  (3.156)

where by Bayes' rule

P_{0,n}(i, j) = P_{n|0}(j|i) P_0(i)  (3.157)

is the joint probability of being in state i at time 0 and state j at time n. Assuming we have already calculated the steady-state probabilities p(i) for the chain, by the stationarity assumption we can write

P_{0,n}(i, j) = P_{n|0}(j|i) p(i).  (3.158)

One way to think of this is as forcing the initial state probability to equal the steady-state probability, thus suppressing any transient solution. Finally, we must carefully note the d.c. component of the random process, since it contributes a delta function to the power spectrum that can easily be lost if we are not careful. Specifically, the d.c. component is

μ_X = Σ_{i∈Ω_Ψ} f(i) p(i).  (3.159)

The power spectrum is simply the Z transform of the autocorrelation function evaluated at z = e^{jωT} (see (3.58)). Rather than calculate the Z transform S_X(z) directly, let us first concentrate on the quantity

S_X^+(z) = Σ_{n=0}^{∞} R_X(n) z^{−n}  (3.160)

that includes only the positive-index terms in the summation making up the Z transform. From (3.156), (3.157), and (3.158), this can be written as

S_X^+(z) = Σ_{i∈Ω_Ψ} Σ_{j∈Ω_Ψ} f(i) f(j) p(i) P_{j|i}(z),  (3.161)

where

P_{j|i}(z) = Σ_{n=0}^{∞} P_{n|0}(j|i) z^{−n}.  (3.162)

This latter quantity can be interpreted as the Z-transform of P_{n|0}(j|i), which is in turn the probability of being in state j at time n given that we started (with probability one) in state i at time 0. This quantity is easy to calculate using the techniques we have previously displayed, since it is simply the Z-transform of a transient solution starting with probability one in a particular state. The signal flow graph for this solution is shown in Figure 3-19, where only the states i and j are shown.
This signal flow graph must be solved for P_{j|i}(z) for all (i, j) for which f(i) f(j) is non-zero in (3.161).

Figure 3-19. Signal flow graph representation of equations that must be solved to find P_{j|i}(z).

Figure 3-20. Signal flow graph for the parity check circuit.

Example 3-24.
Again returning to the parity check circuit of Example 3-14, let us compute S_X(z). In this case f(i) = i, so that the random process X_k = f(Ψ_k) = Ψ_k assumes the values 0 and 1. For that case, we need evaluate only one term in (3.156), corresponding to i = j = 1, and all the others are zero. This term is shown by the signal flow graph in Figure 3-20. Solving this flow graph, we get

P_{1|1}(z) = (1 − 0.5 z^{−1})/(1 − z^{−1})  (3.163)

and

S_X^+(z) = 0.5 (1 − 0.5 z^{−1})/(1 − z^{−1}),  (3.164)

since there is only one term in the sum and p(0) = p(1) = 1/2. Inverting the Z transform, we find that

R_X(n) = { 1/2, for n = 0; 1/4, for n > 0.  (3.165)

This result says that the power of the process is 1/2, which is obvious, and that the process has a d.c. component of 1/2 since the autocorrelation function approaches 1/4 for large n, which is also obvious. □

We have determined the one-sided terms in the power spectrum, and we must generate the two-sided spectrum S_X(z). However, before doing this, we must first remove any d.c. component, since that d.c. component can be represented by the one-sided transform but is problematic in the two-sided transform. This is simple, since we only need to replace S_X^+(z) by

S_X^+(z) − μ_X²/(1 − z^{−1})  (3.166)

to remove this d.c. component. Alternatively we could have defined a new random process with the d.c. component removed, although that method is often harder.

Example 3-25.
For the parity check circuit of Example 3-14, the d.c. component is μ_X = 1/2, and subtracting the appropriate term from (3.164),

S_X^+(z) − μ_X²/(1 − z^{−1}) = 0.25.  (3.167)
Note that for this process this result would have been much more difficult to obtain if we had defined a d.c.-free random process, since then we would have to evaluate all four terms in (3.161) rather than just one. □

We must now turn the one-sided version of the power spectrum into a two-sided version. The Z transform of the autocorrelation function can be written

S_X(z) = Σ_{m=−∞}^{∞} R_X(m) z^{−m} = Σ_{m=0}^{∞} R_X(m) z^{−m} + Σ_{m=0}^{∞} R_X(m) z^{m} − R_X(0),  (3.168)

where we have used the symmetry of the autocorrelation function. Noting that R_X(0) = S_X^+(∞), we get finally

S_X(z) = S_X^+(z) + S_X^+(z^{−1}) − S_X^+(∞).  (3.169)

Example 3-26.
To finish with the parity check example of Example 3-14,

S_X(z) = 0.25 + 0.25 − 0.25 = 0.25  (3.170)

and the process is white with power 1/4. However, recall that this power spectrum does not include the d.c. term, so that in fact

S_X(e^{jωT}) = 1/4 + (π/2) δ(ωT).  (3.171)

The area of the delta function has been chosen so that this area divided by 2π is 1/4, the power of the d.c. component. □

APPENDIX 3-C DERIVATION OF POISSON PROCESS

In this appendix we show that the Poisson distribution for the accumulated number of arrivals as given by (3.134) is valid. To begin with, we need the solution to a first-order differential equation, which is given in the following exercise [13].

Exercise 3-27.
Consider the following first order differential equation,

x'(t) + a(t) x(t) = b(t).  (3.172)

(a) Let A'(t) = a(t) and show that

d/dt ( e^{A(t)} x(t) ) = b(t) e^{A(t)}.  (3.173)

(b) Integrate both sides of (3.173) to obtain the solution for x(t),

x(t) = x(t_0) e^{−A(t)} + e^{−A(t)} ∫_{t_0}^{t} b(u) e^{A(u)} du,  A(t) = ∫_{t_0}^{t} a(v) dv.  (3.174)

□

Returning to the Poisson process, identify

a(t) = λ(t),  b(t) = λ(t) q_{j−1}(t).
(3.175)

Therefore, given the definition (3.133) of Λ(t),

q_j(t) = q_j(t_0) e^{−Λ(t)} + e^{−Λ(t)} ∫_{t_0}^{t} λ(u) q_{j−1}(u) e^{Λ(u)} du.  (3.176)

The solution follows immediately for j = 0 using the initial condition of (3.132),

q_0(t) = e^{−Λ(t)},  (3.177)

and the rest is easy!

Exercise 3-28.
Verify the validity of (3.134) by induction on (3.176). □

APPENDIX 3-D MOMENT GENERATING FUNCTION OF SHOT NOISE

In this appendix we derive the moment generating function of a shot noise process X(t) corresponding to impulse response h(t). A sample function of such a process is given by (3.136). To find the moment generating function, divide the time axis into small intervals of length δt, where the k-th interval is [(k − ½)δt, (k + ½)δt]. Group all the arrivals in the k-th interval together into a single impulse of height N_k located at time kδt, where N_k is the number of arrivals in the k-th interval. Thus, the shot noise of (3.136) becomes approximately

X(t) = Σ_{k=−∞}^{∞} N_k h(t − kδt),  (3.178)

where this equation becomes increasingly accurate as δt → 0. Since the intervals are non-overlapping, the N_k are independent Poisson random variables with parameter λ(kδt)δt, the average number of arrivals in the interval. The moment generating function of N_k is therefore

log_e Φ_{N_k}(s) = λ(kδt) δt (e^s − 1),  (3.179)

and the moment generating function of (3.178) is

Φ_{X(t)}(s) = E[ exp{ s Σ_{k=−∞}^{∞} N_k h(t − kδt) } ] = Π_{k=−∞}^{∞} E[ exp{ s N_k h(t − kδt) } ]
= Π_{k=−∞}^{∞} Φ_{N_k}( s h(t − kδt) ).  (3.180)

Taking the logarithm of the moment generating function, and substituting from (3.179),

log_e Φ_{X(t)}(s) = Σ_{k=−∞}^{∞} λ(kδt) ( exp{s h(t − kδt)} − 1 ) δt,  (3.181)

and as δt → 0 this approaches the integral

log_e Φ_{X(t)}(s) = ∫_{−∞}^{∞} λ(τ) ( exp{s h(t − τ)} − 1 ) dτ,  (3.182)

which we recognize as the convolution of (3.137).

PROBLEMS

3-1. Use the moment generating function of (3.41) to show that the mean of the Gaussian distribution is μ and the variance σ².

3-2.
Show that the marginal p.d.f.s of X and Y in (3.47) are those of a zero-mean Gaussian random variable with variance σ².

3-3. Show that for y > 0, …

3-5. … η ≠ 0, and let E_1' be the output generated by any other causal and monic filter.
(a) Show that

E|E_1'|² = E|E_1' − E_1|² + E|E_1|²,  (3.185)

thus establishing that the output MSE is minimized when E_1' = E_1.
(b) Show that it follows from the orthogonality property of (3.184) that R_E(m) = 0 for all m ≠ 0, and hence the optimal prediction error must be white.

3-6. (a) Restate the results of Problem 3-5 in geometric terms, using the interpretation of Section 3.1.4.
(b) Re-derive the results of Problem 3-5 using the projection theorem of Section 2.6.3.

3-7. Given a WSS random process X(t) with power R_X(0), show that the sampled random process Y_k = X(kT) has the same power,

E[|Y_k|²] = R_Y(0) = R_X(0).  (3.186)

3-8. Given a sequence of i.i.d. random variables A_k which take on values ±1 with equal probability, find an expression for E[A_p A_q A_r A_s].

3-9. Consider a random process X(t) filtered by an ideal bandpass filter with frequency response

H(jω) = { 1, ω_a < ω < ω_b; 0, otherwise.

Let Y(t) be the output of the filter. Show that

R_Y(0) = (1/2π) ∫_{ω_a}^{ω_b} S_X(jω) dω.

Use this to show that S_X(jω) ≥ 0 for all ω.

3-10. Extending Exercise 2-6 to random signals, assume the input to the possibly complex-valued LTI system shown in Figure 2-3 is a wide-sense stationary complex-valued discrete-time random process with power spectral density S_X(e^{jωT}) = N_0. Show that the autocorrelation of the output is

R_Y(k) = N_0 · f(kT) * f*(−kT) = N_0 Σ_m f(mT) f*((m−k)T).  (3.187)

3-11. Show that the cross-correlation function has symmetry

R_XY(τ) = R*_YX(−τ).  (3.188)

Is the cross-spectral density of two random processes necessarily real-valued?

3-12.
Where a Markov chain has unique steady-state probabilities p_k(i) = p(i), they can be found from the condition that the state probabilities do not change with one time increment. Assume Ω_Ψ = {0, …, M}, define the matrix of state transition probabilities P to contain p(j|i) in its (i,j)-th entry, and define the vector π = [p(0), …, p(M)] to contain the steady-state probabilities, if they exist. Show that the steady-state probabilities can be obtained by solving the system of equations π = πP with the constraint

Σ_{i=0}^{M} p(i) = 1.  (3.189)

3-13. Assume you toss a coin that is not fair, where p is the probability of a tail and q = 1 − p is the probability of a head.
(a) Draw a signal flow graph representation for a Markov chain representing the number of heads tossed in a row. Define N as an absorption state, since in part (c) we will be interested in the first passage time to state N.
(b) Show that (3.190).
(c) Show that the first passage time to N heads in a row is

f_N = (1 − q^N)/(p q^N).  (3.191)

(d) Interpret this equation for p ≪ 1 and N large.

3-14. Show that for a Markov chain Ψ_k,

p(Ψ_0, Ψ_1, …, Ψ_n) = p(Ψ_n | Ψ_{n−1}) p(Ψ_{n−1} | Ψ_{n−2}) ⋯ p(Ψ_1 | Ψ_0) p(Ψ_0).

In words, show that the joint probability of the states at times zero through n is the product of the initial state probability p(Ψ_0) and the transition probabilities p(Ψ_k | Ψ_{k−1}).

3-15. Show that for a Markov chain Ψ_k,

p(Ψ_k | Ψ_{k+1}, Ψ_{k+2}, …, Ψ_n) = p(Ψ_k | Ψ_{k+1}).  (3.192)

In words, show that a Markov chain is also Markov when time is reversed.

3-16. Show that for the Markov chain Ψ_k the future is independent of the past if the present is known. In other words, for any n > r > s,

p(Ψ_n | Ψ_r, Ψ_s) = p(Ψ_n | Ψ_r).

3-17. Consider the parity checker example in Figure 3-8. Suppose that the initial state is zero, P_0(0) = 1. Sketch the signal flow graph describing the state probabilities. Compute P_k(0) and P_k(1) as a function of k. Sketch these functions. Is the Markov chain stationary?

3-18. Consider tossing a fair coin.
We are interested in the probability that at the k-th toss we have seen at least two heads in a row. Define the random process Ψ_k to have value two if there have been two heads in a row, to have value one if not and the last toss was heads, and to have value zero otherwise.
(a) Show that the random process Ψ_k is Markov and sketch the state diagram of the Markov chain.
(b) Sketch the signal flow graph describing the state probabilities. Assume that the coin is fair.
(c) Solve for the probability that at the k-th toss we have seen at least two heads in a row. You may leave the solution in the Z domain.

3-19. Using the results of Exercise 3-3, show that the Chernoff bounds on the distribution function for a Poisson random variable N with parameter a are (3.193).

3-20. Find the mean and variance at time t_1 of a Poisson process N(t) with constant rate λ.

3-21. Show that if t_1 < t_2 then

P_{N(t_1), N(t_2)}(k, k+n) = ( λ^{k+n} t_1^k (t_2 − t_1)^n / (k! n!) ) e^{−λ t_2}.

3-22. Consider a pure birth process in which the birth rate is proportional to the state (λ_j(t) = jλ), as might model the growth of a biological population. Assume the initial condition is q_1(0) = 1; that is, we start with a population of one. Find Q_j(s) for all j.

3-23. Shot noise can be generated from a Poisson process by linear filters as shown in Figure 3-21. Assume without further justification that expectation and differentiation can be interchanged; that is, the mean value of dN(t)/dt is (d/dt) E[N(t)].
(a) For N(t) a Poisson process, show that the mean value of dN(t)/dt is λ(t).

Figure 3-21. The generation of shot noise from a Poisson counting process: the counting process N(t) is differentiated to produce random delta functions w(t) = Σ_k δ(t − t_k), which are filtered by h(t) to produce the shot noise x(t).

(b) Similarly show that the mean value of X(t) is given by (3.138).
(c) For a random process N(t), show that the derivative Ṅ(t) of this process has autocorrelation

R_ṄṄ(t_1, t_2) = ∂² R_NN(t_1, t_2) / ∂t_1 ∂t_2.  (3.194)

(d) Consider a linear time-invariant system with input W(t) and output X(t), where W(t) has autocorrelation function R_WW(t_1, t_2). Show that

R_WX(t_1, t_2) = R_WW(t_1, t_2) * h(t_2),  R_XX(t_1, t_2) = R_WX(t_1, t_2) * h(t_1).  (3.195)

3-24. For the Poisson process N(t) in Figure 3-21, consider two times 0 < t_1 < t_2, and note the statistical independence of (N(t_1) − N(0)) and (N(t_2) − N(t_1)). Using this fact, and assuming N(0) = 0, show that

R_NN(t_1, t_2) = Λ(t_1)[1 + Λ(t_2)],  t_1 ≤ t_2,  (3.196)

where Λ(t) is defined in (3.133). Exchange the roles of t_1 and t_2 to show that

R_NN(t_1, t_2) = Λ(t_2)[1 + Λ(t_1)],  t_1 ≥ t_2.  (3.197)

3-25. Using the results of Problem 3-23 and Problem 3-24, show that the autocorrelation of shot noise is

R_X(t_1, t_2) = [λ(t_1) * h(t_1)][λ(t_2) * h(t_2)] + [λ(t_2) h(t_1 − t_2)] * h(t_2),  (3.198)

and evaluating at t_1 = t_2 = t,

R_X(t, t) = [λ(t) * h(t)]² + λ(t) * h²(t),  (3.199)

thereby establishing Campbell's theorem (3.139) by a different method.

3-26. For the constant rate case (λ(t) = λ), the shot noise process is wide-sense stationary. Find the autocorrelation and power spectrum.

3-27. Let a Poisson process have rate

λ(t) = { 0, t < 0; λ, t ≥ 0.

Show that a shot noise with this rate has mean value proportional to the step response of the filter.

3-28. Consider a shot noise with rate function λ(t) = λ_0 + λ_1 cos(ω_1 t). Find the mean value of this shot noise.

3-29. Show that the power spectrum of the output of the parity checker of Figure 3-8 when the input bits are not equally probable is

S_X(z) = p(1−p) / [ (1 − (1−2p)z^{−1})(1 − (1−2p)z) ],  (3.200)

where p is the probability of a one bit.

REFERENCES

1. R. E. Ziemer and W. H. Tranter, Principles of Communications: Systems, Modulation and Noise, Houghton Mifflin Co., Boston (1985).
2. S. J. Mason, "Feedback Theory - Some Properties of Signal Flow Graphs," Proc. IEEE 41 (Sep.
LIMITS OF COMMUNICATION

In the late 1940s, Claude Shannon of Bell Laboratories developed a mathematical theory of information that profoundly altered our basic thinking about communication, and stimulated considerable intellectual activity, both practical and theoretical. This theory, among other things, gives us some fundamental boundaries within which communication can take place. Often we can gain considerable insight by comparing the performance of a digital communication system design with these limits.

Figure 4-1. A general picture of a source communicating over a channel using source and channel coding: source → source coder → channel coder → channel → channel decoder → source decoder → sink.

Information theory provides profound insights into the situation pictured in
Figure 4-1, in which a source is communicating over a channel to a sink. The source and channel are both modeled statistically. The objective is to provide the source information to the sink with the greatest fidelity. To that end, Shannon introduced the general idea of coding. The objective of source coding is to minimize the bit rate required for representation of the source at the output of a source coder, subject to a constraint on fidelity. Shannon showed that the interface between the source coder and channel coder can be, without loss of generality, a bit stream, regardless of the nature of the source and channel. The objective of channel coding is to maximize the information rate that the channel can convey sufficiently reliably (where reliability is normally measured as a bit error probability). Our primary focus in this book will be on the channel and the associated channel coder, although understanding source coding will also be helpful.

Given the statistics of a source, modeled as a discrete-time random process, the minimum number of bits per unit time required to represent it at the output of the source coder with some specified distortion can be determined. The source coding theorem is the key result of this rate distortion theory (see for example [1]). This theory offers considerable insight into the bit rates required for digital communication of an analog signal via PCM (Chapter 1).

Example 4-1. We limit our attention here to the simple special case of a discrete-time discrete-valued random process {Xk} with independent and identically distributed (i.i.d.) samples. Because the process is discrete-valued, it is possible to encode the signal as a bit stream with perfect fidelity. In fact, the minimum average number of bits required to represent each sample without distortion is equal to the entropy of X, defined to be
H(X) = E[−log2 pX(X)] = −Σ_{x ∈ ΩX} pX(x) log2 pX(x), (4.1)
where ΩX is the alphabet (sample space) of X.
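As a quick aside (not from the text), the entropy definition (4.1) translates directly into code; a minimal Python sketch, with the distribution supplied as a list of probabilities:

```python
import math

def entropy(p):
    """H(X) in bits, per (4.1), using the convention 0*log2(0) = 0."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to one"
    return -sum(px * math.log2(px) for px in p if px > 0.0)

# A uniform four-letter alphabet carries log2(4) = 2 bits per sample;
# a skewed distribution over the same alphabet carries less.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # → 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # → 1.75
```

Because the second distribution is dyadic (all probabilities are powers of one half), its 1.75-bit average is exactly achievable by a simple variable-length code.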
This result is developed in Section 4.1. □

Since the entropy determines the number of bits required to represent a sample at the output of the source coder, it is said to determine the amount of information in the sample, measured in bits. This concept is explained in Section 4.1. A second concept due to Shannon is the capacity of a noisy communication channel, defined as the maximum bit rate that can be transmitted over that channel with a vanishingly small error rate. The various forms of the channel coding theorem specify the capacity. The fact that an error rate approaching zero can be achieved was very surprising at the time, and it motivated the practical forms of channel coding to be discussed in Chapters 13 and 14.

Example 4-2. Consider transmitting a random process {Xk}, with similar characteristics to Example 4-1, over a noisy discrete-time memoryless channel, defined as one for which the current output Yk is dependent on only the current input Xk. Because the channel is memoryless, the samples Yk are also independent and identically distributed. The capacity of this channel can be obtained from the mutual information between the input random variable X and the output random variable Y,
I(X,Y) = H(X) − H(X|Y), (4.2)
where H(X|Y) is the conditional entropy. The channel capacity equals the mutual information maximized over all possible probability distributions for the input X. This result is developed in Section 4.2. □

The result of Example 4-2 can also be used to determine the channel capacity of a bandlimited continuous-time channel using the Nyquist sampling theorem, as will be discussed in Section 4.3.

4.1. JUST ENOUGH INFORMATION ABOUT ENTROPY

Intuitively, observing the outcome of a random variable gives us information. Rare events carry more information than common events.

Example 4-3.
You learn very little if I tell you that the sun rose this morning, but you learn considerably more if I tell you that San Francisco was destroyed by an earthquake this morning. The reason the latter observation carries more information is that it has a lower prior probability. □

In 1928 Hartley proposed a logarithmic measure of information that reflects this intuition. Consider a random variable X with sample space ΩX = {a1, a2, ..., aK}. The self-information in an outcome am is defined to be
h(am) = −log2 pX(am). (4.3)
The self-information of a rare event is greater than the self-information of a common event, conforming with intuition. Furthermore, the self-information is non-negative. But why the logarithm? One intuitive justification arises from considering two independent random variables X and Y, where ΩY = {b1, b2, ..., bN}. The information in the joint events am and bn intuitively should be the sum of the information in each. The self-information defined in (4.3) has this property:
h(am, bn) = −log2 pX,Y(am, bn) = −log2 pX(am) − log2 pY(bn) = h(am) + h(bn). (4.4)
The average information H(X) in X, defined in (4.1), is also called the entropy of X because of its formal similarity to thermodynamic entropy. Equivalent interpretations of H(X) are
• the average information obtained by observing an outcome,
• the average uncertainty about X before it is observed, and
• the average uncertainty removed by observing X.
Because of the base-two logarithm in (4.1), information is measured in bits.

Example 4-4. Consider a binary random variable X with alphabet ΩX = {0, 1}. Suppose that q = pX(1), so
H(X) = −q log2 q − (1 − q) log2(1 − q). (4.5)
This is plotted as a function of q in Figure 4-2. Notice that the entropy peaks at 1 bit when q = ½ and goes to zero when q = 0 or q = 1. This agrees with our intuition that there is no information in certain events.
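The binary entropy function (4.5) is easy to evaluate numerically; a small sketch (mine, not the book's):

```python
import math

def binary_entropy(q):
    """H(X) = -q log2 q - (1-q) log2(1-q), per (4.5)."""
    if q == 0.0 or q == 1.0:
        return 0.0  # no information in certain events
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

print(binary_entropy(0.5))  # → 1.0 (the peak, for a fair coin)
print(binary_entropy(0.1))  # ≈ 0.469
print(binary_entropy(1.0))  # → 0.0
```

The symmetry binary_entropy(q) = binary_entropy(1 − q) and the peak at q = ½ match the shape plotted in Figure 4-2.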
□

Although the intuitive justification given so far may seem adequate, the key to the interpretation of entropy as an information measure lies in the asymptotic equipartition theorem, which is further justified in Appendix 4-A. Define the random vector X = (X1, ..., Xn), where the Xi are independent trials of a discrete random variable X with entropy H(X). Define the vector x to be an outcome of the random vector X. The theorem says that asymptotically as n → ∞, there is a set of "typical" outcomes S for which
pX(x) ≈ 2^(−nH(X)), x ∈ S, (4.6)
and the total probability that the outcome is in S is very close to unity. Since the "typical" outcomes all have approximately the same probability, there must be approximately 2^(nH(X)) outcomes in S. This approximation becomes more accurate as n gets large.

We can now conceptually design a source coder as follows. This source coder will assign to each outcome x a binary word, called the code. If n is large, we can assign binary words only to the "typical" outcomes, and ignore the "nontypical" ones. If we use nH(X)-bit code words, we can encode each of the 2^(nH(X)) typical outcomes with a unique binary word, for an average of H(X) bits per component of the vector x. Since each outcome of the component random variable X requires on average H(X) bits, H(X) is the average information obtained from the observation. It is important to note, however, that this argument applies only if we encode a large number of components collectively, and not each component separately. The statement that H(X) is the average number of bits required to encode a component X applies only to an average of n components, not to an individual component.

Figure 4-2. The entropy of a binary random variable as a function of the probability q = pX(1).

We will now state (but not prove) the source coding theorem for discrete-amplitude discrete-time sources.
If a source can be modeled as repeated independent trials of a random variable X at r trials per second, we define the rate of the source to be R = rH(X). The source can be encoded by a source coder into a bit stream with bit rate less than R + ε for any ε > 0. Constructing practical codes that come close to R is difficult, but constructing good sub-optimal codes is often easy.

Example 4-5. For the source of Example 4-4, if q = ½ then H(X) = 1. This implies that to encode repeated outcomes of X we need one bit per outcome, on average. In this case, this is also adequate for each sample, not just on average, since the source is binary. A source coder that achieves rate R just transmits outcomes of X unaltered. □

Example 4-6. When q = 0.1 in Example 4-4,
H(X) = −0.1·log2(0.1) − 0.9·log2(0.9) ≈ 0.47, (4.7)
implying that less than half a bit per outcome is required, on average. This is not so intuitive; however, there are coding schemes in which the average number of bits per outcome will be lower than unity but greater than 0.47. One simple coding scheme takes a pair of outcomes and assigns them bits according to the following table:

outcomes   bits
0,0        0
0,1        10
1,0        110
1,1        111

A bit stream formed by repeated trials can be easily decoded. The average number of bits produced by this coder is 0.645 bits per trial. But note that the pair of trials 1,1 requires three bits, or 1.5 bits per trial. This emphasizes that the entropy is an average quantity. □

Example 4-7. Consider a particularly unfair coin that always comes up heads. Then
H(X) = 0, (4.8)
using the identity 0·log2 0 = 0. This says that no bits are required to specify the outcome, which is valid. □

Exercise 4-1. It is clear from the definition of entropy that H(X) ≥ 0. Use the inequality log x ≤ x − 1 to show that
H(X) ≤ log2 K, (4.9)
where K is the size of the alphabet of X, with equality if and only if the outcomes of X are equally likely.
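The pair coder of Example 4-6 can be sketched in a few lines of Python; the code words are taken from the table above, while the helper names are merely illustrative:

```python
# Prefix code from Example 4-6: pairs of outcomes map to code words.
CODE = {(0, 0): "0", (0, 1): "10", (1, 0): "110", (1, 1): "111"}

def encode(outcomes):
    """Encode an even-length sequence of 0/1 outcomes pairwise."""
    assert len(outcomes) % 2 == 0
    return "".join(CODE[(outcomes[i], outcomes[i + 1])]
                   for i in range(0, len(outcomes), 2))

# Average bits per trial when q = Pr[1] = 0.1, computed analytically.
q = 0.1
def pair_prob(pair):
    return (q if pair[0] else 1 - q) * (q if pair[1] else 1 - q)

avg = sum(pair_prob(pr) * len(cw) for pr, cw in CODE.items()) / 2
print(avg)                         # → 0.645, between H(X) ≈ 0.47 and 1
print(encode([0, 0, 1, 1, 0, 1]))  # → 011110
```

Because no code word is a prefix of another, a received bit stream can be parsed unambiguously, which is why decoding is easy.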
□

The conclusion of Exercise 4-1 is that log2 K bits always suffice to specify the outcomes, as is obvious since 2^(log2 K) = K possible outcomes can be encoded by a straightforward assignment, at least when K is a power of two. The less obvious conclusion is that the maximum number of bits, log2 K, is required only when the outcomes are equally likely.

4.2. CAPACITY OF DISCRETE-TIME CHANNELS

The concept of entropy and information can be extended to channels, yielding considerable information about their fundamental limits. This section considers discrete-time channels, deferring continuous-time channels to Section 4.3. We consider three different types of discrete-time channels: discrete-valued inputs and outputs, discrete-valued inputs and continuous-valued outputs, and continuous-valued inputs and outputs.

4.2.1. Discrete-Valued Inputs and Outputs

Consider a discrete-time channel with input random process {Xk} and output {Yk}. We consider here only memoryless channels, for which the current output Yk is independent of all inputs except Xk. Such a channel is fully characterized by the conditional probabilities pY|X(y|x) for all x ∈ ΩX and y ∈ ΩY.

Example 4-8. Consider a channel with input and output alphabet ΩX = ΩY = {0, 1} such that pY|X(0|1) = pY|X(1|0) = p. This binary symmetric channel (BSC) offers a useful model of a channel that introduces independent random errors with probability p. The transition probabilities may be illustrated by a diagram in which each input is received correctly with probability 1 − p and inverted with probability p.

If the input samples are independent, the information per sample at the input is H(X) and the information per n samples is nH(X). The question is how much of this information gets through the channel. We can answer this question by finding the uncertainty in X after observing the output of the channel Y. Suppose that y is an outcome of Y. Then the uncertainty in X given the event Y = y is
H(X|Y=y) = E[−log2 pX|Y(X|y)] = −Σ_{x ∈ ΩX} pX|Y(x|y) log2 pX|Y(x|y).
(4.10)

To find the average uncertainty in X after observing Y, we must average this over the distribution of Y, yielding a quantity called the conditional entropy,
H(X|Y) = Σ_{y ∈ ΩY} H(X|Y=y) pY(y) = −Σ_{y ∈ ΩY} Σ_{x ∈ ΩX} pX,Y(x,y) log2 pX|Y(x|y). (4.11)
This conditional entropy, on a channel such as the BSC, is a measure of the average uncertainty about the input of the channel after observing the output. The uncertainty about X must be larger before observing Y than after; the difference is a measure of the information passed through the channel on average. Thus we define
I(X,Y) = H(X) − H(X|Y) (4.12)
as the average mutual information (as in (4.2)). In other words, I(X,Y) is interpreted as the uncertainty about X that is removed by observing Y, or the information about X in Y.

Exercise 4-2. (a) Show that I(X,Y) can be written directly in terms of the transition probabilities (channel) and the input distribution (input) as
I(X,Y) = Σ_{x ∈ ΩX} Σ_{y ∈ ΩY} pX(x) pY|X(y|x) log2 [ pY|X(y|x) / Σ_{x′ ∈ ΩX} pX(x′) pY|X(y|x′) ]. (4.13)
(b) Show that (4.12) can be written alternatively as
I(X,Y) = H(Y) − H(Y|X) = I(Y,X). (4.14)
Thus, the information about X in Y is the same as the information about Y in X. □

The transition probabilities are fixed by the channel. The input probabilities are under our control through the design of the channel coder. The mutual information (the information conveyed through the channel) is a function of both transition and input probabilities. It makes intuitive sense that we would want to choose the input probabilities so as to maximize this mutual information. The channel capacity per symbol is defined as the maximum information conveyed over all possible input probability distributions,
Cs = max_{pX(x)} I(X,Y). (4.15)
This capacity is in bits/symbol, where a symbol is one sample of X.
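Definitions (4.13) and (4.15) are straightforward to compute. The sketch below (not from the text) evaluates the mutual information of a discrete memoryless channel, and then approximates the capacity of the binary symmetric channel by a brute-force search over the input distribution:

```python
import math

def mutual_information(px, pyx):
    """I(X,Y) in bits per (4.13): px[x] is the input distribution,
    pyx[x][y] the channel transition probability p(y|x)."""
    ny = len(pyx[0])
    py = [sum(px[x] * pyx[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and pyx[x][y] > 0:
                total += px[x] * pyx[x][y] * math.log2(pyx[x][y] / py[y])
    return total

# BSC with error probability p = 0.1 and equally likely inputs.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
print(mutual_information([0.5, 0.5], bsc))  # ≈ 0.531 = 1 - H(0.1)

# Capacity per symbol (4.15): maximize over the input distribution
# by a coarse grid search over q = Pr[X = 1].
cs = max(mutual_information([q, 1 - q], bsc)
         for q in (i / 200 for i in range(1, 200)))
print(cs)  # ≈ 0.531
```

The search confirms that the symmetric input q = ½ is optimal for the BSC, in agreement with (4.18).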
If the channel is used s times per second, then the channel capacity in bits per second is
C = sCs. (4.16)

Exercise 4-3. For the BSC of Example 4-8, let the probability of the two inputs be q and 1 − q. (a) Show that the mutual information is
I(X,Y) = H(Y) + p log2 p + (1 − p) log2(1 − p). (4.17)
(b) By maximizing over q, show that the channel capacity per symbol is
Cs = 1 + p log2 p + (1 − p) log2(1 − p). (4.18)
The capacity is zero if p = ½, since then the channel inputs and outputs are independent, and is unity when p = 0 or p = 1, since then the channel is binary and noiseless. □

Using the concept of channel capacity and the source coding theorem, we will now state (but not prove) a general channel capacity theorem. Given a source with rate R = rH(X) bits/second, and a channel with capacity C = sCs bits/second, then if R < C there exists a combination of source and channel coders such that the source can be communicated over the channel with fidelity arbitrarily close to perfect. If the source is a bit stream, the channel coder can achieve arbitrarily low probability of error if the bit rate is below the channel capacity. In practice, achieving vanishingly small error probability requires arbitrarily large computational complexity and processing delay. Nevertheless, the channel capacity result is very useful as an ideal against which to compare practical modulation and coding systems.

4.2.2. Discrete Inputs and Continuous Outputs

Another useful channel model is a discrete-time channel with a discrete-valued input and a continuous-valued output.

Example 4-9. In an additive noise channel, the output is
Y = X + N, (4.19)
where X is a discrete random input to the channel and N is a continuous noise variable. This model arises often in this book in the situation where a discrete data symbol taking on a finite number of possible values is transmitted over a channel with additive Gaussian noise (i.e., N is Gaussian).
□

This model is useful because most communications media (Chapter 5) have continuous-valued outputs, due to thermal noise, whereas digital signals are discrete-valued. The previous definitions of entropy carry over to continuous-valued random variables, if we are careful about replacing summations with integrals. For example, the entropy of a continuous-valued random variable Y is defined as
H(Y) = E[−log2 fY(Y)] = −∫_{ΩY} fY(y) log2 fY(y) dy. (4.20)
Just as with discrete-valued random variables, it is possible to bound the entropy of a continuous-valued random variable.

Exercise 4-4. Show that if Y has zero mean and variance σ², then
0 ≤ H(Y) ≤ ½ log2(2πe σ²) (4.21)
with equality if and only if Y is Gaussian. Hint: Show that
H(Y) ≤ −∫_{ΩY} fY(y) log2 g(y) dy (4.22)
for any probability density function g(y), using the inequality log x ≤ x − 1. Then substitute a Gaussian p.d.f. for g(y). □

It is important to note that we have constrained the variance of the random variable in this exercise. A different constraint would lead to a different bound; or, no constraint could lead to unbounded entropy.

The conditional entropy is a little trickier because it involves both discrete and continuous-valued random variables. Following the second expression in (4.11), we can define
H(Y|X) = −Σ_{x ∈ ΩX} pX(x) ∫_{ΩY} fY|X(y|x) log2 fY|X(y|x) dy. (4.23)

Exercise 4-5. Consider the additive Gaussian noise channel of Example 4-9. Show that H(Y|X) = H(N). This result is intuitive, since after observing the outcome of X, the uncertainty in Y is precisely the entropy of the noise. □

The mutual information and capacity are defined as before, in (4.12) and (4.15).

Exercise 4-6. Following (4.13), the mutual information can be written in terms of the channel transition probability fY|X(y|x) and the probability distribution of the input pX(x),
I(X,Y) = Σ_{x ∈ ΩX} pX(x) ∫_{ΩY} fY|X(y|x) log2 [ fY|X(y|x) / Σ_{x′ ∈ ΩX} pX(x′) fY|X(y|x′) ] dy. (4.24)
Derive this from (4.20) and (4.11).
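Equation (4.24) can also be evaluated numerically. The sketch below (my own; the integration range and step size are arbitrary choices) uses the trapezoidal rule for equally likely symbols from a real-valued alphabet in additive Gaussian noise:

```python
import math

def mi_discrete_gaussian(alphabet, sigma, num=4000, lim=12.0):
    """Evaluate (4.24) for equally likely inputs drawn from a real-valued
    alphabet, observed in additive Gaussian noise of std deviation sigma."""
    K = len(alphabet)
    def f(y, x):  # conditional density f_{Y|X}(y|x)
        return (math.exp(-(y - x) ** 2 / (2 * sigma ** 2))
                / (sigma * math.sqrt(2 * math.pi)))
    dy = 2 * lim / num
    total = 0.0
    for i in range(num + 1):
        y = -lim + i * dy
        fy = sum(f(y, x) for x in alphabet) / K  # output density f_Y(y)
        w = dy if 0 < i < num else dy / 2        # trapezoid weights
        for x in alphabet:
            if f(y, x) > 0.0:
                total += w * f(y, x) / K * math.log2(f(y, x) / fy)
    return total

# Inputs ±1 (2-AM) with sigma = 1, i.e. 0 dB SNR: the result must lie
# below both the 1-bit alphabet limit and the continuous-input capacity
# 0.5*log2(1 + SNR) = 0.5 bits per symbol.
print(mi_discrete_gaussian([-1.0, 1.0], 1.0))
```

As the noise shrinks the result approaches the 1-bit alphabet limit, reproducing the asymptotic behavior of the curves in Figure 4-4.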
□

The channel capacity for the continuous-output channel depends on the values in the discrete input alphabet ΩX. For example, on an additive noise channel, we would expect the capacity of a channel with inputs ±100 to be larger than the capacity with inputs ±1 when the noise is the same. The set ΩX of channel inputs is called the input alphabet.

Example 4-10. Some common channel alphabets that we will encounter in Chapter 6 are shown in Figure 4-3. The K-AM alphabets are real-valued, containing K equally spaced points centered at the origin. The remaining alphabets are complex-valued, as appropriate for complex-valued discrete-time channels. The noise in this case is assumed to be complex white Gaussian noise, where the real and imaginary parts have the same power but are independent of one another and of the channel input. □

One approach to calculating channel capacity would be not to constrain the alphabet at all; this is done in Section 4.2.3. Another approach is to choose an input alphabet,

Figure 4-3. Some real-valued and complex-valued channel alphabets (2-AM, 4-AM, 8-AM, 16-AM, 4-PSK, 8-PSK, 8-AMPM, 16-QAM, 32-AMPM, 64-QAM) for a discrete-valued channel input. The acronyms refer to signaling methods that will be discussed in Chapter 6.

Figure 4-4. Bounds on the information conveyed by a real-valued discrete-time channel with additive white Gaussian noise as a function of SNR for four input alphabets defined in Figure 4-3. It is assumed that the symbols in the alphabet are equally likely. Also shown is the channel capacity, 0.5·log2(1 + σX²/σ²), for continuous-valued input signals, derived in Section 4.2.3.
The points labeled 10⁻⁵ indicate the SNR at which a probability of error of 10⁻⁵ is achieved with direct techniques (no coding). The significance of these points will be discussed further in Chapter 14. The variance of the transmitted symbols is σX², so the SNR is defined as σX²/σ², and is expressed in dB. (After Ungerboeck [2].)

getting the discrete-input channel model of this subsection, and then determine the capacity by maximizing the mutual information over the probabilities of the inputs using (4.24). Going one step further, we can assume a particular distribution for the input alphabet, and then find the information I(X,Y) conveyed by the channel. In a classic paper that is credited with establishing the practical importance of trellis coding (Chapter 14), Ungerboeck makes this calculation assuming that the input symbols in the alphabet are equally likely and that the channel adds independent Gaussian noise [2]. He computes the information conveyed by the channel as a function of the signal-to-noise ratio (SNR) for the input alphabets in Figure 4-3. The results are shown in Figure 4-4 (real alphabets) and Figure 4-5 (complex alphabets).

Example 4-11. Consider the curve corresponding to 4-AM. As the signal-to-noise ratio increases, the information conveyed approaches two bits per symbol. This is intuitive because if the noise is small, nearly two bits per symbol can be sent with an alphabet of four symbols with low probability of error. For each input alphabet ΩX with size |ΩX|, the information conveyed asymptotically approaches log2|ΩX| as the signal-to-noise ratio increases. While a capacity of two bits per symbol is not achievable with 4-AM, it is achievable with 8-AM for an SNR as low as 13 dB. Furthermore, using 16-AM to transmit two bits per symbol does not gain much noise immunity. This suggests that there is very little lost if we use 8-AM to transmit two bits per symbol.
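These observations can be checked against the continuous-input capacity ½ log2(1 + SNR) derived in Section 4.2.3; a small sketch (not from the text):

```python
import math

def awgn_capacity(snr):
    """Cs = 0.5 * log2(1 + SNR) bits per real symbol, per (4.27)."""
    return 0.5 * math.log2(1.0 + snr)

def from_db(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)

# At 13 dB the continuous-input capacity is about 2.19 bits/symbol,
# just above the two bits per symbol that 8-AM delivers there.
print(awgn_capacity(from_db(13.0)))  # ≈ 2.19
print(awgn_capacity(from_db(30.0)))  # ≈ 4.98
```

The 13 dB figure is only about 0.2 bit above the two bits per symbol carried by 8-AM, which is the quantitative sense in which "very little is lost."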
This observation is exploited in Chapter 14, where we discuss trellis coding. □

4.2.3. Continuous-Valued Inputs and Outputs

The question arises as to what is lost by choosing a specific discrete alphabet at the channel input. We can answer this question by determining the capacity with a continuous-valued input, which is an infinite alphabet. For the additive Gaussian channel considered in Example 4-9, for any given SNR, we lose very little in capacity

Figure 4-5. An analog to Figure 4-4 for a discrete-time complex-valued alphabet (defined in Figure 4-3) and channel, showing curves for 2-PSK, 4-PSK, 8-PSK, 8-AMPM, 16-QAM, 32-AMPM, and 64-QAM. (After Ungerboeck [2].)

by choosing a discrete input alphabet, as long as the alphabet is sufficiently large (the higher the SNR, the larger the required alphabet). This result is important in that it justifies many of the digital communication techniques used in practice (Chapter 6).

Let X be a continuous-valued random variable. The entropy of Y is still given by (4.20), but the summation over x in the conditional entropy (4.23) must be replaced by an integral,
H(Y|X) = −∫_{ΩX} fX(x) ∫_{ΩY} fY|X(y|x) log2 fY|X(y|x) dy dx. (4.25)
We obtain the channel capacity by maximizing I(X,Y) over fX(x).

Scalar Additive Gaussian Noise Channel

Assume an additive Gaussian noise channel, Y = X + N, where N is an independent zero-mean Gaussian random variable with variance σ². What is the capacity under the constraint that the variance of X is σX²? The result of Exercise 4-5 is trivially extended to get H(Y|X) = H(N), which is not a function of the input distribution, so the channel capacity is obtained by maximizing H(Y).
The variance of Y is constrained to be σX² + σ², so from (4.21),
H(Y) ≤ ½ log2[2πe(σX² + σ²)], (4.26)
with equality if and only if Y is Gaussian. Fortunately, Y is Gaussian if X is Gaussian, so the bound can in fact be achieved. Therefore channel capacity is achieved with a Gaussian input, and from (4.14),
Cs = ½ log2[2πe(σX² + σ²)] − ½ log2(2πe σ²) = ½ log2(1 + σX²/σ²) (4.27)
in bits per symbol. This channel capacity is plotted in both Figure 4-4 and Figure 4-5, where the SNR is σX²/σ². Note that this capacity is very similar to the capacity for any particular discrete alphabet at low SNR, and diverges significantly at large SNR. The capacity in Figure 4-5 is twice that of Figure 4-4 because each of the real and imaginary parts has the capacity given by (4.27). The conclusion is that for the Gaussian channel and any particular SNR, there is a sufficiently large discrete input alphabet that has a capacity very close to the continuous-input capacity. This result gives a solid theoretical underpinning to the practical use of discrete input alphabets, which are also very convenient for implementation (Chapter 6).

Capacity of Vector Additive Gaussian Noise Channel

These results for the additive Gaussian channel are easily extended to a vector channel model. This extension will prove to be critically important in Chapters 8 and 10, where we consider continuous-time bandlimited Gaussian channels. We will show there that, for a given finite time interval, such a channel can be reduced to a vector Gaussian channel. Consider a channel modeled by
Y = X + N, (4.28)
where X, Y, and N are N-dimensional vectors, X and N are independent, and the components of N are independent Gaussian random variables each with variance σ². It is easily shown, as a generalization of Exercise 4-5, that
I(X,Y) = H(Y) − H(Y|X) = H(Y) − H(N) (4.29)
and that
H(N) = (N/2) log2(2πe σ²). (4.30)
The entropy of a random vector is the same as that of a scalar random variable, (4.1) or (4.20), except that the sample space has vector-valued members. The noise entropy is proportional to the dimension N because each component of the noise contributes the same entropy as in the scalar case. All that remains, then, is to find the maximum of H(Y) over all input distributions fX(x).

Exercise 4-7. (a) Generalize (4.21) to show that
H(Y) ≤ −∫_{ΩY} fY(y) log2 g(y) dy (4.31)
for any probability density function g(y).
(b) Substitute a vector Gaussian density with independent components with mean zero and variance (σ² + σX,n²) for the n-th component to obtain
H(Y) ≤ ½ Σ_{n=1}^{N} log2[2πe(σ² + σX,n²)], (4.32)
and thus show that
I(X,Y) ≤ ½ Σ_{n=1}^{N} log2(1 + σX,n²/σ²), (4.33)
with equality if Y is Gaussian with independent zero-mean components. Fortunately, this upper bound can be achieved if the input vector X is chosen to have independent Gaussian components, each with mean zero and with variance σX,n² for the n-th component.
(c) Using the inequality log x ≤ x − 1, show that if the variance of X is constrained to some σX²,
Σ_{n=1}^{N} σX,n² ≤ σX², (4.34)
then
I(X,Y) ≤ (N/2) log2(1 + σX²/(Nσ²)), (4.35)
with equality if and only if all the components of X have equal variance. □

The conclusion is that the capacity of the vector Gaussian channel with input variance constrained to E[||X||²] = σX² is given by
C = (N/2) log2(1 + σX²/(Nσ²)), (4.36)
and the input distribution that achieves capacity is a zero-mean Gaussian vector with independent components, each with variance σX²/N. The interpretation of this result is that the capacity is N, the number of degrees of freedom, times ½ log2(1 + SNR), where the signal-to-noise ratio SNR = σX²/(Nσ²) is the total input signal power divided by the total noise power.

4.3. FURTHER READING

Abramson [3] gives a short elementary introduction to information theory, particularly the channel coding theorem.
Gallager [4] has long been a standard advanced text and includes an extensive discussion of continuous-time channels. McEliece [5] provides a readable introduction with qualitative sections devoted to describing the more advanced work in the field. An excellent recent text is by Cover and Thomas [6]. Also recommended is the text by Blahut [7]. A collection of key historical papers, edited by Slepian [8], provides an easy way to access the most important historical papers, including twelve by Shannon. "A Mathematical Theory of Communication" and "Communication in the Presence of Noise", two of Shannon's best known papers, are highly recommended reading for their lucidity, relevance, and historical value. Especially interesting, and mandatory reading for anyone with an interest in the subject, is Shannon's axiomatic justification of entropy as a measure of information: he simply assumes three properties that a reasonable measure of information should have, and derives entropy as the only measure that has these properties [9,10]. Viterbi and Omura [11] provide an encyclopedic coverage of information theory, with an emphasis throughout on convolutional codes. Finally, Wolfowitz [12] gives a variety of generalizations of the channel coding theorem.

APPENDIX 4-A
ASYMPTOTIC EQUIPARTITION THEOREM

In this appendix we give a non-rigorous derivation of the asymptotic equipartition theorem that gives a great deal of insight. Define a random process Yk in which each sample is an independent trial of the random variable Y with alphabet ΩY = {b1, ..., bK}. Let there be n trials, and define nj to be the number of outcomes equal to bj. The relative-frequency interpretation of probabilities tells us that if n is large, then with high probability,
One approach is to show that given any e > 0, the probability that fpy (bj ) - e] < nj In < fpy (bj ) + e] approaches unity as n gets large.) Suppose that we are interested in the product of the n observations. We can write the product as = =[ Yl ... Yn (b1)n 1 ••• (bKtK (b1t 1/n ... (bK)nKln] n n K n [= 2~n)Og2bl ... 2 ~)Og2bK] n = [2~' = 1 ~)Og2b'] n (4.38) Then using (4.37), K n Y 1 ... Yn == ~ py(bj »)og2bi] 2i = 1 = [ 2El)og2Y]~J n [ (4.39) with high probability. A rigorous proof is left to Problem 4-16. Since (4.39) is true for any discrete-valued random variable Y, it is certainly true for a random variable Y =!(X), (4.40) where ! is any function defined on the alphabet of X . Define ! (x) = Px (x ) = Pr[X = x] for all x E ilx , certainly a legitimate function defined on the alphabet of X. Then (4.39) implies that for large n n n Yl" 'Yn =!(xl)"'!(xn )= II!(xj)= IIpx(xj) j=l j=l == [ 2El)Og2Px(X)1] n = 2-nH (X) (4.41) with high probability. Since the Xj are independent, n Px(x) = j I=I1 Px(Xj) so with high probability (4.6) holds. (4.42) PROBLEMS 4·1. Consider an unfair coin that produces heads with probability 1/4. What is the entropy of the coin flip outcome? Suppose the coin is flipped once per second. What is the rate in this source? Devise a coder to encode successive coin flips outcomes so that the average number of bits per flip is less than one. How does your coder compare with the rate of the source? 4-2. Consider a random variable X with alphabet Ox = fa I> a2' a3, a4) and probabilities px(at)=112 px(a~=114 px(a3)=118 PX(a4) = 118. (4.43) Find the entropy of the random variable. Suppose independent trials of the random variable 112 LIMITS OF COMMUNICATION occur at rate r = 100 trials/second. What is the rate of the source? Devise a coder that exactly achieves the rate cl the source. 4-3. The well known Jensen' 8 iMquality fmm pmbability t.Qeory imp!iest.'mt E[!og2X ] ~ !og2E[X]. 
Use this to prove the p-q inequality: given probabilities {p_i} and nonnegative numbers {q_i}, 1 ≤ i ≤ M, with Σ_{i=1}^{M} p_i = 1 and Σ_{i=1}^{M} q_i = Q, then

    − Σ_{i=1}^{M} p_i log_2 p_i ≤ − Σ_{i=1}^{M} p_i log_2 q_i + log_2 Q

with equality if and only if q_i = Q·p_i for all i.

4-4. For a discrete-valued random variable X, use the p-q inequality of Problem 4-3 to give another derivation of the results in Exercise 4-1.

4-5. Let X denote a vector of n i.i.d. random variables each taking the value zero or one. Show that

    H(X) ≤ n    (4.44)

with equality if and only if the two outcomes have equal probability.

4-6. Consider the following discrete memoryless channel, where all transition probabilities are 1/3 (transition diagram not reproduced): (a) Find the two conditional entropies and the mutual information in terms of the input and output entropies. (b) Find the channel capacity.

4-7. Repeat Problem 4-6 for the following channel (transition diagram not reproduced).

4-8. Repeat Problem 4-6 for the following channel, called a binary erasure channel, in which each input is erased with probability p (transition diagram not reproduced). (Answer to (b): C = 1 − p.)

4-9. (a) Show that when p_1 and p_2 ... Further define a second distribution {q_i, 1 ≤ i ≤ K}, where q_1 = p_1 − δ and q_2 = p_2 + δ and q_i = p_i for i > 2, where δ > 0. Show that the second distribution has larger entropy. Hint: use the results of Problem 4-9.

4-12. Consider a continuous-valued random variable X uniformly distributed on the interval [−a, a]. (a) What is its entropy? (b) How does its entropy compare to that of a Gaussian distribution with the same variance?

4-13. Use the p-q inequality of Problem 4-3 to show the following. (a) For any two discrete-valued random variables X and Y, I(X,Y) ≥ 0. (b) H(X) ≥ H(X|Y). (c) H(X) + H(Y) ≥ H(X,Y). (d) When are these inequalities equalities?

4-14. Show that by replacing the summations in (4.24) with integrals, the mutual information of two continuous-valued random variables can be written

    I(X,Y) = ∫_{Ω_X} ∫_{Ω_Y} f_{X,Y}(x,y) log_2 [ f_{X,Y}(x,y) / ( f_X(x) f_Y(y) ) ] dy dx .    (4.46)

4-15. Investigate the capacity of the vector Gaussian channel of (4.36) as the number of degrees of freedom N increases.
Interpret the result.

4-16. Consider a random process {X_k}, where the components are independent observations of a random variable X. The law of large numbers for sums of random variables states that for any ε > 0,

    Pr[ E[X] − ε < (1/n)(X_1 + ··· + X_n) < E[X] + ε ] → 1

as n → ∞. ...

Figure 5-10. ... (b) The critical angle of incidence, at which the angle of refraction is ninety degrees. (c) At angles larger than the critical angle, total internal reflection occurs.

SEC. 5.3 OPTICAL FIBER

... a beam must have a diameter large with respect to the wavelength in order to be approximated as a plane wave [7]. Assume that the index of refraction n_1 in the incident material is greater than the index of refraction n_2 of the refracting medium, n_1 > n_2. Then Snell's law predicts that

    sin(θ_1) / sin(θ_2) = n_2 / n_1 < 1 .    (5.26)

The angle of refraction is larger than the angle of incidence. Shown in Figure 5-10b is the case of a critical incidence angle where the angle of refraction is ninety degrees, so that the light is refracted along the material interface. This corresponds to critical incident angle

    sin(θ_1) = n_2 / n_1 .    (5.27)

For angles larger than (5.27), there is total internal reflection as illustrated in Figure 5-10c, where the angle of reflection is always equal to the angle of incidence. This principle can be exploited in an optical fiber waveguide as illustrated in Figure 5-11. The core and cladding materials are glass, which transmits light with little attenuation, while the sheath is an opaque plastic material that serves no purpose other than to lend strength, absorb any light that might otherwise escape, and prevent any light from entering (which would represent interference or crosstalk). The core glass has a higher index of refraction than the cladding, with the result that incident rays with a small angle of incidence are captured by total internal reflection.
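As a quick numerical illustration of the critical-angle condition (5.27), the following sketch computes the critical angle of incidence for a pair of assumed, illustrative core and cladding indices (1.48 and 1.46; these particular values are not taken from the text).

```python
import math

def critical_angle_deg(n1, n2):
    """Critical angle of incidence from (5.27): sin(theta_c) = n2 / n1.
    Incidence at angles larger than theta_c gives total internal reflection."""
    if n2 >= n1:
        raise ValueError("total internal reflection requires n1 > n2")
    return math.degrees(math.asin(n2 / n1))

# Assumed illustrative indices for a silica core and cladding:
theta_c = critical_angle_deg(1.48, 1.46)
print(f"critical angle = {theta_c:.1f} degrees")
```

Because the index difference between core and cladding is small, the critical angle is close to ninety degrees: only rays at near-grazing incidence on the core-cladding boundary are trapped.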
This is illustrated in Figure 5-12, where a light ray incident on the end of the fiber is captured by total internal reflection as long as the angle of incidence θ_1 is below a critical angle (Problem 5-4). The ray model predicts that the light will bounce back and forth, confined to the waveguide until it emerges from the other end. Furthermore, it is obvious that the path length of a ray, and hence the transit time, is a function of the incident angle of the ray (Problem 5-5). This variation in transit time for different rays manifests itself in pulse broadening (the broadening of a pulse launched into the fiber as it propagates), which in turn limits the pulse rate which can be used, or the distance that can be transmitted, or both.

Figure 5-11. An optical fiber waveguide. The core and cladding serve to confine the light incident at narrow incident angles, while the opaque sheath serves to give mechanical stability and prevent crosstalk or interference.

Figure 5-12. Ray model of propagation of light in an optical waveguide by total internal reflection. Shown is a cross-section of a fiber waveguide along its axis of symmetry, with an incident light ray at angle θ_1 which passes through the axis of the fiber (a meridional ray).

The pulse broadening can be reduced by modifying the design of the fiber, and specifically by using a graded-index fiber in which the index of refraction varies continuously with radial distance from the axis. The foregoing ray model gives some insight into the behavior of light in an optical fiber waveguide; for example, it correctly predicts that there is greater pulse broadening when the index difference between core and cladding is greater. However, this model is inadequate to give an accurate description, since in practice the radial dimensions of the fiber are on the order of the wavelength of the light.
For example, the ray model of light predicts that there is a continuum of angles for which the light will bounce back and forth between the core-cladding boundaries indefinitely. A more refined model uses Maxwell's equations to predict the behavior of light in the waveguide, and finds that in fact there are only a discrete and finite number of angles at which light propagates in zigzag fashion indefinitely. Each of these angles corresponds to a mode of propagation, similar to the modes in a metallic waveguide carrying microwave radiation. When the core radius is many times larger than the wavelength of the propagating light, there are many modes; this is called a multimode fiber. As the radius of the core is reduced, fewer and fewer modes are accommodated, until at a radius on the order of the wavelength only one mode of propagation is supported. This is called a single mode fiber. For a single mode fiber the ray model is seriously deficient, since it depends for its accuracy on physical dimensions that are large relative to the wavelength. In fact, in the single mode fiber the light is not confined to the core; a significant fraction of the power propagates in the cladding. As the radius of the core gets smaller, more and more of the power travels in the cladding. For various reasons, as we will see, the transmission capacity of the single mode fiber is greater. However, it is also more difficult to splice with low attenuation, and it fails to capture light at the larger incident angles that would be captured by a multimode fiber, making it more difficult to launch a given optical power. In view of its much larger ultimate capacity, there is a trend toward exclusive use of single mode fiber in new installations, even though multimode fiber has been used extensively in the past [8]. In the following discussion, we emphasize the properties of single mode fiber.
We will now discuss the factors which limit the bandwidth or bit rate which can be transmitted through a fiber of a given length. The important factors are:

• Material attenuation, the loss in signal power that inevitably results as light travels down an optical waveguide. There are four sources of this loss in a single mode fiber: scattering of the light by inherent inhomogeneities in the molecular structure of the glass, absorption of the light by impurities in the glass, losses in connectors, and losses introduced by bending of the fiber. Generally these losses are affected by the wavelength of the light, which affects the distribution of power between core and cladding as well as the scattering and absorption mechanisms. The effect of these attenuation mechanisms is that the signal power loss in dB is proportional to the length of the fiber. Therefore, for a line of length L, if the loss in dB per kilometer is γ₀, the total loss of the fiber is γ₀L, and hence the ratio of transmitted power P_T to received power P_R obeys

    γ₀L = 10 log₁₀( P_T / P_R ) ,  or  P_R = P_T · 10^{−γ₀L/10} .    (5.28)

This exponential dependence of loss vs. length is the same as for the transmission lines of Section 5.2.

• Mode dispersion, or the difference in group velocity between different modes, results in the broadening of a pulse which is launched into the fiber. This broadening of pulses results in interference between successive pulses which are transmitted, called intersymbol interference (Chapter 6). Since this pulse broadening increases with the length of the fiber, this dispersion will limit the distance between regenerative repeaters. One significant advantage of single mode fibers is that mode dispersion is absent, since there is only one mode.

• Chromatic or material dispersion is caused by differences in the velocity of propagation at different wavelengths.
For infrared and longer wavelengths, the shorter wavelengths arrive earlier than relatively longer wavelengths, but there is a crossover point at about 1.3 μm beyond which relatively longer wavelengths arrive earlier. Since practical optical sources have a non-zero bandwidth, called the linewidth, and signal modulation increases the optical bandwidth further, material dispersion will also cause intersymbol interference and limit the distance between regenerative repeaters. Material dispersion is qualitatively similar to the dispersion that occurs in transmission lines (Section 5.2) due to frequency-dependent attenuation. The total dispersion is usually expressed in units of picoseconds of pulse spreading per GHz of source bandwidth per kilometer of distance, with typical values in the range of zero to 0.15 in the 1.3 to 1.6 μm minimum attenuation region [9,10]. Importantly, since the dispersion passes from positive to negative in the region of 1.3 μm wavelength, the dispersion is very nearly zero at this wavelength. A typical curve of the magnitude of the chromatic dispersion vs. wavelength is shown in Figure 5-13, where the zero is evident. The chromatic dispersion can be made negligibly small over a relatively wide range of wavelengths. Furthermore, the wavelength of this zero in chromatic dispersion can be shifted through waveguide design to correspond to the wavelength of minimum attenuation.

Figure 5-13. Typical chromatic dispersion (psec/km-nm) vs. wavelength (μm) in silica fiber [10]. Shown is the magnitude of the dispersion; the direction of the dispersion actually reverses at the zero-crossing.

With these impairments in mind, we can discuss the practical and fundamental limits on information capacity for a fiber.
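The loss relation (5.28) is easy to evaluate numerically. The following sketch computes the received power for assumed, illustrative numbers (1 mW launched, 0.3 dB/km attenuation, a 100 km span); the specific span length is not from the text.

```python
import math

def received_power_dbm(pt_mw, gamma0_db_per_km, length_km):
    """Received power in dBm from (5.28): the total loss in dB is gamma0 * L,
    so P_R (dBm) = P_T (dBm) - gamma0 * L."""
    pt_dbm = 10.0 * math.log10(pt_mw)
    return pt_dbm - gamma0_db_per_km * length_km

# Assumed illustrative numbers: 1 mW (0 dBm) launched, 0.3 dB/km, 100 km span
pr_dbm = received_power_dbm(1.0, 0.3, 100.0)
pr_mw = 10.0 ** (pr_dbm / 10.0)
print(pr_dbm, pr_mw)   # -30 dBm, i.e. 0.001 mW (one microwatt)
```

Because the loss in dB grows linearly with length, the received power falls exponentially with length, exactly as for the transmission lines of Section 5.2.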
The fundamental limit on attenuation is due to the intrinsic material scattering of the glass in the fiber; this is known as Rayleigh scattering, and is similar to the scattering in the atmosphere of the earth that results in our blue sky. The scattering loss decreases rapidly with wavelength (as the fourth power), and hence it is generally advantageous to choose a longer wavelength. The attenuation due to intrinsic absorption is negligible, but at certain wavelengths large attenuation due to certain impurities is observed. Particularly important are hydroxyl (OH) radicals in the glass, which absorb at 2.73 μm wavelength and harmonics. At long wavelengths there is infrared absorption associated fundamentally with the glass, which rises sharply starting at 1.6 μm. A loss curve for a state-of-the-art fiber is shown in Figure 5-14. Note the loss curves for two intrinsic effects which would be present in an ideal material, Rayleigh scattering and infrared absorption, and the additional absorption peaks at 0.95, 1.25, and 1.39 μm due to OH impurities. The lowest losses are at approximately 1.3 and 1.5 μm, and these are the wavelengths at which the highest performance systems operate. The loss is as low as about 0.2 dB/km, implying a potentially much larger repeater spacing for optical fiber digital communication systems as compared to wire-pairs and coax. A curve of attenuation vs. frequency in Figure 5-15 for wire cable media and for optical fiber illustrates that the latter has a much lower loss. The loss per unit distance of the fiber is a much more important determinant of the distance between repeaters than is the bit rate at which we are transmitting. This is illustrated for a single-mode fiber in Figure 5-16, where there is an attenuation-limited region in which the curve of repeater spacing vs. bit rate is relatively flat.
As we increase the bit rate, however, we eventually approach a region where the repeater spacing is limited by the dispersion (mode dispersion in a multimode fiber and chromatic dispersion in a single mode fiber). The magnitude of the latter can be quantified simply by considering the Fourier transform of a transmitted pulse, and in particular its bandwidth W.

Figure 5-14. Observed loss spectrum (dB/km) vs. wavelength (μm) of an ultra-low-loss germanosilicate single mode fiber, together with the loss due to the intrinsic material effects of Rayleigh scattering and infrared absorption [11].

Figure 5-15. Attenuation vs. frequency for wire cable and fiber guiding media [10]. The band of frequencies over which the fiber loss is less than 1 dB/km is more than 10^14 Hz.

Figure 5-16. Tradeoff between distance and bit rate for a single mode fiber with a particular set of assumptions [8]. The dots represent performance of actual field trial systems.

The spreading of the pulse will be proportional to the repeater spacing L and the bandwidth W, with a constant of proportionality D. Thus, if we require that this dispersion be less than half a pulse-time at a pulse rate of R pulses per second,

    D · L · W < 1 / (2R) .    (5.29)

The bandwidth W of the source depends on the linewidth, or intrinsic bandwidth in the absence of modulation, and also on the modulation. Since a non-zero linewidth will increase the bandwidth and hence the chromatic dispersion, we can understand fundamental limits by assuming zero linewidth.
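To get a feel for the dispersion constraint (5.29), the following sketch solves it for the maximum span L, taking the source bandwidth equal to the pulse rate (W = R, an assumption for this illustration) and the worst-case dispersion figure of 0.15 ps per GHz per km quoted earlier.

```python
def dispersion_limited_length_km(rate_bps, d_ps_per_ghz_km):
    """Maximum span from (5.29) with W = R:
    D * L * R < 1/(2R)  =>  L < 1/(2 * D * R^2).
    D is converted from ps/(GHz km) to s/(Hz km)."""
    d = d_ps_per_ghz_km * 1e-12 / 1e9   # seconds per Hz per km
    return 1.0 / (2.0 * d * rate_bps ** 2)

# Assumed worst-case chromatic dispersion of 0.15 ps/(GHz km) from the text:
for rate in (1e9, 10e9):
    L = dispersion_limited_length_km(rate, 0.15)
    print(f"{rate / 1e9:.0f} Gb/s -> about {L:.0f} km")
```

Note the quadratic dependence on the pulse rate: a tenfold increase in rate cuts the dispersion-limited spacing by a factor of one hundred, which is the steep roll-off visible at the right of Figure 5-16.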
We will show in Chapter 6 that the bandwidth due to modulation is approximately equal to the pulse rate, or W ≈ R, and hence (5.29) becomes

    D · L · R < 1 / (2R) ,  or  L < 1 / (2DR²) .    (5.30)

This equation implies that in the region where dispersion is limiting, the repeater spacing L must decrease rather rapidly as the bit rate is increased, as shown in Figure 5-16. More quantitative estimates of the limits shown in Figure 5-16 are derived in Problem 5-6 and Problem 5-11.

As a result of these considerations, first generation (about 1980) optical fiber transmission systems typically used multimode fiber at a wavelength of about 0.8 μm, and achieved bit rates up to about 150 Mb/s. The Rayleigh scattering is about 2 dB per km at this wavelength, and the distance between regenerative repeaters was in the 5 to 10 km range. Second generation systems (around 1985) moved to single mode fibers and wavelengths of about 1.3 μm, where the Rayleigh scattering attenuation is about 0.2 dB/km (and practical attenuations are more on the order of 0.3 dB/km).

The finite length of manufactured fibers and system installation considerations dictate fiber connectors. These present difficult alignment problems, all the more difficult for single mode fibers because of the smaller core, but in practice connector losses of 0.1 to 0.2 dB can be obtained even for single mode fibers. Since it must be anticipated in any system installation that accidental breakage and subsequent splicing will be required at numerous points, in practice connector and splicing loss is the dominant loss in limiting repeater spacing.

Bending loss is due to the different propagation velocities required on the outer and inner radius of the bend. As the bending radius decreases, eventually the light on the outer radius would have to travel faster than the speed of light, which is of course impossible. What happens instead is that significant attenuation occurs due to a loss of confined power.
Generally there is a tradeoff between bending loss and splicing loss in single mode fibers, since bending loss is minimized by confining most of the power to the core, but that makes splicing alignment more critical. In Figure 5-16, the tradeoff between maximum distance and bit rate is quantified for a single mode fiber for a particular set of assumptions (the actual numerical values are dependent on these assumptions). At bit rates below about one Gb/s (10⁹ bits per second) the distance is limited by attenuation and receiver sensitivity. In this range the distance decreases as bit rate increases, since the receiver sensitivity decreases (see Section 5.3.3). At higher bit rates, pulse broadening limits the distance before attenuation becomes important. The total fiber system capacity is best measured by a figure of merit equal to the product of the bit rate and the distance between repeaters (Problem 5-10), measured in Gb-km/sec. Current commercial systems achieve capacities on the order of 100 to 1000 Gb-km/sec.

5.3.2. Sources

While optical fiber transmission uses light energy to carry the information bits, at the present state of the art the signals are generated and manipulated electrically. This implies an electrical-to-optical conversion at the input to the fiber medium and an optical-to-electrical conversion at the output. There are two available light sources for fiber digital communication systems: the light-emitting diode (LED) and the semiconductor injection laser. The semiconductor laser is the more important for high-capacity systems, so we emphasize it here. In contrast to the LED, the laser output is coherent, meaning that it is nearly confined to a single frequency. In fact the laser output does have non-zero linewidth, but by careful design the linewidth can be made small relative to signal bandwidths using a structure called distributed feedback (DFB).
Thus, coherent modulation and demodulation schemes are feasible (Chapter 8), although commercial systems use intensity modulation. The laser output can be coupled into a single mode fiber with very high efficiency (about 3 dB power loss), and can generate powers in the 0 to 10 mW range [9], with one mW (0 dBm) typical [10]. The laser is necessary for single mode fibers, except for short distances, because it emits a narrower beam than the LED. The light output of the laser is very temperature dependent, and hence it is generally necessary to monitor the light output and control the driving current using a feedback circuit. There is not much room for increasing the launched power into the fiber because of nonlinear effects which arise in the fiber [9], unless, of course, we can find ways to circumvent or exploit these nonlinear phenomena.

5.3.3. Photodetectors

The optical energy at the output of the fiber is converted to an electrical signal by a photodetector. There are two types of photodetectors available: the PIN photodiode, popular at about 100 Mb/s and below, and the avalanche photodiode (APD), popular above 1 Gb/s [12,10]. The cross-section of a PIN photodiode is shown in Figure 5-17. This diode has an intrinsic (non-doped) region (not typical of diodes) between the n- and p-doped silicon. Photons of the received optical signal are absorbed and create hole-electron pairs. If the diode is reverse biased, there is an electric field across the depletion region of the diode (which includes the intrinsic portion), and this electric field separates the holes from the electrons and sweeps them to the contacts, creating a current proportional to the incident optical power.
The purpose of the intrinsic region is to enlarge the depletion region, thereby increasing the fraction of incident photons converted into current (carriers created outside the depletion region, or beyond diffusion distance of the depletion region, recombine with high probability before any current is generated). The fraction of incident photons converted into carriers that reach the electrodes is called the quantum efficiency of the detector, denoted by η. Given the quantum efficiency, we can easily predict the current generated as a function of the total incident optical power. The energy of one photon is hν, where h is Planck's constant (6.6×10⁻³⁴ Joule-sec) and ν is the optical frequency, related to the wavelength λ by

    νλ = c ,    (5.31)

where c is the speed of light (3×10⁸ m/sec). If the incident optical power is P watts, then the number of photons per second is P/hν, and if a fraction η of these photons each generate an electron with charge q (1.6×10⁻¹⁹ Coulombs), then the total current is

    i = ηq (P / hν) .    (5.32)

Figure 5-17. A PIN photodiode cross-section. Electrode connection to the n- and p-regions creates a diode, which is reverse-biased.

Example 5-15. For a wavelength of 1.5 μm and quantum efficiency of unity, what is the responsivity (defined as the ratio of output current to input power) for a PIN photodiode? It is

    i / P = ηq / hν = qλ / hc = (1.6×10⁻¹⁹ · 1.5×10⁻⁶) / (6.6×10⁻³⁴ · 3.0×10⁸) = 1.21 amps/watt .    (5.33)

If the incident optical power is a nanowatt, the maximum current from a PIN photodiode is 1.21 nanoamperes. □

With PIN photodiodes (and more generally all photodetectors), there is a tradeoff between quantum efficiency and speed. Quantum efficiencies near unity are achievable with a PIN photodiode, but this requires a long absorption region.
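The responsivity calculation of Example 5-15 follows directly from (5.32), using the rounded physical constants quoted in the text:

```python
H = 6.6e-34   # Planck's constant, Joule-sec (value used in the text)
C = 3.0e8     # speed of light, m/sec
Q = 1.6e-19   # electron charge, Coulombs

def responsivity(wavelength_m, eta=1.0):
    """Responsivity i/P = eta * q / (h * nu) = eta * q * lambda / (h * c), from (5.32)."""
    return eta * Q * wavelength_m / (H * C)

# Example 5-15: wavelength 1.5 um, unity quantum efficiency
print(f"{responsivity(1.5e-6):.2f} amps/watt")   # 1.21 amps/watt
```

The responsivity grows linearly with wavelength, since a longer-wavelength photon carries less energy and so a given optical power delivers more photons per second.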
But a long intrinsic absorption region results in a correspondingly smaller electric field (with resulting slower carrier velocity) and a longer drift distance, and hence a slower response to an optical input. Higher speed inevitably results in reduced sensitivity. Since very small currents are difficult to process electronically without adding significant thermal noise, it is desirable to increase the output current of the diode before amplification. This is the purpose of the APD, which has internal gain, generating more than one electron-hole pair per incident photon. Like the PIN photodiode, the APD is also a reverse-biased diode, but the difference is that the reverse voltage is large enough that when carriers are freed by a photon and separated by the electric field, they have enough energy to collide with the atoms in the semiconductor crystal lattice. The collisions ionize the lattice atoms, generating a second electron-hole pair. These secondary carriers in turn collide with the lattice, and additional carriers are generated. One price paid for this gain mechanism is an inherently lower bandwidth. A second price paid in the APD is the probabilistic nature of the number of secondary carriers generated. The larger the gain in the APD, the larger the statistical fluctuation in current for a given optical power. In addition, the bandwidth of the device decreases with increasing gain, since it takes some time for the avalanche process to build up. Both PIN photodiodes and APD's exhibit a small current which flows in the absence of incident light due to thermal excitation of carriers. This current is called dark current for obvious reasons, and represents a background noise signal with respect to signal detection.

5.3.4. Model for Fiber Reception

Based on the previous background material and the mathematics of Poisson processes and shot noise (Section 3.4), we can develop a statistical model for the output of an optical fiber detector.
This signal has quite different characteristics from that of other media of interest, since random quantum fluctuations in the signal are important. Since the signal itself has random fluctuations, we can consider it to have a type of multiplicative noise. In commercial systems, the direct detection mode of transmission is used, as pictured in Figure 5-18. In this mode, the intensity or power of the light is directly modulated by the electrical source (data signal), and a photodetector turns this power into another electrical signal. If the input current to the source is x(t), then the output power of the source is proportional to x(t). Two bad things happen to this launched power as it propagates down the fiber. First, it is attenuated, reducing the signal power at the detector. Second, it suffers dispersion due to chromatic dispersion (and mode dispersion for a multimode fiber), which can be modeled as a linear filtering operation. Let g(t) be the impulse response of the equivalent dispersion filter, including the attenuation, so that the received power at the detector is

    P(t) = x(t) * g(t) .    (5.34)

In the final conversion to electrical current in the photodetector, the situation is a bit more complicated, since quantum effects are important. The incident light consists of discrete photons which are converted to photoelectron-hole pairs in the detector. Hence, the current generated consists of discrete packets of charge generated at discrete points in time. Intuitively we might expect the arrival times of the charge packets to form a Poisson process (Section 3.4), since there is no reason to expect the interarrival times between photons to depend on one another. In fact, this is predicted by quantum theory. Let h(t) be the response of the photodetector circuit to a single photoelectron; then an outcome for the detected current y(t) is a filtered Poisson process

    y(t) = Σ_m h(t − t_m) ,    (5.35)

where the t_m are Poisson arrival times.
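A small simulation can illustrate the filtered Poisson model of (5.35). This sketch assumes, purely for illustration, a rectangular single-photoelectron response h(t) of unit area and a constant arrival rate; by Campbell's theorem the time-averaged current should then be close to the arrival rate times the area under h.

```python
import random

def filtered_poisson(rate, T, h, dt, rng):
    """Simulate the filtered Poisson process y(t) = sum_m h(t - t_m) of (5.35)
    for a constant arrival rate (arrivals/sec) over [0, T]; returns grid samples."""
    # Draw Poisson arrival times via exponential interarrival times
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > T:
            break
        times.append(t)
    n = int(T / dt)
    y = [0.0] * n
    for tm in times:                 # superimpose one response per arrival
        for i in range(n):
            y[i] += h(i * dt - tm)
    return y

# Assumed rectangular single-photoelectron response, height 10 over 0.1 sec (area 1):
h = lambda t: 10.0 if 0.0 <= t < 0.1 else 0.0
y = filtered_poisson(rate=50.0, T=5.0, h=h, dt=0.01, rng=random.Random(1))
mean = sum(y) / len(y)
print(mean)   # Campbell's theorem: E[y] = rate * area(h) = 50, so this is near 50
```

The sample paths fluctuate about this mean; it is exactly these shot-noise fluctuations that are characterized statistically in Chapter 8.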
The Poisson arrivals are characterized by the rate of arrivals, which is naturally proportional to the incident power,

    λ(t) = (η / hν) · P(t) + λ₀ ,    (5.36)

where η is the quantum efficiency and λ₀ is a dark current. Note from Campbell's theorem (Section 3.4.4) that the expected detected current is

    E[y(t)] = λ(t) * h(t) = (η / hν) · x(t) * g(t) * h(t) + λ₀ H(0) .    (5.37)

Figure 5-18. Elements of a direct detection optical fiber system. The electric current x(t) modulates a laser or LED; the resulting power P(t) traverses the fiber to a photodetector, which produces the detected current y(t).

The equivalent input-output relationship of the channel is therefore characterized, with respect to the mean value of the detector output current, by the convolution of the two filters: the dispersion of the fiber and the response of the detector circuitry. Of course, there will be statistical fluctuations about this average, which will be characterized in Chapter 8. This simple linear model for the channel is quite accurate unless the launched optical power is high enough to excite nonlinear effects in the fiber and source-detector.

Avalanche Photodetector

In the case of an APD, we have to modify this model by adding to the filtered Poisson process of (5.35) a random multiplier resulting from the avalanche process,

    y(t) = Σ_m G_m h(t − t_m) ,    (5.38)

the statistics of which have already been considered in (3.143). Define the mean and second moment of the avalanche gain,

    Ḡ = E[G_m] ,  E[G_m²] .    (5.39)

Then from Section 3.4.6, we know that the effect of the avalanche gain on the second-order statistics of (5.35) is to multiply the mean value of the received random process by Ḡ and the variance by E[G_m²].
If the avalanche process were deterministic, that is, if precisely Ḡ secondary electrons were generated for each primary photoelectron, then the second moment would be the square of the mean,

    E[G_m²] = Ḡ² .    (5.40)

The effect of the randomness of the multiplication process is to make the second moment larger, by a factor F_G greater than unity,

    E[G_m²] = F_G · Ḡ² ,    (5.41)

where of course F_G = E[G_m²] / Ḡ². The factor F_G is called the excess noise factor. In fact, a detailed analysis of the physics of the APD [13] yields the result

    F_G = k·Ḡ + (2 − 1/Ḡ)·(1 − k) ,    (5.42)

where 0 ≤ k ≤ 1 is a parameter under the control of the device designer called the carrier ionization ratio. Note that as k → 1, F_G → Ḡ; that is, the excess noise factor is approximately equal to the avalanche gain. This says that the randomness gets larger as the avalanche gain gets larger. On the other hand, as k → 0, F_G → 2 for large Ḡ, so the excess noise factor is approximately independent of the avalanche gain. Finally, when Ḡ = 1 (there is no avalanche gain), F_G = 1 and there is no excess noise. This is the PIN photodiode detector.

Fiber and Preamplifier Thermal Noise

Any physical system at non-zero temperature will experience noise due to the thermal motion of electrons, and optical fiber reception is no exception. This noise is often called thermal noise or Johnson noise in honor of J.B. Johnson, who studied this noise experimentally at Bell Laboratories in 1928. A theoretical study of this noise based on the theory of quantum mechanics was carried out by H. Nyquist at about the same time. Thermal noise is usually approximated as white Gaussian noise. The Gaussian property is a result of the central limit theorem and the fact that thermal noise is composed of the superposition of many independent actions. The white property cannot of course extend to infinite frequencies, since otherwise the total power would be infinite, but rather this noise can be considered white up to frequencies of 300 GHz or so.
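The limiting behaviors of the excess noise factor (5.42) are easy to verify numerically:

```python
def excess_noise_factor(gain, k):
    """Excess noise factor of an APD, F_G = k*G + (2 - 1/G)*(1 - k), from (5.42),
    where gain is the mean avalanche gain and 0 <= k <= 1 is the
    carrier ionization ratio."""
    return k * gain + (2.0 - 1.0 / gain) * (1.0 - k)

print(excess_noise_factor(1.0, 0.5))    # gain 1 (a PIN photodiode): F_G = 1
print(excess_noise_factor(100.0, 1.0))  # k -> 1: F_G approaches the gain, 100
print(excess_noise_factor(100.0, 0.0))  # k -> 0: F_G approaches 2 for large gain
```

A small ionization ratio k is therefore strongly preferred by the device designer, since it allows large avalanche gain at a nearly constant noise penalty.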
Nyquist's result was that thermal noise has an available noise power per Hz of

    N(ν) = hν / ( e^{hν/kT_n} − 1 ) ,    (5.43)

where h is Planck's constant, ν is the frequency, k is Boltzmann's constant (1.38×10⁻²³ Joules per degree Kelvin), and T_n is the temperature in degrees Kelvin. By available noise power we mean the power delivered into a load with a matched impedance. If we consider this as a two-sided spectral density, we have to divide by two. At frequencies up through the microwave range, the exponent in (5.43) is very small, and if we approximate e^x by 1 + x we get that the spectrum is approximately white,

    N(ν) ≈ kT_n .    (5.44)

This corresponds to a two-sided spectral density of size

    N₀ = kT_n / 2 .    (5.45)

However, at high frequencies this spectrum approaches zero exponentially, yielding a finite total power. There are two possible sources of thermal noise: at the input to the detector, and in the receiver preamplifier. At the input to the detector, only thermal noise at optical frequencies is relevant (the detector will not respond to lower frequencies), and at these frequencies the thermal noise is negligible.

Example 5-16. At room temperature kT_n is 4×10⁻²¹ Joules. At 1 GHz, or microwave frequencies, hν is about 10⁻²⁴ Joules, and we are well in the regime where the spectrum is flat. However, at 1 μm wavelength, or ν = 3×10¹⁴ Hz, hν is about 2×10⁻¹⁹ Joules, and hν/kT_n is about 50. Thus, the thermal noise is much smaller than kT_n at these frequencies. Generally thermal noise at optical frequencies is negligible in optical fiber systems at wavelengths shorter than about 2 μm [14]. □

Since the signal level is very low at the output of the detector in Figure 5-18, we must amplify the signal using a preamplifier as the first stage of a receiver. Thermal noise introduced in the preamplifier is a significant source of noise, and in fact in many optical systems is the dominant noise source.
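The numbers in Example 5-16 follow directly from (5.43). This sketch evaluates the available noise power density at a microwave and at an optical frequency, taking room temperature as 290 K (an assumption; the text quotes only the resulting kT_n of about 4×10⁻²¹ Joules).

```python
import math

H = 6.6e-34    # Planck's constant, Joule-sec (value used in the text)
K = 1.38e-23   # Boltzmann's constant, Joules per degree Kelvin

def thermal_noise_density(nu, temp_k):
    """Available thermal noise power per Hz from (5.43):
    N(nu) = h*nu / (exp(h*nu / (k*T)) - 1)."""
    return H * nu / math.expm1(H * nu / (K * temp_k))

kT = K * 290.0   # about 4e-21 Joules at an assumed 290 K
print(thermal_noise_density(1e9, 290.0) / kT)    # microwave: spectrum flat, ratio near 1
print(thermal_noise_density(3e14, 290.0) / kT)   # optical: exponentially suppressed
```

The ratio at 1 GHz is essentially unity (the flat, white regime of (5.44)), while at optical frequencies the exponent hν/kT_n of about 50 suppresses the density to a negligible level, confirming that thermal noise matters in the baseband electronics, not in the light itself.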
Since the signal at this point is the baseband digital waveform, it occupies a bandwidth extending possibly up to microwave frequencies but not optical frequencies; hence the importance of thermal noise. We will see in Chapter 8 that this thermal noise is the primary reason for considering the use of an APD detector in preference to a PIN photodiode. A more detailed consideration of the design of the preamplifier circuitry is given in [14] and the problems in Chapter 8.

5.3.5. Advanced Techniques

Two exciting developments have been demonstrated in the laboratory: soliton transmission and erbium-doped fiber amplifiers. The soliton operates on the principle of the optical Kerr effect, a nonlinear effect in which the index of refraction of the fiber depends on the optical power. As previously mentioned, in chromatic dispersion the index of refraction also depends on the wavelength. Solitons are optical pulses that have a precise shape and peak power chosen so that the Kerr effect produces a chirp (phase modulation) that is just appropriate to cancel the pulse broadening induced by group-velocity dispersion. The result is that all the wavelengths can be made to travel at the same speed, essentially eliminating material dispersion effects. In soliton transmission, material attenuation is the only effect that limits repeater spacing.

An optical amplifier can be constructed out of a fiber doped with the rare-earth element erbium, together with a semiconductor laser pumping source. If the pumping source wavelength is 0.98 or 1.48 μm, then the erbium atoms are excited into a higher state, and reinforce 1.55 μm incident light by stimulated emission. With about 10 mW of pumping power, gains of 30 to 40 dB at 1.55 μm can be obtained. A receiver designed using this optical amplifier is shown in Figure 5-19.
The optical amplifier has gain G, which actually depends on the input signal power because large signals deplete the excited erbium atoms and thereby reduce the gain. The amplifier also generates a spurious noise due to spontaneous emission, and the purpose of the optical bandpass filter is to filter out spontaneous noise outside the signal bandwidth (which depends on source linewidth as well as signal modulation). There is a premium on narrow linewidth sources, because that enables the optical filter bandwidth to be minimized.

Figure 5-19. A direct-detection optical receiver using an optical amplifier. [The input optical signal passes through an optical amplifier with gain G and an optical bandpass filter, and is then converted by a photodetector into a current i(t).]

The effect of the amplifier is similar to an avalanche detector, in that it increases the signal power (rendering electronic thermal noise insignificant) while adding additional spontaneous noise. The major distinction between the amplifier and avalanche detector, however, is that much of the spontaneous noise in the amplifier can be optically filtered out, whereas in the detector it cannot. It is also possible to place optical amplifiers at intermediate points in a fiber system, increasing the repeater spacing dramatically. The design of the receiver in Figure 5-19 will be considered further in Chapter 8.

5.4. MICROWAVE RADIO

The term "radio" is used to refer to all electromagnetic transmission through free space at microwave frequencies and below. There are many applications of digital transmission which use this medium, primarily at microwave frequencies, a representative set of which includes point-to-point terrestrial digital radio, digital mobile radio, digital satellite communication, and deep-space digital communication. Terrestrial digital radio systems use microwave horn antennas placed on towers to extend the horizon and increase the antenna spacing.
This medium has been used in the past principally for analog transmission (using FM and more recently SSB modulation), but in recent years has gradually been converted to digital transmission due to increased demand for data services.

Example 5-17. In North America there are frequency allocations for telephony digital radios centered at frequencies of 2, 4, 6, 8, and 11 GHz [15]. In the United States there were over 10,000 digital radio links in 1986, including a cross-country network at 4 GHz. □

A related application is digital mobile radio.

Example 5-18. The frequency band from 806 to 947 MHz is allocated in the United States to land mobile radio services [16]. This band is used for cellular mobile radio [17], in which a geographical area is divided into a lattice of cells, each with its own fixed omni-directional base antenna for transmission and reception. As a vehicle passes through the cells, it is automatically switched to the closest base antenna. An advantage of this concept is that additional mobile telephones can be accommodated by decreasing the size of the cells and adding additional base antennas. □

Satellites are used for long-distance communication between two terrestrial antennas, where the satellite usually acts as a non-regenerative repeater. That is, the satellite simply receives a signal from a terrestrial transmitting antenna, amplifies it, and transmits it back toward another terrestrial receiving antenna. Satellite channels offer an excellent alternative to fiber and cable media for transmission over long distances, and particularly over sparse routes where the total communication traffic is small. Satellites also have the powerful characteristic of providing a natural multiple-access medium, which is invaluable for random-access communication among a number of users.
A limitation on satellites is the limited power available for transmission, since the power is derived from solar energy or expendable resources. In addition, the configuration of the launch vehicles usually limits the size of the transmitting and receiving antennas (which are usually one and the same). Most communication satellites are put into synchronous orbits, so that they appear to be stationary over a point on the earth. This greatly simplifies the problems of antenna pointing and satellite availability.

In deep-space communication, the object is to transmit data to and receive data from a platform that is at a great distance from earth. This application includes the features of both satellite and mobile communication, in that the vehicle is usually in motion. As in the satellite case, the size of the antenna and the power resources at the space vehicle are limited.

With the exception of problems of multipath propagation in terrestrial links, the microwave transmission channel is relatively simple. There is an attenuation introduced in the medium due to the spreading of the energy, where this attenuation is frequency-independent, and thermal noise introduced at the antenna and in the amplifiers in the receiver. These aspects of the channel are covered in the following subsections, followed by a discussion of multipath distortion.

5.4.1. Microwave Antennas and Transmission

Microwave propagation through free space is very simple, as there is an attenuation due to the spreading of radiation. The attenuation varies so slowly with frequency that it can be considered virtually fixed within the signal bandwidth. Consider first an isotropic antenna; namely, one that radiates power equally in all directions. Assume the total radiated power is P_T watts, and assume that at distance d meters from this transmit antenna there is a receive antenna with area A_R meters².
Then the maximum power that the receive antenna could capture is the transmit power times the ratio of A_R to the area of a sphere with radius d, which is 4πd². There are two factors which modify this received power. First, the transmit antenna can be designed to focus or concentrate its radiated energy in the direction of the receiving antenna. This adds a factor G_T, called the transmit antenna gain, to the received power. The second factor is the antenna efficiency η_R of the receive antenna, a number less than (but hopefully close to) unity; the receive antenna does not actually capture all the electromagnetic radiation incident on it. Thus, the received power is

    P_R = P_T (A_R / 4πd²) G_T η_R .    (5.46)

At microwave frequencies, aperture antennas (such as horn or parabolic) are typically used, and for these antennas the achievable antenna gain is

    G = 4πAη / λ² ,    (5.47)

where A is the area of the antenna, λ is the wavelength of transmission, and η is an efficiency factor. Expression (5.47) applies to either a receiving or transmitting antenna, where the appropriate area and efficiency are substituted. This expression is intuitively pleasing, since it says that the antenna gain is proportional to the square of the ratio of antenna dimension to wavelength. Thus, the transmit antenna size in relation to the wavelength is all that counts in the directivity or gain of the antenna. This antenna gain increases with frequency for a given antenna area, and thus higher frequencies have the advantage that a smaller antenna is required for a given antenna gain. The efficiency of an antenna is typically in the range of 50 to 75 percent for a parabolic reflector antenna and as high as 90 percent for a horn antenna [18]. An alternative form of the received power equation can be derived by substituting for the area of the receive antenna in (5.46) in terms of its gain in (5.47),

    P_R = P_T G_T G_R [λ / 4πd]² ;    (5.48)

this is known as the Friis transmission equation.
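The antenna-gain formula (5.47) and the Friis equation (5.48) can be exercised numerically. The following sketch is ours; the helper names are invented, and the parameter values (a synchronous-orbit distance and an 11 GHz carrier) are chosen only for illustration:

```python
import math

# Link-budget sketch using the antenna-gain formula (5.47) and the Friis
# equation (5.48). Helper names and parameter values are illustrative.

c = 3e8    # speed of light, m/s

def dB(x):
    return 10.0 * math.log10(x)

f = 11e9                  # carrier frequency, Hz
lam = c / f               # wavelength, ~27.3 mm
d = 4e7                   # transmitter-receiver distance, m (synchronous orbit)
P_T_dBW = dB(2.0)         # 2 W transmit power -> ~3 dBW
G_T_dB = 17.0             # transmit antenna gain, dB
A_R, eta_R = 10.0, 1.0    # receive aperture area (m^2) and efficiency

G_R_dB = dB(4 * math.pi * A_R * eta_R / lam**2)     # (5.47), ~52.3 dB
path_loss_dB = 2 * dB(lam / (4 * math.pi * d))      # bracketed term of (5.48), ~ -205.3 dB
P_R_dBW = P_T_dBW + G_T_dB + G_R_dB + path_loss_dB  # sum all gains and losses in dB
print(round(P_R_dBW))                               # -133
```

Working in dB turns the products of (5.48) into sums, which is why link budgets are traditionally tabulated as columns of dB gains and losses.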
The term in brackets is called the path loss, while the terms G_T and G_R summarize the effects of the two antennas. While this loss is a function of wavelength, the actual power over the signal bandwidth does not generally vary appreciably where the bandwidth is very small in relation to the center frequency of the modulated signal. The Friis equation does not take into account other possible sources of loss such as rain attenuation and antenna mispointing. The application of these relations to a particular configuration can determine the received power and the factors contributing to the loss of signal power. This process is known as generating the link power budget, as illustrated by the following examples.

Example 5-19. Determine the received power for the link from a synchronous satellite to a terrestrial antenna for the following parameters: height 40,000 km, satellite transmitted power 2 watts, transmit antenna gain 17 dB, receiving antenna area 10 meters² with perfect efficiency, and frequency 11 GHz. The wavelength can be obtained from (5.31),

    λ = c/ν = 3·10^8 / 11·10^9 = 27.3 mm .    (5.49)

The receive antenna gain is

    10 log_10 G_R = 10 log_10 (4π·10 / (27.3·10^-3)²) = 52.3 .    (5.50)

Next, the path loss is

    20 log_10 [λ / 4πd] = 20 log_10 [27.3·10^-3 / (4π·4·10^7)] = -205.3 dB .    (5.51)

Finally, we are in a position to calculate the received power, which we express in dBW (decibels relative to one watt). Recognizing that the transmit power is 3 dBW,

    10 log_10 P_R = 3 dBW + 17 + 52.3 - 205.3 = -133 dBW .    (5.52)

□

Example 5-20. The Mariner-10 deep-space mission to Mercury in 1974 used a transmitter power of 16.8 watts and frequency 2.3 GHz. The transmit antenna diameter on the spacecraft was 1.35 meters with efficiency 0.54, which results in an antenna gain of 27.6 dB. The terrestrial receive antenna diameter was 64 meters with efficiency 0.575, for an antenna gain of 61.4 dB.
The distance from the spacecraft to ground was 1.6·10^11 meters, for a path loss of 263.8 dB. Finally, the received power was

    10 log_10 P_R = 10 log_10 16.8 + 27.6 + 61.4 - 263.8 = -162.6 dBW .    (5.53)

□

The two dominant effects of microwave propagation are attenuation and delay. It is of interest to determine the effect on a passband signal, represented by the complex-baseband signal of Section 2.4. Assume the attenuation is A, the distance of propagation is d, and the speed of propagation is c. The delay the signal experiences is τ = d/c, and given a passband signal of the form of Figure 2-6, the output of the channel is

    A·Re{ u(t - τ) e^{jω_c(t - τ)} } ,    (5.54)

where

    k = ω_c τ / d = ω_c / c = 2π / λ    (5.55)

is called the propagation constant. The equivalent complex-baseband channel, shown in Figure 5-20, characterizes the effect of propagation on the equivalent complex-baseband signal. Not surprisingly, the baseband signal is delayed by τ, the same as the passband signal. In addition, there is a phase shift by kd = 2πd/λ radians, or 2π radians for each wavelength of propagation distance. The equivalent complex-valued impulse response of the propagation is an impulse with delay τ and area A·e^{-jkd}, and the equivalent transfer function is A e^{-jωτ} e^{-jkd}.

Figure 5-20. The equivalent complex baseband channel for freespace propagation with attenuation A and distance d: a) equivalent system, b) the equivalent impulse response, and c) the equivalent baseband transfer function.

The only frequency dependence is linear in frequency, due to the delay. For mobile receivers, the effect of small changes in d on the baseband channel response is particularly significant. The effect is dramatically more pronounced for the phase shift than for the delay.

Example 5-21. For a carrier frequency of 1 GHz (typical for mobile radio), the propagation constant is k = ω_c/c = 21 radians/meter.
Thus, a π/2 = 1.57 radian phase shift, which will be very significant to demodulators, occurs with every 7.4 centimeters change in propagation distance. In contrast, the propagation delay changes by 3.3 nanoseconds for each meter change in propagation distance. In relation to typical baseband bandwidths, this is totally insignificant. For example, at 1 MHz, the change in phase shift due to this delay change is only ωτ = 2π·0.0033, or roughly one degree. □

5.4.2. Noise in Microwave Amplifiers

On a radio link, noise enters the receiver both through the antenna and from internal noise sources in the receiver. We saw in (5.45) that both sources of noise are Gaussian and can be considered white up through the microwave frequencies. White noise is completely specified by the spectral density N_0, given by (5.45). However, in radio transmission it is common to express this spectral density in terms of an equivalent parameter, the noise temperature, expressed in degrees Kelvin. This custom derives from the functional form of (5.45), where N_0 is strictly a function of the temperature, and from the fact that T_n is reasonable in size, on the order of tens or hundreds of degrees, whereas N_0 is a very small number. Note however that the total thermal noise at some point in a system may be the superposition of many thermal noise sources at different temperatures. Hence, the noise temperature is merely a convenient specification of the noise power, and is not necessarily equal to the physical temperature of any part of the system! For example, if we amplify the noise we increase the noise temperature without affecting the physical temperature of the source that generated that noise.

There are two sources of noise - the noise incident on the antenna and the noise introduced internally in the receiver. The noise incident on the antenna depends on the effective noise temperature in the direction the antenna is pointed.
For example, the sun has a much higher effective temperature than the atmosphere. The noise introduced internal to the receiver depends on the design and sophistication of the receiver. It is customary to refer all noise sources to the input of the receiver (the antenna), and define an equivalent noise temperature at that point. Since the stages of a receiver typically have large gains, the noise introduced internal to the receiver usually has a much smaller noise temperature when referred to the receiver input. These receiver noise temperatures range from about four degrees Kelvin for supercooled maser amplifiers to the range of 70 to 200 degrees Kelvin for receivers without physical cooling.

Example 5-22. Continuing Example 5-20, for the Mariner-10 mission the effective noise temperature of the antenna plus receiver was 13.5 degrees Kelvin. A bit rate of 117.6 kb/s was used. What is the signal-to-noise ratio in the receiver, assuming the bandwidth of the system is half the bit rate, 58.8 kHz? (We will see in Chapter 6 that this is the minimum possible bandwidth for binary transmission.) The total noise power within the receiver bandwidth would be

    P_n = kT_n B = 1.38·10^-23 · 13.5 · 58.8·10^3 W = -169.6 dBW .    (5.56)

The signal-to-noise ratio is therefore

    SNR = -162.6 + 169.6 = 7.0 dB .    (5.57)

In practice the noise bandwidth will be larger than this, and the SNR will be lower, perhaps by a couple of dB. This SNR will support data transmission, albeit at a rather poor error rate. Coding techniques (Chapters 13 and 14) can compensate for the poor SNR. □

The SNR as calculated in this example is a useful quantity, since it will not be changed by the large gain introduced in the receiver (both signal and noise are affected the same way).

5.4.3. Emission Masks

A radio channel does not in itself provide any significant restriction on the bandwidth that we can use for digital communication.
Moreover, the free-space channel introduces only a slowly varying dependence of attenuation on frequency (due to antenna gain). Thus, there is nothing inherent about the channel to introduce significant motivation to be spectrally efficient. Enter the regulatory agencies! Since the radio spectrum is a scarce commodity, it is carefully allocated to individual users. Unlike optical fiber, where different users can install their own fiber, we must all share a single radio environment. To prevent significant interference between users, spectral emission masks are specified by regulation. An example of such a mask is shown in Figure 5-21. In this case, the regulatory agency has assigned a nominal 30 MHz bandwidth centered at f_c to a particular user, but for practical reasons has allowed that user a small amount of power (down more than 50 dB) outside that band.

Figure 5-21. A spectral emission mask referenced to a nominal 30 MHz channel bandwidth. [The mask is at 0 dB over roughly ±15 MHz about f_c, dropping to -50 dB and then to -80 dB beyond about ±26 MHz.] The vertical axis is transmitted power spectrum referenced to the power of an unmodulated carrier. The user signal must stay under the mask. (This mask applies to the United States.)

This mask is usually adhered to by placing a very sharp cutoff filter in the radio transmitter. Since this filter is imposed by external constraints, it is natural to think of this filter as being part of the channel (this logic is oversimplified of course, since the filter requirements depend on the spectrum of the signal feeding the filter). From this perspective, the microwave radio channel has a very sharp cutoff at the band edges, in contrast to the media we have seen earlier, which have at most a gradual increase of attenuation with frequency. This characteristic is shared by the voiceband data channel in the next section, and for this reason similar modulation techniques are often employed on the two media.

5.4.4. Multipath Fading

We have seen how the link budget can be determined for a radio link. The calculation we made assumed for the most part idealized circumstances, whereas in practice additional system margin must be included in the link budget to account for foreseen or unforeseen circumstances. For example, at higher frequencies there will be an additional rain attenuation during rainstorms at the earth receiving antenna. In terrestrial microwave systems, there is an additional important source of attenuation that must be accounted for - multipath fading [19]. Both rain attenuation and multipath fading result in an attenuation on the signal path that varies with frequency. A significant difference is that unlike rain attenuation, multipath fading can result in a large frequency-dependent attenuation within the narrow signal bandwidth. This phenomenon is known as selective fading.

The mechanism for multipath fading, shown in Figure 5-22, is very similar to mode distortion in multimode optical fibers and to the distortion introduced by bridged taps in wire pairs, except that it is time varying. The atmosphere is inhomogeneous to electromagnetic radiation due to spatial variations in temperature, pressure, humidity, and turbulence. This inhomogeneity results in variations in the index of refraction, resulting in possibly two or more ray paths for electromagnetic waves to travel from transmitter to receiver. Another source of multipath is the reflection of radio waves off of obstacles, such as buildings. The effective path lengths may be different for the different rays, which in general will interfere with one another, since the receiver perceives only the sum of the signals.

Figure 5-22. Illustration of two ray paths between a transmit and a receive radio antenna. Fading attenuation results when the two paths have different propagation delays.
We can determine the effect of multipath fading on a passband signal using the equivalent complex-baseband response for a single path and applying superposition. If we assume two paths have attenuations A_1 and A_2 and propagation distances d_1 and d_2, corresponding to propagation delays τ_1 = d_1/c and τ_2 = d_2/c, we can define two parameters Δd = d_1 - d_2 and Δτ = τ_1 - τ_2. Then by superposition the equivalent complex-baseband channel transfer function is

    A_1 e^{-jωτ_1} e^{-jkd_1} + A_2 e^{-jωτ_2} e^{-jkd_2} = A_1 e^{-jωτ_1} e^{-jkd_1} (1 + (A_2/A_1) e^{jωΔτ} e^{jkΔd}) .    (5.58)

The first terms have a constant and linear phase shift due to the delay τ_1, identical to the first path. The term in parentheses is important, because it can display a complicated dependence on frequency due to constructive and destructive interference of the two signals at the receiver. The critically important parameter is Δτ, which is called the delay spread. Two distinct cases can be distinguished. The first occurs when, for baseband frequencies of interest, |ω·Δτ| ≪ π, so that the frequency dependence of the second term is insignificant. This is called the narrowband model. For this case, the two-path propagation is similar to a single path, in that it results in a delay (linear phase shift with frequency) plus a constant phase shift. The contrary case is called the broadband model, and results in a more complicated frequency dependence due to constructive and destructive interference.

Example 5-23. Assume that we define the transition between the narrowband and broadband model as a delay spread such that |ω·Δτ| = 0.01·π (1.8 degrees) at the highest baseband frequency of interest. Equivalently, we expect that f = 1/(200·Δτ) for the highest frequency. Then if the delay spread is 1 nanosecond, baseband channels with a bandwidth less than 5 MHz are considered narrowband,
and bandwidths greater than 5 MHz (especially those significantly greater) are considered broadband. If the delay spread increases to 100 nanoseconds, then the narrowband channel has bandwidth less than 50 kHz according to this criterion. Note that all that counts is the delay spread, and not the absolute delay nor the carrier frequency. Also note that the actual passband signal has a bandwidth double that of the equivalent complex baseband signal. □

Example 5-24. For the two-path case, the magnitude-squared of the frequency response for the frequency-dependent term of interest is

    |1 + ρ e^{jωΔτ}|² = 1 + |ρ|² + 2·Re{ρ e^{jωΔτ}}    (5.59)

for some complex constant ρ. We will choose a delay spread of 10 nanoseconds (a typical worst-case number in an urban environment) and a fairly large |ρ| = 0.99. This is plotted in dB in Figure 5-23 over a ±50 MHz frequency range, a broadband model, and over a narrower frequency range, a narrowband model. Note the large notches due to destructive interference at some frequencies, accentuated by the fact that the two paths are nearly the same amplitude. Also note the close to 6 dB gain at some frequencies due to constructive interference. The narrowband model is plotted over a ±500 kHz frequency range, which by the criterion of Example 5-23 is a narrowband model. Note that the channel response varies only a couple of dB over this range. □

Figure 5-23. Complex baseband channel amplitude response over a wide frequency range (±50 MHz) and a narrow frequency range (±500 kHz) for a two-path model with ρ = 0.99j.
The two-path model, which is usually adequate for fixed terrestrial microwave systems, suggests that fading may result in either a monotonic gain change (or slope) across the channel or a dip (or notch) in the channel response within the bandwidth. A typical faded channel response is shown in Figure 5-24, and the typical parameters that characterize the fade are identified [15].

Figure 5-24. A typical frequency-selective notch due to fading, with some terminology. Note that the impact on the channel depends strongly on the location of the notch relative to the channel bandwidth.

In Section 5.4.1, we showed that the power loss in freespace radio propagation obeys a square-law relationship; that is, the received power decreases as d^-2, or the path loss in dB increases as 20·log_10 d. For terrestrial microwave transmission, the path loss increases more rapidly than in freespace, typically more like d^-4, or 40·log_10 d in dB. This can be explained using the simple model of Figure 5-25. Even for highly directional antennas, for a large d there will be a substantial reflection off the ground interfering at the receive antenna. Typically the ground is close to a short circuit for oblique angles of incidence at microwave frequencies, implying a reflection coefficient near -1 (so that the net incident and reflected electric fields sum to zero).

Exercise 5-3. For the geometry of Figure 5-25, consider only the reflection resulting if the ground acts like a perfect mirror, and assume that both the direct and indirect paths suffer a freespace loss. Assuming the distance between antennas is much greater than the antenna heights, show that the resulting net power loss is approximately

    P_R/P_T = [P_R/P_T]_freespace · [4π h_t h_r / λd]² .    (5.60)

Hence, the effect of the reflection is a destructive interference that increases the path loss by another factor of d^-2 over and above the freespace loss. □
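Both the two-path interference term of (5.59) and the ground-reflection factor of (5.60) are easy to exercise numerically. The following sketch uses the Example 5-24 parameters for the first part; the function names and the antenna heights in the second part are ours, chosen only for illustration:

```python
import cmath, math

# Two-path interference term (5.59) with the Example 5-24 parameters
# (rho = 0.99j, delay spread 10 ns), plus the ground-reflection factor of
# (5.60) with illustrative antenna heights. Function names are ours.

rho = 0.99j      # relative amplitude/phase of the second path
dtau = 10e-9     # delay spread, seconds

def two_path_gain_dB(f):
    """10*log10 |1 + rho*exp(j*2*pi*f*dtau)|^2 at baseband frequency f."""
    return 10 * math.log10(abs(1 + rho * cmath.exp(2j * math.pi * f * dtau)) ** 2)

print(round(two_path_gain_dB(25e6), 1))    # -40.0: deep notch (destructive interference)
print(round(two_path_gain_dB(-25e6), 2))   # 5.98: close to +6 dB (constructive)

def extra_ground_loss_dB(ht, hr, lam, d):
    """Extra factor [4*pi*ht*hr/(lam*d)]^2 of (5.60), in dB (negative = extra loss)."""
    return 20 * math.log10(4 * math.pi * ht * hr / (lam * d))

# Doubling d costs ~6 dB here on top of the 6 dB freespace increase,
# i.e. the ~40*log10(d) behavior of terrestrial links:
print(round(extra_ground_loss_dB(30, 3, 0.3, 2e4) - extra_ground_loss_dB(30, 3, 0.3, 1e4), 1))
```

With ρ = 0.99j the notch sits at +25 MHz and the near-6 dB peak at -25 MHz, matching the wideband plot of Figure 5-23; the last line reproduces the fourth-power distance law discussed above.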
Note that, not unexpectedly, it is advantageous to have high antennas (the loss decreases as the square of the antenna heights).

Figure 5-25. The attenuation of a terrestrial microwave system is increased by the ground reflection. There will be ground reflections from all points between the two antennas, but only the single reflection resulting if the ground acts like a perfect mirror is shown. The transmit antenna height is h_t, the receive antenna height is h_r, and the distance between antennas is d.

Even when the transmitter and receiver are at fixed locations relative to one another, fading is a time-varying phenomenon for large distances (30 km or greater) due to atmospheric phenomena. Of considerable importance to designers of radio systems is not only the depth but also the duration of fades. Fortunately, it has been observed that the deeper the fade, the less frequently it occurs and the shorter its duration when it does occur. Also, the severity of fades increases as the distance between antennas increases or as the carrier frequency increases. Fading can also be mitigated by using diversity techniques, in which two or more independent channels are somehow combined [20]. The philosophy here is that only one of these channels at a time is likely to be affected by fading.

5.4.5. Mobile Radio

One of the most appealing uses for radio transmission is for communication with people or vehicles on the move. For this type of communication there is really no alternative to radio transmission, except for infrared, which does not work well outdoors. Mobile radio exhibits some characteristics that are different from point-to-point transmission. First, antennas must generally be omni-directional, and thus they exhibit much less antenna gain.
Second, there can be obstacles to direct propagation, causing a shadowing effect that results in large variations in received signal power with location. Third, the most common application is in urban areas, where there are many opportunities for multiple reflections, and the two-path model is usually not accurate. Fourth, the user is often moving, resulting in extreme time variations in transmission conditions over even short distances, as well as a Doppler shift in the carrier frequency.

The two-path model is easily extended to an M-path model, again using superposition. In this case, the complex-baseband output of the channel is

    Σ_{i=1}^{M} A_i u(t - τ_i) e^{-jkd_i} ,    (5.61)

where the A_i are real-valued attenuation coefficients, d_i is the length of the i-th path, and τ_i is the propagation delay of the i-th path. There may be a dominant path whose attenuation coefficient obeys the fourth-power law with distance, but the other coefficients depend on the reflection coefficients of indirect paths and hence bear a complicated relationship to position. Furthermore, due to shadow effects, there may even be no dominant path; for example, if the mobile receiver is located behind a building, the radio waves will suffer a diffraction loss. This shadowing loss typically varies markedly over a distance of tens to hundreds of meters. If we average the received power over an area on the order of 1 km², we will see the fourth-power loss with distance, but if we average over an area on the order of 1 meter², we will see an additional fluctuation with position due to shadowing. Shadowing is often assumed to result in a log-normal distribution in local-average received power; that is, the power expressed in dB has a Gaussian distribution. The standard deviation of the power expressed in dB is roughly 4 dB for typical urban areas.

When we examine local received power, not averaged over an area, we begin to see wild fluctuations due to multipath fading.
For a moving vehicle, fades of 40 dB and more below the local-average level are frequent, with successive minima occurring every half wavelength or so (a fraction of a meter at microwave frequencies). Thus, the motion of the vehicle introduces a whole new dimension to the fading experienced on a point-to-point system, where the fluctuations are much slower. This rapid fluctuation is known as Rayleigh fading, because the distribution of the envelope of the received carrier often obeys a Rayleigh distribution [21].

To understand Rayleigh fading, we must examine the effect of vehicle motion, which results in a time variation in received carrier phase. As before, this can be understood by considering a single path, and then applying superposition to multiple paths. The geometry of a single path is shown in Figure 5-26, including a reflection between the transmitter and receiver. As shown, a virtual transmitter can be defined behind the reflector with a linear propagation to the receiver. Let d be a vector from virtual transmitter to receiver at time t = 0, let v be the velocity vector for the vehicle at time t = 0, and let θ be the angle between d and v, or the angle of incidence of the propagation path relative to the vehicle velocity. Let the scalar initial distance and velocity be d = ||d|| and v = ||v||. The vector from transmitter to receiver is d + v·t, and the propagation distance as a function of time is

    ||d + v·t|| = [d² + v²t² + 2t·<d, v>]^{1/2} ,    (5.62)

where the inner product is <d, v> = dv·cos θ. This distance is not changing linearly with time, but can be approximated by a linear function of time.

Exercise 5-4. Show that if t ≪ d/v, then (5.62) can be approximated accurately by d + vt·cos θ. For example, if d = 1 km and v = 30 m/sec (approximately 100 km/hr), then the approximation holds for t ≪ 66 sec. □

The time scale over which the linear approximation to distance is valid is quite large relative to the significant carrier phase fluctuations, and hence it is safe to assume that the distance to the receiver is changing as v·cos θ·t. This change in distance has slope +v when the receiver is moving directly away from the transmitter, -v when it is moving directly toward the transmitter, and zero when the receiver is moving orthogonally to the transmitter. With this basic geometric result in hand, the received signal is

Figure 5-26. Trajectory of motion for a vehicle moving at constant velocity, relative to a propagation path including a reflection.
The time scale over which the linear approximation to distance is valid is quite large relative to the significant carrier phase fluctuations, and hence it is safe to assume that the distance to the receiver is changing as v·cosθ·t. This change in distance has slope +v when the receiver is moving directly away from the transmitter, −v when it is moving directly toward the transmitter, and zero when the receiver is moving orthogonally to the transmitter.

Figure 5-26. Trajectory of motion for a vehicle moving at constant velocity, relative to a propagation path including a reflection.

PHYSICAL MEDIA AND CHANNELS

With this basic geometric result in hand, the received complex-baseband signal is

	A·Re{ u(t − d/c − (v·cosθ/c)·t) e^{−jkd} e^{−jkv·cosθ·t} e^{jω_c t} }.	(5.63)

We see here several propagation effects. First, the baseband signal u(t) is delayed by a time-varying amount, due to the changing propagation distance. This effect is generally insignificant at the baseband frequencies of interest. Second, there is a static phase shift e^{−jkd} due to the propagation distance at t = 0. Third, and most interesting, is a phase shift that is linear with time. In effect, this is a frequency offset, known as the Doppler shift. The carrier frequency is shifted from ω_c to ω_c − ω_d, where the Doppler frequency is

	ω_d = kv·cosθ = (2πv/λ)·cosθ.	(5.64)

When the receiver is moving away from the transmitter, the Doppler shift is negative; it is positive when the receiver is moving toward the transmitter.

Example 5-25. If the vehicle velocity is v = 30 m/sec (100 km/hr), and the carrier frequency is 1 GHz (λ = 0.3 meters), then the maximum Doppler shift is f_d = v/λ = 100 Hz. This illustrates that relative to the carrier frequency, the Doppler shift is typically quite small, but relative to baseband frequencies it can be relatively large.
Also observe that for a constant vehicle velocity, the Doppler shift becomes larger as the carrier frequency increases.

In addition to affecting the propagation distance and angle of incidence, the reflection in Figure 5-26 will also affect the attenuation constant and add an unknown phase shift due to the reflection coefficient. The Doppler shift by itself might not be a big issue, since it results in an offset in a carrier frequency that might not be too precisely known in the first place. The more substantive effect occurs when there are two or more paths, each with different Doppler shifts because their incident angles at the receiver are different. If the delay spread of the different paths is small, we can assume a narrowband model; that is, the different delays of the arriving replicas of the baseband signal u(t) are insignificant for the baseband frequencies of interest. The resulting superposition of different Doppler shifts can result in a rapidly fluctuating phase and amplitude. For example, consider a set of paths with amplitudes A_i, delays τ_i = τ assumed to be the same on all paths (which is the narrowband model), and random phase shifts. The superposition of many such independent contributions is approximately a complex Gaussian random variable, so the envelope has a Rayleigh distribution and the phase is uniform. The conclusion is that when a CW carrier is transmitted, the received signal R(t) is well approximated as a complex Gaussian process.

PHYSICAL MEDIA AND CHANNELS

The power spectrum of that process can be calculated, if we make assumptions about the distribution of arriving power vs. angle. This is because the frequency of an arriving component depends directly on the cosine of the angle of arrival. Let R(t) have power spectrum S_R(jω). The contribution to R(t) arriving at angle θ is at frequency ω_c + kv·cosθ. This implies that S_R(jω) is confined to the frequency band [ω_c − kv, ω_c + kv].
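The central-limit argument behind the Rayleigh envelope can be checked by simulation. The number of paths, trial count, and normalization below are assumed for illustration:

```python
import numpy as np

# Sketch of the central-limit argument behind Rayleigh fading: summing many
# equal-delay paths with independent uniform phases gives an approximately
# complex Gaussian amplitude, so the envelope is approximately Rayleigh.
rng = np.random.default_rng(0)
n_paths, n_trials = 64, 20000

phases = rng.uniform(0.0, 2 * np.pi, size=(n_trials, n_paths))
r = np.exp(1j * phases).sum(axis=1) / np.sqrt(n_paths)  # normalized superposition
envelope = np.abs(r)

# For a Rayleigh envelope normalized to unit mean-square power,
# E[envelope] = sqrt(pi)/2, about 0.886, and E[envelope^2] = 1.
env_mean = envelope.mean()
env_mean_square = (envelope**2).mean()
```

The simulated envelope moments match the Rayleigh values closely even with only 64 paths, which is why the Rayleigh model is so robust in cluttered urban environments.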
In particular, the total power arriving in the band [ω_0, ω_c + kv] corresponds to angles of arrival in the range

	kv·cosθ + ω_c ≥ ω_0,  or  |θ| ≤ θ_0 = cos⁻¹((ω_0 − ω_c)/kv).	(5.73)

If we assume, for example, that a total received power P is arriving uniformly spread over all angles |θ| ≤ π, then the portion of the power arriving in the band [ω_0, ω_c + kv] must be P·θ_0/π. Thus,

	(1/2π) ∫ from ω_0 to ω_c+kv of S_R(jω) dω = (P/π)·cos⁻¹((ω_0 − ω_c)/kv),	(5.74)

and differentiating both sides with respect to ω_0, the power spectrum is

	S_R(jω) = 2P / √((kv)² − (ω − ω_c)²),  |ω − ω_c| ≤ kv,	(5.75)

and zero elsewhere (of course the spectrum is symmetric about ω = 0). This power spectrum is plotted for positive frequencies in Figure 5-32, where we see that the power is concentrated in the region of frequencies ω_c ± kv. A sample function of a random process with this power spectrum will look like a random version of the deterministic signal cos(ω_c t)·cos(kvt), since the latter has a Fourier transform that consists of delta functions at ω_c ± kv. This AM-DSB signal is the carrier multiplied by an envelope with periodic zero crossings (fades) spaced at π/kv = λ/2v sec intervals.

Figure 5-32. The Doppler power spectrum of the received carrier for a vehicle traveling at velocity v, assuming the received signal power is spread uniformly over all angles of arrival.

This is just the time it takes for the vehicle to travel a half wavelength. Thus, temporally, Rayleigh fading exhibits a strong tendency toward fades every half wavelength when the power is uniformly spread over all incoming angles.

Example 5-26. If the vehicle velocity is 100 km/hr and the carrier frequency is 1 GHz, the maximum Doppler frequency is approximately 100 Hz.
This means that the individual paths coming into the receiver can have Doppler shifts on the order of ±100 Hz, or the bandwidth of the passband signal is increased by approximately 200 Hz due to the motion of the vehicle. The wavelength is about 0.3 meters, so the time it takes the vehicle to travel a half wavelength is

	t = 0.15 meters / 30 meters/sec = 5 msec.	(5.76)

We can expect significant fades approximately every 5 msec, which happens to be the reciprocal of the 200 Hz range of Doppler shifts.

The model of Figure 5-27 and the Rayleigh fading derivation assumed a narrowband model; that is, the delay spread is small with respect to the reciprocal of the bandwidth, or equivalently the delays t_j in (5.65) are identical over all paths. Thus, the model must be modified to accommodate a wideband model when the signal bandwidth is too large. Usually this is handled as follows. First, any given reflection, like that off a high-rise building, is actually a complicated superposition of multiple reflections, where the delay spread across these reflections is small enough to obey the narrowband model. Thus, this single reflection can be represented by a narrowband Rayleigh fading model with an associated delay t1. Now if there is a second reflection with a significantly different delay, it can be represented by another narrowband Rayleigh fading model with delay t2 ≠ t1. The broadband model follows from superposition of these narrowband models. A two-path broadband model is illustrated in Figure 5-33. The complex-baseband signal u(t) experiences the two path delays t1 and t2, and the two delay outputs are multiplied by independent complex-Gaussian processes r1(t) and r2(t). Each path also has an associated attenuation A_i and a static phase shift, which can be subsumed in r1(t) and r2(t).

Figure 5-33. A broadband two-path model, where each path is assumed to be independently Rayleigh fading.
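The Doppler power spectrum S_R(jω) = 2P/√((kv)² − (ω − ω_c)²) derived above should integrate back to the total power P; this can be verified numerically, with the values of Example 5-26 assumed:

```python
import numpy as np

# Numerical check that the Doppler power spectrum integrates back to the
# total received power P under the (1/2pi)-integral power convention.
# P and the 100 Hz Doppler spread are assumed, following Example 5-26.
P = 1.0
kv = 2 * np.pi * 100.0                        # maximum Doppler spread, rad/s

x = np.linspace(-1 + 1e-6, 1 - 1e-6, 200001)  # normalized (omega - omega_c)/kv
S = 2 * P / np.sqrt(kv**2 - (kv * x)**2)      # the Doppler spectrum
total_power = np.sum(S) * (x[1] - x[0]) * kv / (2 * np.pi)
```

The integrable singularities at ω_c ± kv are excluded by the small offset in the grid, which is why the recovered power falls just barely short of P.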
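The broadband two-path model of Figure 5-33 can be sketched in a few lines. The delays, gains, and block length below are assumed, and the complex-Gaussian path gains are held constant over the block (slow fading) to keep the sketch minimal:

```python
import numpy as np

# Sketch of the broadband two-path model of Figure 5-33: each resolvable
# path delays u(t) and multiplies it by an independent complex-Gaussian
# (Rayleigh-fading) gain. All numerical values are assumed.
rng = np.random.default_rng(1)
n = 1000
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # complex-baseband input

d1, d2 = 3, 17   # the two path delays, in samples (assumed)
# Independent complex-Gaussian path gains, constant over this short block
r1 = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
r2 = 0.5 * (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)

def delay(x, d):
    """Delay a sampled signal by d samples, zero-filling the start."""
    return np.concatenate([np.zeros(d, dtype=complex), x[:-d]])

y = r1 * delay(u, d1) + r2 * delay(u, d2)   # broadband two-path output
```

Letting r1 and r2 vary slowly in time (for example, filtered complex Gaussian noise with the Doppler spectrum above) turns this into a time-varying frequency-selective channel.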
This broadband model is easily generalized to an arbitrary number of paths.

5.5. TELEPHONE CHANNELS

Most locations in the world can be reached over the public telephone network, so the voiceband channel is an almost universal vehicle for data communication. The design of digital modems for telephone channels is challenging, because the channel was designed primarily for voice, and impairments that are not serious for voice can be debilitating for data signals. The telephone channel is a prime example of a composite channel, consisting of many media such as wire pairs, satellite channels, coaxial cables, terrestrial microwave links, and optical fiber. Even more important than the media are the many modulation systems built on top of these media, such as pulse-code modulation and single-sideband modulation. The characteristics of the channel vary widely depending on the particular connection. It is useful to discuss these characteristics, not only because of the importance of this particular channel, but also because we encounter many impairments that occur in other situations as well.

5.5.1. Measured Performance of Telephone Channels

Because of the wide variety of possible connections, there is no simple analytical characterization of the telephone channel. Modem designers rely instead on statistical surveys of telephone circuits. In the U.S., a comprehensive survey was conducted in 1969-70 [22] and again in 1982-83 [23]. The data in this section comes primarily from interpretation of the second survey. A modem designer needs to determine the acceptable percentage of telephone connections over which the modem will perform, and then find the parameter thresholds that are met or exceeded by that percentage of channels. The resulting thresholds can be quite sensitive to the percentage.

Example 5-27. According to the 1982-83 connection survey, 99% of end-to-end channels attenuate a 1004 Hz tone 27 dB or less. But 99.9% of channels attenuate the same tone 40 dB or less.
To get the extra 0.9% coverage, an additional 13 dB of loss must be tolerated.

In Table 5-1 we give typical worst-case figures assumed for some of the impairments on the channel. The percentage of telephone channels that exceed this performance is roughly 99%. Linear distortion is a major impairment that is missing from the table because it is difficult to summarize concisely. It is discussed below, followed by discussions of the remaining impairments.

Table 5-1. Typical worst-case impairments assumed for telephone channels. Roughly 99% of the telephone circuits measured in the 1982-83 connection survey [23] meet or exceed this performance.

	Impairment	Level
	Attenuation of a 1004 Hz tone	27 dB
	Signal to C-notched noise ratio	20 dB
	Signal to second harmonic distortion ratio	34 dB
	Signal to third harmonic distortion ratio	33 dB
	Frequency offset	3 Hz
	Peak-to-peak phase jitter (2-300 Hz)	20 degrees
	Peak-to-peak phase jitter (20-300 Hz)	13 degrees
	Impulse noise (-4 dB threshold)	4 per minute
	Phase hits (20 degree threshold)	1 per minute
	Round trip delay (no satellites)	50 ms

Linear Distortion

The frequency response of a telephone channel can be approximated by a linear transfer function B(jω), roughly a bandpass filter from 300 to 3300 Hz. This bandwidth is chosen to give acceptable voice quality in the network, and is enforced by bandpass filters in the analog and digital modulation systems used in the network. A typical transfer function of a telephone channel is illustrated in Figure 5-34, using traditional terminology that we now explain. Amplitude distortion, the magnitude of the frequency response, is plotted as attenuation (or loss) vs. frequency. Amplitude distortion is often summarized as a set of slope distortion numbers, which attempt to capture images such as Figure 5-34a.
A typical slope distortion measure is the worst of two differences: (1) the loss at 404 Hz minus the loss at 1004 Hz, and (2) the loss at 2804 Hz minus the loss at 1004 Hz. For 99% of telephone connections, that number is less than 9 dB. Several other slope distortion characterizations are found in the literature, but they are difficult to use in practice. We refer interested readers to the connection survey [23]. Interestingly, the attenuation in Figure 5-34a is almost precisely the typical attenuation of the local loop from the 1980 survey [24] combined with the typical frequency response of the filters in the PCM modulators in Figure 5-3, suggesting that these are the dominant sources of frequency-dependent attenuation.

Figure 5-34. The attenuation (a) and envelope delay distortion (b) of a typical telephone channel as a function of frequency, with curves shown for short and long connections and for the loop only. The attenuation is given relative to the attenuation of a 1004 Hz tone, and the envelope delay distortion relative to 1704 Hz, where it is near its minimum value [23].

Phase distortion, the deviation from linear of the phase response of B(jω), is traditionally described as envelope delay distortion. Envelope delay is defined as the negative of the derivative of the phase of the received signal with respect to frequency; its variation with frequency, the envelope delay distortion, measures the deviation from a linear phase response. Envelope delay distortion is often summarized by a set of numbers, much as the magnitude response is summarized by slope distortion. For details see [24].
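Envelope delay is straightforward to compute numerically from any phase response as −dφ/dω. The transfer function below is an assumed toy resonance, not measured telephone data; it simply shows the mechanics:

```python
import numpy as np

# Envelope (group) delay is -d(phase)/d(omega). The transfer function B(jw)
# below is an assumed second-order resonance for illustration only, with the
# resonance placed near 1704 Hz; it is not a measured telephone channel.
w = np.linspace(2 * np.pi * 300, 2 * np.pi * 3300, 2000)   # voiceband, rad/s
w0, q = 2 * np.pi * 1704, 2.0                              # assumed parameters
B = 1.0 / (1 - (w / w0)**2 + 1j * (w / (q * w0)))          # assumed B(jw)

phase = np.unwrap(np.angle(B))             # continuous phase response
envelope_delay = -np.gradient(phase, w)    # seconds, -d(phase)/d(omega)
```

A channel with linear phase would have constant envelope delay; the spread of envelope_delay across the band is the envelope delay distortion that the survey summarizes.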
The overall attenuation of the channel is typically measured at 1004 Hz, where the attenuation is usually near its minimum. The attenuation is usually about 6 dB between one local switch and another, to which is added the loss of the local loops at each end.

Noise Sources

In addition to attenuation, there is noise present on the voiceband channel, primarily from four sources: quantization noise, thermal noise, crosstalk, and impulse noise. We discuss them in order of increasing importance. Crosstalk of the type discussed in Section 5.2 is one impairment that is more severe for voice than for data, so it has largely been eliminated from the network. Induction of interfering tones at 60 Hz and its harmonics (50 Hz in Europe) from power lines is more significant. As on other communication channels, thermal noise is an important impairment. Impulse noise consists of sudden, large spikes of short duration and is measured by counting the number of times the noise exceeds a given threshold. Impulse noise is due to electromechanical switches in the network, such as telephone switches and dial telephones. Impulse noise is not well characterized, and modem designs are not heavily influenced by its presence. The dominant source of noise is quantization error introduced by PCM systems, as in Figure 5-3. Quantization error is a consequence of using a limited number of bits to represent each sample in the PCM system. While the quantization error is deterministically dependent on the signal, the randomness of the signal usually gives quantization error a "noise-like" characteristic. It has an approximately white power spectrum, and the level of noise is usually measured by the signal-to-quantization-noise ratio (SQNR). The SQNR for a single quantizer as encountered in the U.S. telephone network is illustrated in Figure 5-35. Note that over an input range of about 30 dB (-40 to -10 dBm0) the SQNR varies by only about 6 dB (33 to 39 dB).
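The near-constant SQNR comes from logarithmic companding before uniform quantization. The network codec is actually a segmented 8-bit μ-255 law; the sketch below uses the smooth μ-law curve with a 256-level uniform quantizer, an assumed simplification that reproduces the flat SQNR behavior:

```python
import numpy as np

# Sketch of why companded PCM gives a roughly constant SQNR over a wide
# input range: mu-law compression before an 8-bit uniform quantizer.
# This uses the smooth mu-law curve, a simplification of the segmented codec.
mu = 255.0

def sqnr_db(amplitude, n=20000):
    """SQNR in dB for a sine of the given amplitude through mu-law PCM."""
    t = np.arange(n)
    x = amplitude * np.sin(2 * np.pi * 7.37 * t / n)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)     # compress
    yq = np.round(y * 128) / 128                                 # 8-bit uniform quantizer
    xq = np.sign(yq) * np.expm1(np.abs(yq) * np.log1p(mu)) / mu  # expand
    return 10 * np.log10(np.mean(x**2) / np.mean((x - xq)**2))

# Input levels spanning a 30 dB range below overload
sqnrs = [sqnr_db(10**(db / 20)) for db in (-40, -30, -20, -10)]
```

A uniform (uncompanded) 8-bit quantizer would lose a full 30 dB of SQNR over the same input range; here the variation stays within a few dB.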
This relatively constant SQNR implies that the quantization error power varies almost in direct proportion to the signal power.

Frequency Offset

Consider a transmitted passband signal of the form

	x(t) = Re{s(t)e^{jω_c t}},	(5.78)

where s(t) is bandlimited so that S(jω) = 0 for |ω| > ω_c. A frequency offset of ω_0 shifts the received spectrum,

	Y(jω) = X(j(ω + ω_0)) for ω > 0.	(5.79)

Frequency offset is a consequence of using slightly different frequencies to modulate and demodulate single-sideband (SSB) signals in analog transmission facilities (Figure 5-1). It is allowed because it has no perceptible effect on speech quality, and can be compensated by proper design in voiceband data modems.

Phase Jitter

Phase jitter on telephone channels is primarily a consequence of the sensitivity of oscillators used for carrier generation in SSB systems (Figure 5-1) to fluctuations in power supply voltages. Since power supply fluctuations are often at 60 Hz or harmonics thereof, the largest components of phase jitter are often at these frequencies. Phase jitter is measured by observing the deviation of the zero crossings of a 1004 Hz tone from their nominal position in time. Phase jitter can be viewed as a generalization of frequency offset. If the phase jitter on a channel is θ(t), the effect on the transmitted signal of (5.78) is a received signal of the form

	y(t) = Re{s(t)e^{j(ω_c t + θ(t))}}.	(5.80)

A phase jitter of θ(t) = ω_0 t amounts to frequency offset. It is common for θ(t) to have oscillatory components at the power line frequency (50 or 60 Hz) and harmonics. If we simply demodulate this signal using the carrier e^{jω_c t}, we recover a distorted baseband signal s(t)e^{jθ(t)} rather than the desired s(t). To mitigate this distortion, it is common in carrier recovery (Chapter 16) to include algorithms designed to track and remove this undesired phase jitter. A phase hit is an abrupt change in the nominal phase of a received sinusoidal signal lasting at least 4 ms. There is little that can be done to defend a modem against this degradation, but it must be taken into account in the design of the carrier recovery (Chapter 16).
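The residual factor e^{jθ(t)} left after demodulating with the nominal carrier can be demonstrated directly. The jitter amplitude, jitter frequency, and test tone below are assumed, chosen to resemble the 60 Hz, roughly 20-degree peak-to-peak jitter discussed above:

```python
import numpy as np

# Sketch of phase jitter: demodulating y(t) of (5.80) with the nominal
# carrier exp(j*omega_c*t) leaves the residual factor exp(j*theta(t)) on
# the baseband signal. All numerical values are assumed for illustration.
fs = 8000.0
t = np.arange(0, 0.1, 1 / fs)
s = np.cos(2 * np.pi * 300 * t)            # real baseband test signal

theta = 0.17 * np.sin(2 * np.pi * 60 * t)  # 60 Hz jitter, ~20 degrees peak-to-peak
r = s * np.exp(1j * theta)                 # demodulated baseband, jitter remains

# Where s(t) is well away from zero, the residual phase equals theta(t) exactly.
mask = s > 0.5
residual = np.angle(r[mask])
```

A carrier-recovery loop (Chapter 16) estimates θ(t) from exactly this kind of residual phase and rotates it out before detection.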
Delay and Echo

Delay and echo are the final impairments in telephone channels that we will consider. A simplified telephone channel is shown in Figure 5-36. The local loop, which is the twisted wire pair connecting the central office with the customer premises, is used for transmission in both directions; both signals share the same wire pair. At the central office, a circuit called a hybrid separates the two directions of transmission. Longer distance facilities are four-wire, meaning that the two directions of transmission are physically separated.

Figure 5-36. A simplified telephone channel, showing the two-wire local loop and the four-wire transmission facility.

One possible implementation of the hybrid circuit is shown in Figure 5-37. The signal from the other end of the two-wire facility is fed through to the receive port. The transmit signal appears at the transformer as a voltage divider with impedances R and Z_0, where the latter is the input impedance of the two-wire facility. We cancel this undesired feedthrough by constructing another voltage divider with a balance impedance Z_B. When Z_B = Z_0, the loss from transmit to receive port is infinite. In practice, a fixed compromise impedance Z_B is used, and a component of the receive signal (A) can leak through to (B) with an attenuation as small as about 6 dB, due to the variation in impedance of the two-wire facility.

Figure 5-37. An electronic hybrid. To avoid leakage of the receive signal (A) into the transmit path (B), the impedance Z_B should exactly match the impedance of the transformer and two-wire facility.

The signal and two types of echo paths for the configuration of Figure 5-36 are shown in Figure 5-38.
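The balance condition can be illustrated with a purely resistive sketch of the two voltage dividers. This idealized model is an assumption: it ignores the transformer and the frequency dependence of the real line impedance, which is why practical hybrids achieve far less loss:

```python
import math

# Idealized resistive sketch of the hybrid of Figure 5-37: the transmit
# signal reaches the receive port through two voltage dividers, one via the
# line impedance Z0 and one via the balance impedance ZB; their difference
# is the leakage. This model is assumed; real hybrids are frequency dependent.
def transhybrid_loss_db(Z0, ZB, R=600.0):
    """Trans-hybrid loss in dB; infinite when ZB exactly balances Z0."""
    leak = Z0 / (Z0 + R) - ZB / (ZB + R)
    return math.inf if leak == 0 else -20.0 * math.log10(abs(leak))

balanced = transhybrid_loss_db(600.0, 600.0)    # perfect balance: infinite loss
compromise = transhybrid_loss_db(900.0, 600.0)  # line impedance differs from ZB
```

Even a moderate impedance mismatch reduces the trans-hybrid loss to a finite value, which is the origin of the echo paths discussed next.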
An echo is defined as a signal component that has taken any path other than the talker speech path. The talker echo is the signal that leaks through the far-end hybrid and returns to the sender (talker). The listener echo is the component of the talker echo that leaks through the near-end hybrid and returns again to the listener. This echo is similar to multipath propagation on radio channels (Section 5.4). The length of the telephone channel determines the round-trip echo delay. Echoes from the near end of the connection typically undergo zero to 2 msec of delay, whereas far-end echoes can have round-trip delays of 10-60 msec for terrestrial facilities, or up to 600 msec on satellite connections. To mitigate the effects of echo on speech quality, several strategies co-exist in the network. The effect of each strategy on data signals is different. For short delays, loss is added in the talker speech path, which is advantageous because the echoes experience this loss more than once. This loss, plus the loss of the subscriber loops at each end, is the source of the attenuation that must be accommodated by data transmission; it can be as high as 40 dB (at 1004 Hz). For longer delays, devices known as echo suppressors and echo cancelers are added to the connection. A full-duplex (FDX) modem is one that transmits and receives on the same telephone channel. Such a modem requires an internal two-to-four-wire conversion, as shown in Figure 5-39. Because of imperfect balance impedances of the hybrids, some of the transmitted signal echoes into the receiver and interferes with the weaker data signal from the far end. The hybrid echo loss may be as low as about 6 dB, and the received signal may have experienced as much as 40 dB loss, so the desired far-end signal may be as much as about 34 dB below the echo. Ways of dealing with this problem are discussed in Chapters 19 and 20.

5.5.2.
Channel Capacity Compared to Practical Modems

Rough estimates of the capacity of a voiceband telephone channel indicate it is over 30,000 b/s. In Table 5-2 we summarize the bit rates achieved by existing standardized voiceband data modems. Bit rates as high as 28,800 b/s are envisioned, although the higher rates may be achievable on a smaller fraction of possible connections. Indeed, several of the higher speed modems are used exclusively with leased lines, which can be conditioned for guaranteed quality.

Figure 5-38. Three of many possible signal paths in a simplified telephone channel with a single two-to-four-wire conversion at each end.

Figure 5-39. Two modems connected over a single simplified telephone channel. The receiver on the right must be able to distinguish the desired signal (A) from the signal leaked by its own transmitter (B).

5.6. MAGNETIC RECORDING CHANNELS

Digital communication is used not only for communication over a distance (from here to there), but also for communication over time (from now to then). The latter application is called digital storage or recording, and is usually accomplished using a magnetic medium in the form of a tape or disk. More recently, particularly in the context of read-only applications, optical storage media have been used as well.

Example 5-28. The compact disk ROM, an offshoot of a similar consumer audio technology, allows 600 megabytes of data to be stored on a single plastic disk 12 cm in diameter [26]. The bits are stored as small pits in the surface, and are read by spinning the disk, shining a laser diode on the surface, and detecting the reflected light with an optical pickup.

Digital recording is of course used extensively in computing systems, but is increasingly used in addition for the storage of music [27,28] or voice.

Example 5-29.
The compact disk digital audio system, which is a great commercial success, records music digitally using a technology similar to the compact disk ROM. The music is converted to digital form using 16 bits per sample at a sampling rate of 44.1 kHz for each of two channels, for a total bit rate of about 1.4 Mb/s. Up to 70 minutes of material can be recorded on a single disk.

Table 5-2. Important standardized voiceband data modems are summarized here. The "duplex" column indicates whether a single channel is shared for both directions of transmission (full) or separate channels must be used for each direction (half). For full duplex modems, it also indicates whether frequency-division multiplexing (FDM) or echo cancellation (EC) is used for multiple access (Chapter 18). The "CCITT std" column identifies the international standard that applies to this type of transmission. Finally, the "modulation" column identifies the type of modulation, discussed in Chapters 6 and 14. The numbers indicate the number of symbols in the alphabet. The "TC" in the V.32 and V.33 entries refers to trellis coding (Chapter 14).

	speed (b/s)	symbol rate	duplex (method)	CCITT std	modulation
	≤300	≤300	full (FDM)	V.21	2-FSK
	1200	1200	half	V.23	2-FSK
	1200	600	full (FDM)	V.22	4-PSK
	2400	1200	half	V.26	4-PSK
	2400	600	full (FDM)	V.22bis	16-QAM
	2400	1200	full (EC)	V.26ter	4-PSK
	4800	1600	half	V.27	8-PSK
	4800	2400	full (EC)	V.32	4-QPSK
	9600	2400	half	V.29	16-AM/PM
	9600	2400	full (EC)	V.32	32-QAM+TC
	14,400	2400	full (EC)	V.32bis	128-QAM+TC
	≤28,800	≤3429	full (EC)	V.fast (V.34)	1024-QAM+TC

Example 5-30. Digital storage on disk drives is used in speech store-and-forward systems, which are essentially the functional replacement for telephone answering systems, except that they serve a number of customers.

Digital recording offers some of the same advantages over analog recording as we discussed earlier for transmission.
The principal advantage again is the regenerative effect, in which the recording does not deteriorate with time (except for the introduction of random errors, which can be eliminated by coding techniques) or with multiple recordings and re-recordings. An additional advantage is the compatibility of digital recording with digital signal processing, which offers very powerful capabilities. Magnetic tape or disk can be considered as a transmission medium in much the same manner as other media such as wires and fibers [29,30]. We will now briefly discuss the properties of that medium.

5.6.1. Writing and Reading

In the writing process, a magnetic field is generated in an electromagnet called a head as it passes at high speed over a ferric oxide magnetic medium, thereby orienting the direction of magnetization along a track in the nearby magnetic medium on the disk or tape [31]. On reading, when the oriented magnetic pattern passes under that same head, it produces a voltage that can be sensed and amplified. There are two basic types of recording. Saturation recording is almost always used for digital recording; in it, the magnetization is saturated in one direction or the other. Thus, in saturation recording, the medium is constrained to be used for binary transmission; that is, only two levels are allowed. This is in contrast to wire and coaxial media, in which multi-level transmission can be considered. The other form of magnetic recording is a.c. bias recording, in which the signal is accompanied by a much larger and higher frequency bias sinusoid for the purpose of linearizing the channel. A.c. bias recording is necessarily used in analog recording, where linearity is important, but has not been applied to digital recording because of the deterioration in signal-to-noise ratio and the fact that saturation recording is appropriate for binary modulation and demodulation techniques.
The magnetic recording process is qualitatively illustrated in Figure 5-40. For saturation recording, the voltage applied to the write head assumes one positive and one negative value, corresponding to the two directions of desired magnetization. In Figure 5-40a it is assumed that a square wave corresponding to the binary sequence "1101" is applied to the write head. This waveform correspondence to a bit sequence is called non-return to zero, or NRZ. The bit stream is recorded on linear (tape) or circular (disk) tracks on the magnetic medium, and one track is shown in Figure 5-40b. Note the two directions of magnetization, schematically indicated by the arrows. The voltage on the read head (which is physically the same as the write head) during a read operation is shown in Figure 5-40c. As long as the magnetization is constant, no voltage is induced in the read head coil, but upon a change in magnetization there is a voltage induced (recall that the voltage induced in a coil is proportional to the derivative of the magnetic field). The polarity of that voltage is determined by the direction of change in magnetization.

Figure 5-40. Illustration of magnetic recording. a. The NRZ waveform applied to the record head corresponding to bit sequence "1101". The abscissa is time, but this is proportional to distance on the medium for constant velocity of the head. b. The magnetization of one track after saturation recording. c. The voltage on the read head coil corresponding to position of the read head, which at constant velocity is the same as time.

This write-read magnetic recording process can be viewed as a communication channel if we observe only the input and output voltage waveforms in Figure 5-40a and Figure 5-40c and ignore the physical medium of Figure 5-40b.
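The differentiation inherent in the channel of Figure 5-40 can be sketched in a few lines, using the "1101" sequence of the figure:

```python
import numpy as np

# Sketch of the write-read channel of Figure 5-40 as a differentiation:
# the NRZ waveform for the bits "1101" produces read-head pulses only at
# the transitions, with polarity set by the direction of the change.
bits = [1, 1, 0, 1]
nrz = np.array([1.0 if b else -1.0 for b in bits])   # NRZ write waveform

# Difference against the previous level; the initial level has no transition.
read = np.diff(np.concatenate([[nrz[0]], nrz]))
```

The read waveform is zero wherever consecutive bits repeat, a negative pulse at the 1-to-0 transition, and a positive pulse at the 0-to-1 transition, which is why the channel conveys transitions rather than absolute polarity.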
Both of these waveforms represent signals in time, just as in communications, although there is a conceptually insignificant and indeterminate time delay between the write and read operations. Viewed as a communication channel, the magnetic recording channel of Figure 5-40 inherently includes a differentiation operation. Another way of looking at this is that the channel is sensitive only to the transitions in the input waveform, rather than to its polarity. Therefore, from a digital communication point of view, we want to map the input bits into transitions in the input waveform rather than into absolute polarity. The way in which this can be done will be considered in Chapter 6.

5.6.2. Linearity of the Magnetic Channel

The magnetic channel can be made linear in a special sense that we now specify. This linearity is a very desirable feature, in that it greatly simplifies system design. The view of Figure 5-40 is oversimplified in that it assumes that the magnetization is in either one direction or the other. In fact, the tape medium contains a multiplicity of tiny magnetic particles, and each particle must indeed be magnetized in one direction or the other. The total net magnetization can assume almost a continuum of values, depending on the number of particles magnetized in each direction. Unfortunately this continuum of magnetization depends nonlinearly on the applied magnetic field, and displays hysteresis, and therefore the write process is highly nonlinear. On the other hand, the read process is very linear, in that the voltage induced on the read head is a linear function of the magnetization. If the field applied to the recording head is strong enough and held long enough that the medium is fully saturated, then the output of the read head displays a form of superposition, because the saturation destroys the memory of the hysteresis. This form of superposition is illustrated in Figure 5-41.
If the response to a positive transition at time zero is h(t), then the response to a negative transition at time Δ is −h(t − Δ), and the response to the positive transition followed by the negative transition obeys superposition:

	h(t) − h(t − Δ).	(5.81)

Figure 5-41. Superposition in the reading process of magnetic recording.

This is true with great accuracy as long as the time between transitions Δ is larger than some threshold. This threshold is determined by the time to achieve full saturation of the medium in one direction or the other since the last transition, and depends on the design of the write head as well as the medium.

5.6.3. Noise on the Magnetic Channel

The noise impairments are very complicated on the magnetic channel, consisting of additive and multiplicative noise components. A major source of noise is the granularity of the medium. The total response of the head is the superposition of the responses to a multiplicity of magnetic particles. This discrete nature of the signal is similar to the quantum nature of the optical detection process (Section 5.3), with one important distinction. In optical detection we have only photons (or photoelectrons) or the absence of same, whereas in magnetics there are both positively and negatively magnetized particles. Thus, in optical detection, when there is no incoming signal there is also no quantum noise (neglecting dark current). In the magnetic reading process the particles are present whether or not there is a signal; put another way, the absence of a signal is represented by an equal number of positively and negatively magnetized particles. Hence, the granular noise in magnetic recording is present independent of the signal, and is therefore truly an additive noise phenomenon.
Its spectrum is not white because it is filtered by the read head response, and in fact its power spectrum tends to be similar to the spectrum of the read signal. Zero crossing jitter results from variations in the composition of the medium and the distance between the write head and the medium. The effect is a small jitter in the position of the read pulses. Another phenomenon is amplitude modulation of the received signal, a multiplicative noise phenomenon due to medium density fluctuations. An extreme form of amplitude modulation is the tape dropout, in which the signal level gets too small for reliable detection. Since dropouts are a reproducible effect, depending on position on the medium, they are often flagged in disk files so that these areas are not used for recording. Another phenomenon is interference from adjacent tracks, which is similar to the crosstalk experienced in multi-wire-pair cables and the interference between radio systems. 5.7. FURTHER READING The literature on communication media is vast and scattered. We offer here some suggestions that may help the reader get started. The basic theory of transmission lines is covered in [1]. There are a number of books devoted to optical fiber devices [32,6,33,34,35,36,37,38] and a smaller number that concentrate on the noise and system issues of primary interest here [39,13,14]. A special issue of the IEEE Journal on Selected Areas in Communications (November, 1984) is devoted to undersea lightwave communication. It contains numerous useful articles describing the special problems that arise in this hostile environment. Another issue (November, 1985) covers fiber optics for local communication, and concentrates on networking issues. Yet another issue (December, 1986) is devoted to terrestrial fiber optics, and includes papers on reliability, economics, and networking issues.
Finally, the Optical Society of America's Journal of Lightwave Technology is a good source of information. There are available books on satellite [18] and mobile radio [21] design. Special issues of the IEEE Journal on Selected Areas in Communications in July 1984, June 1987, and January 1989 are devoted to mobile radio communication. Many of the papers propose modulation schemes that are robust in the presence of multipath fading. More specifically directed to multipath fading channels is another special issue (February 1987). Another issue (April 1987) is devoted to point-to-point digital radio, and yet another (January 1985) to broadcasting satellites. Further information about characteristics of the telephone channel is best obtained by going directly to the published line surveys [22,23,24]. Special issues of the IEEE Journal on Selected Areas in Communications (September 1984, and August and December 1989) are devoted to modulation and coding for the telephone channel. A special issue of the IEEE Journal on Selected Areas in Communications in January 1992 is devoted to recent results on magnetic recording channels. PROBLEMS 5-1. Show that for a terminated transmission line with real-valued characteristic impedance, the maximum power to the load is obtained in Figure 5-6 when Z_S = Z_L = Z_0. 5-2. For a transmission line, derive the relation fλ = v, where f is the frequency of a propagating wave in Hz, λ is the wavelength in meters, and v is the velocity of the wave in meters/sec. 5-3. (a) In subscriber loop wire-pair cables, it is common in some countries to have bridged taps, which are open-circuited wire-pairs bridged in parallel on the main pair. Assume that a source has impedance equal to the wire-pair characteristic impedance, the wire-pair is terminated at the other end by its characteristic impedance, and that the wire-pair has a single bridged tap.
Let the distance from source to tap be L_1, from tap to termination L_2, and let the length of the bridged tap be L_3. Find an expression for the transfer function of the wire-pair including bridged tap. Be sure to take advantage of the simplifications due to the terminations with the characteristic impedance. (b) Show that when the bridged tap is very long, it causes a fixed attenuation at all frequencies. What is that attenuation? (c) State intuitively what you would expect the response at the termination to be to a single transmitted pulse as a function of the length of the bridged tap. (d) Discuss what happens intuitively when the bridged tap approaches zero length. 5-4. Use Snell's law to show that in Figure 5-12 a ray will be captured by the fiber as long as the incident angle obeys (5.83). This confirms that rays incident at small angles are captured, and those at larger angles are not. 5-5. Let the length of the fiber be L. (a) Show that the path length for a ray is equal to L sec(θ_2). (b) Show that the path length varies from L to n_1 L / n_2. Thus, the larger the difference in index of refraction of core to cladding, the larger the range of captured angles, but also the larger the variation in the transit time of rays through the length of the fiber. 5-6. Assuming that the chromatic dispersion in a single mode fiber is 0.15 psec/km-GHz, evaluate numerically (5.30). Sketch the curve of repeater spacing vs. bit rate in the range of repeater spacings between 1 and 1000 km as limited by dispersion. 5-7. In an optical fiber receiver, assume the received optical power is P and the bit rate is R bits/sec. (a) Find the number of received photons per bit. (b) Show that for a constant number of photons per bit, the required received optical power is proportional to the bit rate. (c) Find the received optical power necessary to receive 100 photons per bit at a wavelength of 1.5 μm and a bit rate of 1 Gb/s.
(d) For the same conditions as (c), assume you can launch one milliwatt of power into the fiber, and that the fiber loss at that wavelength is 0.2 dB per km. What is the distance that we can transmit? 5-8. A typical direct detection optical receiver requires about N = 2000 photons per bit in the notation of Problem 5-7. (a) Derive the following formula [9] for the required received power at an optical detector at a wavelength of 1.5 μm for this value of N:

P_dBm = −65.8 + 10 log₁₀ R_Mb  (5.84)

where P_dBm is the required received power in dBm and R_Mb is the bit rate in Mb/s. Note how the required power increases as the bit rate increases. In particular, each order of magnitude increase in bit rate increases the required power by only ten dB. (b) Assuming 0 dBm launched power into the fiber, and 0.2 dB per km loss in the fiber, what is the allowable distance between repeaters at bit rates of 100 and 1000 Mb/s? You can assume that loss is the dominant impairment limiting repeater spacing. 5-9. Change the assumptions in Problem 5-8 to those that might better reflect fundamental limits [9]: a launched signal power of 20 dBm and 20 photons per bit required at the receiver. 5-10. Suppose we have a system requirement that a total bit rate of R_T must be transmitted over a distance of L_T using a set of parallel repeatered transmission lines (wire cable or fiber). In each repeater span we have as design parameters the bit rate B and repeater spacing L. Show that the total number of repeaters is minimized when the quantity B·L is maximized for the given transmission technology. Thus, if the repeaters are the dominant transmission cost, we want to maximize the product of the bit rate and the distance for each technology. 5-11. (a) Derive the following relationship between repeater spacing and bit rate, using the assumptions of Problem 5-8, and assuming a fiber loss of γ₀ dB/km:

L = (65.8 − 10 log₁₀ R_Mb) / γ₀ .  (5.85)

You can assume that the number of received photons per bit is held constant and the transmit power is held constant at 0 dBm. (b) Sketch this relation for the range of bit rates between 1 Mb/s and 10,000 Mb/s and a fiber loss of 0.2 dB/km, and verify that Figure 5-16 is qualitatively correct in predicting this loss-limited region. (c) Using the results of Problem 5-10, argue that it will be best to increase the bit rate until dispersion becomes the controlling impairment, if the criterion is to minimize the number of repeaters. 5-12. The available thermal noise power in a bandwidth B Hz is kT₀B. For a resistor generating thermal noise, the noise source can be modeled as either a series voltage source or a parallel current source. Show that the voltage source has mean-squared voltage 4kT₀RB and the current source has mean-squared current 4kT₀B/R within bandwidth B for a resistance of R ohms. 5-13. At 6 GHz, what is the diameter of a circular aperture antenna that has an antenna gain of 40 dB with a 70% efficiency? 5-14. A radio channel with bandwidth 30 MHz is centered at 6 GHz. What is the difference in decibels in the path loss between the high and low end of this channel? At which end is the loss the minimum? (CAUTION: The antenna gains are a function of frequency.) 5-15. Compare the tradeoff between repeater spacing d and transmitted power P_T, assuming that the received power P_R is held constant, for the following two media: (a) Metallic cable or fiber optics with loss γ₀ dB per km. (b) A microwave radio system. (c) For which medium does the transmitted power have the most impact? 5-16. Develop the following formula, which relates the free-space loss between isotropic radiators in dB, the distance d_km between radiators in km, and the frequency f_GHz in GHz:

Loss(dB) = 92.4 + 20 log₁₀ d_km + 20 log₁₀ f_GHz .  (5.86)

Note the dependence of this loss on distance and frequency. 5-17.
In this problem we will determine how to combine noise sources in a radio receiver with different noise temperatures to yield a single equivalent noise source. For the configuration of Figure 5-42, where the noise temperatures of the three noise sources n_i(t) are T_i, find the relationship between these noise temperatures such that the two systems will have the same SNR. The parameter G is the power gain of an amplifier, i.e., the ratio of output to input power. 5-18. Use the results of Problem 5-17 to find the equivalent noise temperature at the input to the receiver of Figure 5-43, where each of the circuit elements is assumed to be noiseless with an associated noise source with associated noise temperature.

Figure 5-42. Illustration of the combination of two noise sources into a single equivalent noise source referenced to the input of the system.

Figure 5-43. Several noise sources introduced at the input and internally to a receiver.

5-19. Estimate the delay spread for the two-path ground-reflection model of Exercise 5-3 for a spacing of antennas by 3 km and antenna height of 50 meters. What is the approximate baseband bandwidth over which the narrowband model is applicable? 5-20. Suppose the incoming power in a Rayleigh fading scenario does not arrive at a moving vehicle uniformly spread over all angles. Describe qualitatively how you would expect the power spectrum of Figure 5-32 to be affected under the following conditions: (a) The vehicle is driving toward the transmitter, and more power is arriving from the direction of the transmitter than other directions. (b) A lot of power is reflecting off a nearby mountain, so that more power is arriving at the vehicle from the side (relative to the direction of motion) than any other direction. 5-21. Consider a SSB analog voice transmission system embedded in the telephone network. Suppose that the carrier frequency f_c is nominally 1 MHz.
In practice, the transmitter and receiver will be designed with components that yield modulating and demodulating frequencies that are slightly off. Component manufacturers often express the accuracy of precision parts in parts per million, instead of percent (which is parts per hundred). How accurate (in parts per million) do the modulating and demodulating oscillator frequencies have to be to guarantee less than 3 Hz frequency offset? 5-22. Suppose that a data signal

x(t) = Re{ s(t) e^{jω_c t} }  (5.87)

is transmitted over a telephone channel with frequency offset ω₀ and sinusoidal phase jitter with frequency ω_p and amplitude a. Assume there are no other impairments. Give an expression for the received signal. REFERENCES 1. G. C. Temes and J. W. LaPatra, Introduction to Circuit Synthesis and Design, McGraw-Hill, New York (1967). 2. Bell Laboratories Members of Technical Staff, Transmission Systems for Communications, Western Electric Co., Winston-Salem, N.C. (1970). 3. P. Bylanski and D. G. W. Ingram, Digital Transmission Systems, Peter Peregrinus Ltd., Stevenage, England (1976). 4. S. V. Ahamed, P. P. Bohn, and N. L. Gottfried, "A Tutorial on Two-Wire Digital Transmission in the Loop Plant," IEEE Trans. on Communications COM-29 (Nov. 1981). 5. K. C. Kao and G. A. Hockham, "Dielectric-Fiber Surface Waveguides for Optical Frequencies," Proc. IEE 113, p. 1151 (July 1966). 6. D. B. Keck, "Fundamentals of Optical Waveguide Fibers," IEEE Communications 23(5) (May 1985). 7. J. T. Verdeyen, Laser Electronics, Prentice Hall, Englewood Cliffs, N.J. (1981). 8. D. B. Keck, "Single-Mode Fibers Outperform Multimode Cables," IEEE Spectrum 20(3), p. 30 (March 1983). 9. J. E. Midwinter, "Performance Boundaries for Optical Fibre Systems," NATO Advanced Study Institute (July 1986). 10. P. S. Henry, "Introduction to Lightwave Transmission," IEEE Communications 23(5) (May 1985). 11.
T. Li, "Structures, Parameters, and Transmission Properties of Optical Fibers," Proc. IEEE 68(10), p. 1175 (Oct. 1980). 12. S. R. Forrest, "Optical Detectors: Three Contenders," IEEE Spectrum 23(5), p. 76 (May 1986). 13. S. D. Personick, Optical Fiber Transmission Systems, Plenum Press, New York (1981). 14. S. D. Personick, Fiber Optics Technology and Applications, Plenum Press, New York (1985). 15. D. Taylor and P. Hartmann, "Telecommunications by Microwave Digital Radio," IEEE Communications Magazine 24(8), p. 11 (Aug. 1986). 16. J. Mikulski, "DynaT*A*C Cellular Portable Radiotelephone System Experience in the U.S. and the U.K.," IEEE Communications Mag. 24(2), p. 40 (Feb. 1986). 17. V. MacDonald, "The Cellular Concept," BSTJ 58(1) (Jan. 1979). 18. T. Pratt and C. W. Bostian, Satellite Communications, John Wiley, New York (1986). 19. W. Rummler, R. Coutts, and M. Liniger, "Multipath Fading Channel Models for Microwave Digital Radio," IEEE Communications Mag. 24(11), p. 30 (Nov. 1986). 20. J. Chamberlain, F. Clayton, H. Sari, and P. Vandamme, "Receiver Techniques for Microwave Digital Radio," IEEE Communications Mag. 24(11), p. 43 (Nov. 1986). 21. W. C. Jakes, Jr., Microwave Mobile Communications, Wiley-Interscience, New York (1974). 22. F. P. Duffy and T. W. Thatcher, Jr., "1969-70 Connection Survey: Analog Transmission Performance on the Switched Telecommunications Network," BSTJ 50(4), pp. 1311-47 (April 1971). 23. M. B. Carey, H.-T. Chen, A. Descloux, J. F. Ingle, and K. I. Park, "1982/83 End Office Connection Study: Analog Voice and Voiceband Data Transmission Performance Characterization of the Public Switched Network," AT&T Bell Lab. Tech. J. 63(9) (Nov. 1984). 24. D. V. Batorsky and M. E. Burke, "1980 Bell System Noise Survey of the Loop Plant," AT&T Bell Lab. Tech. J. 63(5), pp. 775-818 (May-June 1984). 25. B. R. Saltzberg and J.-D. Wang, "Second-order statistics of logarithmic quantization noise in
QAM data communication," IEEE Transactions on Communications 39(10), pp. 1465-72 (Oct. 1991). 26. P. Chen, "The Compact Disk ROM: How It Works," IEEE Spectrum 23(4), p. 44 (April 1986). 27. S. Miyaoka, "Digital Audio is Compact and Rugged," IEEE Spectrum 21(3), p. 35 (March 1984). 28. P. J. Bloom, "High-Quality Digital Audio in the Entertainment Industry," IEEE ASSP Magazine 2(4), p. 2 (Oct. 1985). 29. J. C. Mallinson, "A Unified View of High Density Digital Recording Theory," IEEE Trans. on Magnetics MAG-11, p. 1166 (Sep. 1975). 30. H. Kobayashi, "A Survey of Coding Schemes for Transmission or Recording of Digital Data," IEEE Trans. on Communications COM-19, p. 1087 (Dec. 1971). 31. H. Bertram, "Fundamentals of the Magnetic Recording Process," IEEE Proceedings 74(11), p. 1494 (Nov. 1986). 32. D. Marcuse, Light Transmission Optics, Van Nostrand Reinhold, Princeton, N.J. (1972). 33. D. Marcuse, Theory of Dielectric Optical Waveguides, Academic Press, New York (1974). 34. D. Gloge, Optical Fiber Technology, IEEE Press, New York (1976). 35. H. H. Unger, Planar Optical Fibers for Transmission, Clarendon Press, Oxford (1977). 36. J. E. Midwinter, Optical Fibers for Transmission, Wiley, New York (1979). 37. S. E. Miller and A. G. Chynoweth, Optical Fiber Telecommunications, Academic Press, New York (1979). 38. H. F. Taylor, Fiber Optics Communications, Artech House, Dedham, Mass. (1983). 39. M. K. Barnoski, Fundamentals of Optical Fiber Communications, Academic Press, New York (1976). MODULATION An information-bearing signal must conform to the limitations of its channel. While the bit streams we wish to transmit are inherently discrete-time, all the physical media considered in Chapter 5 are continuous-time in nature. Hence, we need to represent the bit stream as a continuous-time signal for transmission, a process called modulation. This chapter describes the most common modulation and demodulation techniques, which are not necessarily optimal.
Optimization often involves practical difficulties that add significantly to the cost of an implementation. For this reason, in this chapter we give a practical engineering perspective, covering only ideas that are essential in actual implementations, and deferring most issues of optimization to subsequent chapters. We start with the basic baseband pulse amplitude modulation (PAM), in which a sequence of time-translates of a basic pulse is amplitude-modulated by a sequence of data symbols. Baseband PAM is commonly used for metallic media, such as wire pairs, where the signal spectrum is allowed to extend down to zero frequency (d.c.). We then extend PAM to passband transmission by introducing a sinusoidal carrier signal. Passband PAM is commonly used on media with highly constrained bandwidth, such as radio. It uses two sinusoidal carriers of the same frequency (with a ninety degree phase difference) which are modulated by the real and imaginary parts of a complex-valued baseband signal. Special cases of passband PAM are the commonly used phase-shift keying (PSK), amplitude and phase modulation (AM-PM), and quadrature amplitude modulation (QAM). By treating these techniques as special cases of passband PAM we avoid the alphabet soup that pervades most comprehensive treatments of digital communications, where every minor variation is given a new acronym and treated as a separate topic. We then generalize these modulation techniques further to allow the bit stream to be mapped into a set of orthogonal waveforms, introducing a technique called orthogonal multipulse. A special case of this, frequency shift keying (FSK), is of practical importance. FSK is used when simple, inexpensive transceivers are required, when the difficulty of synchronizing the receiver to the carrier mandates that an incoherent receiver be used, or when the channel has significant nonlinearities. Orthogonal multipulse is then combined with PAM.
Two practical examples of this combination, multicarrier modulation and code-division multiplexing, are then described. Finally, we briefly consider special features of optical fiber and magnetic recording channels in the context of the techniques described. These media are different enough to warrant special consideration. In the past, metallic media such as wire pairs and coaxial cable have dominated the digital communications world, but this is changing rapidly. Optical fiber has rapidly assumed the role formerly played by metallic media, while digital radio provides wireless communication. For optical fiber, there is little motivation to conserve bandwidth. Thus, simple modulation techniques such as binary PAM (in the form of on-off keying or OOK) are commonly used. Radio channels, such as digital microwave radio, satellite, and mobile radio, are highly bandwidth constrained, and hence there is motivation to conserve bandwidth as much as possible. The voiceband telephone channel has strictly constrained bandwidth as well. When conserving bandwidth is of paramount importance, more sophisticated modulation techniques such as PSK, QAM, and multicarrier are commonly used. In this chapter we emphasize these bandwidth-conserving techniques. 6.1. AN OVERVIEW OF BASIC PAM TECHNIQUES A complete baseband digital communication system is shown in Figure 6-1. In this section we describe qualitatively the structure of such a system. Every component is discussed in detail in other parts of the book. Channels have already been discussed in Chapter 5, although we will again summarize their characteristics briefly here. Transducers (such as lasers for optical fibers or antennas for microwave) are assumed to be part of the channel. Typically, a transmitter and receiver will be packaged together so that communication in both directions can be performed. Such a package is called a modem, which stands for modulator/demodulator.
Often the transmit and receive signals share the same physical medium. This is a special case of multiple access, described in Part V of this book.

Figure 6-1. A baseband digital communication system, showing transmit coder, transmit filter, channel, receive filter, sampler and timing recovery, decision, and decoder.

6.1.1. Channel The characteristics of channels based on common transmission media were discussed in Chapter 5. With a couple of exceptions, these channels are adequately modeled as a linear time-invariant filter with impulse response b(t) and additive noise N(t), as shown in Figure 6-1. The major exceptions to the linear channel model apply to microwave radio channels and the magnetic recording channel. A major exception to the additive noise model is the signal shot noise encountered in optical fiber channels. These channels therefore require special techniques, to be described separately. The media of Chapter 5 all have noise, and most can be modeled by additive noise N(t) as shown in Figure 6-1. In most cases the noise can be considered Gaussian because its origin is thermal. In many other cases, as in shot noise in optical communications systems, this noise is often approximated as Gaussian. Example 6-1. The crosstalk between metallic cable pairs is distinctly non-Gaussian interference. However, in many applications there are a large number of independent interferers, and by the central limit theorem the combined crosstalk will be approximately Gaussian. □ Example 6-2.
An optical signal displays considerable non-Gaussian randomness due to quantum effects. However, at practical signal levels these quantum effects can usually be approximated as Gaussian. Furthermore, we will see that the thermal noise introduced in the receiver circuitry, rather than the quantum noise, is often the most significant disturbance. Sometimes, however, the noise in an optical system cannot be modeled as Gaussian; these cases will be discussed in Chapter 8. □ 6.1.2. Transmitter As shown in Figure 6-1, an incoming bit stream is fed to a coder, which converts the incoming bit stream into a stream of symbols. While a bit can only assume the values "0" or "1", a symbol assumes values from an alphabet that we can define. Example 6-3. The simplest coder translates the bits into symbols with the same values, so the alphabet is {0,1}. A slightly more complicated coder might use alphabet {-1,1} so that the symbols have zero mean if the bits are equally likely to be "0" and "1". A more complicated coder might map pairs of bits from the set {00,01,10,11} into one of four levels from the alphabet {-3,-1,1,3}. Another coder maps the set {00,01,10,11} into complex-valued symbols {+1,+j,-1,-j} (this applies to the passband case). All of these coders are used in practice. □ Since the coder may map multiple bits into a single data symbol, we must make a distinction between the symbol rate and the bit rate. The symbol rate is also called the baud rate, after the French telegraph engineer Baudot. Example 6-4. If the coder maps two bits into a symbol with an alphabet size of four, the symbol rate is half the bit rate. □ In the examples thus far, there is a one-to-one mapping between blocks of input bits and the alphabet. A coder may also increase the alphabet size, usually in order to introduce redundancy. For example, the coder might convert an input bit into a symbol from an alphabet of size three.
Alternatively, the coder could convert an input bit into a sequence of two or more symbols, in which case the symbol rate would be higher than the bit rate. These possibilities are discussed in Chapters 12, 13 and 14, where it is shown that redundancy can be used to reduce errors or control the power spectrum. For the purposes of this chapter, we will assume that the coder does not introduce redundancy. Specifically, we will usually assume that the symbols coming from the coder are independent and identically distributed, forming a white discrete-time random process. Symbols are applied to a transmit filter, which produces a continuous-time signal for transmission over the continuous-time channel. Example 6-5. A simple transmit filter has a rectangular impulse response, shown in Figure 6-2. The signal produced has very wide bandwidth, however, so it is not suitable for bandlimited channels. □ The impulse response g(t) of the transmit filter is called the pulse shape. The output of the transmitter is the convolution of the pulse shape with the symbol sequence,

S(t) = Σ_{m=-∞}^{∞} A_m g(t - mT),  (6.1)

where 1/T is the symbol rate.

Figure 6-2. A transmit filter with a rectangular impulse response. The symbol rate is 1/T symbols per second. A sample symbol sequence (with alphabet size of four) and corresponding continuous-time signal are also shown.

This signal can be interpreted as a sequence of possibly overlapped pulses with the amplitude of each determined by a symbol. Such signals are termed pulse amplitude modulated (PAM) signals, regardless of the pulse shape. PAM and its generalization to passband are by far the most common signaling methods in digital communications. There is a confusing array of techniques (e.g., QAM, PSK, BPSK, PRK, QPSK, DPSK, and AM-PM) which are all special cases of passband PAM, perhaps with some special coding. We further generalize PAM to include FSK and multicarrier techniques in Section 6.6 and beyond.
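The coder and transmit-filter chain just described can be sketched in a few lines (a simplified illustration of our own: the bit-pair mapping follows the four-level alphabet of Example 6-3, the rectangular pulse follows Example 6-5, and the function names and 10-samples-per-symbol discretization are arbitrary choices):

```python
import numpy as np

# Example 6-3 style mapping: pairs of bits -> four-level alphabet {-3,-1,1,3}.
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 0): 1, (1, 1): 3}

def coder(bits):
    # Map pairs of bits to symbols; the symbol rate is half the bit rate.
    return [LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam_signal(symbols, samples_per_T=10):
    # S(t) = sum_m A_m g(t - mT), eq. (6.1), with a rectangular pulse g(t)
    # of width T (here discretized to samples_per_T samples per symbol).
    n = samples_per_T
    s = np.zeros(len(symbols) * n)
    for m, a in enumerate(symbols):
        s[m * n:(m + 1) * n] = a  # g(t) is constant over [mT, (m+1)T)
    return s

symbols = coder([0, 0, 1, 1, 0, 1])  # -> [-3, 3, -1]
s = pam_signal(symbols)              # staircase waveform, 30 samples
```

Because the rectangular pulses do not overlap, the waveform is simply a staircase that holds each symbol value for one symbol interval; with the bandlimited pulses discussed next, successive pulses overlap in time.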
A linear channel with impulse response b(t) and additive noise N(t) will result in a received signal

R(t) = ∫ b(τ) Σ_{m=-∞}^{∞} A_m g(t - mT - τ) dτ + N(t).  (6.2)

This can be rewritten as

R(t) = Σ_{m=-∞}^{∞} A_m h(t - mT) + N(t),  (6.3)

where h(t) is the convolution of b(t) with g(t),

h(t) = ∫ b(τ) g(t - τ) dτ  (6.4)

and is called the received pulse. Example 6-6. If the channel is ideally bandlimited, so that the transfer function is given by (6.14), then the receiver only needs to sample at 0 and T. Neighboring symbols do not interfere with one another at the proper sampling time, so we say that there is no intersymbol interference (ISI). □ We will see in Section 6.2.1 that this choice of pulse shape maximizes the rate at which symbols can be transmitted over a bandlimited channel without ISI. However, it is not a practical pulse shape, since ideally bandlimited transfer functions are not realizable. Even close approximations to this ideally bandlimited pulse are undesirable, as shown in Chapter 17, because timing recovery becomes very difficult. We consider below more practical pulse shapes which use more than the minimum bandwidth, and in the process establish the minimum bandwidth required for pulse transmission. 6.2.1. Nyquist Pulses We saw above that, in principle, an ideally bandlimited pulse can be used to transmit symbols; from (6.13) the signal S(t) is bandlimited to W = π/T. We will see shortly that this is the minimum bandwidth for a fixed T so that the signal can be sampled to recover the symbols. Minimum bandwidth is desirable, but the ideal bandlimited pulse is impractical. Therefore, practical systems use pulses with more bandwidth than the ideal bandlimited pulse. The bandwidth above the minimum is called excess bandwidth.
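The zero-ISI behavior of the ideally bandlimited pulse can be checked numerically. This sketch is our own (it uses numpy's sinc convention, sinc(x) = sin(πx)/(πx)) and simply evaluates the pulse at the sampling instants t = kT:

```python
import numpy as np

T = 1.0

def p(t):
    # Ideal bandlimited pulse with bandwidth pi/T: p(t) = sin(pi t/T)/(pi t/T).
    # numpy's sinc already includes the factor of pi.
    return np.sinc(t / T)

# At the sampling instants t = kT, only the k-th symbol contributes:
# p(0) = 1 and p(kT) = 0 (to floating-point accuracy) for all k != 0,
# so neighboring symbols do not interfere at the proper sampling times.
samples = [p(k * T) for k in range(-3, 4)]
```

Between sampling instants the pulses do overlap; the zero-ISI property holds only at the instants t = kT, which is why accurate timing recovery matters.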
Usually excess bandwidth is expressed as a percentage; for example, 100% excess bandwidth corresponds to a bandwidth of 2π/T, or twice the minimum. Practical systems usually have an excess bandwidth in the range of 10% to 100%. Increasing the excess bandwidth simplifies implementation (simpler filtering and timing recovery), but of course requires more channel bandwidth. The zero-excess-bandwidth pulse is unique: the ideal bandlimited pulse of the last section. With non-zero excess bandwidth, the pulse shape is no longer unique. In this subsection we derive a criterion, called the Nyquist criterion, that must be met by received pulses if there is to be no intersymbol interference, and illustrate some pulse shapes that satisfy this criterion, called Nyquist pulses. The input to the sampler can be written as

Q(t) = Σ_{m=-∞}^{∞} A_m p(t - mT) + U(t)  (6.17)

where the filtered noise process is

U(t) = N(t) * f(t)  (6.18)

and the pulse shape at the slicer is

p(t) = g(t) * b(t) * f(t),  (6.19)

where b(t) is the impulse response of the channel and f(t) is the impulse response of the receive filter. We have assumed the random phase θ is zero. Sampling (6.17) yields

Q_k = Σ_{m=-∞}^{∞} A_m p(kT - mT) + U(kT)  (6.20)
    = A_k p(0) + Σ_{m≠k} A_m p(kT - mT) + U(kT).

The second term is called the intersymbol interference (ISI). If p(t) crosses zero at non-zero multiples of T,

p(kT) = δ_k  (6.21)

for all integers k, then the output of the sampler is

Q_k = Q(kT) = Σ_{m=-∞}^{∞} A_m δ_{k-m} + U_k = A_k + U_k  (6.22)

where U_k = U(kT). In this case there is no ISI. We rely on the timing recovery (described in Chapter 17) to supply the correct sampling instants, t = kT. Usually we want to design pulses that avoid ISI. (An exception is partial response signaling, described in Chapter 12, in which ISI is deliberately introduced.) How can we do this without using the ideal bandlimited pulse?
For a given impulse response b(t) of the channel, we can design g(t) and f(t) to force correct zero crossings in p(t). The criterion on p(t) in (6.21) is called the zero-forcing (ZF) criterion, because it forces the ISI to zero. It is not necessarily optimal because it ignores the effect of the noise; forcing the ISI to zero may increase the noise. Joint minimization of ISI and noise is explored in Chapter 10. For low noise levels, we clearly wish to approximate the ZF criterion.

To get zero ISI it is necessary that (6.21) be satisfied. Taking the Fourier transform of each side of (6.21) and using (2.17) we see that

    \frac{1}{T} \sum_{m=-\infty}^{\infty} P\left(j\left(\omega - \frac{2\pi m}{T}\right)\right) = 1 .    (6.23)

This is called the Nyquist criterion. The minimum-bandwidth pulse satisfying (6.23) is the ideal lowpass pulse, the sinc function, so we have demonstrated that a bandwidth of at least W = \pi/T is required for zero ISI. Put another way, if we are constrained to frequencies |\omega| < W, the maximum symbol rate 1/T that can be achieved with zero ISI is 1/T = W/\pi.

Commonly used pulses p(t) that satisfy the Nyquist criterion are the raised-cosine pulses, given by

    p(t) = \left[\frac{\sin(\pi t/T)}{\pi t/T}\right] \left[\frac{\cos(\alpha\pi t/T)}{1 - (2\alpha t/T)^2}\right] ,    (6.24)

which have Fourier transforms

    P(j\omega) = \begin{cases}
      T , & 0 \le |\omega| \le (1-\alpha)\pi/T \\
      \frac{T}{2}\left[1 - \sin\left(\frac{T}{2\alpha}\left(|\omega| - \frac{\pi}{T}\right)\right)\right] , & (1-\alpha)\pi/T \le |\omega| \le (1+\alpha)\pi/T \\
      0 , & |\omega| > (1+\alpha)\pi/T .
    \end{cases}    (6.25)

These pulses and their Fourier transforms are plotted in Figure 6-3 for a few values of \alpha. For \alpha = 0, the pulse is identical to the ideally bandlimited pulse (6.16). For other values of \alpha, the energy rolls off more gradually with increasing frequency, so \alpha is called the roll-off factor. The shape of the roll-off is that of a cosine raised above the abscissa, which explains the name. The pulse for \alpha = 0 is the pulse with the smallest bandwidth that has zero crossings at multiples of \pi/W (that is, of T); larger values of \alpha require excess bandwidth varying from 0% to 100% as \alpha varies from 0 to 1.

Figure 6-3. A family of pulses with zero crossings at multiples of T, for four values of \alpha, the roll-off factor. The Fourier transform of the pulses is also shown. Note the raised-cosine shape, and the excess bandwidth that increases with \alpha from 0% to 100%.

In the time domain, the tails of the pulses are infinite in extent. However, as \alpha increases, the size of the tails diminishes. For this reason, these pulses can be practically approximated using FIR filters by truncating the pulse at some multiple of T. There are an infinite number of pulses that satisfy the Nyquist criterion and hence have zero crossings at multiples of \pi/W. Some examples are shown in Figure 6-4.

Example 6-14. Consider a channel bandlimited to |\omega/2\pi| \le 1500 Hz. The absolute maximum symbol rate using signaling of the form (6.12) is 3000 symbols per second. If we use a pulse with 100% excess bandwidth, then the maximum symbol rate is 1500 symbols per second. □

6.3. BASEBAND PAM

In Figure 6-1, we are free to design g(t) and f(t), but not b(t). The impulse responses g(t) and f(t) can be chosen to force the ISI to zero, satisfying the zero-forcing criterion. One difficulty with exactly satisfying the ZF criterion is that the channel is rarely completely known at the time the filters are designed. Furthermore, even when the channel is known, the filters required to exactly satisfy the ZF criterion may be difficult or expensive to realize. In this section we describe practical engineering techniques for the design of baseband PAM systems.

6.3.1. ISI and Eye Diagrams

With suboptimal filtering, it is useful to quantify the degradation of the signal. A useful graphical illustration of the degradation is the eye diagram, so called because its shape is similar to that of the human eye.
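A minimal sketch of how such a display can be produced in software (all parameters are illustrative choices, not values from the text): a 50% excess-bandwidth raised-cosine pulse per (6.24) is truncated for FIR realization, a binary PAM signal is formed, and traces of length 2T are overlaid.

```python
import numpy as np

T, osr = 1.0, 16                       # symbol period, samples per symbol
rng = np.random.default_rng(1)
a = rng.choice([-1.0, 1.0], size=200)  # binary data symbols

# Raised-cosine pulse (6.24) with alpha = 0.5, truncated to +/- 8T.
t = np.arange(-8 * osr, 8 * osr + 1) / osr
denom = 1.0 - (2 * 0.5 * t) ** 2
with np.errstate(divide="ignore", invalid="ignore"):
    p = np.sinc(t) * np.cos(np.pi * 0.5 * t) / denom
# Fill in the removable singularity at |t| = T/(2 alpha).
p[np.isclose(denom, 0.0)] = (np.pi / 4) * np.sinc(1.0)

# Upsampled symbol train filtered by the pulse gives the PAM waveform.
x = np.zeros(len(a) * osr)
x[::osr] = a
q = np.convolve(x, p, mode="same")

# Overlay segments of length 2T, triggered on the symbol clock.
traces = q[: (len(q) // (2 * osr)) * 2 * osr].reshape(-1, 2 * osr)
print(traces.shape)                    # each row is one trace of the eye
```

Because the raised-cosine pulse satisfies (6.21), the waveform passes exactly through the symbol values at the sampling instants, which is what the small circles in Figure 6-5 indicate.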
An eye diagram is easily generated using an oscilloscope to observe the output of the receive filter, where the symbol timing serves as the trigger. Such displays have historically served as a quick check of the performance of a modem in the field. The eye diagram is also a useful design tool during the analytical and simulation design phase of the system. An eye diagram consists of many overlaid traces of small sections of a signal, as shown in Figure 6-5. If the data symbols are random and independent, it summarizes the behavior of the signal over all possible symbol patterns.

Figure 6-4. The Fourier transform of some pulses that satisfy the Nyquist criterion.

Figure 6-5. A binary PAM signal made with 50% excess-bandwidth raised-cosine pulses. A segment of length 2T is shown in detail in (a). The small circles indicate the sample points where symbols are unperturbed by neighboring symbols. In (b), an eye diagram is made by overlaying many segments of length 2T.

Example 6-17. Suppose the channel noise N(t) has power spectrum S_N(j\omega) = N_0 for |\omega| < W_N, where W_N is large. Its power is calculated from (3.59),

    R_N(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_N(j\omega)\, d\omega = \frac{N_0 W_N}{\pi} .    (6.27), (6.28)

The variance of the noise samples therefore is

    \sigma^2 = E[\,|N_k|^2\,] = \frac{N_0 W_N}{\pi} ,    (6.29)

which can be quite large, depending on W_N. □

Suppose that at the receiver we can use a lowpass filter F(j\omega) without closing the eye. Its bandwidth should be as small as possible to reduce the variance of the noise samples at the slicer.

Example 6-18. Assume the channel noise is that given in the previous example, but the receive filter is an ideal lowpass filter,

    F(j\omega) = K\, \mathrm{rect}(\omega, W) ,    (6.30)

where W < W_N and K is a normalizing constant. Define the noise component after the receive filter as

    U(t) = N(t) * f(t) .    (6.31)

Its power spectrum, from (3.64), is

    S_U(j\omega) = S_N(j\omega)\, |F(j\omega)|^2 = N_0 K^2\, \mathrm{rect}(\omega, W)    (6.32)

and its power is W N_0 K^2 / \pi. Hence,
with an ideal lowpass receive filter, the variance of the noise samples at the slicer is

    \sigma^2 = \frac{W N_0 K^2}{\pi} .    (6.33)

These results suggest that the bandwidth W of the receive filter should be small to reduce the noise power at the slicer, but how small can we make it? If it is too small or otherwise badly designed it will affect the signal, introducing ISI. □

Example 6-19. Consider putting the signal in Figure 6-10 through a second-order lowpass filter with cutoff (3 dB) frequency at the symbol rate 2\pi/T. The pulse shape after the filter, an example of a signal, and an eye diagram are shown in Figure 6-11. Notice that the eye is almost closed. The noise and timing phase immunity of this signal is poor. □

The task of the receive filter is to condition the signal for sampling. To avoid ISI, the resulting pulse shape p(t) should satisfy the Nyquist criterion, but at the same time the noise power admitted by the receive filter should be minimized. We are free to design the transmit filter G(j\omega), subject to the power (or peak) constraints of the channel, and the receive filter F(j\omega). Optimal design of these filters is deferred to subsequent chapters; here we concentrate on achieving a reasonable pulse shape p(t).

Figure 6-11. The signal of Figure 6-10 has been put through a second-order Butterworth filter with cutoff (3 dB) frequency at the symbol rate. The time function is shown in (a), the pulse shape at the receiver in (b), and the eye diagram in (c). Notice that the eye is relatively closed.

The pulse p(t) at the output of the receive filter has the Fourier transform

    P(j\omega) = F(j\omega)\, B(j\omega)\, G(j\omega) .    (6.34)

The receive filter therefore is given by
    F(j\omega) = \begin{cases}
      \frac{P(j\omega)}{B(j\omega)\,G(j\omega)} , & \text{for all } \omega \text{ such that } B(j\omega)G(j\omega) \neq 0 \\
      0 , & \text{for all } \omega \text{ such that } B(j\omega)G(j\omega) = 0 .
    \end{cases}    (6.35)

The receive filter frequency response can be safely set to zero for any \omega such that B(j\omega)G(j\omega) = 0 because there is no signal at that frequency, so no information is lost. In practice the transmit filter G(j\omega) is often dictated by cost considerations, so our choice of P(j\omega) determines the receive filter according to (6.35).

The choice of P(j\omega) affects the performance of the system because of its effect on the noise. In Chapter 8 we determine the impact of the noise on the probability of error. Not surprisingly, the probability of error decreases monotonically as the signal-to-noise ratio (SNR) increases. The SNR at the slicer is the power of the signal component in Q_k divided by the power of the noise component in Q_k. We write the output of the receive filter as

    Q(t) = \sum_{m=-\infty}^{\infty} A_m p(t - mT) + U(t) ,    (6.36)

where U(t) = N(t) * f(t), so

    Q_k = \sum_{m=-\infty}^{\infty} A_m p(kT - mT) + U_k ,    (6.37)

where U_k = U(kT). If p(t) satisfies the Nyquist criterion, p(kT - mT) = \delta_{k-m} and

    Q_k = A_k + U_k .    (6.38)

The SNR is therefore

    \mathrm{SNR} = \frac{E[\,|A_k|^2\,]}{\sigma^2} ,    (6.39)

where \sigma^2 is the power of U_k (the variance of its samples). We often assume that the symbols are normalized so that E[\,|A_k|^2\,] = 1, in which case

    \mathrm{SNR} = \frac{1}{\sigma^2} .    (6.40)

Exercise 6-1. Show that

    \sigma^2 = \frac{1}{2\pi} \int_{\Gamma} S_N(j\omega) \left| \frac{P(j\omega)}{B(j\omega)\,G(j\omega)} \right|^2 d\omega ,    (6.41)

where \Gamma is the region over which B(j\omega)G(j\omega) \neq 0. □

Notice from (6.35) that part of the function of the receive filter is to compensate for the channel distortion B(j\omega) within the frequency band of interest. A receive filter is often called an equalizer because it compensates for (equalizes) the channel response. While the receive filter can eliminate ISI, there is a price to be paid in noise enhancement.
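The connection between the variance (6.33) and the SNR (6.40) can be checked by a small simulation; this is a sketch, with N_0, K, and W chosen as arbitrary illustrative values and normalized binary symbols.

```python
import numpy as np

N0, K, W = 0.02, 1.0, 3.0
sigma2 = W * N0 * K**2 / np.pi          # slicer noise variance, (6.33)

rng = np.random.default_rng(7)
A = rng.choice([-1.0, 1.0], size=200000)        # E[|A_k|^2] = 1
U = np.sqrt(sigma2) * rng.standard_normal(len(A))
Q = A + U                                        # zero-ISI slicer input (6.38)

snr_est = np.mean(np.abs(A) ** 2) / np.var(Q - A)
print(snr_est, 1.0 / sigma2)            # estimate vs. SNR = 1/sigma^2, (6.40)
```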
In frequency regions where B(j\omega)G(j\omega) is small but not zero, and P(j\omega) is not small, the filter will have a large gain, which will amplify the noise and increase the probability of error. This can result from a poor choice of P(j\omega) for a given channel, but sometimes it is unavoidable if P(j\omega) is to satisfy the Nyquist criterion. In this latter case, we can view this noise enhancement as a penalty paid for having a channel that introduces ISI. In Chapter 10 we show that a decision-feedback equalizer or Viterbi detector can reduce or in some cases eliminate this noise enhancement entirely. These receivers are nonlinear, and have a fundamentally different structure from that in Figure 6-1. Another problem that often arises is a channel response that is not precisely known, or that is time-varying. This problem can be handled with an adaptive equalizer, as discussed in Chapter 11.

6.3.4. Discrete-Time Equivalent Channel

From Figure 6-1 the transmit filter, channel, and receive filter can be modeled as a single continuous-time filter. Further, since the input to this filter is discrete-time, and the output is sampled, we can replace this filter plus the sampler with an equivalent discrete-time filter, as illustrated in Figure 6-12. In the figure,

    p_k = p(kT) = [\,g(t) * b(t) * f(t)\,]_{t=kT}    (6.42)

and

    U_k = U(kT) = [\,N(t) * f(t)\,]_{t=kT} .    (6.43)

Figure 6-12. A baseband PAM system can be modeled as an entirely discrete-time system if all continuous-time subsystems are considered to be part of the discrete-time equivalent channel.

The receiver may consist simply of a slicer and decoder, as shown, if the eye before sampling is acceptably open. Alternatively, more complicated adaptive filters (Chapter 11) are usually implemented as discrete-time filters placed after sampling and before the slicer.
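A sketch of the discrete-time equivalent channel of Figure 6-12 in operation; the tap values p_k, the noise level, and the binary alphabet are hypothetical illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.05, 1.0, -0.08])        # p_k for k = -1, 0, 1 (mild ISI)
a = rng.choice([-1.0, 1.0], size=1000)  # binary symbols

# q_k = sum_j p_j a_{k-j} + u_k: slice the full convolution so that the
# center tap p_0 = 1.0 lines up with a_k.
q = np.convolve(a, p)[1:1 + len(a)]
q += 0.05 * rng.standard_normal(len(a))  # u_k, the sampled filtered noise

decisions = np.sign(q)                   # slicer for the binary alphabet
errors = np.count_nonzero(decisions != a)
print(errors)
```

With this much noise margin the slicer decisions are still correct despite the residual ISI; a more closed eye or stronger noise would begin to produce errors.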
The discrete-time channel model summarizes all we need to know about the continuous-time portions of the system (transmitter, channel, and receiver) for purposes of computing the probability of error (see Chapter 8) and designing adaptive equalizers.

6.4. PASSBAND PAM

Many practical communication channels do not support transmission of baseband signals. Most physical transmission media are incapable of transmitting frequencies at d.c. and near d.c., whereas baseband PAM signals as discussed in the last section usually contain d.c. and low-frequency components.¹

Example 6-20. Telephone channels, designed for voice, carry signals in the frequency range of about 300 to 3300 Hz with relatively little distortion. Radio channels are restricted to specified frequency bands by government regulatory bodies, such as the Federal Communications Commission (FCC) in the U.S., which constrain these channels to a bandwidth that is small relative to the center frequency. □

The development in this section could proceed in two ways: we could consider the transmitted signal to be a random process or a deterministic signal. Since the deterministic model is more intuitive, this will be our approach. In other words, we assume that the transmitted symbol sequence is known. The random model leads to similar results, and is analyzed in Appendix 6-A.

6.4.1. Modulation Techniques

Assume that the sequence a_k of symbols that we wish to transmit is known. It can be considered to be an outcome of the random process A_k used above. Consistent with our notation in Chapter 3, we denote the outcome by s(t) instead of the random process S(t). For mathematical reasons, we also need to assume that the symbol sequence is finite,

    a_k = 0  for  |k| > M .    (6.44)

This ensures the existence of the Fourier transforms that we will need. M can be arbitrarily large, so this assumption is not seriously restrictive.
Assume that the real-valued passband signal to be transmitted over the channel is x(t). It was shown in Section 2.4 that any such passband signal can be represented in terms of an equivalent complex-valued baseband signal s(t), where

    x(t) = \sqrt{2}\, \mathrm{Re}\{\, s(t)\, e^{j\omega_c t} \,\} .    (6.45)

¹ As shown in Chapter 12, the d.c. component can be removed by line coding, but components near d.c. usually remain. These cannot be tolerated by many channels.

The relationship between these signals is illustrated in the frequency domain in Figure 6-13. The \sqrt{2} factor is included to force the energies of s(t) and x(t) to be the same. The frequency \omega_c is the carrier frequency, and controls the center frequency of the modulated signal. Viewed in another way, (6.45) allows us to map a baseband signal s(t) into a passband signal x(t), a process called modulation. Section 2.4 also showed how to recover s(t) from x(t), a process called demodulation. Usually we choose \omega_c large enough that s(t)e^{j\omega_c t} has no negative frequency components, and hence is analytic, as shown in Figure 6-13. For digital communication, s(t) can be a baseband PAM signal, in which case x(t) will be called a passband PAM signal.

Section 2.4 explained three modulation methods: AM-DSB, AM-SSB, and QAM. Of these three modulation techniques, QAM is preferred, and will be used here. AM-DSB is bandwidth-inefficient, because it forces the upper and lower sidebands to be conjugate-symmetric, or equivalently it requires that s(t) be real-valued. The conjugate-symmetric sidebands are redundant, resulting in the use of twice the necessary bandwidth. AM-SSB and QAM have the same bandwidth efficiency, but AM-SSB is difficult to implement for baseband PAM waveforms (because it requires a phase splitter at baseband, which results in a frequency-domain discontinuity at d.c.).
In QAM, the baseband signal is allowed to be any complex-valued baseband signal; that is, unlike AM-DSB and AM-SSB, there is no enforced relationship between the real and imaginary parts of s(t). Hence, the real and imaginary parts can both carry information. For this reason, the spectrum S(j\omega) shown in Figure 6-13 is not symmetric in general. In the baseband PAM signal,

    s(t) = \sum_{k=-\infty}^{\infty} a_k\, g(t - kT) ,    (6.46)

we can make the data symbols a_k complex-valued, or we can make the baseband pulse g(t) complex-valued, or both. Making the data symbols complex-valued is particularly valuable, because it allows us to double the information transferred (we can think of this as transmitting two real-valued data symbols, the real part and the imaginary part). There is no particular motivation to make g(t) complex-valued, so we will assume that it is real-valued.

Figure 6-13. An example of a baseband signal (a), shown in the frequency domain, its analytic passband equivalent (b), and its real-valued passband equivalent (c). The relationship in the time domain is given by (6.45).

The modulation of (6.45) in combination with the complex-valued baseband PAM signal of (6.46) will be called passband PAM. The reason we do not use the terminology QAM that was used for the more general modulation context of Section 2.4 is that we reserve the term QAM for a specific choice of data symbol alphabet (as defined in Section 6.5). As considered in Section 6.5, a wide variety of modulation techniques used in digital communication are special cases of passband PAM. A block diagram of a passband PAM modulator is shown in Figure 6-14. The bits are mapped by the coder into complex-valued data symbols a_k, and passed through a real-valued transmit filter g(t).
After modulation by e^{j\omega_c t}, the signal is analytic (having only positive frequency components), and its real part is a passband signal suitable for transmission over a passband channel. The \sqrt{2} factor ensures that the energies of the complex-baseband and passband signals are the same.

We can compare the bandwidth required for the channel in the baseband and passband cases. If the baseband signal has bandwidth W, then the passband PAM bandwidth is 2W, because of the upper and lower sidebands, as is evident from Figure 6-13; that is, both positive and negative frequency components of s(t) are represented by positive frequencies in x(t), doubling the bandwidth. However, since we can think of passband PAM as transmitting two baseband PAM signals, one for the real part and one for the imaginary part, the overall bandwidth efficiency is the same.

6.4.2. Three Representations for Passband PAM

We have been using the following representation for the passband PAM transmitted signal in Figure 6-14:

    x(t) = \sqrt{2}\, \mathrm{Re}\left\{ e^{j\omega_c t} \sum_{m=-\infty}^{\infty} a_m\, g(t - mT) \right\} .    (6.47)

Figure 6-14. A passband PAM modulator. The most important difference from a baseband PAM modulator is that the coder maps bits into complex-valued data symbols.

If the transmitted pulse g(t) is real-valued we get a second representation,

    x(t) = \sqrt{2}\cos(\omega_c t) \sum_{m=-\infty}^{\infty} \mathrm{Re}\{a_m\}\, g(t - mT) - \sqrt{2}\sin(\omega_c t) \sum_{m=-\infty}^{\infty} \mathrm{Im}\{a_m\}\, g(t - mT) .    (6.48)

In practice, g(t) is almost always real-valued. Thus Figure 6-14 is equivalent to modulating two real-valued baseband PAM signals,

    \sqrt{2} \sum_{m=-\infty}^{\infty} \mathrm{Re}\{a_m\}\, g(t - mT)   and   \sqrt{2} \sum_{m=-\infty}^{\infty} \mathrm{Im}\{a_m\}\, g(t - mT) ,    (6.49)

by the carrier signals \cos(\omega_c t) and -\sin(\omega_c t) respectively.
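A numerical sanity check (a sketch with arbitrary carrier, pulse, and symbols) that the complex-envelope form (6.47) and the in-phase/quadrature form (6.48) produce the same waveform when g(t) is real-valued:

```python
import numpy as np

rng = np.random.default_rng(3)
wc, T = 40.0, 1.0                     # carrier (rad/s) and symbol period
t = np.linspace(0.0, 8.0, 4001)
a = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # complex symbols

def g(t):                             # a real-valued pulse (rectangular here)
    return np.where((t >= 0) & (t < T), 1.0, 0.0)

s = sum(a[m] * g(t - m * T) for m in range(len(a)))        # baseband (6.46)
x_complex = np.sqrt(2) * np.real(s * np.exp(1j * wc * t))  # form (6.47)
x_iq = (np.sqrt(2) * np.cos(wc * t) * s.real
        - np.sqrt(2) * np.sin(wc * t) * s.imag)            # form (6.48)
print(np.max(np.abs(x_complex - x_iq)))
```

The two waveforms agree to machine precision, since (6.48) is just (6.47) with the real part expanded.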
These two carriers are 90 degrees out of phase with one another, so they are said to be in quadrature. The first term in (6.49), modulating the \cos(\omega_c t) carrier, is called the in-phase component, and the second term, modulating the \sin(\omega_c t) carrier, is called the quadrature component.

Example 6-21. The representation (6.48) suggests a system in which Re\{a_m\} and Im\{a_m\} are selected independently from the same alphabet. We call this type of transmission quadrature amplitude modulation (QAM), and explore it further in Section 6.5. In the literature, the term QAM is often used to refer to any passband PAM signal, but we will use the term in a more restricted way. □

A realization of (6.48) is shown in Figure 6-15. While equivalent to Figure 6-14, it is obviously preferable for implementation because Figure 6-14 suggests that the imaginary part of s(t)e^{j\omega_c t} is computed, and then thrown away, while in Figure 6-15 the imaginary part is not computed. Nevertheless, in the remainder of this book we will tend to use the complex-valued notation of Figure 6-14 because it is much more compact, and because it is easy to recognize situations where the computation of the imaginary part of a signal can be avoided.

Figure 6-15. A passband PAM transmitter. It performs the same function as the transmitter in Figure 6-14 when the transmit filter g(t) is real-valued.

A third representation of passband PAM follows by representing the data symbols a_m in terms of their magnitude and angle (polar coordinates),

    a_m = c_m\, e^{j\theta_m} ,    (6.50)

so that

    x(t) = \sqrt{2}\, \mathrm{Re}\left\{ \sum_{m=-\infty}^{\infty} c_m\, e^{j(\omega_c t + \theta_m)}\, g(t - mT) \right\} = \sqrt{2} \sum_{m=-\infty}^{\infty} c_m \cos(\omega_c t + \theta_m)\, g(t - mT) .    (6.51)

Each pulse g(t - mT) is multiplied by a carrier, where the amplitude and phase of the carrier are determined by the magnitude and angle of a_m. This is sometimes called AM/PM, for amplitude modulation and phase modulation. It suggests that phase-shift keying (PSK), in which data is conveyed only in the phase of the carrier, is a special case of passband PAM. This is in fact true, and will be explored further in Section 6.5.

6.4.3. Passband PAM Receivers

A general demodulator structure that allows recovery of the complex-baseband signal s(t) from the passband signal x(t) was displayed in Figure 2-6. First the negative-frequency components are removed using a phase splitter, and then the positive-frequency components of the resulting analytic signal are demodulated to baseband. Unfortunately, a demodulator structure of that precise form is not practical for passband PAM, because it ignores the fact that there will typically be noise introduced in the channel. In addition, there will often even be other signals sharing the same channel with different carrier frequencies, as with radio channels (Chapter 5). In practice, there is the need for a receive filter to reject out-of-band noise and out-of-band signals, just as in the baseband case. In addition, the receive filter can compensate for frequency-dependent distortion on the channel, resulting in a pulse shape that satisfies the Nyquist criterion at the slicer input.

Accordingly, assume a baseband-equivalent receive filter f(t), entirely analogous to the receive filter in the baseband case. An equivalent passband filter has impulse response 2f(t)\cos(\omega_c t). The normalization is such that the passband filter has the transfer function F(j(\omega - \omega_c)) + F(j(\omega + \omega_c)), which is the same transfer function shifted to passband. For example, if F(j\omega) is an ideal lowpass filter at baseband, then 2f(t)\cos(\omega_c t) is an ideal bandpass filter.
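The stated transfer function of the passband filter 2f(t)\cos(\omega_c t) can be verified with a DFT; this is a sketch in which the carrier is placed on a DFT bin so the frequency shift is exact, and the taps are random stand-ins for f(t).

```python
import numpy as np

rng = np.random.default_rng(6)
N, k0 = 256, 40                       # filter length; carrier's DFT bin
f = rng.standard_normal(N)            # stand-in baseband taps f(t)
carrier = np.cos(2 * np.pi * k0 * np.arange(N) / N)

Fp = np.fft.fft(2 * f * carrier)      # DFT of the passband impulse response
F = np.fft.fft(f)
# F shifted up to +wc plus F shifted down to -wc (circular shifts).
shifted = np.roll(F, k0) + np.roll(F, -k0)
print(np.max(np.abs(Fp - shifted)))
```

The two spectra agree to numerical precision, illustrating F(j(\omega - \omega_c)) + F(j(\omega + \omega_c)).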
To emphasize the analogy to the baseband case, we propose to apply the bandpass filter 2f(t)\cos(\omega_c t) to the received signal y(t) before demodulating. This is increasingly done in practice. The resulting demodulator and two equivalent structures are shown in Figure 6-16. In Figure 6-16a we have simply added a passband real-valued receive filter before the demodulator of Figure 2-6. This structure has some advantages in important practical circumstances.

Figure 6-16. Three equivalent demodulator structures obtained from Figure 2-6 by adding a passband receive filter with equivalent baseband impulse response f(t).

Example 6-22. In a microwave radio system, the IF (intermediate frequency) bandpass filter that acts to reject other radio channels can also double as the receive filter in the configuration of Figure 6-16a. Since this filter is fairly expensive to realize using passive components, it is more desirable to realize a single filter rather than the two physical filters required for the analytic bandpass filter. □

Example 6-23. In a voiceband data modem, the receive filter is often implemented in an analog front-end integrated circuit, and the phase splitter is realized in the discrete-time domain after sampling. The bandpass filter doubles as an anti-aliasing filter as well as a receive filter. The configuration of Figure 6-16a again offers the advantage of putting as much of the filtering as possible in discrete time. □

In Figure 6-16b we recognize that the receive filter and the phase splitter can be combined into a single filter. The resulting filter is still a passband filter, but it passes only positive frequency components, and not negative frequency components.
Since the impulse response of this filter, \sqrt{2} f(t) e^{j\omega_c t}, is an analytic signal, we term this filter an analytic passband filter. The impulse response \sqrt{2} f(t) e^{j\omega_c t} is always complex-valued, so the filter requires two real filters for implementation, one with impulse response \mathrm{Re}\{\sqrt{2} f(t) e^{j\omega_c t}\} and one with impulse response \mathrm{Im}\{\sqrt{2} f(t) e^{j\omega_c t}\}.

Finally, we display in Figure 6-16c a third structure, in which the receive filtering and demodulation are reversed. The equivalence of this structure is easily established by noting that

    e^{-j\omega_c t} \left[ y(t) * \left( \sqrt{2}\, f(t)\, e^{j\omega_c t} \right) \right] = \left( e^{-j\omega_c t}\, y(t) \right) * \left( \sqrt{2}\, f(t) \right) .    (6.53)

Intuitively, it performs the receive filtering function at baseband after first translating the received signal to baseband. The relevant signals are shown (in the frequency domain) in Figure 6-17. It also eliminates the double-carrier-frequency term; this term is absent in Figure 6-16a and b because the phase splitter eliminates negative-frequency terms before frequency translation.

Figure 6-16 shows only how to demodulate the received passband signal y(t) to recover the complex baseband signal s(t), not how to detect the data symbols. The latter can be performed by sampling and slicing as in the baseband case, except that we now have a complex-valued PAM signal rather than a real-valued one. A complete passband PAM receiver, including both demodulation and detection, is shown in Figure 6-18 for the case of a demodulator preceding the baseband receive filter.

Figure 6-17. Signals in Figure 6-16c are shown in the frequency domain. (a) The received signal. (b) After demodulation (note the double frequency components at -2\omega_c). (c) After lowpass filtering and scaling by \sqrt{2}.
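The identity (6.53) behind Figure 6-16c can be checked directly in discrete time; this is a sketch with arbitrary test signals and filter taps.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, wc = 0.01, 50.0
t = np.arange(400) * dt
y = rng.standard_normal(len(t))       # received (real) signal
f = rng.standard_normal(64)           # baseband receive filter taps f(t)

# Left side of (6.53): analytic passband filter, then demodulate.
passband = np.sqrt(2) * f * np.exp(1j * wc * np.arange(len(f)) * dt)
lhs = np.exp(-1j * wc * t) * np.convolve(y, passband)[: len(t)]

# Right side of (6.53): demodulate first, then filter at baseband.
rhs = np.convolve(y * np.exp(-1j * wc * t), np.sqrt(2) * f)[: len(t)]
print(np.max(np.abs(lhs - rhs)))
```

The two orderings agree to numerical precision, which is why the receive filtering can be moved to baseband in Figure 6-16c.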
Figure 6-18. A demodulator plus baseband receive filter structure for a passband PAM receiver. (a) In terms of complex-valued signals, and (b) the equivalent structure in terms of real-valued signals assuming the receive filter f(t) is real-valued. The detector structure is similar to the baseband case, except that the slicer is designed for complex-valued data symbols.

Figure 6-18b assumes that the receive filter f(t) is real-valued, which will often not be true, as we will see shortly. However, this structure conveniently illustrates that the receiver can be thought of as two baseband PAM receivers operating in parallel, one using an in-phase carrier and the other a quadrature carrier.

A second receiver structure using an analytic passband filter prior to demodulation is shown in Figure 6-19. This structure illustrates a simplification that occurs when we combine the demodulator and baseband PAM detector. Since the symbol-rate sampler immediately follows the demodulator, the sampler and demodulator can be reversed. The demodulation is then performed in the discrete-time domain. This has important practical consequences, because it is common to coordinate the choice of symbol rate and carrier frequency at the transmitter so that the quantity \omega_c T assumes a convenient value. For example, if \omega_c T = 2\pi/N, then the values of e^{j\omega_c kT} can be easily generated from a lookup table with N entries. This is often simpler to implement than generating e^{j\omega_c t} and performing the multiplication in continuous time.

Noise Power Spectrum at Receive Filter Output

As in the baseband case, assume that the channel introduces white Gaussian noise with power spectrum S_N(j\omega) = N_0.
We can determine the power spectrum of the noise component at the output of the receive filter, again showing that, intuitively, the bandwidth of the receive filter should be made small. We will use the demodulator structure of Figure 6-16b. Assume, as shown in Figure 6-20, that the noise alone (no signal) is applied to the demodulator. Now let us determine the power spectrum of the baseband noise Z(t). From Figure 6-20, the noise M(t) has power spectrum

    S_M(j\omega) = 2 N_0\, |F(j(\omega - \omega_c))|^2 .    (6.54)

Exercise 6-2. Show that

    S_Z(j\omega) = S_M(j(\omega + \omega_c))    (6.55)

and hence

    S_Z(j\omega) = 2 N_0\, |F(j\omega)|^2 .    (6.56)  □

Figure 6-19. Two receivers equivalent to Figure 6-18 using an analytic passband filter. Also shown are the Fourier transforms of the deterministic received signal (c), the output of the analytic passband filter (d), and the output of the demodulator (e).

Figure 6-20. The demodulator of Figure 6-16b with noise only at its input.

It is not surprising that the power spectrum of the noise is proportional to the squared magnitude of the receive filter response. Again, this result suggests minimizing the bandwidth of the receive filter. We defer this topic until Chapter 8, where we examine the properties of this complex-valued noise in detail.

6.4.4. Equivalent Baseband Representations

The received signal at the input to the receiver (output of the channel) can be written in a form similar to the transmitted passband PAM signal, except that the transmitted pulse shape g(t) is replaced by another pulse shape h(t) that takes into account the effect of the channel,

    y(t) = \sqrt{2}\, \mathrm{Re}\left\{ e^{j\omega_c t} \sum_{k=-\infty}^{\infty} a_k\, h(t - kT) \right\} ,    (6.57)

neglecting the noise. We call h(t) the equivalent baseband pulse.
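A discrete-time sanity check (a sketch with arbitrary waveforms) that the real channel acting on the passband signal is equivalent to the complex baseband response e^{-j\omega_c t} b(t), as in (6.60) below, acting on the envelope and then remodulating, which is the content of (6.57):

```python
import numpy as np

rng = np.random.default_rng(5)
dt, wc = 0.01, 60.0
t = np.arange(500) * dt
s = rng.standard_normal(len(t)) + 1j * rng.standard_normal(len(t))  # envelope
b = rng.standard_normal(80)                    # real channel impulse response
tb = np.arange(len(b)) * dt

x = np.sqrt(2) * np.real(s * np.exp(1j * wc * t))   # passband signal (6.45)
y = np.convolve(x, b)[: len(t)]                     # through the channel

bE = b * np.exp(-1j * wc * tb)                      # equivalent baseband b_E
y_equiv = np.sqrt(2) * np.real(
    np.exp(1j * wc * t) * np.convolve(s, bE)[: len(t)])
print(np.max(np.abs(y - y_equiv)))
```

The agreement holds because b(t) is real, so the channel's effect can be folded entirely into the complex-valued baseband response.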
Again the \sqrt{2} factor ensures that the power of the passband signal is the same as that of the baseband signal.

Exercise 6-3. Show that for a transmitted pulse g(t) and channel impulse response b(t), the received baseband pulse has Fourier transform

    H(j\omega) = G(j\omega)\, B(j(\omega + \omega_c)) .    (6.58)

For \omega_c \neq 0 this spectrum does not usually have conjugate symmetry about d.c., and hence h(t) is in general complex-valued. □

In the time domain, the equivalent baseband pulse can be written as

    h(t) = b_E(t) * g(t) ,    (6.59)

where b_E(t) is the equivalent complex-valued baseband impulse response of the channel,

    b_E(t) = e^{-j\omega_c t}\, b(t) .    (6.60)

The equivalent baseband transfer function of the channel is

    B_E(j\omega) = B(j(\omega + \omega_c)) ,    (6.61)

and is therefore nothing more than the passband transfer function in the vicinity of the carrier frequency shifted down to d.c. For some special cases, the equivalent baseband response h(t) is real-valued, as illustrated in the following examples.

Example 6-24. When \omega_c = 0, e^{j\omega_c t} = 1 and h(t) is real-valued, and thus baseband PAM reception is a special case of passband PAM reception. □

Example 6-25. When the channel transfer function B(j\omega) is conjugate-symmetric about the carrier frequency,

    B(j(\omega_c + \omega)) = B^*(j(\omega_c - \omega)) ,  |\omega| < \omega_c ,    (6.62)

the equivalent baseband transfer function B_E(j\omega) is conjugate-symmetric about d.c., and h(t) is real-valued. □

6.4.5. Discrete-Time Equivalent Channel

As in the baseband case, the cascade of transmit filter, channel, receive filter, demodulator, and sampler can be replaced by a discrete-time equivalent channel, shown in Figure 6-22. The sampled pulse p(kT) has discrete-time Fourier transform

    P(e^{j\omega T}) = \frac{1}{T} \sum_{m=-\infty}^{\infty} G\left(j\left(\omega - \frac{2\pi m}{T}\right)\right) B_E\left(j\left(\omega - \frac{2\pi m}{T}\right)\right) F\left(j\left(\omega - \frac{2\pi m}{T}\right)\right)    (6.67)

from (2.17). The equivalent noise Z_k = Z(kT) will be studied thoroughly in Chapter 8. Its power spectrum, however, is just an aliased version of (6.56),

    S_Z(e^{j\omega T}) = \frac{2 N_0}{T} \sum_{m=-\infty}^{\infty} \left| F\left(j\left(\omega - \frac{2\pi m}{T}\right)\right) \right|^2 ,    (6.68)

where T is the sample interval, which in Figure 6-22 equals the symbol interval. The equivalent discrete-time channel model of Figure 6-22 will prove to be very useful since it abstracts all the details of the modulation, demodulation, and filtering into a single simple baseband model.

6.4.6. More Elaborate PAM Receivers: A Preview

The receivers that we have described in this section consist basically of a filter, a demodulator, and a slicer. The filter characteristics are derived from a common-sense requirement to reject out-of-band noise and to avoid ISI at the slicer. However, passband PAM receivers can be much more elaborate, as we will see in subsequent chapters. In order to motivate those chapters, we give here a qualitative description of a typical passband PAM receiver in Figure 6-23. It is a practical receiver, although there are many variations. The parts of the receiver that are already familiar are the bandpass filter on the front end, the phase splitter, the demodulator, and the slicer. In fact, the front end consisting of a BPF followed by a phase splitter is much like the structure shown in Figure 6-16a.

Figure 6-23. Block diagram of a typical passband PAM receiver. Specific parts of this receiver will be discussed in detail in subsequent chapters.

After reviewing some basics of detection theory in Chapter 9, we will derive the optimal receiver structure in Chapter 10. We will find that the front-end filtering and demodulation considered thus far in this chapter is optimal, as long as a particular filter transfer function called the matched filter is used. However, the optimal receiver uses much more complicated mechanisms for detecting the data symbols in the face of ISI. Careful compromises then lead to structures that look more promising and are made fully practical in Chapter 11.
One such structure is a decision-feedback equalizer that consists of a fractionally-spaced precursor equalizer (often also called a "forward equalizer") and a postcursor equalizer (often called a "feedback equalizer"), as shown in Figure 6-23. The fractionally-spaced precursor equalizer is a filter that performs the function of the matched filter, and also equalizes the precursor portion of the ISI, which is defined as the interference from future data symbols. The postcursor equalizer then removes the postcursor portion of the ISI, defined as the interference from past data symbols. In Chapter 11 we show how the parameters of these filters can be adapted automatically so that characteristics of the channel do not have to be precisely known by the designer of the receiver.

Timing recovery is required to derive a symbol-rate clock from the PAM waveform itself, as shown in Figure 6-23 and explained in Chapter 17. There are many different timing recovery schemes available; the one shown here is decision-directed, which means that it uses the receiver decisions to update the phase and frequency of the clock. It is also shown producing three different sampling rates, all related by rational multiples. The Nyquist-rate sampling at the front end is required if the phase splitter is implemented in discrete time. Of course it need not be, and in fact can be combined with the bandpass filter at the front end, in which case this sampling operation will not be required. The second sampling rate is at twice the symbol rate; this explains the terminology "fractionally-spaced" for the subsequent equalizer. The final sampling operation is at the symbol rate, since the slicer requires samples only at the symbol rate. There are also connections from the output of the slicer (the decisions) to the two equalizers. These connections are required for adaptation of the equalizers, and imply that adaptation is also decision-directed.
Also shown in Figure 6-23 is the carrier recovery, which will be explained in Chapter 16. Until Chapter 16 we will consistently assume that the precise carrier frequency and phase are available at the receiver (except for incoherent passband receivers in Section 6.8), but in practice this is not true. After the phase splitter in Figure 6-23, a preliminary demodulation is done using a local carrier. This carrier frequency is not expected to match the transmitter carrier frequency precisely, so phase errors result from the demodulation. These phase errors are corrected by further demodulation, shown as a complex multiplication following the fractionally-spaced precursor equalizer. The reason for this two-step demodulation is that the carrier recovery is decision-directed, like the timing recovery. A loop is formed that includes the slicer, the carrier recovery, and a complex multiplier, as shown in Figure 6-23. It will become clear in Chapter 16 that the performance of this structure is considerably improved if there is no additional filtering inside the loop (the postcursor equalizer is harmless in this configuration). Consequently the final demodulation should be done as close to the slicer as possible. The preliminary demodulation, however, is required in order to bring the signal down close to baseband so that the receiver does not have to operate on the high-frequency signal. Sometimes this first demodulation can be performed simply by sampling the signal below the Nyquist rate, without using the complex multiplier shown in Figure 6-23.

Some possible variations on the receiver shown in Figure 6-23 include the use of error-correcting codes (Chapter 13) or trellis codes (Chapter 14), the use of a Viterbi detector instead of the slicer and equalizers, or the omission of the postcursor equalizer (Chapters 10 and 11).
It is also practical to design passband signals that are not PAM signals, for example FSK (below) or continuous-phase modulation, in which case the receivers are significantly different. Baseband receivers can also be more elaborate than those discussed in Section 6.3 above, using for example line coding (Chapter 12) and adaptive equalization (Chapters 10 and 11).

6.5. ALPHABET DESIGN

Having determined the equivalent baseband and discrete-time channels, we can now address the problem of designing the data symbol alphabet. For the purposes of this section, the entire system may be viewed as a discrete-time system as shown in Figure 6-22. A baseband communication system is just a special case where the symbols A_k, the baseband equivalent channel p_k, and the noise Z_k are real-valued.

6.5.1. Constellations

The alphabet is the set of symbols that are available for transmission. The receiver uses a slicer which makes the decision about the intended symbol. The input to the slicer is a discrete-time signal with sampling interval equal to the symbol interval. When there is no intersymbol interference (ISI), each sample into the slicer is equal to the transmitted data symbol corrupted by an additive noise that is independent of the symbol sequence. For the receivers considered so far, the noise component of the slicer input sample is Gaussian when the channel noise N(t) is Gaussian, as will be shown in Chapter 8. For our purposes here, we will consider the effect of the noise only at an intuitive level.

A baseband signal has a real-valued alphabet that is simply a set of real numbers, for example A = {-3, -1, +1, +3}. A passband PAM signal has an alphabet that is a list of complex numbers, for example A = {-1, -j, +1, +j}. Both of these example alphabets have size M = 4; each symbol can represent log2 M = 2 bits. A complex-valued alphabet is best described by plotting the alphabet as a set of points in the complex plane. Such a plot is called a signal constellation.
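As an illustrative sketch (the helper names are ours, not the book's), the two kinds of example alphabets can be generated programmatically, confirming that an alphabet of size M carries log2 M bits per symbol.

```python
import cmath, math

def psk_alphabet(M, radius=1.0):
    """M-PSK alphabet: M points of equal magnitude `radius`, equally
    spaced in phase (M = 4 gives {+1, +j, -1, -j})."""
    return [radius * cmath.exp(2j * math.pi * m / M) for m in range(M)]

def qam16_alphabet(c=1.0):
    """16-QAM alphabet: a 4x4 grid with levels {-3c, -c, +c, +3c}
    on each axis."""
    levels = [-3 * c, -c, c, 3 * c]
    return [complex(i, q) for i in levels for q in levels]

def bits_per_symbol(alphabet):
    return math.log2(len(alphabet))

A4 = psk_alphabet(4)     # 2 bits per symbol
A16 = qam16_alphabet()   # 4 bits per symbol
```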
There is a one-to-one correspondence between the points in the constellation and the signal alphabet. Two popular constellations are illustrated in the following examples.

Figure 6-24. Two popular constellations for passband PAM transmission: (a) 4-PSK and (b) 16-QAM. The constants b and c affect the power of the transmitted signal.

Example 6-26. The 4-PSK constellation is shown in Figure 6-24a. It consists of four symbols of magnitude b, each with a different phase. Hence the symbols may be written

    A_m = b e^{j ψ_m}    (6.69)

and the transmitted signal may be written (from (6.51))

    X(t) = b √2 Σ_{m=-∞}^{∞} cos(ω_c t + ψ_m) g(t - mT) ,    (6.70)

where ψ_m assumes the four values from the set {0, π/2, π, 3π/2}. The information is carried in the phase of the carrier, while the amplitude of the carrier is constant, which explains the term phase-shift keying (PSK). The 4-PSK constellation is also called quadrature phase-shift keying (QPSK).

In differential encoding, the data symbols Δ_k are chosen from some alphabet of phase differences. The phase φ_k is determined by

    φ_k = φ_{k-1} + Δ_k ,    (6.81)

where the difference in phase from one symbol to the next, Δ_k, carries the information, not the absolute phase φ_k.

Example 6-30. In differential binary PSK (DBPSK), one of two phases is transmitted. For this case, these two phases are π apart, and the coder can map a zero bit into Δ_k = 0 (two successive transmitted phases are identical) and a one bit into Δ_k = π (two successive transmitted phases are π apart).

Example 6-31. The IS-54 standard for digital cellular radio in North America transmits two bits per symbol, using a form of quadrature PSK (QPSK). However, rather than associating these two bits with four phases, in actuality eight equally-spaced phases are used, as shown in Figure 6-31.
At any given symbol k the data symbol assumes only one of four phases chosen from the sets {0, π/2, π, 3π/2} (for odd-numbered symbols) and {π/4, 3π/4, 5π/4, 7π/4} (for even-numbered symbols), where these two sets are offset by π/4 relative to one another. Two information bits are coded as a change in phase by one of the values {π/4, 3π/4, 5π/4, 7π/4}. The possible phase transitions from one symbol to another are shown in Figure 6-31. The differential phase Δ_k is determined from the two input information bits in accordance with the following table:

    Bit 1   Bit 2   Δ_k
    1       1       5π/4
    0       1       3π/4
    0       0       π/4
    1       0       7π/4

Figure 6-31. The North American IS-54 digital cellular standard uses eight phases to transmit two bits of information. The two bits are mapped into one of four phase transitions from one symbol to the next. These transitions are shown as dashed lines for each starting phase.

There are two choices in the receiver design when using differential encoding: coherent or synchrodyne detection, which attempts to learn and track the absolute phase of the received data symbols, and differential detection, which looks at only the change in phase from one symbol to the next, as illustrated in Figure 6-32. Synchrodyne detection (Figure 6-32a) is appropriate when differential encoding is used only to mitigate the rotational invariance of the signal constellation. Suppose the input samples to the detector in both cases are

    Q_k = e^{j(φ_k + θ_k)} + Z_k ,    (6.82)

where θ_k is some unknown phase rotation and Z_k is the complex-valued additive noise. In the coherent case, we assume that θ_k = θ, where θ assumes certain discrete phases (e.g. any multiple of π/4 in Example 6-31) that allow the slicer to work properly. The Q_k are applied to a conventional slicer; the receiver estimates φ_k + θ rather than φ_k. After the slicer, a difference operation forms an estimate of Δ_k, independent of θ, that directly represents the information bits.
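The rule (6.81) together with the table above can be sketched in Python. The function names are ours; a reference symbol is prepended so that decoding can work purely from phase differences.

```python
import cmath, math

# Differential phase for each 2-bit pair, per the table above.
DPHI = {(0, 0): math.pi / 4, (0, 1): 3 * math.pi / 4,
        (1, 1): 5 * math.pi / 4, (1, 0): 7 * math.pi / 4}

def dqpsk_encode(bit_pairs, phi0=0.0):
    """phi_k = phi_{k-1} + Delta_k, per (6.81); a reference symbol leads."""
    phi = phi0
    symbols = [cmath.exp(1j * phi)]
    for pair in bit_pairs:
        phi += DPHI[pair]
        symbols.append(cmath.exp(1j * phi))
    return symbols

def dqpsk_decode(symbols):
    """Recover bit pairs from the phase of Q_k * conj(Q_{k-1})."""
    inv = {d: bits for bits, d in DPHI.items()}
    out = []
    for prev, q in zip(symbols, symbols[1:]):
        dphi = cmath.phase(q * prev.conjugate()) % (2 * math.pi)
        # snap to the nearest legal differential phase
        nearest = min(inv, key=lambda d: abs(cmath.exp(1j * d) - cmath.exp(1j * dphi)))
        out.append(inv[nearest])
    return out

pairs = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
tx = dqpsk_encode(pairs)
rotated = [s * cmath.exp(0.4j) for s in tx]  # arbitrary channel rotation
```

Decoding either tx or rotated yields the original bit pairs: a constant rotation cancels in Q_k Q*_{k-1}, which is exactly the point of differential encoding.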
Figure 6-32. Two detection techniques for DPSK. (a) Coherent, which requires an accurate phase reference, and (b) differential, which allows an arbitrary slowly-varying phase rotation of the data symbols.

The second alternative, differential detection, is shown in Figure 6-32b. This approach avoids tracking a rapidly varying channel phase. For this case, the statistic Q_k Q*_{k-1} is formed before the slicer. The slicer is designed to have the proper thresholds for e^{j Δ_k} rather than e^{j φ_k}. There are two consequences of this:

• In the absence of noise, the input to the slicer has the proper phase Δ_k regardless of θ. This is valuable on channels with rapid phase variations, since it means that the carrier phase does not have to be tracked.
• There is an increase in the noise at the slicer input; this is the price paid for the insensitivity to phase rotation.

We will now verify these two properties. The slicer input is

    Q_k Q*_{k-1} = e^{j Δ_k} e^{j(θ_k - θ_{k-1})} + e^{j(φ_k + θ_k)} Z*_{k-1} + e^{-j(φ_{k-1} + θ_{k-1})} Z_k + Z_k Z*_{k-1} .    (6.83)

Assume that the phase rotation θ_k does not change too much from one symbol to the next (θ_k ≈ θ_{k-1}). This is a valid assumption as long as the symbol rate is high relative to the rate of phase change. With this assumption, the signal term at the slicer input is e^{j Δ_k}, independent of θ_k. Looking at the noise terms, Z_k Z*_{k-1} is the product of two noise terms, and hence will typically be insignificant. The phase factors multiplying Z_k and Z*_{k-1} do not affect their variance. Approximating these terms as independent, the total noise variance is now

    E[ |Z_k + Z*_{k-1}|² ] = 4σ² ,    (6.84)

or twice as large as in the coherent case. There is thus roughly a 3 dB penalty for differential detection (twice as much noise power). A more refined analysis that takes account of the correlation of the two noise terms reveals that the penalty is actually about 2.3 dB at high SNR.
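A quick Monte Carlo sketch (the noise level and sample count are chosen arbitrarily) illustrates the doubling of noise variance in (6.84) and the resulting roughly 3 dB penalty:

```python
import math, random

random.seed(1)
sigma2 = 0.01        # variance of each real noise component (assumed)
n = 100_000

def cnoise():
    """Zero-mean complex Gaussian sample with E|Z|^2 = 2*sigma2."""
    s = math.sqrt(sigma2)
    return complex(random.gauss(0, s), random.gauss(0, s))

coh = 0.0            # coherent slicer sees Z_k alone
dif = 0.0            # differential slicer sees Z_k + conj(Z_{k-1})
zprev = cnoise()
for _ in range(n):
    z = cnoise()
    coh += abs(z) ** 2
    dif += abs(z + zprev.conjugate()) ** 2
    zprev = z

ratio = dif / coh                      # close to 2
penalty_db = 10 * math.log10(ratio)    # close to 3 dB
```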
6.5.3. Spectral Efficiency

Recall that spectral efficiency, defined in (6.7), is a measure of the bit rate achieved per Hz of bandwidth. To determine the maximum achievable spectral efficiency for QAM, we use (6.8), repeated here for convenience,

    ν = log₂ M / (BT) ,    (6.85)

where M = |A| is the alphabet size, B is the bandwidth in Hz, and T is the symbol interval. In Section 6.2.1, we showed that the minimum-bandwidth signal that satisfies the Nyquist criterion has bandwidth W = π/T radians/sec, or B = 1/2T Hz (see (6.23)). Thus, for a minimum-bandwidth baseband PAM signal that avoids ISI, BT = 1/2, so

    ν = 2 log₂ M bits/sec-Hz.    (6.86)

Any higher spectral efficiency would imply ISI. For passband PAM, the same minimum pulse bandwidth is required, but the modulated signal will occupy twice the channel bandwidth (see Figure 6-13). Thus, the best spectral efficiency for a passband PAM signal that avoids ISI is

    ν = log₂ M bits/sec-Hz.    (6.87)

In both cases, the pulse shape required to achieve this spectral efficiency is impractical, so this bound is not achievable in practice. Lest the reader infer that passband PAM has lower spectral efficiency than baseband PAM, recall that the alphabet can have more symbols in the passband case without significantly compromising performance. In fact, if we use QAM and transmit N levels on each of two quadrature carriers, the spectral efficiency is

    ν = log₂ N² = 2 log₂ N bits/sec-Hz,    (6.88)

the same as for the baseband system with N levels. So the efficiency of baseband and passband PAM are effectively identical, other considerations being equal.

Example 6-32. To achieve 4.5 bits/sec-Hz in a digital radio system, (6.87) implies an alphabet size of at least M = 23, but considering the need for some excess bandwidth, and the convenience implied if M is a power of two, M will be larger in practice. Let B be the spacing between carriers in a frequency-division-multiplexed digital radio system.
Then the nominal bandwidth available on each carrier is B, and a zero excess bandwidth system would have a symbol rate of 1/T = B. In fact, the FCC transmission mask (Section 5.4) can be met for a digital radio system with 1/T = (3/4)B and raised-cosine shaping with α = 0.5 [1]. The signal bandwidth is therefore (3/2)(1/T) = (9/8)B, or 12.5% larger than the available bandwidth. This is acceptable, since the resulting interference with the adjacent carrier is small (the band edges of the raised-cosine pulse are small enough). The resulting spectral efficiency is

    ν = log₂ M / (BT) = (3/4) log₂ M ,    (6.89)

and 4.5 bits/sec-Hz can be achieved with M = 64. Thus, the number of points in the constellation is more than twice as great as with zero excess bandwidth. This is the price paid for practical filtering characteristics and tolerance for timing errors (Chapter 17).

6.6. THE MATCHED FILTER - ISOLATED PULSE CASE

In Sections 6.3 and 6.4 we derived receiver structures for PAM without fully specifying the receive filter f(t). It was argued that in order to eliminate ISI at the slicer input, the receive filter should be designed to yield a pulse satisfying the Nyquist criterion at the slicer. There are two problems with this that must be addressed:

• The Nyquist criterion does not uniquely specify the pulse, and hence the receive filter. Within the degrees of freedom available, we would like to choose a receive filter that maximizes the signal-to-noise ratio.
• At this point, we have no indication that the receiver structure assumed in Sections 6.3 and 6.4 is the best possible.

The full answers to both questions will have to await further developments in Chapters 7 through 10, where optimal receiver structures are derived in the presence of ISI.
We can move one step closer to answering these questions here if we arbitrarily eliminate any ISI considerations by assuming that only one pulse is transmitted, and design the receive filter to maximize the signal-to-noise ratio at the slicer. This is called the isolated pulse case. We will derive the matched-filter receiver and then show that it is equivalent to the correlation receiver.

6.6.1. Baseband Case

ISI can be ignored if we transmit a single data symbol A₀. Then the received signal in the baseband case is

    Y(t) = A₀ h(t) + N(t) ,    (6.90)

where h(t) is the real-valued received pulse shape and N(t) is additive white Gaussian noise. We will assume a receiver structure similar to Section 6.2, consisting of a real-valued receive filter f(t), followed by a sampler at t = 0 and a slicer, except that now we will be able to unambiguously optimize the receive filter since we do not have to be concerned about ISI. The slicer input is

    Q₀ = ∫ Y(τ) f(t - τ) dτ |_{t=0} = ∫ Y(τ) f(-τ) dτ .    (6.91)

Note that the slicer input is of the form of a cross-correlation of the received signal with the time-reversed receive filter impulse response. Substituting (6.90) for Y(t),

    Q₀ = A₀ ∫ h(τ) f(-τ) dτ + ∫ N(τ) f(-τ) dτ .    (6.92)

The first term is the signal and the second term is the noise. The variance of the noise term is easily shown to be

    (N₀/2) ∫ |f(τ)|² dτ .    (6.93)

Intuitively, making the first term (the signal term) in (6.92) larger while keeping the second term (the noise term) constant should improve performance. Assume that power constraints on the channel prevent us from accomplishing this by either increasing the magnitude of h(t) or increasing the symbol spacing in the alphabet. Thus, assume h(t) and A₀ in (6.92) cannot be changed, and select the filter f(t) that maximizes the power of the first term relative to the second. Let

    σ_A² = E[ |A₀|² ] ,    (6.94)

and define the signal-to-noise ratio to be

    SNR₀ = σ_A² | ∫ h(τ) f(-τ) dτ |² / ( (N₀/2) ∫ |f(τ)|² dτ ) .    (6.95)

We can uniquely choose the receive filter f(t) to maximize (6.95).
To do this, we need the integral form of the Schwarz inequality, given in vector form in Section 2.6. For any two (possibly complex) integrable functions f₁(x) and f₂(x),

    | ∫ f₁(x) f₂*(x) dx |² ≤ [ ∫ |f₁(x)|² dx ] [ ∫ |f₂(x)|² dx ] ,    (6.96)

with equality if and only if f₂(x) = K f₁(x) for some constant K [12]. This is actually the same as the Schwarz inequality of Section 2.6, but using the L² inner product. Since everything in (6.95) is real-valued (we are considering the baseband case only at this time), the SNR is maximized if

    f(t) = K h(-t)    (6.97)

for some constant K. This choice of receive filter is called the matched filter. Using this in (6.93) and (6.95), we can express the matched-filter bound on the SNR as

    SNR₀ ≤ 2 σ_A² σ_h² / N₀ ,    (6.98)

where σ_h² is the energy in the received pulse,

    σ_h² = ∫ h²(t) dt .    (6.99)

In the sequel, we choose K = 1, since any other choice affects the signal and noise terms equally.

To recap, the signal-to-noise ratio is maximized by choosing the receive filter to be (within a constant) the time-reversal of the received pulse shape, f(t) = h(-t). This filter, which has transfer function H*(jω), is called a matched filter. The matched filter performs perfect phase equalization, since the transfer function of the pulse at the output of the matched filter, |H(jω)|², is real-valued. It will be shown in Chapter 8 that under certain assumptions, for the isolated pulse case, this choice of receive filter minimizes the probability of error.

From (6.91), the matched-filter receiver output is the correlation of the received signal with the pulse h(t),

    Q₀ = ∫ Y(t) h(t) dt .    (6.100)

This implementation is known as the correlation receiver, and is shown in Figure 6-33 together with the equivalent matched-filter receiver. Viewing this in signal space (Section 2.6), we obtain an intuitive justification of the matched-filter receiver.
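A small numerical sketch (an arbitrary Gaussian pulse, unit symbol variance, N₀ = 1, all assumed for illustration) confirms the Schwarz argument: among randomly perturbed receive filters, none beats f(t) = h(-t), which attains the bound (6.98).

```python
import math, random

random.seed(0)
dt = 0.01
t = [i * dt for i in range(-300, 301)]
h = [math.exp(-(x ** 2)) for x in t]   # an arbitrary received pulse shape
N0 = 1.0

def snr(f):
    """SNR of (6.95) with sigma_A^2 = 1, integrals discretized."""
    sig = sum(hv * fv for hv, fv in zip(h, reversed(f))) * dt  # int h(tau) f(-tau) dtau
    var = (N0 / 2) * sum(fv * fv for fv in f) * dt
    return sig ** 2 / var

matched = list(reversed(h))            # f(t) = h(-t), the matched filter
best = snr(matched)

# Random perturbations of the matched filter never do better (Schwarz):
ok = all(snr([fv + random.uniform(-0.2, 0.2) for fv in matched]) <= best + 1e-9
         for _ in range(50))
```

Here best equals 2 σ_h² / N₀, the matched-filter bound (6.98), up to discretization.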
Figure 6-33. Two equivalent baseband PAM receiver structures: (a) a matched-filter receiver, with f(t) = h(-t) sampled at t = 0, and (b) a correlation receiver. For an isolated pulse, these receivers maximize the SNR at the slicer input.

The receiver is taking the signal-space inner product of the received signal with the known pulse, or equivalently calculating the component of the received signal in the direction of the known pulse. Components in other directions in signal space must be due to the noise.

If h(t) is causal, as will usually be the case, then the matched filter is anti-causal. To implement it in practice, h(t) is assumed to be finite in length,

    h(t) = 0 for t ≥ C    (6.101)

for some constant C, and the causal matched filter

    f(t) = h(C - t)    (6.102)

is implemented. A similar assumption is required to be able to compute the integral in the correlation receiver in finite time.

It should be emphasized that the preceding optimization ignores the effect of ISI. In general, if we use a matched filter as our receive filter, we will introduce ISI. However, no ISI occurs when the received pulse h(t) is confined to one symbol interval.

Exercise 6-4. Show that if h(t) = 0 for t < 0 and t > T, then the pulse shape at the output of the matched filter, h(t) * h(-t), is time-limited to two symbol intervals, -T ≤ t ≤ T, and furthermore goes to zero at t = -T and t = T. Thus, such a pulse shape at the output of the matched filter satisfies the Nyquist criterion.

More generally, we can say that if the pulse shape at the output of the matched filter obeys the Nyquist criterion, then the matched filter is the optimal receive filter, in the sense that it maximizes the SNR. For a received pulse h(t), the pulse at the output of the matched filter has Fourier transform |H(jω)|².
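A minimal noiseless sketch of the correlation receiver (6.100); the pulse shape and symbol value are assumed for illustration:

```python
import math

dt = 0.01
t = [i * dt for i in range(-300, 301)]
h = [math.exp(-(x ** 2)) for x in t]   # assumed received pulse shape
A0 = -3.0                              # transmitted symbol; noise omitted
y = [A0 * v for v in h]                # Y(t) = A0 * h(t)

# Correlation receiver (6.100): Q0 = int Y(t) h(t) dt = A0 * sigma_h^2.
q0 = sum(yv * hv for yv, hv in zip(y, h)) * dt
sigma_h2 = sum(v * v for v in h) * dt  # pulse energy, as in (6.99)
estimate = q0 / sigma_h2               # normalizing recovers A0
```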
The Nyquist criterion thus becomes, at the output of the matched filter,

    S_h(jω) = (1/T) Σ_{m=-∞}^{∞} | H(j(ω + 2πm/T)) |² = 1 .    (6.103)

Of course, there will be no ISI if "1" is replaced by any constant, since that will simply scale the signal level at the slicer input. The quantity S_h(jω) is called the folded spectrum of the received pulse. It will play a key role in Chapters 7 through 10, where we consider ISI in detail. Equation (6.103) depends only on the magnitude |H(jω)|, as illustrated by the following example.

Example 6-33. The raised-cosine pulses given in (6.24) have a Fourier transform (6.25) that is real-valued and non-negative for all ω. Therefore, a simple way to satisfy (6.103) is to use a pulse h(t) and receive filter f(t) with Fourier transforms equal to the square root of the raised cosine,

    H(jω) = F(jω) = √P(jω) ,    (6.104)

where P(jω) is given by (6.25). The corresponding time-domain pulse shapes are [13]

    h(t) = f(t) = (4α / (π √T)) [ cos((1 + α)πt/T) + sin((1 - α)πt/T) / (4αt/T) ] / [ 1 - (4αt/T)² ] .    (6.105)

Convolving such a pulse with itself will yield the raised-cosine pulse of (6.24), so using such a pulse and receive filter results in no ISI at the receive filter output. Such pulses are called square-root raised-cosine pulses.

6.6.2. Passband Case

In the passband case, the received signal for an isolated pulse is

    Y(t) = √2 Re{ A₀ h(t) e^{j ω_c t} } + N(t) ,    (6.106)

where now the received pulse h(t) may be complex-valued (if the channel introduces dispersion) and the data symbol A₀ is certainly complex-valued. The matched-filter or correlation receiver for this case, shown in Figure 6-34, is similar to Figure 6-33. We will now show that for an isolated pulse, these receivers maximize the SNR. Considering again a receive filter f(t), we showed in the development leading up to Figure 6-21 that the slicer input sampled at time zero is

    Q₀ = A₀ ∫ h(τ) f(-τ) dτ + Z₀ ,    (6.107)

where Z₀ is complex noise.
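As an aside to Example 6-33, the raised-cosine pulse that appears at the matched-filter output can be checked against the Nyquist criterion in the time domain. The closed-form expression below is the standard raised-cosine pulse; T and α are chosen arbitrarily for the sketch.

```python
import math

T, alpha = 1.0, 0.35    # symbol interval and excess bandwidth (assumed)

def raised_cosine(t):
    """Raised-cosine pulse in the time domain; limits are taken at the
    singular points t = 0 and t = +/- T/(2*alpha)."""
    if abs(t) < 1e-12:
        return 1.0
    if abs(abs(t) - T / (2 * alpha)) < 1e-12:
        return (math.pi / 4) * math.sin(math.pi * t / T) / (math.pi * t / T)
    s = math.sin(math.pi * t / T) / (math.pi * t / T)
    return s * math.cos(alpha * math.pi * t / T) / (1 - (2 * alpha * t / T) ** 2)

# Nyquist criterion in the time domain: p(0) = 1 and p(kT) = 0 for k != 0.
samples = [raised_cosine(k * T) for k in range(-5, 6)]
```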
Using reasoning similar to that leading to (6.56), it can be shown that this noise has variance

    E[ |Z₀|² ] = (N₀/2) ∫ |f(τ)|² dτ .    (6.108)

Figure 6-34. The (a) matched-filter and (b) correlation receiver shown for passband signals.

The SNR is given by

    SNR₀ = σ_A² | ∫_{-∞}^{∞} h(τ) f(-τ) dτ |² / ( (N₀/2) ∫ |f(τ)|² dτ ) .    (6.109)

Again applying the Schwarz inequality (6.96), we conclude that

    SNR₀ ≤ 2 σ_A² σ_h² / N₀ ,    (6.110)

where σ_h² is given by (6.99). Now equality holds if and only if f(t) = h*(-t). Again, this filter is called a matched filter. For a baseband equivalent pulse h(t), the baseband equivalent matched filter has impulse response h*(-t) and transfer function (as in the baseband case) H*(jω). As in the baseband case, the Nyquist criterion at the matched filter output is satisfied if the folded spectrum S_h(jω) of (6.103) is a constant. The matched filter output sample at t = 0 is equivalent to the cross-correlation of the complex baseband received signal with the waveform h*(t).

6.7. SPREAD SPECTRUM

Spread spectrum systems are PAM systems that deliberately use pulses with much more than the minimum bandwidth π/T required by the Nyquist criterion. From (6.98) and (6.110), the SNR achieved with a matched-filter or correlation receiver depends on the energy σ_h² in the received pulse h(t), but not on its bandwidth. So from the perspective of SNR, there is no harm in using a pulse with a broad bandwidth, as long as a matched-filter receiver is used. There are several reasons for using a large bandwidth:

• Pulses with a broader spectrum are less sensitive to channel impairments that are highly localized in frequency. Such impairments arise, for example, with frequency-selective multipath fading.
• Spread spectrum signals are less vulnerable to jamming, in which a hostile party is trying to deliberately disrupt the communication.
• Spread spectrum signals can be concealed.
By using very wide bandwidth pulses, these signals can be placed in regions of the spectrum already occupied by other signals, and in effect be masked by the other signals.
• Many spread spectrum users can share a common bandwidth without interfering much with one another.

Consider the jammer situation. Suppose that the bandwidth of the received pulse h(t) is B. Suppose that the total power of the jammer is limited to P_J, and that it transmits bandlimited white noise with power spectrum N₀ = P_J / 2B within the bandwidth of the pulse. With a matched-filter receiver,

    SNR₀ = 2 σ_A² σ_h² / N₀ = 4B σ_A² σ_h² / P_J .    (6.111)

As the bandwidth B increases, so does the SNR! Recall that in the signal-space view, the matched-filter or correlation receiver calculates the component of the received signal in the direction of the known pulse. The intuition behind spread spectrum is that it minimizes the effect of a particular impairment as long as that impairment has most of its energy in other directions in signal space. We will study this approach in more detail in Chapter 8.

6.8. ORTHOGONAL MULTIPULSE MODULATION

In baseband PAM, symbols A_k are multiplied by a pulse g(t) and combined for transmission,

    S(t) = Σ_{k=-∞}^{∞} A_k g(t - kT) .    (6.112)

A single pulse shape g(t) is used in each symbol interval, and is amplitude modulated by the (possibly complex-valued) data symbol A_k. We can generalize this model by allowing the pulse shape in any symbol interval to be chosen from a set of N possibilities, {g_n(t); 0 ≤ n ≤ N-1}, to represent log₂ N bits of information. The transmitted signal can then be written as

    S(t) = Σ_{k=-∞}^{∞} g_{A_k}(t - kT) ,    (6.113)

where A_k takes on values in the set {0, …, N-1}. The data symbol thus indexes which pulse is transmitted in the k-th symbol interval, rather than the amplitude of the pulse that is transmitted.
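A discrete-time sketch of (6.113): rows of a 4x4 Hadamard matrix stand in for the orthogonal pulse set (an assumption made purely for illustration; the book's pulses are continuous-time), and each data symbol selects which pulse occupies its symbol interval.

```python
# Orthogonal, equal-energy "pulses": rows of a 4x4 Hadamard matrix.
PULSES = [[1, 1, 1, 1],
          [1, -1, 1, -1],
          [1, 1, -1, -1],
          [1, -1, -1, 1]]
T = 4                                  # samples per symbol interval

def multipulse_tx(symbols):
    """S = concatenation over k of g_{A_k}, per (6.113)."""
    s = []
    for a in symbols:                  # each a in {0, ..., N-1}: log2(N) bits
        s.extend(PULSES[a])
    return s

signal = multipulse_tx([2, 0, 3])
```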
If the pulse set is orthogonal and has equal energy, meaning that

    ∫ g_n(t) g_m*(t) dt = σ_g² δ_{n-m}    (6.114)

for some constant σ_g², then we call this orthogonal multipulse modulation. In this section, we will initially follow the simplification of Section 6.6, and ignore the effects of ISI. Thus, we will transmit and receive a single isolated pulse, and design detection strategies that do not take into account the effects of ISI. After establishing some basic receiver structures for the isolated pulse, we will then generalize the Nyquist criterion to design orthogonal signals that avoid ISI. For reasons seen shortly, orthogonal multipulse signaling has poor spectral efficiency, and hence is rarely used when bandwidth is at a premium. Nonetheless, it is valuable as a starting point for more elaborate techniques that combine it with PAM (Section 6.9).

6.8.1. Baseband Equivalent Model

For passband systems, we will often allow the pulses {g_n(t); 0 ≤ n ≤ N-1} to be complex baseband equivalents. In that case, the transmitted passband signal will be

    X(t) = √2 Re{ e^{j ω_c t} S(t) } .    (6.115)

An alternative viewpoint is to define the passband equivalent pulses

    g̃_n(t) = √2 Re{ e^{j ω_c t} g_n(t) } .    (6.116)

These can then be used to form the passband signal directly,

    X(t) = Σ_{k=-∞}^{∞} g̃_{A_k}(t - kT) .    (6.117)

Both interpretations will be useful.

Exercise 6-5. Show that if two complex-baseband waveforms are orthogonal as in (6.114), then their passband-equivalent real-valued waveforms are also orthogonal. Thus, orthogonality in baseband and passband are equivalent. You will need to assume that the carrier frequency is at least equal to the bandwidth of the baseband signal.

6.8.2. The Correlation Receiver

We have assumed a simple receiver structure for PAM in which a receive filter eliminates out-of-band noise. A matched filter, with impulse response equal to the conjugate of the time-reversed pulse, was found to be the one receive filter that maximizes SNR for the isolated pulse case.
The matched-filter receiver can also be interpreted as a correlation receiver, which cross-correlates the received signal with the transmitted pulse and feeds the resulting correlation to a slicer. In this section, we adapt this receiver structure to orthogonal multipulse.

A receiver for the signal in (6.113) needs to distinguish different pulse shapes in each symbol interval, not just different pulse amplitudes. Intuitively, the received signal can be cross-correlated with each candidate pulse shape. The pulse shape that correlates best with the signal can reasonably be assumed to be the one that was transmitted. Fortunately, the resulting receiver is equivalent to the matched-filter receiver. Thus, we have the happy situation that we can maximize SNR and distinguish orthogonal pulses simultaneously, at least for the isolated pulse case.

We will begin by considering real-valued pulses, which might be passband equivalent pulses as in (6.116). Let a received pulse be h_n(t) for some 0 ≤ n ≤ N-1. We will assume the effect of the channel transfer function is benign, so that the received pulses h_n(t) are orthogonal and have equal energy, or

    ∫ h_n(t) h_m*(t) dt = σ_h² δ_{n-m} .    (6.118)

For practical applications, such as multicarrier and code-division multiple access, discussed below, this assumption is usually valid. It is also obviously valid for channels with inherently flat frequency responses, such as certain radio channels. As in Section 6.6, ignore ISI by considering a received isolated pulse. Assume the received signal Y(t) is corrupted by Gaussian white noise,

    Y(t) = h_n(t) + N(t) .    (6.119)

The correlation receiver forms N cross-correlations,

    K_i = ∫ Y(t) h_i(t) dt    (6.120)

for 0 ≤ i ≤ N-1. Since the possible received pulses h_i(t) are orthogonal, and pulse n was transmitted, K_n will be equal to σ_h² plus noise, while K_i for i ≠ n will be noise only. So it makes intuitive sense to choose the maximum K_i to decide which pulse was transmitted. This correlation receiver is illustrated in Figure 6-35.
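The decision rule built on (6.120) can be sketched with the same kind of stand-in orthogonal pulse set (Hadamard rows, an assumption for illustration) and a small additive noise sample:

```python
# Orthogonal pulse set: rows of a 4x4 Hadamard matrix (each with energy 4).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]

def correlation_receiver(y, pulses):
    """Form K_i = <y, h_i> for each candidate pulse and pick the largest."""
    ks = [sum(a * b for a, b in zip(y, h)) for h in pulses]
    return max(range(len(ks)), key=lambda i: ks[i])

# Transmit pulse n = 2 with a little additive noise:
noise = [0.3, -0.2, 0.1, 0.2]
y = [h + z for h, z in zip(H4[2], noise)]
decision = correlation_receiver(y, H4)
```

Only the correlator matched to the transmitted pulse has a signal component; the others see noise alone, so the largest K_i identifies the pulse.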
It works very well for orthogonal multipulse since, by the orthogonality property, the output of a cross-correlation against one pulse shape will have a zero signal component if any of the other pulse shapes is actually transmitted. From a signal-space perspective, each K_i looks only in the direction of h_i(t) in signal space by forming an inner product (cross-correlation) of h_i(t) with the received signal. Our intuition will be confirmed in Chapter 9, where the correlation receiver is shown to be optimal under the assumption of additive white Gaussian channel noise.

Figure 6-35. An isolated-pulse correlation receiver for baseband multipulse transmission, where the received pulses are assumed to be real and orthogonal.

For the passband case, we should use the passband-equivalent pulse h̃_n(t) = √2 Re{ e^{j ω_c t} h_n(t) } in place of h_n(t) in (6.120), to obtain

    K_i = √2 Re{ ∫ R(t) h_i*(t) dt } ,    (6.121)

where the demodulated received signal is

    R(t) = Y(t) e^{-j ω_c t} .    (6.122)

This interpretation of the receiver is shown in Figure 6-36. The received signal is demodulated with a complex exponential and correlated with the baseband equivalent pulse. The scaled real part of the result is used to make the decision. This receiver structure works entirely with the baseband equivalent pulses.

Figure 6-36. An isolated-pulse correlation receiver for passband orthogonal multipulse, using baseband equivalent pulses.

The correlation receiver can be implemented as a set of matched filters. Define

    f_i(t) = h_i*(-t)    (6.123)

and note that

    K_i = √2 Re{ [f_i(t) * R(t)] |_{t=0} } = √2 Re{ ∫ R(t) f_i(-t) dt } = √2 Re{ ∫ R(t) h_i*(t) dt } .    (6.124)

Hence the correlations K_i can be computed by sampling the output of a filter with impulse response equal to the time-reversed conjugated pulse. A matched-filter receiver is shown in Figure 6-37.
Detection of an isolated pulse is only the beginning, of course. To detect a sequence of pulses, the matched-filter receiver in Figure 6-37 can be modified so that samples are taken at multiples of T, rather than just once at t = 0. As we will see in Chapter 9, this will prove to be optimal if such sampling does not result in intersymbol interference.

Figure 6-37. A matched-filter receiver for an isolated pulse in multipulse transmission, using baseband-equivalent pulses.

6.8.3. The Generalized Nyquist Criterion

There is a fundamental lower bound on the bandwidth required by an orthogonal multipulse signal, assuming that we wish to avoid ISI. The Nyquist criterion, discussed in Section 6.2, states that for baseband PAM with symbol interval T, the minimum signal bandwidth is π/T radians/sec or 1/2T Hz. We can now generalize the Nyquist criterion and show that the minimum bandwidth of orthogonal multipulse is Nπ/T radians/sec or N/2T Hz. Thus, the requirement that there be N orthogonal pulses in the symbol interval increases the minimum bandwidth requirement by a factor of N.

Assume the receiver structure of Figure 6-37, and for an isolated-pulse input, sample the matched-filter outputs at all integer multiples of T. To avoid ISI, if the signal input is pulse h_n(t), then the samples at the output of the filter matched to h_n(t) must satisfy the ordinary Nyquist criterion,

    h_n(t) * h_n*(-t) |_{t=kT} = δ_k ,  0 ≤ n ≤ N-1.    (6.125)

In addition, to avoid crosstalk between pulses, if h_n(t) is the input to a filter matched to pulse h_l(t), for l ≠ n, then the output sampled at t = kT must be zero for all k,

    h_n(t) * h_l*(-t) |_{t=kT} = 0 ,  l ≠ n,  -∞ < k < ∞.    (6.126)

These conditions can be written together in a compact form,

    h_n(t) * h_l*(-t) |_{t=kT} = δ_k δ_{n-l} .    (6.127)

We can express these conditions in terms of an equivalent frequency-domain criterion. Let h_n(t) have Fourier transform H_n(jω).
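Before passing to the frequency domain, the compact time-domain condition (6.127) can be verified for a simple discrete-time pulse set: pulses that are time-limited to a single symbol interval and mutually orthogonal (Hadamard rows again used as a stand-in, an assumption for illustration) satisfy it, with the pulse energy in place of the "1".

```python
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]
T = 4   # samples per symbol interval; pulses are time-limited to one interval

def xcorr_at_shift(a, b, shift):
    """<a(t), b(t - shift)> for finite sequences (zero outside support)."""
    return sum(a[i] * b[i - shift] for i in range(len(a))
               if 0 <= i - shift < len(b))

# (6.127): sampled matched-filter cross-outputs are (energy)*delta_k*delta_{n-l}.
checks = [(n, l, k, xcorr_at_shift(H4[n], H4[l], k * T))
          for n in range(4) for l in range(4) for k in (-1, 0, 1)]
```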
When we input h_n(t) to a filter matched to h_l(t), the output has Fourier transform H_n(j\omega)H_l^*(j\omega). When this output is sampled at t = kT, the discrete-time Fourier transform of the samples must be unity for l = n and zero for l \ne n, and hence

\frac{1}{T} \sum_{m=-\infty}^{\infty} H_n\!\left(j(\omega + m\tfrac{2\pi}{T})\right) H_l^*\!\left(j(\omega + m\tfrac{2\pi}{T})\right) = \delta_{l-n} .   (6.128)

Equation (6.128) is called the generalized Nyquist criterion. Using this, we can show that in order to avoid ISI, the aggregate of N orthogonal pulses occupies a minimum bandwidth of N\pi/T, or N times the minimum bandwidth of ordinary PAM. First, we show that a bandwidth of N\pi/T is sufficient to satisfy (6.128) by displaying a pulse set that meets the criterion.

Exercise 6-6. Let

h_n(t) = \sqrt{1/T}\; \frac{\sin(\pi t/2T)}{\pi t/2T}\; \cos\!\big((n + 1/2)\,\pi t/T\big)   (6.129)

for n = 0, \dots, N-1. Show that these pulses are ideally bandlimited to the range n\pi/T \le |\omega| < (n+1)\pi/T, as shown in Figure 6-38, so that the aggregate bandwidth occupied by the first N pulses is N\pi/T. Also show that they satisfy (6.128). □

Figure 6-38. Time-domain (top) and frequency-domain (bottom) plots of the pulses in (6.129) for n = 0, 1, 2, and 3.

Since these are ideally bandlimited pulses, they are not practical. Nonetheless, they demonstrate that a bandwidth of N\pi/T is sufficient to satisfy (6.128). In Appendix 6-B, we also show that this bandwidth is necessary.

Spectral Efficiency of Orthogonal Multipulse

The minimum bandwidth of orthogonal multipulse with no ISI is N\pi/T radians/sec or N/2T Hz. Consequently the best spectral efficiency is \log_2(N)/(N/2) = 2\log_2(N)/N bits/sec/Hz.

… = 0 or \pi depending on which phase is being transmitted. The nominal carrier frequency is

\omega_c = (\omega_0 + \omega_1)/2 .   (6.151)

One representation for the transmitted MSK signal is therefore

X(t) = \sum_{k=-\infty}^{\infty} \sin\!\left( \omega_c t + b_k \frac{\pi t}{2T} + \phi_k \right) w(t - kT) ,   (6.152)

where b_k = \pm 1 is determined by the data and \phi_k ensures phase continuity.

Exercise 6-9. Show that to maintain phase continuity we need

\phi_k = \phi_{k-1} + (b_{k-1} - b_k)\,\pi k/2 \mod 2\pi .   (6.153)
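The phase recursion of Exercise 6-9 can be verified numerically: pick any data sequence b_k = ±1, build φ_k from (6.153), and check that the argument of the sinusoid in (6.152) is continuous (mod 2π) at every symbol boundary. The carrier frequency and sequence length below are arbitrary illustrative assumptions:

```python
import math
import random

random.seed(1)
T = 1.0
wc = 2 * math.pi * 5.0          # assumed nominal carrier, illustrative only
b = [random.choice([-1, 1]) for _ in range(20)]

# phase recursion (6.153): phi_k = phi_{k-1} + (b_{k-1} - b_k) * pi * k / 2 (mod 2*pi)
phi = [0.0]
for k in range(1, len(b)):
    phi.append((phi[k - 1] + (b[k - 1] - b[k]) * math.pi * k / 2) % (2 * math.pi))

def arg(k, t):
    # argument of the MSK sinusoid in symbol interval k, from (6.152)
    return wc * t + b[k] * math.pi * t / (2 * T) + phi[k]

def gap(k):
    # phase disagreement (mod 2*pi) between adjacent intervals at the boundary t = kT
    d = (arg(k - 1, k * T) - arg(k, k * T)) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

max_discontinuity = max(gap(k) for k in range(1, len(b)))
assert max_discontinuity < 1e-9   # phase is continuous at every boundary
```

Dropping the (b_{k-1} - b_k)πk/2 correction and rerunning makes the assertion fail whenever the data changes sign, which is exactly what the recursion is there to repair.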
□

Expression (6.153) explicitly shows the dependence of the phase in each symbol interval on the data. The matched-filter receiver of Figure 6-46 performs roughly as well with binary MSK as with FSK. This will be studied in Chapter 8. In Section 12.4, we will show that the performance can be improved to make MSK better than FSK, in that the probability of error will be lower.

In combined PAM and orthogonal multipulse, the transmitted signal is

S(t) = \sum_{k=-\infty}^{\infty} \sum_{n=0}^{N-1} A_{k,n}\, g_n(t - kT) .   (6.157)

In each symbol interval of length T, N symbols are simultaneously transmitted using N distinct pulses, as shown in Figure 6-50. Because the pulse shapes are orthogonal, the superposition of pulses can be sorted out at the receiver by a bank of matched filters. Note that S(t) can be a complex baseband signal, from which we can easily form a passband signal

X(t) = \sqrt{2}\,\mathrm{Re}\{ S(t)\, e^{j\omega_c t} \} .   (6.158)

Figure 6-49. (a) An ideal envelope detector uses a phase splitter to get a complex signal. The magnitude of the phase splitter output is equal to the amplitude of the sinusoidal input (see Problem 6-23). (b) An approximate envelope detector uses a peak detector and a lowpass filter.

Figure 6-50. A transmitter for combined PAM and orthogonal multipulse.

Example 6-43. PAM is clearly a special case of (6.157) in which N = 1. For passband PAM, (6.157) represents a complex equivalent baseband signal. Interestingly, sometimes the passband PAM signal can be represented directly as a combined PAM and multipulse signal. Consider a passband PAM signal where the carrier frequency …

Figure 6-51. A correlation receiver for a combined PAM/multipulse modulation format.

Each pulse is assumed to carry independent data, and hence has its own slicer, as shown in Figure 6-51. If the pulses satisfy the generalized Nyquist criterion, there is no crosstalk between pulses at the matched-filter output, sampled at the appropriate time, so each slicer responds only to its corresponding pulse.
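The transmitter of Figure 6-50 and the per-pulse slicers of Figure 6-51 can be sketched in discrete time. The pulse choice and symbol values are illustrative assumptions: two orthonormal pulses each carry an independent binary symbol, and each correlation output sees only its own symbol, with no crosstalk:

```python
# Orthonormal pulses: rows of a normalized 2x2 Hadamard matrix (illustrative).
g = [
    [2 ** -0.5, 2 ** -0.5],
    [2 ** -0.5, -(2 ** -0.5)],
]
A = [1.0, -1.0]   # one independent binary symbol per pulse

# transmitter, isolated interval of (6.157): S_k = sum over n of A_n g_n(k)
S = [sum(A[n] * g[n][k] for n in range(2)) for k in range(2)]

# receiver: correlate against each pulse, then slice each output independently
K = [sum(S[k] * g[n][k] for k in range(2)) for n in range(2)]
decisions = [1.0 if Kn > 0 else -1.0 for Kn in K]

assert abs(K[0] - 1.0) < 1e-12 and abs(K[1] + 1.0) < 1e-12  # no crosstalk
assert decisions == A
```

Because the pulses are orthonormal, each K_n equals A_n exactly (up to rounding), so the independent slicers recover both symbols even though they were superposed on the same samples.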
It is not necessary that the N data symbols be chosen independently, as illustrated by the following example.

Example 6-46. In orthogonal multipulse, only one of the N pulses is transmitted in each symbol interval. This can be viewed as combined PAM/multipulse where all the symbols but one are set to zero in each symbol interval. We can think of the N symbols as a vector, where this vector is constrained to have a single unity component, with all the remaining components zero. Thus, the components of the data symbol vector are not chosen independently. □

Where the symbols are not chosen independently, we would not want independent slicers for each pulse, but would rather have one N-dimensional slicer, as shown in Figure 6-52, that takes account of the dependence of the symbols. The correlation receivers shown in Section 6.8 for orthogonal multipulse are a special case of this design.

Discrete-Time PAM/Multipulse

The combined PAM and multipulse transmitter of Figure 6-50 and receiver of Figure 6-51 can be quite expensive to implement, particularly if they are implemented in continuous time, as shown in the figures. Practical implementations therefore often implement a simpler transmission system in continuous time, as shown in Figure 6-53b, deriving from it a discrete-time equivalent channel. The combined PAM/multipulse signal is generated in discrete time and transmitted over this discrete-time equivalent channel.

Figure 6-52. A correlation receiver for combined PAM and multipulse where the symbols modulating each pulse are not chosen independently.

Figure 6-53. (a) A combined PAM/multipulse system where the combination is implemented in discrete time and transmitted over a discrete-time equivalent channel. The design is shown for an isolated pulse only.
(b) The discrete-time equivalent channel can be modeled by a transmit filter, baseband-equivalent channel, baseband-equivalent noise, receive filter, and sampler.

A single symbol interval, or isolated pulse, is given by a discrete-time version of (6.162),

S_k = \sum_{n=0}^{N-1} A_{0,n}\, g_n(k) ,   (6.163)

where g_n(k) is the n-th pulse. As before, the pulses g_n(k) are required to be orthonormal,

\sum_{k=-\infty}^{\infty} g_n(k)\, [g_m(k)]^* = \delta_{n-m} .   (6.164)

Often they are chosen to be time-limited, consisting of, say, a vector of K samples. In order to have N orthonormal K-dimensional vectors, it is necessary that K \ge N, with a typical choice of K = N. A discrete-time correlation receiver for the isolated-pulse case, analogous to Figure 6-52, computes the decision variables

K_n = \sum_{k=-\infty}^{\infty} R_k\, [g_n(k)]^* ,  for n = 0, \dots, N-1 ,   (6.165)

where R_k is the discrete-time received signal. This is shown in Figure 6-53. Isolated pulses can be cascaded in time to form a complete signal. In the spirit of (6.157),

S_k = \sum_{m=-\infty}^{\infty} \sum_{n=0}^{N-1} A_{mN,n}\, g_n(k - mN) .   (6.166)

Notice from the subscript of the symbols A_{mN,n} that the symbol interval is N times the sample interval. This is intuitive, since we are transmitting N symbols in one symbol interval. So as not to compromise the robustness of the system, we should transmit N samples of S_k per symbol interval. In other words, to transmit the N symbols A_{mN,n}, for n = 0, \dots, N-1, we transmit the N values S_k, for k = mN, \dots, (m+1)N - 1. If T is the symbol interval (in seconds) as before, then the sample interval of the discrete-time system will be T' = T/N.

We will now give two additional examples of combined PAM/multipulse modulation: multicarrier modulation and code-division multiple access (CDMA). Both of these are commonly implemented in discrete time, but can also be implemented in continuous time.

6.9.1.
Multicarrier Modulation

Consider forming a PAM/multipulse combination using the pulses

g_n(t) = \sqrt{1/T}\, e^{j\omega_n t}\, w(t)   (6.167)

where

\omega_n = \frac{2\pi n}{T}  for n = 0, \dots, N-1 ,   (6.168)

and w(t) is a rectangular windowing function, equal to unity for 0 \le t < T and zero otherwise.

… but the effect on the decision variables K_n would be more useful. Since K_n is the inverse DFT of R_k, a similar relationship in terms of DFTs rather than DTFTs will precisely quantify this. Assume that p_k is zero outside the range 0 \le k < M, where M \le N is some integer. In other words, the number N of separate carriers is large enough that one symbol interval NT' = T is longer than the impulse response of the channel. Thus, we can write the noise-free channel output as a finite convolution,

R_k = \sum_{i=0}^{N-1} p_i\, S_{k-i} .   (6.176)

Let w_k be the circular convolution of S_k and p_k,

w_k = \sum_{i=0}^{N-1} p_i\, S_{(k-i) \bmod N} .   (6.177)

While the frequency-domain representation of the ordinary convolution (6.176) is (6.175), the frequency-domain representation of the circular convolution (6.177) is

W_n = A_{0,n}\, P_n ,   (6.178)

where A_{0,n} is the DFT (not the DTFT) of S_k (see Figure 6-54), P_n is the DFT of p_k,

P_n = \sum_{k=0}^{N-1} p_k\, e^{-j2\pi nk/N} ,   (6.179)

and W_n is the DFT of w_k. To the extent that a circular convolution is different from the ordinary convolution, (6.178) differs fundamentally from (6.175). With a modification to the modulation format we can make the circular convolution equal to the ordinary convolution. Suppose that we precede the N symbols S_k, 0 \le k < N, by M redundant symbols,

S_{-i} = S_{N-i} ,  for 1 \le i \le M ,   (6.180)

as shown in Figure 6-55. In this case, (6.177) is equal to the ordinary convolution (6.176). The signals on which we will base our decisions are the DFT of the received signal R_k, which will now be

K_n = A_{0,n}\, P_n .   (6.181)

In other words, the N symbols A_{0,n} are simply scaled by the N complex constants P_n (the DFT of the channel response).
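The effect of the cyclic extension (6.180) can be demonstrated numerically: prepending the last M samples of each block makes the channel's ordinary convolution act like a circular convolution over the block, so the DFT of the received block is exactly A_{0,n} P_n as in (6.181). The block size, channel taps, and symbols below are illustrative assumptions, and the DFT normalization is a convenience rather than the text's exact scaling:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[n] * cmath.exp(2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

N, M = 8, 2
A = [1, -1, 1, 1, -1, 1, -1, -1]        # symbols A_{0,n}, one per carrier
p = [1.0, 0.5, 0.25]                    # channel p_k with M + 1 = 3 taps

S = idft(A)                             # multicarrier block S_k synthesized from A
tx = S[-M:] + S                         # cyclic extension (6.180): prepend last M samples

# noise-free channel output: ordinary (linear) convolution, as in (6.176)
R_full = [sum(p[i] * tx[k - i] for i in range(len(p)) if 0 <= k - i < len(tx))
          for k in range(len(tx) + len(p) - 1)]
R = R_full[M:M + N]                     # discard the M prefix samples

K = dft(R)                              # decision variables K_n
P = dft(p + [0.0] * (N - len(p)))       # P_n: N-point DFT of the channel
err = max(abs(K[n] - A[n] * P[n]) for n in range(N))
assert err < 1e-9                       # K_n = A_{0,n} P_n, as in (6.181)
```

Without the `S[-M:] +` prefix the linear and circular convolutions disagree at the start of the block and the final assertion fails, which is precisely the inter-block interference the redundancy removes.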
For a finite impulse response p_k, the DFT P_n is the DTFT sampled at frequencies \omega = 2\pi n/NT'. The frequency response of the channel thus determines the effect of the channel on the decision variables K_n, as quantified by (6.181).

In most practical situations, we do not know precisely the frequency response of the channel. Often, however, P_n can be estimated and compensated from observations of K_n. For instance, if we estimate that P_n is especially small for some n, then we might reallocate our encoding so that fewer bits are transmitted on the n-th carrier. The mechanism used to estimate P_n is related to the decision-directed techniques used for adaptive equalization (Chapter 11) and carrier recovery (Chapter 16). Referring to Figure 6-54, if the decisions A_n are all correct, then we can determine what the variables K_n, for n = 0, \dots, N-1, should have been by computing an inverse DFT. Comparing what these variables should have been to the actual observations, using (6.181), we obtain an estimate of P_n for n = 0, \dots, N-1.

The price paid for this ability to adjust to the channel is the redundant transmission in Figure 6-55 and (6.180), called cyclic extension [18]. If N is large compared to M, where M is the length of the impulse response of the discrete-time equivalent channel p_k, then the overhead associated with transmitting an extra M symbols per block of N becomes insignificant.

Figure 6-55. For multicarrier transmission, cyclic extension of the samples S_k simplifies the effect of the channel on the decision variables K_n.

Even without this redundancy, if the impulse response of the channel is short compared to the symbol interval T, the difference between the circular convolution (6.177) and the ordinary convolution (6.176) might be small enough that (6.181) is close enough to be useful [19].

6.9.2.
Code-Division Multiplexing

Another application of combined PAM and multipulse is multiple access, where distinct transmitter-receiver pairs share a single channel. Each transmitter-receiver pair would typically use only one of the orthogonal pulses. As long as the other transmitter-receiver pairs use different orthogonal pulses, the receiver structures given above are effective in separating the signals. Reversing the order of summation and rewriting (6.157) in the form

S(t) = \sum_{n=0}^{N-1} U_n(t) ,  U_n(t) = \sum_{k=-\infty}^{\infty} A_{k,n}\, g_n(t - kT) ,   (6.182)

we can now think of S(t) as the superposition of N PAM subchannel signals \{U_n(t), 0 \le n \le N-1\}, as shown in Figure 6-56. Each of these PAM signals transmits an independent stream of data symbols \{A_{k,n}, -\infty < k < \infty\} using its own distinctive pulse shape g_n(t). All N PAM signals can share the same channel as long as the pulses they use are orthogonal to one another. The matched filter in the receiver for one subchannel will not respond to the pulse shapes used by the other subchannels, as long as the set of pulses satisfies the generalized Nyquist criterion.

Example 6-47. In multicarrier modulation, the pulses are chosen to be sinusoids of different frequencies. A multiple-access scheme based on this set of pulses would be termed a frequency-division multiple access (FDMA) system. □

Figure 6-56. In code-division multiple access (CDMA), N bit streams share the same channel by using N distinct orthogonal pulses g_j(t).

Example 6-48. An alternative is to choose a set of broadband pulses g_n(t), each one of which fills the entire bandwidth |\omega| \le N\pi/T. For this particular choice of pulses, (6.182) is known as code-division multiple access (CDMA). This topic is covered more fully in Chapter 18, but suffice it to say that CDMA is but one of several multiple access methods.
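The subchannel decomposition in (6.182) can be sketched in a few lines. The spreading codes and symbols below are illustrative assumptions (length-4 Walsh codes standing in for the orthogonal pulses): two users transmit simultaneously over the same samples, and each receiver despreads with its own code to recover only its own symbol:

```python
# Orthogonal spreading codes: two length-4 Walsh codes (illustrative assumption).
codes = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
]
symbols = [1, -1]     # one data symbol per user in this symbol interval

# superposition on the shared channel: S_k = sum over n of A_n g_n(k), as in (6.182)
S = [sum(symbols[n] * codes[n][k] for n in range(2)) for k in range(4)]

# each receiver despreads with its own code (normalized correlation)
recovered = [sum(S[k] * codes[n][k] for k in range(4)) / 4 for n in range(2)]
assert recovered == symbols   # orthogonality separates the two users exactly
```

Because the codes are orthogonal, the cross-correlation between users is exactly zero, so each despread output equals the corresponding user's symbol with no multiple-access interference.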
The pulses used in CDMA are typically generated using a pseudorandom sequence, generated using a technique described in Chapter 12. For now, observe that these pulses can have broad bandwidth (the pseudorandom sequence ensures this) and will be orthogonal to one another. □

Because the pulses in Example 6-48 individually have much greater bandwidth than would be dictated by the symbol rate, CDMA is related to spread spectrum. In Section 6.7 we described spread spectrum as a technique that can counter certain types of noise and interference signals by greatly expanding the bandwidth of the transmitted pulse. It has very poor spectral efficiency, but can be used in situations where spectral efficiency is not important. In CDMA, each PAM subchannel signal looks like a spread-spectrum signal. Unlike in spread spectrum, however, the motivation is not so much to counter jamming signals as to allow other PAM signals (using different orthogonal pulse shapes) to share the same channel. Of course, it is also possible in CDMA to expand the bandwidth beyond N\pi/T, thereby gaining both multiple access and immunity to jamming signals of the type discussed in Section 6.7.

6.10. OPTICAL FIBER RECEPTION

Optical fiber (Section 5.3) is quite different from other media, and different modulation and receiver techniques are used. The most common optical systems use direct-detection receivers. In direct detection, the intensity or power of the light is modulated by a data signal, and the detector converts the received power into an electrical current. Since power is always non-negative, the data signal must be non-negative. The simplest case is on-off keying (OOK), where a "zero" is represented by zero intensity and a "one" is represented by positive intensity. Most commercial optical fiber digital transmission systems today use OOK. The technique does not require that the source, LED or laser, be capable of producing a single frequency, which reduces its cost.
The channel from source input x(t) to detector output y(t) in Figure 5-18 is a baseband channel, and can be used for baseband PAM transmission, but OOK uses only two levels. Multilevel baseband transmission is normally avoided because, at the very high data rates of fiber transmission, the more complex receivers are not justified, given that the symbol rate can be increased easily using better fiber, sources, and detectors. The optical fiber itself has such a high bandwidth that the limitation on bit rate is imposed by the electronics in the transmitter and (particularly) the receiver. Another reason for avoiding multilevel transmission is that it is difficult to control the transmitted power accurately enough.

Since optical fiber with direct detection can be modeled as a baseband PAM channel, we might prematurely conclude that no special treatment is required. However, the noise encountered in optical fiber systems is fundamentally different from that in most other media. In Chapter 8, we analyze the fundamental limits of direct detection, and also introduce a different class of coherent modulation.

6.11. MAGNETIC RECORDING

The magnetic recording medium (Section 5.6), like optical fiber, has special properties, and as a result a special form of modulation and detection has evolved. The reading head is differentiating, and hence, as seen in Figure 5-40, the output of the channel responds to transitions in the write-head current rather than to pulses as in a conventional channel. As a result, the data is typically encoded as the presence or absence of a transition in the write waveform, rather than as a positive or negative pulse. This form of modulation is called NRZI, and is illustrated in Figure 6-57. Each "one" in the input is translated into a transition, and each "zero" is encoded as no transition. Note that transitions must alternate in sign, so that the direction of a transition has no particular significance.
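NRZI encoding and the differentiating read channel can be sketched in a few lines. The signal levels and the noise-free three-level read model are illustrative assumptions: a "one" toggles the write current, the read head responds only to transitions, and the decoder marks a "one" wherever a pulse of either sign appears:

```python
def nrzi_encode(bits, start_level=1):
    # each "one" produces a transition in the write current; a "zero" produces none
    level, wave = start_level, []
    for b in bits:
        if b == 1:
            level = -level
        wave.append(level)
    return wave

def read_head(wave, start_level=1):
    # differentiating channel: the output responds only to transitions,
    # giving a three-level read signal (-2, 0, +2 in these units)
    out, prev = [], start_level
    for w in wave:
        out.append(w - prev)
        prev = w
    return out

def nrzi_decode(read):
    # a pulse of either sign means "one"; the sign carries no information
    return [1 if r != 0 else 0 for r in read]

bits = [1, 0, 1, 1, 0, 0, 1]
read = read_head(nrzi_encode(bits))
assert set(read) <= {-2, 0, 2}          # three-level read signal
assert nrzi_decode(read) == bits        # round trip recovers the data
```

Note that consecutive "ones" automatically produce transitions of alternating sign, matching the observation in the text that the direction of a transition carries no information.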
At the output of the read head, a "one" can be recognized by the presence of a pulse, positive or negative, and a "zero" by the absence of a pulse. The received signal actually has three levels, but both the positive and negative levels have the same meaning, so there is only one bit of information per symbol.

Figure 6-57. The NRZI write waveform and the resulting read voltage.

The data detection often assumes the form shown in Figure 6-58, known as a gated peak detector [20]. The playback signal is lowpass filtered to eliminate out-of-band noise and passed through a peak detector (the upper leg). The peak detector consists of a bandlimited differentiator followed by a zero-crossing detector. This operation is of course very noisy, and subject to spurious zero crossings due to noise during the periods of no signal ("zeros", or no transitions, or absence of pulses). To counteract this effect, the lower leg is a pulse validator, which ensures that only zero crossings in the presence of a pulse pass through the gate circuit. The validator consists of a full-wave rectifier and threshold operation to detect the presence of a pulse. The output of the peak detector is a logical pulse for every valid pulse peak in the read signal. A peak (or zero crossing) in a particular symbol interval indicates a write transition in that symbol interval, or in other words a "one" bit.

Figure 6-58. A gated peak detector for magnetic recording.

6.12. FURTHER READING

More detailed information about FSK can be obtained from Proakis [16] and Lucky, Salz, and Weldon [15], both of which derive the power spectrum of continuous-phase FSK signals. For tutorials on MSK, we recommend Pasupathy [21] and Haykin [22]. The relationship between MSK and QPSK modulation is described in detail by Gronemeyer and McBride [23] and Morais and Feller [24].
For a more general treatment of continuous-phase modulation, see Section 12.4 and the references cited there. Multicarrier modulation dates back to the 1960s, with early work carried out by Chang [14], Saltzberg [25], Powers and Zimmerman [26], and Darlington [27]. The use of the DFT to construct the transmitted signal is described by Weinstein and Ebert [28] and Hirosaki [19]. Kalet [29] and Ruiz, Cioffi, and Kasturia [30] considered the design of the symbol sets and allocation of bit rates when the channel is distorted, with the latter applying coset coding (Chapter 14). The technique has been applied to the voiceband data channel [31], the high-speed digital subscriber loop (HDSL) channel, and the magnetic recording channel [34].

APPENDIX 6-A. MODULATING RANDOM PROCESSES

In this appendix we consider the modulation of a WSS complex-valued random process with a complex-valued carrier. We do this as a sequence of straightforward exercises, thereby highlighting the main results without cluttering the appendix with algebraic manipulation. Consider a random process defined as

Z(t) = S(t)\, e^{j\omega_c t}   (6.183)

where S(t) is a (possibly complex-valued) WSS random process.

Exercise 6-12. Show that if S(t) is zero-mean, then Z(t) is WSS with

E[Z(t)] = 0 ,   (6.184)
R_Z(\tau) = R_S(\tau)\, e^{j\omega_c \tau} .   (6.185)  □

The conclusion of this exercise is that if an equivalent baseband PAM waveform S(t) is zero-mean and WSS, then the complex-carrier modulated waveform Z(t) is also WSS. We saw in Appendix 3-A that a random phase epoch is required in order for a baseband PAM waveform S(t) to be WSS.

The WSS random process Z(t) is complex-valued. For passband PAM it is only the real part that is transmitted, so we need to relate the power spectrum of the real part of a complex-valued random process to the power spectrum of the complex-valued random process. In order to do this, it turns out that we need to first investigate the joint wide-sense stationarity of Z(t) and its complex conjugate Z^*(t).
Exercise 6-13. Show that Z(t) and Z^*(t) are jointly WSS if and only if

R_{SS^*}(\tau) = 0 ,   (6.186)

or in words, the baseband signal S(t+\tau) is uncorrelated with its complex conjugate S^*(t) for all delays \tau. Show further that if (6.186) is satisfied,

R_{ZZ^*}(\tau) = 0 .   (6.187)  □

To gain insight into this condition, we need to investigate the relation R_{SS^*}(\tau) = 0. Define the real and imaginary parts of S(t) as

S(t) = R(t) + jI(t) .   (6.188)

Exercise 6-14. Show that R_{SS^*}(\tau) = 0 if and only if

R_R(\tau) = R_I(\tau)   (6.189)

and

R_{RI}(\tau) = -R_{IR}(\tau) = -R_{RI}(-\tau) .   (6.190)

In particular, (6.190) requires that R_{RI}(0) = 0, and hence the real and imaginary parts must be uncorrelated when sampled at the same time. □

The conditions of Exercise 6-14 can be summarized as follows: the power spectra of the real and imaginary parts must be identical, and the cross power spectral density S_{RI}(j\omega) must be imaginary-valued because R_{RI}(\tau) is odd.

The next question is whether R_{SS^*}(\tau) = 0 is satisfied for a passband PAM signal. First, (6.189) requires that the real and imaginary parts of the baseband complex-valued PAM signal have identical autocorrelation functions (and hence power spectra). Second, (6.190) requires that the cross-correlation function between the real and imaginary parts be an odd function of the delay \tau.

Exercise 6-15. For a passband PAM signal, the equivalent baseband PAM waveform is of the form

S(t) = \sum_{k=-\infty}^{\infty} A_k\, h(t - kT + \Theta)   (6.191)

for a possibly complex-valued pulse shape h(t) and random phase \Theta. Assume that A_k is WSS and independent of \Theta. Show that a sufficient condition for R_{SS^*}(\tau) = 0 is

E[A_k A_m] = 0   (6.192)

for all k and m, without regard for the distribution of \Theta. Show further that for (6.192) to be satisfied it is sufficient that the real and imaginary parts of A_k have the same autocorrelation function and be uncorrelated with one another. □

Now let us find the power spectrum of the real part of a complex modulated baseband signal. Let

X(t) = \sqrt{2}\,\mathrm{Re}\{ Z(t) \} = \frac{1}{\sqrt{2}}\,[ Z(t) + Z^*(t) ] .
(6.193)

Exercise 6-16. Show that X(t) is WSS if and only if S(t) is zero-mean and WSS and R_{SS^*}(\tau) = 0. Show further that its autocorrelation function under this condition is

R_X(\tau) = \mathrm{Re}\{ R_S(\tau)\, e^{j\omega_c \tau} \} .   (6.194)  □

The power spectrum of the real-valued modulated signal is also of interest.

Exercise 6-17. Show that if the X(t) of (6.193) is WSS, its power spectrum is made up of shifted versions of the power spectrum of S(t),

S_X(j\omega) = 0.5\,[ S_Z(j\omega) + S_Z(-j\omega) ] = 0.5\,[ S_S(j\omega - j\omega_c) + S_S(-j\omega + j\omega_c) ] .   (6.195)  □

To summarize, we have shown that under reasonable conditions on the sequence of data symbols and random phase epoch, the passband PAM signal is WSS. Specifically, those conditions are that the data symbols be WSS and have statistics that satisfy the conditions of Exercise 6-15, and that the baseband PAM waveform have a uniformly distributed random phase. If these conditions are satisfied, it is not necessary for the carrier to have a random phase.

It is important to realize that (6.195) is not valid for any power spectrum S_S(j\omega), but only those which satisfy the conditions of Exercise 6-14.

Exercise 6-18. Show that if the conditions of Exercise 6-14 are satisfied, then S_S(j\omega) can be written as

S_S(j\omega) = 2 S_R(j\omega) - 2j\, S_{RI}(j\omega) ,   (6.196)

where S_R(j\omega) and S_{RI}(j\omega) are the power spectrum and cross power spectrum of R(t) and I(t) in (6.188). □

Assume the power spectrum of a complex-valued baseband signal is bandlimited, as shown in an accompanying sketch. Using (6.195) we can sketch the power spectrum of

X(t) = \sqrt{2}\,\mathrm{Re}\{ S(t)\, e^{j\omega_c t} \} .   (6.197)

Assume \omega_c is large compared to the bandwidth of S(t). The result is a spectrum S_X(j\omega) consisting of copies of the baseband spectrum centered at \pm\omega_c. The usual \sqrt{2} factor is used to ensure that the power of X(t) is the same as the power of S(t).

Exercise 6-19. Show that when X(t) is WSS, its power R_X(0) is the same as the power of S(t).
□

APPENDIX 6-B. THE GENERALIZED NYQUIST CRITERION

In this appendix, we first show that a bandwidth of N\pi/T is required to satisfy the generalized Nyquist criterion,

\frac{1}{T} \sum_{m=-\infty}^{\infty} H_n\!\left(j(\omega + m\tfrac{2\pi}{T})\right) H_l^*\!\left(j(\omega + m\tfrac{2\pi}{T})\right) = \delta_{l-n} .   (6.198)

Then we demonstrate a class of practical pulse shapes that satisfy the criterion with a bandwidth close to the minimum.

Proof of Necessity

In (6.129) and Figure 6-38 we showed a pulse set with bandwidth N\pi/T that meets the criterion. This shows that a bandwidth of N\pi/T is sufficient. We will now show that it is also necessary.

The left side of (6.198) is a periodic function of \omega with period 2\pi/T. Consequently, we need only verify (6.198) for \omega \in [-\pi/T, \pi/T]. (If h_n(t) is real for all n, then we need only verify (6.198) for \omega \in [0, \pi/T]; by conjugate symmetry of H_n(j\omega) it is automatically satisfied for the rest of the range.) Assume that all pulses h_n(t), for 0 \le n \le N-1, lie within the frequency range |\omega| \le K\pi/T for some arbitrary integer K (we already know that the generalized Nyquist criterion can be met with bandlimited pulses). Then the summation in (6.198) becomes finite, so for \omega \in [-\pi/T, \pi/T],

\frac{1}{T} \sum_{m=-M_1}^{M_2} H_n\!\left(j(\omega + m\tfrac{2\pi}{T})\right) H_l^*\!\left(j(\omega + m\tfrac{2\pi}{T})\right) = \delta_{l-n} ,   (6.199)

where M_1 and M_2 are integers that depend on K. Specifically, we want to make them as small as possible while maintaining the equivalence of (6.199) and (6.198). We require that the range [\omega - M_1 2\pi/T, \omega + M_2 2\pi/T] be at least as large as the total bandwidth of the pulses [-K\pi/T, K\pi/T] for each \omega \in [-\pi/T, \pi/T]. If K is odd, then the smallest values are M_1 = M_2 = (K-1)/2, so there are K terms in the summation. If K is even, we can use different values of M_1 and M_2 in the ranges \omega \in [-\pi/T, 0] and \omega \in [0, \pi/T]. For \omega \in [-\pi/T, 0], we can use M_1 = (K/2) - 1 and M_2 = K/2. In the latter range we can use M_1 = K/2 and M_2 = (K/2) - 1.
In all cases, the number of terms in the summation is M_1 + M_2 + 1 = K. For each fixed \omega, define a vector u_n(\omega) consisting of the K terms in the summation, H_n(j(\omega + m\,2\pi/T)) for m = -M_1, \dots, M_2. The dimensionality of u_n(\omega) is K. (If real-valued pulses are desired, then \omega can be restricted to the interval \omega \in [0, \pi/T] with the constraint H_n^*(-j\omega) = H_n(j\omega).) Now we can write (6.199) as

\frac{1}{T}\, u_n^{T}(\omega)\, u_l^{*}(\omega) = \delta_{l-n} ,  -\pi/T \le \omega \le \pi/T ,  0 \le n, l \le N-1 ,   (6.200)

where u_n^{T}(\omega) is the transpose of u_n(\omega). Thus, the generalized Nyquist criterion can be satisfied if, for each \omega \in [-\pi/T, \pi/T], a set of N orthogonal equal-length K-dimensional Euclidean vectors u_n(\omega), 0 \le n \le N-1, can be found. Clearly, N orthonormal vectors can be found if their dimensionality is at least N, or K \ge N, and cannot be found for smaller K. Thus, a bandwidth of N\pi/T will suffice, as confirmed by the earlier example.

We have argued that for N orthonormal pulses bandlimited to an integer multiple of \pi/T, the multiple must be K \ge N to satisfy (6.199). Choosing the minimum value, K = N, we can now show that the entire bandwidth [-N\pi/T, N\pi/T] must be used. Thus we prove that the bandwidth cannot be reduced further from N\pi/T.
Specifically, note that if there is any value of \omega in this interval where all N vectors are zero-valued, then (6.200) cannot be true. To see this, define the N \times K matrix \mathbf{H}(\omega) whose n-th row is u_n^{T}(\omega), and note that (6.200) is equivalent to

\frac{1}{T}\, \mathbf{H}(\omega)\, \mathbf{H}^{H}(\omega) = \mathbf{I}   (6.201)

for all \omega \in [-\pi/T, \pi/T], where \mathbf{H}^{H}(\omega) is the conjugate transpose and \mathbf{I} is the identity matrix. Hence \mathbf{H}(\omega) must have full rank. Furthermore, when N = K, the matrix is square, so this implies that each component of the vectors u_n(\omega) must be non-zero for some 0 \le n \le N-1, or else \mathbf{H}(\omega) would have an all-zero column. This in turn implies that the entire bandwidth is occupied.

For N > 1, the minimum-bandwidth set of pulses is not unique. To see this, note that if \mathbf{H}(\omega) satisfies (6.201), then \mathbf{U}\mathbf{H}(\omega) will also satisfy (6.201) for any unitary matrix \mathbf{U} (see Problem 6-29). (A matrix \mathbf{U} is unitary if \mathbf{U}^{-1} = \mathbf{U}^{H}, where \mathbf{U}^{H} is the conjugate transpose.) For the four pulses shown in Figure 6-38, if we simply number them left to right, then in the range \omega \in (0, \pi/T), \mathbf{H}(\omega) equals \sqrt{T} times a permutation matrix (each row and column containing a single unity entry).

… = 1. Show that E[|A_k|^2] = 1. Show that the power spectrum of the transmitted signal is independent of T. (Hint: use the results of Appendix 3-A, where a random phase is introduced to make the transmit signal WSS.) (c) Find the receive filter F(j\omega) such that the output of the receive filter has a pulse shape with Fourier transform

P(j\omega) = T\,\mathrm{rect}(\omega, \pi/T) .   (6.212)

Does this pulse satisfy the Nyquist criterion? (d) Find the SNR at the slicer with p(t) given in part (c). (e) Find the SNR at the slicer when the pulse p(t) has the triangular spectrum of Figure 6-4b, given by

P(j\omega) = T - |\omega|\,T^2/2\pi  for |\omega| < 2\pi/T, and 0 otherwise.   (6.213)

Compare this SNR to that in part (d).
(b) The transmitted pulse is a raised-cosine pulse with the same excess bandwidth as desired at the receiver.
(c) The Fourier transform of the transmitted pulse is the square root of the Fourier transform of a raised-cosine pulse (tedious).

6-10. Suppose you are to design a digital communication system to transmit a speech signal sampled at 8 kHz with 8 bits per sample. Find the minimum bandwidth required for each of the following methods, where bandwidth is defined to cover positive frequencies only:
(a) Binary antipodal baseband PAM.
(b) Binary antipodal passband PAM.
(c) 4-PSK.
(d) 16-QAM.

6-11. Consider a channel with bandwidth 10 kHz and frequency response B(j\omega) as shown in the accompanying figure.
(a) What is the frequency response B_E(j\omega) of the baseband-equivalent channel? What is the impulse response b_E(t)?
(b) Let p(t) = g(t) * b_E(t) * f(t), where g(t) is the impulse response of the transmit filter and f(t) is the impulse response of the receive filter. Find the maximum bit rate achievable with zero ISI using the following methods:
• 4-PSK with p(t) a minimum-bandwidth pulse,
• binary antipodal with p(t) a 50% excess-bandwidth raised-cosine pulse,
• 16-QAM with p(t) a 100% excess-bandwidth raised-cosine pulse,
• 16-QAM with the transmit pulse p(t) shown in the accompanying figure (supported on [-T, T]).
(c) Assuming the 4-PSK signal of part (b), give transfer functions for the filters g(t) and f(t) (and justify). Assume an additive white Gaussian noise channel.

6-12. Derive the Nyquist criterion for a passband PAM channel. Use the results from the baseband case if possible. What is the minimum bandwidth required on the channel?

6-13. Consider a Hilbert transformer, which is a linear filter with impulse response and transfer function given as

h(t) = \frac{1}{\pi t} ,  H(j\omega) = -j\,\mathrm{sgn}(\omega) .   (6.215)

Show that if x(t) = \cos(\omega_0 t) is the input, then y(t) = \sin(\omega_0 t) is the output. Show further that if x(t) = \sin(\omega_0 t) is the input,
then y(t) = −cos(ω₀t) is the output. Any sinusoidal input experiences a 90-degree phase change.

6-14. Consider an analytic signal
  z(t) = Re{z(t)} + j·Im{z(t)} . (6.216)
(a) Show that Im{z(t)} can be obtained from Re{z(t)} by filtering Re{z(t)} with the Hilbert transformer of Problem 6-13. In other words,
  Im{z(t)} = (1/πt) * Re{z(t)} . (6.217)
(b) Show that −Re{z(t)} can be obtained from Im{z(t)} using the same filter.

6-15. Show that if h(t) = 0 for t > 0 (i.e. h(t) is anticausal), then the real and imaginary parts of its transfer function are related by a Hilbert transform in the frequency domain,
  Im{H(jω)} = (1/πω) * Re{H(jω)} . (6.218)

6-16. Consider a discrete-time signal
  z_k = Re{z_k} + j·Im{z_k} (6.219)
satisfying
  Z(e^{jωT}) = 0 for −π/T < ω < 0 , (6.220)
where T is the sampling interval. An example is shown in the following figure:
  [figure: Z(e^{jωT}) with spectral replicas occupying only (0, π/T) within each period 2π/T]
By analogy, such signals are called discrete-time analytic signals, although the term "analytic" does not mathematically apply to sequences. Show that Im{z_k} can be obtained by filtering Re{z_k} with a discrete-time Hilbert transformer
  H(e^{jωT}) = { −j , 0 < ω < π/T ; +j , −π/T < ω < 0 } , (6.221)
which has impulse response
  h_k = { 2·sin²(πk/2)/(πk) , k ≠ 0 ; 0 , k = 0 } . (6.222)

6-17. Suppose you are given x_k, samples taken at interval T of a signal with the Fourier transform shown in the following figure:
  [figure: bandpass spectrum occupying ω_c − W < |ω| < ω_c + W]
You are told that x_k is the real part of a discrete-time analytic signal z_k.
(a) Show that Im{z_k} is the result of filtering x_k with a filter h_k whose transfer function H(e^{jωT}) is shown below:
  [figure: H(e^{jωT}), nonzero only for ω_c − W < |ω| < ω_c + W]
This filter can be approximated by an FIR bandpass filter, e.g. one approximating the transfer function G(e^{jωT}) shown below:
  [figure: G(e^{jωT}), a bandpass response on (ω_c − W, ω_c + W)]
(b) Design a practical filter h_k approximating the filter in part (a).

6-18. Design a hardware configuration for coders for the constellations in Figure 6-28.

6-19. (a) Show that the binary FSK pulses given in (6.138) are orthogonal when
  ω₂ − ω₁ = 2π/T (6.223)
and ω₁ + ω₂ = K·2π/T for some integer K.
(b) Show that they are also orthogonal when
  ω₂ − ω₁ = π/T (6.224)
and ω₁ + ω₂ = Kπ/T for some integer K.

6-20. Consider the binary CPFSK with pulses given by (6.149) and shown in Figure 6-45c. Assume the nominal carrier frequency satisfies
  ω_c = (ω₀ + ω₁)/2 = Kπ/T (6.225)
for some integer K.

6-21. Show that the frequency spacing of MSK signals (6.148) is the minimum frequency spacing that results in orthogonal pulses.

6-22. Consider designing an MSK transmission system with N = 8 pulses.
(a) Show that if k is even then Q_k = Q_{k−1}.
(b) Show that if k is odd then I_k = I_{k−1}. Hint: Use (6.153).
(c) Use parts (a) and (b) to show that
  x(t) = cos(ω_c t) Σ_{k even} p(t − kT)(−1)^{k/2} I_k + sin(ω_c t) Σ_{k odd} p(t − kT)(−1)^{(k+1)/2} Q_k , (6.227)
where
  p(t) = sin(πt/2T)·(w(t) + w(t − T)) (6.228)
is one half of one cycle of a sinusoid. This is a passband PAM signal with pulse shape p(t) (which extends over 2T). Notice however that since one of the summations is over even k and the other is over odd k, the in-phase and quadrature parts of the signal are offset from one another by T. The symbol rate is 1/2T, the in-phase symbols are (−1)^{k/2} I_k for even k, and the quadrature symbols are (−1)^{(k+1)/2} Q_k for odd k.

6-23. Show that when a signal A·cos(ω_c t) is fed into a phase splitter to produce a complex output signal, the magnitude of this complex output signal is the constant A/√2. Hence, the amplitude of a sinusoid (its envelope) can be found using the structure in Figure 6-49a.

6-24. Consider the combined PAM and orthogonal multipulse modulation of (6.157). Suppose you are to achieve a total bit rate of 19.2 kb/s using N = 128 distinct orthonormal pulses. Assume each pulse is modulated using a 4-PSK constellation. Find the symbol interval T.

6-25. (a) Show that the DTFT of s_k in (6.172) is given by
  S(e^{jωT'}) = (1/N) Σ_{n=0}^{N−1} A_n G_n(e^{jωT'}) , (6.229)
where G_n(e^{jωT'}) is the DTFT of the sampled and scaled pulse
  g_k^{(n)} = g(kT') e^{j2πnk/N} w_k . (6.230)
(b) Show that
  G_n(e^{jωT'}) = e^{j(N−1)(2πn − ωT)/(2N)} · sin((2πn − ωT)/2) / sin((2πn − ωT)/(2N)) . (6.231)
Hint: the following summation identity may be useful:
  Σ_{k=0}^{N−1} a^k = (1 − a^N)/(1 − a) . (6.232)

6-26. For a possibly complex-valued WSS random process S(t), define a cosine-modulated version
  Z(t) = √2·cos(ω_c t + Θ)·S(t) , (6.233)
where Θ is uniformly distributed over (0, 2π) and is independent of S(t). Show that with this uniformly distributed carrier phase, Z(t) is WSS and find its power spectrum. Further, show that without the random phase Z(t) is not WSS, and explain why not.

Example 7-5. If the received signal is a complex-valued signal y(t), it can be considered an element of an inner product space if
  ∫ |y(t)|² dt < ∞ . (7.2)
In this space, the inner product is defined as
  ⟨x, y⟩ = ∫ x(t) y*(t) dt . (7.3)  □

Our goal is to observe Y, and use it to detect which of the L signals was actually transmitted. In the process we have communicated log₂L bits of information.

Inner Products Used in this Chapter
For discrete time, we will use the inner product
  ⟨x, y⟩ = Σ_{k=−∞}^{∞} x_k y_k* , (7.4)
where the summation may also be finite in a finite-dimensional Euclidean space. For continuous time, we will use the inner product
  ⟨x, y⟩ = ∫_{−∞}^{∞} x(t) y*(t) dt , (7.5)
where the infinite integral may be replaced by a finite integral if appropriate. In both cases we will be considering the space of signals for which ||X||² = ⟨X, X⟩ < ∞. Thus, in this chapter, it will be assumed that {S₁, S₂, …, S_L} and Y are finite-energy discrete-time or continuous-time signals. This finite-energy assumption for Y is problematic if it includes a stationary noise component. However, we will find in Chapter 9 that the receiver design technique of this chapter still applies, although different arguments will be needed.
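The inner products (7.4) and (7.5) are straightforward to evaluate numerically. The following sketch (not from the text; it assumes NumPy is available) computes the discrete-time inner product and the induced norm; the grid-based `inner_ct` is only a Riemann-sum approximation of (7.5):

```python
import numpy as np

def inner_dt(x, y):
    # discrete-time inner product <x, y> = sum_k x_k y_k*  -- (7.4)
    return np.sum(x * np.conj(y))

def inner_ct(x, y, dt):
    # continuous-time inner product (7.5), approximated on a grid of spacing dt
    return np.sum(x * np.conj(y)) * dt

x = np.array([1 + 1j, 2, -1j])
y = np.array([1, 1j, 1])
energy = inner_dt(x, x).real     # ||x||^2 = <x, x> is real and non-negative
```

For finite-energy continuous-time signals, `inner_ct` approaches (7.5) as the grid spacing shrinks.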
The Signal Subspace
For many of the signal designs considered later, the number of possible signals L is quite large, and it is advantageous to reduce the complexity of the receiver by going to a two-stage process. This is possible because the dimensionality of the signal set is often much smaller than L. Let M_s be the subspace of signal space spanned by the L signals. Since the number of signals is finite, M_s must be finite-dimensional; thus, assume it has dimension N, where N ≤ L. (Note that N was also defined in Section 6.8 as the dimension of a signal set, in that case a signal set consisting of N orthonormal signals.) Also choose an orthonormal basis for M_s consisting of {φ₁, φ₂, …, φ_N}, where ⟨φᵢ, φⱼ⟩ = δ_{i,j}. Then the signals can be written in terms of this basis as
  S_l = Σ_{n=1}^{N} s_{n,l} φ_n , (7.6)
where the s_{n,l} are scalar (real or complex) quantities,
  s_{n,l} = ⟨S_l, φ_n⟩ . (7.7)

Example 7-6. The signals within one symbol interval for PAM combined with orthogonal multipulse modulation are precisely of the form (7.6), where the s_{n,l} are data symbols, typically chosen independently for different n. □

7.1.1. Minimum-Distance Receiver Design Criterion
We assume that the received signal is a vector
  Y = S_l + E , (7.8)
for some 1 ≤ l ≤ L, where the l-th signal S_l is the signal actually transmitted and E is some unknown noise or error. Geometrically, our receiver chooses the signal, from among those in the set of known signals {S_l, 1 ≤ l ≤ L}, that minimizes ||Y − S_l||², where
  ||Y − S_l||² = ||Y||² + ||S_l||² − 2·Re{⟨Y, S_l⟩} . (7.9)
Since the term ||Y||² is independent of l, it can be ignored, so equivalently the receiver can use the criterion
  max_l [ Re{⟨Y, S_l⟩} − ½·E_l ] , E_l = ||S_l||² . (7.10)
Here E_l is the energy of the l-th signal, which is not a function of the received signal Y and hence can be pre-computed.
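As a numeric sanity check (a sketch, not from the text; it assumes NumPy, and the random signals and noise level are invented for illustration), the correlation-and-bias rule (7.10) selects the same signal as the direct minimum-distance rule (7.9):

```python
import numpy as np

rng = np.random.default_rng(0)
L, dim = 4, 16
# L known complex-valued signals, represented as vectors
S = rng.normal(size=(L, dim)) + 1j * rng.normal(size=(L, dim))
true_l = 2
Y = S[true_l] + 0.05 * (rng.normal(size=dim) + 1j * rng.normal(size=dim))

# direct rule: minimize ||Y - S_l||^2 over l
dist = [np.sum(np.abs(Y - S[l]) ** 2) for l in range(L)]
# equivalent rule (7.10): maximize Re{<Y, S_l>} - E_l/2, with E_l = ||S_l||^2
E = [np.sum(np.abs(S[l]) ** 2) for l in range(L)]
corr = [np.real(np.vdot(S[l], Y)) - E[l] / 2 for l in range(L)]
```

Note that `np.vdot(S[l], Y)` computes Σ_k S*_{l,k} Y_k, which is exactly ⟨Y, S_l⟩ in the notation of (7.4); the two rules differ only by the constant ||Y||², so the argmin and argmax coincide.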
The receiver calculates a set of L real-valued decision variables Re{⟨Y, S_l⟩}, which are the real part of the inner product of the received signal with each of the known signals.

Example 7-7. The role of the real-part function Re{·} can be better appreciated by considering the one-dimensional Euclidean case. The real part of the inner product is then
  Re{⟨x, y⟩} = Re{xy*} = Re{x}·Re{y} + Im{x}·Im{y} , (7.11)
which is equivalent to the inner product in two-dimensional real-valued Euclidean space, where Re{x} and Im{x} are considered to be the two components. Thus, there is a geometric equivalence between one-dimensional complex and two-dimensional real Euclidean space. Taking the real part of the inner product is the key to forming that equivalence. □

Often L is quite large, making the number of decision variables large. In this case, the receiver complexity can be reduced considerably by using an N-dimensional orthonormal basis for the N-dimensional signal subspace. Substituting for the signal in terms of such a basis, we obtain
  Re{⟨Y, S_l⟩} = Re{ Σ_{n=1}^{N} ⟨Y, φ_n⟩ s*_{n,l} } . (7.12)
The receiver calculates the smaller set of decision variables consisting of inner products,
  C_n = ⟨Y, φ_n⟩ , 1 ≤ n ≤ N , (7.13)
and then uses the decision criterion
  max_l [ Σ_{n=1}^{N} Re{C_n s*_{n,l}} − ½·E_l ] . (7.14)
In other words, if the receiver forms the inner product of the received signal with each of the N orthonormal basis vectors, then it can easily deduce the inner product with each of the L signals. It is easily verified that (7.14) is equivalent to
  min_l Σ_{n=1}^{N} |C_n − s_{n,l}|² , (7.15)
which minimizes the distance in N-dimensional Euclidean space. (Multiplying (7.15) out and throwing away the term that is independent of l gives precisely (7.14).) Thus, we have shown that, for an N-dimensional subspace of signals, the minimum-distance receiver design can be recast as a minimum-distance problem in N-dimensional Euclidean space.
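The reduction from L signal-space distances to distances in an N-dimensional Euclidean space can also be checked numerically. In this sketch (not from the text; NumPy assumed, and the N = 4 subspace, the signals, and the noise level are fabricated), an orthonormal basis is obtained with a QR factorization, and the N-dimensional criterion (7.15) picks the same signal as the full distance ||Y − S_l||²:

```python
import numpy as np

rng = np.random.default_rng(1)
L, dim, N = 8, 32, 4
raw = rng.normal(size=(N, dim))            # N waveforms spanning the signal subspace
S = rng.normal(size=(L, N)) @ raw          # L real-valued signals inside the subspace

phi = np.linalg.qr(raw.T)[0].T             # rows: orthonormal basis for the subspace
s_nl = phi @ S.T                           # s_{n,l} = <S_l, phi_n>  -- (7.7)

Y = S[5] + 0.01 * rng.normal(size=dim)     # received signal: S_5 plus a little noise
C = phi @ Y                                # C_n = <Y, phi_n>  -- (7.13)

d_sub = np.sum((C[:, None] - s_nl) ** 2, axis=0)   # (7.15), N-dimensional distances
d_full = np.sum((Y[None, :] - S) ** 2, axis=1)     # full signal-space distances
```

The two distance vectors differ only by the energy of Y outside the subspace, a constant independent of l, so the minimizing index is identical.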
To get to that point, the receiver first has to calculate the N decision variables {C_k, 1 ≤ k ≤ N}, which are the components of the received signal Y in the direction of each of the basis vectors {φ_k, 1 ≤ k ≤ N}. These decision variables summarize the entire received signal, for purposes of calculating the distance.

Minimum Distance in the Signal Set
If we design receivers according to the minimum-distance criterion, we expect intuitively that signal sets in which the signals are far apart have an advantage over signal sets in which signals are close together. Since the receiver observes which of the possible signals is closest to the received signal, it stands to reason that it is less likely to make an error due to noise or other errors when the other signals are further away. One important measure of the noise immunity of a given signal set is the minimum distance between signals, defined as
  d_min = min_{i≠j} ||S_i − S_j|| . (7.16)
This minimum distance is the shortest distance between any pair of signals. We will show in Chapter 8 that for Gaussian noise, d_min is the single most important parameter in predicting the probability of error with a minimum-distance receiver design.

Equivalence of Passband and Complex Baseband Signals
This subsection will show that the minimum-distance criterion for a passband signal is equivalent to the minimum-distance criterion for the corresponding complex baseband signal. Thus, in this chapter we focus exclusively on complex baseband signals. If a set of passband signals (all with the same carrier frequency but different pulse shapes) {p_l(t), 1 ≤ l ≤ L} has corresponding complex baseband signals {s_l(t), 1 ≤ l ≤ L}, where
  p_l(t) = √2·Re{ s_l(t) e^{jω_c t} } , 1 ≤ l ≤ L , (7.17)
then the √2 factor ensures that the energy of p_l(t) is the same as the energy of s_l(t).
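The √2 normalization in (7.17) is easy to confirm numerically. A sketch (not from the text; NumPy assumed; the Gaussian baseband pulse and the 500 Hz carrier are arbitrary choices, with the carrier well above the signal bandwidth):

```python
import numpy as np

fs, fc = 10_000.0, 500.0                   # sample rate and carrier frequency (Hz)
t = np.arange(0.0, 1.0, 1.0 / fs)
# a narrowband complex baseband signal s(t)
s = (1.0 + 0.5j) * np.exp(-30.0 * (t - 0.5) ** 2) * np.exp(2j * np.pi * 5.0 * t)
# the corresponding passband signal (7.17)
p = np.sqrt(2.0) * np.real(s * np.exp(2j * np.pi * fc * t))

E_base = np.sum(np.abs(s) ** 2) / fs       # energy of the baseband signal
E_pass = np.sum(p ** 2) / fs               # energy of the passband signal
```

The two energies agree up to a double-frequency ripple term that integrates to essentially zero when s(t) is narrowband relative to the carrier.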
If we apply the minimum-distance criterion directly to a real-valued passband received signal y(t), then the receiver calculates the decision variables
  ⟨y, p_l⟩ = ∫ y(t) p_l(t) dt = ∫ y(t)·√2·Re{ s_l(t) e^{jω_c t} } dt ,
and these can be shown to equal the decision variables computed from the corresponding complex baseband signals, establishing the claimed equivalence.

7.2. SPECIFIC MODULATION TECHNIQUES
For PAM, the received signal after demodulation and sampling is a single complex number y, and the known signals are the data symbols a_l in the constellation. Associating
  Y ↔ y , S_l ↔ a_l , (7.21)
and applying the criterion of (7.9), the slicer chooses the transmitted signal according to
  min_l ||Y − S_l||² = min_l |y − a_l|² . (7.22)
In (7.22), the first ||·||² is a signal-space squared norm, and the second |·|² is the squared modulus of a complex number, which is a norm in one-dimensional complex Euclidean space. The slicer thus calculates the distance from the received signal y in the complex plane, and chooses the signal constellation point that is closest. An equivalent form of the slicer criterion is (7.10),
  max_l [ 2·Re{y a_l*} − |a_l|² ] . (7.23)

Example 7-8. Consider PSK, where the data symbols are of the form a_m = e^{jθ_m}. It is also convenient to write the received signal in polar form, y = A e^{jφ}. In this case, the |a_l|² term is unity for all l, and hence can be ignored because it is independent of l. The criterion of (7.23) becomes
  max_l Re{ y e^{−jθ_l} } = max_l A·cos(φ − θ_l) . (7.24)
The receiver thus chooses the angle of the data symbol that is closest to the angle of the received signal, and ignores the magnitude of the received signal. This is obvious from the geometry of the signals in the complex plane. The resulting decision regions are shown in Figure 7-5a for a 4-PSK constellation. □

Example 7-9. In a QAM signal constellation, the real and imaginary parts of the data symbol are independently modulated, resulting in a rectangular constellation. If Q is a single slicer input sample (consistent with the notation of Chapter 6), the minimum-distance criterion is
  min_l |Q − a_l|² = min_l [ (Re{Q} − Re{a_l})² + (Im{Q} − Im{a_l})² ] . (7.25)
Since both terms are positive, and Re{a_l} and Im{a_l} are chosen independently at the transmitter, the sum can be minimized by minimizing the terms individually.
Thus, the complex slicer reduces to two real-valued slicers, one for choosing Re{a_l} and the other for choosing Im{a_l}. The decision regions are therefore rectangular. An example is shown in Figure 7-5b for 16-QAM. □

Minimum Distance
As pointed out in Section 7.1.1, the minimum distance between all pairs of signals in the known signal set is a characterization of the noise immunity of that signal set. The minimum distance for slicer design is given by
  d_min = a_min , a_min = min_{i≠j} |a_i − a_j| . (7.26)
The quantity a_min is the minimum distance between signal constellation points. This suggests that the constellation should be designed to maximize this minimum distance. This point will be reinforced in Chapter 8, where the actual error probability is considered.

7.2.2. Isolated Pulse Reception for PAM: Matched Filter
In Chapter 6 we considered pulse-amplitude modulation, and emphasized the receive filtering that achieved the Nyquist criterion. Let us temporarily ignore intersymbol interference (ISI) by considering only the reception of a single pulse. Later we will succeed in extending this receiver structure to counter ISI. For an isolated pulse, the received signal will be of the form
  y(t) = a_l h(t) + e(t) , (7.27)
where a_l is the received single data symbol as in Section 7.2.1, h(t) is the received pulse shape, and e(t) is some unknown error or noise. In this case, the signal set is one-dimensional, and hence we can choose a single basis signal
  φ(t) = h(t)/σ_h , (7.28)
where σ_h² is the energy of the pulse h(t). In the baseband case, all the quantities in (7.27) are real-valued, and in the passband (complex-baseband) case they are complex-valued.

Figure 7-5. The minimum-distance decision regions for the constellations in Figure 6-24.

In order to apply our geometric results, y(t), h(t), and e(t) all have to be square integrable.
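A numeric sketch of this isolated-pulse model (not from the text; NumPy assumed, and the exponential pulse, 4-PAM alphabet, and noise level are invented for illustration). It builds the basis signal φ(t) = h(t)/σ_h of (7.28), correlates y(t) against φ(t), and slices the result against the scaled constellation:

```python
import numpy as np

fs, T = 1000.0, 1.0
t = np.arange(0.0, 5.0 * T, 1.0 / fs)
h = np.exp(-2.0 * t)                        # received pulse shape (one-sided exponential)
sigma_h = np.sqrt(np.sum(h ** 2) / fs)      # pulse energy is sigma_h^2
phi = h / sigma_h                           # single basis signal phi(t) = h(t)/sigma_h

alphabet = np.array([-3.0, -1.0, 1.0, 3.0]) # 4-PAM constellation
a_true = alphabet[2]
rng = np.random.default_rng(3)
y = a_true * h + 0.02 * rng.normal(size=t.size)   # isolated pulse plus noise

c = np.sum(y * phi) / fs                    # correlate y(t) against phi(t)
a_hat = alphabet[np.argmin(np.abs(c - sigma_h * alphabet))]   # minimum-distance slicer
```

Here the decision compares c against σ_h·a_l rather than a_l itself, anticipating the scaling that falls out of the correlator derivation in the next few paragraphs.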
Associating
  Y ↔ y(t) , S_l ↔ a_l σ_h φ(t) , (7.29)
the receiver criterion of (7.10) becomes
  max_l [ 2·Re{ ∫ y(t) a_l* σ_h φ*(t) dt } − σ_h² |a_l|² ] . (7.30)
If we define the decision variable
  c = ∫ y(t) φ*(t) dt , (7.31)
which is the correlation between the received signal and a normalized version of the known pulse shape, then the criterion becomes
  max_l [ 2·Re{ σ_h c a_l* } − σ_h² |a_l|² ] , (7.32)
which is equivalent to the criterion
  min_l | c − σ_h a_l |² . (7.33)
Equation (7.33) is equivalent to the minimum-distance slicer design for the data symbol σ_h a_l with input sample c. Two versions of this receiver structure are shown in Figure 7-6, both of which were displayed earlier in Chapter 6. The correlation receiver is shown in Figure 7-6a, and the matched filter receiver is shown in Figure 7-6b.

Figure 7-6. The minimum-distance receiver for an isolated pulse PAM. (a) The correlator realization, and (b) the matched filter realization. The minimum-distance receiver consists of a correlator, or equivalently a matched filter and sampler, followed by a slicer.

The matched filter follows from the equivalence of (7.31) to
  c = y(t) * φ*(−t) |_{t=0} . (7.34)
The matched filter has transfer function H*(jω)/σ_h, and hence does a perfect phase equalization. The Fourier transform of the pulse at the output of the matched filter is |H(jω)|²/σ_h, which is non-negative real and has zero phase at all frequencies. It is useful to define a sampled (at the symbol rate) autocorrelation function for the received pulse h(t),
  ρ_h(k) = ∫ h(t) h*(t − kT) dt . (7.35)
Since ρ_h(k) is a sampled version of a pulse with Fourier transform |H(jω)|², its discrete-time Fourier transform is
  S_h(e^{jωT}) = Σ_{k=−∞}^{∞} ρ_h(k) e^{−jωkT} = (1/T) Σ_{m=−∞}^{∞} |H(j(ω + 2πm/T))|² . (7.36)
S_h(e^{jωT}) is the folded spectrum of the received pulse, and it is real-valued and non-negative.

Example 7-10. Consider the one-sided exponential pulse h(t) = √(2α)·σ_h·e^{−αt} for t ≥ 0 (and zero for t < 0), where α > 0 and σ_h² is the pulse energy.
Calculating the pulse autocorrelation function directly, we get
  ρ_h(k) = σ_h²·a^{|k|} , a = e^{−αT} , (7.37)
and the folded spectrum is easily shown to be
  S_h(z) = σ_h²(1 − a²) / ((1 − a z⁻¹)(1 − a z)) . (7.38)
This is a first-order one-pole rational function, with the obligatory second pole at a conjugate-reciprocal location, which forces the folded spectrum to be real-valued. □

Example 7-11. Let h₀(t) be a pulse shape that is orthogonal to its translates by kT. Also assume the energy of h₀(t) is σ₀². Let the actual pulse shape be h(t) = h₀(t) + a·h₀(t − T). Then the autocorrelation of this pulse is
  { …, 0, aσ₀², (1 + a²)σ₀², aσ₀², 0, … } , (7.39)
and the folded spectrum is
  S_h(z) = σ₀²( a z + (1 + a²) + a z⁻¹ ) = σ₀²(1 + a z⁻¹)(1 + a z) . (7.40)
This is a first-order all-zero rational function, again with the second conjugate-reciprocal zero that forces the folded spectrum to be real-valued. The pulse energy is σ_h² = σ₀²(1 + a²). □

The Nyquist criterion applied to the output of the matched filter becomes
  ρ_h(k) = ρ_h(0)·δ_k , or equivalently S_h(e^{jωT}) = ρ_h(0) = σ_h² . (7.41)
The pulses in Example 7-10 or Example 7-11 do not satisfy this Nyquist criterion, except for a = 0. The matched filter does not change the minimum bandwidth required, since the revised Nyquist criterion of (7.41) still requires a minimum pulse bandwidth of π/T radians/sec (1/2T Hz).

Minimum Distance
The distance between signals a_i h(t) and a_j h(t) is
  d = σ_h |a_i − a_j| , (7.42)
so the minimum distance is
  d_min = σ_h a_min , (7.43)
where a_min is the minimum distance for the signal constellation. Keeping the signals far apart (improving the noise immunity) is thus equivalent to keeping the minimum distance for the signal constellation large. Not surprisingly, this minimum distance also increases as the pulse energy increases. Only the isolated pulse energy is relevant to the minimum distance, not the shape or other properties of the pulse.

7.2.3.
Orthogonal Multipulse Modulation
For orthogonal multipulse signaling, the signal set consists of N orthogonal pulses, each with the same energy σ_h², and the received signal is
  y(t) = σ_h φ_l(t) + e(t) , (7.44)
where {φ_n(t), 1 ≤ n ≤ N} is a set of N orthonormal waveforms, and e(t) is some unknown error or noise signal, assumed to be finite-energy. As shown in Section 7.1.1, the minimum-distance receiver forms the set of N decision variables
  C_n = ⟨y, φ_n⟩ , 1 ≤ n ≤ N . (7.45)
The receiver then calculates the minimum N-dimensional Euclidean distance between a vector C (whose components are the C_n's) and the signal vector s_l = [0, 0, …, σ_h, 0, …, 0], where the σ_h is in the l-th position. The minimum-distance criterion recast in Euclidean space is thus
  min_l [ Σ_{n≠l} |C_n|² + |C_l − σ_h|² ] = min_l [ Σ_{n=1}^{N} |C_n|² − 2σ_h·Re{C_l} + σ_h² ] . (7.46)
Clearly this is equivalent to the criterion
  max_l Re{C_l} . (7.47)
The minimum-distance receiver thus correlates the received signal against each of the orthonormal waveforms and chooses the maximum real part of the result. The structure of this receiver is shown in Figure 7-7.

Figure 7-7. An isolated-pulse correlation receiver for orthogonal multipulse transmission.

Minimum Distance
For orthogonal multipulse, all pairs of signals are equidistant, so the minimum distance is the same as the distance between any pair of distinct signals, d_min = √2·σ_h. There are N − 1 signals at the minimum distance.

7.2.4. Combined PAM and Multipulse
In the case of combined PAM and multipulse, the received signal corresponding to one symbol interval (although this signal is not necessarily time-limited to this interval) is
  y(t) = Σ_{n=1}^{N} a_{n,l} σ_h φ_n(t) + e(t) , (7.48)
where {φ_n(t), 1 ≤ n ≤ N} is a set of orthonormal waveforms, 1 ≤ l ≤ L is an index specifying which signal is transmitted, and e(t) is a finite-energy unknown error or noise.
This is the superposition of a set of orthogonal waveforms, each with the same energy σ_h², amplitude modulated by data symbols {a_{n,l}, 1 ≤ n ≤ N}. This is similar to the general representation for a finite signal set of (7.6). When the N data symbols are chosen independently from a constellation of size M, then L = M^N. As shown in Section 7.1.1, the receiver first calculates a set of N decision variables
  C_n = ⟨y, φ_n⟩ , 1 ≤ n ≤ N , (7.49)
and then minimizes the norm in N-dimensional Euclidean space,
  min_l Σ_{n=1}^{N} | C_n − σ_h a_{n,l} |² . (7.50)
The structure of this receiver is illustrated in Figure 7-8.

Minimum Distance
If we choose two arbitrary vectors of data symbols {a_{n,i}, 1 ≤ n ≤ N} and {a_{n,j}, 1 ≤ n ≤ N}, then it is simple to verify that the distance between the corresponding signals, due to the orthonormality of the basis vectors, is
  d² = σ_h² Σ_{n=1}^{N} | a_{n,i} − a_{n,j} |² . (7.51)
The minimum distance occurs when a_{n,i} ≠ a_{n,j} for only one value of n, and is thus the same as for PAM,
  d_min = σ_h a_min . (7.52)

Figure 7-8. The minimum-distance receiver design for orthogonal multipulse combined with PAM.

7.3. PAM WITH INTERSYMBOL INTERFERENCE
Thus far in this chapter we have considered isolated pulses. Conceptually, the minimum-distance receiver design can be used when ISI is present as well. For this case, the received signal is
  y(t) = Σ_{k=1}^{K} a_k h(t − kT) + e(t) , (7.53)
where h(t) is the received pulse shape, and we make no assumptions about h(t) being time-limited or satisfying the Nyquist criterion. However, we do assume that h(t) has finite energy σ_h² and that e(t) is a finite-energy unknown error or noise. While we have previously considered the set of signals within one symbol interval, in (7.53) we consider the set of all signal sequences of length K, {a_k, 1 ≤ k ≤ K}. Thus, if each data symbol comes from an alphabet of size M, the entire set of signals in (7.53) has size L = M^K.
By choosing a finite sequence of K symbols, we ensure that every signal in the set of known signals has finite energy. This illustrates the generality of the earlier formulation; namely, it can apply to multiple PAM pulses by re-interpreting the concept of "signal" to include the entire sequence of PAM pulses amplitude-modulated by the entire sequence of data symbols {a_k, 1 ≤ k ≤ K}. Although the dimensionality of the signal set of (7.53) is K, expanding this signal in orthonormal functions as was done in the last section is not too useful for ISI. We follow an alternative approach here.

7.3.1. Receiver Design
Applying the criterion of (7.10), the receiver chooses the sequence of data symbols that satisfies
  max_{a_k, 1≤k≤K} [ 2·Re{ Σ_{k=1}^{K} u_k a_k* } − Σ_{k=1}^{K} Σ_{m=1}^{K} a_k a_m* ρ_h(m − k) ] , (7.54)
where
  u_k = ∫ y(t) h*(t − kT) dt , (7.55)
and ρ_h(k) is the pulse autocorrelation function defined in (7.35), where ρ_h(0) = σ_h². The samples {u_k} are the sampled output of a filter matched to h(t), as in the isolated pulse case, except that now the output is sampled at the symbol rate, t = kT, rather than just once at t = 0. This filter and sampler are known collectively as the sampled matched filter, and are illustrated in Figure 7-9. The important feature of this minimum-distance receiver is that the continuous-time received signal y(t) is turned into a discrete-time received signal u_k, where the sampling rate is equal to the symbol rate. That discrete-time representation of the received signal is then further processed to make a decision on the entire sequence of symbols {a_k, 1 ≤ k ≤ K}.

Two observations are in order. First, maximizing (7.54) requires repeating the distance calculation for all M^K possible symbol sequences {a_k, 1 ≤ k ≤ K}. Thus, the receiver detects the symbols all at once, instead of doing a symbol-by-symbol detection, which was a major goal in the intuitive receiver design in Chapter 6.
It considers all possible sequences of data symbols in order to consider all possible ISI conditions. Second, the receiver designs in Chapter 6 arbitrarily choose a receive filter to eliminate ISI at the slicer input. The minimum-distance criterion chooses a different receive filter that does not satisfy the Nyquist criterion, except in the degenerate case where ρ_h(k) = σ_h²·δ_k. It then compensates for the resulting ISI in a completely different fashion.

We saw in Chapter 6 that the Nyquist criterion stipulates that the bandwidth of a PAM modulated signal be at least half the symbol rate. Thus, the sampling theorem would dictate a sampling rate greater than the symbol rate. The minimum-distance receiver design introduces aliasing in the symbol-rate sampling operation. We did the same thing in Chapter 6; that is, we chose symbol-rate sampling in order to feed the slicer one sample per symbol. We will see in Chapter 10 that it is common to choose a sampling rate higher than the symbol rate in practice, to address practical concerns.

Figure 7-9. The sampled-matched-filter receiver front end consists of a matched filter followed by a symbol-rate sampler.

The minimum-distance receiver design also has the practical problem of high complexity, as manifested by a set of known signals that grows in size exponentially in K. This is not practical to implement in this form; fortunately, we will find a lower-complexity algorithm, called the Viterbi algorithm, in Chapter 9. Furthermore, in Chapter 10, simpler alternative receiver structures based on equalization will be considered.

7.3.2. Equivalent Discrete-Time Criterion
A basic result in Section 2.5, (2.55), provides a factorization of a rational transfer function that is non-negative real on the unit circle. This spectral factorization will now prove useful in deriving a basic minimum-distance receiver structure for ISI.
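Before any complexity reduction, the exhaustive maximization of (7.54) can be written down directly. The sketch below (not from the text; NumPy assumed, and the exponential pulse, binary alphabet, sequence length K = 4, and noise level are invented) forms the sampled matched filter outputs u_k and the pulse autocorrelation ρ_h(k), then searches all M^K candidate sequences:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(7)
K, T, fs = 4, 1.0, 200.0
t = np.arange(0.0, (K + 6) * T, 1.0 / fs)
h = np.exp(-1.5 * t)                       # pulse with ISI (does not satisfy Nyquist)

def shift(x, k):                           # x(t - kT) on the same grid, zero for t < kT
    return np.interp(t - k * T, t, x, left=0.0)

a_true = [1.0, -1.0, -1.0, 1.0]
y = sum(a_true[k] * shift(h, k + 1) for k in range(K))
y = y + 0.05 * rng.normal(size=t.size)

u = [np.sum(y * shift(h, k + 1)) / fs for k in range(K)]      # sampled matched filter
rho = {k: np.sum(h * shift(h, abs(k))) / fs for k in range(-K + 1, K)}  # rho_h(k)

def metric(a):                             # the bracketed quantity in (7.54), real case
    lin = 2.0 * sum(u[k] * a[k] for k in range(K))
    quad = sum(a[k] * a[m] * rho[m - k] for k in range(K) for m in range(K))
    return lin - quad

best = max(product([-1.0, 1.0], repeat=K), key=metric)
```

The search visits all 2^K sequences, which is exactly the exponential complexity noted above; the Viterbi algorithm of Chapter 9 reorganizes this same maximization.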
We first consider the special case of no ISI, where this spectral factorization is not needed, and then extend to the general case.

Special Case: Orthogonal Pulses
When the successive pulses are orthogonal, or equivalently the pulse shape at the output of the matched filter satisfies the Nyquist criterion (ρ_h(k) = σ_h²·δ_k), (7.54) reduces to
  max_{a_k, 1≤k≤K} [ 2·Re{ Σ_{k=1}^{K} u_k a_k* } − σ_h² Σ_{k=1}^{K} |a_k|² ] . (7.56)
This criterion is equivalent to
  min_{a_k, 1≤k≤K} Σ_{k=1}^{K} | u_k − σ_h² a_k |² ,
which decomposes into K independent minimizations, one per symbol; in the absence of ISI the minimum-distance receiver thus reduces to symbol-by-symbol detection on the sampled matched filter output (Figure 7-10).

In the general case, the folded spectrum is spectrally factored as S_h(z) = A_h² G_h(z) G_h*(1/z*), where G_h(z) is causal, monic, and minimum-phase; the sampled matched filter output is passed through the precursor equalizer 1/(A_h² G_h*(1/z*)), leading to the structure of Figure 7-11.

Figure 7-11. A minimum-distance receiver for PAM with ISI. This structure is a generalization of Figure 7-10, in that the continuous-time minimum-distance criterion is transformed into a discrete-time minimum-distance criterion. Unlike Figure 7-10, it applies even in the presence of ISI.

The filter following the sampled matched filter is called a "precursor equalizer" because it eliminates the "anticausal" or "precursor" response of the channel and sampled matched filter. This terminology will be explained further in Chapter 10. The second signal used in the discrete-time Euclidean norm is a filtered version of the candidate data-symbol sequence. The filter in this path is an equivalent discrete-time model for the response of the transmit filter, channel, matched filter, and precursor equalizer to the input data symbols, as we will see. The Euclidean distance between the precursor equalizer output and the filtered version of the candidate data-symbol sequences is calculated for all possible sequences of K data symbols. The sequence that minimizes that discrete-time Euclidean distance is chosen. The Euclidean distance must be recalculated many times, once for each allowable sequence of K data symbols. In the presence of ISI (G_h(z) ≠ 1) it never reduces to a symbol-by-symbol detection as in Figure 7-10b. As in Figure 7-10a, the Euclidean distance should be calculated only for feasible sequences of data symbols, reflecting any redundancy built into the coder at the transmitter.
Since G_h(z) is minimum-phase, the receiver could equally well be built by replacing the precursor equalizer 1/(A_h² G_h*(1/z*)) by H_allpass(z)/(A_h² G_h*(1/z*)) and replacing G_h(z) by H_allpass(z)·G_h(z), for an allpass filter H_allpass(z). This replacement has no effect on the data-symbol sequence chosen by the receiver. If this allpass filter has all poles inside the unit circle (and hence all zeros outside), then it is a stable causal filter, and does not destroy the causality of the channel model in (7.68) either. Effectively, this changes the discrete-time channel model from minimum-phase to non-minimum-phase. While this change would not appear to be harmful, it is shown in Problem 2-22 that a causal minimum-phase sequence has the property that, among all sequences with the same Fourier transform magnitude, it is maximally concentrated near zero delay. Thus, in this sense the impulse response of the minimum-phase channel model has minimum intersymbol interference, among all impulse responses with the same Fourier transform magnitude. Stating this another way, Problem 2-23 shows that the impulse response of the filter G_h(z) is more concentrated near the origin than the impulse response of the modified channel model H_allpass(z)·G_h(z), and thus has less ISI. While this property does not affect the minimum-distance receiver, in the sense that it chooses the same data-symbol sequence, it is very important for another practically important receiver structure based on a similar decomposition (the decision-feedback precursor equalizer of Chapter 10).

Minimum Distance
The minimum distance between known signals was asserted in Section 7.2 to be a measure of the noise immunity of the modulation technique. This minimum distance for PAM with ISI is obtained by considering two different sequences of data symbols {ã_k, 1 ≤ k ≤ K} and {a_k, 1 ≤ k ≤ K}. Letting ε_k be the difference between these two sequences,
  ε_k = ã_k − a_k , 1 ≤ k ≤ K , (7.70)
then it is simple to show that the distance squared between the two signals is
  d² = Σ_{i=1}^{K} Σ_{j=1}^{K} ε_i ε_j* ρ_h(j − i)
. (7.71)
Substituting from (7.59), this distance can be expressed as a discrete-time distance
  d² = A_h² Σ_k | Σ_{i=1}^{K} ε_i g_{h,k−i} |² . (7.72)
The minimum distance d_min is the minimization of (7.72) over all non-zero difference sequences {ε_k, 1 ≤ k ≤ K}. This minimization problem will be considered further in Chapter 8.

7.4. BANDWIDTH and SIGNAL DIMENSIONALITY
This chapter has introduced the important concept of the dimensionality of the subspace spanned by the set of known signals. In this section, we develop additional insight into the relationship between this dimensionality and the bandwidth required to accommodate this set of signals. One important property of a signal design is the spectral efficiency (defined in (6.7)). Chapter 6 also derived a generalized Nyquist criterion, which established the minimum bandwidth required to eliminate ISI at a matched filter output for N orthogonal pulses. A set of pulse waveforms h_n(t), 1 ≤ n ≤ N, satisfies the generalized Nyquist criterion if
  ∫ h_n(t) h_m*(t − kT) dt = δ_{n,m} δ_k , (7.73)
or in words, h_n(t) is orthogonal to its own translates by multiples of the symbol interval T and also to all translates of h_m(t) for m ≠ n. When (7.73) is satisfied, we can successfully separate out the orthogonal signals using a bank of matched filters at the receiver, and we can also guarantee no ISI. It was shown in Chapter 6 that (7.73) can be satisfied if and only if the collective bandwidth of the pulses is at least N·π/T radians. In this section, we will understand this result better, by examining it from a different perspective. In Section 7.1 we defined the subspace of signals spanned by a set of L known signals, and observed that this subspace is finite dimensional (with dimension N). Our goal is to understand better the relationship between the bandwidth of the signals and the dimension of the subspace.

7.4.1. Landau-Pollak Theorem
No signal can be both time limited and bandlimited.
A bandlimited signal is not time limited, in the sense that its energy cannot be totally confined to any finite interval of time, and a time-limited function is not bandlimited, in the sense that its energy cannot be totally confined to a finite band of frequencies. However, it is possible for functions to be bandlimited and approximately time limited, or time limited and approximately bandlimited. For example, consider a function f(t) that is causal and bandlimited to B Hz, and also has finite energy σf². Then f(t) never goes precisely to zero beyond any fixed time t0, but because it has finite energy it will decay gradually to zero. One way to measure the rate of that decay is to calculate the fraction ε(t0) of its energy outside the interval [0, t0], where ε(t0) < 1. Specifically, let

∫_0^{t0} |f(t)|² dt = σf²·(1 − ε(t0)). (7.74)

Since a fraction ε(t0) of the energy is outside the interval, a fraction 1 − ε(t0) is within the interval. For a causal finite-energy function f(t), as t0 → ∞, ε(t0) → 0. If we define the signal to be approximately time limited to an interval when less than a specific fraction ε of its energy is outside that interval, then we can always choose a large enough interval that the signal is approximately time limited. Although the signal space of all finite-energy signals is infinite-dimensional, it is also true that the subset of such signals that are bandlimited to B Hz and approximately time limited to [0, t0] is approximately finite dimensional, with dimension 2Bt0 + 1. This statement is made rigorous by the Landau-Pollak theorem [1].

Theorem. There exists a set of 2Bt0 + 1 orthonormal waveforms φi(t), 0 ≤ i ≤ 2Bt0, with the following property. If f(t) is bandlimited to B Hz and satisfies

∫_0^{t0} |f(t)|² dt ≥ σf²·(1 − ε), (7.75)

then there exists a set of 2Bt0 + 1 coefficients fi such that

∫_{−∞}^{∞} | f(t) − Σ_{i=0}^{2Bt0} fi·φi(t) |² dt < 12·σf²·ε. (7.76)

We can state this theorem in words as follows.
If less than a fraction ε of a bandlimited signal's energy is outside an interval [0, t0], then that signal can be approximated by a linear combination of a set of 2Bt0 + 1 orthonormal waveforms with an error which has energy less than a fraction 12ε of the signal's energy. Thus, the dimensionality of the subspace of all signals approximately time limited to t0 and bandlimited to B is approximately 2Bt0 + 1, in the sense that a small fraction of the signal's energy is outside a signal subspace of dimension 2Bt0 + 1. As t0 increases, the fraction of energy outside this subspace (which is also growing in dimensionality) gets smaller.

7.4.2. Relation to the Generalized Nyquist Criterion

In the generalized Nyquist criterion, we made no attempt to time-limit the pulse waveforms hn(t) to the symbol interval T. Thus, the Landau-Pollak theorem does not apply directly. However, the generalized Nyquist criterion and the Landau-Pollak theorem are connected, and consistent with one another, as we now show. The key to forming the connection is to consider a sequence of K transmitted symbols. Suppose hn(t), 1 ≤ n ≤ N is a set of pulses bandlimited to B Hz that satisfy the generalized Nyquist criterion. Generate a PAM plus orthogonal multipulse signal consisting of K symbols,

s(t) = Σ_{k=0}^{K−1} Σ_{n=1}^{N} Ak,n·hn(t − kT). (7.77)

Since s(t) is a linear combination of NK orthogonal waveforms hn(t − kT), 1 ≤ n ≤ N, 0 ≤ k ≤ K − 1, it lies in an NK-dimensional subspace of signal space. It is also easy to show (see Problem 7-12) that under very mild conditions, s(t) is approximately time limited to [0, KT] in the sense that the fraction of the energy of s(t) outside this interval goes to zero as K → ∞. Thus, the Landau-Pollak theorem tells us that s(t) can be approximated by 2BKT + 1 orthonormal functions, with increasing accuracy as K → ∞. This means that this dimensionality must be at least the actual dimensionality NK:
2BKT + 1 ≥ NK, or B ≥ (NK − 1)/2KT. (7.78)

As K → ∞, the Landau-Pollak theorem implies that the bandwidth required is B ≥ N/2T Hz. Since this equals N·π/T radians/sec, this is consistent with the generalized Nyquist criterion.

7.4.3. Impact of Signal Bandwidth on the Isolated Pulse

One impact of the Landau-Pollak theorem is that the parameter 2Bt0, the so-called time-bandwidth product, plays an important role for signals that are approximately time limited and bandlimited. For a bandlimited signal with bandwidth B, as 2Bt0 increases, a couple of things happen:

• The fraction of the signal energy confined to an appropriate time interval of duration t0 will increase.

• The fraction of the signal energy falling outside a (2Bt0 + 1)-dimensional subspace of signal space will decrease.

When 2Bt0 is small, the notion of a pulse being confined to an interval of duration t0 is crude at best. However, as 2Bt0 gets large, we can design bandlimited pulses that are, for all practical purposes, confined to the interval of duration t0. The Landau-Pollak theorem considers a waveform with bandwidth B and requires us to find a sufficiently large time limit t0 such that most of the energy of the waveform lies within [0, t0]. An alternative approach is to hold t0 fixed and increase the bandwidth B, allowing the waveform to be increasingly confined to [0, t0]. The dual notions of increasing the bandwidth or the time interval both arise in digital communication.

Example 7-15. In spread spectrum, a single pulse h(t) is amplitude modulated for each data symbol. The Nyquist criterion says that a bandwidth of π/T is required if ISI is to be avoided. In fact, in spread spectrum the bandwidth B is much larger (often hundreds of times), so that 2BT is very large. In this case, it is possible to make the pulse h(t) nearly time limited to the symbol interval T. This implies in turn that ISI is rarely a practical issue in spread spectrum systems.
In fact, countering or avoiding ISI is often a motivation for using spread spectrum; the essential property is that the time-bandwidth product is very large. This issue is addressed further in Chapter 8. □

Example 7-16. In orthogonal multipulse modulation (for example FSK), one of N orthogonal pulses is transmitted for each symbol. A side effect is that the minimum bandwidth required…

…Re{⟨Z, U⟩} > di,j/2, (7.79)

where U is a unit vector in the direction of (Sj − Si).

7-2. (a) Give an example of a pulse h(t) with time-duration that is exactly two symbol periods (2T) (and hence it is not bandlimited) and obeys the Nyquist criterion at the output of a matched filter, ρh(k) = ρh(0)·δk. (b) Repeat (a) for three symbol periods (3T).

7-3. (a) Show that the pulse autocorrelation obeys the symmetry relation ρh(k) = ρh*(−k). (b) Show that the folded spectrum is non-negative real valued on the unit circle.

7-4. Define

Sh+(z) = Σ_{k=0}^{∞} ρh(k) z^{−k}, (7.80)

and show that the folded spectrum is

Sh(z) = Sh+(z) + Sh+*(1/z*) − ρh(0). (7.81)

This gives a convenient way to calculate the folded spectrum.

7-5. Generalize Example 7-11 as follows. Let h0(t) be a complex-valued pulse shape that has energy a0² and is orthogonal to all its translates by multiples of the symbol interval T. Let F(z) = Σ_{k=0}^{K} fk z^{−k} be a general K-th order FIR filter, and define a pulse shape

h(t) = Σ_{k=0}^{K} fk h0(t − kT). (7.82)

(a) Show that

ρh(k) = a0² Σ_m fm+k fm*, (7.83)

and hence that

Sh(z) = a0² F(z) F*(1/z*). (7.84)

(b) What is the pulse energy?

7-6. Describe the operation of the minimum-distance receiver of Figure 7-10a for the following transmitter coder strategy: Four bits of information are transmitted as three successive symbols chosen from the alphabet {±1, 0}. Only 16 possible combinations of three successive symbols out of 3³ = 27 are used. How many sequences of data symbols must the receiver consider?

7-7. Show that the minimum-distance receiver of Figure 7-11 reduces to that of Figure 7-10 when there is no ISI.

7-8.
In this problem, we will derive an alternative geometrical interpretation, due to John Barry, of the receiver structure in Figure 7-11. Let Mh be the subspace of signal space spanned by the K translates of the known signal pulses, {h(t − kT), 1 ≤ k ≤ K}.

(a) Show that for any signal x ∈ Mh, where x(t) ↔ x, when x(t) is input to the top arm of Figure 7-11,

‖x‖² = ∫ |x(t)|² dt = Σ_{k=1}^{K} |wk|². (7.85)

Thus, for input continuous-time signals restricted to Mh, the filter is norm-preserving.

(b) Show that for any input signal Y ↔ Y(t), and any set of vectors {Xi ∈ Mh}, minimizing ‖Y − Xi‖² over i is the same as minimizing ‖Yh − Xi‖² over i, where Yh is the projection of Y on Mh. Thus, since the minimum-distance receiver wishes to determine the distance between the received signal Y and a set of known signals, where the latter are all in Mh, it suffices to first determine the projection Yh, and then calculate the distances.

(c) Show that the discrete-time response to continuous-time input X = Y − Yh is identically zero.

(d) Show that if Xi ∈ Mh and the responses of the filter to Y and Xi are wY,k and wi,k, then

‖Y − Xi‖² = ‖Y − Yh‖² + Σ_k |wY,k − wi,k|². (7.86)

This basic result proves that minimizing the discrete-time distance over i is equivalent to minimizing the continuous-time distance.

(e) Using the result of (d), interpret the minimum-distance receiver…

…In particular,

RZ(τ) = E[Z(t + τ)Z*(t)] = e^{jωcτ}·RN(τ), (8.14)

E[Z(t + τ)Z(t)] = e^{jωc(2t + τ)}·RN(τ), (8.15)

so Z(t) is wide-sense stationary, but it is neither strictly stationary nor circularly symmetric. That Z(t) is non-stationary is not surprising, since at certain times (the zero-crossings of the carrier) the real part of Z(t) is identically zero, and similarly for the imaginary part. That Z(t) is wide-sense stationary in spite of not being strictly stationary is perhaps surprising to those accustomed to real-valued Gaussian processes.
□

Discrete-Time Gaussian Processes

All the properties we have described carry over to discrete-time zero-mean Gaussian random processes. In particular, such a complex-valued process Zk is fully characterized by the autocorrelation RZ(m) and the complementary correlation E[Zk+mZk]. By definition, it is circularly symmetric if

E[Zk+mZk] = 0 for all m and k. (8.16)

If Zk = Z(kT) is obtained by sampling a circularly symmetric continuous-time process, it will itself be circularly symmetric. If Zk is circularly symmetric, its real and imaginary parts have the same variance, and are independent at a given time k. Further, the real and imaginary parts are statistically independent for all time if and only if the autocorrelation RZ(k) is real-valued. As in continuous time, circular symmetry is preserved by linear time-invariant filtering, and more generally by linear operations.

White Gaussian Processes

An important subclass of zero-mean complex Gaussian processes is the white processes. Such processes have an autocorrelation function

RZ(τ) = N0·δ(τ), RZ(k) = 2σ²·δk, (8.17)

for continuous and discrete time respectively. The convention is that σ² is the variance of the real part or the imaginary part, so that 2σ² is the variance of the complex process. For real-valued processes, in continuous time the white property implies that Z(t + τ) and Z(t) are uncorrelated and hence independent for all τ ≠ 0, and for discrete time, Z(k + m) and Z(k) are uncorrelated and hence independent for all m ≠ 0. A white complex-valued Gaussian process is not necessarily strict-sense stationary. However, if the process is both white and circularly symmetric, then the following properties hold:

• The real and imaginary parts of the process are identically distributed, and are each white real-valued Gaussian processes.

• The real and imaginary parts are independent of one another, since the autocorrelation function is real-valued.
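These properties are easy to check numerically. The sketch below is our own illustration, not from the text; the variable names, seed, and sample size are arbitrary. It draws a circularly symmetric white complex Gaussian process with per-component variance σ² = 1 and estimates the correlations appearing in (8.16) and (8.17).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0        # variance of the real (and of the imaginary) part
n = 200_000

# Circularly symmetric white complex Gaussian process: independent real and
# imaginary parts, each N(0, sigma^2), so that E|Z_k|^2 = 2 sigma^2.
z = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

def autocorr(x, m):
    """Sample estimate of E[Z_{k+m} Z_k^*]."""
    return np.mean(x[m:] * np.conj(x[:len(x) - m])) if m else np.mean(np.abs(x) ** 2)

def pseudocorr(x, m):
    """Sample estimate of E[Z_{k+m} Z_k] (no conjugate), as in (8.16)."""
    return np.mean(x[m:] * x[:len(x) - m]) if m else np.mean(x ** 2)

print(autocorr(z, 0))    # ~ 2 sigma^2: variance of the complex process
print(autocorr(z, 3))    # ~ 0: white, samples uncorrelated
print(pseudocorr(z, 0))  # ~ 0: circular symmetry
```

Replacing the real and imaginary parts with dependent variables (for example, using the same realization for both) would leave the process white but destroy the circular symmetry, and the last estimate would no longer be near zero.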
Thus, circularly symmetric zero-mean white complex Gaussian processes are maximally random, in the sense that (a) the samples of the process are mutually independent and (b) the real and imaginary parts are independent. Another important observation is that any Gaussian process obtained by a time-invariant linear filtering of a circularly symmetric white Gaussian process is itself circularly symmetric, although in general it will not be white (unless the filter is allpass).

8.2. FUNDAMENTAL RESULTS

We will now calculate the probability of error for a particular N-dimensional complex Euclidean formulation of the minimum-distance receiver design for a set of known signals in Gaussian noise. It will turn out that this formulation is general enough to cover all the cases of interest in the remainder of this chapter.

Gaussian Noise Vectors

Let Z′ = [Z1, Z2, …, ZN] be a complex-valued zero-mean Gaussian vector (where Z′ denotes the matrix transpose of Z). In the sequel we will assume that Z has several simplifying properties:

• The components of Z are uncorrelated, that is, E[Zi Zj*] = 0 for i ≠ j.

• The components of Z are circularly symmetric (Section 8.1), or E[Zi Zj] = 0 for 1 ≤ i, j ≤ N. This plus the uncorrelated property implies that the components of Z are mutually independent, and further that the real and imaginary parts of each component are independent.

• The components of Z are identically distributed, that is, E[|Zn|²] = 2σ² for 1 ≤ n ≤ N.

We also need to consider the real-valued Z case. In this case, Z cannot be circularly symmetric, but the uncorrelated assumption by itself implies that the components of Z are independent. The identically distributed assumption implies that the components of Z have the same variance σ². Let a complex random variable C be defined as

C = ⟨Z, e⟩ = Z′e*, (8.18)

where e is a unit-magnitude vector, i.e. ‖e‖ = 1.
Then clearly C is Gaussian, since it is a linear combination of Gaussian random variables. Further, it is circularly symmetric (E[C²] = 0), since it is a linear function of a circularly symmetric Gaussian vector. This implies that Re{C} and Im{C} are identically distributed and independent. To determine the statistics of C, all we have to determine is its variance. Calculating this directly,

E[|C|²] = E[ Σ_{j=1}^{N} Σ_{k=1}^{N} Zj Zk* ej* ek ] = Σ_{j=1}^{N} E[|Zj|²]·|ej|² = 2σ²·Σ_{j=1}^{N} |ej|² = 2σ². (8.19)

Thus C has the same variance as the components of Z, 2σ², and as a result, the real and imaginary parts of C each have variance σ². This result can also be explained intuitively, since C is the projection of Z on the span of a unit-magnitude vector e, or the component of Z in the direction of unit-vector e. Since Z has the same variance in each of its components, it stands to reason that the component of Z in any direction has the same variance, not just in the direction of the principal axes.

Vector-Valued Signal in Vector-Valued Noise

Consider a received signal that is an N-dimensional complex vector, consisting of a known signal vector and an additive complex Gaussian noise vector,

Y = Sm + Z, (8.20)

where Y′ = [Y1, Y2, …, YN] is the received signal, and Sm′ = [Sm,1, Sm,2, …, Sm,N] is drawn from a set of L known signals {Sl, 1 ≤ l ≤ L}.

Probability of Error

Now suppose we apply the receiver design strategy of Chapter 7 to the received signal of (8.20). That is, we choose the signal that satisfies

min_{1 ≤ l ≤ L} ‖Y − Sl‖². (8.21)

What is the probability of error? We will first determine the probability that the received signal Y = Sm + Z is closer to Sj than it is to Sm for j ≠ m, or the probability of the event

‖Y − Sj‖² < ‖Y − Sm‖².
(8.22)

Substituting for Y,

‖Sm − Sj + Z‖² < ‖Z‖², (8.23)

or

‖Z‖² + ‖Sj − Sm‖² − 2·Re{⟨Z, Sj − Sm⟩} < ‖Z‖². (8.24)

Cancelling the ‖Z‖² term and dividing both sides by

dm,j = ‖Sj − Sm‖, (8.25)

the distance between Sm and Sj, we get equivalently

Re{⟨Z, (Sj − Sm)/dm,j⟩} > dm,j/2. (8.26)

The probability of event (8.26) is easily calculated, since the vector (Sj − Sm)/dm,j is a unit-length vector, and hence from (8.19) the left side of (8.26) is a Gaussian random variable with variance σ². The probability of (8.26) is therefore

Pr[Y closer to Sj than Sm | Y = Sm + Z] = Q(dm,j/2σ), (8.27)

where Q(·) is the integral of the tail of the unit-variance Gaussian distribution, as defined in Chapter 3. Using this result, we can determine the error probability for the case of two signals (L = 2).

Example 8-3. If L = 2, and if Y = S1 + Z, then an error occurs if Y is closer to S2 than to S1. This occurs with probability

Pr[S2 chosen | Y = S1 + Z] = Q(d1,2/2σ). (8.28) □

Bounds on the Probability of Error

The exact error probability for three or more signals can be difficult to calculate, since the minimum-distance decision boundary can be very complicated. We will illustrate special cases in Sections 8.3 and 8.4 where it is not too difficult. More generally, however, we can establish bounds on the error probability that are easy to apply. These bounds become tight as σ → 0, and thus represent not only bounds, but also accurate approximations for small σ (small error probability). Since most digital communication systems operate at low error probability, these bounds are very useful. The upper bound will be based on the union bound described in Section 3.1. For N events {En, 1 ≤ n ≤ N}, the union bound is

Pr[∪_{n=1}^{N} En] ≤ Σ_{n=1}^{N} Pr[En]. (8.29)

Returning to the probability of error for the minimum-distance receiver design, suppose that S1 is transmitted.
We are interested in the probability that one of the other signals Sl, 2 ≤ l ≤ L, is closer to the received signal Y than S1. If we define El as the event that Sl is closer to Y than S1, then

Pr[S1 not closest to Y | Y = S1 + Z] = Pr[∪_{l=2}^{L} El] ≤ Σ_{l=2}^{L} Pr[El]. (8.30)

Since

Pr[El] = Pr[Y closer to Sl than to S1 | Y = S1 + Z] = Q(d1,l/2σ), (8.31)

we get

Pr[S1 not closest to Y | Y = S1 + Z] ≤ Σ_{l=2}^{L} Q(d1,l/2σ). (8.32)

This is an upper bound on the probability of error, conditional on S1 being transmitted. It was shown in Chapter 3 that Q(·) is a very steep function of its argument for large arguments (corresponding to high SNR). This implies that the sum in (8.32) tends to be dominated by the term corresponding to the smallest argument. Define d1,min as the smallest d1,l, 2 ≤ l ≤ L. Then the union bound of (8.32) can be approximated by

Pr[S1 not closest to Y | Y = S1 + Z] ≈ K1·Q(d1,min/2σ), (8.33)

where K1 is the number of signals that are distance d1,min away from S1. We can no longer assert that (8.33) is an upper bound, since we have thrown away positive terms, making the right side smaller. However, for small σ, (8.33) remains an accurate approximation. It is also intuitive that the error probability would be dominated by the signals that are closest to S1, since the nearest signals are the ones most likely to be confused with S1.

A lower bound on error probability can also be established. Since ∪_{l=2}^{L} El contains Em for any 2 ≤ m ≤ L, we get that

Pr[∪_{l=2}^{L} El] ≥ Pr[Em] = Q(d1,m/2σ). (8.34)

Obviously, the bound is tightest when d1,m = d1,min, since that will maximize the right side of (8.34). Thus, a lower bound is

Pr[S1 not closest to Y | Y = S1 + Z] ≥ Q(d1,min/2σ). (8.
35)

Together (8.33) and (8.35) establish, for small σ, an approximation to the error probability if S1 is transmitted that is accurate within a factor of K1. This bound applies equally well for any other transmitted signal Sm with K1 replaced by Km and d1,min replaced by dm,min, where dm,min is the minimum distance from Sm to any other signal, and Km is the number of signals at distance dm,min. We are often interested in the overall probability of error Pe, defined as the probability that the wrong signal is chosen by the minimum-distance criterion. To calculate Pe, we must know {Pl, 1 ≤ l ≤ L}, the set of probabilities of the L signals being transmitted. Then

Pe = Σ_{m=1}^{L} Pr[Sm not chosen | Y = Sm + Z]·Pm. (8.36)

Substituting the union-bound approximation, Pe can be approximated as

Pe ≈ Σ_{m=1}^{L} Km·Q(dm,min/2σ)·Pm. (8.37)

…the probability of symbol error can often be calculated exactly, without resorting to the union bound, as will now be illustrated by a series of examples.

Example 8-5. Consider the multilevel one-dimensional constellation shown in the following figure:

[Figure: four symbols on the real axis of the complex plane, at Re{Ak} = −3a, −a, a, 3a.]

If the transmitted symbol at time k is Ak = −a, then the p.d.f. of Qk is shown in the following figure:

[Figure: Gaussian p.d.f. centered at −a; the central region has probability 1 − 2Q(d/2σ), and each shaded tail beyond the midpoint to an adjacent symbol has area Q(d/2σ).]

The probability that the received sample Qk is closer to a symbol other than −a is equal to the area of the shaded regions. The shaded regions each have area Q(d/2σ), so

Pr[symbol error at time k | Ak = ±a] = 2Q(d/2σ). (8.56)

On the other hand,

Pr[symbol error at time k | Ak = ±3a] = Q(d/2σ), (8.57)

so if the symbols are equally likely at all times and independent then

Pr[symbol error] = 1.5·Q(d/2σ).

The coefficient 1.5 is the average number of nearest neighbors. □

Example 8-6. For the 4-PSK constellation of Figure 7-5a, the symbol −b can be mistaken for any of the other three if the noise is sufficiently large.
The probability that it is mistaken for +jb, for example, is not precisely equal to the probability that it is closer to +jb than to −b, because it might be closest to +b. Denote the transmitted symbol by Ak and the received signal by Qk = Ak + Zk. For mathematical convenience, we rotate the coordinates through 45 degrees and rescale, so that the points are {±b ± jb} as shown in Figure 8-1.

Figure 8-1. A rotated version of the 4-PSK constellation in Figure 7-5. The shaded region is the decision region corresponding to one of the symbols.

For a correct decision, the rotated Qk must lie in the shaded region in Figure 8-1. The statistics of the rotated noise are the same as the statistics of the non-rotated noise, since the noise is circularly symmetric. Then

(8.59)

is a complex-valued Gaussian noise with independent real and imaginary parts, each with …

… expressed in terms of passband signals is thus needed.

Receiver Structure

Assume that Y(t) is a real-valued passband received signal, given by

Y(t) = √2·Re{sm(t)·e^{jωct}} + N(t), (8.74)

where N(t) is real-valued white Gaussian noise with power spectral density N0 and sm(t) is the received complex baseband signal drawn from a set of L known signals {sl(t), 1 ≤ l ≤ L}. The known signals can be expressed in terms of a set of N complex-valued orthonormal basis functions

sl(t) = Σ_{n=1}^{N} Sl,n·φn(t). (8.75)

Since the set of known signals is assumed to be passband in nature, assume that each sl(t) and each φn(t) is bandlimited to less than ωc radians/sec. As shown in Chapter 7, the minimum-distance receiver calculates the set of decision variables

Cn = ∫_{−∞}^{∞} Y(t)·√2·e^{−jωct}·φn*(t) dt, 1 ≤ n ≤ N, (8.76)

and then chooses the signal with index l that satisfies the Euclidean-distance criterion

min_l ‖C − Sl‖². (8.77)

Analyzing the performance of this receiver requires a statistical characterization of the decision variables Cn, 1 ≤ n ≤ N.
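The downconversion-and-correlation in (8.76) can be illustrated numerically. The sketch below is our own construction with parameters of our own choosing: it uses unit-energy rectangular (indicator) basis functions rather than the strictly bandlimited φn(t) the text assumes, but the carrier frequency is chosen so that each sub-interval holds an integer number of half-cycles, which makes the double-frequency term integrate to zero all the same. In the noiseless case the decision variables recover the signal coefficients Sm,n.

```python
import numpy as np

# Parameters (ours, for illustration): N basis functions over a symbol
# interval T, carrier fc with an integer number of half-cycles per sub-interval.
N, T, fc = 4, 1.0, 50.0
wc = 2 * np.pi * fc
t = np.linspace(0.0, T, 200_001)
dt = t[1] - t[0]

# Orthonormal basis: unit-energy indicator functions on N sub-intervals.
phi = np.zeros((N, t.size))
for n in range(N):
    mask = (t >= n * T / N) & (t < (n + 1) * T / N)
    phi[n, mask] = np.sqrt(N / T)

S_m = np.array([1 + 1j, -1 + 0j, 0 - 1j, 2 + 0j])      # known signal coefficients
s_base = S_m @ phi                                      # complex baseband s_m(t), as in (8.75)
Y = np.sqrt(2) * np.real(s_base * np.exp(1j * wc * t))  # passband signal (8.74), no noise

# Decision variables (8.76): downconvert, then correlate with each phi_n.
down = Y * np.sqrt(2) * np.exp(-1j * wc * t)
C = (phi.conj() @ down) * dt

print(np.round(C, 3))   # close to S_m, consistent with Exercise 8-1
```

Adding white Gaussian noise to Y simply adds the independent, circularly symmetric noise samples Zn of (8.80) to each decision variable.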
Statistics of the Decision Variables

Substituting (8.74) into (8.76), the decision variables are given in terms of the signal and noise as

Cn = Sm,n + Zn, (8.78)

where

Sm,n = ∫_{−∞}^{∞} √2·Re{sm(t)·e^{jωct}}·√2·e^{−jωct}·φn*(t) dt, (8.79)

Zn = ∫_{−∞}^{∞} N(t)·√2·e^{−jωct}·φn*(t) dt. (8.80)

Exercise 8-1. Verify that the integral on the right side of (8.79) is in fact equal to Sm,n as defined by (8.75) when l = m. □

The equivalent noise Zn is a complex-valued zero-mean Gaussian random variable. Using precisely the same techniques as in Section 8.3, the orthogonality of the basis functions φn(t) implies that the noise samples are uncorrelated and have the same variance,

E[Zi Zj*] = 2σ²·δi,j, 1 ≤ i, j ≤ N, (8.81)

where σ² = N0, due to the unit energy of φn(t). Further, the fact that the φn(t) are bandlimited to ωc implies that the Zn are circularly symmetric,

E[Zi Zj] = 0, 1 ≤ i, j ≤ N. (8.82)

Together, these properties establish that the Zn are mutually independent and identically distributed, with identically distributed and independent real and imaginary parts. In fact, the decision variables of (8.78) represent a vector-valued received signal C′ = [C1, C2, …, CN] that is mathematically identical to (8.20),

C = Sm + Z, (8.83)

including the same statistics for the noise vector Z. Furthermore, the minimum-distance criterion of (8.77) is mathematically equivalent to (8.21),

min_l ‖C − Sl‖². (8.84)

Therefore, the results of Section 8.2 apply directly to the minimum-distance receiver design. All the work is already done! To summarize the earlier results, from the upper and lower bounds we conclude that for small σ the probability of the minimum-distance receiver choosing a signal different from that transmitted is approximately

Pe ≈ K·Q(dmin/2σ), (8.85)

where K is a constant and dmin is the minimum distance in the signal set. The parameter dmin was already determined in Chapter 7 for several modulation techniques.
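The quality of the (8.85)-style approximation is easy to probe by simulation. The following sketch is our own example, not from the text: the four-point constellation, the noise level, and the nearest-neighbor count K = 2 are choices of ours. It compares a Monte Carlo estimate of the minimum-distance receiver's error probability with K·Q(dmin/2σ) for signals in one complex dimension.

```python
import numpy as np
from math import erfc, sqrt

def Q(x):
    """Tail of the unit-variance Gaussian, as defined in Chapter 3."""
    return 0.5 * erfc(x / sqrt(2.0))

# Our illustrative signal set: {a, -a, ja, -ja} in one complex dimension,
# with circularly symmetric noise of variance sigma^2 per real component.
a, sigma, trials = 1.0, 0.35, 200_000
signals = np.array([a, -a, 1j * a, -1j * a])
d_min = sqrt(2.0) * a      # distance between nearest neighbours
K = 2                      # each signal has two nearest neighbours

rng = np.random.default_rng(1)
Z = sigma * (rng.standard_normal(trials) + 1j * rng.standard_normal(trials))
Y = signals[0] + Z                                   # transmit S_1, add noise
decisions = np.argmin(np.abs(Y[:, None] - signals[None, :]) ** 2, axis=1)
p_mc = np.mean(decisions != 0)                       # minimum-distance error rate

p_approx = K * Q(d_min / (2 * sigma))                # the (8.85)-style estimate
print(p_mc, p_approx)   # the two agree closely at this noise level
```

Raising sigma pushes the two estimates apart, consistent with the statement that the bounds tighten only as σ → 0.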
In that chapter, it was argued that dmin is a measure of the noise immunity. This is confirmed by (8.85), which shows further that it is the size of dmin in relation to the noise standard deviation σ that matters most. The coefficient K (which is the average number of nearest neighbors) is secondary.

8.4.2. Probability of Error for Minimum-Distance Design

Determining the probability of error for different signaling schemes is now a simple matter of substituting for dmin in (8.85). In addition, as in the case of the slicer, we will show that the exact probability of error can often be determined without resorting to the approximation of (8.85). It is difficult to use probability of error to compare different modulation schemes, because we typically want to keep the transmit signal powers the same, or the spectral efficiencies the same, or some similar constraint. Thus, we defer a comparison of the performance of different modulation schemes to a later section, where spectral and power efficiency are taken into account.

Isolated Pulse PAM with Matched Filter

For the detection of PAM with an isolated pulse h(t), the minimum-distance criterion resulted in a matched filter followed by a sampler and slicer, as in Figure 7-6. The resulting input to the slicer was the data symbol multiplied by ah, where ah² is the energy in the received pulse h(t). The minimum distance is thus the same as the minimum distance of the data symbol alphabet, already considered in Section 8.3, multiplied by ah. Calling the symbol alphabet minimum distance amin, the error probability is approximately

Pe ≈ K·Q(ah·amin/2σ). (8.86)

This is the same as the slicer design error probability considered earlier, but with a different scaling of signal level.

Orthogonal Multipulse

For orthogonal multipulse, each signal is of the form

Sm = [0, 0, …, ah, 0, …, 0] (8.87)

where the non-zero term is in the m-th position.
Thus, every signal is the same distance from every other signal, namely d = √2·ah, so the minimum distance is dmin = √2·ah, where f_{Re{C1}}(α) is the p.d.f. of a Gaussian random variable with mean ah and variance …

… The minimization is over all sequences {ak} and {âk} that are not equal; that is, that differ for at least one k. Within a constant (which is due to the different normalization), this minimum distance is equal to the minimum distance calculated directly in continuous time in Section 7.3. A practical algorithm for calculating the minimum distance of (8.107) will be explained in Chapter 9.

8.6. SPREAD SPECTRUM

Spread spectrum, a term applied to passband PAM when the bandwidth is chosen to be very much larger than the minimum dictated by the Nyquist criterion, was briefly introduced in Section 6.7. Spread spectrum has a long history, mostly in secure military communications, as discussed by Scholtz [5]. More recently, a number of commercial applications have arisen, for example in digital cellular systems. A useful definition of spread spectrum is [6]:

Spread-spectrum is a means of transmission in which the signal occupies a bandwidth in excess of the minimum necessary to send the information; the band spread is accomplished by means of a code that is independent of the data, and a synchronized reception with the code at the receiver is used for de-spreading and subsequent data recovery.

The spreading code used in this definition will be defined shortly.

Bandwidth and Probability of Error

Assume the channel noise is white and Gaussian. If we are to increase the bandwidth of the received pulse h(t), the question is whether this bandwidth expansion adversely affects the SNR at the slicer. Intuitively it might, because we must expand the bandwidth of the receive filter and let in more noise. This logic is valid with a simple lowpass filter in the receiver.
However, with a matched filter, there is no relationship between bandwidth and slicer SNR. To show this, consider an isolated pulse input to the receiver, which consists of a matched filter and sampler. For received signal Ak·h(t), the single signal sample at the matched filter output is Ak·ah², which has minimum distance dmin = ah²·amin. For white noise with spectral density N0, the matched filter output noise has variance σ² = ah²·N0. Thus the error probability is approximately

Pe ≈ K·Q(ah²·amin / 2·ah·√N0) = K·Q(ah·amin / 2·√N0). (8.108)

Pe depends on the energy of the received pulse, but not its bandwidth. An intuitive explanation of this bandwidth independence is as follows. By the Landau-Pollak theorem (Section 7.4), the space of received pulses bandlimited to B Hz and approximately time-limited to T sec (the symbol interval) has approximate dimension 2BT. By definition, spread spectrum corresponds to 2BT ≫ 1, where this approximation becomes accurate. Since h(t) only occupies one dimension, the matched filter captures only a fraction 1/2BT of the total noise in bandwidth B. While the variance of this total noise is proportional to B, on net, the noise variance at the output of the matched filter is not dependent on B.

It is common in practice to characterize the signal-to-noise ratio (SNR) at the receiver input, in preference to the energy per bit and noise power spectral density. A formula for Pe based on SNR will also prove valuable in the next section. Assuming no ISI (translates h(t − kT) are mutually orthogonal), the received signal power Ps is equal to the energy per symbol (σA²·ah², where σA² = E[|Ak|²]) times the symbol rate 1/T,

Ps = σA²·ah²/T. (8.109)

Furthermore, the total noise power within bandwidth B is 2N0B. The received SNR is defined as the ratio of signal power to noise power,

SNR = Ps/2N0B. (8.110)

Substituting into (8.108), Pe expressed in terms of SNR is

Pe ≈ K·Q(√(2BT·ηA·SNR)). (8.111)

The quantity
ηA = amin²/4σA² (8.112)

is a parameter of the signal constellation, and is independent of any scaling of that constellation. If we keep SNR constant, then Pe decreases as the dimensionality 2BT increases. However, in order to keep SNR constant for a fixed N0, Ps has to be increased in proportion to B. If Ps is kept fixed, then Pe is independent of B as stated earlier.

The bandwidth independence of Pe for fixed signal power Ps presumes a matched filter in the receiver. We saw in Section 8.4.3 that the matched filter maximizes the SNR at the slicer, and thus a different receive filter will inevitably result in a lower SNR and higher Pe. With spread spectrum, the use of a different receive filter can be disastrous. As 2BT increases, there are an increasing number of waveforms that are bandlimited to B, approximately time limited to T, and orthogonal to the actual pulse h(t). If we happen to use a filter matched to one of these, the signal component at the receive filter output will be zero! This same observation also explains why spread spectrum has been used for the concealment of communications in military applications.

Generating Broadband Pulses

Spread spectrum requires ways to generate broadband pulses with controlled spectral properties. A whole family of pulse shapes h(t), each with the same amplitude spectrum and different phase spectra, is conveniently generated using a chip waveform and spreading sequence. In this approach, the symbol interval T is divided into N sub-intervals, each of duration Tc = T/N. Within each sub-interval, a pulse-amplitude modulated time translate of a pulse hc(t) is transmitted. The translate of hc(t) is called a chip. The pulse h(t) is formed from a PAM modulation of the chips by some deterministic sequence {xm, 0 ≤ m ≤ N − 1}, called the spreading sequence,
$h(t) = \sum_{m=0}^{N-1} x_m h_c(t - mT_c), \qquad H(j\omega) = H_c(j\omega) \sum_{m=0}^{N-1} x_m e^{-j\omega m T_c}.$  (8.113)

The bandwidth of the resulting pulse will equal the bandwidth of $h_c(t)$. Typically, we choose $h_c(t)$ to satisfy the Nyquist criterion at pulse rate $1/T_c$, which requires a minimum bandwidth of $B = 1/2T_c = N/2T$ Hz; this causes a bandwidth expansion by a factor of $N = T/T_c$. The spectrum can be controlled to some degree by the spreading sequence, with precisely $N$ degrees of freedom. The chip waveform and spreading sequence can also be used to generate orthogonal pulses (see Problem 8-11). For example, such orthogonal pulses are required for CDMA systems (Section 6.9).

ISI and Spread Spectrum

The preceding error probability calculation presumed no ISI. In fact, spread spectrum affords a degree of immunity to ISI. To understand this, we need to consider the pulse shape at the output of a receiver matched filter, and then the effect of channel dispersion.

Keeping the symbol interval $T$ fixed and increasing the bandwidth $B$, the ISI in the transmit pulse can be reduced. For a large $2BT$, a pulse bandlimited to $B$ can be largely confined to an interval $T$, in the sense that a diminishing fraction of the pulse energy falls outside that interval as $2BT$ increases. (In fact, approximately $2BT$ orthogonal pulses can satisfy this condition simultaneously.) When $2BT$ is near unity, even a single pulse cannot come close to being time-limited to $T$. Thus, the transmit pulse can come much closer to being time-limited in a spread-spectrum system.

However, the receive pulse is affected by the channel; furthermore, we are interested in the ISI at the matched filter output rather than the channel output. (Recall that the matched filter is crucial to the operation of a spread-spectrum system because of its power to suppress in-band noise.) Assume for the moment that the channel is ideal. The isolated-pulse output of the matched filter is then the pulse autocorrelation function, $\rho_h(t)$.
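The chip-and-spreading-sequence construction in (8.113) is easy to sketch numerically. The sketch below makes two illustrative choices not taken from the text: a rectangular chip (a stand-in for the ideal lowpass chip discussed shortly) and a length-7 maximal-length (m-)sequence as $\{x_m\}$. It also previews the constant-magnitude DFT property that (8.116) and (8.117) below ask of the spreading sequence.

```python
import numpy as np

# Chips per symbol N = 7 (the bandwidth expansion T/Tc), with a length-7
# m-sequence from the LFSR x^3 + x + 1 as the spreading sequence {x_m}.
bits, state = [], [1, 1, 1]          # arbitrary nonzero seed
for _ in range(7):
    bits.append(state[-1])
    fb = state[-1] ^ state[-3]       # feedback taps of x^3 + x + 1
    state = [fb] + state[:-1]
x = 1 - 2 * np.array(bits)           # map 0/1 -> +1/-1, so |x_m| = 1

# h(t) = sum_m x_m * h_c(t - m*Tc) in sampled form, rectangular chip h_c.
samples_per_chip = 16
chip = np.ones(samples_per_chip)
h = np.concatenate([xm * chip for xm in x])

# The DFT of an m-sequence has nearly constant magnitude, which is what
# makes |H(jw)| track |H_c(jw)|: here |X_0|^2 = 1 and |X_n|^2 = N + 1 = 8
# for n != 0 (a standard m-sequence autocorrelation property).
mag2 = np.abs(np.fft.fft(x)) ** 2
print(h.size, np.round(mag2, 6))
```

The near-flat DFT is the discrete analog of requiring the sum in (8.113) to have constant magnitude over frequency.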
The Fourier transform of that isolated pulse is $|H(j\omega)|^2$, which by definition has a wide bandwidth $B$.

Example 8-12. Suppose that $|H(j\omega)|^2$ is constant over the bandwidth $B$ and zero elsewhere. If we normalize the energy of $h(t)$ to unity, then $|H(j\omega)|^2 = 1/2B$. The isolated pulse at the matched filter output is

$\rho_h(t) = \mathrm{sinc}(2\pi B t).$  (8.114)

As $B$ increases, the energy of this pulse concentrates in a shorter time duration. Furthermore, if $2BT$ is an integer, $\rho_h(t)$ always obeys the Nyquist criterion,

$\rho_h(kT) = \mathrm{sinc}(2\pi BT\, k) = \delta_k.$  (8.115)

□

This simple example illustrates two important points:

• For an ideal channel, the isolated pulse at the output of the matched filter depends only on the magnitude spectrum of $h(t)$, and not on the phase spectrum. Even though we have specified the magnitude spectrum in Example 8-12, there remains flexibility in choosing the phase.

• The time duration of the isolated pulse at the output of the matched filter can be much shorter than the symbol interval, even though $h(t)$ completely fills the symbol interval. The greater $B$, the shorter this duration can be.

Pulses of the form of (8.113) can be designed to have a narrow autocorrelation function. The pulse autocorrelation (the isolated pulse at the matched filter output) will conform to Example 8-12 if two sufficient (but not necessary) conditions are satisfied:

• The chip pulse is $h_c(t) = \mathrm{sinc}(\pi t / T_c)$, an ideal lowpass pulse with bandwidth $B = 1/2T_c$.

• The sequence $\{x_m\}$ is chosen to satisfy

$\left| \sum_{m=0}^{N-1} x_m e^{-j\omega m T_c} \right|^2 = 1 \quad \text{for all } \omega.$  (8.116)

Example 8-13. A trivial case is $x_k = \delta_{k-L}$ for some $0 \le L \le N-1$. Regardless of $L$, (8.116) is satisfied. The choice of $L$ affects the phase spectrum of $h(t)$, but not the magnitude spectrum. The problem with this choice is that the peak signal is very large in relation to the average, creating practical difficulties on most channels, and especially on radio channels. □

Example 8-14.
We can increase the average signal energy for a given peak signal by choosing $|x_m|^2 = 1/N$, $0 \le m \le N-1$. For such choices, (8.116) cannot be exactly satisfied. However, a good approximate approach is to force (8.116) to be satisfied at uniformly spaced frequencies, with spacing $2\pi/NT_c$,

$\left| \sum_{m=0}^{N-1} x_m e^{-j2\pi mn/N} \right|^2 = 1, \qquad 0 \le n \le N-1.$  (8.117)

This is the condition that the DFT of $\{x_m\}$ have constant magnitude. □

It is possible to come very close to satisfying the two conditions (constant $|x_m|$ and a constant-magnitude DFT) by making $\{x_m\}$ a maximal-length shift register sequence (this will be shown in Chapter 12). The result is called direct-sequence spread spectrum. Spreading codes can also be used to design orthogonal multipulse signal sets (see Problem 8-11). Unlike the pulse sets designed in Chapter 6, these can overlap one another completely in frequency.

Having established a method of designing a broadband $h(t)$ with a very narrow autocorrelation $\rho_h(t)$, the next question is the effect of channel dispersion. Assuming the channel has impulse response $b(t)$, the output of the matched filter becomes

$h(t) * b(t) * h^*(-t) = \rho_h(t) * b(t).$  (8.118)

As before, the phase spectrum of $h(t)$ does not matter. The non-ideal channel increases the time duration of the matched filter output. However, since $\rho_h(t)$ can be kept very narrow when $B$ is large, $\rho_h(t) * b(t)$ will have time duration approximately equal to the duration of $b(t)$. As long as the duration of $b(t)$ is smaller than the symbol interval $T$, the channel dispersion will not have a significant effect.

Example 8-15. Spread spectrum is often used on radio channels, which suffer from multipath distortion (Chapter 5). Suppose we take a two-path model,

$b(t) = \delta(t) + \alpha \cdot \delta(t - \tau),$  (8.119)

where $\tau$ is the relative delay of the second path. For the $\rho_h(t)$ of Example 8-12, the isolated pulse output of the matched filter with this dispersive channel will be

$f(t) = b(t) * \rho_h(t) = \mathrm{sinc}(2\pi B t) + \alpha \cdot \mathrm{sinc}(2\pi B (t - \tau)).$  (8.120)
The symbol-rate samples of this isolated pulse are then

$f(kT) = \delta_k + \alpha \cdot \mathrm{sinc}(2\pi B (kT - \tau)).$  (8.121)

As $B$ gets large, assuming that $|\tau| < T$, the ISI gets small. This is illustrated in Figure 8-5. For $2BT = 4$, or 300% excess bandwidth, the multipath distortion sampled at the symbol interval can still be fairly large at the output of the matched filter. For $2BT = 128$, even a half-symbol multipath spread results in very small ISI, because the basic pulse shape at the matched filter output decays so rapidly. □

Figure 8-5. The isolated pulses at the output of a matched filter for a two-path multipath channel with $\tau = 0.4T$ and $\alpha = 0.5$. Two bandwidth expansions are shown: $2BT = 4$ and $2BT = 128$. Note that the ISI will be small for $2BT = 128$ as long as the delay spread $\tau$ is a little less than the symbol interval $T$, but for $2BT = 4$ the ISI will be significant even for very small $\tau$.

This example illustrates the desirable effect of bandwidth expansion in terms of minimizing the effect of channel dispersion on the isolated pulse at the output of the matched filter. However, two caveats are in order:

• Spread spectrum with large $2BT$ is not immune to ISI; for example, consider what happens when $\tau = T$ in Example 8-15. In fact, if the multipath delay spread is $\tau = T$, ISI will be a problem no matter how large $2BT$ may be! Spread spectrum successfully mitigates channel dispersion only where the time-delay spread is smaller than a symbol interval.

• Figure 8-5 illustrates that the timing recovery must be increasingly accurate as $2BT$ increases. Fortunately, the broader bandwidth of the signal is helpful in increasing the timing recovery accuracy as well.

Spread Spectrum and Jamming or Interference

We have shown that a bandwidth increase has no impact on the probability of error on white Gaussian noise channels, but that it does help mitigate the effects of ISI.
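The symbol-rate samples in (8.121) are easy to evaluate numerically. The sketch below reproduces the Figure 8-5 parameters ($\tau = 0.4T$, $\alpha = 0.5$) and compares the worst-case ISI sample for the two bandwidth expansions; note that `np.sinc(u)` is $\sin(\pi u)/(\pi u)$, so the text's $\mathrm{sinc}(2\pi B t)$ is `np.sinc(2*B*t)`.

```python
import numpy as np

def isi_samples(two_BT, alpha=0.5, tau_over_T=0.4, kmax=8):
    """Symbol-rate samples f(kT) of (8.121) for the two-path channel (8.119)."""
    k = np.arange(-kmax, kmax + 1)
    f = (k == 0).astype(float) + alpha * np.sinc(two_BT * (k - tau_over_T))
    return k, f

k, f4 = isi_samples(two_BT=4)
_, f128 = isi_samples(two_BT=128)
worst4 = np.max(np.abs(f4[k != 0]))      # largest ISI tap at 2BT = 4
worst128 = np.max(np.abs(f128[k != 0]))  # largest ISI tap at 2BT = 128
print(worst4, worst128)
```

The largest off-center sample shrinks by more than an order of magnitude in going from $2BT = 4$ to $2BT = 128$, matching the qualitative message of Figure 8-5.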
There are several other benefits to increasing bandwidth, including countering jamming and interference in military communications, hiding the communications from intruders, coexisting in the same spectrum with other uses, and multiple access (separating users sharing a common communications medium, Section 6.9 and Chapter 16). To illustrate these other advantages, we will consider one of them: immunity to jamming or broadband interference.

Assume that the noise on the channel is generated by a jammer. Jamming is a deliberate attempt to disrupt communication by generating a broadband interference signal. In practice, the jammer is limited in the power it can generate. The jammer generates bandlimited white noise with power $P_J$ over the signal bandwidth $B$, with spectral density $N_0 = P_J / 2B$. Since the jamming signal is white over the signal bandwidth, from the perspective of the receiver the error probability is the same as for white Gaussian noise (the presence or absence of out-of-band noise is inconsequential). The receive SNR is now

$\mathrm{SNR} = \frac{P_s}{2BN_0} = \frac{P_s}{P_J},$  (8.122)

independent of bandwidth. The fact that the SNR is bandwidth-independent has profound implications, since from (8.111), $P_e$ is now strongly dependent on the dimensionality $2BT$. In fact, as $2BT$ increases by expanding $B$, $P_e$ decreases. For this reason, $2BT$ is called the processing gain.

Example 8-16. If the processing gain is $2BT = 10^3$ (30 dB), the jammer power is effectively suppressed by 30 dB. That is, in going from $2BT = 1$ to $2BT = 10^3$ by expanding the bandwidth by a factor of 1000, 30 dB greater jammer power $P_J$ can be tolerated with the same error probability. □

The processing gain can also be interpreted in signal space. The pulse $h(t)$ defines a one-dimensional subspace, and our assumption is that the jammer does not know the direction of this subspace.
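A quick numeric sketch of the processing-gain effect, combining (8.111) with (8.122). The operating point below is an illustrative choice, not from the text: binary antipodal signaling ($\eta_A = 1$, $K = 1$) against a jammer whose power is about 18 dB above the signal's.

```python
import math

def Q(x):
    """Gaussian tail probability via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# Against a power-limited broadband jammer, SNR = Ps/PJ from (8.122) does
# not change with bandwidth, so in Pe ~ K*Q(sqrt(2BT * eta_A * SNR)) the
# processing gain 2BT multiplies the effective SNR directly.
snr = 0.016                               # Ps/PJ: jammer ~18 dB stronger
pe_no_spread = Q(math.sqrt(1 * snr))      # 2BT = 1
pe_spread = Q(math.sqrt(1000 * snr))      # 2BT = 1000: 30 dB processing gain
print(pe_no_spread, pe_spread)
```

Without spreading the link is essentially useless (error probability near 1/2); with 30 dB of processing gain the same jammer power is tolerated at a low error probability.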
Thus, the jammer must spread its power $P_J$ evenly over all $2BT$ dimensions of the subspace of signals bandlimited to $B$ Hz and time-limited to a symbol interval $T$ (this is equivalent to the jammer transmitting bandlimited white noise). The matched filter responds to the jammer noise in the direction of the signal only, and hence at the output of the matched filter the jammer power is reduced by $2BT$.

The foregoing presumes that the jammer cooperates, and transmits bandlimited white noise, or equivalently spreads its power evenly over all dimensions of the signal subspace. But clearly the jammer would do better to concentrate its power in the direction of the signal, because then there would be no processing gain! Increasing the signal bandwidth is beneficial only if the jammer does not know the direction of the signal, and therefore must spread its jamming power equally in all directions. If the jammer transmits a one-dimensional signal, the jammer power in the direction of the signal vector $h(t)$ can fall anywhere between 100% (no processing gain) and 0% (infinite processing gain).

The use of the term "jammer" implies a military connotation, but in commercial microwave radio systems an important consideration is co-channel interference (Section 5.4). This interference, for example between two satellites within the aperture of a single antenna, between a satellite and a terrestrial radio system, or among different users of a terrestrial cellular radio system, has similar characteristics to jamming. If the signal and interferer both spread their bandwidth, keeping their total powers the same, then a processing gain results. In fact, if we take steps to actively reduce interference, the interferer can avoid transmitting in the one dimension of the signal, resulting in an infinite processing gain! This is the principle behind the use of spread spectrum as a multiple access technique, as described in Section 6.9 and Chapter 16.

8.7.
CAPACITY AND MODULATION

There are a number of ways of comparing different modulation techniques. The appropriate method depends on the context. Some of the measures of interest when making comparisons include:

• The average transmitted power is limited on most media, whether by physical constraints or by regulatory fiat. On some media, peak transmitted power is also of concern. Either of these limitations, together with the attenuation of the medium, will limit the received signal power. The noise immunity of the modulation system is measured by the minimum received power, relative to the noise power, required to achieve a given error probability.

• The probability of error is normally the basic measure of the fidelity of the information transmission. Of primary concern in some contexts is the probability of bit error, and in other contexts the probability of block error, for some appropriate block size.

• The spectral efficiency is of great concern on bandwidth-constrained media, such as radio. The spectral efficiency is the ratio of two parameters: the information bit rate available to the user and the bandwidth required on the channel.

• A measure that has not been discussed yet is the potential advantage of using coding techniques (Chapters 13 and 14). This advantage is different for modulation formats that are otherwise comparable. Coding gain is usually defined as the decrease in received signal power that could be accommodated at the same error probability if coding were used in conjunction with the modulation system.

• An important criterion in practice, although one we do not emphasize in this book, is the implementation cost or design effort required to realize a given modulation system.

The variety of measures for comparison of modulation systems makes it difficult to define one "standard measure of comparison". For example, for two modulation techniques, we can set the transmitted powers equal and compare the uncoded probability of error.
However, the usefulness of this comparison will be compromised if the two modulation systems have different bandwidth requirements or provide different information bit rates for the same bandwidth. In this section, we first illustrate how to make simple comparisons of uncoded modulation systems, addressing specifically baseband and passband PAM. Following this, a more sophisticated approach (based on a "rate-normalized SNR") is developed. This approach allows modulation systems to be compared against the fundamental limits of information theory (Chapter 4), and against one another, in a way that is independent of bit rate or bandwidth requirements. The rate-normalized SNR takes into account most of the parameters mentioned (transmit power, noise power, and spectral efficiency) and summarizes them in a single universal error probability plot that further displays the available coding gain.

Comparisons will be made under the following assumptions:

• The channel is an ideal bandlimited channel with bandwidth $B$ and additive white Gaussian noise with power spectral density $N_0$.

• The average transmit power is constrained to be $P_s$. There is no peak power limitation.

• Symbol error probability adequately reflects the performance of the system. Moreover, the union bound is sufficiently accurate as an approximation to the error probability.

8.7.1. Error Probability of PAM

The simplest approach to comparing modulation systems is to calculate their error probability as a function of all the relevant parameters. A convenient approximate formula for the probability of symbol error (based on the union bound) is given by (8.111), which we repeat here,

$P_e \approx K \cdot Q\!\left(\sqrt{2BT \cdot \eta_A \cdot \mathrm{SNR}}\right).$
(8.123)

This formula applies to any PAM system, as long as the effect of ISI is ignored, and expresses $P_e$ in terms of the received SNR, the dimensionality $2BT$, and a parameter of the signal constellation, $\eta_A$. Often it is desired to express the probability in terms of the spectral efficiency, which is

$\nu = \frac{\log_2 M}{BT},$  (8.124)

where $M$ is the number of points in the signal constellation. It is instructive to determine $\eta_A$ for two standard constellation designs.

Example 8-17. For a baseband PAM constellation with $M$ equally spaced points, let $M$ be even and let the constellation consist of the odd integers, $\mathcal{A} = \{2m-1,\ 1-M/2 \le m \le M/2\}$. Thus, the minimum distance is $a_{\min} = 2$. Assuming all the points in the constellation are equally probable, the variance is (calculating the average squared value of only the positive points, taking advantage of symmetry)

$\sigma_A^2 = \frac{2}{M} \sum_{m=1}^{M/2} (2m-1)^2 = \frac{M^2 - 1}{3}.$  (8.125)

Substituting into (8.112), and using (8.124),

$\eta_A = \frac{3}{M^2 - 1} = \frac{3}{2^{2BT\nu} - 1}.$  (8.126)

□

Example 8-18. For a passband PAM $L \times L$ square constellation with $M = L^2$ points, and again using odd integers on each axis, the minimum distance is again $a_{\min} = 2$, and the variance for equally probable points can be shown to be $\sigma_A^2 = 2(M-1)/3$. Substituting into (8.112) and (8.124),

$\eta_A = \frac{3}{2(M-1)} = \frac{3}{2(2^{BT\nu} - 1)}.$  (8.127)

□

In both the baseband and passband cases, as the number of points in the constellation increases, or equivalently as the spectral efficiency increases, $\eta_A$ gets smaller, and, as expected from (8.123), we must increase the SNR to hold $P_e$ fixed.

It is useful to determine $P_e$ when the highest possible symbol rate consistent with the Nyquist criterion is used, since this will maximize the SNR and minimize $P_e$. For the baseband case, the maximum symbol rate is $1/T = 2B$, and hence $2BT = 1$. For the passband case, a passband channel with bandwidth $B$ corresponds to a complex baseband channel with bandwidth $B/2$, and thus the maximum symbol rate is $1/T = B$, or $2BT = 2$.
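The variance and $\eta_A$ formulas in Examples 8-17 and 8-18 can be checked by brute force over the constellation points. The sizes $M = 8$ and $L = 4$ below are arbitrary illustrative choices.

```python
import itertools

def eta_A(points):
    """eta_A = a_min^2 / (4 * sigma_A^2) of (8.112) for equiprobable, zero-mean points."""
    var = sum(x * x + y * y for x, y in points) / len(points)
    dmin2 = min((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in itertools.combinations(points, 2))
    return dmin2 / (4 * var)

# Baseband M-PAM on the odd integers (Example 8-17), M = 8:
# expect sigma_A^2 = (M^2 - 1)/3 and eta_A = 3/(M^2 - 1).
M = 8
pam = [(2 * m - 1, 0) for m in range(1 - M // 2, M // 2 + 1)]

# Passband L x L square QAM on odd-integer coordinates (Example 8-18),
# M = L^2 = 16: expect eta_A = 3/(2*(M - 1)).
L = 4
qam = [(2 * i - L + 1, 2 * j - L + 1) for i in range(L) for j in range(L)]

print(eta_A(pam), 3 / (M ** 2 - 1))
print(eta_A(qam), 3 / (2 * (L ** 2 - 1)))
```

Both brute-force values agree with the closed forms (8.126) and (8.127).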
Expressed in terms of SNR and $\nu$, the resulting $P_e$ is the same for both cases,

$P_e \approx K \cdot Q\!\left(\sqrt{\frac{3 \cdot \mathrm{SNR}}{2^\nu - 1}}\right).$  (8.128)

This is a "universal" formula that applies to both baseband and passband PAM systems with square QAM constellations and the maximum possible symbol rate. Using (8.123), the error probability of a variety of baseband and passband constellations, not just square QAM, can be accurately estimated and compared. A typical comparison would set the arguments of $Q(\cdot)$ equal for two constellations, to force their $P_e$'s to be approximately equivalent, and then solve for the relative SNRs.

Example 8-19. Both a baseband binary antipodal constellation and a passband square 4-QAM constellation with the maximum feasible symbol rate obey (8.128). They also have the same spectral efficiency, 2 bits/sec-Hz. Thus, these two systems will have, to an accurate approximation, the same $P_e$ if their SNRs are the same. (Taking into account the error coefficients, $K = 1$ for binary antipodal and $K = 2$ for 4-PSK, so in actuality the binary antipodal constellation will have half the error rate.) Intuitively, this can be interpreted as follows. For the same minimum distance (and hence $P_e$), the passband constellation requires 3 dB greater transmit power, since the radius of the constellation points is $\sqrt{2}$ larger. However, the passband bandwidth is also twice as great for the same symbol rate, letting in 3 dB more noise. Thus, at a fixed $P_e$ the net passband SNR is the same as in the baseband case, since both the signal and the noise are 3 dB larger. □

The comparison in Example 8-19 is straightforward because the two modulation systems being compared have the same spectral efficiency (2 bits/sec-Hz). The passband system requires twice the bandwidth, but also carries twice as many information bits per symbol. However, if two systems with different spectral efficiencies are compared, things get more complicated.

Example 8-20.
If we use the same two constellations as in Example 8-19, but make them both passband, we are comparing 2-PSK (binary antipodal) against 4-PSK (square 4-QAM). For a 2-PSK passband constellation, $2BT = 2$ and $\eta_A = 1$, and thus $P_e = Q(\sqrt{2 \cdot \mathrm{SNR}})$. Setting the arguments of $Q(\cdot)$ equal,

$2 \cdot \mathrm{SNR}_{2\text{-PSK}} = \frac{3 \cdot \mathrm{SNR}_{4\text{-PSK}}}{2^2 - 1},$  (8.129)

or $2 \cdot \mathrm{SNR}_{2\text{-PSK}} = \mathrm{SNR}_{4\text{-PSK}}$. We conclude that 4-PSK requires a 3 dB higher SNR for the same error probability. (Again, this ignores the effect of the error coefficient, which will be $K = 1$ for 2-PSK and $K = 2$ for 4-PSK.) The bandwidth requirements, and hence the noise powers, are the same for both systems. In order to maintain the same minimum distance, the transmitted power for 4-PSK has to be 3 dB higher. Since the noise is the same, the SNR also has to be 3 dB larger. □

Example 8-20 indicates that 2-PSK is "better" than 4-PSK, in the sense that at the same symbol rate it operates at the same approximate error probability with a 3 dB lower SNR. This could be misleading, however, because for the same symbol rate, the 2-PSK system is achieving only half the spectral efficiency (1 bit/sec-Hz vs. 2 bits/sec-Hz). In order to achieve the same bit rate for 2-PSK and 4-PSK, the symbol rate of the 2-PSK system would have to be twice as large, which would require twice the channel bandwidth and increase the noise by 3 dB. Thus, a 2-PSK system and a 4-PSK system operating at the same bit rate would require the same SNR to achieve the same $P_e$.

The complications and subtleties of comparing modulation systems that have different bit rates or spectral efficiencies, as in Example 8-20, demonstrate that a better approach is needed. The universal formula for $P_e$ in (8.128) gives a hint as to a better approach, since it shows that, when expressed in terms of spectral efficiency, all baseband and passband square QAM constellations are equivalent.
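The 3 dB tradeoff derived in Example 8-20 can be confirmed numerically. The operating point below is an arbitrary illustrative choice, and the error coefficients are ignored, as in the text.

```python
import math

def Q(x):
    """Gaussian tail probability via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# 2-PSK passband: 2BT = 2, eta_A = 1, so the argument of Q is sqrt(2*SNR).
# 4-PSK is square 4-QAM, so (8.128) applies with nu = 2: argument sqrt(SNR).
snr_2psk = 8.0                      # illustrative operating point
snr_4psk = 2 * snr_2psk             # the 3 dB penalty derived in (8.129)
pe_2psk = Q(math.sqrt(2 * snr_2psk))
pe_4psk = Q(math.sqrt(3 * snr_4psk / (2 ** 2 - 1)))
print(pe_2psk, pe_4psk)
```

The two error probabilities come out equal, confirming that 4-PSK at twice the SNR matches 2-PSK at the same symbol rate.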
Another simplification of this formula is that $P_e$ does not depend on SNR or $\nu$ individually, but only through the ratio $\mathrm{SNR}/(2^\nu - 1)$. It is helpful, therefore, to define a new parameter $\mathrm{SNR}_{\text{norm}}$, called the rate-normalized SNR,

$\mathrm{SNR}_{\text{norm}} = \frac{\mathrm{SNR}}{2^\nu - 1}.$  (8.130)

Now $P_e$ is a function of only two parameters, $K$ and $\mathrm{SNR}_{\text{norm}}$,

$P_e = K \cdot Q\!\left(\sqrt{3 \cdot \mathrm{SNR}_{\text{norm}}}\right).$  (8.131)

Two square baseband or passband QAM constellations with the maximum symbol rate and the same $\mathrm{SNR}_{\text{norm}}$ will have approximately the same $P_e$. The utility of (8.131) is that it expresses very succinctly the $P_e$ of a variety of PAM systems, including baseband, passband, and different bit rates and symbol rates. The simplicity of this result leads us to speculate that there may be something fundamental about the tradeoff between SNR and $\nu$ expressed in $\mathrm{SNR}_{\text{norm}}$. Indeed there is, although we will have to take a diversion, calculating the capacity of the channel, to uncover it.

8.7.2. Capacity of the Ideal Gaussian Channel

Two approaches to comparing modulation systems operating over the same channel are, first, to compare them directly, or second, to compare them against the fundamental limits of channel capacity (Chapter 4). The channel capacity tells us the maximum bit rate (or equivalently spectral efficiency) that can be obtained on the underlying channel. Comparing against capacity has a couple of benefits. First, it gives an indirect comparison of the systems against one another. Second, the comparison against fundamental limits gives us a valuable benchmark, because it indicates the maximum possible benefit of doing channel error-correction coding (Chapters 13 and 14).

The capacity of an ideal bandlimited channel with additive white Gaussian noise will now be derived, and subsequently compared against the spectral efficiency of several modulation techniques operating over this same channel.
A general way to compare a given modulation system against this capacity, based on the rate-normalized SNR already encountered for QAM modulation, will be uncovered.

The frequency response of an ideal channel with bandwidth $B$ Hz is shown in Figure 8-6 for two cases, baseband and passband. The convention is that the bandwidth of the channel is $B$ in both cases. This implies that the baseband channel is equivalent to the passband channel for the specific carrier frequency $f_c = B/2$. Intuitively, we would not expect the carrier frequency to affect the capacity of the channel, since the noise is white, and thus we expect the capacities of the two channels in Figure 8-6 to be identical. In fact, that is the case.

We are interested in calculating the capacity $C$ of these two channels, where capacity has the units of bits/sec. Thus, $C$ can be directly compared to the bit rate achieved by a given modulation system. The capacity is calculated under a transmitted power constraint, so that the transmitted signal is constrained to have power $P_s$. Also of interest is the spectral efficiency $\nu$, which has the units of bits/sec-Hz. We define $\nu_c$ as the spectral efficiency of a system operating at the limits of capacity, and thus

$\nu_c = C/B.$  (8.132)

Figure 8-6. An ideal bandlimited channel with bandwidth $B$. (a) The baseband case, and (b) the passband case.

The channel coding theorem (Chapter 4) says that a certain spectral efficiency $\nu_c$ can be achieved with transmit power $P_s$, in the sense that an arbitrarily small probability of error can be achieved by some modulation and coding scheme. Further, it says that if you try to achieve a higher $\nu$ at this $P_s$, the probability of error is necessarily bounded away from zero for all modulation and coding schemes.
The tradeoff between $\nu_c$ and $P_s$ as quantified by the channel capacity theorem is thus a fundamental limit against which all modulation systems can be compared.

The capacity of the ideal channels in Figure 8-6 with additive white Gaussian noise is simple to calculate using the capacity of a vector Gaussian channel (Chapter 4), together with the results of Chapter 7. We will do the calculation for the baseband case, since it is slightly easier, although the passband case is also straightforward (Problem 8-14). Utilizing the Landau-Pollak theorem (Section 7.4) and the orthogonal expansion of the signal subspace of Section 7.1, any transmitted signal with bandwidth $B$ Hz can be approximately represented in a time interval of length $T$ by $2BT$ orthonormal waveforms, with increasing accuracy as $T \to \infty$. From Section 8.4, the minimum-distance receiver will generate a set of $2BT$ decision variables, by (8.83),

$\mathbf{C} = \mathbf{S} + \mathbf{N},$  (8.133)

where $\mathbf{S}$ is the $2BT$-dimensional vector of signal components, and $\mathbf{N}$ is a vector of independent Gaussian noise components, each component having variance $\sigma^2 = N_0$. In this case, since the signal, noise, and channel are real-valued, all vectors in (8.133) are real-valued, and we use $\mathbf{N}$ for the noise vector rather than $\mathbf{Z}$.

The capacity of the channel (8.133), consisting of an $N$-dimensional real-valued vector signal in real-valued vector Gaussian noise, with total signal variance $\sigma_S^2$ and noise variance $\sigma^2$ per dimension, is given by (4.36),

$C_{VG} = \frac{N}{2} \log_2(1 + \mathrm{SNR}), \qquad \mathrm{SNR} = \frac{\sigma_S^2}{N \sigma^2}.$  (8.134)

This is the capacity for a single use of the vector channel, or equivalently the capacity of the continuous-time channel over a time interval of length $T$. The signal-to-noise ratio SNR is defined as the ratio of the total signal variance to the total noise variance. The constraint that the transmitted power is $P_s$ implies that the average transmitted energy in time interval $T$ must be $TP_s$, and thus

$E\!\left[\int_0^T S^2(t)\,dt\right] = \sum_{n=1}^{N} E[S_n^2] = \sigma_S^2 = TP_s.$  (8.135)
Defining $C_T$ as the capacity for time interval $T$, with this power constraint,

$C_T = BT \cdot \log_2(1 + \mathrm{SNR}) \text{ bits}, \qquad \mathrm{SNR} = \frac{TP_s}{2BT \cdot N_0} = \frac{P_s}{2BN_0}.$  (8.136)

In this case, SNR can again be interpreted as a signal-to-noise ratio, since the numerator $P_s$ is the total signal power at the channel output, and the denominator is the total noise power within the signal bandwidth (the noise spectral density $N_0$ times the total bandwidth, which is $2B$ for positive and negative frequencies). The capacity per unit time is

$C = C_T / T = B \cdot \log_2(1 + \mathrm{SNR}) \text{ bits/sec}.$  (8.137)

This expression for the capacity of a bandlimited channel is known as the Shannon limit. Alternative proofs and interpretations of this result are given in [7,8].

Fundamental Limit in Spectral Efficiency

The spectral efficiency is the bit rate per unit time (capacity) divided by the bandwidth, and thus the maximum spectral efficiency predicted by the channel capacity is

$\nu_c = C/B = \log_2(1 + \mathrm{SNR}) \text{ bits/sec-Hz}.$  (8.138)

If $\nu$ is the spectral efficiency of any practical modulation scheme operating at signal-to-noise ratio SNR, then we must have $\nu \le \nu_c$.

Rate-Normalized Signal-to-Noise Ratio

Rewriting (8.138) in a different way, if a modulation system is operating at the limits of capacity with signal-to-noise ratio SNR and spectral efficiency $\nu_c$, then

$\frac{\mathrm{SNR}}{2^{\nu_c} - 1} = 1.$  (8.139)

This relation has a striking similarity to $\mathrm{SNR}_{\text{norm}}$ defined in (8.130), and $\mathrm{SNR}_{\text{norm}}$ was shown in (8.131) to largely determine $P_e$ for a square baseband or passband QAM constellation. The only difference is that $\nu$, the spectral efficiency of the PAM modulator, is substituted for $\nu_c$, the spectral efficiency at the capacity limit. The combination of (8.131) and (8.139) suggests that $\mathrm{SNR}_{\text{norm}}$ is a fundamental and useful parameter of a modulation system [9]. In fact, since $\nu \le \nu_c$ for a system operating short of the capacity limit,

$\mathrm{SNR}_{\text{norm}} = \frac{\mathrm{SNR}}{2^{\nu} - 1} \ge \frac{\mathrm{SNR}}{2^{\nu_c} - 1} = 1.$  (8.140)
This is another way of expressing the Shannon limit on the operation of a given modulation system: if the modulation system operates at signal-to-noise ratio SNR with spectral efficiency $\nu$, and the corresponding $\mathrm{SNR}_{\text{norm}} > 1$, then there is nothing fundamental preventing that system from having an arbitrarily small $P_e$. (If it has a large $P_e$, that is only because it is falling short of fundamental limits.) Conversely, if $\mathrm{SNR}_{\text{norm}} < 1$, the $P_e$ of the system is necessarily bounded away from zero, because the parameters of the system (SNR and $\nu$) violate the Shannon limit. In this case, the capacity theorem does not prevent $P_e$ from being small, but it does guarantee that there is nothing we could do (like adding error-control coding) to make $P_e$ arbitrarily small, short of changing the parameters SNR and/or $\nu$. Thus, $\mathrm{SNR}_{\text{norm}} > 1$ is the region where we want to operate on an ideal bandlimited white Gaussian noise channel.

It is useful to plot the relationship between SNR and $\mathrm{SNR}_{\text{norm}}$, with both expressed in dB, as in Figure 8-7. Taking the logarithm of (8.130),

$\mathrm{SNR}_{\text{norm,dB}} = \mathrm{SNR}_{\text{dB}} - \Delta\mathrm{SNR}_{\text{dB}}, \qquad \Delta\mathrm{SNR}_{\text{dB}} = 10 \cdot \log_{10}(2^\nu - 1).$  (8.141)

At large spectral efficiencies, the unity term can be ignored, and $\Delta\mathrm{SNR}_{\text{dB}}$ approaches an asymptote of $\Delta\mathrm{SNR}_{\text{dB}} \approx 3\nu$. Thus, for a hypothetical high-spectral-efficiency system operating at the limits of capacity, 3 dB of additional SNR is required to increase the spectral efficiency by one bit/sec-Hz. At low spectral efficiencies, a larger increase in SNR is required. Remarkably, PAM systems with square QAM constellations operating at a constant $P_e > 0$ obey exactly the same tradeoff between $\nu$ and SNR, as indicated by (8.131). (Although the tradeoff is the same, they will require a higher absolute SNR to achieve a reasonable $P_e$, as will be seen shortly.)
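The capacity and gap formulas above are simple enough to check numerically. The bandwidth and SNR below are illustrative values, not from the text; the SNR is chosen so that $\nu_c$ comes out to a round number.

```python
import math

# Shannon limit quantities for the ideal bandlimited AWGN channel:
# C = B*log2(1 + SNR) from (8.137), nu_c = C/B from (8.132) and (8.138),
# and the gap Delta-SNR_dB = 10*log10(2^nu - 1) from (8.141).
def capacity_bits_per_sec(B, snr):
    return B * math.log2(1 + snr)

def delta_snr_db(nu):
    return 10 * math.log10(2 ** nu - 1)

B, snr = 4000.0, 255.0          # illustrative: 4 kHz channel, SNR ~ 24 dB
C = capacity_bits_per_sec(B, snr)
nu_c = C / B                    # spectral efficiency at capacity
print(C, nu_c)                  # 32000.0 bits/sec, nu_c = 8.0 bits/sec-Hz

# At capacity, SNR/(2^nu_c - 1) = 1 exactly, per (8.139); and at high nu
# the gap grows by about 3 dB per additional bit/sec-Hz.
print(snr / (2 ** nu_c - 1))
print(delta_snr_db(6) - delta_snr_db(5))
```

The last line comes out near 3 dB, matching the asymptote $\Delta\mathrm{SNR}_{\text{dB}} \approx 3\nu$.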
For any modulation system, the gap (usually expressed in dB) between $\mathrm{SNR}_{\text{norm}}$ and unity (the minimum value of $\mathrm{SNR}_{\text{norm}}$) is a measure of how far short of fundamental limits the modulation scheme falls. Specifically, it is a measure of how much the transmitted power (or equivalently the SNR) must be increased to achieve a given spectral efficiency, relative to the lower bound on transmitted power (or SNR) predicted by capacity. The usefulness of $\mathrm{SNR}_{\text{norm}}$ is that it summarizes SNR and $\nu$ in a single parameter, and the Shannon limit is very simply expressed in terms of $\mathrm{SNR}_{\text{norm}}$.

Figure 8-7. The difference between SNR and $\mathrm{SNR}_{\text{norm}}$ in dB plotted against spectral efficiency. The "asymptote" is the same relationship ignoring the "1" term.

8.7.3. Using Normalized SNR in Comparisons

While $\mathrm{SNR}_{\text{norm}} = 1$ corresponds to a hypothetical system operating at capacity, all practical modulation schemes, such as those considered in Chapters 6 and 7, will have a non-zero error probability for all values of $\mathrm{SNR}_{\text{norm}}$. A useful way to characterize $P_e$ is to parameterize it on $\mathrm{SNR}_{\text{norm}}$, because $\mathrm{SNR}_{\text{norm}}$ expresses both SNR and
Alternatively, if the two systems are operating at the same SNR, then the superior system will operate at a spectral efficiency that is one bit/sec-Hz higher (asymptotically at high v).

• Comparisons can be made between a modulation system and fundamental limits. At a given Pe and v, the difference between SNR_norm and unity (usually expressed in dB) tells us how far the modulation system is operating from fundamental limits, in the sense that it requires a higher SNR at the same spectral efficiency (or achieves a lower v at the same SNR) than capacity would allow. This quantifies, for example, the ultimate potential benefit of adding error-correction coding to the system (Chapters 13 and 14).

• Reasonable comparisons can be made between modulation systems operating at different information bit rates and spectral efficiencies. As we saw in Example 8-20, such a comparison can be conceptually difficult, and yet is of practical interest. For example, we might want to compare two schemes utilizing the same bandwidth but having a different number of points in the constellation (and hence different spectral efficiency). Comparing them each against the Shannon limit is an indirect way of comparing them against each other.

We are interested in a wide range of error probabilities (some applications are more demanding than others), and thus it is useful to plot the functional relationship between Pe and SNR_norm, and compare to capacity (SNR_norm = 1). This will now be illustrated for several modulation systems.

Baseband and Passband PAM

Earlier in this section, Pe was estimated for both baseband and passband QAM constellations. The result was (8.123), where the relevant constellation parameter is given by (8.126) (baseband case) and (8.127) (passband case). Expressing Pe in terms of SNR_norm rather than SNR, (8.123) can be rewritten as

Pe ≈ K·Q(√(γ_A·SNR_norm)),   (8.142)

where the constant γ_A is defined in (8.143). This assumes that the bandwidth on the channel is the minimum consistent with the Nyquist criterion. For the baseband case, from (8.126),

γ_A = a²_min·(2^(2v) − 1) / (4σ_A²),   (8.144)

and for the passband case, from (8.127),

γ_A = a²_min·(2^v − 1) / (2σ_A²).   (8.145)

Example 8-21. For square QAM constellations, as shown in (8.131), the remarkably simple result is that γ_A = 3. This holds for all cases where the number of points in the constellation is even (baseband case) or the square of an even number (passband case). □

For other PAM constellations, γ_A is a parameter of the constellation that is independent of scaling, but is a function of the geometry of the constellation. Remarkably, the Pe of any PAM constellation, across a wide range of SNRs, is accurately summarized in this single parameter γ_A. It can be determined directly from (8.144) or (8.145). The error coefficient K is also relevant, although much less important. We can plot Pe vs. SNR_norm under different conditions, and get a set of universal rate-normalized curves. First, in Figure 8-8, γ_A is held fixed (at γ_A = 2), and K is varied.

Figure 8-8. A plot of Pe vs. SNR_norm for passband PAM assuming γ_A = 2 and three typical values of K. This illustrates that K has a relatively minor effect on the error probability.

This set of curves has several interesting interpretations. First, it shows how large an SNR_norm is required to achieve a given error probability for these assumed parameters. As expected, the required SNR_norm increases as Pe gets smaller. The Shannon limit dictates that SNR_norm > 1, or 10·log10 SNR_norm > 0. Since the channel capacity theorem guarantees the feasibility of achieving any (arbitrarily small) error probability, it is theoretically possible to achieve any point on the 0 dB SNR_norm axis; conversely, since SNR_norm must exceed unity for reliable operation, the probability of error will be theoretically bounded away from zero at any point to the left of the 0 dB SNR_norm axis.
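Curves of the kind plotted in Figure 8-8 follow directly from (8.142). A small sketch (using the standard Gaussian tail function via erfc; the values of γ_A, K, and the SNR points are illustrative) shows the behavior:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pe_union(snr_norm_db, gamma_a=3.0, K=4):
    """Union-bound estimate Pe ~ K*Q(sqrt(gamma_a*SNR_norm)), per (8.142)."""
    snr_norm = 10 ** (snr_norm_db / 10)
    return K * Q(math.sqrt(gamma_a * snr_norm))

for db in (0, 3, 6, 9):
    print(f"SNR_norm = {db} dB -> Pe ~ {pe_union(db):.1e}")
```

With γ_A = 3 (square QAM) the estimate falls to roughly 10^-6 near a 9 dB gap, consistent with the gap quoted in Example 8-22.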
In this sense, the 0 dB axis represents the limit on deliverable performance as dictated by the Shannon limit. At a given Pe, the horizontal distance between the 0 dB SNR_norm axis and the curve, labeled "SNR GAP to CAPACITY", represents the increase in SNR_norm required relative to capacity. Also, the horizontal distance between two curves represents the difference in SNR_norm required for two different signal constellations to achieve the same Pe. This gap can be made up in one of two ways: operate the system at a higher SNR, or at a lower v. By definition, the SNR gap to capacity goes to zero as SNR_norm → 1. What may be surprising is that Pe can be small (like 10^-1) at this crossover point, or even for SNR_norm < 1. Doesn't the channel capacity theorem rule out any useful operation for SNR_norm < 1? Two points should be made about this behavior. First, since the error probability is based on the union bound, it is generally not wise to trust these quantitative results at low SNR (high Pe), except for modulation schemes for which the union bound is exact (such as binary antipodal signaling). Second, although it would be tempting to assert that the channel capacity theorem tells us something specific about the error probability of any modulation scheme operating at SNR_norm < 1, in fact it only asserts that in this region the error probability is bounded away from zero. It does not tell us what that bound is. Thus, the channel capacity theorem does not rule out a non-zero error probability at the point where SNR_norm = 1. In Figure 8-8 the effect of K on SNR_norm is small, emphasizing that K has a relatively minor influence on Pe. The effect of γ_A is much more significant, as illustrated in Figure 8-9. The major factor distinguishing different signal constellations is γ_A. We will calculate γ_A for a couple of cases to illustrate this.

Example 8-22.
All rectangular QAM constellations are equivalent, in the sense that they require the same SNR_norm to achieve a given error probability. That tradeoff between SNR_norm and Pe is the γ_A = 3 curve in Figure 8-9. For example, at an error rate of Pe = 10^-6, the SNR gap to capacity is about 9 dB, independent of the size of the constellation. However, at a fixed Pe, square QAM constellations do require different unnormalized SNRs, since for the passband case

SNR = SNR_norm·(2^v − 1) = SNR_norm·(M − 1).   (8.146)

As M increases, the SNR must increase in proportion to M − 1 because of the need to increase the signal power to maintain the same minimum distance. Looking at it another way, as the spectral efficiency v increases, the SNR must be increased in proportion to (2^v − 1). □

Figure 8-9. A plot of Pe vs. SNR_norm for passband PAM, assuming K = 4 and different values of γ_A.

Example 8-23. For a PSK signal constellation, all the points fall on the unit circle, and thus σ_A² = 1 independent of the distribution of signal constellation points. It is straightforward to show that a_min = 2·sin(π/M), and thus

γ_A = 2(M − 1)·sin²(π/M).   (8.147)

In this case, γ_A is strongly dependent on M, in contrast to rectangular QAM. This dependence is plotted in Figure 8-10, where the largest γ_A is 3, the same as rectangular QAM, for M = 3 and M = 4. Thus, the SNR gap to capacity for PSK is the same for 3-PSK and 4-PSK as it is for rectangular QAM. The equivalence at M = 4 is obvious, since 4-PSK is in fact a square QAM constellation. Both 2-PSK (binary antipodal) and M-PSK for M > 4 are inferior to rectangular QAM in the sense that they require a larger SNR_norm to achieve the same Pe (higher SNR at the same v, or lower v at the same SNR).
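The M-dependence of γ_A for PSK in (8.147), and the implied SNR_norm penalty relative to the γ_A = 3 of square QAM, are easy to tabulate (a sketch; the constellation sizes are illustrative):

```python
import math

def gamma_a_psk(M):
    """gamma_A for M-PSK from (8.147): a_min = 2 sin(pi/M), sigma_A^2 = 1."""
    return 2 * (M - 1) * math.sin(math.pi / M) ** 2

for M in (2, 3, 4, 8, 16):
    penalty_db = 10 * math.log10(3 / gamma_a_psk(M))
    print(f"{M}-PSK: gamma_A = {gamma_a_psk(M):.3f}, "
          f"penalty vs. square QAM = {penalty_db:+.2f} dB")
```

The numbers confirm the discussion: γ_A peaks at 3 for M = 3 and M = 4, while 2-PSK (γ_A = 2) and 8-PSK (γ_A ≈ 2.05) carry nearly the same penalty, and the penalty grows rapidly beyond M = 8.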
In the case of 2-PSK, which is equivalent to binary antipodal signaling, the gap is larger than for QAM because it is a passband PAM system that fails to use the quadrature axis (we have shown previously that a baseband PAM binary antipodal constellation has γ_A = 3). The SNR gap to capacity for PSK increases rapidly at a given Pe as M increases. Intuitively, this is because PSK does not efficiently pack the circularly-shaped constellation with a regular grid of points, and thus suffers in spectral efficiency as compared to square QAM constellations. We can also plot Pe directly, as shown in Figure 8-11, for different M, taking into account the corresponding K. M = 4 has the smallest gap (equivalent to rectangular QAM), and M = 2 and M = 8 have roughly the same gap (because γ_A is about the same, as seen in Figure 8-10). Choosing large values of M for PSK results in significantly poorer performance. The penalty relative to square QAM can be quantified by equating the arguments of the Q-function in (8.142) for the two constellations, (8.148), to yield

SNR_norm,PSK / SNR_norm,QAM = 3 / (2(M − 1)·sin²(π/M)).   (8.149)

This relationship is plotted against M in dB in Figure 8-12: the penalty in SNR_norm for using PSK as a function of M. For all M except four, PSK requires a higher SNR (by the amount shown) to achieve a similar error probability. All values of M shown are squares of even integers, which are the only square QAM constellation sizes. □

Figure 8-12. The penalty in dB for PSK in comparison to QAM, vs. constellation size M (4, 16, 36, 64). The plot starts at M = 4 because this is the smallest QAM constellation, and only the M that are perfect squares are shown; the horizontal axis is not to scale.

Spread Spectrum

Our examples of the SNR gap to capacity for PAM thus far have presumed that the maximum feasible symbol rate in relation to the channel bandwidth is used. In spread spectrum, a much lower symbol rate is used, and the SNR gap to capacity will expand accordingly. We can quantify this effect as follows. Considering the passband case, substituting (8.127) into (8.123),

Pe ≈ K·Q(√(3·γ_ss·SNR_norm)),   (8.150)

where the additional factor γ_ss is given by (8.151). Since v is a function of BT, it is useful to express γ_ss in terms of M, the number of points in the constellation:

γ_ss = BT·(M^(1/BT) − 1) / (M − 1).   (8.152)

For the maximum symbol rate, BT = 1 and γ_ss = 1. More generally, however, γ_ss < 1, forcing SNR_norm to be larger for the same Pe and increasing the SNR gap to capacity. This implies that coding is more beneficial in spread spectrum systems.

Exercise 8-2. Show that as BT → ∞, γ_ss → (log_e M)/(M − 1). □

The effect of γ_ss is to increase the SNR gap to capacity. Asymptotically, the gap is increased by (M − 1)/log_e M for a signal constellation of size M. For M = 4, the smallest M for which the formula for γ_A is valid, the SNR gap to capacity is increased by 3/log_e 4 = 2.16 (3.3 dB). Penalizing spread spectrum in its SNR gap to capacity, although understandable in terms of its reduced spectral efficiency, is unfair when we realize that multiple spread spectrum signals can coexist within the same bandwidth, as in code-division multiple access with spreading pulses {c_n(t), 0 ≤ n ≤ N}. If these signals are bandlimited to B Hz for transmission over the baseband channel, and if the dimensionality N is relatively large, then these pulses can be largely confined to an interval of length T = N/2B.

8.8. QUANTUM NOISE IN OPTICAL SYSTEMS

For the idealized direct-detection receiver, when a pulse is transmitted, the number of received photoelectrons is Poisson-distributed: the probability of n photoelectrons is

p(n) = Λ^n·e^(−Λ) / n!.   (8.160)

Since the integral of power is energy,

Λ = E_b / (hν),   (8.161)

where E_b is the total received optical energy in the baud interval. Since hν is the energy of one photon, another interpretation of Λ is as the average number of photons arriving at the detector in one baud interval for a "one" bit. We can now see why quantum effects are negligible at microwave frequencies.
Since these frequencies are about five orders of magnitude smaller than optical frequencies, the energy per photon is five orders of magnitude smaller, and for a given received pulse energy the average number of photons is five orders of magnitude larger! Since the variance of the Poisson distribution in (8.160) is equal to the mean, the standard deviation is the square root of the mean. The "width" of the distribution, as defined by the standard deviation divided by the mean, approaches zero as the mean gets large. Thus, for a very large number of received photons, the width of the Poisson distribution approaches zero and the randomness due to quantum effects becomes negligible.

Returning to the optical case, for our idealized receiver no error can be made if no pulse is transmitted, since precisely zero photons will be received. Hence, the only error that is possible results if a pulse is transmitted and no photons are observed. From (8.160), the probability of error is therefore

Pe = 0.5·p(0) = 0.5·e^(−Λ),   (8.162)

where again Λ is the average number of observed photons when a pulse is transmitted, and the factor of 0.5 reflects the fact that no errors occur if no pulse is transmitted (we assume that input bits are equally likely). We can also write this in terms of the average number of arriving photons per bit M = 0.5Λ, assuming equally likely "0" and "1",

Pe = 0.5·e^(−2M).   (8.163)

The quantum limit relates the required average number of photoelectrons to the probability of error,

Λ = log_e(1/(2Pe)),   (8.164)

or

M = 0.5·log_e(1/(2Pe)).   (8.165)

Example 8-25. For an error probability of 10^-9 we must have Λ = 20 photoelectrons, whereas for 10^-6 only Λ = 13 photoelectrons are required. □

It is important to note that the quantum limit is not an information-theoretic bound on the performance of the channel, but rather is a bound on the performance of an OOK detector.
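The photon counts in Example 8-25 follow from inverting (8.163); a quick numerical check (a sketch only):

```python
import math

def quantum_limit_lambda(pe):
    """Average photoelectrons per 'one' pulse so that 0.5*exp(-Lambda) = pe,
    i.e. the quantum limit of (8.164)."""
    return math.log(0.5 / pe)

for pe in (1e-6, 1e-9):
    lam = quantum_limit_lambda(pe)
    print(f"Pe = {pe:.0e}: Lambda = {lam:.1f} photoelectrons "
          f"(M = {lam / 2:.1f} photons/bit)")
```

For Pe = 10^-9 this gives Λ ≈ 20, and for Pe = 10^-6 it gives Λ ≈ 13, matching the example.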
Other modulation schemes can theoretically achieve more than one bit per photon, analogous to the ability to achieve multiple bits/sec-Hz in spectral efficiency on radio channels. On the other hand, the quantum limit cannot be approached in a practical direct-detection OOK receiver, because we cannot reliably detect a signal this small in the thermal noise introduced in an electrical preamplifier. In practice we need a received power roughly 10 to 20 dB larger than this (200 to 2000 photons) [11]. These sensitivities can be improved by using an optical preamplifier rather than an electrical preamplifier. The quantum limit is useful in the same sense that the notion of channel capacity is useful: it tells us what additional performance could be achieved through heroic measures in our OOK receiver design. If all other impairments could be eliminated, then this would be the performance that could be achieved. Coherent techniques, discussed momentarily, can approach the quantum limit.

8.8.3. Filtered Poisson and Avalanche Noise

We will now characterize the quantum and avalanche noise in terms of its second-order statistics. The terms used in this section are defined in Section 5.3. Since the PIN photodiode detector is a special case of an APD detector where the avalanche gain is unity, we will consider the latter, more general case. The receiver design for optical fiber, with detailed consideration of the noise analysis, was pioneered in the early 1970s by S.D. Personick, then at Bell Laboratories. A more refined model than the idealized receiver used to derive the quantum limit must account for the photodetector bias circuitry, the preamplifier impulse response to a single photoelectron h(t), and the avalanche gains G_m. The preamplifier output is therefore a random process of the form (neglecting thermal noise)

Y(t) = Σ_m G_m·h(t − t_m),   (8.166)

where the t_m are a set of Poisson arrival times.
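The second-order statistics of a filtered Poisson process like (8.166) are given by Campbell's theorem: with a constant arrival rate λ and unit gains (the PIN case), the mean is λ·∫h(t)dt and the variance is λ·∫h²(t)dt. A Monte Carlo sanity check, with an illustrative rate λ = 5 and pulse h(t) = e^(−t) (none of these values come from the text), should therefore give mean 5 and variance 2.5:

```python
import math
import random

random.seed(0)

LAM, HORIZON = 5.0, 20.0   # illustrative arrival rate and observation window

def poisson(mu):
    """Knuth's Poisson sampler; adequate for moderate mu."""
    L, k, p = math.exp(-mu), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def sample_Y():
    """One sample of Y(T) = sum_m h(T - t_m), with h(t) = exp(-t) for t >= 0."""
    n = poisson(LAM * HORIZON)
    arrivals = (random.uniform(0.0, HORIZON) for _ in range(n))
    return sum(math.exp(-(HORIZON - t)) for t in arrivals)

ys = [sample_Y() for _ in range(10_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(f"mean = {mean:.2f} (theory 5.0), variance = {var:.2f} (theory 2.5)")
```

The observation window is long enough that pulses launched before the window contribute negligibly at the observation time.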
Define the first moment (mean) Ḡ and second moment E[G²] of the avalanche gain, where from (5.41) the two are related through the excess gain factor F_G,

E[G²] = F_G·Ḡ².   (8.167)

From Section 3.4.4, the mean and variance of the filtered Poisson process are given by

m_Y(t) = Ḡ·λ(t)*h(t),   σ_Y²(t) = E[G²]·λ(t)*h²(t).   (8.168)

Even though the preamplifier output signal Y(t) is random in nature, we can model it for the purpose of second-order statistics and SNR as a deterministic signal m_Y(t) with additive zero-mean noise with variance σ_Y²(t). Since λ(t) is proportional to the received optical power, the preamplifier output signal is proportional to the avalanche gain Ḡ and the received optical power. The fundamental difference from the additive Gaussian noise case considered earlier in this chapter is that the preamplifier output noise variance is also proportional to the received power. Thus, this noise has similar characteristics to crosstalk in a multiple wire-pair system, in that the noise level increases as the signal level increases. It is also similar in this respect to the quantization noise experienced in the voiceband telephone channel (Section 5.5).

Using these results we can calculate the SNR. If we approximate the received optical power as a constant P, then the preamplifier output signal is proportional to Ḡ·P and the noise variance is proportional to F_G·Ḡ²·P. Defining the constants of proportionality as α and β respectively, the SNR (defined as the ratio of signal squared to noise variance) becomes

SNR_PA = α²·Ḡ²·P² / (β·F_G·Ḡ²·P) = (α²/β)·(P/F_G).   (8.169)

We see that the SNR improves by one dB for each dB increase in the received power. This dependence is the same as with additive Gaussian noise, although for much different reasons! In the Gaussian noise case the noise variance stays constant and the signal squared is proportional to the signal power.
In this case, the preamplifier noise has variance proportional to the optical signal power, and the preamplifier signal power is proportional to the square of the optical signal power. The question remains as to how to choose the avalanche gain Ḡ. From (8.169), we can maximize the SNR by minimizing the excess gain factor F_G. From (5.42), this factor is a monotonically increasing function of the avalanche gain Ḡ. Hence, we can maximize the SNR by letting Ḡ = 1; that is, using a PIN photodiode in preference to an APD as the detector. This result follows because we have not yet considered thermal noise introduced in the preamplifier. In the absence of this thermal noise, the APD detector is deleterious.

Preamplifier Thermal Noise

As discussed in Section 5.3, another important noise source in fiber systems is thermal noise introduced in the preamplifier. This noise tends to be significant because the signal current at the output of the detector is so small, and therefore the thermal noise introduced at that point is significant relative to the signal level. This is the motivation for using an APD detector, since this is a way to boost the signal level without affecting the thermal noise level. Unfortunately this benefit comes at the price of an additional noise source due to random avalanche multiplication. Since the latter noise generally increases with avalanche gain, there is an optimum gain for any given signal and thermal noise level, as we will now show. Extending the analysis of the last section, assume that there is an additional thermal noise (within the bandwidth of interest) of variance σ² at the preamplifier output. The result is that the SNR is now

SNR = α²·Ḡ²·P² / (σ² + β·F_G·Ḡ²·P) = SNR_PA · 1/(1 + σ²/(β·F_G·Ḡ²·P)),   (8.170)

where SNR_PA is given in (8.169). The second term, due to thermal noise, reduces the SNR. The thermal noise can be mitigated by either increasing the optical signal power P or by increasing the avalanche gain Ḡ.
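The optimum-gain behavior implied by (8.170) can be sketched numerically. Here F_G = Ḡ^0.5 is an assumed illustrative excess-noise model, and α = β = P = 1 with thermal noise variance σ² = 100 are arbitrary values chosen only to exhibit the maximum; none of these come from the text:

```python
def snr(G, sigma2=100.0):
    """SNR of (8.170) with alpha = beta = P = 1 and an assumed model F_G = G**0.5."""
    F = G ** 0.5   # monotonically increasing excess gain factor (assumed model)
    return G * G / (sigma2 + F * G * G)

best_G = max(range(1, 201), key=snr)
print("optimum avalanche gain:", best_G)
print("SNR at G = 1, optimum, 200:", snr(1), snr(best_G), snr(200))
```

The SNR first rises with Ḡ, as the thermal noise is overcome, and then falls once the avalanche excess noise dominates: exactly the tradeoff described in the surrounding text.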
Thus, avalanche gain is helpful in this case. However, as the avalanche gain Ḡ is increased, the SNR must eventually start falling, because the second term approaches unity and the first term, SNR_PA, decreases, since F_G is monotonically increasing with Ḡ. In summary, avalanche gain is useful because it increases the signal level at the input to the preamplifier without affecting the thermal noise. However, it introduces its own excess noise through the random avalanche multiplication, and as a result there is an optimum avalanche gain above which avalanche multiplication is the dominant noise and the SNR starts to decrease again. In practical optical fiber system designs, when a PIN photodiode is used as the detector, the dominant noise source in the system is thermal noise at microwave frequencies and below, and therefore the white Gaussian noise analysis of earlier sections of this chapter is directly applicable. When an APD detector is used, filtered Poisson noise and avalanche multiplication noise are significant impairments in addition to this thermal noise.

8.8.4. Homodyne and Heterodyne Optical Reception

The number of repeaters required in a fiber optic network is inversely proportional to the bit rate of each repeater times the repeater spacing (Problem 5-10). A promising way to reduce the number of repeaters, and hence the network cost, is therefore to increase the repeater spacing. From (5.28), the ways to increase the repeater spacing are to reduce the fiber loss, increase the transmitted power, or reduce the required received power (increase the receiver sensitivity). The fiber losses are already approaching the theoretical limit for the materials being used, about 0.2 dB per km, and the transmitted power is limited by nonlinear materials effects. Thus, we are left with the option of increasing the receiver sensitivity.
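Receiver sensitivity gains translate directly into repeater spacing: with the minimum fiber loss of about 0.2 dB/km quoted above, each dB of sensitivity buys 5 km of spacing (a one-line check):

```python
fiber_loss_db_per_km = 0.2    # quoted minimum fiber loss
sensitivity_gain_db = 10.0    # receiver sensitivity improvement
extra_km = sensitivity_gain_db / fiber_loss_db_per_km
print(extra_km, "additional km between repeaters")
```

So a 10 dB sensitivity improvement yields up to 50 additional kilometers between repeaters.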
For the minimum fiber loss, an improvement of 10 dB in receiver sensitivity implies up to 50 additional kilometers between repeaters for fixed transmitter power. Fortunately, a way to substantially increase the receiver sensitivity has been demonstrated in the laboratory: homodyne or heterodyne reception. Together these techniques are often called coherent optical reception, although we avoid that term here because of the possible confusion with coherent demodulation. Coherent demodulation uses a carrier at the receiver that has (approximately) the same frequency and phase as the carrier at the transmitter. In optical fiber reception, the term coherent refers to the requirement for highly coherent lasers (monochromatic and relatively constant phase). In heterodyne detection the detection method may in fact be incoherent (in the sense that no attempt is made in the receiver to estimate the phase of the carrier), whereas in homodyne detection the reception must be coherent. We will see that incoherent FSK detection (Section 6.6) is among the most promising techniques for heterodyne optical fiber reception.

It has been demonstrated that homodyne receivers can approach the quantum limit in sensitivity, while heterodyne receivers give up 3 dB in sensitivity. Early work with heterodyne optical communication used lasers to transmit over large distances in space [12,13,14]. In such an application the power available to the transmitter can be quite small, so receiver sensitivity is crucial. In optical fibers, we have the luxury of being able to use regenerative repeaters, but for economic reasons we want to reduce the number of repeaters. APD receivers are typically used for weak optical signals, but because of random fluctuations in their gain, they are noisy. In contrast, a PIN photodiode yields at most one electron-hole pair per incident photon, resulting in a weak electrical signal for small incident power, and hence thermal noise when we amplify this signal.
Homodyne and heterodyne receivers use PIN photodiodes even when the optical signal is weak. They use a local light source to supplement the incoming photons in such a way as to significantly enhance the sensitivity of the receiver. This offers the significant advantages of eliminating the APD multiplication noise and taking advantage of the higher bandwidth capabilities of the PIN diode (roughly three to four times the bandwidth of the APD). While theoretically very interesting, heterodyne reception has not achieved commercial viability, primarily because optical amplifiers offer many of the same advantages at lower cost. We will discuss homodyne detection, followed by heterodyne detection, in the following subsections.

Homodyne Detection

Assume that the electromagnetic field at the receiver can be represented as

r(t) = ±A·cos(ω₀t),   (8.171)

where A² is proportional to the optical power, and hence proportional to the average rate of arrival of photons. The signal is a binary antipodal PSK signal, +A·cos(ω₀t) representing a "one" and −A·cos(ω₀t) representing a "zero". The generation of such a signal requires a monochromatic light source with fixed phase, an ideal that can be approached with sufficient accuracy in practice to make the detection techniques that follow of practical interest. The ideal homodyne detector adds (optically) a local signal of exactly the same frequency and phase,

s(t) = B·cos(ω₀t),   (8.172)

getting

x(t) = r(t) + s(t) = (B ± A)·cos(ω₀t),   (8.173)

as shown in Figure 8-14. We assume that B is much larger than A, with the result that the optical power falling on the photodetector is sufficiently large that thermal noise effects in the receiver electronics are rendered negligible. The optical power incident on the photodetector is proportional to (B ± A)², depending on whether a "one" or a "zero" is transmitted, and therefore the energy and the average number of photons to arrive in the symbol interval T is K·(B ± A)²·T for some constant of proportionality K.

Figure 8-14. A coherent optical receiver works by optically adding a locally generated optical signal to the received optical signal and detecting the sum with a PIN photodiode, followed by an integrate-and-dump filter.

A reasonable detection system effectively counts the number of photons arriving in each symbol interval. This can be approximated by integrating the output of a PIN photodiode and sampling at the end of the symbol interval, as shown in Figure 8-14. The integrator should be reset (dumped) before the next symbol interval. The expected number of photoelectrons generated in the PIN photodiode, assuming perfect quantum efficiency, is

Λ = K·(B ± A)²·T = K·(B² + A² ± 2AB)·T.   (8.174)

Since the constant K will not affect the results to follow, we will set K = 1. The receiver can subtract out the common term (B² + A²)·T, and the result is a binary antipodal signal ±2ABT which can be applied to a slicer with threshold at zero. If B is large, then the number of photons arriving at the PIN photodiode is large, so little electrical amplification is needed. Furthermore, since the desired component of the signal ±2AB is proportional to B, it can be made large. There is no need, therefore, to use an APD and suffer its random gain, since thermal noise can be made insignificant. This is the principal advantage of homodyne detection.

To analyze the performance of the homodyne detector we need to characterize the noise at the input to the slicer. To this end we will use the Chernoff bound to characterize the probability of error asymptotically as the average number of photoelectrons gets large, and show that the resulting probability of error is the same as the quantum limit. The following exercise gives a useful result.

Exercise 8-3. Use the first two terms of a Taylor series expansion,

ln(1 + ε) ≈ ε − 0.5·ε²,   (8.175)

to show that for a Poisson random variable X with large parameter a and a small δ > 0 such that δ/a ≪ 1, the Chernoff bound of Problem 3-19 becomes

1 − F_X(a + δ) ≤ e^(−δ²/2a),   (8.176)

F_X(a − δ) ≤ e^(−δ²/2a).   (8.177)

□

Neglecting thermal noise, the random variable Y_k at the output of the integrate-and-dump filter is Poisson distributed with parameter a = Λ given by (8.174). The probability of error, assuming the signal is A·cos(ω₀t), is F_X(Λ − δ) where δ = 2ABT, and the Chernoff bound for this is given by (8.177). In this bound we can approximate Λ by B²T for large B, in which case the bound becomes

Pe ≤ e^(−2A²T).   (8.178)

Using (8.176) to bound the probability of error assuming the signal −A·cos(ω₀t) is transmitted, we get the same answer as (8.178), and hence (8.178) becomes an upper bound on the probability of error regardless of the a priori probability of each signal. We can relate this probability of error to the average number of received photons M, since M = A²T regardless of which signal is transmitted, and hence the Chernoff bound becomes

Pe ≤ e^(−2M),   (8.179)

the same as the quantum limit in (8.163) if we disregard the insignificant factor of 0.5. As shown in Section 6.5, for small probability of error this multiplicative factor does not result in a significant difference in the signal power required to achieve a given probability of error, so we see that ideal homodyne detection permits us to closely approximate the quantum limit as the local oscillator amplitude B gets large. Stated another way, an ideal homodyne detector for 2-PSK performs as well as an ideal photon counter with OOK when the average received power is the same, but shows more promise of being practical. Note that if we constrain the peak power instead of the average power, then the ideal homodyne detector actually performs 3 dB better than the quantum limit (see Problem 8-21).
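The slicer described above can be checked with a rough Monte Carlo sketch. The values A = 2, B = 30, T = 1 are illustrative (not from the text), and the Poisson photoelectron count is approximated as Gaussian, which is reasonable here since the rates (B ± A)²T are near 1000. The simulated error rate should stay below the Chernoff bound e^(−2M):

```python
import math
import random

random.seed(1)

def homodyne_error_rate(A=2.0, B=30.0, T=1.0, trials=200_000):
    """Count photoelectrons at rate (B +/- A)^2 * T as in (8.174) (with K = 1),
    subtract the common term (B^2 + A^2)*T, and slice at zero."""
    common = (B * B + A * A) * T
    errors = 0
    for _ in range(trials):
        one = random.random() < 0.5
        lam = ((B + A) if one else (B - A)) ** 2 * T
        count = random.gauss(lam, math.sqrt(lam))   # Gaussian approx. to Poisson
        errors += ((count - common) > 0) != one
    return errors / trials

M = 2.0 ** 2 * 1.0   # average photons per bit, M = A^2 * T
print("simulated Pe:", homodyne_error_rate())
print("Chernoff bound exp(-2M):", math.exp(-2 * M))
```

With these values the simulated error rate falls well below the bound, as (8.179) predicts.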
Heterodyne Detection

Heterodyne detection is similar to homodyne except that the local laser has a frequency different from the carrier, thus achieving a frequency translation to an intermediate frequency (IF) rather than directly to baseband. Although heterodyne techniques have long been common in radio applications, it was not until 1955 that the first beat signal from the mixing of two light sources on a photocathode was reported. If the output of the local laser is written

s(t) = B·cos(ω₁t),   (8.180)

then the sum of the incoming and local optical signals for a single symbol, taken without loss of generality as occurring over the interval 0 ≤ t ≤ T, is

x(t) = ±A·cos(ω₀t) + B·cos(ω₁t).   (8.181)

Exercise 8-4. Show that (8.181) can be written in terms of the envelope and phase about a carrier at frequency ω₁ as

x(t) = E(t)·cos(ω₁t + φ(t)),   (8.182)

where the envelope satisfies

E²(t) = B² + A² ± 2AB·cos(ω_IF·t),   (8.183)

ω_IF = ω₀ − ω₁ is the intermediate frequency (IF), and φ(t) is a time-varying phase. □

The photon arrivals form a Poisson process with arrival rate λ(t) proportional to the square of the envelope, which is the instantaneous power,

λ(t) = E²(t).   (8.184)

The output of the photodiode and preamplifier electronics will be a shot noise process. If we let h(t) be the impulse response of the photodiode bias circuitry and preamplifier, then from (3.141) (with β = 1 and λ(t) large) the preamplifier output approaches a Gaussian process with mean value

s(t) = λ(t)*h(t)   (8.185)

and variance

σ²(t) = λ(t)*h²(t).   (8.186)

We expect the bandwidth of the preamplifier to be large relative to the IF, and hence s(t) ≈ λ(t), assuming unity gain in the passband. Ignoring the d.c. term, which will be subtracted prior to the slicer as in the homodyne case,

s(t) ≈ ±2AB·cos(ω_IF·t).   (8.187)

Similarly, since B is large, for purposes of the noise variance we can consider λ(t) ≈ B², and the noise variance is therefore independent of time.

Example 8-26.
If the preamplifier has a flat gain over bandwidth W radians/sec, we get σ² = B²·W/π. Since the noise variance is proportional to bandwidth, we can consider this noise to be white Gaussian noise passed through a bandlimiting filter. If the bandwidth is large relative to the IF, we can consider the additive Gaussian noise to be white with power spectral density B². □

On the basis of this example, we will assume the additive noise N(t) to be white and Gaussian with power spectral density B², and write the reception in the form

Y(t) = ±2AB·cos(ω_IF·t) + N(t).   (8.189)

Put into this approximate form, the detection problem is now identical to that of 2-PSK as discussed earlier in this chapter. A correlation or matched filter receiver (justified intuitively in Section 6.6 and shown to be optimal in a maximum-likelihood sense in Chapter 8) computes the correlation

Q = ∫₀ᵀ Y(t)·cos(ω_IF·t) dt   (8.190)

and decides that a one was sent if Q is greater than zero, and that a zero was sent otherwise.
In addition, heterodyne detection allows incoherent demodulation of FSK or MSK signals, obviating the need for carrier recovery at the IF (at the expense of a small additional penalty in SNR). Perhaps the most intriguing possibility for heterodyne detection is optical frequency-division multiplexing, in which many closely spaced carriers are used to transmit independent data streams. Frequency-division multiplexing can be used with direct detection also (this is called wavelength-division multiplexing (WDM)), but the much larger bandwidth of the intensity-modulated data stream makes it much less bandwidth efficient. The bandwidth efficiency of WDM is on the order of 10⁻⁶ b/s/Hz, as compared to about 10⁻¹ for heterodyne detection. While bandwidth is not by any means a scarce resource in the optical fiber medium, if the maximum repeater spacing and bit rate are to be obtained we must limit the bandwidth of the optical signal to regions of low attenuation and small chromatic dispersion, making optical FDM a very attractive approach.

Laser Phase Noise

Phase or frequency noise in lasers seriously complicates homodyne and heterodyne fiber detection [15,16,17]. Laser phase noise is caused by randomly occurring spontaneous emission events. Each event causes a spontaneous jump (of random magnitude and sign) in the phase of the electromagnetic output. The phase executes a random walk away from the value it would have in the absence of spontaneous emission. As the time between events becomes very small, the phase due to the events can be approximated as the integral of a white Gaussian noise process

Θ(t) = 2π·∫₀ᵗ N(τ) dτ   (8.193)

where N(t) has power spectrum N₀; the power spectrum of N(t) is a property of the laser. Laser phase noise is observable as a broadening of the spectrum of the laser output. The 3 dB width of the spectrum of the laser is called its linewidth.
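A short simulation of (8.193) illustrates the random-walk behavior: the phase variance grows linearly with time, which is what broadens the laser spectrum into a finite linewidth. The noise level N₀, time step, and path counts below are illustrative assumptions, not values from the text.

```python
import numpy as np

# Discrete approximation of laser phase noise, eq. (8.193): Theta(t) is the
# integral of white Gaussian noise, i.e. a Brownian motion in phase.
# Each increment of Theta over dt has variance (2*pi)^2 * N0 * dt.
rng = np.random.default_rng(0)
N0, dt, nsteps, npaths = 1e-3, 1e-6, 2000, 4000

inc = rng.normal(0.0, 2 * np.pi * np.sqrt(N0 * dt), size=(npaths, nsteps))
theta = np.cumsum(inc, axis=1)        # one random-walk phase path per row

t = dt * np.arange(1, nsteps + 1)
var_est = theta.var(axis=0)           # empirical variance across paths
var_theory = (2 * np.pi) ** 2 * N0 * t

print(var_est[-1] / var_theory[-1])   # close to 1: variance grows linearly in t
```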
The lasers most likely to be used in optical fiber systems, semiconductor injection lasers, can be made with linewidths in the range of 5 to 50 MHz.

Example 8-27. The linewidth required for heterodyne detection with MSK incoherent demodulation is approximately 10⁻³ times the bit rate. For homodyne detection with PSK modulation the required linewidth is about 10⁻⁴ times the bit rate; this more stringent requirement stems from the required coherent demodulation of PSK. A laser with a linewidth of 10 MHz will therefore support a bit rate of about 10 Gb/s (heterodyne MSK detection) or 100 Gb/s (homodyne PSK detection). □

Considerable research effort is devoted to designing appropriate lasers with much narrower linewidths. Until these lasers become available, the most promising coherent fiber signaling scheme is heterodyne FSK or MSK with incoherent demodulation.

8.9. FURTHER READING

The geometric approach to estimating the error probability of modulation systems was originally inspired by the text by Wozencraft and Jacobs [4]. The approach to comparing modulation systems used here, based on the normalized signal-to-noise ratio, was heavily influenced by [9]. Spread spectrum is covered in some detail in the digital communications textbooks by Proakis [18], Cooper and McGillem [19], and Ziemer and Peterson [20]; our coverage has relied heavily on two excellent tutorial articles [6,21]. The design of fiber optic receivers, with a detailed analysis, is covered more thoroughly in the books by Personick [22,10], Barnoski [23], and Gagliardi and Karp [24]. A useful overview of optical fiber technology is given by Henry [25]. A study of practical limits for direct detection is given by Pierce [26]. For coherent fiber optics, see the survey articles by Salz [27] and Barry and Lee [28].
A general description is also given by Kimura [29], accompanied by several excellent papers in the special issue of the IEEE Journal of Lightwave Technology of April 1987, jointly prepared with the IEEE Journal on Selected Areas in Communications. For example, one paper in the issue is on optical continuous-phase FSK by Iwashita and Matsumoto [30]. A thorough analysis of the bit error rate of various coherent optical receivers is given by Okoshi et al. [31]. Kazovsky gives an excellent comparison of optical heterodyne vs. homodyne receivers [32] as well as an analysis of the impact of laser phase noise on heterodyne systems [33]. Other theoretical analyses of various types of coherent receivers of note are [34,31,35].

PROBLEMS

8-1. In this problem we derive the statistics of the noise Z(t) at the output of a PAM receive filter in a different way from Section 8.3. The configuration we will use is shown in Figure 6-19a. The front end of this receiver is reproduced in Figure 8-15, where the input is the noise N(t), the complex-valued noise at the output of the analytic bandpass filter is denoted by M(t), and after demodulation by Z(t).
(a) Assuming that f(t) is bandlimited to ω_c, what can you say about the relationship of the real and imaginary parts of √2·f(t)·e^(jω_c t)?
(b) Explicitly write the real and imaginary parts of √2·f(t)·e^(jω_c t) in terms of f(t).
(c) What are the variances of the real and imaginary parts of M(t), as well as their cross-correlation, in terms of f(t) and N₀?
(d) Use the results of (c) to show that the real and imaginary parts of M(t) are independent and have the same variance.
(e) Show that Z(t) has the same first-order statistics as M(t).

8-2. Assume that the real-valued receive filter f(t) in Figure 8-15 is an ideal lowpass filter with bandwidth W radians/sec and that the symbol rate obeys π/T = W. Show that the noise samples at the slicer input Z(kT) are white in this case.

8-3.
Compare Q(d/2σ) and Q²(d/2σ) for the values d = 2 and σ = 0.5. Do it again for σ = 0.25. Is the approximation in (8.62) valid for these values of σ? You may use Figure 3-1 to approximate Q(·).

8-4. Consider the 4-PSK constellation in Figure 7-5. Assume that σ = 0.25 is the standard deviation of the independent real and imaginary parts of the additive Gaussian noise. Assume b = 1 and the transmitted symbol is −1. Find the probability that the received sample is closer to j than to −1 and compare it to the probability that the received sample is closer to +1 than to −1. You may use Figure 3-1 to estimate the probabilities.

8-5. Show that the probability of error for the 16-QAM constellation of Figure 7-5 can be written

Pr[error] = 3Q(d/2σ) − 2.25Q²(d/2σ) .   (8.194)

8-6. In this problem we put together the results of Chapters 6 and 8 to analyze a passband system. Assume a benign channel, B(jω) = 1, with additive Gaussian noise with power spectrum S_N(jω) = N₀. The transmit filter produces a 100% excess-bandwidth raised-cosine pulse. The transmit power cannot be greater than unity. The receive filter has an ideal lowpass baseband equivalent f(t) that permits the 100% excess-bandwidth pulse to get through undistorted. The constellation is 16-QAM. Find the probability of error as a function of N₀ and T.

Figure 8-15. Configuration for calculating the noise at the receive filter output. (a) A complex-valued filter realization, and (b) a detail showing explicitly the real and imaginary parts of M(t).

8-7. Consider the constellation in the following figure (two inner symbols and four outer symbols; figure not reproduced). Assume that
• The inner two symbols each have probability 1/4.
• The outer four symbols each have probability 1/8.
• The noise in each dimension is independent and Gaussian with variance σ².
(a) Design a coder for this constellation that achieves these probabilities if the input bits are equally likely to be zero or one.
(b) Find the exact probability of error as a function of b.
(c) Find the signal power as a function of b.
(d) Give the probability of error as a function of the SNR. Use an approximation from Figure 3-1 to find the probability of error when SNR = 10 dB.
(e) Give approximations for the probability of error. Compute the approximate probabilities of error when SNR = 10 dB.

8-8. Suppose that more than two dimensions are available for our alphabet. Consider an alphabet where the symbols are vertices of an M-dimensional hypercube, shown for M = 3 in the figure (not reproduced). Assume that all the symbols are equally likely and the noise in each dimension is independent with variance σ².
(a) What are the decision regions?
(b) What is the probability of error as a function of M and the minimum distance d between points in the signal constellation?

8-9. Assume a constellation that is M-dimensional, consisting of M equally likely symbols at right angles, each with magnitude a. Find a bound on the probability of error using the union bound.

8-10. (a) Find the union bound on the probability of error for the 16-QAM constellation in Figure 7-5b. Assume A_k = c + jc is actually transmitted.
(b) The CCITT V.29 standard for full-duplex transmission at 9600 b/s over voiceband channels uses the constellation shown in Figure 8-16. Find the union bound on the probability of error. Assume A_k = 1 + j is transmitted.
(c) Explain why the exact analysis technique of Example 8-6 would be difficult to apply to the V.29 constellation.
(d) Find c in Figure 7-5 so that the two constellations have the same power. Use the union bounds of parts (a) and (b) to compare their performance.

Figure 8-16.
The constellation for the CCITT V.29 standard for transmission at 9600 b/s over voiceband channels.

8-11. Show that if the translates of the chip waveform h_c(t) are mutually orthogonal, then N = 2BT pulses of the form of (8.113) can be made mutually orthogonal by choice of the spreading sequences. Specify the required properties of the spreading sequences.

8-12. Consider the spreading sequence {x_m, 0 ≤ m ≤ N−1} in (8.113) to be the impulse response of a causal, discrete-time FIR filter. Condition (8.116) suggests that we would like this filter to be an allpass filter. Use the results of Section 2.5.3 to show that the only FIR filters that are allpass have impulse response δ_{k−L} for some integer L. Thus, (8.116) can be exactly satisfied only for the trivial choice of spreading sequence in Example 8-13.

8-13. Consider a spread-spectrum system operating in an N = 2BT dimensional signal space, where the isolated pulse signal is chosen randomly. Let a set of orthonormal basis functions for this space be […] suppose they have the same channel capacity. Find a relation for the SNRs (in dB) required for the two channels as a function of the bandwidth expansion factor B₂/B₁. Interpret this relation. You may assume large SNR.

8-16. It is common to use binary PSK in spread-spectrum modulation. Find the SNR gap to capacity for 2-PSK spread spectrum as a function of BT. What is the increase in the SNR gap to capacity asymptotically as BT → ∞, expressed in dB?

8-17. Consider the input to the slicer in a direct-detection optical fiber receiver with a PIN detector, which consists of a Poisson random variable with mean value Λ₀ or Λ₁ for the two possible signals plus independent additive Gaussian noise with variance σ². Find the Chernoff bound on the probability of error, assuming the two signals are equally likely.

8-18. (a) In this problem we will explore the conditions under which the direct-detection OOK optical-fiber receiver performance is limited by thermal noise.
Assume a PIN detector with 100% quantum efficiency, a bit rate of 10⁸ bits/sec, and a wavelength of 1.5 μm. Assume the front end of the receiver consists of a current source (the photodetector) in series with a 10 kΩ resistor, and the voltage across the resistor is integrated for each baud interval and applied to a slicer (integrate-and-dump receiver). The 10 kΩ resistor has an internal thermal noise source. What is the incident average optical power required to achieve a 10⁻⁹ error probability at the quantum limit?
(b) Use the results of Problem 5-12 to find the variance of the thermal noise component of the slicer input. Also find the size of the average signal component at the slicer input for a transmitted one bit.
(c) At what incident average optical power is the signal-to-thermal-noise ratio (for a one bit) at the slicer input equal to 20 dB?
(d) At the incident optical power of (c), how many photons per one bit are incident on the detector?
(e) What are the relative sizes of the variances of the shot noise and thermal noise at the slicer input for the incident optical power of (c)?

8-19. (a) For the same conditions as Problem 8-18, adjust the incident optical power so that the signal-to-thermal-noise ratio at the slicer input is only 10 dB. Using an APD detector with ionization ratio k = 0.03, find the APD gain that maximizes the total SNR, including both thermal and shot noise components, at the slicer input (this will require a numerical solution, with the aid of a calculator or computer).
(b) For this optimum APD gain, how much of a gain in SNR is attributable to the APD relative to a PIN detector?

8-20. In the ideal homodyne optical detector, the larger B gets in (8.172), the more dynamic range is required in the electrical circuits prior to the subtraction in Figure 8-14. The purpose of this problem is to show that the dynamic range requirement is modest.
Let the local laser produce 1000 times as many photons as are arriving from the fiber, so B² = 1000·A². Assuming there is no noise, find the ratio (in dB) of the power of the desired signal ±2ABT at the sampler and the power of the common term (B² + A²)T that is subtracted out.

8-21. Show that an ideal homodyne 2-PSK detector performs at least 3 dB better than an ideal OOK photon-counting receiver (the quantum limit) if the peak received power is the same in both systems.

8-22. Repeat the derivation of the Chernoff bound of (8.179) using the following technique. Use the fact that a Poisson random variable with large parameter α approaches a Gaussian random variable, and then approximate the probability of error using the Chernoff bound for a Gaussian random variable in (3.43).

REFERENCES

1. D. B. Williams and D. H. Johnson, "On Resolving 2M−1 Narrow-Band Signals with an M Sensor Uniform Linear Array," IEEE Trans. on Signal Processing, p. 707 (March 1992).
2. N. R. Goodman, "Statistical Analysis Based on a Certain Multivariate Complex Gaussian Distribution (An Introduction)," The Annals of Mathematical Statistics 34(1), pp. 152-177 (March 1963).
3. S. W. Golomb, Digital Communications with Space Applications, Prentice-Hall, N.J. (1964).
4. J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering, Wiley, New York (1965).
5. R. A. Scholtz, "The Origins of Spread-Spectrum Communications," IEEE Trans. Communications COM-30(5), p. 822 (May 1982).
6. R. L. Pickholtz, D. L. Schilling, and L. B. Milstein, "Theory of Spread-Spectrum Communications - A Tutorial," IEEE Trans. Communications COM-30(5), p. 855 (May 1982).
7. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Illinois (1963).
8. C. E. Shannon, "Communication in the Presence of Noise," Proc. IRE 37, pp. 10-21 (Jan. 1949).
9. G. D. Forney, Jr. and M. V.
Eyuboglu, "Combined Equalization and Coding Using Precoding," IEEE Communications Magazine (Dec. 1991).
10. S. D. Personick, Fiber Optics Technology and Applications, Plenum Press, New York (1985).
11. J. C. Campbell, A. G. Dentai, W. S. Holden, and B. L. Kasper, "High Performance Avalanche Photodiode with Separate Absorption, Grading, and Multiplication Regions," Elect. Lett. 19, pp. 818-819 (Sep. 29, 1983).
12. M. Ross, Laser Receivers, John Wiley and Sons (1966).
13. O. E. DeLange, "Optical Heterodyne Detection," IEEE Spectrum, pp. 77-85 (Oct. 1968).
14. W. K. Pratt, Laser Communication Systems, John Wiley and Sons (1969).
15. C. H. Henry, "Theory of the Linewidth of Semiconductor Lasers," IEEE J. Quant. Elec. QE-18, pp. 259-264 (Feb. 1982).
16. M. W. Fleming and A. Mooradian, "Fundamental Line Broadening of Single-Mode GaAlAs Diode Lasers," Appl. Phys. Lett. 38, pp. 511-513 (April 1, 1981).
17. C. Harder, K. Vahala, and A. Yariv, "Measurement of the Linewidth Enhancement Factor α of Semiconductor Lasers," Appl. Phys. Lett. 42, pp. 328-330 (Feb. 15, 1983).
18. J. G. Proakis, Digital Communications, Second Edition, McGraw-Hill Book Co., New York (1989).
19. G. R. Cooper and C. D. McGillem, Modern Communications and Spread Spectrum, McGraw-Hill Book Co., New York (1986).
20. R. E. Ziemer and R. L. Peterson, Digital Communications and Spread Spectrum Systems, Macmillan, New York (1985).
21. C. E. Cook and H. S. Marsh, "An Introduction to Spread Spectrum," IEEE Communications Magazine, p. 8 (March 1983).
22. S. D. Personick, Optical Fiber Transmission Systems, Plenum Press, New York (1981).
23. M. K. Barnoski, Fundamentals of Optical Fiber Communications, Academic Press, New York (1976).
24. R. Gagliardi and S. Karp, Optical Communications, Wiley-Interscience, New York (1976).
25. P. S. Henry, "Introduction to Lightwave Transmission," IEEE Communications 23(5) (May 1985).
26. J. Pierce, "Optical Channels: Practical Limits with Photon Counting," IEEE Trans.
on Communications (Dec. 1978).
27. J. Salz, "Modulation and Detection for Coherent Lightwave Communications," IEEE Communications Magazine 24(6) (June 1986).
28. J. R. Barry and E. A. Lee, "Performance of Coherent Optical Receivers," Proceedings of the IEEE 78(8) (Aug. 1990).
29. T. Kimura, "Coherent Optical Fiber Transmission," IEEE/OSA Journal of Lightwave Technology LT-5(4) (April 1987).
30. K. Iwashita and T. Matsumoto, "Modulation and Detection Characteristics of Optical Continuous Phase FSK Transmission System," IEEE/OSA Journal of Lightwave Technology LT-5(4) (April 1987).
31. T. Okoshi, K. Emura, K. Kikuchi, and R. Th. Kersten, "Computation of Bit-Error Rate of Various Heterodyne and Coherent-Type Optical Communication Schemes," J. Optical Communications 2, pp. 89-96 (1981).
32. L. G. Kazovsky, "Optical Heterodyning Versus Optical Homodyning: A Comparison," J. Opt. Commun. 6(1), pp. 18-24 (1985).
33. L. G. Kazovsky, "Impact of Laser Phase Noise on Optical Heterodyne Communication Systems," J. Opt. Commun. 7(2), pp. 66-78 (1986).
34. Y. Yamamoto and T. Kimura, "Coherent Optical Fiber Transmission Systems," IEEE J. Quantum Electronics QE-17(6), pp. 919-935 (June 1981).
35. T. Okoshi, "Heterodyne and Coherent Optical Fiber Communications: Recent Progress," IEEE Trans. on Micr. Th. and Tech. MTT-30, pp. 1138-1148 (Aug. 1982).

DETECTION

We saw in Chapter 8 that one of the fundamental problems in digital communications is the corruption of the transmitted signal by noise. Using common sense, practical receivers can be designed that are reasonably robust in the presence of noise. Nevertheless, the question arises: are the "common sense" receivers designed in Chapters 6 and 7 optimal? In this chapter we develop a theory of optimal detection for both discrete-time and continuous-time channels. With this theory, we will verify that the receiver structures given in Chapters 6 and 7 are optimal under certain circumstances and certain criteria of optimality.
Chapters 13 and 14 will also use the theory we develop here to decode error-correction and trellis codes. In fact, we will uncover an underlying commonality between the problem of detecting data symbols on channels with intersymbol interference and trellis decoding. The general approach to deriving optimal receivers is to model the relationship between the transmitted and received signals by a joint probability distribution. Based on the noisy observation (the received signal plus noise), we wish to estimate or detect the input signal. We use the term estimation when the transmitted signal is a continuous-valued random variable, as is often the case in an analog communication system, and the term detection when the transmitted signal is discrete-valued (even if the received signal is continuous-valued). The primary distinction is that in detection we can often recover the signal exactly with high probability, a restatement of the regeneration principle of Chapter 1. In estimation, by contrast, we must be satisfied with a recovered signal that may be more accurate than the observation but will not be exact. In this chapter we study only detection, although very similar techniques can be applied to the estimation problems of analog communications. In fact, we will encounter parameter estimation problems in Chapter 11 when we communicate over channels with parameters initially unknown.

In order to address the detection problem, we need a statistical model for the received signal. Before the data symbols arrive at the detector, they are processed by a transmitter, pass through a channel, and are further processed by the front end of the receiver. Some of this processing is deterministic, such as any filtering functions, and some is random, such as additive noise on the channel. In this chapter we call the deterministic portion signal generation and the random component noise generation. The model is shown in Figure 9-1.
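The two-stage model of Figure 9-1 can be made concrete with a small sketch: a deterministic signal generator composed with a random noise generator. The specific filter, noise level, and crossover probability below are illustrative assumptions; the two noise generators anticipate the additive-Gaussian and binary-symmetric-channel models used throughout the chapter.

```python
import random

# Minimal sketch of the model in Figure 9-1: a deterministic signal generator
# followed by a random noise generator. All parameter values are illustrative.
random.seed(0)

def signal_generator(x, h=(1.0, 0.5)):
    """Deterministic stage: e.g. a discrete-time filter (convolution with h)."""
    return [sum(h[j] * x[k - j] for j in range(len(h)) if 0 <= k - j < len(x))
            for k in range(len(x) + len(h) - 1)]

def gaussian_noise(s, sigma=0.1):
    """Random stage: additive Gaussian noise on each sample."""
    return [v + random.gauss(0.0, sigma) for v in s]

def bsc(bits, p=0.1):
    """Random stage: a binary symmetric channel flipping each bit w.p. p."""
    return [b ^ (random.random() < p) for b in bits]

x = [1, -1, 1, 1]
print(gaussian_noise(signal_generator(x)))   # continuous-valued observation
print(bsc([1, 0, 1, 1, 0, 0, 1, 0]))         # discrete-valued observation
```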
The input X_k is a discrete-time and discrete-valued random process. It is not only discrete-valued but has a finite number of possible values, each of which is a function of the source bits.

Example 9-1. Suppose X_k is a data symbol sequence. Then the signal generator could be a discrete-time equivalent transmit filter and channel transfer function, and the noise generator could be additive Gaussian noise, independent of X_k. □

Example 9-2. Suppose X_k is a bit sequence. The signal generator could be a coder that produces another bit sequence, and the noise generator could be modeled as a binary symmetric channel (BSC), which randomly inverts some of the bits. □

The receiver uses the observation Y_k to make a decision about X_k. The observation can be either discrete or continuous-valued. Since each X_k has a finite number of possible values, the detector must make a decision from among a finite number of alternatives. Two related detection methods are covered: maximum likelihood (ML) and maximum a posteriori probability (MAP). MAP detection, also called Bayesian detection, is optimal in the sense that it minimizes the probability of error. While probability of error is undoubtedly the most appropriate criterion to minimize in most digital communications systems, ML detection is almost always used in practice instead of MAP detection. ML detection is a special case of MAP detection under the simplifying assumption that all the possible inputs are equally likely. It is often reasonable to assume equally likely signals, and in any case the performance of the simpler ML detector is usually so close to that of the MAP detector that there is little incentive to implement a costlier MAP detector.

Figure 9-1. A signal X_k to be transmitted is processed deterministically (signal generation) and stochastically (noise generation).
We begin with simple signal generation models and progress to more realistic (and more complicated) models. We address the two basic noise generators of Example 9-1 and Example 9-2: additive Gaussian noise and the BSC. We begin with the detection of a single real-valued data symbol, where the input is a real-valued data symbol and the signal generator is trivial. Then we progress to the detection of vector-valued inputs, which applies to the case of complex-valued signal constellations, among others. At this point we will have theoretically justified the slicer used so liberally in Chapter 6, at least for the case where there is no intersymbol interference (ISI). The next step is to derive the optimal detector in additive Gaussian noise, for both the discrete-time and continuous-time cases. This is followed by relaxing the known-signal assumption by allowing the carrier phase to be unknown (random). Up to this point the optimal detectors have been defined for the detection of a single data symbol, and the next extension is to ISI, where it is shown that the minimum-distance receiver design of Section 7.4 is optimal in additive Gaussian noise. A low-complexity algorithm for carrying out the minimization over all data-symbol sequences, the Viterbi algorithm, is then derived. The Viterbi algorithm has many other applications in digital communication, including the detection of convolutional and trellis codes (Chapters 13 and 14). Finally, the detection of a shot-noise signal with known intensity, characteristic of fiber optic systems, is considered.

9.1. DETECTION OF A SINGLE REAL-VALUED SYMBOL

In this section we consider the simplest case, where the input is a single random variable X (a single data symbol A) rather than a random process, and the signal generator passes this symbol directly through to the noise generator without modification. The data symbol has as its sample space the alphabet Ω_A, as discussed in Chapter 6.
Noise generators that arise in practice for this case result in either a discrete-valued observation Y or a continuous-valued observation. We give examples of both cases in the following subsections, at the same time illustrating the ML and MAP detectors.

9.1.1. Discrete-Valued Observations

Some noise generators result in discrete-valued observations Y. In order to design a detector, we must know the discrete distribution of Y conditioned on knowledge of the data symbol, p_{Y|A}(y|a), as this completely specifies the noise generator. The maximum likelihood (ML) detector chooses â ∈ Ω_A to maximize the likelihood p_{Y|A}(y|â), where y is the observed outcome of Y.

Example 9-3. Suppose that we have additive discrete noise N, so that Y = A + N. Assume A and N are independent and take […]

Figure 9-3. (c) The likelihoods as a function of the observation y, plotted for â = ±1; note that the ML detector has no preference when the observation is −0.5 < y < +0.5. (d) The MAP criterion, plotted assuming additive Gaussian noise.

The MAP detector will set â = +1 if Y > −0.5, and otherwise â = −1. An error never occurs if A = +1. An error occurs one third of the time that −1 is transmitted, because one third of the time the observation will be greater than −0.5. Hence, the probability of error is P_A(−1)/3 = 1/12. This is the area of the shaded region in Figure 9-3b. □

Example 9-9. We now find the ML detector and its probability of error for the same scenario as in the previous example. In Figure 9-3c we plot the likelihoods for the two possible decisions as a function of the observation. The likelihoods are equal in the region −0.5 < y < 0.5, so there the ML detector can make its decision arbitrarily. A legal ML detector sets its threshold at −0.5, and has performance identical to that of the MAP detector.
But another legal ML detector sets its threshold at zero (halfway between the two possibilities); this detector will make an error 1/6 of the time for each possible transmission, so the probability of error is 1/6. □

The most common distribution for additive noise in digital communications is Gaussian, rather than uniform as in the previous examples. The principle of the detectors is the same.

Example 9-10. In Figure 9-3d we show the functions f_{Y|A}(y|±1)·P_A(±1) as functions of the observation, assuming additive Gaussian noise. For the MAP detector, the threshold is selected where these curves cross. For the ML detector, the threshold is selected at zero. □

In the next section we will consider the additive Gaussian noise case for the more general situation where the signal and noise are vector-valued. This will model many situations that we encounter in digital communications.

9.2. DETECTION OF A SIGNAL VECTOR

Many of the communication channels we describe in this book can be modeled as noise corrupting a vector-valued signal. Although typical channels accept only scalar-valued signals, a convenient vector communication model can often be obtained using the technique shown in Figure 9-4. A vector of transmitted symbols is converted to a sequence of scalars for transmission over an additive noise channel, and then reconverted to a vector at the channel output. In effect we have taken a finite sequence of samples and modeled them as a vector.

Example 9-11. In Section 8.2 we formulated a vector-valued received signal consisting of a set of known signals and additive Gaussian noise. Further, we showed in Section 8.3 that this formulation applied directly to the minimum-distance receiver design considered in Chapter 7, if the appropriate decision variables were calculated based on an orthonormal expansion of the subspace of known signals. The PAM slicer design considered in Section 8.3 was a special one-dimensional case. □
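For the vector case in Gaussian noise, the decision variable of the minimum-distance receiver just mentioned is simply the distance from the observation to each candidate signal vector. The 4-point constellation and noise level in this sketch are illustrative assumptions.

```python
import numpy as np

# Sketch of minimum-distance detection of a signal vector in additive Gaussian
# noise: pick the known signal vector nearest to the observation. The signal
# set and noise standard deviation below are illustrative.
rng = np.random.default_rng(4)

signals = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])

def detect(y):
    """Index of the signal vector closest (Euclidean) to the observation y."""
    return int(np.argmin(np.sum((signals - y) ** 2, axis=1)))

# Transmit signal 0 repeatedly through the additive-Gaussian-noise channel:
n = 20000
y = signals[0] + rng.normal(0.0, 0.4, size=(n, 2))
decisions = np.array([detect(v) for v in y])
print((decisions != 0).mean())   # empirical symbol error rate at this noise level
```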
Example 9-12. A signal generator that results in a binary signal vector appropriate as input to a BSC (Figure 9-2) is a binary block code, to be considered in Chapters 12 and 13. In this case, S is a vector of values "0" or "1". If S is transmitted over a BSC, some of these bits may be inverted. We will show that the ML detector selects the codeword "closest" to the received bits in the sense that the fewest number of bits are different. Binary block codes are often used for error detection and error correction (Chapter 13). □

A general model for the situation is as follows. The signal generator accepts an input X and maps it into a vector signal S with dimension M.

Figure 9-4. A scalar channel plus some signal processing can sometimes be modeled as a vector channel: a vector of symbols is converted to a sequence of scalars for transmission over the scalar channel, then reconverted to a vector observation, yielding an equivalent vector channel.

Example 9-13. The signal generator might take a set of M consecutive data symbols as the signal vector, S = (A₁, …, A_M). □

The observation is a vector Y with the same dimension as the signal. The noise generator is specified by the conditional distribution of the observation given the signal s, f_{Y|S}(y|s). The detector decides which signal vector, from among all the possible signal vectors, was actually transmitted, based on the observation. A common characteristic of the noise generator is independent noise components, by which we mean precisely that

f_{Y|S}(y|s) = ∏_{k=1}^{M} f_{Y_k|S_k}(y_k|s_k) ,   (9.13)

or in words, given knowledge of the signal vector, each component of the noise generation is independent of the others. In the following subsections we will consider first the ML detector (for which the signal generator does not need to be statistically characterized) and then the MAP detector.

9.2.1.
ML Detection

The ML detector chooses the signal vector ŝ from among all the possibilities in order to maximize the conditional probability f_{Y|S}(y|s), a probability given directly by the noise generation model. It is simple in that the statistics of the signal S need not be taken into account. Two important examples are the additive Gaussian noise generator and the BSC noise model.

Example 9-14. Consider the additive Gaussian noise problem formulated in (8.20), where the complex-valued noise vector Z is assumed to be circularly symmetric with uncorrelated (and hence independent) components with variance 2σ². The received signal Y is therefore a complex-valued Gaussian vector with mean equal to s, and hence has the probability density function f_{Y|S} […]

[…] 0 ≤ k < ∞}. The matched-filter receiver of Figure 9-5 can be applied to this whitened received signal, taking into account the new signal spectrum. Since linear filtering preserves the circular symmetry of Gaussian noise, the whitened noise is circularly symmetric, and thus the samples of the noise are mutually independent. As also shown in Figure 9-6, the whitening and matched filters can be combined into a single filter, equivalent to that in Figure 9-5 except that the transfer function is normalized by the noise spectrum S_Z(z). This normalization is appropriate for a matched filter in nonwhite noise, and the result is an ML detector.

[Figure 9-6: a whitening filter followed by a matched filter, combined into a single filter normalized by S_Z(z).]

The noise Z(t) is expanded in the series

Z(t) = Σ_{i=1}^{∞} Z_i·φ_i(t) ,  0 ≤ t ≤ T ,   (9.33)

where the functions φ_i(t) are orthonormal in signal space,

∫₀ᵀ φ_i(t)·φ_j*(t) dt = δ_{i−j} ,   (9.34)

and the coefficients are mutually uncorrelated,

E[Z_i·Z_j*] = σ_j²·δ_{i−j} .   (9.35)

Under quite general conditions, a set of orthonormal functions {φ_i(t)} can be found such that (9.35) is satisfied (even for the case where Z(t) is not wide-sense stationary, as we assume here). The resulting expansion is known as the Karhunen-Loeve expansion. First, taking the inner product of both sides of (9.33) with φ_j(t), we obtain
    Z_j = ∫_0^T Z(t) φ_j*(t) dt .    (9.36)

Since Z_j is a linear function of a Gaussian process, it is a Gaussian random variable, and further it is circularly symmetric since Z(t) is assumed circularly symmetric. This circular symmetry together with (9.35) implies that the Z_j are statistically independent. In Appendix 9-A, it is shown that a necessary and sufficient condition on {φ_j(t)} for (9.35) to be satisfied is

    ∫_0^T R_Z(t − τ) φ_j(τ) dτ = σ_j² φ_j(t) ,  1 ≤ j < ∞ ,  0 ≤ t ≤ T .    (9.37)

Although the left side of (9.37) looks like a convolution, the equality is valid only for the finite time interval 0 ≤ t ≤ T, so it is in fact not a convolution. This is an integral equation, and, in analogy to similar matrix equations, the φ_j(t) are called eigenfunctions and the σ_j² eigenvalues. The question arises whether there exist {φ_j(t), σ_j², 1 ≤ j < ∞} satisfying (9.37). This is considered in some detail by Van Trees [1], where it is confirmed that they do exist under rather general conditions. For our purposes, it suffices that the power spectrum S_Z(jω) be non-zero for all ω. Fortunately, in the following we don't actually have to find the eigenfunctions satisfying (9.37); it suffices to know that they exist.

Now returning to the original detection problem of (9.32), the approach is to expand the received signal in the same set of orthonormal functions that arise out of the Karhunen-Loeve expansion of Z(t),

    Y(t) = Σ_{j=1}^{∞} Y_j φ_j(t) ,  Y_j = s_{m,j} + Z_j ,  0 ≤ t ≤ T .    (9.38)

The coefficients Z_j are uncorrelated, circularly symmetric (and hence independent) Gaussian random variables. Their variances are not necessarily equal, E[|Z_j|²] = σ_j², and the s_{m,j} are the coefficients of s_m(t) with respect to the orthonormal basis functions φ_j(t),

    s_{m,j} = ∫_0^T s_m(t) φ_j*(t) dt .    (9.39)

The Karhunen-Loeve expansion turns the continuous-time received signal into an equivalent discrete-time received signal, at least in a mathematical if not literal sense (since the j in Y_j is not time, but an index over the signal-space basis).
The continuous-time received signal Y(t) is represented on the finite time interval 0 ≤ t ≤ T by the countable set of random variables {Y_j, 1 ≤ j < ∞}. We can apply the earlier discrete-time results to this equivalent representation, with the slight complication that the noise samples do not all have equal variance. Assuming that the eigenvalues are all non-zero, this problem is easily circumvented by normalizing the samples, dividing both sides by the known standard deviation,

    Y_j/σ_j = s_{m,j}/σ_j + Z_j/σ_j ,  1 ≤ j < ∞ .    (9.40)

The Z_j/σ_j are all unit-variance Gaussian random variables. The normalization of (9.40) can be considered a form of whitening, similar to the whitening filter applied in Figure 9-6. The normalized representation of (9.40) satisfies all the conditions assumed for the discrete-time case, namely a set of known discrete-time signals with additive white noise variables. We can therefore apply the earlier detector to the normalized received signal Y_j/σ_j, where the known signal component is s_{l,j}/σ_j. Thus, the ML detector minimizes

    D_l = Σ_{j=1}^{∞} | Y_j/σ_j − s_{l,j}/σ_j |²    (9.41)

over all possible signals 1 ≤ l ≤ L. As in the discrete-time case, the first term Σ_{j=1}^{∞} |Y_j/σ_j|² will be independent of l, so the ML detector equivalently maximizes the decision variable

    R_l = Re{ Σ_{j=1}^{∞} Y_j s_{l,j}* / σ_j² } − E_l/2 ,    (9.42)

where E_l = Σ_{j=1}^{∞} |s_{l,j}|²/σ_j². In Appendix 9-A, this result is related to the original continuous-time signals.

Figure 9-7. The ML detector for a continuous-time known signal in additive Gaussian noise. (a) The correlation receiver. (b) The matched filter receiver. (c) The matched filter in the limit as T → ∞.
In particular, defining a function g_l(t) that satisfies the integral equation

    ∫_0^T R_Z(t − τ) g_l(τ) dτ = s_l(t) ,  1 ≤ l ≤ L ,  0 ≤ t ≤ T ,    (9.43)

then (9.42) can be written as

    R_l = Re{ ∫_0^T Y(t) g_l*(t) dt } − E_l/2 ,  E_l = ∫_0^T s_l(t) g_l*(t) dt .    (9.44)

Example 9-19. If the additive noise is white, R_Z(τ) = N_0 δ(τ), then g_l(t) = s_l(t)/N_0. In that case, the receiver simply crosscorrelates with each of the known signals s_l(t), 1 ≤ l ≤ L. □

The significance of (9.44) cannot be overstated. It shows that the infinite-dimensional continuous-time received signal can be reduced to a finite set of L decision variables R_l, 1 ≤ l ≤ L, where L is the number of known signals. R_l consists of a crosscorrelation against g_l(t), as shown in Figure 9-7a. As in the discrete-time case, the correlation detector is equivalent to the continuous-time matched filter detector of Figure 9-7b. For the special case of white noise, Example 9-19 establishes that the matched filters in Figure 9-7 are matched to the set of signal waveforms s_l(t), 1 ≤ l ≤ L.

If we let T → ∞, we get a different interpretation and a better understanding of g_l(t). In that case, assuming that g_l(t) is causal, (9.43) approaches a convolution equation R_Z(t) * g_l(t) = s_l(t), or G_l(jω) = S_l(jω)/S_Z(jω). This limiting case has the interpretation shown in Figure 9-7c. In the white noise case, the matched filter has impulse response s_l*(−t) and transfer function S_l*(jω). In the nonwhite noise case, a different interpretation takes advantage of the fact that S_Z(jω) is positive and real-valued. It can be factored as the product of two identical filters,

    S_Z(jω) = S_Z^{1/2}(jω) · S_Z^{1/2}(jω) .    (9.45)

Thus, in Figure 9-7c the matched filter has been divided into two parts: a whitening filter 1/S_Z^{1/2}(jω) that has white noise at its output, and a filter matched to the signal at the whitening filter output. The signal component at the output of the whitening filter is S_l(jω)/S_Z^{1/2}(jω), and the second filter is matched to this new signal.
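For intuition, the correlation detector of (9.44) can be sketched numerically. The sketch below is our illustration (not from the text): it approximates the integrals by sampled sums and assumes white noise, so by Example 9-19 the correlating waveform g_l(t) is s_l(t) itself up to the factor 1/N_0, which scales all decision variables equally and can be dropped.

```python
import numpy as np

def correlation_detector(y, signals, dt):
    # R_l = Re{ integral y(t) s_l*(t) dt } - E_l/2, the white-noise case of
    # (9.44) with the common 1/N0 factor dropped; returns the index of max R_l.
    R = []
    for s in signals:
        E = np.sum(np.abs(s)**2) * dt          # signal energy E_l
        R.append(np.real(np.sum(y * np.conj(s)) * dt) - E / 2)
    return int(np.argmax(R))

dt = 1e-3
t = np.arange(0, 1, dt)
s1 = np.where(t < 0.5, 1.0, 0.0)               # hypothetical pulse, first half
s2 = np.where(t >= 0.5, 1.0, 0.0)              # hypothetical pulse, second half
y = s2 + np.array([0.01, -0.01] * 500)         # s2 plus a small ripple
print(correlation_detector(y, [s1, s2], dt))   # -> 1
```

Replacing the correlators with filters matched to each s_l(t), sampled at t = T, would compute the same decision variables (Figure 9-7b).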
The factorization into whitening and matched filtering is similar to the discrete-time case (Figure 9-6). E_l is the energy of the signal s_l(t) after it passes through the whitening filter, making it consistent with the white-noise case. The factorization of (9.45) can be replaced by the product of two terms with equal magnitudes and arbitrary (rather than zero) phases, such that the product is real-valued. For example, we could replace (9.45) by a minimum-phase spectral factorization, similar to the factorization of (9.31) for the discrete-time power spectrum. This would have the benefit of explicitly controlling the causality or anticausality of the whitening and matched filters. However, since we are only using this factorization for intuition and not for realization, we will avoid this complication.

9.3.3. Sufficient Statistics

In (9.44), define the L complex-valued decision variables

    V_l = ∫_0^T Y(t) g_l*(t) dt ,  1 ≤ l ≤ L ,    (9.46)

as labeled in Figure 9-7. The ML detector first calculates these L decision variables, corresponding to the L signals, and then chooses l to maximize R_l = Re{V_l} − E_l/2. The ML detector is thus summarizing the continuous-time received signal {Y(t), 0 ≤ t ≤ T} by the L random variables {V_l, 1 ≤ l ≤ L}. In the process, it is clearly throwing away a lot of information about {Y(t), 0 ≤ t ≤ T}. The information that is being thrown away is considered irrelevant by the ML detector.

The overall goal of this subsection is summarized in Figure 9-8. Starting with the received signal {Y(t), 0 ≤ t ≤ T}, the Karhunen-Loeve expansion coefficients {Y_j, 1 ≤ j < ∞} give an equivalent representation. This countable set of random variables is much easier to deal with analytically, but remains impractical for implementation because of the infinite number of variables.
However, the ML detector further reduces the received signal to L ML decision variables {V_l, 1 ≤ l ≤ L}, where L is the number of known signals. For purposes of implementation, this finite set of decision variables is a dramatic improvement. Conceptually, however, if the dimensionality of the subspace spanned by the signals is less than L, then based on the experience of Chapter 8 we would expect that a number of decision variables equal to the dimension of the subspace would suffice. In fact, it will now be shown that the received signal can be represented by a set of N sufficient statistics {U_k, 1 ≤ k ≤ N}, where N ≤ L, and N will be defined shortly. The sufficient statistics summarize {Y(t), 0 ≤ t ≤ T} for purposes of detection of {s_l(t), 1 ≤ l ≤ L, 0 ≤ t ≤ T}. The N sufficient statistics can be used for purposes of ML detection. This reduces the number of decision variables that must be dealt with in the implementation of the ML detector.

Remarkably, the sufficient statistics can be relied upon for the detection of the known signals {s_l(t), 1 ≤ l ≤ L, 0 ≤ t ≤ T} for any criterion of optimality, not just the ML criterion. For example, they would serve equally well as the starting point for MAP detection. The intuitive basis for sufficient statistics is that they retain all the information in the received signal that is relevant to the detection of the known signals and discard only information that is irrelevant. As we will also show below, the sufficient statistics {U_k, 1 ≤ k ≤ N} can be obtained from the ML decision variables {V_l, 1 ≤ l ≤ L} as pictured in Figure 9-8, and therefore all the information in the sufficient statistics must also be included in the ML decision variables.
Thus, the ML decision variables are themselves sufficient statistics, and could be used by any detection criterion (not just the ML criterion) without compromising performance.

Figure 9-8. The progression from the received signal Y(t) through a progression of decision variables, all of which retain all relevant information for detection of the known signals.

It will now be shown that {U_k, 1 ≤ k ≤ N} represent a sufficient statistic, and in the process the definition of sufficient statistic will be made concrete. This is done here for the limiting case of T → ∞, and a more rigorous (and complicated) argument based on the Karhunen-Loeve expansion is given in Appendix 9-A. Returning to Figure 9-7c, let f_l(t) have Fourier transform

    F_l(jω) = S_l(jω) / S_Z^{1/2}(jω) ,    (9.47)

where S_l(jω) is the Fourier transform of the pulse s_l(t). The signal f_l(t) is the response of the whitening filter 1/S_Z^{1/2}(jω) to s_l(t). Moreover, the second half of the matched filter in Figure 9-7c is matched to f_l(t); that is, it has impulse response f_l*(−t). Assume the {f_l(t), 0 ≤ t < ∞, 1 ≤ l ≤ L} span a subspace M_f of signal space of dimension N ≤ L, and let {ψ_k(t), 1 ≤ k < ∞} be a complete set of orthonormal functions chosen so that the first N, {ψ_k(t), 1 ≤ k ≤ N}, serve as a basis for M_f. Then we can write

    f_l(t) = Σ_{k=1}^{N} F_{l,k} ψ_k(t) ,  F_{l,k} = ∫_0^∞ f_l(t) ψ_k*(t) dt .    (9.48)

In the following, we will represent U(t), the output of the whitening filter in Figure 9-7c, in terms of the basis {ψ_k(t), 1 ≤ k < ∞}, and show that only the first N coordinates are relevant to detecting {s_l(t), 1 ≤ l ≤ L, 0 ≤ t ≤ T}. The components of U(t) with respect to the new basis are

    U_k = ∫_0^∞ U(t) ψ_k*(t) dt ,  1 ≤ k < ∞ .    (9.49)
Substituting for U(t) from U(t) = f_l(t) + W(t), where W(t) is white noise with unit variance,

    U_k = ∫_0^∞ f_l(t) ψ_k*(t) dt + ∫_0^∞ W(t) ψ_k*(t) dt = F_{l,k} + W_k ,    (9.50)

where W_k = ∫_0^∞ W(t) ψ_k*(t) dt, and F_{l,k} = 0 for k > N. For k > N the U_k consist of noise alone; they carry no information about the transmitted signal and are irrelevant to the detection, so the first N coordinates {U_k, 1 ≤ k ≤ N} form a set of sufficient statistics.

Suppose now that the received signal is passband,

    Y(t) = √2 Re{ s_m(t) e^{jω_c t} } + N(t) ,  1 ≤ m ≤ L ,    (9.54)

where s_m(t) is drawn from a set of L complex baseband signals, and N(t) is additive stationary real-valued Gaussian noise with autocorrelation R_N(τ) and power spectral density S_N(jω). Assume as well that g_m(t), defined by (9.43), is written in terms of a complex baseband representation,

    g_m(t) = √2 Re{ g̃_m(t) e^{jω_c t} } .    (9.55)

Then substituting (9.54) into (9.43),

    ∫_0^T R_N(t − τ) √2 Re{ g̃_m(τ) e^{jω_c τ} } dτ = √2 Re{ s_m(t) e^{jω_c t} } ,  0 ≤ t ≤ T .    (9.56)

Recognizing that R_N(τ) is real-valued, since N(t) is real-valued, we can rewrite (9.56) as

    √2 Re{ e^{jω_c t} ∫_0^T R_N(t − τ) e^{−jω_c (t−τ)} g̃_m(τ) dτ } = √2 Re{ s_m(t) e^{jω_c t} } ,    (9.57)

for 0 ≤ t ≤ T. Thus, we can recast the integral equation in terms of baseband signals as

    ∫_0^T R_N(t − τ) e^{−jω_c (t−τ)} g̃_m(τ) dτ = s_m(t) ,  0 ≤ t ≤ T .    (9.58)

In particular, as T → ∞, (9.58) becomes a convolution, so that

    G̃_m(jω) = S_m(jω) / S_N(j(ω + ω_c)) .    (9.59)

Recognizing that

    √2 Re{ g̃_m(t) e^{jω_c t} } = √2 Re{ g̃_m*(t) e^{−jω_c t} } ,    (9.60)

the sufficient statistics become

    V_m = ∫_0^∞ Y(t) g_m(t) dt = ∫_0^∞ Y(t) √2 Re{ g̃_m*(t) e^{−jω_c t} } dt = √2 Re{ ∫_0^∞ Y(t) e^{−jω_c t} g̃_m*(t) dt } .    (9.61)

Another set of sufficient statistics for the received signal is clearly

    Ṽ_m = ∫_0^∞ Y(t) e^{−jω_c t} g̃_m*(t) dt ,  1 ≤ m ≤ L .    (9.62)

This is illustrated in Figure 9-9a, where the detector first demodulates and then crosscorrelates with the complex baseband waveform g̃_m*(t). An equivalent matched-filter realization is shown in Figure 9-9b. This is precisely the matched filter receiver considered in Chapter 7, except that the matched filter response is normalized by the power spectrum of the channel noise, translated from passband to d.c. because of the demodulator. When the noise is white, the structure specializes to precisely the receiver front end that arose out of the minimum-distance criterion in Chapter 7.
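The demodulate-and-crosscorrelate structure of (9.62) can be sketched numerically. The example below is our illustration (not from the text), with hypothetical carrier frequency and baseband pulses, and assumes white noise so that g̃_m(t) reduces to s_m(t) up to a scale that does not affect the detector.

```python
import numpy as np

def passband_sufficient_stats(y, s_bb, wc, t):
    # V_m = integral of Y(t) e^{-j wc t} s_m*(t) dt : demodulate, then
    # crosscorrelate with each complex baseband waveform (white-noise case).
    dt = t[1] - t[0]
    y_demod = y * np.exp(-1j * wc * t)             # demodulator
    return [np.sum(y_demod * np.conj(s)) * dt for s in s_bb]

t = np.linspace(0, 1, 4000, endpoint=False)
wc = 2 * np.pi * 40                                # hypothetical carrier
s_bb = [np.ones_like(t), np.exp(1j * 2*np.pi*5 * t)]   # two baseband pulses
m = 0                                              # transmit signal 0
y = np.sqrt(2) * np.real(s_bb[m] * np.exp(1j * wc * t))  # noiseless (9.54)
V = passband_sufficient_stats(y, s_bb, wc, t)
print(int(np.argmax(np.abs(V))))  # -> 0
```

The double-frequency term at 2ω_c averages to zero over the integration interval, which is why the demodulated correlation isolates the baseband signal component.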
To generate fewer sufficient statistics, we can first whiten the received signal as shown in Figure 9-10, generating a new complex baseband received signal U(t) that contains complex-valued white Gaussian noise. A bank of matched filters is then applied, matched to basis functions ψ_k(t) obtained as follows. First, ignoring the double-frequency and noise terms, the complex baseband signal at the output of the whitening filter has Fourier transform

    F_m(jω) = S_m(jω) / S_N^{1/2}(j(ω + ω_c)) .    (9.63)

Then an orthonormal basis {ψ_k(t), 1 ≤ k ≤ N} for the subspace spanned by {f_m(t), 1 ≤ m ≤ L} is chosen.

This result generalizes the earlier ML detector results, demonstrating that optimal detector structures for all criteria of optimality in additive stationary Gaussian noise can share a common receiver front end consisting of a demodulator and a bank of L or N crosscorrelators or matched filters. Applying a given criterion is then a matter of characterizing the statistics of the resulting finite set of decision variables, and working out the optimal processing of those variables based on their statistics.

Figure 9-9. Generation of L sufficient statistics for a passband received signal Y(t). (a) Crosscorrelator and (b) matched filter realization, valid as T → ∞.

Figure 9-10. A detector front end that generates a set of N sufficient statistics for the passband received signal.

9.4. OPTIMAL INCOHERENT DETECTION

In Chapter 6, FSK was presented as a modulation technique suitable for transmission over channels that cause rapidly varying carrier phase. One of the major advantages of FSK is the ability to detect a signal incoherently, without deriving the carrier phase. Intuitively, this can be accomplished by realizing a set of bandpass filters, one centered at each of the known signal frequencies, and measuring the power at the output of each filter.
The question arises, however, as to the optimal detection technique when the carrier phase is unknown. We will now derive the optimal incoherent detector, applying directly the results of Section 9.3. Assume that the carrier phase is random, with the goal of rederiving the ML detector. The received signal is now of the form

    Y(t) = √2 Re{ s_m(t) e^{j(ω_c t + Θ)} } + N(t) ,  1 ≤ m ≤ L ,    (9.64)

where Θ is assumed independent of the signal, and the noise N(t) is white and Gaussian. In the absence of any other relevant information, we can assume that Θ is uniformly distributed over the interval [0, 2π]; this is also the most tractable choice analytically, leading to a simple result. The general approach to determining the ML detector is to first condition on knowledge of Θ = θ, and then average over Θ.

Assume that the {s_m(t), 1 ≤ m ≤ L} span a subspace of dimension N, and that this subspace has an orthonormal basis {ψ_n(t), 1 ≤ n ≤ N}. If the carrier phase were known to be Θ = θ, then a sufficient statistic would be

    V_n = ∫_0^∞ Y(t) e^{−jω_c t} ψ_n*(t) dt ,  1 ≤ n ≤ N ,    (9.65)

as in (9.62). Incorporating the phase θ in the calculation of the sufficient statistic would simply multiply V_n by a constant phase factor, but not add any additional information. Substituting (9.64) into (9.65), and observing that the 2ω_c term will integrate to zero, we can express the sufficient statistics as an N-dimensional vector,

    V = e^{jθ} s_m + Z ,    (9.66)

where s_m is a vector of the coefficients of s_m(t) with respect to the orthonormal basis and Z is a vector of independent circularly symmetric Gaussian random variables. The effect of the unknown carrier phase θ is to shift the signal component of V by phase θ. To determine the ML detector, we must determine the probability density function of V conditioned on signal M being transmitted. As the first step, we find the p.d.f. of V conditioned on both M and the phase Θ, f_{V|M,Θ}(v|m,θ). This is a multidimensional Gaussian density function,
given by

    f_{V|M,Θ}(v|m,θ) = (1/(2πσ²)^N) exp{ −(1/2σ²) ‖ v − e^{jθ} s_m ‖² } ,    (9.67)

where σ² = N_0 is the variance of the real or imaginary part of the Gaussian noise. This formidable expression will be made yet more formidable by finding f_{V|M}(v|m) by integrating out the dependence on θ. But do not despair; the end result is simple! It is useful to derive first the following simple result.

Exercise 9-1. Define the modified Bessel function of zero order,

    I_0(x) = (1/2π) ∫_{−π}^{π} exp{ x cos(θ) } dθ    (9.68)

for real-valued x. Show that for a complex-valued z,

    I_0(|z|) = (1/2π) ∫_{−π}^{π} exp{ Re{ e^{jθ} z* } } dθ .    (9.69)

HINT: Write z in polar coordinates. □

Using this result, we can find the marginal density of the received signal by integrating against the density function of Θ,

    f_{V|M}(v|m) = ∫_{−π}^{π} f_{V|M,Θ}(v|m,θ) f_Θ(θ) dθ
                 = (1/(2πσ²)^N) exp{ −(1/2σ²)( ‖v‖² + ‖s_m‖² ) } I_0( | Σ_{n=1}^{N} v_n s_{m,n}* | / σ² ) .    (9.70)

The final result will exploit a property of the Bessel function, that it is monotonically increasing in its argument. Its precise shape is irrelevant. The result is particularly simple when each signal has the same energy; that is, when ‖s_m‖ is a constant. Then the exponential term in (9.70) is independent of m. From the monotonicity of I_0(x), the ML receiver selects m to maximize

    K_m = | Σ_{n=1}^{N} V_n s_{m,n}* | = | ∫_0^∞ Y(t) e^{−jω_c t} s_m*(t) dt | .    (9.71)

This is the simple form that was promised. A receiver structure to compute K_m is shown in Figure 9-11. Instead of correcting the matched filter output for the phase, as would be done if the phase were known, the receiver simply determines the magnitude of the matched filter output, throwing away any phase information.

Example 9-21. For binary FSK, s_1(t) = e^{−jω_d t} and s_2(t) = e^{jω_d t}, where 2ω_d is the deviation between the two signals. The optimal receiver calculates the quantities

    K_m² = | ∫_0^T Y(t) e^{−j(ω_c ± ω_d)t} dt |² = [ ∫_0^T Y(t) cos((ω_c ± ω_d)t) dt ]² + [ ∫_0^T Y(t) sin((ω_c ± ω_d)t) dt ]² ,    (9.72)

as shown in Figure 9-12.
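The incoherent rule of (9.71) is easy to sketch numerically at complex baseband. The example below is our illustration (not from the text), with hypothetical tone spacing; note that the decision is unaffected by the unknown phase θ, since only magnitudes are compared.

```python
import numpy as np

def incoherent_fsk_detect(y_bb, wd, t):
    # K_m = | integral of y(t) s_m*(t) dt | for the two baseband FSK tones
    # s1(t) = e^{-j wd t}, s2(t) = e^{+j wd t}; choose the larger magnitude.
    dt = t[1] - t[0]
    K1 = abs(np.sum(y_bb * np.exp(+1j * wd * t)) * dt)
    K2 = abs(np.sum(y_bb * np.exp(-1j * wd * t)) * dt)
    return 1 if K1 >= K2 else 2

t = np.linspace(0, 1, 1000, endpoint=False)
wd = 2 * np.pi * 4            # tones orthogonal over the interval [0, 1)
theta = 1.234                 # unknown carrier phase
y = np.exp(1j * theta) * np.exp(1j * wd * t)   # tone 2 transmitted
print(incoherent_fsk_detect(y, wd, t))  # -> 2
```

Because the magnitude discards the factor e^{jθ}, the same decision results for any θ, which is precisely the point of incoherent detection.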
Since the signal energies are the same, and since the Bessel function is monotonic, K_1 and K_2 given by (9.72) are compared and the signal corresponding to the maximum is chosen. Intuitively, since the phase of the signal is unknown, we must correlate against two quadrature sinusoids, since we are then assured of a strong correlation for any signal phase with one or the other sinusoid phase. This receiver is also equivalent to passing the received signal through two filters

    g(t) = e^{j(ω_c ± ω_d)t} ,  −T ≤ t ≤ 0 ,  and 0 otherwise .    (9.73)

These are roughly bandpass filters centered at ω_c + ω_d and ω_c − ω_d, the two transmitted frequencies. The filter outputs are each followed by an envelope detector. This structure was previously shown in Figure 6-48, where it was justified on intuitive grounds. □

Figure 9-11. The optimal incoherent receiver uses a matched filter (a correlator could be used also) and throws away phase information.

Figure 9-12. The optimal incoherent receiver for a binary FSK signal using correlators.

The calculation of the probability of error for an incoherent detector is rather involved, and each case is best treated individually. The starting point is substituting (9.66) into (9.71), so that the decision variable becomes, conditioned on transmitted signal l,

    K_m = | e^{jθ} <s_l, s_m> + Z_m′ | ,    (9.74)

where Z_m′ = Σ_{n=1}^{N} Z_n s_{m,n}*. We can illustrate the probability of error calculation using FSK.

Example 9-22. For FSK the two signals are orthogonal (with the proper choice of frequency deviation), and thus <s_1, s_2> = 0. Hence, the decision variables of (9.74) become, assuming s_1 is transmitted,

    K_1 = | e^{jθ} ‖s_1‖² + Z_1′ | ,  K_2 = | Z_2′ | .    (9.75)

The probability of error, conditioned on s_1 transmitted, becomes

    P(error | s_1 transmitted) = Pr{ | Z_2′ | > | e^{jθ} ‖s_1‖² + Z_1′ | } .    (9.76)

This probability is in fact independent of θ, and by symmetry is the same as the error probability conditioned on s_2 being transmitted.
The two random variables on the left and right sides are not independent. The evaluation of the result is rather involved, and leads to an expression in terms of a tabulated function known as "Marcum's Q function". □

For more examples of the calculation of the probability of error, the interested reader is referred to [2,3].

9.5. OPTIMAL DETECTORS for PAM WITH ISI

In Chapter 7 a receiver for PAM with ISI was derived using a minimum-distance criterion. In Chapter 8, for white noise on the channel, the symbol-rate noise samples at the output of the front-end filter were shown to be white. The front-end filter was therefore called a whitened matched filter (WMF). The variance of this noise was calculated, and the probability of a sequence error (one or more data symbols in error) was calculated. In this section, we extend these results in several ways:

• We show that the WMF output is a set of sufficient statistics for the received signal. Thus, a receiver designed with respect to any criterion of optimality can share this same WMF front end.

• We show that the minimum-distance receiver design of Chapter 7 is, as expected, the ML detector for a set of M^K known signals consisting of a sequence of K data symbols. As a result, we call this receiver the ML sequence detector (MLSD).

• We extend these results to nonwhite noise, and in particular show that the WMF can be reformulated for this case.

WMF Outputs as Sufficient Statistics

As in Section 7.3, assume that the received signal consists of a finite sequence of K data symbols, each with the same alphabet of size M, modulating a basic complex baseband pulse h(t),

    Y(t) = √2 Re{ Σ_{k=1}^{K} A_k h(t − kT) e^{jω_c t} } + N(t) .    (9.77)

For the moment assume that the noise is white, S_N(jω) = N_0, as well as zero-mean and Gaussian. Then we can consider (9.77) as consisting of a signal portion drawn from a set of L = M^K known signals, together with additive white Gaussian noise.
In Section 9.3, we established that a set of sufficient statistics can be generated by demodulating and correlating with the complex conjugate of each possible complex baseband signal, where L = M^K,

    V = ∫_0^∞ Y(t) Σ_{k=1}^{K} a_k* h*(t − kT) e^{−jω_c t} dt = Σ_{k=1}^{K} a_k* U_k ,    (9.78)

where

    U_k = ∫_0^∞ Y(t) h*(t − kT) e^{−jω_c t} dt .    (9.79)

This correlation has to be repeated for all L = M^K sequences of data symbols {a_k, 1 ≤ k ≤ K}. In practice, calculating L = M^K correlations is not feasible as K gets large, but fortunately, (9.78) can be generated from the K decision variables U_k, 1 ≤ k ≤ K, in (9.79). These K variables summarize the received signal from the perspective of calculating (9.78). The {U_k, 1 ≤ k ≤ K} are themselves sufficient statistics. This reduces the number of sufficient statistics from M^K down to just K.

These K sufficient statistics are the outputs of K correlators against h*(t − kT), 1 ≤ k ≤ K. As shown in Figure 7-9, these K correlators can be replaced by a single matched filter, matched to h(t), followed by a sampler at t = kT for 1 ≤ k ≤ K. Thus, we conclude that a receiver structure consisting of a demodulator, followed by a filter matched to the complex baseband pulse h(t), followed by a symbol-rate sampler, generates a set of sufficient statistics for the received signal detection.

Figure 9-13. A whitened matched filter for PAM and nonwhite noise. (a) Complete front end of the receiver, and (b) equivalent discrete-time model.

This result is easily generalized to nonwhite noise using the results of Section 9.3, specifically the asymptotic results as T → ∞. In this case, the output of the demodulator is first whitened by the filter 1/S_N^{1/2}(j(ω + ω_c)), and the set of known signals is replaced by the set of known signals as modified by this whitening filter.
Equivalently, the matched filter can be replaced by a filter matched to h(t), normalized by S_N(j(ω + ω_c)). The resulting front end that generates the sufficient statistics {U_k, 1 ≤ k ≤ K} is shown in Figure 9-13a. The noise samples at the sampler output are not white, but rather have power spectrum

    S_{h,n}(e^{jωT}) = (1/T) Σ_{m=−∞}^{∞} |H(j(ω + m·2π/T))|² / S_N(j(ω + ω_c + m·2π/T)) .    (9.80)

This is similar to S_h(z) derived in Chapter 7, (7.36), except that H(jω) is normalized by S_N(j(ω + ω_c)). As in Chapter 7, we can invoke a minimum-phase spectral factorization to write

    S_{h,n}(z) = A_{h,n}² G_{h,n}(z) G_{h,n}*(1/z*) ,    (9.81)

where A_{h,n}² is a positive constant and G_{h,n}(z) is a monic minimum-phase transfer function. In Figure 9-13a a maximum-phase whitening filter is added to the output of the sampled matched filter, yielding an output noise process Z_k that is white, Gaussian, and circularly symmetric, with variance 1/A_{h,n}². The resulting discrete-time channel model from input symbols to WMF outputs is shown in Figure 9-13b.

Example 9-23. The front end of Figure 9-13 reduces to the WMF derived in Section 7.3 when the channel noise is white, S_N(jω) = N_0. For this case, S_{h,n}(e^{jωT}) = S_h(e^{jωT})/N_0, where S_h(e^{jωT}) is the folded spectrum, and thus

    S_{h,n}(z) = S_h(z)/N_0 = (A_h²/N_0) G_h(z) G_h*(1/z*) .    (9.82)

The Z_k thus have variance 1/A_{h,n}² = N_0/A_h², consistent with that determined in Section 8.6. □

A receiver for detection of the data symbols can be safely based on the WMF front end of Figure 9-13a regardless of what criterion of optimality is applied. This result is quite remarkable when we consider that symbol-rate sampling at the matched filter output is generally at less than the Nyquist rate, so aliasing of both noise and signal is inherent in this sampling. This aliasing will not compromise the performance of the receiver as long as the filter before the sampling is a matched filter.
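For a rational (FIR) folded spectrum, the minimum-phase spectral factorization of (9.81) can be computed by factoring the spectrum's polynomial and retaining the roots inside the unit circle. The sketch below is our illustration (not from the text), for a hypothetical real two-tap discrete-time pulse.

```python
import numpy as np

def spectral_factor(h):
    # Minimum-phase factorization S(z) = A^2 G(z) G*(1/z*) of the spectrum
    # S(z) = sum_k r[k] z^{-k}, with r the autocorrelation of the pulse h.
    r = np.correlate(h, h, mode="full")      # real pulse assumed
    roots = np.roots(r)                      # roots come in reciprocal pairs
    inside = roots[np.abs(roots) < 1]        # keep the minimum-phase half
    g = np.real(np.poly(inside))             # monic polynomial from those roots
    A2 = r[len(h) - 1] / np.sum(g**2)        # scale so A^2 (g * g~) matches r
    return A2, g

h = np.array([1.0, 0.5])                     # hypothetical two-tap pulse
A2, g = spectral_factor(h)
print(np.allclose(g, [1.0, 0.5]))  # -> True (h is already minimum phase)
```

Here the pulse is already minimum phase, so the factorization returns it unchanged with A² = 1; a maximum-phase pulse [0.5, 1] would instead yield the same g = [1, 0.5] with its energy absorbed into A².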
There are, however, practical concerns with this receiver structure that will be addressed in Chapter 10.

Maximum-Likelihood Sequence Detector

The sufficient statistic argument allows us to use the front end of Figure 9-13a for any detection criterion, so we will now apply it to the ML detector. With this front end, the equivalent discrete-time model of Figure 9-13b can be used as a starting point for application of the ML criterion. In particular, the ML detector for this equivalent discrete-time model was developed in Section 9.3. It chooses the sequence of data symbols that minimizes the Euclidean distance

    min_{ {a_k, 1 ≤ k ≤ K} } Σ_{m=1}^{∞} | W_m − Σ_{k=1}^{K} a_k g_{h,m−k} |² .    (9.83)

This is precisely the minimum-distance receiver design of Chapter 7, and thus that receiver design is equivalent to the detector using the criterion of maximizing the likelihood of the received signal conditional on a sequence of data symbols {a_k, 1 ≤ k ≤ K}. Since an entire sequence of data symbols is detected at once, this detector is called the maximum-likelihood sequence detector (MLSD). If all sequences are equally likely, the MLSD minimizes the probability of making one or more errors in a sequence of data symbols. That is, the criterion penalizes the detector equally for making any number of detection errors.

We have now derived the MLSD criterion of (9.83) in two ways. First, in Section 7.3 it was shown that the discrete-time criterion of (9.83) is equivalent to a continuous-time minimum-distance receiver design, in the sense that both criteria will choose the same sequence of data symbols. Second, in this section, using the argument that the WMF forms a sufficient statistic for the received signal detection, and also using the white noise property of the WMF output, we have shown that the criterion of (9.83) is optimal in the ML sense.
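The minimization in (9.83) can always be carried out by brute force, evaluating the metric for every one of the M^K candidate sequences. The sketch below is our illustration (not from the text), using a hypothetical two-tap discrete-time channel with taps [1, 0.5] and noiseless observations; the exponential cost of this direct search is what motivates the Viterbi algorithm of Section 9.6.

```python
import itertools
import numpy as np

def mlsd_bruteforce(w, g, alphabet, K):
    # Exhaustive MLSD: minimize sum_m | w_m - sum_k a_k g_{m-k} |^2, as in
    # (9.83), over all M^K candidate symbol sequences.
    best, best_cost = None, np.inf
    for a in itertools.product(alphabet, repeat=K):
        s = np.convolve(a, g)                # noiseless channel output
        cost = np.sum(np.abs(w - s)**2)
        if cost < best_cost:
            best, best_cost = a, cost
    return best

g = np.array([1.0, 0.5])                     # hypothetical channel taps
a_true = (1, 0, 1, 1)
w = np.convolve(a_true, g)                   # noiseless received samples
print(mlsd_bruteforce(w, g, (0, 1), 4))  # -> (1, 0, 1, 1)
```

With noiseless observations the true sequence attains metric zero, and since the channel polynomial is monic the minimizer is unique.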
Combining these two facts, we arrive at the conclusion that minimum-distance receiver design is optimal in the ML sense for PAM with ISI on the white Gaussian noise channel.

9.6. SEQUENCE DETECTION: THE VITERBI ALGORITHM

In Section 9.3 we derived the ML detector for the case where the reception is one of M signals. In many cases, the dimension N of the subspace of signals is too large to make the correlation (or equivalent matched filter) receiver practical. An example of this is the MLSD of Section 9.5, in which the signal corresponds to a sequence of K data symbols; if they have alphabet size M, there are M^K total possible signals. The computation is therefore exponential in time (because time is proportional to K). A practical realization requires that the computation be linear in time, since that corresponds to a fixed computational rate. More generally, the coding techniques covered in Chapters 13 and 14 use a signal set of high dimensionality as a way of combating noise, and would seem to suffer the same exponential dependence of computation on time.

Fortunately, there is a way to achieve a computational load that is linear in time in many practical situations, including coding as discussed in Chapters 13 and 14. Namely, we can impose some structure on the set of possible transmitted signals so that more sophisticated algorithms, equivalent to the correlator or matched filter but of much lower complexity, are possible. In particular, in this section we will assume a discrete-time model, and the signal-generation model is a finite-state machine (FSM). For this signal-generation model, the ML detector complexity can be reduced dramatically using a dynamic programming algorithm known as the Viterbi algorithm, originally proposed by A. Viterbi in 1967 [4].

9.6.1. Finite-State Machine Signal Generator

The Viterbi algorithm is applicable when the following properties hold:

• The signal is generated by a finite-state machine (FSM).
• The noise component in each sample is independent.

• An ML criterion is used, maximizing the sequence likelihood.

The output of an FSM driven by independent inputs is a homogeneous Markov chain (Section 3.3). This view of the signal generator output is a useful alternative viewpoint. Let Ψ_k be the state sequence of a homogeneous Markov chain (Section 3.3). The sample space of each state Ψ_k is finite. For the signal generator of interest, the signal samples are a function of the Markov chain state transitions,

    S_k = g(Ψ_k, Ψ_{k+1}) ,    (9.84)

where g(·) is a memoryless function. The objective of our ML and MAP detectors will be to detect the sequence of states given an observation sequence Y_k, which is S_k perturbed by independent noise components.

A special case that suits all our applications is the shift-register process of Example 3-13, reproduced in Figure 9-14 with a noise generator. Assume the input X_k is a sequence of i.i.d. random variables with a finite sample space. The state of the Markov chain is Ψ_k = [X_{k−1}, X_{k−2}, ..., X_{k−J}], where J is the length of the shift register.

Figure 9-14. A shift-register process with an observation function and noise generator.

Example 9-24. When the received signal is PAM with ISI, and the front-end filter is the WMF of Figure 9-13a, the resulting discrete-time channel model of Figure 9-13b is in the form of a signal-generation model followed by a noise-generation model. The signal-generation model is a filter G_{h,n}(z) driven by the data symbols A_k, and the noise-generation model is additive complex-valued Gaussian noise with independent samples. The signal-generation model is in general not an FSM, unless the filter is FIR. If G_{h,n}(z) is FIR,
then it can be written

    G_h(z) = Σ_{k=0}^{J} g_{h,k} z^{−k} ,    (9.85)

and the FSM has state

    Ψ_k = [A_{k−1}, A_{k−2}, ..., A_{k−J}] .    (9.86)

(G_h(z) is FIR if and only if S_h(z) is an all-zero filter, or equivalently only a finite set of the translates h(t − kT) after whitening are non-orthogonal.) There are then M^J states if the signal alphabet has size M. This ISI channel model is an example of a shift-register process, where the observation function is

    g(Ψ_k, Ψ_{k+1}) = Σ_{i=0}^{J} g_{h,i} A_{k−i} .    (9.87)

In addition, the independent noise samples at the output of the WMF satisfy the assumptions required for application of the Viterbi algorithm. □

Example 9-25.
A simple example that we will carry along for illustrative purposes is the ISI model

    g_k = δ_k + 0.5·δ_{k−1} ,    (9.88)

as shown in Figure 9-15. Using the notation of Figure 9-14, this is the shift-register process shown in Figure 9-15a. If the input symbols are i.i.d., the observation Y_k is a noisy observation of the Markov chain with state transition diagram shown in Figure 9-15b. We have assumed binary inputs A_k, so that there are only two states, corresponding to the two possible values of A_{k−1}. The arcs are labeled with the input/output pair (A_k, S_k) of the signal generator. □

The Markov chain signal generator, and the shift-register process in particular, also model an important coding technique considered in Chapter 13. Although it is premature to talk about coding here, we will nevertheless present this as another example of a signal-generation model.

Figure 9-15. a. The ISI signal generator of Example 9-25 is a shift-register process, which is a Markov chain when the data symbols A_k are i.i.d. b. The state transition diagram for the Markov signal generator assuming binary input symbols; the arcs are labeled with the input bit/signal output pair (A_k, S_k). For this model the transitions are (1,1) and (0,0) from state 0, and (1,1.5) and (0,0.5) from state 1.

Example 9-26.
Convolutional coders introduce redundancy so that random errors occurring in the transmission of a data sequence can be corrected. An example of a convolutional coder is shown in Figure 9-16a. In this case the input X_k is binary, usually the bits to be transmitted. The signal S_k is a binary channel input (S_k ∈ {0,1}). Two symbols are generated for each data bit X_k, namely S_{2k} and S_{2k+1}, so the S_k are actually transmitted at twice the bit rate of the input bit stream X_k. The noise generator often assumed in this case is the BSC with independent noise components. Assuming that the input sequence is i.i.d., we can define the state of a Markov chain to be Ψ_k = [X_{k−1}, X_{k−2}]. The state transition diagram is shown in Figure 9-16b. There are four states, corresponding to the two past input bits X_{k−1} and X_{k−2}. □

One of the characteristics of the output S_k produced by the Markov signal generator is redundancy. In Example 9-25, the redundancy occurs because S_k takes on four possible levels (0.0, 0.5, 1.0, 1.5) even though only one bit of information is carried. The ISI which introduces this redundancy is assumed to be an undesired property of the channel, although there are situations where it is introduced deliberately, for example with partial response in Chapter 12. In Example 9-26 the redundancy occurs because two bits are transmitted through the BSC for every bit of information, with the goal of mitigating the effect of errors introduced on the BSC.

Figure 9-16. a. A rate 1/2 convolutional coder feeding a binary symmetric channel (BSC), which randomly (with probability p) inverts bits. b. The state transition diagram; the arcs are labeled with the input bit and the pair of output bits (X_k, [S_{2k}, S_{2k+1}]).
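A sketch of such an encoder helps make the state/output relationship concrete. The text does not specify which register taps feed the two modulo-2 adders in Figure 9-16a, so the tap choice below (the common generator pair 1 + D + D² and 1 + D²) is an assumption for illustration only; the state [X_{k−1}, X_{k−2}] is as in Example 9-26.

```python
# Sketch of a rate-1/2 convolutional encoder with a two-bit shift
# register, in the spirit of Figure 9-16a.  The generator taps are an
# assumption (1 + D + D^2 and 1 + D^2); any other taps on the state
# [X_{k-1}, X_{k-2}] fit the same FSM framework.

def conv_encode(bits):
    """Encode a bit sequence; produces two output bits per input bit."""
    x1 = x2 = 0                    # shift-register state [X_{k-1}, X_{k-2}]
    out = []
    for x in bits:
        out.append(x ^ x1 ^ x2)    # S_{2k}:   assumed taps 1 + D + D^2
        out.append(x ^ x2)         # S_{2k+1}: assumed taps 1 + D^2
        x1, x2 = x, x1             # shift the register
    return out

print(conv_encode([1, 0, 1, 1]))   # prints [1, 1, 1, 0, 0, 0, 0, 1]
```

Each input bit advances the Markov chain one state transition and emits the branch label [S_{2k}, S_{2k+1}], exactly the redundancy described above.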
Although the form and function of the convolutional coder are very different from those of the ISI channel, the principles of sequence detection developed in this section apply equally to both cases. In the remainder of this section we will discuss the technique of sequence detection, and leave detailed discussion of its applications to Chapters 10, 12, 13, and 14.

9.6.2. The Trellis Diagram

The state transition diagrams in Example 9-25 and Example 9-26 are traditional representations of Markov chains. D. Forney suggested in 1967 a valuable alternative representation called a trellis diagram [5], which shows the possible progression of states over time.

Example 9-27.
The state transitions of Example 9-25 are shown in the trellis diagram in Figure 9-17a, subject to the starting and ending conditions Ψ_0 = 0 and Ψ_K = 0. Each small circle is a node of the trellis, and corresponds to the Markov chain being in a particular state at a particular time. Each arc in the diagram is called a branch, and corresponds to a particular state transition at a particular time. Thus, the single node at the left indicates that the Markov chain begins in state Ψ_0 = 0 at time k = 0. The next state can be either Ψ_1 = 0 or Ψ_1 = 1, so transitions to both are shown. After time k = 1, the Markov chain for this example may branch (transition) from any node (state) to any other node (state), until it reaches the terminal node of the trellis in state Ψ_K = 0. Each branch in the trellis corresponds to one state transition that is triggered by a particular input X_k and produces the output S_k, and thus there is a one-to-one correspondence at time k between a branch, the state transition, and

Figure 9-17. (a) A two-state trellis illustrating the possible state transitions of the Markov chain in Example 9-25, assuming the initial and final states are zero.
(b) One stage of the two-state trellis, labeled with the input and output pairs (X_k, S_k) corresponding to each state transition.

both the input and output of the signal generator. One segment of the trellis is shown in Figure 9-17b with the input and output pairs (X_k, S_k) labeled for each transition. □

A sequence of branches through the trellis diagram from the beginning node to the terminal node is called a path. Every possible path corresponds to an input sequence X_k, 0 ≤ k ≤ K. The goal of a detector, based on the observation of S_k corrupted by noise, is to decide on the sequence of inputs. Deciding on a sequence of inputs is equivalent to deciding on a path through the trellis diagram. The detector in this case is called a sequence detector, since it is simultaneously deciding on an entire sequence of inputs (or a path through the trellis) rather than deciding on one input at a time.

9.6.3. ML and MAP Sequence Detectors

Our goal is to design a MAP or ML detector for the state sequence Ψ_k, input X_k, output signal S_k, or path through the trellis diagram (all of which are equivalent), based on the noisy observation sequence Y_k. In principle we have already solved this problem, since this is an example of the vector signal-generation model.

Example 9-28.
In the Gaussian noise case the ML detector chooses the signal ŝ_k that minimizes the squared Euclidean distance between the observation and the signal, Σ_{k=0}^{K} |y_k − ŝ_k|², where y_k is the observed outcome of the random variable Y_k. □

Example 9-29.
The ML detector for the BSC minimizes the Hamming distance Σ_{k=0}^{K} d_H(y_k, ŝ_k), where d_H(u, v) is the Hamming distance between u and v. (For the convolutional coder of Figure 9-16, there are actually 2K + 2 bits generated for K + 1 input bits, so the upper limit of the summation should be 2K + 1.) □

We can relate these results back to the trellis diagram.
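The criterion of Example 9-28 can be applied by brute force to the channel of Example 9-25, which makes the exponential cost of direct ML detection concrete: every one of the 2^K candidate input sequences must be scored. The sketch below is illustrative (names are not from the text); for brevity it omits the terminal-state constraint Ψ_K = 0 of Figure 9-17a, which for this observation sequence does not change the answer.

```python
# Brute-force ML sequence detection for the ISI model of Example 9-25
# (s_k = a_k + 0.5 a_{k-1}, binary symbols, zero initial state).  Every
# candidate input sequence is scored with the squared Euclidean distance
# of Example 9-28 -- the cost that grows as 2^K.

from itertools import product

def ml_exhaustive(y):
    best, best_metric = None, float("inf")
    for a in product((0, 1), repeat=len(y)):       # 2^K candidates
        prev, metric = 0, 0.0                      # Psi_0 = 0
        for ak, yk in zip(a, y):
            s = ak + 0.5 * prev                    # noiseless branch output
            metric += (yk - s) ** 2                # Example 9-28 criterion
            prev = ak
        if metric < best_metric:
            best, best_metric = a, metric
    return best, best_metric

best, metric = ml_exhaustive([0.2, 0.6, 0.9, 0.1])
print(best, round(metric, 2))                      # prints (0, 1, 0, 0) 0.37
```

The answer agrees with the sequence decision of Example 9-31 below, but the work doubles with every added symbol; the Viterbi algorithm of this section obtains the same minimum with work linear in K.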
Recall that there is a signal ŝ_k associated with each branch of the trellis at each stage k. For each stage k there is also an observation y_k. After observing y_k, we can assign to each branch of the trellis a numerical value called the branch metric that is low if y_k is close to ŝ_k and high otherwise. For the Gaussian case the appropriate branch metric is

    branch metric = |y_k − ŝ_k|² ,    (9.89)

and for the BSC it is

    branch metric = d_H(y_k, ŝ_k) .    (9.90)

Then for each path through the trellis, we can calculate the path metric, which is the sum of the branch metrics. The preferred path will be the one with the lowest path metric. In Appendix 9-B we generalize this to any noise generator with independent noise components, and also generalize to the MAP detector. In each case the objective is to minimize a path metric that is the sum of branch metrics, where the only difference between the detectors is the formula for the branch metric.

The ML or MAP detector first calculates the branch metrics for every branch in the trellis diagram. It then calculates the path metric for every path in the trellis diagram, and chooses the path for which this path metric is minimum. The detected input sequence is then the sequence corresponding to this path. This straightforward approach of exhaustively calculating the path metric for each and every path through the trellis will clearly fail in practice, because the number of paths grows exponentially with K. Usually K will be very large, corresponding to the entire time that communication takes place (usually minutes, hours, or even decades!). The Viterbi algorithm is a computationally efficient algorithm that exploits the special structure of the trellis to achieve a complexity that grows only linearly with K, or in other words requires a constant computation rate (per unit time).
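As a minimal sketch, the two branch metrics (9.89) and (9.90) can be written as small functions; the sample values fed to them below are illustrative only.

```python
# The two branch metrics of (9.89) and (9.90).  For the Gaussian channel
# the observation y_k and branch signal s_k are numbers; for the BSC
# they are tuples of bits.

def gaussian_branch_metric(y_k, s_k):
    return abs(y_k - s_k) ** 2            # squared Euclidean distance (9.89)

def bsc_branch_metric(y_k, s_k):
    # Hamming distance (9.90): number of bit positions that differ
    return sum(yb != sb for yb, sb in zip(y_k, s_k))

print(gaussian_branch_metric(0.6, 1.0))   # observation 0.6 vs. branch output 1.0
print(bsc_branch_metric((1, 0), (1, 1)))  # received bit pair vs. coded bit pair
```

Either function can be summed along a path to give the path metric; only this formula changes between the Gaussian and BSC detectors.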
Consider one node in the trellis diagram, and all paths through the trellis that pass through this node.

Example 9-30.
Consider a node with two incoming branches, labeled A and B, and two outgoing branches, labeled C and D. There are a large number of paths passing through this node (increasing exponentially with K), but all of these paths follow one of just four routes through the node: AC, AD, BC, and BD. □

The path metric for a particular path through the node is the sum of the partial path metrics for the portion of the path to the left and the portion to the right of the node. Among all possible partial paths to the left, the detector will always prefer the one with the smallest partial path metric, called the survivor path for that node. We can immediately remove from consideration all partial paths to the left other than the survivor path, because any other partial path to the left has by definition a larger partial path metric, and if it replaced the survivor path in any overall path, the path metric would be larger. This is the basis of the Viterbi algorithm, which allows us to reject many possible paths at each stage of the trellis.

The Viterbi algorithm finds the path with the minimum path metric by sequentially moving through the trellis and at each node retaining only the survivor path. At each stage of the trellis we do not know which node the optimal path passes through, so we must retain one survivor path for each node. When we reach the terminal node of the trellis, we find the optimal path, which is the single survivor path for that node. The algorithm thus determines, at each time increment k, the survivor path for each of the N nodes. The trick, then, is finding these N survivor paths based on the information developed up to time k − 1.
This can be pictured, for the case where there are two incoming branches to a given node at time k, as the survivors up to time k − 1 extended by the branches from time k − 1 to time k. The only incoming paths to a node at time k that are candidates to be survivors are those consisting of survivors at time k − 1 followed by branches to time k. (The number of such candidates is equal to the number of incoming branches to that node.) We therefore determine the partial path metrics for each of those candidate paths by summing the partial path metric of the survivor at time k − 1 and the metric of the branch to time k. The survivor path at time k for a given node is the candidate path terminating on that node with the smallest partial path metric. We must store, for each node at time k, the survivor path and the associated partial path metric, for the algorithm to proceed to time k + 1.

We will illustrate the Viterbi algorithm with an example.

Example 9-31.
The trellis shown in Figure 9-18 is marked with the branch metrics corresponding to the observation sequence {0.2, 0.6, 0.9, 0.1} for the additive Gaussian noise case of Example 9-25. The branch metrics |y_k − ŝ_k|² are labeled in Figure 9-18. A simple ML slicer (not a sequence detector) would decide that the transmitted bits were {0, 1, 1, 0}, but the ML sequence detector takes into account knowledge of the ISI and selects {0, 1, 0, 0}. An iterative procedure for making this decision is illustrated in Figure 9-18, which shows the survivor paths at each node and the partial path metric of each surviving path. □

The computational complexity of the Viterbi algorithm is the same at each time increment, except for end effects at the originating and terminating nodes, and hence the total computational complexity is proportional to the length of time K. One practical problem remains.
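The recursion just described, specialized to the two-state trellis of Example 9-25, fits in a few lines. The sketch below is illustrative (its names and structure are not from the text); it assumes binary symbols, the branch metric (9.89), and the boundary conditions Ψ_0 = Ψ_K = 0 of Figure 9-17a, and it reproduces the decision {0, 1, 0, 0} and final path metric 0.37 of Example 9-31.

```python
# Viterbi algorithm for the two-state ISI trellis of Example 9-25:
# states are Psi_k = a_{k-1} in {0, 1}, the branch output is
# a + 0.5*prev, and the branch metric is (y - s)^2 as in (9.89).
# The terminal constraint Psi_K = 0 forces the last input bit to 0.

def viterbi(y):
    K = len(y)
    pm = {0: 0.0, 1: float("inf")}      # partial path metric per state; Psi_0 = 0
    path = {0: [], 1: []}               # survivor path per state
    for k, yk in enumerate(y):
        inputs = (0,) if k == K - 1 else (0, 1)   # force Psi_K = 0
        new_pm, new_path = {}, {}
        for a in inputs:                # next state equals the input bit a
            # candidates: survivor at time k-1 plus one branch to time k
            cands = [(pm[p] + (yk - (a + 0.5 * p)) ** 2, p) for p in (0, 1)]
            metric, best_prev = min(cands)
            new_pm[a] = metric
            new_path[a] = path[best_prev] + [a]
        pm, path = new_pm, new_path
    return path[0], pm[0]               # survivor at the terminal node

bits, metric = viterbi([0.2, 0.6, 0.9, 0.1])
print(bits, round(metric, 2))           # prints [0, 1, 0, 0] 0.37
```

Each stage touches only the branches entering the two states, so the work per stage is constant and the total work is linear in K, in contrast with the 2^K paths of the exhaustive search.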
The algorithm does not determine the optimal path until the terminal node of the trellis; that is, it does not reach a conclusion on the entire ML or MAP sequence until the end of the sequence. Further, while the computation at each step is the same, the memory required to store the survivor paths grows linearly with time. In digital communication systems, sequences may be very long, and we can afford neither the resulting long delay in making decisions nor the very large memory that would be required.

It is helpful if at some iteration k, all the survivor paths up to iteration k − d coincide, for some d.

Example 9-32.
In Example 9-31, when k ≥ 2, all the survivor paths coincide from k = 0 to k = 1. The ML or MAP detector decision for the first state transition can therefore be made when k = 2; it is not necessary to wait until the terminal node of the trellis. □

When all the survivor paths at some time k coincide up to some time k − d, we say that the partial paths have merged at depth d, and we can make a decision on all the inputs or states up to time k − d. Unfortunately, we cannot depend on the good fortune of a merge, as it is possible for no merges to occur. It is usual therefore to make a modification to the algorithm by forcing a decision at time k on all transitions prior

Figure 9-18. A two-state trellis with the branch metrics of the transitions marked and the Viterbi algorithm illustrated (observations 0.2, 0.6, 0.9, 0.1; the surviving partial path metrics at successive stages are 0.04 and 0.64, then 0.40 and 0.20, then 0.36 and 0.41, and finally 0.37; decision 0, 1, 0, 0). The Viterbi algorithm iteratively finds the path with the minimum path metric without ever considering more than two paths at once.
to time k − d, for some truncation depth d. The usual approach is to compare all the partial path metrics for the N partial paths at time k, and note which one is the smallest. The decision on the input or state transition at time k − d is then the transition at time k − d of this smallest-metric survivor path. Since the decision has been made, there is no need to store the survivor paths beyond a depth of d transitions. If d is chosen large enough, this modification will have negligible impact on the probability of detecting the correct sequence.

9.6.4. Error Probability Calculation

We have already analyzed in Section 9.2 the probability of error in a vector detection problem. If the entire sequence from k = 0 to k = K is considered to be a vector, then the result in Section 9.2 can be applied directly. The fact that we have found a computationally efficient algorithm for making the ML decision does not change that error probability. The sequence error probability is the probability that the path chosen through the trellis does not correspond to the correct state sequence, or in other words the probability that one or more detected states are in error. This criterion thus gives the same weighting to an error in which ten states are incorrect as to one in which only a single state is in error. We showed in Section 9.2 that this error probability is dominated by the path through the trellis that is nearest in distance (Euclidean or Hamming) to the correct path. However, caution is in order! As the length K of the sequence gets large, the number of distinct paths at minimum distance from the correct path also gets large, invalidating the approximations in Section 9.2. In fact, as K gets large, the probability of sequence error usually approaches unity!
The usual measure of performance for a digital communication system, however, is the probability of a single symbol or bit error, or the probability of a sequence error for a relatively short sequence (such as one block of data). This observation has two implications. First, for many applications the ML sequence criterion is not the appropriate one; it might be more appropriate to minimize the data symbol or bit error probability instead. Second, even if we use the sequence detector, we would like to know the probability of a bit or symbol error, and the relationship of these to the sequence error probability is not trivial. A simple approach, developed below, is to calculate the probability of a sequence error per unit time, and then relate this normalized sequence-error probability to the probability of a symbol and bit error.

In practice the ML sequence criterion is usually used in preference to minimizing the probability of a symbol error, for two reasons. First, it is much simpler to implement; in fact, the best known algorithms for optimal bit-by-bit detection have exponential complexity in K. Second, the performance of the ML sequence detector is almost identical at high SNR (Gaussian noise) or low channel error probability (BSC) to that of the detector that minimizes the bit or symbol error probability.

Error Events

For purposes of determining the symbol error probability, it is useful to introduce the concept of an error event. Let {Ψ_k} be the correct state sequence and {Ψ̂_k} be the sequence selected by the Viterbi algorithm. Over a long time, {Ψ_k} and {Ψ̂_k} will typically diverge and remerge several times. Each distinct separation is called an error event, which is therefore defined as a correct path through the trellis paired with an error path that begins and ends with the correct state. By definition, the error path does not share any intermediate states with the correct state sequence.
The length of an error event is the number of intermediate (incorrect) nodes in the path.

Example 9-33.
Examples of error events of length one and two are shown in Figure 9-19 for a two-state trellis. The assumed correct state trajectory is shown by dashed lines, and the error event by solid lines. There are error events of unbounded length, although as we will see, the probability of the longer events will usually (but not always) be negligibly small. □

An error event has one or more symbol errors, which are incorrect symbols or bits that result from taking an incorrect path through the trellis. In Appendix 9-C we show that the probability of symbol error is dominated by the probability of the minimum-distance error event at high SNR. For the Gaussian noise case,

    Pr[symbol error] ≈ C·Q(d_min/2σ) ,    (9.91)

and for the BSC case,

    Pr[symbol error] ≈ C·Q(d_min, p) ,    (9.92)

where C is some constant between P and R given by (9.158) and (9.150) respectively, and Q(·,·) is defined by (9.25). As long as P and R are reasonably close to unity, we need not be too concerned with this multiplicative constant.

Figure 9-19. When the correct state sequence Ψ and the detected state sequence Ψ̂ diverge and remerge, we have an error event. Two error events are shown here. (a) The shortest error event for the two-state trellis in Example 9-25. (b) The next longest error event. In both cases we have assumed the correct state trajectory is all zeros, shown with the dashed lines.

The following procedure will find the distance of any particular error event for either the Gaussian or BSC cases. Assume a correct state sequence, and label each branch in the trellis with its squared distance from the corresponding branch of the correct state sequence. This would be the branch metric if the channel were noiseless.
The correct state sequence will have branch metrics that are zero, and normally all the branches not on the correct path will have a non-zero branch metric. For each possible error event, we can find the distance of that error event very simply by computing its path metric.

Example 9-34.
Continuing the ISI Example 9-25, Figure 9-20 shows the trellis labeled with the branch metrics assuming a noiseless channel and an all-zeros transmitted sequence. The path metric for each path through the trellis is now the square of the Euclidean distance of that path from the correct all-zeros path. The error event of length one is easily seen to have Euclidean distance √1.25, and the error event of length two has distance √3.5. Longer error events have still greater distances. Obviously, the error event of length one is much more probable than longer error events. It is easy to show by exhaustive search that the error events for all possible correct paths through the trellis are at distance at least √1.25, and none has smaller distance (see Exercise 9-2 below), so √1.25 is the minimum distance over all possible correct paths. It is shown in Appendix 9-C that C = 1, so

    Pr[symbol error] ≈ Q(√1.25 / 2σ) .    (9.93)

□

Figure 9-20. The trellis of Figure 9-19 labeled with the squared distances from the correct branches (the branches of the error events shown have squared distances 1, 0.25, and 2.25), assuming the correct branches are the dashed ones.

Exercise 9-2.
Completing Example 9-34, show that for each possible correct path through the trellis, the minimum-distance error event has length one and distance √1.25. □

In Example 9-34 we found the minimum distance by inspection. This will not be so easy in general.
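The exhaustive search of Exercise 9-2 is small enough to run directly for this two-tap channel. The sketch below enumerates pairs of short binary input sequences that first differ at time 0 and compares their noiseless outputs; the sequence length of 6 is an arbitrary assumption, long enough to cover the short error events that matter here.

```python
# Exhaustive check of Exercise 9-2 for the channel of Example 9-25:
# enumerate pairs of binary input sequences that first differ at time 0
# and compute the squared Euclidean distance between their noiseless
# outputs s_k = a_k + 0.5 a_{k-1}.

from itertools import product

def output(a):
    a = list(a) + [0]                  # zero-pad so the trailing tap is included
    prev, s = 0, []
    for ak in a:
        s.append(ak + 0.5 * prev)
        prev = ak
    return s

L = 6                                   # assumed length; covers the short events
d2_min = min(
    sum((u - v) ** 2 for u, v in zip(output(a), output(b)))
    for a in product((0, 1), repeat=L)
    for b in product((0, 1), repeat=L)
    if a[0] != b[0]                     # error event starts at time 0
)
print(d2_min)                           # prints 1.25
```

The minimum squared distance found is 1.25, confirming that the single-symbol error event of Example 9-34 is the minimum-distance event, d_min = √1.25.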
Fortunately, it turns out that the Viterbi algorithm can itself be used to find the minimum-distance error event for a given correct state sequence. Using the trellis diagram labeled with the same branch metrics as above, we can begin on the correct path and use the Viterbi algorithm to find the survivor at each stage, excluding the correct path (the only one with zero path metric). Each survivor at a node on the correct path corresponds to an error event, and the path metric is the distance of this error event from the correct path. At each stage of the algorithm, keep track of the minimum-distance error event recorded thus far; when all survivors have a partial path metric greater than this minimum, the minimum-distance error event has been found for one assumed correct path through the trellis.

As shown in Appendix 9-C, it is the global minimum distance d_min that dominates the probability of error, not the minimum distance from one particular path through the trellis. Fortunately, we usually do not need to examine all possible correct paths to find the minimum distance. Better techniques are available for both the ISI examples (discussed shortly) and the BSC examples (discussed in Chapter 13). In both cases we exploit symmetry based on linearity, although the nature of the linearity is quite different in each case.

9.6.5. Calculating the Minimum Distance for ISI

The performance of the MLSD is determined largely by the minimum-distance properties of the ISI. The minimum distance can be determined in brute-force fashion by finding the minimum-distance error event for all possible starting paths through the trellis. However, due to the linearity of the FSM model in the particular case of ISI, the problem can be simplified.
In particular, if we define an error symbol ε_k as ε_k = a_k − â_k, where a_k and â_k are feasible data symbols, then it was shown in (7.72) that the minimum distance is given by

    d²_min = min_{{ε_k, 1 ≤ k ≤ K}} Σ_{m=1}^{∞} | Σ_{k=1}^{K} ε_k g_{h,m−k} |² ,    (9.94)

where the minimization is over all non-zero sequences of error symbols; that is, at least one of the error symbols must be non-zero. By time invariance, we can limit attention to error events that begin with an error at time k = 1; that is, the minimization of (9.94) is over all {ε_k, 1 ≤ k ≤ K} such that ε_1 ≠ 0. The minimum-distance problem of (9.94) can be formulated in a form that can be solved by the Viterbi algorithm, but only for the case where G_h(z) is an FIR filter, so that an FSM model holds. Assuming g_{h,k} = 0 for k > J, the convolution sum can be reversed,

    d²_min = min_{{ε_k}} Σ_{m=1}^{K+J} | Σ_{i=0}^{J} g_{h,i} ε_{m−i} |² .

APPENDIX 9-A

KARHUNEN-LOÈVE EXPANSION

A sufficient condition for (9.111) to be satisfied is (9.37), as can be established by substituting (9.37) into (9.111). To show that (9.37) is a necessary condition, we assume that (9.33) and (9.35) are valid, and show that this implies (9.37). Multiplying (9.33) by Z_n* and taking the expected value, we get (9.112). Similarly, multiplying the conjugate of (9.36) by Z(t) and taking the expectation,

    E[Z(t) Z_n*] = E[ Z(t) ∫_0^T Z*(τ) n(τ) dτ ] = ∫_0^T R_Z(t − τ) n(τ) dτ , 0 ≤ t ≤ T.    (9.113)

Equating these two results establishes (9.37).

Derivation of the Continuous-Time Whitening Filter

We now derive (9.44). Define

    g_l(t) = Σ_{i=1}^{∞} (s_{l,i} / σ_i²) φ_i(t) ,    (9.114)

and then (9.43) follows directly by multiplying both sides of (9.114) by R_Z(t − τ) and integrating. Similarly,

    ∫_0^T s_m(t) g_m*(t) dt = Σ_{i=1}^{∞} (s*_{m,i} / σ_i²) ∫_0^T s_m(t) φ_i*(t) dt = Σ_{i=1}^{∞} |s_{m,i}|² / σ_i² = E_m .
(9.115)

Sufficient Statistic Argument

In Section 9.3.3 we derived a set of sufficient statistics {U_k, 1 ≤ k ≤ N} for the received signal Y(t), 0 ≤ t ≤ T, by letting T → ∞ and using intuitive arguments. Here we derive these sufficient statistics carefully for finite T using the Karhunen-Loève expansion. The results remain valid as T → ∞. Define ψ_m(t) = Σ_{i=1}^{∞} ψ_{m,i} φ_i(t), 1 ≤ m < ∞; since the φ_i(t) are orthonormal, it follows that

    ∫_0^T ψ_k(t) ψ_j*(t) dt = Σ_{i=1}^{∞} Σ_{l=1}^{∞} ψ_{k,i} ψ*_{j,l} ∫_0^T φ_i(t) φ_l*(t) dt = Σ_{i=1}^{∞} ψ_{k,i} ψ*_{j,i} .    (9.120)

Thus, the discrete-time sequences {ψ_{k,i}, 1 ≤ i < ∞} for 1 ≤ k < ∞ are orthonormal. Returning to the received signal of (9.40), and forming the inner product of both sides with ψ_{k,i}, 1 ≤ i < ∞, thus expressing it in terms of a new basis {ψ_{k,i}, 1 ≤ k < ∞}, we obtain (9.121). The first term on the right side of (9.121) is the signal term, and the second is a noise term W_k. All that remains to establish (9.51) is to show that W_k is white, which follows from

    E[W_k W_j*] = E[ Σ_{i=1}^{∞} Σ_{l=1}^{∞} (Z_i/σ_i)(Z_l/σ_l)* ψ_{k,i} ψ*_{j,l} ] = Σ_{i=1}^{∞} ψ_{k,i} ψ*_{j,i} = 0 , j ≠ k .    (9.122)

Since W_k is a linear function of a circularly symmetric process Z(t), it is circularly symmetric, and (9.122) implies that the W_k are mutually independent.

We can express the sufficient statistics in terms of continuous-time signals as follows. Substituting from (9.118) in (9.121),

    U_k = Σ_{j=1}^{∞} (Y_j/σ_j) ∫_0^T ψ_k(t) φ_j(t) dt = ∫_0^T U(t) ψ_k(t) dt ,    (9.123)

where

    U(t) = Σ_{j=1}^{∞} (Y_j/σ_j) φ_j(t) , 0 ≤ t ≤ T ,    (9.124)

is the output of the whitening filter. This confirms (9.49).

APPENDIX 9-B

GENERAL ML AND MAP SEQUENCE DETECTORS

In Section 9.6 we illustrated the Viterbi algorithm for the additive Gaussian noise and BSC noise-generation models, where the signal generator is a Markov chain. In this appendix we show that the Viterbi algorithm applies to any noise generator with independent noise components.
Let the random vector Ψ of length K + 1 denote the state sequence Ψ_k from k = 0 to k = K, and let the vector ψ denote an outcome of this random vector. Similarly, let the vector Y denote the observations Y_k from k = 0 to k = K − 1 and y an outcome (note that there is one fewer observation than states, because observations correspond to transitions between states). Then given an observation y, the MAP sequence detector selects the ψ̂ that maximizes the posterior probability p_{Ψ|Y}(ψ̂ | y). Note that the criterion is to maximize the a posteriori probability of the whole sequence of states, rather than a single state, and hence the term sequence detector. In this appendix we will omit the subscripts in the p.d.f.'s, writing p(ψ | y) instead of p_{Ψ|Y}(ψ | y), for example. There is no ambiguity here, and the shorthand will greatly simplify the expressions.

The MAP sequence detector can equivalently maximize the product p(ψ̂ | y) f(y), because f(y) does not depend on our choice ψ̂. (The notation f(y) implies that Y_k is continuous-valued, as in the additive Gaussian case. If it is discrete-valued, as in the BSC, then simply replace f(y) with p(y).) From the mixed form of Bayes' rule (3.31), we can equivalently maximize f(y | ψ̂) p(ψ̂).

Exercise 9-3.
Show that Bayes' rule and the Markov property imply that

    p(ψ) = p(ψ_0) Π_{k=0}^{K−1} p(ψ_{k+1} | ψ_k) .    (9.125)

□

This is intuitive, because the probability of a given state trajectory is equal to the product of the probabilities of the corresponding state transitions and the probability of the initial state. Since we assume the initial state is known, p(ψ_0) = 1.
Because of the independent-noise-components assumption,

    f(y | ψ̂) = Π_{k=0}^{K−1} f(y_k | ψ̂) .    (9.126)

Furthermore, since Y_k depends on only two of the states in Ψ, we can write

    f(y | ψ̂) = Π_{k=0}^{K−1} f(y_k | ψ̂_{k+1}, ψ̂_k) .    (9.127)

Putting these results together, we wish to find the state sequence ψ̂ that maximizes

    f(y | ψ̂) p(ψ̂) = Π_{k=0}^{K−1} p(ψ̂_{k+1} | ψ̂_k) Π_{k=0}^{K−1} f(y_k | ψ̂_{k+1}, ψ̂_k) .    (9.128)

We can equivalently maximize the logarithm,

    log[ f(y | ψ̂) p(ψ̂) ] = Σ_{k=0}^{K−1} log[ p(ψ̂_{k+1} | ψ̂_k) ] + Σ_{k=0}^{K−1} log[ f(y_k | ψ̂_{k+1}, ψ̂_k) ] ,    (9.129)

or minimize the negative of the logarithm. To each transition (ψ̂_k, ψ̂_{k+1}) in the trellis we assign the branch metric

    w(ψ̂_k, ψ̂_{k+1}) = −log[ p(ψ̂_{k+1} | ψ̂_k) ] − log[ f(y_k | ψ̂_{k+1}, ψ̂_k) ] .    (9.130)

The MAP detector then calculates the path metric for each path through the trellis and finds the path with the smallest path metric.

Often the expression (9.130) for the weight of transitions in the trellis can be significantly simplified. For example, if all permissible transitions are equally likely, then the first term is a constant and can be omitted. Alternatively, if we do not know the transition probabilities, we can assume the permissible transitions are equally likely and again omit this term. In either case the result is the ML sequence detector. In this case the branch metrics are

    w(ψ̂_k, ψ̂_{k+1}) = −log[ f(y_k | ψ̂_{k+1}, ψ̂_k) ] .    (9.131)

Example 9-41.
Consider a real-valued transmission with an additive Gaussian noise generator. In this case

    f(y_k | ψ̂_{k+1}, ψ̂_k) = f(y_k | ŝ_k) = f_N(y_k − ŝ_k) = (1/(σ√(2π))) e^{−(y_k − ŝ_k)²/2σ²} ,    (9.132)

where ŝ_k is the output of the signal generator when the state transition is from ψ̂_k to ψ̂_{k+1}.
0 Example 9-42. Consider a BSC noise generator. f (Yk I ~k+l' ~k) =f (Yk If k) =p Ii" (y•• s')(1 _ p )M - Ii" (Y• .s.) (9.135) where p is the probability that the BSC will invert a bit. and M is the number of bits in each signal sample Sk' The branch metrics are W(~k' ~k+l)= -log(p)[dH (Yk. f k)]-log(1-p)[M -dH(Yk,fk )]· =[log(1 - p) -log(p )]dH(Yk' f k) - Mlog(1 - p) . (9.136) The last term is not a function of the decision, and so can be ignored. Assuming p < 1 - P , an equivalent branch metric is the Hamming distance W'(~k'~k+l)=dH(yk,fk)' (9.137) Hence we have again rederived the result from Section 9.2 in another way! 0 APPENDIX g-C BIT ERROR PROBABILITY FOR SEQUENCE DETECTORS We showed in Section 9.6 that the probability of sequence error is easy to obtain using the vector channel results from Section 7.2.4, but is often useless because the probability approaches unity as the sequence gets large. Instead of the probability of sequence error, we can compute the probability that an error event begins at a particular time. This effectively normalizes the probability of sequence error per unit time. Error events are defined in Section 9.6.4. We are most often interested however in the probability of a bit or symbol error rather than an error event. In this appendix we derive the error event probability and a general expression for the probability of bit or symbol error that does not depend on linearity in the system. We then show that for the additive Gaussian white noise case the probability of error is approximately C·Q (dmin/2cr), where C is a constant that we can easily bound, and d min is the distance of the minimum distance error event. For the BSC channel case, the probability of error is approximately C·Q (dmin,p), where Q (.,) is defined by (9.25). 
After the sequence detector selects a path through the trellis, the receiver must translate this path into its corresponding bit sequence (recall the one-to-one mapping between incoming bit sequences and state trajectories). Several bit or symbol errors may occur as a consequence of each error event. Let E denote the set of all error events starting at time i. Each element e of E is characterized by both a correct path ψ and an incorrect path ψ̂ that diverge and remerge some time later. We make a stationarity assumption that Pr[e] is independent of i, the starting time of the error event. This will of course not be true if the trellis is finite, but if it is long relative to the length of the significant error events then the approximation is accurate.

Each error event causes one or more detection errors, where a detection error at time k means that the decision x̂_k at stage k of the trellis is incorrect. For the ISI example, each x_k is a symbol A_k, so a detection error is the same as a symbol error. For the binary coding examples, each x_k is a set of one or more bits. Define

    c_m(e) = 1, if e has a detection error in position m (from the start i);
             0, otherwise.   (9.138)

This function characterizes the sample times corresponding to detection errors in error event e.

Example 9-43. Consider Example 9-25, the ISI example. Let e_1 denote the error event of Figure 9-19a, which assumes that the correct state trajectory ψ consists of zero states. From Figure 9-17b we see that this error event causes decisions x̂_i = 1 and x̂_{i+1} = 0. Since x_i = 0 and x_{i+1} = 0 are the correct decisions,

    c_0(e_1) = 1,  c_m(e_1) = 0 for m > 0.   (9.139)

□

The probability of a particular error event e starting at time i and causing a detection error at time k is

    c_{k−i}(e) Pr[e].   (9.140)

Since the error events in E are disjoint (if one occurs no other can occur),

    Pr[detection error at time k] = Σ_{i=−∞}^{k} Σ_{e∈E} Pr[e] c_{k−i}(e).   (9.141)

Exchanging the order of summation, assuming this is legitimate,

    Pr[detection error at time k] = Σ_{e∈E} Pr[e] Σ_{i=−∞}^{k} c_{k−i}(e).   (9.142)

By a change of variables,

    w(e) = Σ_{i=−∞}^{k} c_{k−i}(e) = Σ_{m=0}^{∞} c_m(e),   (9.143)

which is the total number of detection errors in e. Thus

434 DETECTION

    Pr[detection error] = Σ_{e∈E} Pr[e] w(e),   (9.144)

where we note that the dependence on k has disappeared. Hence, the probability of a detection error at any particular time is equal to the expected number of detection errors caused by error events starting at any fixed time i. In retrospect this result is not unexpected, since from the perspective of time k, the probability of a detection error at that time must take into account all error events starting at times prior to k.

The probability of the error event e depends on the probabilities of both the correct and incorrect paths ψ and ψ̂ that make up e,

    Pr[e] = Pr[ψ] Pr[ψ̂ | ψ].   (9.145)

It is usually difficult to find exact expressions for Pr[ψ̂ | ψ], but bounds are often easy. The reason is easy to see in the simple example of Figure 9-24, where we assume there are only three possible trajectories. In Figure 9-24a the ML decision regions for the three signals are shown. These decision regions lie in a K-dimensional space that we schematically represent on the two-dimensional page. Now suppose that ψ is the actual trajectory, corresponding to signal s in Figure 9-24a. The region corresponding to the detection of ψ̂ is shown in Figure 9-24b. The probability of the noise carrying us into this region is very difficult to calculate, especially as the number of possible trajectories gets large. However, this probability is easy to upper bound by using the larger decision region of Figure 9-24c, which ignores the possibility of any trajectory other than ψ̂ and ψ.
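The accounting in (9.143)–(9.144), and the lower bound (9.153) below, can be checked numerically. The error-event list in this sketch is made up purely for illustration; each entry pairs a probability Pr[e] with the indicator sequence c(e) of (9.138).

```python
# Toy, made-up error events: (Pr[e], c(e)) with c(e) per (9.138).
events = [
    (1e-3, [1]),        # one detection error, w(e) = 1
    (5e-4, [1, 0, 1]),  # two detection errors, w(e) = 2
    (1e-4, [1, 1, 1]),  # three detection errors, w(e) = 3
]

# (9.144): Pr[detection error] = sum over events of Pr[e] * w(e)
p_detect = sum(pr * sum(c) for pr, c in events)
# Pr[an error event] = sum of Pr[e]; a lower bound on p_detect since w(e) >= 1
p_event = sum(pr for pr, _ in events)
print(p_detect, p_event)  # p_detect ≈ 0.0023 >= p_event ≈ 0.0016
```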
For the additive white Gaussian noise model, the probability of the region in Figure 9-24c is

    Pr[ψ̂ | ψ] ≤ Q(d(ψ̂, ψ)/2σ),   (9.146)

where d(ψ̂, ψ) is the Euclidean distance between the transmitted signals ŝ and s corresponding to state trajectories ψ̂ and ψ. For the BSC,

    Pr[ψ̂ | ψ] ≤ Q(d(ψ̂, ψ), p),   (9.147)

where d(ψ̂, ψ) is now a Hamming distance and Q(·,·) is defined by (9.25). The bound is precisely the probability that the received signal is closer to the signal ŝ than to s.

Figure 9-24. a. Three signals corresponding to state trajectories ψ, ψ̂, and ψ′, where ψ is the actual state trajectory. b. If the received signal (with noise) is in the shaded region, the ML detector will choose trajectory ψ̂. c. The decision region for ψ̂ if there were only two signals, ψ and ψ̂.

Figure 9-25. Eight error events that have the same probabilities.

to an accurate estimate of Pr[detection error]. Returning to Figure 9-24, we want to use the decision region of Figure 9-24c to somehow obtain a lower bound. Shown in Figure 9-24d is the decision region corresponding to any error event conditioned on actual state sequence ψ. The probability of any error event is evidently lower bounded by calculating the probability of the smaller decision region in Figure 9-24c. Thus, we see that in order to determine a lower bound, we must start with the probability of any error event, rather than the probability of a particular error event. Since w(e) ≥ 1 for all error events e, then from (9.144)

    Pr[detection error] ≥ Σ_{e∈E} Pr[e] = Pr[an error event].   (9.153)

Now consider a particular actual path ψ through the trellis. For this path, let d_min(ψ) denote the distance of the minimum-distance error event (either Euclidean or Hamming). Of course, d_min(ψ) ≥ d_min, where d_min is the distance of the minimum-distance error event over all possible actual state sequences ψ.
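The pairwise bound (9.146) is easy to check by simulation: with only two signal vectors s and ŝ at Euclidean distance d in white Gaussian noise of standard deviation σ, the probability of deciding ŝ when s was sent is exactly Q(d/2σ), the situation of Figure 9-24c. The vectors and parameters below are made up for illustration.

```python
import math
import random

def Q(x):
    # Gaussian tail probability Q(x) = P(N(0,1) > x)
    return 0.5 * math.erfc(x / math.sqrt(2))

random.seed(1)
s, s_hat = [0.0, 0.0], [1.0, 1.0]  # two signal vectors, d = sqrt(2)
sigma = 0.5
d = math.dist(s, s_hat)

trials = 20000
hits = 0
for _ in range(trials):
    y = [si + random.gauss(0.0, sigma) for si in s]  # transmit s, add noise
    if math.dist(y, s_hat) < math.dist(y, s):        # ML chooses s_hat
        hits += 1
print(hits / trials, Q(d / (2 * sigma)))  # empirical rate vs. Q(d/2σ) ≈ 0.079
```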
As in Figure 9-24c, if ψ is the actual state sequence, the probability of an error event is lower bounded by the probability of the region in Figure 9-24c. Obviously, to make this bound strongest we want to choose a particular error event that is closest to ψ, one of those at distance d_min(ψ). Hence, for the Gaussian case

    Pr[an error event | ψ] ≥ Q(d_min(ψ)/2σ).   (9.154)

Combining this with (9.153) we get

    Pr[detection error | ψ] ≥ Q(d_min(ψ)/2σ).   (9.155)

Consequently,

    Pr[detection error] ≥ Σ_ψ Pr[ψ] Q(d_min(ψ)/2σ).   (9.156)

If we omit some terms in this summation, the bound will still be valid since the terms are all non-negative. Thus, let us retain only those state sequences ψ for which d_min(ψ) = d_min,

    Pr[detection error] ≥ Σ_{ψ∈A} Pr[ψ] Q(d_min/2σ),   (9.157)

where A is the set of actual paths ψ that have a minimum-distance error event, and d_min is that minimum distance. Define

    P = Σ_{ψ∈A} Pr[ψ],   (9.158)

the probability that a randomly chosen ψ has an error event starting at any fixed time with distance d_min (or is consistent with a minimum-distance error event). Then

    Pr[detection error] ≥ P Q(d_min/2σ).   (9.159)

In retrospect this lower bound is intuitive, since we would expect that every state sequence consistent with a minimum-distance error event will result in a probability of error event at least as large as Q(d_min/2σ), and each error event will result in at least one detection error. For the common case where all possible paths ψ through the trellis are consistent with a minimum-distance error event, P = 1. This is true in Example 9-44.

Combining our upper and lower bounds,

    P Q(d_min/2σ) ≤ Pr[detection error] ≲ R Q(d_min/2σ),   (9.160)

where the upper bound is approximate since some terms were thrown away. We conclude that at high SNR

    Pr[detection error] ≈ C·Q(d_min/2σ)   (9.161)

for some constant C between P and R.
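Numerically, (9.160)–(9.161) simply sandwich the detection-error probability between two scaled copies of the same Gaussian tail; any C between P and R in (9.161) gives an estimate between them. In the sketch below, d_min is taken from the chapter's ISI example, while σ, P, and R are made-up illustrative values.

```python
import math

def Q(x):
    # Gaussian tail probability Q(x) = P(N(0,1) > x)
    return 0.5 * math.erfc(x / math.sqrt(2))

d_min, sigma = math.sqrt(1.25), 0.25  # d_min of the ISI example; sigma made up
P, R = 0.5, 4.0                       # illustrative bound constants
lo = P * Q(d_min / (2 * sigma))       # lower bound (9.159)-(9.160)
hi = R * Q(d_min / (2 * sigma))       # approximate upper bound (9.160)
print(lo, hi)  # the true Pr[detection error] lies (approximately) between these
```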
The BSC case is identical,

    Pr[detection error] ≈ C·Q(d_min, p),   (9.162)

where Q(·,·) is defined in (9.25).

Example 9-45. Continuing Example 9-44, note that P = R = 1. Hence

    Pr[detection error] ≈ Q(√1.25 / 2σ).   (9.163)

For this example, each detection error causes exactly one bit error, so Pr[detection error] = Pr[bit error]. Hence, with the sequence detector we get approximately the same probability of error as for an isolated pulse and a matched filter receiver (see Problem 9-9). □

In general, a single detection error may cause more than one bit error. Suppose each input to the Markov chain x_k is determined by n source bits (and hence comes from an alphabet of size 2^n). Then each detection error causes at least one and at most n bit errors. Hence we can write

    (1/n) Pr[detection error] ≤ Pr[bit error] ≤ Pr[detection error].   (9.164)

Typically we make the pessimistic assumption that

    Pr[bit error] ≈ Pr[detection error].   (9.165)

PROBLEMS

9-1. Suppose a binary symbol A (Ω_A = {0, 1}) with p_A(0) = q and p_A(1) = 1 − q is transmitted through the BSC of Figure 9-2. The observation Y is also a binary symbol (Ω_Y = {0, 1}); it equals A with probability 1 − p.
(a) Find the ML detection rule. Assume p < ½.
(b) Find the probability of error of the ML detector as a function of p and q.
(c) Assume p = 0.2 and q = 0.9. Find the MAP detector and its probability of error. Compare this probability of error to that in part (b).
(d) Find the general MAP detector for arbitrary p and q.
(e) Find the conditions on p and q such that the MAP detector always selects â = 0.

9-2. Consider the vector detection problem for the BSC of Example 9-15. Specify the MAP detector for some given prior probabilities for the signal vectors.

9-3. Assume the random variable X has sample space Ω_X = {−3, −1, +1, +3} with prior probabilities p_X(±3) = 0.1 and p_X(±1) = 0.4.
Given an observation y of the random variable Y = X + N, where N is a zero-mean Gaussian random variable with variance σ², independent of X, find the decision regions for a MAP detector. Now suppose σ² = 0.25 and y = 2.1. What is the decision?

9-4. Assume the random variable X is from the alphabet Ω_X = {x₁, x₂}. Define the random variable Y = X + N, where N is a zero-mean Gaussian random variable with variance σ², independent of X. Give an expression for the MAP decision boundary between x₁ and x₂.

9-5. Consider M vectors, each a distance d_min from the other vectors. Assume an ML detector will be used to distinguish between these vectors.
(a) Give an example for M = 3 of such a set of vectors where d_min is a Euclidean distance of √2.
(b) Give an example for M = 3 of such a set of vectors where d_min is a Hamming distance of 2.
(c) Use the union bound to find an upper bound on the probability of error for your two examples, assuming additive white Gaussian noise for (a) and a BSC for (b). First give the bound assuming s₁ is transmitted, then give the bound without this assumption.

9-6. Suppose you are given N observations x₁, …, x_N of the zero-mean independent Gaussian random variables X₁, …, X_N. Assume that the random variables have the same (unknown) variance σ². What is the ML estimator for the variance?

9-7. Given a Gaussian channel with independent noise components, one of the following four equally likely signals is transmitted: (1, 1), (1, −1), (−1, 1), (−1, −1). Determine the exact probability of error of an ML detector for Gaussian noise with variance σ².

9-8. (a) Repeat Problem 9-7 for a BSC with error probability p and four equally likely signals: (000000), (000111), (111000), (111111).
(b) What is this error probability when p = 0.1? Compare to the minimum-distance approximation.

9-9.
Suppose that a symbol A from the alphabet Ω_A = {0, 1} is transmitted through the LTI system with impulse response

    h_k = δ_k + 0.5 δ_{k−1},   (9.166)

and corrupted by additive white Gaussian noise with variance σ².
(a) Determine the structure of the ML detector.
(b) Calculate the probability of error for the ML detector.

9-10. Consider the system in Figure 9-26. Assume Z_k is a sequence of independent zero-mean Gaussian random variables with variance σ². Assume the symbol alphabet is Ω_A = {0, 1} and that the channel impulse response is

    (9.167)

Figure 9-26. A simple case of a discrete-time channel with intersymbol interference.

(a) Derive the matched filter receiver.
(b) Give an expression for the probability of error.
(c) Now suppose Ω_A = {−1, 0, +1}. Repeat part (a).

9-11. (a) Determine the optimal incoherent receiver structure for passband PAM modulation, where the data symbols assume only two values A₀ = 0 and A₀ = 1. This is known as amplitude shift keying (ASK).
(b) Discuss the conditions under which passband PAM can be successfully demodulated using an incoherent receiver.
(c) Find an expression for the probability of error of the type derived in Example 9-22. You do not need to evaluate this expression.

9-12. (a) Derive a discrete-time channel model analogous to Figure 9-13 where, instead of a matched filter, a general filter F(jω) is used, the Gaussian noise added on the channel has power spectrum N₀ S_N(ω), and symbol-rate sampling is used at the output of the filter.
(b) Determine a model for the special case where the matched filter optimized for the particular noise spectrum is used.
(c) As a check, verify that your model reduces to Figure 9-13 when the filter is a matched filter and the noise is white.