ELEMENTARY MATHEMATICAL and COMPUTATIONAL TOOLS for ELECTRICAL and COMPUTER ENGINEERS USING MATLAB®

Jamal T. Manassah
City College of New York

CRC Press
Boca Raton London New York Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Manassah, Jamal T.
Elementary mathematical and computational tools for electrical and computer engineers using MATLAB/Jamal T. Manassah.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-1080-6
1. Electrical engineering--Mathematics. 2. Computer science--Mathematics. 3. MATLAB. I. Title.
TK153 .M362 2001
510′.24′62--dc21 2001016138

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2001 by CRC Press LLC
No claim to original U.S.
Government works
International Standard Book Number 0-8493-1080-6
Library of Congress Card Number 2001016138
Printed in the United States of America
Printed on acid-free paper

About the Author

Jamal T. Manassah has been Professor of Electrical Engineering at the City College of New York since 1981. He received his B.Sc. degree in Physics from the American University of Beirut, and his M.A. and Ph.D. in Theoretical Physics from Columbia University. Dr. Manassah was a Member of the Institute for Advanced Study. His current research interests are in theoretical and computational quantum and nonlinear optics, and in photonics.

Introduction

This book is mostly based on a series of notes for a primer course in electrical and computer engineering that I taught at the City College of New York School of Engineering. Each week, the class met for an hour of lecture and a three-hour computer laboratory session where students were divided into small groups of 12 to 15 students each. The students met in an informal learning community setting, a computer laboratory, where each student had the exclusive use of a PC. The small size of the groups permitted a great deal of individualized instruction, which was a key ingredient to cater successfully to the needs of students with heterogeneous high school backgrounds.

A student usually takes this course in the second semester of his or her freshman year. Typically, the student would have completed one semester of college calculus, and would be enrolled in the second course of the college calculus sequence and in the first course of the physics sequence for students in the physical sciences and engineering.

My purpose in developing this book is to help bring the beginning engineering student’s analytical and computational skills to a level of competency that would permit him or her to participate in, enjoy, and succeed in subsequent electrical and computer engineering courses.
My experience indicates that the lack of mastery of fundamental quantitative tools is the main impediment to a student’s progress in engineering studies.

The specific goals of this book are:

1. To make you more comfortable applying the mathematics and physics that you learned in high school or in college courses, through interactive activities.
2. To introduce you, through examples, to many new practical tools of mathematics, including material on discrete variables, that are essential to your success in future electrical engineering courses.
3. To instruct you in the use of a powerful computer program, MATLAB®*, which was designed to be simultaneously user-friendly and powerful in tackling efficiently the most demanding problems of engineering and sciences.
4. To give you, through the applications and examples covered, glimpses of some of the fascinating problems that an electrical or computer engineer solves in the course of completing many of his or her design projects.

* MATLAB® is a registered trademark of the MathWorks, Inc., 3 Apple Hill Drive, Natick, MA, 01760-2098, USA. Tel: 508-647-7000, Fax: 508-647-7101, e-mail: info@mathworks.com, Web: www.mathworks.com.

My experience indicates that you can achieve the above goals through the following work habits that I usually recommend to my own students:

• Read carefully the material from this book that is assigned to you by your instructor for the upcoming week, and make sure to solve the suggested preparatory exercises in advance of the weekly lecture.
• Attend the lecture and follow closely the material presented, in particular the solutions to the more difficult preparatory exercises and the demonstrations.
• Following the lecture, make a list of questions on the preparatory material to which you still seek answers, and ask your instructor for help and clarification on these questions, preferably in the first 30 minutes of your computer lab session.
• Complete the in-class exercises during the computer lab session. If you have not finished solving all in-class exercises, make sure you complete them on your own, when the lab is open, or at home if you own a computer, and certainly before the next class session, along with the problems designated in the book as homework problems and assigned to you by your instructor.

In managing this course, I found it helpful for both students and instructors to require each student to solve all problems in a bound notebook. The advantage to the student is to have easy access to his or her previous work, personal notes, and reminders that he or she made as the course progressed. The advantage to the instructor is to enhance his or her ability to assess, more easily and readily, an individual student’s progress as the semester progresses.

This book may be used for self-study by readers with perhaps a little more mathematical maturity acquired through a second semester of college calculus.

The advanced reader of this book who is familiar with numerical methods will note that, in some instances, I did not follow the canonical order for the sequence of presentation of certain algorithms, thus sacrificing some optimality in the structure of some of the elementary programs included. This was necessitated by the goal I set for this book, which is to introduce both analytical and computational tools simultaneously.

The sections of this book that are marked with asterisks include material that I assigned as projects to students with either strong theoretical interest or more mathematical maturity than a typical second semester freshman student. Although incorporated in the text, they can be skipped in a first reading. I hope that, by their inclusion, I will facilitate for the interested reader a smooth transition to some new mathematical concepts and computational tools that are of particular interest to electrical engineers.
This text greatly benefited from course material previously prepared by my colleagues in the departments of electrical engineering and computer science at City College of the City University of New York, in particular, P. Combettes, I. Gladkova, B. Gross, and F. Thau. They provided either the starting point for my subsequent efforts in this course, or the peer critique for the early versions of this manuscript. I owe them many thanks and, of course, do not hold them responsible for any of the remaining imperfections in the text.

The preparation of this book also owes a lot to my students. Their questions and interest in the material contributed to many modifications in the order and in the presentation of the different chapters. Their desire for working out more applications led me to expand the scope of the examples and exercises included in the text. To all of them, I am grateful.

I am also grateful to Erwin Cohen, who introduced me to the fine team at CRC Press, and to Jerry Papke, whose stewardship of the project from start to end at CRC Press was most supportive and pleasant. The editorial and production teams at CRC, in particular Samar Haddad, the project editor, deserve credit for the quality of the final product rendering. Naomi Fernandes and her colleagues at The MathWorks Inc. kindly provided me with a copy of the new release of MATLAB, for which I am grateful.

I dedicate this book to Azza, Tala, and Nigh, whose support and love always made difficult tasks a lot easier.

Jamal T. Manassah
New York, January 2001

Contents

1.
Introduction to MATLAB® and Its Graphics Capabilities
   1.1 Getting Started
   1.2 Basic Algebraic Operations and Functions
   1.3 Plotting Points
      1.3.1 Axes Commands
      1.3.2 Labeling a Graph
      1.3.3 Plotting a Point in 3-D
   1.4 M-files
   1.5 MATLAB Simple Programming
      1.5.1 Iterative Loops
      1.5.2 If-Else-End Structures
   1.6 Array Operations
   1.7 Curve and Surface Plotting
      1.7.1 x-y Parametric Plot
      1.7.2 More Parametric Plots in 2-D
      1.7.3 Plotting a 3-D Curve
      1.7.4 Plotting a 3-D Surface
   1.8 Polar Plots
   1.9 Animation
   1.10 Histograms
   1.11 Printing and Saving Work in MATLAB
   1.12 MATLAB Commands Review

2. Difference Equations
   2.1 Simple Linear Forms
   2.2 Amortization
   2.3 An Iterative Geometric Construct: The Koch Curve
   2.4 Solution of Linear Constant Coefficients Difference Equations
      2.4.1 Homogeneous Solution
      2.4.2 Particular Solution
      2.4.3 General Solution
   2.5 Convolution-Summation of a First-Order System with Constant Coefficients
   2.6 General First-Order Linear Difference Equations*
   2.7 Nonlinear Difference Equations
      2.7.1 Computing Irrational Numbers
      2.7.2 The Logistic Equation
   2.8 Fractals and Computer Art
      2.8.1 Mira’s Model
      2.8.2 Hénon’s Model
   2.9 Generation of Special Functions from Their Recursion Relations*

3. Elementary Functions and Some of Their Uses
   3.1 Function Files
   3.2 Examples with Affine Functions
   3.3 Examples with Quadratic Functions
   3.4 Examples with Polynomial Functions
   3.5 Examples with Trigonometric Functions
   3.6 Examples with the Logarithmic Function
      3.6.1 Ideal Coaxial Capacitor
      3.6.2 The Decibel Scale
      3.6.3 Entropy
   3.7 Examples with the Exponential Function
   3.8 Examples with the Hyperbolic Functions and Their Inverses
      3.8.1 Capacitance of Two Parallel Wires
   3.9 Commonly Used Signal Processing Functions
   3.10 Animation of a Moving Rectangular Pulse
   3.11 MATLAB Commands Review

4.
Numerical Differentiation, Integration, and Solutions of Ordinary Differential Equations
   4.1 Limits of Indeterminate Forms
   4.2 Derivative of a Function
   4.3 Infinite Sums
   4.4 Numerical Integration
   4.5 A Better Numerical Differentiator
   4.6 A Better Numerical Integrator: Simpson’s Rule
   4.7 Numerical Solutions of Ordinary Differential Equations
      4.7.1 First-Order Iterator
      4.7.2 Higher-Order Iterators: The Runge-Kutta Method*
      4.7.3 MATLAB ODE Solvers
   4.8 MATLAB Commands Review

5. Root Solving and Optimization Methods
   5.1 Finding the Real Roots of a Function
      5.1.1 Graphical Method
      5.1.2 Numerical Methods
      5.1.3 MATLAB fsolve and fzero Built-in Functions
   5.2 Roots of a Polynomial
   5.3 Optimization Methods
      5.3.1 Graphical Method
      5.3.2 Numerical Methods
      5.3.3 MATLAB fmin and fmins Built-in Function
   5.4 MATLAB Commands Review

6. Complex Numbers
   6.1 Introduction
   6.2 The Basics
      6.2.1 Addition
      6.2.2 Multiplication by a Real or Imaginary Number
      6.2.3 Multiplication of Two Complex Numbers
   6.3 Complex Conjugation and Division
      6.3.1 Division
   6.4 Polar Form of Complex Numbers
      6.4.1 New Insights into Multiplication and Division of Complex Numbers
   6.5 Analytical Solutions of Constant Coefficients ODE
      6.5.1 Transient Solutions
      6.5.2 Steady-State Solutions
      6.5.3 Applications to Circuit Analysis
   6.6 Phasors
      6.6.1 Phasor of Two Added Signals
   6.7 Interference and Diffraction of Electromagnetic Waves
      6.7.1 The Electromagnetic Wave
      6.7.2 Addition of Electromagnetic Waves
      6.7.3 Generalization to N-waves
   6.8 Solving ac Circuits with Phasors: The Impedance Method
      6.8.1 RLC Circuit Phasor Analysis
      6.8.2 The Infinite LC Ladder
   6.9 Transfer Function for a Difference Equation with Constant Coefficients*
   6.10 MATLAB Commands Review

7.
Vectors
   7.1 Vectors in Two Dimensions (2-D)
      7.1.1 Addition
      7.1.2 Multiplication of a Vector by a Real Number
      7.1.3 Cartesian Representation
      7.1.4 MATLAB Representation of the Above Results
   7.2 Dot (or Scalar) Product
      7.2.1 MATLAB Representation of the Dot Product
   7.3 Components, Direction Cosines, and Projections
      7.3.1 Components
      7.3.2 Direction Cosines
      7.3.3 Projections
   7.4 The Dirac Notation and Some General Theorems*
      7.4.1 Cauchy-Schwarz Inequality
      7.4.2 Triangle Inequality
   7.5 Cross Product and Scalar Triple Product*
      7.5.1 Cross Product
      7.5.2 Geometric Interpretation of the Cross Product
      7.5.3 Scalar Triple Product
   7.6 Vector Valued Functions
   7.7 Line Integral
   7.8 Infinite Dimensional Vector Spaces*
   7.9 MATLAB Commands Review

8. Matrices
   8.1 Setting up Matrices
      8.1.1 Creating Matrices in MATLAB
   8.2 Adding Matrices
   8.3 Multiplying a Matrix by a Scalar
   8.4 Multiplying Matrices
   8.5 Inverse of a Matrix
   8.6 Solving a System of Linear Equations
   8.7 Application of Matrix Methods
      8.7.1 dc Circuit Analysis
      8.7.2 dc Circuit Design
      8.7.3 ac Circuit Analysis
      8.7.4 Accuracy of a Truncated Taylor Series
      8.7.5 Reconstructing a Function from Its Fourier Components
      8.7.6 Interpolating the Coefficients of an (n - 1)-degree Polynomial from n Points
      8.7.7 Least-Square Fit of Data
   8.8 Eigenvalues and Eigenvectors*
      8.8.1 Finding the Eigenvalues of a Matrix
      8.8.2 Finding the Eigenvalues and Eigenvectors Using MATLAB
   8.9 The Cayley-Hamilton and Other Analytical Techniques*
      8.9.1 Cayley-Hamilton Theorem
      8.9.2 Solution of Equations of the Form $dX/dt = AX$
      8.9.3 Solution of Equations of the Form $dX/dt = AX + B(t)$
      8.9.4 Pauli Spinors
   8.10 Special Classes of Matrices*
      8.10.1 Hermitian Matrices
      8.10.2 Unitary Matrices
      8.10.3 Unimodular Matrices
   8.11 MATLAB Commands Review

9.
Transformations
   9.1 Two-dimensional (2-D) Geometric Transformations
      9.1.1 Polygonal Figures Construction
      9.1.2 Inversion about the Origin and Reflection about the Coordinate Axes
      9.1.3 Rotation around the Origin
      9.1.4 Scaling
      9.1.5 Translation
   9.2 Homogeneous Coordinates
   9.3 Manipulation of 2-D Images
      9.3.1 Geometrical Manipulation of Images
      9.3.2 Digital Image Processing
      9.3.3 Encrypting an Image
   9.4 Lorentz Transformation*
      9.4.1 Space-Time Coordinates
      9.4.2 Addition Theorem for Velocities
   9.5 MATLAB Commands Review

10. A Taste of Probability Theory*
   10.1 Introduction
   10.2 Basics
   10.3 Addition Laws for Probabilities
   10.4 Conditional Probability
      10.4.1 Total Probability and Bayes Theorems
   10.5 Repeated Trials
      10.5.1 Generalization of Bernoulli Trials
   10.6 The Poisson and the Normal Distributions
      10.6.1 The Poisson Distribution
      10.6.2 The Normal Distribution

Supplement: Review of Elementary Functions
   S.1 Affine Functions
   S.2 Quadratic Functions
   S.3 Polynomial Functions
   S.4 Trigonometric Functions
   S.5 Inverse Trigonometric Functions
   S.6 The Natural Logarithmic Function
   S.7 The Exponential Function
   S.8 The Hyperbolic Functions
   S.9 The Inverse Hyperbolic Functions

Appendix: Some Useful Formulae

Addendum: MATLAB 6

Selected References

*The asterisk indicates more advanced material that may be skipped in a first reading.

1 Introduction to MATLAB® and Its Graphics Capabilities

1.1 Getting Started

MATLAB can be thought of as a library of programs that will prove very useful in solving many electrical engineering computational problems. MATLAB is an ideal tool for numerically assisting you in obtaining answers, which is a major goal of engineering analysis and design. This program is very useful in circuit analysis, device design, signal processing, filter design, control system analysis, antenna design, microwave engineering, photonics engineering, computer engineering, and all other sub-fields of electrical engineering.
It is also a powerful graphics and visualization tool.

The first step in using MATLAB is to know how to call it. It is important to remember that although the front-end and the interfacing for machines with different operating systems are sometimes different, once you are inside MATLAB, all programs and routines are written in the same manner. Only those few commands that are for file management and for interfacing with external devices such as printers may be different for different operating systems.

After entering MATLAB, you should see the prompt >>, which means the program interpreter is waiting for you to enter instructions. (Remember to press the Return key at the end of each line that you enter.) Now type clf. This command creates a graph window (if one does not already exist) or clears an existing graph window.

Because it is impossible to explain the function of every MATLAB command within this text, how would you get information on a certain command syntax? The MATLAB program has extensive help documentation available with simple commands. For example, if you wanted help on a function called roots (we will use this function often), you would type help roots. Note that the help facility cross-references other functions that may have related uses. This requires that you know the function name. If you want an idea of the available help files in MATLAB, type help. This gives you a list of topics included in MATLAB. To get help on a particular topic such as the Optimization Toolbox, type help toolbox/optim. This gives you a list of all relevant functions pertaining to that area. Now you may type help for any function listed. For example, try help fmin.

1.2 Basic Algebraic Operations and Functions

The MATLAB environment can be used, on the most elementary level, as a tool to perform simple algebraic manipulations and function evaluations.
Example 1.1
Exploring the calculator functions of MATLAB. The purpose of this example is to show how to manually enter data and how to use basic MATLAB algebraic operations. Note that the statements will be executed immediately after they are typed and entered (no equal sign is required).

Type and enter the text that follows the >> prompt to find out the MATLAB responses to the following:

    2+2
    5^2
    2*sin(pi/4)

The last command gave twice the sine of π/4. Note that the argument of the function was enclosed in parentheses directly following the name of the function. Therefore, if you wanted to find $\sin^3(\pi/4)$, the proper MATLAB syntax would be

    sin(pi/4)^3

To facilitate its widespread use, MATLAB has all the standard elementary mathematical functions as built-in functions. Type help elfun, which is indexed in the main help menu, to get a listing of some of these functions. Remember that this is just a small sampling of the available functions.

    help elfun

The response to the last command will give you a large list of these elementary functions, some of which may be new to you, but all of which will be used in your future engineering studies, and explored in later chapters of this book.

Example 1.2
Assigning and calling values of parameters. In addition to inputting data directly to the screen, you can assign a symbolic constant or constants to represent data and perform manipulations on them. For example, enter and note the answer to each of the following:

    a=2
    b=3
    c=a+b
    d=a*b
    e=a/b
    f=a^3/b^2
    g=a+3*b^2

Question: From the above, can you deduce the order in which MATLAB performs the basic operations?

In-Class Exercise

Pb. 1.1 Using the above values of a and b, find the values of:
a. $h = \sin(a)\sin(b)$
b. $i = a^{1/3} b^{3/7}$
c. $j = \sin^{-1}(a/b) = \arcsin(a/b)$

1.3 Plotting Points

In this section, you will learn how to use some simple MATLAB graphics commands to plot points.
We use these graphics commands later in the text for plotting functions and for visualizing their properties. To view all the functions connected with 2-dimensional graphics, type:

    help plot

All graphics functions connected with 3-dimensional graphics can be looked up by typing

    help plot3

A point P in the x-y plane is specified by two coordinates. The x-coordinate measures the horizontal distance of the point from the y-axis, while the y-coordinate measures the vertical distance above the x-axis. These coordinates are called Cartesian coordinates, and any point in the plane can be described in this manner. We write for the point, P(x, y).

Other representations can also be used to locate a point with respect to a particular set of axes. For example, in the polar representation, the point is specified by an r-coordinate that measures the distance of the point from the origin, while the θ-coordinate measures the angle which the line passing through the origin and this point makes with the x-axis.

The purpose of the following two examples is to learn how to represent points in a plane and to plot them using MATLAB.

Example 1.3
Plot the point P(3, 4).

Solution: Enter the following:

    x1=3;
    y1=4;
    plot(x1,y1,'*')

Note that the semicolon is used in the above commands to suppress the echoing of the values of the inputs. The '*' is used to mark the point that we are plotting. Other authorized symbols for point displays include 'o', '+', 'x', … the use of which is detailed in help plot.

Example 1.4
Plot the second point, R(2.5, 4), on the graph while keeping point P of the previous example on the graph.

Solution: If we went ahead, defined the coordinates of R, and attempted to plot the point R through the following commands:

    x2=2.5;
    y2=4;
    plot(x2,y2,'o')

we would find that the last plot command erases the previous plot output. Thus, what should we do if we want both points plotted on the same graph?
The answer is to use the hold on command after the first plot. The following illustrates the steps that you should have taken instead of the above:

    hold on
    x2=2.5;
    y2=4;
    plot(x2,y2,'o')
    hold off

The hold off turns off the hold on feature.

NOTES
1. There is no limit to the number of plot commands you can type before the hold is turned off.
2. An alternative method for viewing multiple points on the same graph is available: we may instead, following the entering of the values of x1, y1, x2, y2, enter:

    plot(x1,y1,'*',x2,y2,'o')

This has the advantage, in MATLAB, of assigning automatically a different color to each point.

1.3.1 Axes Commands

You may have noticed that MATLAB automatically adjusts the scale on a graph to accommodate the coordinates of the points being plotted. The axis scaling can be manually enforced by using the command axis([xmin xmax ymin ymax]). Make sure that the minimum axis value is less than the maximum axis value or an error will result.

In addition to being able to adjust the scale of a graph, you can also change the aspect ratio of the graphics window. This is useful when you wish to see the correct x to y scaling. For example, without the axis square command, a circle will look more like an ellipse.

Example 1.5
Plot the vertices of a square, keeping the geometric proportions unaltered.

Solution: Enter the following:

    x1=-1;y1=-1;x2=1;y2=-1;x3=-1;y3=1;x4=1;y4=1;
    plot(x1,y1,'o',x2,y2,'o',x3,y3,'o',x4,y4,'o')
    axis([-2 2 -2 2])
    axis square %square shape

Note that prior to the axis square command, the square looked like a rectangle. If you want to go back to the default aspect ratio, type axis normal. The % symbol is used so that you can type comments in your program. Comments following the % symbol are ignored by the MATLAB interpreter.
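The hold and axis commands of this section can be combined in one short sketch. It replots the points P and R of Examples 1.3 and 1.4 on a single graph and then fixes the scale by hand. The axis limits [0 5 0 5] and the variable name limits are illustrative choices made here, not values taken from the text.

```matlab
% Sketch: plot two points on the same graph, then set the scale by hand.
x1 = 3;   y1 = 4;      % point P of Example 1.3
x2 = 2.5; y2 = 4;      % point R of Example 1.4

clf                    % start from a clean graph window
plot(x1, y1, '*')      % first point
hold on                % keep the first point when plotting again
plot(x2, y2, 'o')      % second point, added to the same graph
hold off               % later plot commands will replace the graph again

limits = [0 5 0 5];    % [xmin xmax ymin ymax]; assumed values for illustration
axis(limits)           % each minimum must be less than its maximum
axis square            % make the plotting box square
```

Choosing limits slightly larger than the data keeps both markers away from the edges of the graph, where they are harder to see.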
1.3.2 Labeling a Graph

To add labels to your graph, the functions xlabel, ylabel, and title can be used as follows:

    xlabel('x-axis')
    ylabel('y-axis')
    title('points in a plane')

If you desire to add a caption anywhere in the graph, you can use the MATLAB command gtext('caption') and place it at the location of your choice, on the graph, by clicking the mouse when the crosshair is properly centered there.

1.3.3 Plotting a Point in 3-D

In addition to being able to plot points on a plane (2-D space), MATLAB is also able to plot points in a three-dimensional space (3-D space). For this, we utilize the plot3 function.

Example 1.6
Plot the point P(3, 4, 5).

Solution: Enter the following commands:

    x1=3; y1=4; z1=5;
    plot3(x1,y1,z1,'*')

You can also plot multiple points in a 3-D space in exactly the same way as you did on a plane. Axis adjustment can still be used, but the vector input into the axis command must now have six entries, as follows:

    axis([xmin xmax ymin ymax zmin zmax])

You can similarly label your 3-D figure using xlabel, ylabel, zlabel, and title.

1.4 M-files

In the last section, we found that to complete a figure with a caption, we had to enter several commands one by one in the command window. Typing errors will be time-consuming to fix because if you are working in the command window, you need to retype all or part of the program. Even if you do not make any mistakes (!), all of your work may be lost if you inadvertently quit MATLAB and have not taken the necessary steps to save the contents of the important program that you just finished developing. To preserve large sets of commands, you can store them in a special type of file called an M-file.

MATLAB supports two types of M-files: script and function M-files. To hold a large collection of commands, we use a script M-file. The function M-file is discussed in Chapter 3. To make a script M-file, you need to open a file using the built-in MATLAB editor.
For both Macs and PCs, first select New from the File menu. Then select the M-file entry from the pull-down menu. After typing the M-file contents, you need to save the file: For Macs and PCs, select the Save As command from the File menu. A field will pop up in which you can type in the name you have chosen for this file (make sure that you do not name a file by a mathematical abbreviation, the name of a mathematical function, or a number). Also make sure that the file name has a .m extension added at the end of its name. For Macs, save the file in a user’s designated volume. For PCs, save the file in the default (bin) subdirectory.

To run your script M-file, just type the filename (omitting the .m extension at its end) at the MATLAB prompt.

Example 1.7
For practice, go to your file edit window to create the following file that you name myfile.m.

    clear, clf
    x1=1;y1=.5;x2=2;y2=1.5;x3=3;y3=2;
    plot(x1,y1,'o',x2,y2,'+',x3,y3,'*')
    axis([0 4 0 4])
    xlabel('xaxis')
    ylabel('yaxis')
    title('3points in a plane')

After creating and saving myfile.m, go to the MATLAB command window and enter myfile. MATLAB will execute the instructions in the order of the statements stored in your myfile.m file.

1.5 MATLAB Simple Programming

1.5.1 Iterative Loops

The power of computers lies in their ability to perform a large number of repetitive calculations. To do this without entering the value of a parameter or variable each time that these are changed, all computer languages have control structures that allow commands to be performed and controlled by counter variables, and MATLAB is no different. For example, the MATLAB “for” loop allows a statement or a group of statements to be repeated.

Example 1.8
Generate the square of the first ten integers.

Solution: Edit and execute the following script M-file:

    for m=1:10
        x(m)=m^2;
    end;

In this case, the number of repetitions is controlled by the index variable m, which takes on the values m = 1 through m = 10 in intervals of 1.
Therefore, ten assignments were made. What the above loop is doing is sequentially assigning the different values of m^2 (i.e., $m^2$) in each element of the “x-array.” An array is just a data structure that can hold multiple entries. An array can be 1-D such as in a vector, or 2-D such as in a matrix. More will be said about vectors and matrices in subsequent chapters. At this time, think of the 1-D and 2-D arrays as pigeonholes with numbers or ordered pairs of numbers, respectively, assigned to them.

To find the value of a particular slot of the array, such as slot 3, enter:

    x(3)

To read all the values stored in the array, type:

    x

Question: What do you get if you enter m?

1.5.2 If-Else-End Structures

If a sequence of commands must be conditionally evaluated based on a relational test, the programming of this logical relationship is executed with some variation of an if-else-end structure.

A. The simplest form of this structure is:

    if expression
        commands evaluated if expression is True
    else
        commands evaluated if expression is False
    end

NOTES
1. The commands between the if and else statements are evaluated if all elements in the expression are true.
2. The conditional expression uses the Boolean logical symbols & (and), | (or), and ~ (not) to connect different propositions.

Example 1.9
Find for integer 0 < a ≤ 10, the values of C, defined as follows:

$$C = \begin{cases} ab & \text{for } a > 5 \\ \frac{3}{2}ab & \text{for } a \le 5 \end{cases} \qquad \text{and } b = 15.$$

Solution: Edit and execute the following script M-file:

    for a=1:10
        b=15;
        if a>5
            C(a)=a*b;
        else
            C(a)=(a*b)*(3/2);
        end
    end

Check that the values of C that you obtain by typing C are:

    22.5 45 67.5 90 112.5 90 105 120 135 150

B.
When there are three or more alternatives, the if-else-end structure takes the form:

    if expression 1
        Commands 1 evaluated if expression 1 is True
    elseif expression 2
        Commands 2 evaluated if expression 2 is True
    elseif expression 3
        Commands 3 evaluated if expression 3 is True
    …
    else
        Commands evaluated if no other expression is True
    end

In this form, only the commands associated with the first True expression encountered are evaluated; ensuing relational expressions are not tested.

1.5.2.1 Alternative Syntax to the if Statement

As an alternative to the if syntax, we can use, in certain instances, Boolean expressions to specify an expression in different domains. For example, (x>=l) has the value 1 if x is larger than or equal to l and zero otherwise; and (x<=h) is equal to 1 when x is smaller than or equal to h, and zero otherwise.

The relational operations allowed inside the parentheses are: ==, <=, >=, ~=, <, >.

Homework Problem

Pb. 1.2 For the values of integer a going from 1 to 10, using separately the methods of the if syntax and the Boolean alternative expressions, find the values of C if:

$$C = \begin{cases} a^2 & \text{for } a < 3 \\ a + 5 & \text{for } 3 \le a < 7 \\ a & \text{for } a \ge 7 \end{cases}$$

Use the stem command to graphically show C.

1.6 Array Operations

In the above examples, we used for loops repeatedly. However, this kind of loop-programming is very inefficient and must be avoided as much as possible in MATLAB. In fact, ideally, a good MATLAB program will always minimize the use of loops because MATLAB is an interpreted language, not a compiled one. As a result, any looping process is very inefficient. Nevertheless, at times we use the for loops, when necessitated by pedagogical reasons.

To understand array operations more clearly, consider the following:

    a=1:3 % a starts at 1, goes to 3 in increments of 1.
If the increment is not 1, you must specify the increment; for example:

    b=2:2:6 % b starts at 2, goes to 6 in increments of 2

To distinguish array operations from either operations on scalars or on matrices, the symbol for multiplication becomes .*, that of division ./, and that of exponentiation .^. Thus, for example:

    c=a.*b % takes every element of a and multiplies
           % it by the element of b in the same array location

Similarly, for exponentiation and division:

    d=a.^b
    e=a./b

If you try to use the regular symbols * and ^ on these arrays, you will get an error message. (The / symbol does not raise an error here, but it performs a matrix operation rather than an element-by-element division.) Note that array operations such as the above require that the two arrays have the same length (i.e., the same number of elements). To verify that two arrays have the same number of elements (dimension), use the length command. Thus, to find the length of a and b, enter:

    length(a)
    length(b)

NOTE The expression x=linspace(0,10,200) is also the generator for an x-array with first element equal to 0, last element equal to 10, and having 200 equally spaced points between 0 and 10. Here, the number of points rather than the increment is specified; that is, length(x)=200.

1.7 Curve and Surface Plotting

Review the sections of the Supplement pertaining to lines, quadratic functions, and trigonometric functions before proceeding further.

1.7.1 x-y Parametric Plot

Now edit another M-file called myline.m as follows and execute it.

    N=10;
    for m=1:N
        x(m)=m;
        y(m)=2*m+3;
    end
    plot(x,y)

After executing the M-file using myline, you should see a straight line connecting the points (1, 5) and (10, 23). This demonstration shows the basic construct for creating two arrays and plotting the points with their x-coordinate from a particular location in one array and their y-coordinate from the same location in the second array. We say that the plot command here plotted the y-array vs. the x-array.
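As an aside, the same two arrays can be generated without a loop by using the colon notation and the array operations of Section 1.6. The sketch below is offered as an alternative to the listing of myline.m, not as the text's own program; the names xv, yv, and t are introduced here only for illustration.

```matlab
% Sketch: build the arrays of myline.m without a for loop.
N  = 10;
xv = 1:N;          % same values as x(m)=m for m=1..N
yv = 2*xv + 3;     % a scalar times an array, then a scalar added to every element

plot(xv, yv)       % plots the yv-array vs. the xv-array

% linspace fixes the number of points instead of the increment
t = linspace(0, 10, 200);   % 200 equally spaced points from 0 to 10
length(t)                   % displays 200
```

Multiplying an array by a scalar and adding a scalar to an array need no dot, so 2*xv + 3 is valid as written; the dotted symbols are needed only when both operands are arrays.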
We note that the points are connected by a continuous line making a smooth curve; we say that the program graphically interpolated the discrete points into a continuous curve. If we desire to see additionally the individual points corresponding to the values of the arrays, the last command should be changed to:

    plot(x,y,x,y,'o')

Example 1.10
Plot the two curves y1 = 2x + 3 and y2 = 4x + 3 on the same graph.

Solution: Edit and execute the following script M-file:

    for m=1:10
        x(m)=m;
        y1(m)=2*m+3;
        y2(m)=4*m+3;
    end
    plot(x,y1,x,y2)

or better

    m=1:10;
    x=m;
    y1=2*m+3;
    y2=4*m+3;
    plot(x,y1,x,y2)

Finally, note that you can separate graphs in one figure window. This is done using the subplot function in MATLAB. The arguments of the subplot function are subplot(m,n,p), where m is the number of rows partitioning the graph, n is the number of columns, and p is the particular subgraph chosen (enumerated left to right, top to bottom).

1.7.1.1 Demonstration: Plotting Multiple Figures within a Figure Window

Using the data obtained in the previous example, observe the difference in the partition of the page in the following two sets of commands:

    subplot(2,1,1)
    plot(x,y1)
    subplot(2,1,2)
    plot(x,y2)

and

    clf
    subplot(1,2,1)
    plot(x,y1)
    subplot(1,2,2)
    plot(x,y2)

1.7.2 More on Parametric Plots in 2-D

In the preceding subsection, we generated the x- and y-arrays by first writing the x-variable as a linear function of a parameter, and then expressing the dependent variable y as a function of that same parameter. In other words, instead of thinking of a function as a relation between an independent variable x and a dependent variable y, we thought of both x and y as functions of a third, independent parameter. This method of curve representation, known as the parametric representation, is described by (x(t), y(t)), where the parameter t varies over some finite domain (tmin, tmax).
Note, however, that in the general case, unlike the examples in the previous subsection, the variable x need not be linear in the parameter, nor is the process of parametrization unique.

Example 1.11
Plot the trigonometric circle.

Solution: Recalling that any point on the trigonometric circle has the cosine as its x-component and the sine as its y-component, the generation of the trigonometric circle is immediate:

    th=linspace(0,2*pi,101);
    x=cos(th);
    y=sin(th);
    plot(x,y)
    axis square

The parametric representation of many common curves is our next topic of interest. The parametric representation is defined such that if x and y are continuous functions of t over the interval I, we can describe a curve in the x-y plane by specifying:

C: x = x(t), y = y(t), and t ∈ I

More Examples: In the following examples, we want to identify the curves f(x, y) = 0 corresponding to each of the given parametrizations.

Example 1.12
C: x = 2t - 1, y = t + 1, and 0 < t < 2. The initial point is at x = -1, y = 1, and the final point is at x = 3, y = 3.

Solution: The f(x, y) = 0 form of the curve can be obtained by noting that:

$$2t - 1 = x \;\Rightarrow\; t = \frac{x + 1}{2}$$

Substitution into the expression for y results in:

$$y = \frac{x}{2} + \frac{3}{2}$$

This describes a line with slope 1/2 crossing the x-axis at x = -3.

Question: Where does this line cross the y-axis?

Example 1.13
C: x = 3 + 3 cos(t), y = 2 + 2 sin(t), and 0 < t < 2π. The initial point is at x = 6, y = 2, and the final point is at x = 6, y = 2.

Solution: The curve f(x, y) = 0 can be obtained by noting that:

$$\sin(t) = \frac{y - 2}{2} \quad\text{and}\quad \cos(t) = \frac{x - 3}{3}$$

Using the trigonometric identity cos²(t) + sin²(t) = 1, we deduce the following equation:

$$\frac{(y - 2)^2}{2^2} + \frac{(x - 3)^2}{3^2} = 1$$

This is the equation of an ellipse centered at x = 3, y = 2 and having major and minor radii equal to 3 and 2, respectively.

Question 1: What are the coordinates of the foci of this ellipse?
Question 2: Compare the above curve with the curve defined through:

x = 3 + 3 cos(2t), y = 2 + 2 sin(2t), and 0 < t < 2π

What conclusions can you draw from your answer?

In-Class Exercises

Pb. 1.3 Show that the following parametric equations:

x = h + a sec(t), y = k + b tan(t), and -π/2 < t < π/2

are those of the hyperbola also represented by the equation:

$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1$$

Pb. 1.4 Plot the hyperbola represented by the parametric equations of Pb. 1.3, with h = 2, k = 2, a = 1, b = 2. Find the coordinates of the vertices and the foci. (Hint: One branch of the hyperbola is traced for -π/2 < t < π/2, while the other branch is traced when π/2 < t < 3π/2.)

Pb. 1.5 The parametric equations of the cycloid are given by:

x = Rωt + R sin(ωt), y = R + R cos(ωt), and 0 < t

Show how these parametric equations can be obtained by following the kinematics of a point attached to the outer rim of a wheel that is uniformly rolling, without slippage, on a flat surface. Relate the above parameters to the linear speed and the radius of the wheel.

Pb. 1.6 Sketch the curve C defined through the following parametric equations:

$$x(t) = \begin{cases} t + 2 & \text{for } -3 \le t \le -1 \\ +1 - \dfrac{1}{\sqrt{3}} \tan\!\left(\dfrac{\pi}{3}(1 - t^2)\right) & \text{for } -1 < t < 0 \\ -1 + \dfrac{1}{\sqrt{3}} \tan\!\left(\dfrac{\pi}{3}(1 - t^2)\right) & \text{for } 0 < t < 1 \end{cases}$$

$$y(t) = \begin{cases} 0 & \text{for } -3 \le t \le -1 \\ \dfrac{1}{\sqrt{3}} \tan\!\left(\dfrac{\pi}{3}(1 - t^2)\right) & \text{for } -1 < t < 0 \\ \dfrac{1}{\sqrt{3}} \tan\!\left(\dfrac{\pi}{3}(1 - t^2)\right) & \text{for } 0 < t < 1 \end{cases}$$

Homework Problems

The following set of problems provides the mathematical basis for understanding the graphical display on the screen of an oscilloscope, when in the x-y mode.

Pb. 1.7 To put the quadratic expression

Ax² + Bxy + Cy² + Dx + Ey + F = 0

in standard form (i.e., to eliminate the x-y mixed term), make the transformation

x = x′ cos(θ) - y′ sin(θ)
y = x′ sin(θ) + y′ cos(θ)

Show that the mixed term is eliminated if

$$\cot(2\theta) = \frac{A - C}{B}$$

Pb. 1.8 Consider the parametric equations

C: x = a cos(t), y = b sin(t + ϕ), and 0 < t < 2π

where the initial point is at x = a, y = b sin(ϕ), and the final point is at x = a, y = b sin(ϕ).
a. Obtain the equation of the curve in the form f(x, y) = 0.
b. Using the results of Pb. 1.7, prove that the ellipse inclination angle is given by:

$$\cot(2\theta) = \frac{a^2 - b^2}{2ab \sin(\phi)}$$

Pb. 1.9 Consider the curve whose parametric equations are given by:

C: x = cos(t), y = sin(2t), and 0 < t < 2π

where the initial point is at x = 1, y = 0, and the final point is at x = 1, y = 0. The curve so obtained is called a Lissajous figure. It has the shape of a figure 8 with two nodes in the x-direction and only one node in the y-direction. What do you think the parametric equations should be if we wanted m nodes on the x-axis and n nodes on the y-axis? Test your hypothesis by plotting the results.

1.7.3 Plotting a 3-D Curve

Our next area of exploration is plotting 3-D curves.

Example 1.14
Plot the helix.

Solution: To plot a helical curve, we can imagine initially that a point is revolving at a uniform speed around the perimeter of a circle. Now imagine that as the circular motion is continuing, the point is moving away from the x-y plane at some constant linear speed. The parametric representation of this motion can be implemented in MATLAB through the following:

    for m=1:201
        th(m)=2*pi*.01*(m-1);
        x(m)=cos(th(m));
        y(m)=sin(th(m));
        z(m)=th(m);
    end
    plot3(x,y,z)

In-Class Exercises

Pb. 1.10 In the helix of Example 1.14, what is the vertical distance (the pitch) between two consecutive helical turns? How can you control this distance? Find two methods of implementation.

Pb. 1.11 If instead of a circle in 2-D, as in the helix, the particle describes in 2-D a Lissajous pattern having two nodes in the y-direction and three nodes in the x-direction, assuming that the z-parametric equation remains the same, show the resulting 3-D trajectory.

Pb. 1.12 What if z(t) is periodic in t? For example, z(t) = cos(t) or z(t) = cos(2t), while the 2-D motion is still circular. Show the 3-D trajectory.

In Example 1.14, we used the for loop to generate the dependent arrays for the helix; but as pointed out previously, a more efficient method to program the helix is in the array notation, as follows:

    th=[0:.01:2]*2*pi;
    x=cos(th);
    y=sin(th);
    z=th;
    plot3(x,y,z)

1.7.4 Plotting a 3-D Surface

We now explore the two different techniques for rendering, in MATLAB, 3-D surface graphics: the mesh and the contour representations.

• A function of two variables z = f(x, y) represents a surface in 3-D geometry; for example, z = ax + by + c represents a plane that crosses the vertical axis (z-axis) at c.
• There are essentially two main techniques in MATLAB for viewing surfaces: the mesh function and the contour function.
• In both techniques, we must first create a 2-D array structure (like a checkerboard) with the appropriate x- and y-values. To implement this, we use the MATLAB meshgrid function.
• The z-component is then expressed in the variables assigned to implement the meshgrid command.
• We then plot the function with either the mesh command or the contour command. The mesh command gives a 3-D rendering of the surface, while the contour command gives contour lines, wherein each contour represents the locus of points on the surface having the same height above the x-y plane. This last rendering technique is that used by mapmakers to represent the topography of a terrain.

1.7.4.1 Surface Rendering

Example 1.15
Plot the sinc function whose equation is given by:

$$z = \frac{\sin\!\left(\sqrt{x^2 + y^2}\right)}{\sqrt{x^2 + y^2}}$$

over the domain -8 < x < 8 and -8 < y < 8.
Solution: The implementation of the mesh rendering follows:

    x=[-8:.1:8];
    y=[-8:.1:8];
    [X,Y]=meshgrid(x,y);
    R=sqrt(X.^2+Y.^2)+eps;
    Z=sin(R)./R;
    mesh(X,Y,Z)

The variable eps is a tolerance number equal to 2^(-52), used for evaluating expressions near apparent singularities, to avoid numerical division by zero.

To generate a contour plot, we replace the last command in the above by:

    contour(X,Y,Z,50) % The fourth argument specifies
                      % the number of contour lines to be shown

If we are interested only in a particular contour level, for example, the one with elevation Z0, we use the contour function with an option, as follows:

    contour(X,Y,Z,[Z0 Z0])

Occasionally, we might be interested in displaying simultaneously the mesh and contour rendering of a surface. This is possible through the use of the command meshc. It is the same as the mesh command except that a contour plot is drawn beneath the mesh.

Preparatory Activity: Look in your calculus book for some surface equations, such as those of the hyperbolic paraboloid and the elliptic paraboloid and others of your choice, for the purpose of completing Pb. 1.16 of the next in-class activity.

In-Class Exercises

Pb. 1.13 Use the contour function to graphically find the locus of points on the above sinc surface that are 1/2 units above the x-y plane (i.e., the surface intersection with the z = 1/2 plane).

Pb. 1.14 Find the x-y plane intersection with the following two surfaces:

z1 = 3 + x + y
z2 = 4 - 2x - 4y

Pb. 1.15 Verify your answers to Pb. 1.14 with those that you would obtain analytically for the shape of the intersection curves of the surfaces with the x-y plane. Also, compute the coordinates of the point of intersection of the two obtained curves. Verify your results graphically.

Pb. 1.16 Plot the surfaces that you have selected in your preparatory activity. Look in the help folder for the view command to learn how to view these surfaces from different angles.
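
As a small illustration of the meshgrid step used in Example 1.15, the following sketch builds the checkerboard on a 3 × 3 grid, small enough to inspect by eye, and shows the role of eps at the origin. The grid values and the names R0 and Zcenter are chosen here for illustration only and are not part of the original example.

```matlab
% Illustrative sketch: a 3-by-3 grid, small enough to print and inspect.
x = -1:1;                  % x-values: -1 0 1
y = -1:1;                  % y-values: -1 0 1
[X, Y] = meshgrid(x, y);   % each row of X is x; each column of Y is y

R0 = sqrt(X.^2 + Y.^2);    % radius without the eps shift; R0(2,2) is exactly 0
R  = R0 + eps;             % shifted radius, as in Example 1.15
Z  = sin(R)./R;            % finite everywhere, including the center

Zcenter = Z(2,2);          % value at x = 0, y = 0, close to the limit value 1
disp(X), disp(Y), disp(Z)
```

Without the eps shift, the center entry would be evaluated as 0/0 and returned as NaN, which would leave a hole in the mesh at the origin.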
1.8 Polar Plots

MATLAB can also display polar plots. In the first example, we draw an ellipse of the form r = 1/(1 + ε cos(θ)) in a polar plot; other shapes are given in the other examples.

Example 1.16
Plot the ellipse in a polar plot.

Solution: The following sequence of commands plots the polar plot of an ellipse with ε = 0.2:

    th=0:2*pi/100:2*pi;
    rho=1./(1+.2*cos(th));
    polar(th,rho)

The shape you obtain may be unfamiliar; but to verify that this is indeed an ellipse, view the curve in a Cartesian graph. For that, you can use the MATLAB polar to Cartesian converter pol2cart, as follows:

    [x,y]=pol2cart(th,rho);
    plot(x,y)
    axis equal

Example 1.17
Graph the polar plot of a spiral.

Solution: The equation of the spiral is given by:

r = aθ

Its polar plot can be viewed by executing the following script M-file (a = 3):

    th=0:2*pi/100:2*pi;
    rho=3*th;
    polar(th,rho)

In-Class Exercises

Pb. 1.17 Prove that the polar equation r = 1/(1 + ε cos(θ)), where ε is always between -1 and 1, results in an ellipse. (Hint: Relate ε to the ratio between the semi-major and semi-minor axis.) It is worth noting that the planetary orbits are usually described in this manner in most astronomy books.

Pb. 1.18 Plot the three curves described by the following polar equations:

r = 2 - 2 sin(θ), r = 1 - 2 sin(θ), r = 2 sin(2θ)

Pb. 1.19 Plot:

r = sin(2θ) cos(2θ)

The above gives a flower-type curve with eight petals. How would you make a flower with 16 petals?

Pb. 1.20 Plot:

r = sin²(θ)

This two-lobed structure shows the power distribution of a simple dipole antenna. Note the directed nature of the radiation. Can you increase the directivity further?

Pb. 1.21 Acquaint yourself with the polar plots of the following curves (choose first a = 1, then experiment with other values):

a. Straight lines: $r = \dfrac{1}{\cos(\theta) + a \sin(\theta)}$ for $0 \le \theta \le \dfrac{\pi}{2}$
b. Cissoid of Diocles: $r = \dfrac{a \sin^2(\theta)}{\cos(\theta)}$ for $-\dfrac{\pi}{3} \le \theta \le \dfrac{\pi}{3}$
c. Strophoid: $r = \dfrac{a \cos(2\theta)}{\cos(\theta)}$ for $-\dfrac{\pi}{3} \le \theta \le \dfrac{\pi}{3}$
d. Folium of Descartes: $r = \dfrac{3a \sin(\theta) \cos(\theta)}{\sin^3(\theta) + \cos^3(\theta)}$ for $-\dfrac{\pi}{6} \le \theta \le \dfrac{\pi}{2}$

1.9 Animation

A very powerful feature of MATLAB is its ability to render an animation. For example, suppose that we want to visualize the oscillations of an ordinary spring. What are the necessary steps to implement this objective?

1. Determine the parametric equations that describe the curve at a fixed time. In this instance, it is the helix parametric equations as given earlier in Example 1.14.
2. Introduce the time dependence in the appropriate curve parameters. In this instance, make the helix pitch oscillatory in time.
3. Generate 3-D plots of the curve at different times. Make sure that your axis definition includes all cases.
4. Use the movie commands to display consecutively the different frames obtained in step 3.

The following script M-file implements the above workplan:

    th=0:pi/60:32*pi;             % 16 turns of the helix
    a=1;                          % helix radius
    A=0.25;                       % relative amplitude of the pitch oscillation
    w=2*pi/15;                    % angular frequency of the oscillation
    M=moviein(16);                % storage for 16 frames
    for t=1:16;
        x=a*cos(th);
        y=a*sin(th);
        z=(1+A*cos(w*(t-1)))*th;  % pitch oscillates in time
        plot3(x,y,z,'r');
        axis([-2 2 -2 2 0 40*pi]); % same axes for every frame
        M(:,t)=getframe;
    end
    movie(M,15)

The statement M=moviein(16) creates the 2-D structure that stores in each column the data corresponding to a frame at a specific time. The frames themselves are generated within the for loop. The getframe function returns a pixel image of the current frame. The last command plays the movie n times (15, in this instance).

1.10 Histograms

The most convenient representation for data collected from experiments is in the form of histograms. Typically, you collect data and want to sort it out in different bins; the MATLAB command for this operation is hist. But prior to getting to this point, let us introduce some array-related definitions and learn the use of the MATLAB commands that compute them.

Let {y_n} be a data set; it can be represented in MATLAB by an array.
The largest element of this array is obtained through the command max(y), and the smallest element is obtained through the command min(y). The mean value of the elements of the array is obtained through the command mean(y), and the standard deviation is obtained through the command std(y).

The definitions of the mean and of the standard deviation are, respectively, given by:

$$\bar{y} = \frac{\sum_{i=1}^{N} y(i)}{N}$$

$$\sigma_y = \sqrt{\frac{N \sum_{i=1}^{N} (y(i))^2 - \left(\sum_{i=1}^{N} y(i)\right)^2}{N(N - 1)}}$$

where N is the dimension of the array.

The data (i.e., the array) can be organized into a number of bins (nb) and exhibited through the command [n,c]=hist(y,nb); the array n in the output will be the number of elements in each of the bins, and the array c will hold the positions of the bin centers.

Example 1.18
Find the mean and the standard deviation and draw the histogram, with 20 bins, for an array whose 10,000 elements are chosen from the MATLAB built-in normal distribution with zero mean and standard deviation 1.

Solution: Edit and execute the following script M-file:

    y=randn(1,10000);
    meany=mean(y)
    stdy=std(y)
    nb=20;
    hist(y,nb)

You will notice that the results obtained for the mean and the standard deviation vary slightly from the theoretical results. This is due to the finite number of elements chosen for the array and the intrinsic limit in the built-in algorithm used for generating random numbers.

NOTE The MATLAB command for generating an N-element array of random numbers generated uniformly from the interval [0, 1] is rand(1,N).

1.11 Printing and Saving Work in MATLAB

Printing a figure: Use the MATLAB print function to print a displayed figure directly to your printer. Notice that the printed figure does not take up the entire page. This is because the default orientation of the graph is in portrait mode.
To change these settings, try the following commands on an already generated graphic window:

    orient('landscape') %full horizontal layout
    orient('tall') %full vertical layout

Printing a program file (script M-file): For both the Mac and PC, open the M-file that you want to print. Go to the File pull-down menu, and select Print.

Saving and loading variables (data): You can use the MATLAB save function to save either a particular variable or the entire MATLAB workspace. To do this, consider the following example:

    x=1;y=2;
    save 'user volume:x' x
    save 'user volume:workspace'

The first save command saves the variable x into a file x.mat. You can change the name of the .mat file so it does not match the variable name, but that would be confusing. The second command saves all variables (x and y) in the workspace into workspace.mat.

To load x.mat and workspace.mat, enter MATLAB and use the MATLAB load function; note what you obtain if you enter the following commands:

    load 'user volume:x'
    x
    load 'user volume:workspace'
    y

After loading the variables, you can see a list of all the variables in your workspace if you enter the MATLAB who command. What would you obtain if you had typed and entered the who command at this point?

Now, to clear the workspace of some or all variables, use the MATLAB clear function.

    clear x %clears variable x from the workspace
    clear %clears all variables from workspace

1.12 MATLAB Commands Review

axis            Sets the axis limits for both 2-D and 3-D plots. Axis supports the arguments equal and square, which make the current graph's aspect ratio 1.
contour         Plots contour lines of a surface.
clear           Clears all variables from the workspace.
clf             Clears figure.
for             Runs a sequence of commands a given number of times.
getframe        Returns the pixel image of a movie frame.
help            Online help.
hold on(off)    Holds the plot axis with existing graphics on, so that multiple figures can be plotted on the same graph (release the hold of the axes).
if              Conditional evaluation.
length          Gives the length of an array.
load            Loads data or variable values from previous sessions into current MATLAB session.
linspace        Generates an array with a specified number of points between two values.
meshgrid        Makes a 2-D array of coordinate squares suitable for plotting surface meshes.
mesh            Plots a mesh surface of a surface stored in a matrix.
meshc           The same as mesh, but also plots in the same figure the contour plot.
min             Finds the smallest element of an array.
max             Finds the largest element of an array.
mean            Finds the mean of the elements of an array.
moviein         Creates the matrix that contains the frames of an animation.
movie           Plays the movie described by a matrix M.
orient          Orients the current graph to your needs.
plot            Plots points or pairs of arrays on a 2-D graph.
plot3           Plots points or array triples on a 3-D graph.
polar           Plots a polar plot on a polar grid.
pol2cart        Polar to Cartesian conversion.
print           Prints a figure to the default printer.
quit or exit    Leave MATLAB program.
rand            Generates an array with elements randomly chosen from the uniform distribution over the interval [0, 1].
randn           Generates an array with elements randomly chosen from the normal distribution function with zero mean and standard deviation 1.
subplot         Partitions the graphics window into sub-windows.
save            Saves MATLAB variables.
std             Finds the standard deviation of the elements of an array.
stem            Plots the data sequence as stems from the x-axis terminated with circles for the data value.
view            Views 3-D graphics from different perspectives.
who             Lists all variables in the workspace.
xlabel, ylabel, zlabel, title
                Labels the appropriate axes with text and title.
(x>=x1)         Boolean function that is equal to 1 when the condition inside the parenthesis is satisfied, and zero otherwise.

2 Difference Equations

This chapter introduces difference equations and examines some simple but important cases of their applications.
We develop simple algorithms for their numerical solutions and apply these techniques to the solution of some problems of interest to the engineering professional. In particular, the chapter illustrates each type of difference equation that is of widespread interest.

2.1 Simple Linear Forms

The following components are needed to define and solve a difference equation:

1. An ordered array defining an index for the sequence of elements
2. An equation connecting the value of an element having a certain index with the values of some of the elements having lower indices (the order of the equation being defined by the number of lower-index terms appearing in the difference equation)
3. A sufficient number of the values of the elements at the lowest indices to act as seeds in the recursive generation of the higher indexed elements.

For example, the Fibonacci numbers are defined as follows:

1. The ordered array is the set of positive integers
2. The defining difference equation is of second order and is given by:

   F(k + 2) = F(k + 1) + F(k)    (2.1)

3. The initial conditions are F(1) = F(2) = 1 (note that the required number of initial conditions should be the same as the order of the equation).

From the above, it is then straightforward to compute the first few Fibonacci numbers:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Example 2.1
Write a program for finding the first 20 Fibonacci numbers.

Solution: The following program fulfills this task:

    N=18;
    F(1)=1;
    F(2)=1;
    for k=1:N
        F(k+2)=F(k)+F(k+1);
    end
    F

It should be noted that the value of the different elements of the sequence depends on the values of the initial conditions, as illustrated in Pb. 2.1, which follows.

In-Class Exercises

Pb. 2.1 Find the first 20 elements of the sequence that obeys the same recursion relation as that of the Fibonacci numbers, but with the following initial conditions:

F(1) = 0.5 and F(2) = 1

Pb. 2.2 Find the first 20 elements of the sequence generated by the following difference equation:

F(k + 3) = F(k) + F(k + 1) + F(k + 2)

with the following initial conditions:

F(1) = 1, F(2) = 2, and F(3) = 3

Why do we need to specify three initial conditions?

2.2 Amortization

In this application of difference equations, we examine simple problems of finance that are of major importance to every engineer, on both the personal and professional levels. When the purchase of any capital equipment or real estate is made on credit, the assumed debt is normally paid for by means of a process known as amortization. Under this plan, a debt is repaid in a sequence of periodic payments where a portion of each payment reduces the outstanding principal, while the remaining portion is for interest on the loan.

Suppose that the original debt to be paid is C and that interest charges are compounded at the rate r per payment period. Let y(k) be the outstanding principal after the kth payment, and u(k) the amount of the kth payment.

After the kth payment period, the outstanding debt has increased by the interest due on the previous principal y(k - 1) and decreased by the amount of payment u(k). This relation can be written in the following difference equation form:

y(k) = (1 + r) y(k - 1) - u(k)    (2.2)

We can simplify the problem and assume here that the bank wants its money back in equal amounts over N periods (this can be in days, weeks, months, or years; note, however, that whatever unit is used here should be the same as used for the assignment of the value of the interest rate r). Therefore, let

u(k) = p for k = 1, 2, 3, …, N    (2.3)

Now, using Eq. (2.2), let us iterate the first few terms of the difference equation:

y(1) = (1 + r)y(0) - p = (1 + r)C - p    (2.4)

since y(0) = C is the original capital borrowed. At k = 2, using Eq. (2.2) and Eq. (2.4), we obtain:

$$y(2) = (1 + r)y(1) - p = (1 + r)^2 C - p(1 + r) - p \tag{2.5}$$

At k = 3, using Eqs. (2.2), (2.4), and (2.5), we obtain:

$$y(3) = (1 + r)y(2) - p = (1 + r)^3 C - p(1 + r)^2 - p(1 + r) - p \tag{2.6}$$

and so on. For an arbitrary k, we can write, by induction, the general expression:

$$y(k) = (1 + r)^k C - p \sum_{i=0}^{k-1} (1 + r)^i \tag{2.7}$$

Using the expression for the sum of a geometric series, from the appendix, the expression for y(k) then reduces to:

$$y(k) = (1 + r)^k C - p \left[\frac{(1 + r)^k - 1}{r}\right] \tag{2.8}$$

At k = N, the debt is paid off and the bank is owed no further payment; therefore:

$$y(N) = 0 = (1 + r)^N C - p \left[\frac{(1 + r)^N - 1}{r}\right] \tag{2.9}$$

From this equation, we can determine the amount of each of the (equal) payments:

$$p = \left[\frac{r(1 + r)^N}{(1 + r)^N - 1}\right] C \tag{2.10}$$

Question: What percentage of the first payment is going into retiring the principal?

In-Class Exercises

Pb. 2.3 Given the principal, the number of periods, and the interest rate, use Eq. (2.10) to write a MATLAB program to find the amount of payment per period, assuming the payment per period is the same for all periods.

Pb. 2.4 Use the same reasoning as for the amortization problem to write the difference equation for an individual's savings plan. Let y(k) be the savings balance on the first day of the kth year and u(k) the amount of deposit made in the kth year. Write a MATLAB program to compute y(k) if the sequence u(k) and the interest rate r are given. Specialize to the case where you deposit an amount that increases by the rate of inflation i. Compute and plot the total value of the savings as a function of k if the deposit in the first year is $1000, the yearly interest rate is 6%, and the yearly rate of inflation is 3%. (Hint: For simplicity, assume that the deposits are made on December 31 of each year, and that the balance statement is issued on January 1 of each year.)

FIGURE 2.1 The first few steps in the construction of the Koch curve.
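
The amortization result of Section 2.2 can be checked numerically. The following is a minimal sketch, assuming a hypothetical loan of C = 10000 repaid in N = 12 periods at r = 0.01 per period (these values are illustrative and do not come from the text). It computes the payment from Eq. (2.10), iterates Eq. (2.2), and compares the iterated balance with the closed form of Eq. (2.8); the names yclosed and maxdiff are introduced here for convenience.

```matlab
% Hypothetical loan parameters, chosen only for illustration
C = 10000;     % original debt
r = 0.01;      % interest rate per payment period
N = 12;        % number of payment periods

p = r*(1+r)^N/((1+r)^N - 1)*C;   % equal payment, Eq. (2.10)

% Iterate Eq. (2.2). MATLAB array indices start at 1, so y(1) holds
% y(0) = C and y(k+1) holds the balance after the kth payment.
y = zeros(1, N+1);
y(1) = C;
for k = 1:N
    y(k+1) = (1+r)*y(k) - p;
end

% Closed form, Eq. (2.8), evaluated for k = 0, 1, ..., N
kk = 0:N;
yclosed = (1+r).^kk*C - p*((1+r).^kk - 1)/r;

maxdiff = max(abs(y - yclosed));  % difference should be at round-off level
disp([p, y(end), maxdiff])
```

The final balance y(end) should vanish to within round-off, which is the condition expressed by Eq. (2.9).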
2.3 An Iterative Geometric Construct: The Koch Curve

In your previous studies of 2-D geometry, you encountered classical geometric objects such as the circle, the triangle, the square, different polygons, etc. These shapes only approximate the shapes that you observe in nature (e.g., the shapes of clouds, mountain ranges, rivers, coastlines, etc.). In a successful effort to address the limitations of classical geometry, mathematicians have developed, over the last century and more intensely over the last three decades, a new geometry called fractal geometry. This geometry defines the geometrical object through an iterative transformation applied an infinite number of times on an initial simple geometrical object. We illustrate this new concept in geometry by considering the Koch curve (see Figure 2.1).

The Koch curve has the following simple geometrical construction. Begin with a straight line of length L. This initial object is called the initiator. Now partition it into three equal parts. Then replace the middle line segment by the two other sides of an equilateral triangle whose base is the segment you removed. This completes the basic construction, which transformed the line segment into four non-collinear smaller parts. This constructional prescription is called the generator. We now repeat the transformation, taking each of the resulting line segments, partitioning them into three equal parts, removing the middle section, etc. This process is repeated indefinitely. Figure 2.1 shows the first two steps of this construction. It is interesting to observe that the Koch curve is an example of a curve where there is no way to fit a tangent to any of its points. In a sense, it is an example of a curve that is made out of corners everywhere. The detailed study of these objects is covered in courses in fractal geometry, chaos, dynamic systems, etc.
We limit ourselves here to the simple problems of determining the number of segments, the length of each segment, the length of the curve, and the area bounded by the curve and the horizontal axis, following the kth step:

1. After the first step, we are left with a curve made up of four line segments of equal length; after the second step, we have (4 × 4) segments; and the number of segments after k steps is

$$n(k) = 4^k \tag{2.11}$$

2. If the initiator had length L, the length of each segment is L/3 after the first step, L/3² after the second step, and after k steps:

$$s(k) = \frac{L}{3^k} \tag{2.12}$$

3. Combining the results of Eqs. (2.11) and (2.12), we deduce that the length of the curve after k steps is:

$$P(k) = L \times \left(\frac{4}{3}\right)^k \tag{2.13}$$

4. The number of vertices in this curve, denoted by u(k), is equal to the number of segments plus one:

$$u(k) = 4^k + 1 \tag{2.14}$$

5. The area enclosed by the Koch curve and the horizontal line can be deduced from solving a difference equation: the area enclosed after the kth step is equal to the area enclosed in the (k - 1)th step plus the number of the added triangles multiplied by their individual area:

$$\text{Number of new triangles} = \frac{u(k) - u(k - 1)}{3} \tag{2.15}$$

$$\text{Area of the new equilateral triangle} = \frac{\sqrt{3}}{4} s^2(k) = \frac{\sqrt{3}}{4} \left(\frac{1}{3}\right)^{2k} L^2 \tag{2.16}$$

from which the difference equation for the area can be deduced:

$$A(k) = A(k - 1) + \left[\frac{u(k) - u(k - 1)}{3}\right] \frac{\sqrt{3}}{4} \frac{L^2}{3^{2k}} = A(k - 1) + \frac{\sqrt{3}}{24} \left(\frac{2}{3}\right)^{2k-1} L^2 \tag{2.17}$$

The initial condition for this difference equation is:

$$A(1) = \frac{\sqrt{3}}{4} \frac{L^2}{9} \tag{2.18}$$

Clearly, the solution of the above difference equation is the sum of a geometric series, and can therefore be written analytically.
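
The geometric-series sum mentioned above can be written out explicitly. The following worked step is a sketch that uses only Eqs. (2.17) and (2.18), noting that the initial condition A(1) equals the k = 1 term of the added area; the summation index j is introduced here for convenience.

```latex
\[
\begin{aligned}
A(k) &= \frac{\sqrt{3}}{24}\,L^{2}\sum_{j=1}^{k}\left(\frac{2}{3}\right)^{2j-1}
      = \frac{\sqrt{3}}{16}\,L^{2}\sum_{j=1}^{k}\left(\frac{4}{9}\right)^{j} \\
     &= \frac{\sqrt{3}}{16}\,L^{2}\,\frac{(4/9)\left[1-(4/9)^{k}\right]}{1-4/9}
      = \frac{\sqrt{3}}{20}\,L^{2}\left[1-\left(\frac{4}{9}\right)^{k}\right]
\end{aligned}
\]
```

Setting k = 1 recovers the initial condition of Eq. (2.18), and the term (4/9)^k decreases toward zero as k grows.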
For k → ∞, this area has the limit:

$$A(k \to \infty) = \frac{\sqrt{3}}{20} L^2 \tag{2.19}$$

However, if you did not notice the relationship of the above difference equation with the sum of a geometric series, you can still solve this equation numerically, using the following routine and assuming L = 1:

    N=25;
    A=zeros(N,1); % preallocating size of array speeds computation
    m=1:N;
    A(1)=(sqrt(3)/24)*(2/3);
    for k=2:N
        A(k)=A(k-1)+(sqrt(3)/24)*((2/3)^(2*k-1));
    end
    stem(m,A,'*')

The above plot shows the value of the area on the first 25 iterations of the function, and as can be verified, the numerical limit of this area has the same value as the analytical expression given in Eq. (2.19).

Before leaving the Koch curve, we note that although the area of the curve goes to a finite limit as the index increases, the value of the length of the curve [Eq. (2.13)] continues to increase. This is a feature not encountered in the classical geometric objects with which you are most familiar.

In-Class Exercise

Pb. 2.5 Write a program to draw the Koch curve at the kth step. (Hint: Starting with the farthest left vertex and going clockwise, write a difference equation relating the coordinates of a vertex with those of the preceding vertex, the length of the segment, and the angle that the line connecting the two consecutive vertices makes with the x-axis.)

2.4 Solution of Linear Constant Coefficients Difference Equations

In Section 2.1, we explored the general numerical techniques for solving difference equations. In this section, we consider some special techniques for obtaining the analytical solutions for the class of linear constant coefficients difference equations. The related physical problem is to determine, for a linear system, the output y(k), k > 0, given a specific input u(k) and a specific set of initial conditions. We discuss, at this stage, the so-called direct method.
The general expression for this class of difference equation is given by:

$$ \sum_{j=0}^{N} a_j\, y(k-j) = \sum_{m=0}^{M} b_m\, u(k-m) \tag{2.20} $$

The direct method assumes that the total solution of a linear difference equation is the sum of two parts, the homogeneous solution and the particular solution:

$$ y(k) = y_{\text{homog.}}(k) + y_{\text{partic.}}(k) \tag{2.21} $$

The homogeneous solution is independent of the input u(k), and the RHS of the difference equation is equated to zero; that is,

$$ \sum_{j=0}^{N} a_j\, y(k-j) = 0 \tag{2.22} $$

2.4.1 Homogeneous Solution

Assume that the solution is of the form:

$$ y_{\text{homog.}}(k) = \lambda^k \tag{2.23} $$

Substituting in the homogeneous equation, we obtain the following algebraic equation:

$$ \sum_{j=0}^{N} a_j\, \lambda^{k-j} = 0 \tag{2.24} $$

or

$$ \lambda^{k-N} \left( a_0 \lambda^N + a_1 \lambda^{N-1} + a_2 \lambda^{N-2} + \ldots + a_{N-1} \lambda + a_N \right) = 0 \tag{2.25} $$

The polynomial in parentheses is called the characteristic polynomial of the system. The roots can be obtained analytically for all polynomials up to order 4; otherwise, they are obtained numerically. In MATLAB, they can be obtained graphically when they are all real, or through the roots command in the most general case. We introduce this command in Chapter 5. In all the following examples in this chapter, we restrict ourselves to cases for which the roots can be obtained analytically. If we assume that the roots are all distinct, the general solution to the homogeneous difference equation is:

$$ y_{\text{homog.}}(k) = C_1 \lambda_1^k + C_2 \lambda_2^k + \ldots + C_N \lambda_N^k \tag{2.26} $$

where $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_N$ are the roots of the characteristic polynomial.

Example 2.2
Find the homogeneous solution of the difference equation

$$ y(k) - 3y(k-1) - 4y(k-2) = 0 $$

Solution: The characteristic polynomial associated with this equation leads to the quadratic equation:

$$ \lambda^2 - 3\lambda - 4 = 0 $$

The roots of this equation are –1 and 4, respectively. Therefore, the solution of the homogeneous equation is:

$$ y_{\text{homog.}}(k) = C_1 (-1)^k + C_2 (4)^k $$

The constants $C_1$ and $C_2$ are determined from the initial conditions y(1) and y(2).
Substituting, we obtain:

$$ C_1 = -\frac{4}{5}\, y(1) + \frac{y(2)}{5} \quad \text{and} \quad C_2 = \frac{y(1) + y(2)}{20} $$

NOTE If the characteristic polynomial has roots of multiplicity m, then the portion of the homogeneous solution corresponding to that root can be written, instead of $C_1 \lambda^k$, as:

$$ C_1^{(1)} \lambda^k + C_1^{(2)} k\, \lambda^k + \ldots + C_1^{(m)} k^{m-1} \lambda^k $$

In-Class Exercises

Pb. 2.6 Find the homogeneous solution of the following second-order difference equation:

$$ y(k) = 3y(k-1) - 2y(k-2) $$

with the initial conditions: y(0) = 1 and y(1) = 2. Then check your results numerically.

Pb. 2.7 Find the homogeneous solution of the following second-order difference equation:

$$ y(k) = [2\cos(\theta)]\, y(k-1) - y(k-2) $$

with the initial conditions: y(–2) = 0 and y(–1) = 1. Check your results numerically.

2.4.2 Particular Solution

The particular solution depends on the form of the input signal. The following table summarizes the form of the particular solution of a linear equation for some simple input functions:

    Input Signal                                Particular Solution
    A (constant)                                B (constant)
    $A M^k$                                     $B M^k$
    $A k^M$                                     $B_0 k^M + B_1 k^{M-1} + \ldots + B_M$
    $\{A\cos(\omega_0 k),\ A\sin(\omega_0 k)\}$  $B_1\cos(\omega_0 k) + B_2\sin(\omega_0 k)$

For more complicated input signals, the z-transform technique provides the simplest solution method. This technique is discussed in great detail in courses on linear systems.

In-Class Exercise

Pb. 2.8 Find the particular solution of the following second-order difference equation:

$$ y(k) - 3y(k-1) + 2y(k-2) = (3)^k \quad \text{for } k > 0 $$

2.4.3 General Solution

The general solution of a linear difference equation is the sum of its homogeneous solution and its particular solution, with the constants adjusted so as to satisfy the initial conditions. We illustrate this general prescription with an example.

Example 2.3
Find the complete solution of the first-order difference equation:

$$ y(k+1) + y(k) = k $$

with the initial condition y(0) = 0.

Solution: First, solve the homogeneous equation y(k + 1) + y(k) = 0. The characteristic polynomial is λ + 1 = 0; therefore, $y_{\text{homog.}}$
$= C(-1)^k$

The particular solution can be obtained from the above table. Noting that the input signal has the functional form $k^M$, with M = 1, the particular solution is of the form:

$$ y_{\text{partic.}} = B_0 k + B_1 \tag{2.27} $$

Substituting back into the original equation, and grouping the different powers of k, we deduce that:

$$ B_0 = 1/2 \quad \text{and} \quad B_1 = -1/4 $$

The complete solution of the difference equation is then:

$$ y(k) = C(-1)^k + \frac{2k - 1}{4} $$

The constant C is determined from the initial condition:

$$ y(0) = 0 = C(-1)^0 + \frac{(-1)}{4} $$

giving for the constant C the value 1/4.

In-Class Exercises

Pb. 2.9 Use the following program to model Example 2.3:

    N=19;
    y(1)=0;
    for k=1:N
        y(k+1)=k-y(k);
    end
    y

Verify the closed-form answer.

Pb. 2.10 Find, for k ≥ 2, the general solution of the second-order difference equation:

$$ y(k) - 3y(k-1) - 4y(k-2) = 4^k + 2 \times 4^{k-1} $$

with the initial conditions y(0) = 1 and y(1) = 9. (Hint: When the functional form of the homogeneous and particular solutions are the same, use the same functional form for the solutions as in the case of multiple roots for the characteristic polynomial.)

Answer:

$$ y(k) = -\frac{1}{25}(-1)^k + \frac{26}{25}(4)^k + \frac{6}{5}\, k\, 4^k $$

Homework Problems

Pb. 2.11 Given the general geometric series y(k), where:

$$ y(k) = 1 + a + a^2 + \ldots + a^k $$

show that y(k) obeys the first-order equation:

$$ y(k) = y(k-1) + a^k $$

Pb. 2.12 Show that the response of the system:

$$ y(k) = (1 - a)\,u(k) + a\, y(k-1) $$

to a step signal of amplitude c; that is, u(k) = c for all positive k, is given by:

$$ y(k) = c\,(1 - a^{k+1}) \quad \text{for } k = 0, 1, 2, \ldots $$

where the initial condition y(–1) = 0.

Pb. 2.13 Given the first-order difference equation:

$$ y(k) = u(k) + y(k-1) \quad \text{for } k = 0, 1, 2, \ldots $$

with the input signal u(k) = k and the initial condition y(–1) = 0, verify that its solution also satisfies the second-order difference equation

$$ y(k) = 2y(k-1) - y(k-2) + 1 $$

with the initial conditions y(0) = 0 and y(–1) = 0.

Pb.
2.14 Verify that the response of the system governed by the first-order difference equation:

$$ y(k) = b\,u(k) + a\, y(k-1) $$

to the alternating input:

$$ u(k) = (-1)^k \quad \text{for } k = 0, 1, 2, 3, \ldots $$

is given by:

$$ y(k) = \frac{b}{1 + a} \left[ (-1)^k + a^{k+1} \right] \quad \text{for } k = 0, 1, 2, 3, \ldots $$

if the initial condition is: y(–1) = 0.

Pb. 2.15 The impulse response of a system is the output from this system when excited by an input signal δ(k) that is zero everywhere, except at k = 0, where it is equal to 1. Using this definition and the general form of the solution of a difference equation, write the output of a linear system described by:

$$ y(k) - 3y(k-1) - 4y(k-2) = \delta(k) + 2\delta(k-1) $$

The initial conditions are: y(–2) = y(–1) = 0.

Answer:

$$ y(k) = -\frac{1}{5}(-1)^k + \frac{6}{5}(4)^k \quad \text{for } k > 0 $$

Pb. 2.16 The expression for the National Income is given by:

$$ y(k) = c(k) + i(k) + g(k) $$

where c is consumer expenditure, i is the induced private investment, g is the government expenditure, and k is the accounting period, typically corresponding to a particular quarter. Samuelson theory, introduced to many engineers in Cadzow's classic Discrete Time Systems (see reference list), assumes the following properties for the above three components of the National Income:

1. Consumer expenditure in any period k is proportional to the National Income at the previous period:

$$ c(k) = a\, y(k-1) $$

2. Induced private investment in any period k is proportional to the increase in consumer expenditure from the preceding period:

$$ i(k) = b\,[c(k) - c(k-1)] = ab\,[y(k-1) - y(k-2)] $$

3. Government expenditure is the same for all accounting periods:

$$ g(k) = g $$

Combining the above equations, the National Income obeys the second-order difference equation:

$$ y(k) = g + a(1 + b)\, y(k-1) - ab\, y(k-2) \quad \text{for } k = 1, 2, 3, \ldots $$

The initial conditions y(–1) and y(0) are to be specified. Plot the National Income for the first 40 quarters of a new national entity, assuming that: a = 1/6, b = 1, g = $10,000,000, y(–1) = $20,000,000, y(0) = $30,000,000.
How would the National Income curve change if the marginal propensity to consume (i.e., the constant a) is decreased to 1/8?

2.5 Convolution-Summation of a First-Order System with Constant Coefficients

The amortization problem in Section 2.2 was solved by obtaining the present output, y(k), as a linear combination of the present and all past inputs, (u(k), u(k – 1), u(k – 2), …). This solution technique is referred to as the convolution-summation representation:

$$ y(k) = \sum_{i=0}^{\infty} w(i)\, u(k-i) \tag{2.28} $$

where w(i) is the weighting function (or weight). Usually, the infinite sum is reduced to a finite sum because the inputs with negative indexes are usually assumed to be zeros.

On the other hand, in the difference equation formulation of this class of problems, the present output y(k) is expressed as a linear combination of the present and m most recent inputs and of the n most recent outputs, specifically:

$$ y(k) = b_0 u(k) + b_1 u(k-1) + \ldots + b_m u(k-m) - a_1 y(k-1) - a_2 y(k-2) - \ldots - a_n y(k-n) \tag{2.29} $$

where, of course, n is the order of the difference equation. Elementary techniques for solving this class of equations were introduced in Section 2.4. However, the most powerful technique to directly solve the linear difference equation with constant coefficients is, as pointed out earlier, the z-transform technique.

Each of the above formulations of the input-output problem has distinct advantages in different circumstances. The direct difference equation formulation is the most amenable to numerical computations because of lower computer memory requirements, while the convolution-summation technique has the advantage of being suitable for developing mathematical proofs and finding general features for the difference equation.

Relating the parameters of the two formulations of this problem is usually cumbersome without the z-transform technique. However, for first-order difference equations, this task is rather simple.
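The equivalence of the two formulations can also be checked numerically. The following is only a sketch under stated assumptions: the coefficient values and the step input are hypothetical, chosen for illustration, and the weights w(i) are obtained as the response to a unit impulse using MATLAB's filter command (which is not used elsewhere in this chapter). The convolution sum of Eq. (2.28) is then evaluated with conv and compared with a direct iteration of Eq. (2.29).

```matlab
% Hypothetical first-order system (values chosen only for illustration)
b = [2 3];          % b0 and b1 of Eq. (2.29)
a = [1 0.5];        % leading 1 multiplies y(k); a1 = 0.5
K = 20;             % samples k = 0, 1, ..., K-1 (stored at indices 1..K)
u = ones(1,K);      % step input; u(k) = 0 for negative k is implied

% Difference-equation formulation: y(k) = b0*u(k) + b1*u(k-1) - a1*y(k-1)
yde = zeros(1,K);
yde(1) = b(1)*u(1);
for k = 2:K
    yde(k) = b(1)*u(k) + b(2)*u(k-1) - a(2)*yde(k-1);
end

% Weights w(i): response of the same system to a unit impulse at k = 0
imp = [1 zeros(1,K-1)];
w = filter(b,a,imp);

% Convolution-summation formulation, Eq. (2.28), kept to the first K samples
ycs = conv(w,u);
ycs = ycs(1:K);

maxdiff = max(abs(yde-ycs));   % largest disagreement between the two outputs
disp(maxdiff)
```

Only the first K samples of the convolution are kept because, for k < K, the sum in Eq. (2.28) involves only the weights w(0), …, w(k) that were actually computed.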
Example 2.4
Relate, for a first-order difference equation with constant coefficients, the sets $\{a_n\}$ and $\{b_n\}$ with $\{w_n\}$.

Solution: The first-order difference equation is given by:

$$ y(k) = b_0 u(k) + b_1 u(k-1) - a_1 y(k-1) $$

where u(k) = 0 for all k negative. From the difference equation and the initial conditions, we can directly write:

$$ y(0) = b_0 u(0) $$

Similarly, for k = 1,

$$ y(1) = b_0 u(1) + b_1 u(0) - a_1 y(0) = b_0 u(1) + b_1 u(0) - a_1 b_0 u(0) = b_0 u(1) + (b_1 - a_1 b_0)\, u(0) $$

$$ y(2) = b_0 u(2) + (b_1 - a_1 b_0)\, u(1) - a_1 (b_1 - a_1 b_0)\, u(0) $$

$$ y(3) = b_0 u(3) + (b_1 - a_1 b_0)\, u(2) - a_1 (b_1 - a_1 b_0)\, u(1) + a_1^2 (b_1 - a_1 b_0)\, u(0) $$

or, more generally, if:

$$ y(k) = w(0)u(k) + w(1)u(k-1) + \ldots + w(k)u(0) $$

then,

$$ w(0) = b_0, \qquad w(i) = (-a_1)^{i-1} (b_1 - a_1 b_0) \quad \text{for } i = 1, 2, 3, \ldots $$

In-Class Exercises

Pb. 2.17 Using the convolution-summation technique, find the closed form solution for:

$$ y(k) = u(k) - \frac{1}{3}\, u(k-1) + \frac{1}{2}\, y(k-1) $$

and the input function given by:

$$ u(k) = \begin{cases} 0 & \text{for } k \text{ negative} \\ 1 & \text{otherwise} \end{cases} $$

Compare your analytical answer with the numerical solution.

Pb. 2.18 Show that the resultant weight functions for two systems are, respectively:

$$ w(k) = w_1(k) + w_2(k) \quad \text{if connected in parallel} $$

$$ w(k) = \sum_{i=0}^{k} w_2(i)\, w_1(k-i) \quad \text{if connected in cascade} $$

2.6 General First-Order Linear Difference Equations*

Thus far, we have considered difference equations with constant coefficients. Now we consider first-order difference equations with arbitrary functions as coefficients:

$$ y(k+1) + A(k)\, y(k) = B(k) \tag{2.30} $$

The homogeneous equation corresponding to this form satisfies the following equation:

$$ l(k+1) + A(k)\, l(k) = 0 \tag{2.31} $$

Its expression can be easily found:

$$ l(k+1) = -A(k)\, l(k) = A(k)A(k-1)\, l(k-1) = \ldots = (-1)^{k+1} A(k)A(k-1)\ldots A(0)\, l(0) = \prod_{i=0}^{k} [-A(i)]\; l(0) \tag{2.32} $$

Assuming that the general solution is of the form:

$$ y(k) = l(k)\, v(k) \tag{2.33} $$

let us find v(k).
Substituting the above trial solution in the difference equation, we obtain:

$$ l(k+1)\, v(k+1) + A(k)\, l(k)\, v(k) = B(k) \tag{2.34} $$

Further, assuming that

$$ v(k+1) = v(k) + \Delta v(k) \tag{2.35} $$

substituting in the difference equation, and recalling that l(k) is the solution of the homogeneous equation, we obtain:

$$ \Delta v(k) = \frac{B(k)}{l(k+1)} \tag{2.36} $$

Summing this over the variable k from 0 to k, we deduce that:

$$ v(k+1) = \sum_{j=0}^{k} \frac{B(j)}{l(j+1)} + C \tag{2.37} $$

where C is a constant.

Example 2.5
Find the general solution of the following first-order difference equation:

$$ y(k+1) - k^2 y(k) = 0 $$

with y(1) = 1.

Solution:

$$ y(k+1) = k^2 y(k) = k^2 (k-1)^2 y(k-1) = k^2 (k-1)^2 (k-2)^2 y(k-2) = k^2 (k-1)^2 (k-2)^2 (k-3)^2 y(k-3) = \ldots = k^2 (k-1)^2 (k-2)^2 (k-3)^2 \ldots (2)^2 (1)^2 y(1) = (k!)^2 $$

Example 2.6
Find the general solution of the following first-order difference equation:

$$ (k+1)\, y(k+1) - k\, y(k) = k^2 $$

with y(1) = 1.

Solution: Reducing this equation to the standard form, we have:

$$ A(k) = -\frac{k}{k+1} \quad \text{and} \quad B(k) = \frac{k^2}{k+1} $$

The homogeneous solution is given by:

$$ l(k+1) = \frac{k!}{(k+1)!} = \frac{1}{(k+1)} $$

The particular solution is given by:

$$ v(k+1) = \sum_{j=1}^{k} \frac{j^2}{(j+1)}\,(j+1) + C = \sum_{j=1}^{k} j^2 + C = \frac{(k+1)(2k+1)k}{6} + C $$

where we used the expression for the sum of the square of integers (see Appendix). The general solution is then:

$$ y(k+1) = \frac{(2k+1)k}{6} + \frac{C}{(k+1)} $$

From the initial condition y(1) = 1, we deduce that: C = 1.

In-Class Exercise

Pb. 2.19 Find the general solutions for the following difference equations, assuming that y(1) = 1.
a. y(k + 1) – 3ky(k) = 3k.
b. y(k + 1) – ky(k) = k.

2.7 Nonlinear Difference Equations

In this and the following chapter section, we explore a number of nonlinear difference equations that exhibit some general features typical of certain classes of solutions and observe other instances with novel qualitative features. Our exploration is purely experimental, in the sense that we restrict our treatment to guided computer runs.
The underlying theories of most of the models presented are the subject of more advanced courses; however, many educators, including this author, believe that there is virtue in exposing students qualitatively early on to these fascinating and generally new developments in mathematics.

2.7.1 Computing Irrational Numbers

In this model, we want to exhibit an example of a nonlinear difference equation whose solution is a sequence that approaches a specific limit, irrespective, within reasonable constraints, of the initial condition imposed on it. This type of difference equation has been used to compute a class of irrational numbers. For example, a well-defined approximation for computing $\sqrt{A}$ is the feedback process:

$$ y(k+1) = \frac{1}{2} \left[ y(k) + \frac{A}{y(k)} \right] \tag{2.38} $$

This equation's main features are explored in the following exercise.

In-Class Exercise

Pb. 2.20 Using the difference equation given by Eq. (2.38):
a. Write down a routine to compute $\sqrt{2}$. As an initial guess, take the initial value to be successively: 1, 1.5, 2; even consider 5, 10, and 20. What is the limit of each of the obtained sequences?
b. How many iterations are required to obtain $\sqrt{2}$ accurate to four digits for each of the above initial conditions?
c. Would any of the above properties be different for a different choice of A?

Now, having established that the above sequence goes to a limit, let us prove that this limit is indeed $\sqrt{A}$. To prove the above assertion, let this limit be denoted by $y_{\lim}$; that is, for large k, both y(k) and y(k + 1) ⇒ $y_{\lim}$, and the above difference equation goes in the limit to:

$$ y_{\lim} = \frac{1}{2} \left[ y_{\lim} + \frac{A}{y_{\lim}} \right] \tag{2.39} $$

Solving this equation, we obtain:

$$ y_{\lim} = \sqrt{A} \tag{2.40} $$

It should be noted that the above derivation is meaningful only when a limit exists and is in the domain of definition of the sequence (in this case, the real numbers). In Section 2.7.2, we encounter a sequence where, for some values of the parameters, there is no limit.
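The step from the limiting equation to its solution is short: multiplying both sides by $2y_{\lim}$ gives $2y_{\lim}^2 = y_{\lim}^2 + A$, that is, $y_{\lim}^2 = A$, and a positive starting value keeps every iterate positive, which selects the positive root. The following minimal sketch illustrates the convergence numerically. The value A = 7, the initial guess, and the number of iterations are arbitrary choices made here for illustration, not values taken from the text.

```matlab
A = 7;                    % hypothetical value; any A > 0 can be tried
Niter = 7;                % number of iterations (an arbitrary choice)
y = zeros(1,Niter+1);
y(1) = 1;                 % initial guess (an assumption)
for k = 1:Niter
    y(k+1) = 0.5*(y(k) + A/y(k));   % feedback process of Eq. (2.38)
end
err = abs(y - sqrt(A));   % distance of each iterate from sqrt(A)
disp([y' err'])           % first column: iterates; second column: error
```

Printing the error column next to the iterates makes it easy to count how many steps are needed for a given accuracy.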
2.7.2 The Logistic Equation

Section 2.7.1 illustrated the case in which the solution of a nonlinear difference equation converges to a single limit for large values of the iteration index. In this chapter subsection, we consider the case in which a succession of iterates (called orbits) bifurcate, yielding orbits of period length 2, 4, 8, 16, ad infinitum, ending in what is called a “chaotic” orbit of infinite period length. We illustrate the prototype for this class of difference equations by exploring the logistic difference equation.

The logistic equation was introduced by Verhulst to model the growth of populations limited by finite resources (the name logistic was coined by the French army under Napoleon when this equation was used for the planning of “logement” of troops in camps). In more modern settings of ecology, the above model is used to simulate a population growth model. Specifically, in an ecological or growth process, the normalized measure y(k + 1) of the next generation of a species (the number of animals, for example) is a linear function of the present measure y(k); that is,

$$ y(k+1) = r\, y(k) \tag{2.41} $$

where r is the growth parameter. If unchecked, the growth of the species follows a geometric series, which for r > 1 grows to infinity. But growth is often limited by finite resources. In other words, the larger y(k), the smaller the growth factor. The simplest way to model this decline in the growth factor is to replace r by r(1 – y(k)), so that as y(k) approaches the theoretical limit (1 in this case), the effective growth factor goes to zero. The difference equation goes to:

$$ y(k+1) = r\,(1 - y(k))\, y(k) \tag{2.42} $$

which is the standard form for the logistic equation.

In the next series of exercises, we explore the solution of Eq. (2.42) as we vary the value of r. We find that qualitatively different classes of solutions may appear for different values of r.

We start by writing the simple subroutine that models Eq.
(2.42):

    N=127; r= ; y(1)= ;
    m=1:N+1;
    for k=1:N
        y(k+1)= r*(1-y(k))*y(k);
    end
    plot(m,y,'*')

The values of r and y(1) are to be keyed in for each of the specific cases under consideration.

In-Class Exercises

In the following two problems, we take in the logistic equation r > 1 and y(1) < 1.

Pb. 2.21 Consider the case that 1 < r < 3 and y(1) = 0.5.
a. Show, by running the above program for different values of r and y(1), that the iteration of the logistic equation leads to the limit $y(N \gg 1) = \dfrac{r - 1}{r}$.
b. Does the value of this limit change if the value of y(1) is modified, while r is kept fixed?

Pb. 2.22 Find the iterates of the logistic equation for the following values of r: 3.1, 3.236068, 3.3, 3.498561699, 3.566667, and 3.569946, assuming the following three initial conditions:

y(1) = 0.2, y(1) = 0.5, y(1) = 0.7

In particular, specify for each case:
a. The period of the orbit for large N, and the values of each of the iterates.
b. Whether the orbit is super-stable (i.e., the periodicity is present for all values of N).

This section provided a quick glimpse of two types of nonlinear difference equations, one of which may not necessarily converge to one value. We discovered that a great number of classes of solutions may exist for different values of the equation's parameters. Section 2.8 generalizes to 2-D and illustrates nonlinear difference equations in 2-D geometry. The study of these equations has led in the last few decades to various mathematical discoveries in the branches of mathematics called Symbolic Dynamical theory, Fractal Geometry, and Chaos theory, which have far-reaching implications in many fields of engineering. The interested student/reader is encouraged to consult the References section of this book for a deeper understanding of this subject.

2.8 Fractals and Computer Art

In Section 2.3, we introduced a fractal type having a priori well-defined and apparent spatial symmetries, namely, the Koch curve.
In Section 2.7, we discovered that a certain type of 1-D nonlinear difference equation may lead, for a certain range of parameters, to a sequence that may have different orbits. Section 2.8.1 explores examples of 2-D fractals, generated by coupled difference equations, whose solution morphology can also be quite distinct due solely to a minor change in one of the parameters of the difference equations. Section 2.8.2 illustrates another possible feature observed in some types of fractals. We show how the 2-D orbit representing the solution of a particular nonlinear difference equation can also be substantially changed through a minor variation in the initial conditions of the equation.

FIGURE 2.2 Plot of the Mira curve for a = –0.99. The starting point coordinates are (4, 0). Top panel: b = 1, bottom panel: b = 0.98.

2.8.1 Mira's Model

The coordinates of the points on the Mira curve are generated iteratively through the following system of nonlinear difference equations:

$$ \begin{aligned} x(k+1) &= b\, y(k) + F(x(k)) \\ y(k+1) &= -x(k) + F(x(k+1)) \end{aligned} \tag{2.43} $$

where

$$ F(x) = a x + \frac{2(1 - a)\, x^2}{1 + x^2} \tag{2.44} $$

We illustrate the different morphologies of the solutions in two different cases, and leave other cases as exercises for your fun and exploration.

Case 1 Here, a = –0.99, and we consider the cases b = 1 and b = 0.98. The starting point coordinates are (4, 0). See Figure 2.2. This case can be viewed by editing and executing the following script M-file:

    a=-0.99; b1=1; b2=0.98;
    x1(1)=4; y1(1)=0; x2(1)=4; y2(1)=0;
    for n=1:12000
        x1(n+1)=b1*y1(n)+a*x1(n)+2*(1-a)*(x1(n))^2/(1+(x1(n)^2));
        y1(n+1)=-x1(n)+a*x1(n+1)+2*(1-a)*(x1(n+1)^2)/(1+(x1(n+1)^2));
        x2(n+1)=b2*y2(n)+a*x2(n)+2*(1-a)*(x2(n))^2/(1+(x2(n)^2));
        y2(n+1)=-x2(n)+a*x2(n+1)+2*(1-a)*(x2(n+1)^2)/(1+(x2(n+1)^2));
    end
    subplot(2,1,1); plot(x1,y1,'.')
    title('a=-0.99 b=1')
    subplot(2,1,2); plot(x2,y2,'.')
    title('a=-0.99 b=0.98')

Case 2 Here, a = 0.7, and we consider the cases b = 1 and b = 0.9998.
The starting point coordinates are (0, 12.1). See Figure 2.3.

In-Class Exercise

Pb. 2.23 Manifest the computer artist inside yourself. Generate new geometrical morphologies, in Mira's model, by new choices of the parameters (–1 < a < 1 and b ≈ 1) and of the starting point. You can start with:

    a        b1    b2        (x1, y1)
    –0.48    1     0.93      (4, 0)
    –0.25    1     0.99      (3, 0)
    0.1      1     0.99      (3, 0)
    0.5      1     0.9998    (3, 0)
    0.99     1     0.9998    (0, 12)

FIGURE 2.3 Plot of the Mira curve for a = 0.7. The starting point coordinates are (0, 12.1). Top panel: b = 1, bottom panel: b = 0.9998.

2.8.2 Hénon's Model

The coordinates of the Hénon orbits are generated iteratively through the following system of nonlinear difference equations:

$$ \begin{aligned} x(k+1) &= a\, x(k) - b\left( y(k) - (x(k))^2 \right) \\ y(k+1) &= b\, x(k) + a\left( y(k) - (x(k))^2 \right) \end{aligned} \tag{2.45} $$

where $|a| \le 1$ and $b = \sqrt{1 - a^2}$.

Executing the following script M-file illustrates the possibility of generating two distinct orbits, for the same value of the parameter (here, a = 0.24), when the starting points of the iteration are only slightly different from each other. The initial point coordinates of the two cases are given, respectively, by (0.5696, 0.1622) and (0.5650, 0.1650). See Figure 2.4.

    a=0.24; b=0.9708;
    x1(1)=0.5696; y1(1)=0.1622;
    x2(1)=0.5650; y2(1)=0.1650;
    for n=1:120
        x1(n+1)=a*x1(n)-b*(y1(n)-(x1(n))^2);
        y1(n+1)=b*x1(n)+a*(y1(n)-(x1(n))^2);
        x2(n+1)=a*x2(n)-b*(y2(n)-(x2(n))^2);
        y2(n+1)=b*x2(n)+a*(y2(n)-(x2(n))^2);
    end
    plot(x1,y1,'ro',x2,y2,'bx')

FIGURE 2.4 Plot of two Hénon orbits having the same a = 0.24 but different starting points. (o) corresponds to the orbit with starting point (0.5696, 0.1622), (x) corresponds to the orbit with starting point (0.5650, 0.1650).

2.8.2.1 Demonstration

Different orbits for Hénon's model can be plotted if different starting points are randomly chosen.
Executing the following script M-file illustrates the a = 0.24 case, with random initial conditions. See Figure 2.5.

    a=0.24; b=sqrt(1-a^2);
    rx=rand(1,40); ry=rand(1,40);
    for n=1:1500
        for m=1:40
            x(1,m)=-0.99+2*rx(m);
            y(1,m)=-0.99+2*ry(m);
            x(n+1,m)=a*x(n,m)-b*(y(n,m)-(x(n,m))^2);
            y(n+1,m)=b*x(n,m)+a*(y(n,m)-(x(n,m))^2);
        end
    end
    plot(x,y,'r.')
    axis([-1 1 -1 1])
    axis square

FIGURE 2.5 Plot of multiple Hénon orbits having the same a = 0.24 but random starting points.

2.9 Generation of Special Functions from Their Recursion Relations*

In this section, we go back to more classical mathematics. We consider the case of the special functions of mathematical physics. In this case, we need to define the iterated quantities by two indices: the order of the function and the value of the argument of the function.

In many electrical engineering problems, it is convenient to use a class of polynomials called the orthogonal polynomials. For example, in filter design, the set of Chebyshev polynomials is of particular interest.

The Chebyshev polynomials can be defined through recursion relations, which are similar to difference equations and relate the value of a polynomial of a certain order at a particular point to the values of the polynomials of lower orders at the same point. These are defined through the following recursion relation:

$$ T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x) \tag{2.46} $$

Now, instead of giving two values for the initial conditions as we would have in difference equations, we need to give the explicit functions for two of the lower-order polynomials. For example, the first- and second-order Chebyshev polynomials are

$$ T_1(x) = x \tag{2.47} $$

$$ T_2(x) = 2x^2 - 1 \tag{2.48} $$

Example 2.7
Plot over the interval 0 ≤ x ≤ 1, the fifth-order Chebyshev polynomial.
Solution: The strategy to solve this problem is to build an array to represent the x-interval, and then use the difference equation routine to find the value of the Chebyshev polynomial at each value of the array, remembering that the indexing should always be a positive integer. The following program implements the above strategy:

    N=5;
    x1=1:101;
    x=(x1-1)/100;
    T(1,x1)=x;
    T(2,x1)=2*x.^2-1;
    for k=3:N
        T(k,x1)=2.*x.*T(k-1,x1)-T(k-2,x1);
    end
    y=T(N,x1);
    plot(x,y)

In-Class Exercise

Pb. 2.24 By comparing their plots, verify that the above definition for the Chebyshev polynomial gives the same graph as that obtained from the closed-form expression:

$$ T_N(x) = \cos(N \cos^{-1}(x)) \quad \text{for } 0 \le x \le 1 $$

In addition to the Chebyshev polynomials, you will encounter other orthogonal polynomials in your engineering studies. In particular, the solutions of a number of problems in electromagnetic theory and in quantum mechanics (QM) call on the Legendre, Hermite, Laguerre polynomials, etc. In the following exercises, we explore, in a preliminary manner, some of these polynomials. We also explore another important type of the special functions: the spherical Bessel function.

Homework Problems

Pb. 2.25 Plot the function y defined, in each case:

a. Legendre polynomials:

$$ (m+2)\, P_{m+2}(x) = (2m+3)\, x\, P_{m+1}(x) - (m+1)\, P_m(x) $$

$$ P_1(x) = x \quad \text{and} \quad P_2(x) = \frac{1}{2}(3x^2 - 1) $$

For 0 ≤ x ≤ 1, plot $y = P_5(x)$.
These polynomials describe the electric field distribution from a nonspherical charge distribution.

b. Hermite polynomials:

$$ H_{m+2}(x) = 2x\, H_{m+1}(x) - 2(m+1)\, H_m(x) $$

$$ H_1(x) = 2x \quad \text{and} \quad H_2(x) = 4x^2 - 2 $$

For 0 ≤ x ≤ 6, plot $y = A_5 H_5(x) \exp(-x^2/2)$, where $A_m = (2^m\, m!\, \sqrt{\pi})^{-1/2}$.
The function y describes the QM wave-function of the harmonic oscillator.

c. Laguerre polynomials:

$$ L_{m+2}(x) = \left[ (3 + 2m - x)\, L_{m+1}(x) - (m+1)\, L_m(x) \right] / (m+2) $$

$$ L_1(x) = 1 - x \quad \text{and} \quad L_2(x) = (1 - 2x + x^2/2) $$

For 0 ≤ x ≤ 6, plot $y = \exp(-x/2)\, L_5(x)$.
The Laguerre polynomials figure in the solutions of the QM problem of atoms and molecules.
Pb. 2.26 The recursion relations can, in addition to defining orthogonal polynomials, also define some special functions of mathematical physics. For example, the spherical Bessel functions that play an important role in defining the modes of spherical cavities in electrodynamics and scattering amplitudes in both classical and quantum physics are defined through the following recursion relation:

$$ j_{m+2}(x) = \left[ \frac{3 + 2m}{x} \right] j_{m+1}(x) - j_m(x) $$

With

$$ j_1(x) = \frac{\sin(x)}{x^2} - \frac{\cos(x)}{x} \quad \text{and} \quad j_2(x) = \left( \frac{3}{x^3} - \frac{1}{x} \right) \sin(x) - \frac{3\cos(x)}{x^2} $$

Plot $j_5(x)$ over the interval 0 < x < 15.

3 Elementary Functions and Some of Their Uses

The purpose of this chapter is to illustrate and build some practice in the use of elementary functions in selected basic electrical engineering problems. We also construct some simple signal functions that you will encounter in future engineering analysis and design problems.

NOTE It is essential to review the Supplement at the end of this book in case you want to refresh your memory on the particular elementary functions covered in the different chapter sections.

3.1 Function Files

To analyze and graph functions using MATLAB, we have to be able to construct functions that can be called from within the MATLAB environment. In MATLAB, functions are made and stored in function M-files. We already used one kind of M-file (script file) to store various executable commands in a routine. Function M-files differ from script M-files in that they have designated input(s) and output(s).

The following is an example of a function. Type and save the following function in a file named aline.m:

    function y=aline(x)
    % (x,y) is a point on a line that has slope 3
    % and y-intercept -5
    y=3*x-5;

NOTES
1. The word function at the beginning of the file makes it a function rather than a script file.
2. The function name, aline, that appears in the first line of this file should match the name that we assign to this file name when saving it (i.e., aline.m).
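For quick experiments, the same rule can also be held in a function handle rather than in a separate file. This is only a sketch and an assumption about your installation: the anonymous-function syntax @(x) is not used in this text and requires a MATLAB release that supports it, and the names aline_h and y_at_2 are hypothetical.

```matlab
% Anonymous-function counterpart of aline.m (assumes a MATLAB release
% that supports the @(x) syntax; aline_h is a hypothetical name)
aline_h = @(x) 3*x - 5;

y_at_2 = aline_h(2);     % value of the function at the single point x = 2
x = -2:1:7;              % a small array of abscissa points
y = aline_h(x);          % the same rule applied to every element of x
disp(y_at_2)
disp(y)
```

The function M-file remains the form assumed in the rest of this chapter; the handle is convenient when the rule is short and is needed in only one script.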
Having created a function M-file in your user volume, move to the command window to learn how to call this function. There are two basic ways to use a function file:

1. To evaluate the function for a specified value x=x1, enter aline(x1) to get the function value at this point; that is, $y_1 = 3x_1 - 5$.
2. To plot $y_1 = 3x_1 - 5$ for a range of x values, say [–2, 7], enter:

    fplot('aline',[-2,7])

NOTE The above example illustrates a function with one input and one output. The construction of a function M-file of a function having n inputs and m outputs starts with:

    function [y1,y2,...,ym]=funname(x1,x2,...,xn)

Above, using a function M-file, we showed a method to plot the defined function aline on the interval (–2, 7) using the fplot command. An alternative method is, of course, to use arrays, in the manner specified in Chapter 1. Specifically, we could have plotted the 'aline' function in the following alternate method:

    x=-2:.01:7;
    y=3*x-5;
    plot(x,y)

To compare the two methods, we note that:

1. plot requires a user-supplied x-array (abscissa points) and a constructed y-array (ordinate points), while fplot only requires the name of the function file, defined previously and stored in a function M-file, and the endpoints of the interval.
2. The fplot command automatically creates a sampled domain that is used to plot the function, taking into account the type of function being plotted and using enough points to make the display appear continuous. On the other hand, plot requires that you choose the array length yourself.

Both methods, therefore, have their own advantages, and it depends on the particular problem whether to use plot or fplot.

We are now in position to explore the use of some of the most familiar functions.

3.2 Examples with Affine Functions

The equation of an affine function is given by:

$$ y(x) = ax + b \tag{3.1} $$

In-Class Exercises

Pb.
3.1 Generate four function M-files for the following four functions:

$$ y_1(x) = 3x + 2; \quad y_2(x) = 3x + 5; \quad y_3(x) = -\frac{x}{3} + 3; \quad y_4(x) = -\frac{x}{3} + 4 $$

Pb. 3.2 Sketch the functions of Pb. 3.1 on the interval –5 < x < 5. What can you say about the angle between each pair of lines? (Did you remember to make your aspect ratio = 1?)

Pb. 3.3 Read off the graphs the coordinates of the points of intersection of the lines in Pb. 3.1. (Become familiar with the use and syntax of the zoom and ginput commands for a more accurate reading of the coordinates of a point.)

Pb. 3.4 Write a function M-file for the line passing through a given point and intersecting another given line at a given angle.

Hint: $\tan(a + b) = \dfrac{\tan(a) + \tan(b)}{1 - \tan(a)\tan(b)}$

Application to a Simple Circuit

The purpose of this application is to show that:

1. The solution to a simple circuit problem can be viewed as the simultaneous solution of two affine equations, or, equivalently, as the intersection of two straight lines.
2. The variations in the circuit performance can be studied through a knowledge of the affine functions, relating the voltages and the current.

Consider the simple circuit shown in Figure 3.1. In the terminology of the circuit engineer, the voltage source $V_s$ is called the input to the circuit, and the current I and the voltage V are called the circuit outputs. Thus, this is an example of a system with one input and two outputs. As you may have studied in high school physics courses, all of circuit analysis with resistors as elements can be accomplished using Kirchhoff's current law, Kirchhoff's voltage law, and Ohm's law.

FIGURE 3.1 A simple resistor circuit.

• Kirchhoff's voltage law: The sum of all voltage drops around a closed loop is balanced by the sum of all voltage sources around the same loop.
• Kirchhoff's current law: The algebraic sum of all currents entering (exiting) a circuit node must be zero.
(Assign the + sign to those currents that are entering the node, and the - sign to those currents exiting the node.)
• Ohm's law: The ratio of the voltage drop across a resistor to the current passing through the resistor is a constant, defined as the resistance of the element; that is, $R = V/I$.

The quantities we are looking for include (1) the current I through the circuit, and (2) the voltage V across the load resistor R. Using Kirchhoff's voltage law and Ohm's law for resistance R1, we obtain:

$V_s = V + V_1 = V + I R_1$ (3.2)

while applying Ohm's law for the load resistor gives:

$V = I R$ (3.3)

These two equations can be rewritten as affine functions giving I in terms of V:

L1: $I = \dfrac{V_s - V}{R_1}$ (3.4)

L2: $I = \dfrac{V}{R}$ (3.5)

If we know the values of Vs, R, and R1, then Eqs. (3.4) and (3.5) can be represented as lines drawn on a plane with ordinate I and abscissa V.

Suppose we are interested in finding the value of the current I and the voltage V when R1 = 100 Ω, R = 100 Ω, and Vs = 5 V. To solve this problem graphically, we plot each of the L1 and L2 functions on the same graph and find their point of intersection. The functions L1 and L2 are programmed as follows:

    function I=L1(V)
    R1=100; R=100; Vs=5;
    I=(Vs-V)/R1;

    function I=L2(V)
    R1=100; R=100; Vs=5;
    I=V/R;

Because the voltage V is smaller than the source potential, due to losses in the resistor, a suitable domain for V would be [0, 5]. We now plot the two lines on the same graph:

    fplot('L1',[0,5])
    hold on
    fplot('L2',[0,5])
    hold off

In-Class Exercise

Pb. 3.5 Verify that the two lines L1 and L2 intersect at the point: (I = 0.025, V = 2.5).

In the above analysis, we had to declare the numerical values of the parameters R1 and R in the definition of each of the two functions.
This can, at best, be tedious if you are dealing with more than two function M-files or two parameters; or worse, can lead to errors if you overlook changing the values of the parameters in any of the relevant function M-files when you decide to modify them. To avoid these types of problems, it is good practice to call all functions from a single script M-file and link the parameters' values together so that you only need to edit the calling script M-file. To link the values of parameters to all functions in use, you can use the MATLAB global command. To see how this works, rewrite the above function M-files as follows:

    function I=L1(V)
    global R1 R          % global statement
    Vs=5;
    I=(Vs-V)/R1;

    function I=L2(V)
    global R1 R          % global statement
    Vs=5;
    I=V/R;

The calling script M-file now reads:

    global R1 R          %global statement
    R1=100; R=100;       %set global resistance values
    V=0:.01:5;           %set the voltage range
    I1=L1(V);            %evaluate I1
    I2=L2(V);            %evaluate I2
    plot(V,I1,V,I2,'-')  %plot the two curves

In-Class Exercise

Pb. 3.6 In the above script M-file, we used arrays and the plot command. Rewrite this script file such that you make use of the fplot command.

Further Consideration of Figure 3.1

Calculating the circuit values for fixed resistor values is important, but we can also ask about the behavior of the circuit as we vary the resistor values. Suppose we keep R1 = 100 Ω and Vs = 5 V fixed, but vary the value that R can take. To this end, an analytic solution would be useful because it would give us the circuit responses for a range of values of the circuit parameters R1, R, Vs. However, a plot of the lines L1 and L2 for different values of R can also provide a great deal of qualitative information regarding how the simultaneous solution to L1 and L2 changes as the value of R changes.

The following problem serves to give you a better qualitative idea as to how the circuit outputs vary as different values are chosen for the resistor R.
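As a numerical cross-check on the graphical approach, the pair of lines L1 and L2 can also be solved as a 2×2 linear system with the backslash operator. The following is only a sketch under the parameter values used earlier in this section (R1 = 100 Ω, R = 100 Ω, Vs = 5 V); the names A, rhs, and sol are introduced here for illustration and are not part of the M-files above.

```matlab
% Sketch: solve Eqs. (3.4)-(3.5) as a linear system in the unknowns [V; I].
% Rearranged form:  V + R1*I = Vs   (from L1)
%                   V - R*I  = 0    (from L2)
R1 = 100; R = 100; Vs = 5;      % values from the worked example
A   = [1  R1;                   % coefficient matrix (hypothetical name)
       1  -R];
rhs = [Vs; 0];                  % right-hand side (hypothetical name)
sol = A\rhs;                    % backslash solves A*sol = rhs
V = sol(1);                     % voltage across the load resistor
I = sol(2);                     % current through the circuit
fprintf('V = %g V, I = %g A\n', V, I);
```

The result should agree with the intersection point quoted in Pb. 3.5. Changing R in this script and rerunning it gives the operating point for other load values, which can then be compared with the intersections read off the graph.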
In-Class Exercise

Pb. 3.7 This problem still refers to the circuit of Figure 3.1.
a. Redraw the lines L1 and L2, using the previous values for the circuit parameters.
b. Holding the graph for the case R = 100 Ω, sketch L1 and L2 again for R = 50 Ω and R = 500 Ω. How do the values of the voltage and the current change as R increases, and as it decreases?
c. Determine the largest values of the current and voltage that can exist in this circuit when R varies over non-negative values.
d. The usual nomenclature for the circuit conditions is as follows: the circuit is called an open circuit when R = ∞, while it is called a short circuit when R = 0. What are the (V, I) solutions for these two cases? Can you generalize your statement?

Now, to validate the qualitative results obtained in Pb. 3.7, let us solve analytically the L1 and L2 system. Solving this system of two linear equations in two unknowns gives, for the current and the voltage, the following expressions:

$V(R) = \dfrac{R}{R + R_1} V_s$ (3.6)

$I(R) = \dfrac{V_s}{R + R_1}$ (3.7)

Note that the above analytic expressions for V and I are neither linear nor affine functions in the value of the resistance.

In-Class Exercise

Pb. 3.8 This problem still refers to the circuit of Figure 3.1.
a. Keeping the values of Vs and R1 fixed, sketch the functions V(R) and I(R) for this circuit, and verify that the solutions you found previously in Pbs. 3.5 and 3.7, for the various values of R, agree with those found here.
b. Given that the power lost in a resistive element is the product of the voltage across the resistor multiplied by the current through the resistor, plot the power through the variable resistor as a function of R.
c. Determine the value of R such that the power lost in this resistor is maximized.
d. Find, in general, the relation between R and R1 that ensures that the power lost in the load resistance is maximized. (This general result is called the maximum power transfer theorem.)
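For reference, Eqs. (3.6) and (3.7) can be recovered from Eqs. (3.4) and (3.5) in a few lines of algebra. The steps below are a worked sketch that uses only the symbols already defined in this section:

```latex
\begin{align*}
\frac{V}{R} &= \frac{V_s - V}{R_1}
  && \text{(equate the two expressions for } I\text{)} \\
V\,R_1 &= R\,V_s - R\,V
  && \text{(multiply both sides by } R\,R_1\text{)} \\
V(R) &= \frac{R}{R + R_1}\,V_s
  && \text{(collect the terms in } V\text{)} \\
I(R) &= \frac{V(R)}{R} = \frac{V_s}{R + R_1}
  && \text{(Ohm's law for the load)}
\end{align*}
```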
3.3 Examples with Quadratic Functions

A quadratic function is of the form:

$y(x) = ax^2 + bx + c$ (3.8)

Preparatory Exercises

Pb. 3.9 Find the coordinates of the vertex of the parabola described by Eq. (3.8) as functions of the a, b, c parameters.

Pb. 3.10 If a = 1, show that the quadratic Eq. (3.8) can be factored as:

$y(x) = (x - x_+)(x - x_-)$

where $x_\pm$ are the roots of the quadratic equation. Further, show that, for arbitrary a, the product of the roots is $c/a$, and their sum is $-b/a$.

In-Class Exercises

Pb. 3.11 Develop a function M-file that inputs the two real roots of a second-degree equation and returns the value of this function for an arbitrary x. Is this function unique?

Pb. 3.12 In your elementary mechanics course, you learned that the trajectory of a projectile in a gravitational field (oriented in the -y direction) with an initial velocity $v_{0,x}$ in the x-direction and $v_{0,y}$ in the y-direction satisfies the following parametric equations:

$x = v_{0,x}\,t$ and $y = -\frac{1}{2} g t^2 + v_{0,y}\,t$

where t is time, the origin of the axes was chosen to correspond to the position of the particle at t = 0, and g = 9.8 m/s^2.
a. By eliminating the time t, show that the projectile trajectory y(x) is a parabola.
b. Noting that the components of the initial velocity can be written as functions of the projectile initial speed and its angle of inclination:

$v_{0,y} = v_0 \sin(\phi)$ and $v_{0,x} = v_0 \cos(\phi)$

show that, for a given initial speed, the maximum range for the projectile is achieved when the inclination angle of the initial velocity is 45°.
c. Plot the range for a fixed inclination angle as a function of the initial speed.

3.4 Examples with Polynomial Functions

As pointed out in the Supplement, a polynomial function is an expression of the form:

$p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$ (3.9)

where $a_n \neq 0$ for an nth-degree polynomial.
In MATLAB, we can represent the polynomial function as an array:

$p = [a_n \; a_{n-1} \; \dots \; a_0]$ (3.10)

Example 3.1
You are given the array of coefficients of the polynomial. Write a function M-file for this polynomial using array operations. Let p = [1 3 2 1 0 3]:

Solution:

    function y=polfct(x)
    p=[1 3 2 1 0 3];
    L=length(p);
    v=x.^[(L-1):-1:0];
    y=sum(p.*v);

In-Class Exercises

Pb. 3.13 Show that, for the polynomial p defined by Eq. (3.9), the product of the roots is $(-1)^n \dfrac{a_0}{a_n}$, and the sum of the roots is $-\dfrac{a_{n-1}}{a_n}$.

Pb. 3.14 Find graphically the real roots of the polynomial p = [1 3 2 1 0 3].

3.5 Examples with the Trigonometric Functions

A time-dependent cosine function of the form:

$x = a \cos(\omega t + \phi)$ (3.11)

appears often in many applications of electrical engineering: a is called the amplitude, ω the angular frequency, and φ the phase. Note that we do not have to have a separate discussion of the sine function because the sine function, as shown in the Supplement, differs from the cosine function by a constant phase. Therefore, by suitably changing only the value of the phase parameter, it is possible to transform the sine function into a cosine function.

In the following example, we examine the period of the different powers of the cosine function; your preparatory task is to predict analytically the relationship between the periods of the two curves given in Example 3.2 and then verify your answer numerically.

Example 3.2
Plot simultaneously $x_1(t) = \cos^3(t)$ and $x_2(t) = \cos(t)$ on t ∈ [0, 6π].

Solution: To implement this task, edit and execute the following script M-file:

    t=0:.2:6*pi;              % t-array
    a=1;w=1;                  % desired parameters
    x1=a*(cos(w*t)).^3;       % x1-array constructed (element-wise power)
    x2=a*cos(w*t);            % x2-array constructed
    plot(t,x1,t,x2,'--')

In-Class Exercises

Pb. 3.15 Determine the phase relation between the sine and cosine functions of the same argument.

Pb.
3.16 The meaning of amplitude, angular frequency, and phase can be better understood using MATLAB to obtain graphs of the cosine function for a family of a values, ω values, and φ values.
a. With ω = 1 and φ = π/3, plot the cosine curves corresponding to a = 1:0.1:2.
b. With a = 1 and ω = 1, plot the cosine curves corresponding to φ = 0:π/10:π.
c. With a = 1 and φ = π/4, plot the cosine curves corresponding to ω = 1:0.1:2.

Homework Problem

Pb. 3.17 Find the period of the function obtained by summing the following three cosine functions:

$x_1 = 3\cos(t/3 + \pi/3)$, $x_2 = \cos(t + \pi)$, $x_3 = \frac{1}{3}\cos\left(\frac{3}{2}(t + \pi)\right)$

Verify your result graphically.

3.6 Examples with the Logarithmic Function

3.6.1 Ideal Coaxial Capacitor

An ideal capacitor can be loosely defined as two metallic plates separated by an insulator. If a potential is established between the plates, for example by connecting the two plates to the different terminals of a battery, the plates will be charged by equal and opposite charges, with the battery serving as a pump to move the charges around. The capacitance of a capacitor is defined as the ratio of the magnitude of the charge accumulated on either of the plates to the potential difference across the plates.

Using the Gauss law of electrostatics, it can be shown that the capacitance per unit length of an infinitely long coaxial cable is:

$\dfrac{C}{l} = \dfrac{2\pi\varepsilon}{\ln(b/a)}$ (3.12)

where a and b are the radii of the internal and external conductors, respectively, and ε is the permittivity of the dielectric material sandwiched between the conductors. (The permittivity of vacuum is approximately ε0 = 8.85 × 10^-12 F/m, while those of oil, polystyrene, glass, quartz, bakelite, and mica are larger by factors of, respectively, 2.1, 2.6, 4.5-10, 3.8-5, 5, and 5.4-6.)

In-Class Exercise

Pb. 3.18 Find the ratio of the capacitance of two coaxial cables with the same dielectric material for, respectively: b/a = 5 and 50.
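To make Eq. (3.12) concrete, the short sketch below evaluates the capacitance per unit length for one cable. The radii are illustrative assumptions chosen only for this example, and the variable names (eps0, eps_r, C_per_l) are introduced here, not taken from the text.

```matlab
% Sketch: capacitance per unit length of a coaxial cable, Eq. (3.12).
% All numerical values of the geometry are illustrative assumptions.
eps0  = 8.85e-12;        % permittivity of vacuum (F/m), value quoted in the text
eps_r = 2.6;             % relative permittivity of polystyrene (from the list above)
a = 0.5e-3;              % inner conductor radius in m (assumed)
b = 1.75e-3;             % outer conductor radius in m (assumed)
C_per_l = 2*pi*eps_r*eps0/log(b/a);   % log is the natural logarithm in MATLAB
fprintf('C/l = %.3g pF/m\n', C_per_l*1e12);
```

Because only the ratio b/a enters Eq. (3.12), scaling both radii by the same factor leaves the result unchanged.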
3.6.2 The Decibel Scale

In the SI units used by electrical engineers, the unit of power is the Watt. However, in a number of applications, it is convenient to express the power as a ratio of its value to a reference value. Because the value of this ratio can vary over several orders of magnitude, it is often more convenient to represent this ratio on a logarithmic scale, called the decibel scale:

$G[\mathrm{dB}] = 10 \log\left(\dfrac{P}{P_{ref}}\right)$ (3.13)

where the function log is the logarithm to base 10. The table below converts the power ratio to its value in decibels (dB):

    P/Pref      10^n   4   2   1   0.5   0.25   0.1   10^-3
    dB value    10n    6   3   0   -3    -6     -10   -30

In-Class Exercise

Pb. 3.19 In a measurement of two power values, P1 and P2, it was determined that:

G1 = 9 dB and G2 = -11 dB

Using the above table, determine the value of the ratio P1/P2.

3.6.3 Entropy

Given a random variable X (such as the number of spots on the face of a thrown die) whose possible outcomes are x1, x2, x3, …, and such that the probability for each outcome is, respectively, p(x1), p(x2), p(x3), …, the entropy for this system described by the outcome of one random variable is defined by:

$H(X) = -\sum_{i=1}^{N} p(x_i)\,\log_2(p(x_i))$ (3.14)

where N is the number of possible outcomes, and the logarithm is to base 2. The entropy is a measure of the uncertainty in the value of the random variable. In Information Theory, it will be shown that the entropy, so defined, is the number of bits, on average, required to describe the random variable X.

In-Class Exercises

Pb. 3.20 In each of the following cases, find the entropy:
a. N = 32 and $p(x_i) = \frac{1}{32}$ for all i
b. N = 8 and $p = \left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}\right\}$
c. N = 4 and $p = \left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right\}$
d. N = 4 and $p = \left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{4}, 0\right\}$

Pb. 3.21 Assume that you have two dice, one red and the other blue. Tabulate all possible outcomes that you can obtain by throwing these dice together.
Now assume that all you care about is the sum of spots on the two dice. Find the entropy of the outcome.

Homework Problem

Pb. 3.22 A so-called A-law compander (compressor followed by an expander) uses a compressor that relates output to input voltages by:

$y = \pm \dfrac{A|x|}{1 + \log(A)}$ for $|x| \le 1/A$

$y = \pm \dfrac{1 + \log(A|x|)}{1 + \log(A)}$ for $1/A \le |x| \le 1$

Here, the + sign applies when x is positive and the - sign when x is negative. x = vi/V and y = vo/V, where vi and vo are the input and output voltages. The range of allowable voltages is -V to V. The parameter A determines the degree of compression. For a value of A = 87.6, plot y vs. x in the interval [-1, 1].

3.7 Examples with the Exponential Function

Take a few minutes to review the section on the exponential function in the Supplement before proceeding further. (Recall that exp(1) = e.)

In-Class Exercises

Pb. 3.23 Plot the function $y(x) = (x^{13} + x^9 + x^5 + x^2 + 1)\exp(-4x)$ over the interval [0, 10].

Pb. 3.24 Plot the function $y(x) = \cos(5x)\exp(-x/2)$ over the interval [0, 10].

Pb. 3.25 From the results of Pbs. 3.23 and 3.24, what can you deduce about the behavior of a function at infinity if one of its factors is an exponentially decreasing function of x, while the other factor is a polynomial or trigonometric function of x? What modification to the curve is observed if the degree of the polynomial is increased?

Application to a Simple RC Circuit

The solution giving the voltage across the capacitor in Figure 3.2 following the closing of the switch can be written in the following form:

$V_c(t) = V_c(0)\exp\left(-\dfrac{t}{RC}\right) + V_s\left[1 - \exp\left(-\dfrac{t}{RC}\right)\right]$ (3.15)

Vc(t) is called the time response of the RC circuit, or the circuit output resulting from the constant input Vs. The time constant RC of the circuit has the units of seconds and, as you will observe in the present analysis and other problems in subsequent chapters, its ratio to the characteristic time of a given input potential determines qualitatively the output of the system.
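The role of the time constant in Eq. (3.15) can be seen by evaluating the response at a few multiples of RC. The sketch below does this with an anonymous function; the parameter values are illustrative assumptions (they are not those of any exercise in this chapter), and the names Vc0, Vc, and vals are introduced here.

```matlab
% Sketch: evaluate the RC time response of Eq. (3.15) at selected times.
% Parameter values are illustrative assumptions.
Vc0 = 0;      % initial capacitor voltage in volts (assumed)
Vs  = 5;      % constant source voltage in volts (assumed)
RC  = 2;      % time constant in seconds (assumed)
Vc  = @(t) Vc0*exp(-t/RC) + Vs*(1 - exp(-t/RC));   % Eq. (3.15)
t    = [0 RC 5*RC];         % start, one time constant, five time constants
vals = Vc(t);               % element-wise evaluation on the t-array
fprintf('t = %4.1f s   Vc = %.4f V\n', [t; vals]);
```

After one time constant, the response has covered the fraction 1 - 1/e (about 63%) of the gap between Vc(0) and Vs; after five time constants, it is within 1% of Vs.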
FIGURE 3.2 The circuit used in charging a capacitor.

In-Class Exercise

Pb. 3.26 A circuit designer can produce outputs of various shapes by selecting specific values for the circuit time constant RC. In the following simulations, you can examine the influence of this time constant on the response of the circuit of Figure 3.2. Using Vc(0) = 3 volts, Vs = 10 volts (capacitor charging process), and RC = 1 s:
a. Sketch a graph of Vc(t). What is the asymptotic value of the solution? How long does it take the capacitor voltage to reach the value of 9 volts?
b. Produce an M-file that will plot several curves of Vc(t) corresponding to:
(i) RC = 1
(ii) RC = 5
(iii) RC = 10
Which of these time constants results in the fastest approach of Vc(t) toward Vs?
c. Repeat the above simulations for the case Vs = 0 (capacitor discharge).
d. What would you expect to occur if Vc(0) = Vs?

Homework Problem

Pb. 3.27 The Fermi-Dirac distribution, which gives the average population of electrons in a state with energy ε, neglecting the electron spin for the moment, is given by:

$f(\varepsilon) = \dfrac{1}{\exp[(\varepsilon - \mu)/\Theta] + 1}$

where µ is the Fermi (or chemical) potential and Θ is proportional to the absolute (or Kelvin) temperature.
a. Plot the function f(ε) as a function of ε, for the following cases:
(i) µ = 1 and Θ = 0.002
(ii) µ = 0.03 and Θ = 0.025
(iii) µ = 0.01 and Θ = 0.025
(iv) µ = 0.001 and Θ = 0.001
b. What is the value of f(ε) when ε = µ?
c. Determine the condition under which we can approximate the Fermi-Dirac distribution function by:

$f(\varepsilon) \approx \exp[(\mu - \varepsilon)/\Theta]$

3.8 Examples with the Hyperbolic Functions and Their Inverses

3.8.1 Capacitance of Two Parallel Wires

The capacitance per unit length of two parallel wires, each of radius a and having their axes separated by distance D, is given by:

$\dfrac{C}{l} = \dfrac{\pi\varepsilon_0}{\cosh^{-1}\left(\dfrac{D}{2a}\right)}$ (3.16)

where ε0 is the permittivity of air (taken to be that of vacuum) = 8.854 × 10^-12 Farad/m.
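In MATLAB, the inverse hyperbolic cosine in Eq. (3.16) is available as acosh. The sketch below evaluates the formula for one geometry; the wire radius and separation are illustrative assumptions chosen for this example only, and the variable names are introduced here.

```matlab
% Sketch: capacitance per unit length of two parallel wires, Eq. (3.16).
% The geometry below is an illustrative assumption.
eps0 = 8.854e-12;     % permittivity of vacuum (F/m), as given in the text
a = 1e-3;             % wire radius in m (assumed)
D = 10e-3;            % distance between the wire axes in m (assumed)
C_per_l = pi*eps0/acosh(D/(2*a));    % acosh is the inverse hyperbolic cosine
fprintf('C/l = %.3g pF/m\n', C_per_l*1e12);
```

Note that Eq. (3.16) presupposes D > 2a (the wires do not touch), so that the argument of acosh exceeds 1.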
Question: Write this expression in a different form using the logarithmic function.

In-Class Exercises

Pb. 3.28 Find the capacitance per unit length of two wires of radii 1 cm separated by a distance of 1 m. Express your answer using the most appropriate of the following sub-units:

    mF = 10^-3 F (milli-Farad);     µF = 10^-6 F (micro-Farad);
    nF = 10^-9 F (nano-Farad);      pF = 10^-12 F (pico-Farad);
    fF = 10^-15 F (femto-Farad);    aF = 10^-18 F (atto-Farad);

Pb. 3.29 Assume that you have two capacitors, one consisting of a coaxial cable (radii a and b) and the other of two parallel wires, separated by the distance D. Further assume that the radius of the wires is equal to the radius of the inner cylinder of the coaxial cable. Plot the ratio D/a as a function of b/a, if we desire the two geometrical configurations for the capacitor to end up having the same value for the capacitance. Take ε/ε0 = 2.6.

3.9 Commonly Used Signal Processing Functions

In studying signals and systems, you will also encounter, inter alia, the following functions (or variations thereof), in addition to the functions discussed previously in this chapter:

• Unit step function
• Unit slope ramp function
• Unit area rectangle pulse
• Unit slope right angle triangle function
• Equilateral triangle function
• Periodic traces

FIGURE 3.3 Various useful signal processing functions.

These functions are plotted in Figure 3.3, and the corresponding function M-files are (x is everywhere a scalar):

A. Unit Step function

    function y=stepf(x)
    global astep
    if x

0 & s=pi & s=<2*pi y=0; else y=0 end

In-Class Exercises

Pb. 3.30 In the above definition of all the special shape functions, we used the if-else-end form. Write each of the function M-files to define these same functions using only Boolean expressions.

Pb. 3.31 An adder is a device that adds the input signals to give an output signal equal to the sum of the inputs. Using the functions previously obtained in this section, write the function M-file for the signal in Figure 3.4.

FIGURE 3.4 Profile of the signal of Pb. 3.31.

Pb. 3.32 A multiplier is a device that multiplies two inputs. Find the product of the inputs given in Figures 3.5 and 3.6.

FIGURE 3.5 Profile of the first input to Pb. 3.32.

FIGURE 3.6 Profile of the second input to Pb. 3.32.

Homework Problems

The first three problems in this set are a brief introduction to the different analog modulation schemes of communication theory.

Pb. 3.33 In DSB-AM (double-sideband amplitude modulation), the amplitude of the modulated signal is proportional to the message signal, which means that the time domain representation of the modulated signal is given by:

$u_{DSB}(t) = A_c\,m(t)\cos(2\pi f_c t)$

where the carrier-wave shape is $c(t) = A_c \cos(2\pi f_c t)$ and the message signal is m(t). For a message signal given by:

$m(t) = \begin{cases} 1 & 0 \le t \le t_0/3 \\ -3 & t_0/3 < t \le 2t_0/3 \\ 0 & \text{otherwise} \end{cases}$

a. Write the expression for the modulated signal using the unit area rectangle and the trigonometric functions.
b. Plot the modulated signal as a function of time. (Let fc = 200 and t0 = 0.01.)

Pb. 3.34 In conventional AM, m(t) in the DSB-AM expression for the modulated signal is replaced by [1 + a mn(t)], where mn(t) is the normalized message signal (i.e., $m_n(t) = \dfrac{m(t)}{\max(m(t))}$) and a is the index of modulation (0 ≤ a ≤ 1). The modulated signal expression is then given by:

$u_{AM}(t) = A_c[1 + a\,m_n(t)]\cos(2\pi f_c t)$

For the same message as that of Pb. 3.33 and the same carrier frequency, and assuming the modulation index a = 0.85:
a.
Write the expression for the modulated signal.
b. Plot the modulated signal.

Pb. 3.35 The angle modulation scheme, which includes frequency modulation (FM) and phase modulation (PM), has the modulated signal given by:

$u_{PM}(t) = A_c \cos(2\pi f_c t + k_p\,m(t))$

$u_{FM}(t) = A_c \cos\left(2\pi f_c t + 2\pi k_f \int_{-\infty}^{t} m(\tau)\,d\tau\right)$

Assuming the same message as in Pb. 3.33:
a. Write the expression for the modulated signal in both schemes.
b. Plot the modulated signal in both schemes. Let kp = kf = 100.

Pb. 3.36 If f(x) = f(-x) for all x, then the graph of f(x) is symmetric with respect to the y-axis, and the function f(x) is called an even function. If f(x) = -f(-x) for all x, the graph of f(x) is anti-symmetric with respect to the origin, and we call such a function an odd function.
a. Show that any function can be written as the sum of an odd function plus an even function. List as many even and odd functions as you can.
b. State what conditions must be true for a polynomial to be even, or to be odd.
c. Show that the product of two even functions is even; the product of two odd functions is even; and the product of an odd and even function is odd.
d. Replace in c above the word product by either quotient or power and deduce the parity of the resulting function.
e. Deduce from the above results that the sign/parity of a function follows algebraic rules.
f. Find the even and odd parts of the following functions:
(i) $f(x) = x^7 + 3x^4 + 6x + 2$
(ii) $f(x) = (\sin(x) + 3)\sinh^2(x)\exp(-x^2)$

Pb. 3.37 Decompose the signal shown in Figure 3.7 into its even and odd parts.

FIGURE 3.7 Profile of the signal of Pb. 3.37.

Pb. 3.38 Plot the function y defined through:

$y(x) = \begin{cases} x^2 + 4x + 4 & \text{for } -2 \le x < -1 \\ 0.16x^2 - 0.48x & \text{for } -1 < x < 1.5 \\ 0 & \text{elsewhere} \end{cases}$

and find its even and odd parts.

3.10 Animation of a Moving Rectangular Pulse

You might often want to plot the time development of one of the above signal processing functions if its defining parameters are changing in time.
Take, for example, a theatrical spotlight of constant intensity density across its cross-section, but assume that its position varies with time. The light spot size can be represented by a rectangular pulse (e.g., of width 2 m and height 1 m) that is moving to the right with a constant speed of 1 m/s. Assume that the center of the spot is originally at x = 1 m, and that its final position is at x = 8 m. We want to write a program that will illustrate its time development, and then play the resulting movie.

To illustrate the use of other commands not often utilized in this chapter, we can, instead of the if-else-end syntax used in the previous section, use the Boolean syntax, and define the array by the linspace command. Edit and execute the following script M-file:

    lrect=0;hrect=2;
    x=linspace(0,10,200);
    t=linspace(0,8,40);
    M=moviein(40);
    for m=1:40
    y=(x>=lrect+t(m)).*(x<=hrect+t(m));
    plot(x,y,'r')
    axis([-2 12 0 1.2]);
    M(:,m)=getframe;
    end
    movie(M,3)

Question: How would you modify the above program if the speed of the light beam is not 1?

3.11 MATLAB Commands Review

    fplot     Plots a specified function over a specified interval.
    ginput    Mouse-controlling command to read off coordinates of a point in a graph.
    global    Allows variables to share their values in multiple programs.
    zoom      Zooms in and out on a 2-D plot.

4 Numerical Differentiation, Integration, and Solutions of Ordinary Differential Equations

This chapter discusses the basic methods for numerically finding the value of the limit of an indeterminate form, the value of a derivative, the value of a convergent infinite sum, and the value of a definite integral. Using an improved form of the differentiator, we also present first-order iterator techniques for solving ordinary first-order and second-order linear differential equations. The Runge-Kutta technique for solving ordinary differential equations (ODE) is briefly discussed.
The mode of use of some of the MATLAB packages to perform each of the previous tasks is also described in each instance of interest.

4.1 Limits of Indeterminate Forms

DEFINITION
• If $\lim_{x \to x_0} u(x) = \lim_{x \to x_0} v(x) = 0$, the quotient u(x)/v(x) is said to have an indeterminate form of the 0/0 kind.
• If $\lim_{x \to x_0} u(x) = \lim_{x \to x_0} v(x) = \infty$, the quotient u(x)/v(x) is said to have an indeterminate form of the ∞/∞ kind.

In your elementary calculus course, you learned that the standard technique for solving this kind of problem is through the use of L'Hopital's Rule, which states that if:

$\lim_{x \to x_0} \dfrac{u'(x)}{v'(x)} = C$ (4.1)

then:

$\lim_{x \to x_0} \dfrac{u(x)}{v(x)} = C$ (4.2)

In this section, we discuss a simple algorithm to obtain this limit using MATLAB. The method consists of the following steps:

1. Construct a sequence of points whose limit is x0. In the examples below, consider the sequence $x_n = x_0 - \left(\frac{1}{2}\right)^n$. Recall in this regard that as n → ∞, the nth power of any number whose magnitude is smaller than one goes to zero.
2. Construct the sequence of function values corresponding to the x-sequence, and find its limit.

Example 4.1
Compute numerically $\lim_{x \to 0} \dfrac{\sin(x)}{x}$.

Solution: Enter the following instructions in your MATLAB command window:

    N=20; n=1:N;
    x0=0;
    dxn=-(1/2).^n;
    xn=x0+dxn;
    yn=sin(xn)./xn;
    plot(xn,yn)

The limit of the yn sequence is clearly equal to 1. The deviation of the sequence of the yn from the value of the limit can be obtained by entering:

    dyn=yn-1;
    semilogy(n,abs(dyn))

The last command plots the curve with the ordinate y expressed logarithmically. (The absolute value is taken because the deviation is negative here, and a logarithmic axis can only display positive values.) This mode of display is the most convenient in this case because the ordinate spans many decades of values.

In-Class Exercises

Find the limits of the following functions at the indicated points:

Pb. 4.1 $\dfrac{x^2 - 2x - 3}{x - 3}$ at x → 3

Pb. 4.2 $\dfrac{1 + \sin(x)}{x} - \dfrac{1}{\sin(x)}$ at x → 0

Pb. 4.3 $x \cot(x)$ at x → 0

Pb.
4.4 $\dfrac{1 - \cos(2x)}{x^2}$ at x → 0

Pb. 4.5 $\sin(2x)\cot(3x)$ at x → 0

4.2 Derivative of a Function

DEFINITION The derivative of a certain function at a particular point is defined as:

$f'(x_0) = \lim_{x \to x_0} \dfrac{f(x) - f(x_0)}{x - x_0}$ (4.3)

Numerically, the derivative is computed at the point x0 as follows:

1. Construct an x-sequence that approaches x0.
2. Compute a sequence of the function values corresponding to the x-sequence.
3. Evaluate the sequence of the ratio appearing in the definition of the derivative in Eq. (4.3).
4. Read off the limit of this ratio sequence. This will be the value of the derivative at the point x0.

Example 4.2
Find numerically the derivative of the function ln(1 + x) at x = 0.

Solution: Edit and execute the following script M-file:

    N=20;n=1:N;
    x0=0;
    dxn=(1/2).^[1:N];
    xn=x0+dxn;
    yn=log(1+xn);
    dyn=yn-log(1+x0);
    deryn=dyn./dxn;
    plot(n,deryn)

The limit of the deryn sequence is clearly equal to 1, the value of this function's derivative at 0.

NOTE The choice of N should always be such that dxn is larger than the machine precision; that is, N < 53, since (1/2)^53 ≈ 10^-16.

In-Class Exercises

Find numerically, to one part per 10,000 accuracy, the derivatives of the following functions at the indicated points:

Pb. 4.6 $x^4(\cos^3(x) - \sin(2x))$ at x → π

Pb. 4.7 $\dfrac{\exp(x^2 + 3)}{2 + \cos^2(x)}$ at x → 0

Pb. 4.8 $\dfrac{1 + \sin^2(x)}{2 - \cos^3(x)}$ at x → π/2

Pb. 4.9 $\ln\left(\dfrac{x - 1/2}{x + 1}\right)$ at x → 1

Pb. 4.10 $\tan^{-1}(x^2 + 3)$ at x → 0

Example 4.3
Plot the derivative of the function x^2 sin(x) over the interval 0 ≤ x ≤ 2π.

Solution: Edit and execute the following script M-file:

    dx=10^(-4);
    x=0:dx:2*pi+dx;
    df=diff(sin(x).*x.^2)/dx;
    plot(0:dx:2*pi,df)

where diff is a MATLAB command which, when acting on an array X, gives the new array [X(2) - X(1), X(3) - X(2), …, X(n) - X(n - 1)], whose length is one unit shorter than the array X.

The accuracy of the above algorithm depends on the choice of dx. Ideally, the smaller it is, the more accurate the result.
However, using any computer, we should always choose a dx that is larger than the machine precision, while still much smaller than the value of the variation of x over which the function changes appreciably.

For a systematic method to choose an upper limit on dx, you might want to follow these simple steps:

1. Plot the function on the given interval and identify the point where the derivative is largest.
2. Compute the derivative at that point using the sequence method of Example 4.2, and determine the dx that would satisfy the desired tolerance; then go ahead and use this value of dx in the above routine to evaluate the derivative throughout the given interval.

In-Class Exercises

Plot the derivatives of the following functions on the indicated intervals:

Pb. 4.11 $\ln\left(\dfrac{x - 1}{x + 1}\right)$ on 2 < x < 3

Pb. 4.12 $\ln\left(\dfrac{1 + \sqrt{1 + x^2}}{x}\right)$ on 1 < x < 2

Pb. 4.13 $\ln(\tanh(x/2))$ on 1 < x < 5

Pb. 4.14 $\tan^{-1}(\sinh(x))$ on 0 < x < 10

Pb. 4.15 $\ln(\csc(x) + \tan(x))$ on 0 < x < π/2

4.3 Infinite Sums

An infinite series is denoted by the symbol $\sum_{n=1}^{\infty} a_n$. It is important not to confuse the series with the sequence {an}. The sequence is a list of terms, while the series is a sum of these terms. A sequence is convergent if the term an approaches a finite limit; however, convergence of a series requires that the sequence of partial sums $S_N = \sum_{n=1}^{N} a_n$ approaches a finite limit. There are cases where the sequence may approach a limit, while the series is divergent. The classical example is that of the sequence $\{1/n\}$; this sequence approaches the limit zero, while the corresponding series is divergent.

In any numerical calculation, we cannot perform the operation of adding an infinite number of terms. We can only add a finite number of terms. The infinite sum of a convergent series is the limit of the partial sums SN.

You will study in your calculus course the different tests for checking the convergence of a series. We summarize below the most useful of these tests.
• The Ratio Test, which is very useful for series with terms that contain factorials and/or nth power of a constant, states that: for $a_n > 0$, the series $\sum_{n=1}^{\infty} a_n$ is convergent if $\lim_{n \to \infty} \dfrac{a_{n+1}}{a_n} < 1$.
• The Root Test stipulates that for $a_n > 0$, the series $\sum_{n=1}^{\infty} a_n$ is convergent if $\lim_{n \to \infty} (a_n)^{1/n} < 1$.
• For an alternating series, the series is convergent if it satisfies the conditions that $\lim_{n \to \infty} a_n = 0$ and $a_{n+1} < a_n$.

Now look at the numerical routines for evaluating the limit of the partial sums when they exist.

Example 4.4
Compute the sum of the geometrical series $S_N = \sum_{n=1}^{N} \left(\dfrac{1}{2}\right)^n$.

Solution: Edit and execute the following script M-file:

    for N=1:20
    n=N:-1:1;
    fn=(1/2).^n;
    Sn(N)=sum(fn);
    end
    NN=1:20;
    plot(NN,Sn)

You will observe that this partial sum converges to 1.

NOTE The above summation was performed backwards because this scheme will ensure a more accurate result and will keep all the significant digits of the smallest term of the sum.

In-Class Exercises

Compute the following infinite sums:

Pb. 4.16 $\sum_{k=1}^{\infty} \dfrac{1}{(2k - 1)\,2^{2k-1}}$

Pb. 4.17 $\sum_{k=1}^{\infty} \dfrac{\sin(2k - 1)}{2k - 1}$

Pb. 4.18 $\sum_{k=1}^{\infty} \dfrac{\cos(k)}{k^4}$

Pb. 4.19 $\sum_{k=1}^{\infty} \dfrac{\sin(k/2)}{k^3}$

Pb. 4.20 $\sum_{k=1}^{\infty} \dfrac{1}{2^k}\sin(k)$

4.4 Numerical Integration

The algorithm for integration discussed in this section is the second simplest available (the trapezoid rule, which is the simplest beyond the trivial, is given at the end of this section as a problem). It has been generalized to become more accurate and efficient through other approximations, including Simpson's rule, the Newton-Cotes rule, the Gauss-Laguerre rule, etc. Simpson's rule is derived in Section 4.6, while other advanced techniques are left to more advanced numerical methods courses.

Here, we perform numerical integration by means of a Riemann sum: we subdivide the interval of integration into many subintervals.
Then we take the area of each strip to be the value of the function at the midpoint of the subinterval multiplied by the length of the subinterval, and we add the strip areas to obtain the value of the integral. This technique is referred to as the midpoint rule.

We can justify the above algorithm by recalling the Mean Value Theorem for integrals, which states that:

$\int_a^b f(x)\,dx = (b-a)\,f(c)$ (4.4)

where $c \in [a, b]$. Thus, if we divide the interval of integration into narrow subintervals, then the total integral can be written as the sum of the integrals over the subintervals, and we approximate the location of c in a particular subinterval by the midpoint between its boundaries.

Example 4.5
Use the above algorithm to compute the value of the definite integral of the function sin(x) from 0 to π.

Solution: Edit and execute the following program:

    dx=pi/200;
    x=0:dx:pi-dx;
    xshift=x+dx/2;
    yshift=sin(xshift);
    Int=dx*sum(yshift)

The result is within 1/1000 of the analytical value, 2.

In-Class Exercises

Find numerically, to a 1/10,000 accuracy, the values of the following definite integrals:

Pb. 4.21 $\int_0^{\infty} \frac{1}{x^2+1}\,dx$
Pb. 4.22 $\int_0^{\infty} \exp(-x^2)\cos(2x)\,dx$
Pb. 4.23 $\int_0^{\pi/2} \sin^6(x)\cos^7(x)\,dx$
Pb. 4.24 $\int_0^{\pi} \frac{2}{1+\cos^2(x)}\,dx$

Example 4.6
Plot the value of the indefinite integral $\int_0^x f(x)\,dx$ as a function of x, where f(x) is the function sin(x) over the interval [0, π].

Solution: We solve this problem for the general function f(x) by noting that:

$\int_0^x f(x)\,dx \approx \int_0^{x-\Delta x} f(x)\,dx + f(x - \Delta x + \Delta x/2)\,\Delta x$ (4.5)

where we are dividing the x-interval into subintervals and discretizing x to correspond to the coordinates of the boundaries of these subintervals.
An array $\{x_k\}$ represents these discrete points, and the above equation is then reduced to a difference equation:

$\text{Integral}(x_k) = \text{Integral}(x_{k-1}) + f(\text{Shifted}(x_{k-1}))\,\Delta x$ (4.6)

where

$\text{Shifted}(x_{k-1}) = x_{k-1} + \Delta x/2$ (4.7)

and the initial condition is $\text{Integral}(x_1) = 0$.

The above algorithm can then be programmed, for the above specific function, as follows:

    a=0;
    b=pi;
    dx=0.001;
    x=a:dx:b-dx;
    N=length(x);
    xshift=x+dx/2;
    yshift=sin(xshift);
    Int=zeros(1,N+1);
    Int(1)=0;
    for k=2:N+1
      Int(k)=Int(k-1)+yshift(k-1)*dx;
    end
    plot([x b],Int)

It may be useful to remind the reader, at this point, that the algorithm in Example 4.6 can be generalized to any arbitrary function. However, the key to the accuracy of the numerical calculation is a good choice for the increment dx. A very rough prescription for estimating this quantity, for an oscillating function, is as follows:

1. Plot the function inside the integral (i.e., the integrand) over the desired interval.
2. Verify that the function does not blow up (i.e., go to infinity) anywhere inside this interval.
3. Choose dx conservatively, such that at least 30 subintervals are included in any period of oscillation of the function (see Section 6.8 for more details).

In-Class Exercises

Plot the following indefinite integrals as functions of x over the indicated interval:

Pb. 4.25 $\int_0^x \frac{\cos(x)}{1+\sin(x)}\,dx$, $0 < x < \pi/2$
Pb. 4.26 $\int_1^x \frac{(1+x^{2/3})^6}{x^{1/3}}\,dx$, $1 < x < 8$
Pb.
4.27 $\int_0^x \frac{x+2}{(x^2+2x+4)^2}\,dx$, $0 <$ […]

[…] tolerance
      if fc>=fd
        dnew=c+(1-r)*(b-c);
        a=c;
        c=d;
        fc=fd;
        d=dnew;
        fd=feval(funname,dnew);
      else
        cnew=a+r*(d-a);
        b=d;
        d=c;
        fd=fc;
        c=cnew;
        fc=feval(funname,cnew);
      end
    end
    xmin=(c+d)/2;
    ymin=feval(funname,xmin);

For example, if we wanted to find the position of the minimum of the cosine function and its value in the interval 3 < x < 3.5, accurate to $10^{-4}$, we would enter in the command window, after having saved the above function M-file, the following command:

    [xmin,ymin]=goldensection('cos',3,3.5,10^(-4))

5.3.3 MATLAB fmin and fmins Built-in Functions

Following the same steps used with fzero to find the zeros of a function, we can use the fmin command to find the minimum of a function of one variable on a given interval. The recommended sequence of steps is as follows:

1. Edit a function M-file for the function under consideration.
2. Plot the curve of the function over the desired domain, to get an overview of the function's shape and an estimate of the position of the minimum.
3. Use the command fmin to accurately find the minimum. The syntax is as follows:

    xmin=fmin('funname',a,b) % [a,b] is the interval

The local maximum of a function f(x) on an interval can be deduced from the coordinates of the local minimum of –f(x). To implement this, create a file for the negative of the function (call it n_funname, since a hyphen is not allowed in a MATLAB function name) and enter the following commands in the command window:

    xmax=fmin('n_funname',xi,xf)
    fmax=-1*feval('n_funname',xmax)

Homework Problems

Pb. 5.22 We have two posts of height 6 m and 8 m, separated by a distance of 21 m. A line is to run from the top of one post to the ground between the posts and then to the top of the other post (Figure 5.3). Find the configuration that minimizes the length of the line.

Pb.
5.23 Fermat's principle states that light going from Point A to Point B selects the path that requires the least amount of travel time. Consider the situation in which an engineer in a submarine wants to communicate, using a laser-like pointer, with a detector at the top of the mast of another boat. At what angle θ to the vertical should he point his beam? Assume that the detector is 50 ft above the water surface, the submarine transmitter is 30 ft under the surface, the horizontal distance separating the boat from the submarine is 100 ft, and the velocity of light in water is 3/4 of its velocity in air (Figure 5.4).

FIGURE 5.3 Schematics for Pb. 5.22. (ACB is the line whose length we want to minimize.)

FIGURE 5.4 Schematics for Pb. 5.23. A is the location of the detector at the top of the mast, B is the location of the emitter in the submarine, and BOA is the optical path of the ray of light.

Minimum of a Function of Two Variables

To find the local minimum of a multivariable function, we use the MATLAB fmins function. Finding the maximum can be handled by the same technique as outlined for the one-variable case.

Example 5.4
Find the position of the minimum of the surface $f(x, y) = x^2 + y^2$.

Solution:
1. First, make a function file and save it as fname.m.

    function f=fname(array)
    x=array(1);    % x is stored in first element of array
    y=array(2);    % y is stored in second element of array
    f=x.^2+y.^2;   % function stored in f

2. Graph the contour plot for the surface; and from it, estimate the coordinates of the minimum:

    arrayguess=[.1 .1];

The arrayguess holds the initial guess for both coordinates at the minimum. That is,

    arrayguess=[xguess yguess];

3. The coordinates of the minimum are then obtained by entering the following commands in the command window (the value is stored in fminval rather than fmin, so that the variable does not hide the fmin function):

    arraymin=fmins('fname',arrayguess)
    fminval=feval('fname',arraymin)

Homework Problem

Pb.
5.24 In this problem we propose to apply the above optimization techniques to the important problem of the optical narrow band transmission filter. This filter, in very wide use in optics, consists of two parallel semireflective surfaces (i.e., mirrors) with reflection coatings $R_1$ and $R_2$ and separated by a distance L. Assuming that the material between the mirrors has an index of refraction n and that the incoming beam of light has frequency ω and is making an angle $\theta_i$ with the normal to the semireflective surfaces, then the ratio of the transmitted light intensity to the incident intensity is

$T = \frac{I_{\text{transm.}}}{I_{\text{incid.}}} = \frac{(1-R_1)(1-R_2)}{\left(1-\sqrt{R_1 R_2}\right)^2 + 4\sqrt{R_1 R_2}\,\sin^2\left(\pi\frac{\omega}{\omega_0}\right)}$

where $\omega_0 = \frac{\pi c}{nL\cos(\theta_t)}$, $\sin(\theta_i) = n\sin(\theta_t)$, and $\theta_t$ is the angle that the transmitted light makes with the normal to the mirror surfaces.

In the following activities, we want to understand how the above transmission filter responds as a function of the specified parameters. Choose the following parameters:

$R_1 = R_2 = 0.8$, $0 \le \omega \le 4\omega_0$

a. Plot T vs. $\omega/\omega_0$ for the above frequency range.
b. At what frequencies does the transmission reach a maximum? A minimum?
c. Devise two methods by which you can tune the filter so that the maximum of the filter transmission is centered around a particular physical frequency.
d. How sharp is the filter? By sharp, we mean: what is the width of the transmission band that allows through at least 50% of the incident light? Define the width relative to $\omega_0$.
e. Answer question (d) with the values of the reflection coatings given now by:
$R_1 = R_2 = 0.9$, $0 \le \omega \le 4\omega_0$
Does the sharpness of the filter increase or decrease with an increase of the reflection coefficients of the coating surfaces for the two mirrors?
f. Choosing $\omega = \omega_0$, plot a 3-D mesh of T as a function of the reflection coefficients $R_1$ and $R_2$. Show, both graphically and numerically, that the best performance occurs when the reflection coatings are the same.
g.
Plot the contrast function defined as $C = \frac{T_{\min}}{T_{\max}}$ as a function of the reflection coefficients $R_1$ and $R_2$. How should you choose your mirrors for maximum contrast?
h. For $\omega = \omega_0$, plot the variation of the transmission coefficient as a function of $\theta_i$.
i. Repeat (h), but now investigate the variation in the transmission coefficient as a function of L.

5.4 MATLAB Commands Review

besselj   The built-in BesselJ function.
fmin      Finds the minimum value of a single variable function on a restricted domain.
fmins     Finds the local minimum of a multivariable function.
fsolve    Finds a root to a system of nonlinear equations assuming an initial guess.
fzero     Finds the zero of a single variable function assuming an initial guess.
roots     Finds the roots of a polynomial if the polynomial coefficients are given.
poly      Assembles a polynomial from its roots.
zoom      Zooms in and out on a 2-D plot.

6 Complex Numbers

6.1 Introduction

Since $x^2 \ge 0$ for all real numbers x, the equation $x^2 = -1$ admits no real number as a solution. To deal with this problem, mathematicians in the 18th century introduced the imaginary number $i = \sqrt{-1} = j$. (So as not to confuse the usual symbol for a current with this quantity, electrical engineers prefer the use of the j symbol. MATLAB accepts either symbol, but always gives the answer with the symbol i.)

Expressions of the form:

$z = a + jb$ (6.1)

where a and b are real numbers, are called complex numbers. As illustrated in Section 6.2, this representation has properties similar to that of an ordered pair (a, b), which is represented by a point in the 2-D plane.

The real number a is called the real part of z, and the real number b is called the imaginary part of z. These numbers are referred to by the symbols a = Re(z) and b = Im(z).

When complex numbers are represented geometrically in the x-y coordinate system, the x-axis is called the real axis, the y-axis is called the imaginary axis, and the plane is called the complex plane.
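As a quick illustration of these definitions, the following minimal sketch builds a complex number in MATLAB and extracts its real and imaginary parts with the built-in functions real and imag. The sample value and the variable names are our own choices, made only for illustration.

```matlab
z = 3 + 4j;        % sample complex number z = a + jb with a = 3, b = 4 (our choice)
a = real(z)        % real part, Re(z)
b = imag(z)        % imaginary part, Im(z)
check = (1j)^2     % j squared; MATLAB displays complex results with the symbol i
```

The last line should return –1 up to round-off, which is the defining property of j.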
6.2 The Basics

In this section, you will learn how, using MATLAB, you can represent a complex number in the complex plane. The section also shows how the addition (or subtraction) of two complex numbers, or the multiplication of a complex number by a real number or by j, can be interpreted geometrically.

Example 6.1
Plot in the complex plane the three points ($P_1$, $P_2$, $P_3$) representing the complex numbers $z_1 = 1$, $z_2 = j$, $z_3 = -1$.

Solution: Enter and execute the following commands in the command window:

    z1=1;
    z2=j;
    z3=-1;
    plot(z1,'*')
    axis([-2 2 -2 2])
    axis('square')
    hold on
    plot(z2,'o')
    plot(z3,'*')
    hold off

That is, a complex number in the plot command is interpreted by MATLAB to mean: take the real part of the complex number to be the x-coordinate and the imaginary part of the complex number to be the y-coordinate.

6.2.1 Addition

Next, we define addition for complex numbers. The rule can be directly deduced by analogy with the addition of two vectors in a plane: the x-component of the sum of two vectors is the sum of the x-components of each of the vectors, and similarly for the y-component. Therefore:

If: $z_1 = a_1 + jb_1$ (6.2)
and $z_2 = a_2 + jb_2$ (6.3)
Then: $z_1 + z_2 = (a_1 + a_2) + j(b_1 + b_2)$ (6.4)

The addition or subtraction rules for complex numbers are geometrically translated through the parallelogram rules for the addition and subtraction of vectors.

Example 6.2
Find the sum and difference of the complex numbers $z_1 = 1 + 2j$ and $z_2 = 2 + j$.

Solution: Grouping the real and imaginary parts separately, we obtain:

$z_1 + z_2 = 3 + 3j$ and $z_1 - z_2 = -1 + j$

Preparatory Exercise

Pb. 6.1 Given the complex numbers $z_1$, $z_2$, and $z_3$ corresponding to the vertices $P_1$, $P_2$, and $P_3$ of a parallelogram, find $z_4$ corresponding to the fourth vertex $P_4$. (Assume that $P_4$ and $P_2$ are opposite vertices of the parallelogram.)
Verify your answer graphically for the case:

$z_1 = 2 + j$, $z_2 = 1 + 2j$, $z_3 = 4 + 3j$

6.2.2 Multiplication by a Real or Imaginary Number

If we multiply the complex number z = a + jb by a real number k, the resultant complex number is given by:

$k \times z = k \times (a + jb) = ka + jkb$ (6.5)

What happens when we multiply by j? Let us, for a moment, return to Example 6.1. We note the following properties for the three points $P_1$, $P_2$, and $P_3$:

1. The three points are equally distant from the origin of the axes.
2. The point $P_2$ is obtained from the point $P_1$ by a π/2 counterclockwise rotation.
3. The point $P_3$ is obtained from the point $P_2$ through another π/2 counterclockwise rotation.

We also note, by examining the algebraic forms of $z_1$, $z_2$, $z_3$ that:

$z_2 = j z_1$ and $z_3 = j z_2 = j^2 z_1 = -z_1$

That is, multiplying by j is geometrically equivalent to a counterclockwise rotation by an angle of π/2.

6.2.3 Multiplication of Two Complex Numbers

The multiplication of two complex numbers follows the same rules of algebra as for real numbers, but with $j^2 = -1$. This yields:

If: $z_1 = a_1 + jb_1$ and $z_2 = a_2 + jb_2$
$\Rightarrow z_1 z_2 = (a_1 a_2 - b_1 b_2) + j(a_1 b_2 + b_1 a_2)$ (6.6)

Preparatory Exercises

Solve the following problems analytically.

Pb. 6.2 Find $z_1 z_2$, $z_1^2$, $z_2^2$ for the following pairs:
a. $z_1 = 3j$; $z_2 = 1 - j$
b. $z_1 = 4 + 6j$; $z_2 = 2 - 3j$
c. $z_1 = \frac{1}{3}(2 + 4j)$; $z_2 = \frac{1}{2}(1 - 5j)$
d. $z_1 = \frac{1}{3}(2 - 4j)$; $z_2 = \frac{1}{2}(1 + 5j)$

Pb. 6.3 Find the real quantities m and n in each of the following equations:
a. $mj + n(1 + j) = 3 - 2j$
b. $m(2 + 3j) + n(1 - 4j) = 7 + 5j$
(Hint: Two complex numbers are equal if separately the real and imaginary parts are equal.)

Pb. 6.4 Write the answers in standard form (i.e., a + jb):
a. $(3 - 2j)^2 - (3 + 2j)^2$
b. $(7 + 14j)^7$
c. $(2 + j)\left(\frac{1}{2} + 2j\right)^2$
d. $j(1 + 7j) - 3j(4 + 2j)$

Pb.
6.5 Show that for all complex numbers $z_1$, $z_2$, $z_3$, we have the following properties:

$z_1 z_2 = z_2 z_1$ (commutativity property)
$z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3$ (distributivity property)

FIGURE 6.1 The center of mass of a triangle. (Refer to Pb. 6.6.)

Pb. 6.6 Consider the triangle Δ(ABC), in which D is the midpoint of the BC segment, and let the point G be defined such that $(GD) = \frac{1}{3}(AD)$. Assuming that $z_A$, $z_B$, $z_C$ are the complex numbers representing the points (A, B, C):
a. Find the complex number $z_G$ that represents the point G.
b. Show that $(CG) = \frac{2}{3}(CF)$ and that F is the midpoint of the segment (AB).

6.3 Complex Conjugation and Division

DEFINITION The complex conjugate of a complex number z, which is denoted by $\bar{z}$, is given by:

$\bar{z} = a - jb$ if $z = a + jb$ (6.7)

That is, $\bar{z}$ is obtained from z by reversing the sign of Im(z). Geometrically, z and $\bar{z}$ form a pair of symmetric points with respect to the real axis (x-axis) in the complex plane. In MATLAB, complex conjugation is written as conj(z).

DEFINITION The modulus of a complex number z = a + jb, denoted by $|z|$, is given by:

$|z| = \sqrt{a^2 + b^2}$ (6.8)

Geometrically, it represents the distance between the origin and the point representing the complex number z in the complex plane, which by the Pythagorean theorem is given by the same quantity. In MATLAB, the modulus of z is denoted by abs(z).

THEOREM For any complex number z, we have the result that:

$|z|^2 = \bar{z} z$ (6.9)

PROOF Using the above two definitions for the complex conjugate and the modulus, we can write:

$\bar{z} z = (a - jb)(a + jb) = a^2 + b^2 = |z|^2$

In-Class Exercise

Solve the problem analytically, and then use MATLAB to verify your answers.

Pb. 6.7 Let z = 3 + 4j. Find $|z|$, $\bar{z}$, and $\bar{z} z$. Verify the above theorem.

6.3.1 Division

Using the above definitions and theorem, we now want to define the inverse of a complex number with respect to the multiplication operation. We write the results in standard form.
$z^{-1} = \frac{1}{z} = \frac{1}{(a + jb)}\,\frac{(a - jb)}{(a - jb)} = \frac{a - jb}{a^2 + b^2} = \frac{\bar{z}}{|z|^2}$ (6.10)

from which we deduce that:

$\mathrm{Re}\left(\frac{1}{z}\right) = \frac{\mathrm{Re}(z)}{[\mathrm{Re}(z)]^2 + [\mathrm{Im}(z)]^2}$ (6.11)

and

$\mathrm{Im}\left(\frac{1}{z}\right) = \frac{-\mathrm{Im}(z)}{[\mathrm{Re}(z)]^2 + [\mathrm{Im}(z)]^2}$ (6.12)

To summarize the above results, and to help you build your syntax for the quantities defined in this section, edit the following script M-file and execute it:

    z=3+4*j
    zbar=conj(z)
    modulz=abs(z)
    modul2z=z*conj(z)
    invz=1/z
    reinvz=real(1/z)
    iminvz=imag(1/z)

In-Class Exercises

Pb. 6.8 Analytically and numerically, obtain in the standard form an expression for each of the following quantities:
a. $\frac{3 + 4j}{2 + 5j}$
b. $\frac{3 + j}{(1 - j)(3 + j)}$
c. $\frac{1 - 2j}{2 + 3j} - \frac{3 + j}{2j}$

Pb. 6.9 For any pair of complex numbers $z_1$ and $z_2$, show that:

$\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$
$\overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2$
$\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$
$\overline{(z_1 / z_2)} = \bar{z}_1 / \bar{z}_2$
$\bar{\bar{z}} = z$

6.4 Polar Form of Complex Numbers

If we use polar coordinates, we can write the real and imaginary parts of a complex number z = a + jb in terms of the modulus of z and the polar angle θ:

$a = r\cos(\theta) = |z|\cos(\theta)$ (6.13)
$b = r\sin(\theta) = |z|\sin(\theta)$ (6.14)

and the complex number z can then be written in polar form as:

$z = |z|\cos(\theta) + j|z|\sin(\theta) = |z|(\cos(\theta) + j\sin(\theta))$ (6.15)

The angle θ is called the argument of z and is usually evaluated in the interval $-\pi \le \theta \le \pi$. However, we still have the same complex number if we add to the value of θ an integer multiple of 2π.

$\theta = \arg(z)$, $\tan(\theta) = \frac{b}{a}$ (6.16)

From the above results, it is obvious that the argument of the complex conjugate of a complex number is equal to minus the argument of this complex number. In MATLAB, the convention for arg(z) is angle(z).

In-Class Exercise

Pb. 6.10 Find the modulus and argument for each of the following complex numbers:

$z_1 = 1 + 2j$; $z_2 = 2 + j$; $z_3 = 1 - 2j$; $z_4 = -1 + 2j$; $z_5 = -1 - 2j$

Plot these points. Can you detect any geometrical pattern? Generalize.
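The polar form of Eq. (6.15) can be sketched in MATLAB with abs and angle. The sample value z = –1 + j is our own choice, picked because it shows that tan(θ) = b/a alone does not fix the quadrant of θ: the plain inverse tangent of b/a gives –π/4 here, whereas angle(z) gives 3π/4.

```matlab
z = -1 + 1j;                              % sample point in the second quadrant (our choice)
r = abs(z);                               % modulus |z|
theta = angle(z);                         % argument of z, between -pi and pi
theta_naive = atan(imag(z)/real(z));      % inverse tangent of b/a only; wrong quadrant here
zpolar = r*(cos(theta) + 1j*sin(theta));  % rebuild z from its polar form, Eq. (6.15)
err = abs(z - zpolar)                     % should be at the level of round-off
```

When the real part of z is negative, the angle obtained from atan(b/a) must be shifted by π; using angle(z) avoids having to handle the quadrants by hand.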
The main advantage of writing complex numbers in polar form is that it makes the multiplication and division operations more transparent, and provides a simple geometric interpretation of these operations, as shown below.

6.4.1 New Insights into Multiplication and Division of Complex Numbers

Consider the two complex numbers $z_1$ and $z_2$ written in polar form:

$z_1 = |z_1|(\cos(\theta_1) + j\sin(\theta_1))$ (6.17)
$z_2 = |z_2|(\cos(\theta_2) + j\sin(\theta_2))$ (6.18)

Their product $z_1 z_2$ is given by:

$z_1 z_2 = |z_1||z_2|\left[(\cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2)) + j(\sin(\theta_1)\cos(\theta_2) + \cos(\theta_1)\sin(\theta_2))\right]$ (6.19)

But using the trigonometric identities for the sine and cosine of the sum of two angles:

$\cos(\theta_1 + \theta_2) = \cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2)$ (6.20)
$\sin(\theta_1 + \theta_2) = \sin(\theta_1)\cos(\theta_2) + \cos(\theta_1)\sin(\theta_2)$ (6.21)

the product of two complex numbers can then be written in the simpler form:

$z_1 z_2 = |z_1||z_2|[\cos(\theta_1 + \theta_2) + j\sin(\theta_1 + \theta_2)]$ (6.22)

That is, when multiplying two complex numbers, the modulus of the product is the product of the moduli, while the argument is the sum of the arguments:

$|z_1 z_2| = |z_1||z_2|$ (6.23)
$\arg(z_1 z_2) = \arg(z_1) + \arg(z_2)$ (6.24)

The above result can be generalized to the product of n complex numbers and the result is:

$|z_1 z_2 \ldots z_n| = |z_1||z_2| \ldots |z_n|$ (6.25)
$\arg(z_1 z_2 \ldots z_n) = \arg(z_1) + \arg(z_2) + \ldots + \arg(z_n)$ (6.26)

A particular form of this expression is the De Moivre theorem, which states that:

$(\cos(\theta) + j\sin(\theta))^n = \cos(n\theta) + j\sin(n\theta)$ (6.27)

The above results suggest that the polar form of a complex number may be written in terms of an exponential function, because of the additivity of the arguments upon multiplication. We revisit this issue later.

In-Class Exercises

Pb. 6.11 Show that $\frac{z_1}{z_2} = \frac{|z_1|}{|z_2|}[\cos(\theta_1 - \theta_2) + j\sin(\theta_1 - \theta_2)]$.

Pb. 6.12 Explain, using the above results, why multiplication of any complex number by j is equivalent to a rotation of the point representing this number in the complex plane by π/2.

Pb.
6.13 By what angle must we rotate the point P(3, 4) to transform it to the point P′(4, 3)?

Pb. 6.14 The points $z_1 = 1 + 2j$ and $z_2 = 2 + j$ are adjacent vertices of a regular hexagon. Find the vertex $z_3$ that is also a vertex of the same hexagon and that is adjacent to $z_2$ ($z_3 \ne z_1$).

Pb. 6.15 Show that the points A, B, C representing the complex numbers $z_A$, $z_B$, $z_C$ in the complex plane lie on the same straight line if and only if:

$\frac{z_A - z_C}{z_B - z_C}$ is real.

Pb. 6.16 Determine the coordinates of the P′ point obtained from the point P(2, 4) through a reflection around the line $y = \frac{x}{2} + 2$.

Pb. 6.17 Consider two points A and B representing, in the complex plane, the complex numbers $z_1$ and $1/\bar{z}_1$. Let P be any point on the circle of radius 1 and centered at the origin (the unit circle). Show that the ratio of the lengths of the line segments PA and PB is the same, regardless of the position of point P on the unit circle.

Pb. 6.18 Find the polar form of each of the following quantities:

$\frac{(1 + j)^{15}}{(1 - j)^9}$, $(-1 + j)(j + 2)$, $(1 + j + j^2 + j^3)^{99}$

6.4.2 Roots of Complex Numbers

Given the value of the complex number z, we are interested here in finding the solutions of the equation:

$v^n = z$ (6.28)

Let us write both the solutions and z in polar forms,

$v = \rho(\cos(\alpha) + j\sin(\alpha))$ (6.29)
$z = r(\cos(\theta) + j\sin(\theta))$ (6.30)

From the De Moivre theorem, the expression for $v^n = z$ can be written as:

$\rho^n(\cos(n\alpha) + j\sin(n\alpha)) = r(\cos(\theta) + j\sin(\theta))$ (6.31)

Comparing the moduli of both sides, we deduce by inspection that:

$\rho = \sqrt[n]{r}$ (6.32)

The treatment of the argument should be done with great care.
Recalling that two angles have the same cosine and sine if they are equal or differ from each other by an integer multiple of 2π, we can then deduce that:

$n\alpha = \theta + 2k\pi$, $k = 0, \pm 1, \pm 2, \pm 3, \ldots$ (6.33)

Therefore, the general expression for the roots is:

$z^{1/n} = r^{1/n}\left[\cos\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right) + j\sin\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right)\right]$ with $k = 0, 1, 2, \ldots, (n - 1)$ (6.34)

Note that the roots reproduce themselves outside the range $k = 0, 1, 2, \ldots, (n - 1)$.

In-Class Exercises

Pb. 6.19 Calculate the roots of the equation $z^5 - 32 = 0$, and plot them in the complex plane.
a. What geometric shape does the polygon with the solutions as vertices form?
b. What is the sum of these roots? (Derive your answer both algebraically and geometrically.)

6.4.3 The Function $y = e^{j\theta}$

As alluded to previously, the expression cos(θ) + j sin(θ) behaves very much as if it were an exponential: the arguments of the factors add to give the argument of the product. We therefore denote this quantity by:

$e^{j\theta} = \cos(\theta) + j\sin(\theta)$ (6.35)

PROOF Compute the Taylor expansion for both sides of the above equation. The series expansion for $e^{j\theta}$ is obtained by evaluating Taylor's formula at x = jθ, giving (see appendix):

$e^{j\theta} = \sum_{n=0}^{\infty} \frac{1}{n!}(j\theta)^n$ (6.36)

When this series expansion for $e^{j\theta}$ is written in terms of its even part and odd part, we have the result:

$e^{j\theta} = \sum_{m=0}^{\infty} \frac{1}{(2m)!}(j\theta)^{2m} + \sum_{m=0}^{\infty} \frac{1}{(2m+1)!}(j\theta)^{2m+1}$ (6.37)

However, since $j^2 = -1$, this last equation can also be written as:

$e^{j\theta} = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m)!}\theta^{2m} + j\sum_{m=0}^{\infty} \frac{(-1)^m}{(2m+1)!}\theta^{2m+1}$ (6.38)

which, by inspection, can be verified to be the sum of the Taylor expansions for the cosine and sine functions.

In this notation, the product of two complex numbers $z_1$ and $z_2$ is: $r_1 r_2 e^{j(\theta_1 + \theta_2)}$.
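The identity in Eq. (6.35) can also be checked numerically. The sketch below sums the first terms of the series in Eq. (6.36) and compares the result with cos(θ) + j sin(θ) and with MATLAB's exp; it then checks that multiplying two exponentials adds their arguments. The test angles and the truncation order M are arbitrary values of ours, not taken from the text.

```matlab
theta = 0.7;                                   % arbitrary test angle in radians
M = 20;                                        % highest power kept in the series
n = 0:M;
series = sum((1j*theta).^n ./ factorial(n));   % partial sum of Eq. (6.36)
euler  = cos(theta) + 1j*sin(theta);           % right-hand side of Eq. (6.35)
err_series = abs(series - euler)               % truncation plus round-off error
err_exp    = abs(exp(1j*theta) - euler)        % built-in exponential vs. Eq. (6.35)

theta1 = 0.4; theta2 = 1.1;                    % two more arbitrary angles
err_prod = abs(exp(1j*theta1)*exp(1j*theta2) - exp(1j*(theta1+theta2)))
```

All three errors should be at the level of machine round-off for angles of this size.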
It is then a simple matter to show that:

If: $z = r\exp(j\theta)$ (6.39)
Then: $\bar{z} = r\exp(-j\theta)$ (6.40)
and $z^{-1} = \frac{1}{r}\exp(-j\theta)$ (6.41)

from which we can deduce Euler's equations:

$\cos(\theta) = \frac{\exp(j\theta) + \exp(-j\theta)}{2}$ (6.42)

and

$\sin(\theta) = \frac{\exp(j\theta) - \exp(-j\theta)}{2j}$ (6.43)

Example 6.3
Use MATLAB to generate the graph of the unit circle in the complex plane.

Solution: Because all points on the unit circle are equidistant from the origin and their distance to the origin (their modulus) is equal to 1, we can generate the circle by plotting the N-roots of unity, taking a very large value for N. This can be implemented by executing the following script M-file.

    N=720;
    z=exp(j*2*pi*[1:N]./N);
    plot(z)
    axis square

In-Class Exercises

Pb. 6.20 Using the exponential form of the n-roots of unity, and the expression for the sum of a geometric series (given in the appendix), show that the sum of these roots is zero.

Pb. 6.21 Compute the following sums:
a. $1 + \cos(x) + \cos(2x) + \ldots + \cos(nx)$
b. $\sin(x) + \sin(2x) + \ldots + \sin(nx)$
c. $\cos(\alpha) + \cos(\alpha + \beta) + \ldots + \cos(\alpha + n\beta)$
d. $\sin(\alpha) + \sin(\alpha + \beta) + \ldots + \sin(\alpha + n\beta)$

Pb. 6.22 Verify numerically that for z = x + jy:

$\lim_{n\to\infty}\left(1 + \frac{z}{n}\right)^n = \exp(x)(\cos(y) + j\sin(y))$

For what values of y is this quantity pure imaginary?

Homework Problems

Pb. 6.23 Plot the curves determined by the following parametric representations:
a. $z = 1 - jt$, $0 \le t \le 2$
b. $z = t + jt^2$, $-\infty < t < \infty$
c. $z = 2(\cos(t) + j\sin(t))$, $\frac{\pi}{2} < t < \frac{3\pi}{2}$
d. $z = 3(t + j - j\exp(-jt))$, $0 <$ […]

[…]

• If $b^2 - 4ac > 0$, the two roots are distinct and real. Call these roots $\alpha_1$ and $\alpha_2$; the solution is then:

$y_{\text{homog.}} = c_1\exp(\alpha_1 t) + c_2\exp(\alpha_2 t)$ (6.51)

In many physical problems of interest, we desire solutions that are zero at infinity, that is, decay over a finite time. This requires that both $\alpha_1$ and $\alpha_2$ be negative; or if only one of them is negative, that the c coefficient of the exponentially increasing solution be zero. This class of solutions is called the overdamped class.
• If $b^2 - 4ac = 0$, the two roots are equal, and we call this root $\alpha_{\text{degen.}}$. The solution to the differential equation is

$y_{\text{homog.}} = (c_1 + c_2 t)\exp(\alpha_{\text{degen.}} t)$ (6.52)

The polynomial multiplying the exponential function is of degree one here because the degeneracy of the root is of degree two. This class of solutions is referred to as the critically damped class.

• If $b^2 - 4ac < 0$, the two roots are complex conjugates of each other, and their real part is negative for physically interesting cases. If we denote these roots by $s_{\pm} = -\alpha \pm j\beta$, the solutions to the homogeneous differential equation take the form:

$y_{\text{homog.}} = \exp(-\alpha t)(c_1\cos(\beta t) + c_2\sin(\beta t))$ (6.53)

This class of solutions is referred to as the underdamped class.

In-Class Exercises

Find and plot the transient solutions to the following homogeneous equations, using the indicated initial conditions:

Pb. 6.28 a = 1, b = 3, c = 2; y(t = 0) = 1, y′(t = 0) = –3/2
Pb. 6.29 a = 1, b = 2, c = 1; y(t = 0) = 1, y′(t = 0) = 2
Pb. 6.30 a = 1, b = 5, c = 6; y(t = 0) = 1, y′(t = 0) = 0

6.5.2 Steady-State Solutions

In this subsection, we find the particular solutions of the ODEs when the driving force is a single-term sinusoid. As pointed out previously, because of the superposition principle, it is also possible to write the steady-state solution for any combination of such inputs. This, combined with the Fourier series techniques (briefly discussed in Chapter 7), will also allow you to write the solution for any periodic function.

We discuss in detail the particular solution for the first-order and the second-order differential equations because these represent, as previously shown in Section 4.7, important cases in circuit analysis.
Example 6.5
Find the particular solution to the first-order differential equation:

$a\frac{dy}{dt} + by = A\cos(\omega t)$ (6.54)

Solution: We guess that the particular solution of this ODE is a sinusoid of the form:

$y_{\text{partic.}}(t) = B\cos(\omega t - \phi) = B[\cos(\phi)\cos(\omega t) + \sin(\phi)\sin(\omega t)] = B_c\cos(\omega t) + B_s\sin(\omega t)$ (6.55)

Our task now is to find $B_c$ and $B_s$ that would force Eq. (6.55) to be the solution of Eq. (6.54). Therefore, we substitute this trial solution in the differential equation and require that, separately, the coefficients of the sin(ωt) and cos(ωt) terms match on both sides of the resulting equation. These requirements are necessary for the trial solution to be valid at all times. The resulting conditions are

$B_s = \frac{a\omega}{b}B_c$, $B_c = \frac{Ab}{a^2\omega^2 + b^2}$ (6.56)

from which we can also deduce the polar form of the solution, giving:

$B^2 = \frac{A^2}{a^2\omega^2 + b^2}$, $\tan(\phi) = \frac{a\omega}{b}$ (6.57)

Example 6.6
Find the particular solution to the second-order differential equation:

$a\frac{d^2y}{dt^2} + b\frac{dy}{dt} + cy = A\cos(\omega t)$ (6.58)

Solution: Again, take the trial particular solution to be of the form:

$y_{\text{partic.}}(t) = B\cos(\omega t - \phi) = B[\cos(\phi)\cos(\omega t) + \sin(\phi)\sin(\omega t)] = B_c\cos(\omega t) + B_s\sin(\omega t)$ (6.59)

Repeating the same steps as in Example 6.5, we find:

$B_s = \frac{b\omega}{(c - a\omega^2)^2 + \omega^2 b^2}A$, $B_c = \frac{(c - a\omega^2)}{(c - a\omega^2)^2 + \omega^2 b^2}A$ (6.60)

$B^2 = \frac{A^2}{(c - a\omega^2)^2 + \omega^2 b^2}$, $\tan(\phi) = \frac{b\omega}{c - a\omega^2}$ (6.61)

6.5.3 Applications to Circuit Analysis

An important application of the above forms for the particular solutions is in circuit analysis with inductors, resistors, and capacitors as elements. We describe later a more efficient analytical method (phasor representation) for solving this kind of problem; however, we believe that it is important that you also become familiar with the present technique.
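Before applying these formulas to circuits, it may help to confirm the result of Example 6.5 numerically. The following sketch substitutes the trial solution back into the ODE and checks that the residual vanishes over one period. The coefficient values are arbitrary test values of ours, and the time derivative is written out analytically rather than computed by finite differences.

```matlab
a = 2; b = 3; A = 5; w = 4;            % arbitrary test values (our choice); w stands for omega
Bc = A*b/(a^2*w^2 + b^2);              % cosine coefficient, Eq. (6.56)
Bs = (a*w/b)*Bc;                       % sine coefficient, Eq. (6.56)
t  = linspace(0, 2*pi/w, 200);         % one period of the driving term
y  = Bc*cos(w*t) + Bs*sin(w*t);        % trial solution, Eq. (6.55)
dy = w*(-Bc*sin(w*t) + Bs*cos(w*t));   % its analytical time derivative
residual = max(abs(a*dy + b*y - A*cos(w*t)))   % should be at round-off level
B   = sqrt(Bc^2 + Bs^2);               % amplitude of the particular solution
phi = atan2(Bs, Bc);                   % phase of the particular solution
err_B = abs(B - A/sqrt(a^2*w^2 + b^2)) % compare the amplitude with Eq. (6.57)
```

The same check can be repeated for Example 6.6 by adding the second-derivative term and using the coefficients of that example.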
6.5.3.1 RC Circuit

Referring to the RC circuit shown in Figure 4.4, we derived the differential equation that the potential difference across the capacitor must satisfy; namely:

$RC\frac{dV_C}{dt} + V_C = V_0\cos(\omega t)$ (6.62)

This is a first-order differential equation, the particular solution of which is given in Example 6.5 if we identify the coefficients in the ODE as follows: a = RC, b = 1, A = $V_0$.

6.5.3.2 RLC Circuit

Referring to the circuit shown in Figure 4.5, the voltage across the capacitor satisfies the following ODE:

$LC\frac{d^2V_C}{dt^2} + RC\frac{dV_C}{dt} + V_C = V_0\cos(\omega t)$ (6.63)

This equation can be identified with that given in Example 6.6 if the ODE coefficients are specified as follows: a = LC, b = RC, c = 1, A = $V_0$.

In-Class Exercises

Pb. 6.31 This problem pertains to the RC circuit:
a. Write the output signal $V_C$ in the amplitude-phase representation.
b. Plot the gain response as a function of a normalized frequency that you will have to select. (The gain of a circuit is defined as the ratio of the amplitude of the output signal over the amplitude of the input signal.)
c. Determine the phase response of the system (i.e., the relative phase of the output signal to that of the input signal as a function of the frequency), also as a function of the normalized frequency.
d. Can this circuit be used as a filter (i.e., a device that lets through only a specified frequency band)? Specify the parameters of this band.

Pb. 6.32 This problem pertains to the RLC circuit:
a. Write the output signal $V_C$ in the amplitude-phase representation.
b. Defining the resonance frequency of this circuit as $\omega_0 = \frac{1}{\sqrt{LC}}$, find at which frequency the gain is maximum, and find the width of the gain curve.
c. Plot the gain curve and the phase curve for the following cases: $\frac{\omega_0 L}{R} = 0.1, 1, 10$.
d. Can you think of a possible application for this circuit?

Pb. 6.33 Can you think of a mechanical analog to the RLC circuit? Identify in that case the physical parameters in the corresponding ODE.
Pb. 6.34 Assume that the source potential in the RLC circuit has five frequency components at ω, 2ω, …, 5ω of equal amplitude. Plot the input and output potentials as a function of time over the interval 0 < ωt < 2π. Assume that $\omega = \omega_0 = \frac{1}{\sqrt{LC}}$ and $\frac{\omega_0 L}{R} = 1$.

6.6 Phasors

A technique in widespread use to compute the steady-state solutions of systems with sinusoidal input is the method of phasors. In this and the following two chapter sections, we define phasors, learn how to use them to add two or more signals having the same frequency, and how to find the particular solution of an ODE with a sinusoidal driving function.

There are two key ideas behind the phasor representation of a signal:

1. A real, sinusoidal time-varying signal may be represented by a complex time-varying signal.
2. This complex signal can be represented as the product of a complex number that is independent of time and a complex signal that is dependent on time.

Example 6.7
Decompose the signal V = A cos(ωt + φ) according to the above prescription.

Solution: This signal can, using the polar representation of complex numbers, also be written as:

$V = A\cos(\omega t + \phi) = \mathrm{Re}[A\exp(j(\omega t + \phi))] = \mathrm{Re}[Ae^{j\phi}e^{j\omega t}]$ (6.64)

where the phasor, denoted with a tilde on top of its corresponding signal symbol, is given by:

$\tilde{V} = Ae^{j\phi}$ (6.65)

(Warning: Do not confuse the tilde symbol that we use here, to indicate a phasor, with the overbar that denotes complex conjugation.)

Having achieved the above goal of separating the time-independent part of the complex number from its time-dependent part, we now learn how to manipulate these objects. A lot of insight can be immediately gained if we note that this form of the phasor is exactly the polar form of a complex number, with clear geometric interpretation for its magnitude and phase.

6.6.1 Phasor of Two Added Signals

The sum of two signals with common frequencies but different amplitudes and phases is

$V_{\text{tot.}} = A_{\text{tot.}}\cos(\omega t + \phi_{\text{tot.}}) = A_1\cos(\omega t + \phi_1) + A_2\cos(\omega t + \phi_2)$ (6.66)

To write the above result in phasor notation, note that the above sum can also be written as follows:

$V_{\text{tot.}} = \mathrm{Re}[A_1\exp(j(\omega t + \phi_1)) + A_2\exp(j(\omega t + \phi_2))] = \mathrm{Re}[(A_1 e^{j\phi_1} + A_2 e^{j\phi_2})e^{j\omega t}]$ (6.67)

and where

$\tilde{V}_{\text{tot.}} = A_{\text{tot.}}e^{j\phi_{\text{tot.}}} = \tilde{V}_1 + \tilde{V}_2$ (6.68)

Preparatory Exercise

Pb. 6.35 Write the analytical expression for $A_{\text{tot.}}$ and $\phi_{\text{tot.}}$ in Eq. (6.68) as functions of the amplitudes and phases of signals 1 and 2.

The above result can, of course, be generalized to the sum of many signals; specifically:

$V_{\text{tot.}} = A_{\text{tot.}}\cos(\omega t + \phi_{\text{tot.}}) = \sum_{n=1}^{N} A_n\cos(\omega t + \phi_n) = \mathrm{Re}\left[\sum_{n=1}^{N} A_n\exp(j\omega t + j\phi_n)\right] = \mathrm{Re}\left[e^{j\omega t}\sum_{n=1}^{N} A_n e^{j\phi_n}\right]$ (6.69)

and

$\tilde{V}_{\text{tot.}} = \sum_{n=1}^{N} \tilde{V}_n$ (6.70)

$\Rightarrow A_{\text{tot.}} = |\tilde{V}_{\text{tot.}}|$ (6.71)

$\phi_{\text{tot.}} = \arg(\tilde{V}_{\text{tot.}})$ (6.72)

That is, the resultant signal can be obtained through the simple operation of adding all the complex numbers (phasors) that represent each of the individual signals.

Example 6.8
Given ten signals, the phasor of each of the form $A_n e^{j\phi_n}$, where the amplitude and phase for each have the functional forms $A_n = \frac{1}{n}$ and $\phi_n = n^2$, write a MATLAB program to compute the resultant sum phasor.

Solution: Edit and execute the following script M-file:

    N=10;
    n=1:N;
    amplituden=1./n;
    phasen=n.^2;
    phasorn=amplituden.*exp(j.*phasen);
    phasortot=sum(phasorn);
    amplitudetot=abs(phasortot)
    phasetot=angle(phasortot)

In-Class Exercises

Pb. 6.36 Could you have estimated the answer to Example 6.8? Justify your reasoning.

Pb. 6.37 Show that if you add N signals with the same magnitude and frequency but with phases equally distributed over the [0, 2π] interval, the resultant phasor will be zero. (Hint: Remember the result for the sum of the roots of unity.)

Pb. 6.38 Show that the resultant signal from adding N signals having the same frequency has the largest amplitude when all the individual signals are in phase (this situation is referred to as maximal constructive interference).

Pb.
6.39 In this problem, we consider what happens if the frequency and amplitude of N different signals are still equal, but the different phases of the signals are randomly distributed over the [0, 2π] interval. Find the amplitude of the resultant signal if N = 1000, and compare it with the maximal constructive interference result. (Hint: Recall that the rand(1,N) command generates a 1-D array of N random numbers from the interval [0, 1].)

Pb. 6.40 The service provided to your home by the electric utility company is a two-phase service. This means that two 110-V/60-Hz hot lines plus a neutral (ground) line terminate in your panel. The hot lines are π out of phase.
a. Which signal would you use to drive your clock radio or your toaster?
b. What configuration will you use to drive your oven or your dryer?

Pb. 6.41 In most industrial environments, electric power is delivered in what is called a three-phase service. This consists of three 110-V/60-Hz lines with phases given by (0, 2π/3, 4π/3). What is the maximum voltage that you can obtain from any combination of two of these signals?

Pb. 6.42 Two- and three-phase power can be extended to N-phase power. In such a scheme, the N 110-V/60-Hz signals are given by:

$$V_n = 110\cos\left(120\pi t + \frac{2\pi n}{N}\right) \quad\text{and}\quad n = 0, 1, \ldots, N-1$$

While the sum of the voltages of all the lines is zero, the instantaneous power is not. Find the total power, assuming that the power from each line is proportional to the square of its time-dependent expression. (Hint: Use the double angle formula for the cosine function.)

$$p_n(t) = A^2\cos^2\left(\omega t + \frac{2\pi n}{N}\right) \quad\text{and}\quad P = \sum_{n=0}^{N-1} p_n$$

NOTE Another designation in use for a 110-V line is an rms value of 110, and not the value of the maximum amplitude as used above.

6.7 Interference and Diffraction of Electromagnetic Waves

6.7.1 The Electromagnetic Wave

Electromagnetic waves (em waves) are manifest as radio and TV broadcast signals, microwave communication signals, light of any color, X-rays, γ-rays, etc.
While these waves have different sources and methods of generation and require different kinds of detectors, they do share some general characteristics. They differ from each other only in the value of their frequencies. Indeed, it was one of the greatest intellectual achievements of the 19th century when Maxwell developed the system of equations, now named in his honor, to describe these waves' commonality. The most important of these properties is that they all travel in a vacuum at what is called the speed of light, c ($c = 3 \times 10^8$ m/s). The detailed study of these waves is the subject of many electrophysics subspecialties.

Electromagnetic waves are traveling waves. To understand their mathematical nature, consider a typical expression for the electric field associated with such waves:

$$E(z, t) = E_0\cos[kz - \omega t] \tag{6.73}$$

Here, $E_0$ is the amplitude of the wave, z is the spatial coordinate parallel to the direction of propagation of the wave, and k is the wavenumber.

Note that if we plot the field for a fixed time, for example, at t = 0, the field takes the shape of a sinusoidal function in space:

$$E(z, t = 0) = E_0\cos[kz] \tag{6.74}$$

From the above equation, one deduces that the wavenumber is $k = 2\pi/\lambda$, where λ is the wavelength of the wave (i.e., the length after which the wave shape reproduces itself).

Now let us look at the field that an observer, located at z = 0, would measure as a function of time. Then:

$$E(z = 0, t) = E_0\cos[\omega t] \tag{6.75}$$

The temporal period, that is, the time after which the wave shape reproduces itself, is $T = 2\pi/\omega$, where ω is the angular frequency of the wave.

Next, we want to relate the wavenumber to the angular frequency. To do that, consider an observer located at z = 0. The observer measures the field at t = 0 to be $E_0$. At time ∆t later, he should measure the same field, whether he uses Eq. (6.74) or (6.75), if he takes ∆z = c∆t, the distance that the wave crest has moved, where c is the speed of propagation of the wave.
From this, one deduces that the wavenumber and the angular frequency are related by kc = ω. This relation holds true for all electromagnetic waves; that is, as the frequency increases, the wavelength decreases.

If two traveling waves have the same amplitude and frequency, but one is traveling to the right while the other is traveling to the left, the result is a standing wave. The following program permits visualization of this standing wave.

   x=0:0.01:5;               % spatial grid
   a=1;                      % common amplitude of the two waves
   k=2*pi;                   % wavenumber
   w=2*pi;                   % angular frequency
   t=0:0.05:2;               % 41 time samples
   for m=1:41
      z1=a*cos(k*x-w*t(m));  % wave traveling to the right
      z2=a*cos(k*x+w*t(m));  % wave traveling to the left
      z=z1+z2;
      plot(x,z,'r');
      axis([0 5 -3 3]);
      M(m)=getframe;         % store the frame (used here in place of moviein)
   end
   movie(M,20)

Compare the spatio-temporal profile of the resultant to that for a single wave (i.e., set z2 = 0).

6.7.2 Addition of Two Electromagnetic Waves

In many practical instances, we are faced with the problem that two em waves originating from the same source, but following different spatial paths, meet again at a certain position. We want to find the total field at this position resulting from adding the two waves. We first note that, in the simplest case where the amplitudes of the two fields are kept equal, the effect of the different paths is only to dephase one of the waves from the other by an amount ∆φ = k∆l, where ∆l is the path difference. In effect, the total field is given by:

$$E_{tot.}(t) = E_0\cos[\omega t + \phi_1] + E_0\cos[\omega t + \phi_2] \tag{6.76}$$

where $\Delta\phi = \phi_1 - \phi_2$. This form is similar to those studied in the addition of two phasors and we will hence describe the problem in this language. The resultant phasor is

$$\tilde{E}_{tot.} = \tilde{E}_1 + \tilde{E}_2 \tag{6.77}$$

Preparatory Exercise

Pb. 6.43 Find the modulus and the argument of the resultant phasor given in Eq. (6.77) as a function of $E_0$ and ∆φ. From this expression, deduce the relation between the path difference for which the resultant phasor has maximum magnitude and that for which its magnitude is a minimum.
The curve describing the modulus square of the resultant phasor is what is commonly referred to as the interference pattern of two waves.

6.7.3 Generalization to N-waves

The addition of electromagnetic waves can be generalized to N-waves.

Example 6.9
Find the resultant field of equal-amplitude N-waves, each phase-shifted from the preceding by the same ∆φ.

Solution: The problem consists of computing an expression of the following kind:

$$\tilde{E}_{tot.} = \tilde{E}_1 + \tilde{E}_2 + \ldots + \tilde{E}_N = E_0\left(1 + e^{j\Delta\phi} + e^{j2\Delta\phi} + \ldots + e^{j(N-1)\Delta\phi}\right) \tag{6.78}$$

We have encountered such an expression previously. This sum is the sum of a geometric series. Computing this sum, the modulus square of the resultant phasor is

$$|\tilde{E}_{tot.}|^2 = E_0^2\,\frac{(1 - e^{jN\Delta\phi})}{(1 - e^{j\Delta\phi})}\,\frac{(1 - e^{-jN\Delta\phi})}{(1 - e^{-j\Delta\phi})} = E_0^2\,\frac{1 - \cos(N\Delta\phi)}{1 - \cos(\Delta\phi)} = E_0^2\,\frac{\sin^2(N\Delta\phi/2)}{\sin^2(\Delta\phi/2)} \tag{6.79}$$

Because the source is the same for each of the components, the modulus of each phasor is related to the source amplitude by $E_0 = E_{source}/N$. It is usually as a function of the source field that the results are expressed.

In-Class Exercises

Pb. 6.44 Plot the normalized square modulus of the resultant of N-waves as a function of ∆φ for different values of N (5, 50, and 500) over the interval –π < ∆φ < π.

Pb. 6.45 Find the dependence of the central peak value of Eq. (6.79) on N.

Pb. 6.46 Find the phase shift that corresponds to the position of the first minimum of Eq. (6.79).

Pb. 6.47 Find in Eq. (6.79) the relative height of the first maximum (i.e., the one following the central maximum) to that of the central maximum as a function of N.

Pb. 6.48 In an antenna array, the field of N aligned, equally spaced individual antennae excited by the same source is given by Eq. (6.78).
If the line connecting the point of observation to the center of the array makes an angle θ with the antenna array, the phase shift is $\Delta\phi = \frac{2\pi}{\lambda}\,d\cos(\theta)$, where λ is the wavelength of radiation and d is the spacing between two consecutive antennae. Draw the polar plot of the total intensity as a function of the angle θ for a spacing d = λ/2 for different values of N (2, 4, 6, and 10).

Pb. 6.49 Do the results of Pb. 6.48 suggest to you a strategy for designing a multi-antenna system with sharp directivity? Can you think of a method, short of moving the antennae around, that permits this array to sweep a range of angles with maximum directivity?

Pb. 6.50 The following program simulates a 25-element array-swept radar beam.

   th=0:0.01:pi;
   t=-0.5*sqrt(3):0.05*sqrt(3):0.5*sqrt(3);
   N=25;
   for m=1:21
      psi=(pi/4)*cos(th)+(pi/4)*t(m);            % phase argument at step m
      I=(1/N^2)*(sin(N*psi).^2)./(sin(psi).^2);  % element-wise squares
      polar(th,I);
      M(m)=getframe;                             % store the frame (used here in place of moviein)
   end
   movie(M,10)

a. Determine the range of the sweeping angle.
b. Can you think of an electronic method for implementing this task?

6.8 Solving ac Circuits with Phasors: The Impedance Method

In Section 6.5, we examined the conventional technique for solving some simple ac circuit problems. We suggested that using phasors may speed up the determination of the solution. This is the subject of this chapter section. We will treat, using this technique, the simple RLC circuit already solved through other means in order to give you a measure of the simplifications that can be achieved in circuit analysis through this technique. We then proceed to use the phasor technique to investigate another circuit configuration: the infinite LC ladder. The power of the phasor technique will also be put to use when we solve circuit problems that are topologically much more difficult than the one-loop category encountered thus far.
Essentially, a straightforward algebraic technique can give the voltages and currents for any circuit. We illustrate this latter case in Chapter 8.

Recalling that the voltage drops across resistors, inductors, and capacitors can all be expressed as functions of the current, its derivative, and its integral, our goal is to find a technique to replace these operators by simple algebraic operations. The key to achieving this goal is to realize that:

If:

$$I = I_0\cos(\omega t + \phi) = \mathrm{Re}[e^{j\omega t}(I_0e^{j\phi})] \tag{6.80}$$

Then:

$$\frac{dI}{dt} = -I_0\omega\sin(\omega t + \phi) = \mathrm{Re}[e^{j\omega t}(I_0(j\omega)e^{j\phi})] \tag{6.81}$$

and

$$\int I\,dt = \frac{I_0}{\omega}\sin(\omega t + \phi) = \mathrm{Re}\left[e^{j\omega t}\left(I_0\,\frac{1}{j\omega}\,e^{j\phi}\right)\right] \tag{6.82}$$

From Eqs. (4.25) to (4.27) and Eqs. (6.80) to (6.82), we can deduce that the phasors representing the voltages across resistors, inductors, and capacitors can be written as follows:

$$\tilde{V}_R = \tilde{I}R = \tilde{I}Z_R \tag{6.83}$$

$$\tilde{V}_L = \tilde{I}(j\omega L) = \tilde{I}Z_L \tag{6.84}$$

$$\tilde{V}_C = \frac{\tilde{I}}{(j\omega C)} = \tilde{I}Z_C \tag{6.85}$$

The terms multiplying the current phasor on the RHS of each of the above equations are called the resistor, the inductor, and the capacitor impedances, respectively.

6.8.1 RLC Circuit Phasor Analysis

Let us revisit this problem, first discussed in Section 4.7. Using Kirchhoff's voltage law and Eqs. (6.83) to (6.85), we can write the following relation between the phasor of the current and that of the source potential:

$$\tilde{V}_s = \tilde{I}R + \tilde{I}(j\omega L) + \frac{\tilde{I}}{(j\omega C)} = \tilde{I}\left[R + j\omega L + \frac{1}{j\omega C}\right] \tag{6.86}$$

That is, we can immediately compute the modulus and the argument of the phasor of the current if we know the values of the circuit components, the source voltage phasor, and the frequency of the source.

In-Class Exercises

Using the expression for the circuit resonance frequency $\omega_0$ previously introduced in Pb. 6.32, for the RLC circuit:

Pb. 6.51 Show that the system's total impedance can be written as:

$$Z = R + j\omega_0L\left(\nu - \frac{1}{\nu}\right), \quad\text{where}\quad \nu = \frac{\omega}{\omega_0} = \omega\sqrt{LC}$$

Pb. 6.52 Show that $Z(\nu) = \overline{Z(1/\nu)}$; and from this result, deduce the value of ν at which the impedance is entirely real.

Pb.
6.53 Find the magnitude and the phase of the total impedance.

Pb. 6.54 Selecting for the values of the circuit elements LC = 1, RC = 3, and ω = 1, compare the results that you obtain through the phasor analytical method with the numerical results for the voltage across the capacitor in an RLC circuit that you found while solving Eq. (4.36).

The Transfer Function

As you would have discovered solving Pb. 6.54, the ratio of the phasor of the potential difference across the capacitor to that of the ac source can be directly calculated once the value of the current phasor is known. This ratio is called the Transfer Function for this circuit if the voltage across the capacitor is taken as the output of this circuit. It is obtained by combining Eqs. (6.85) and (6.86) and is given by:

$$\frac{\tilde{V}_c}{\tilde{V}_s} = \frac{1}{(j\omega RC - \omega^2LC + 1)} = H(\omega) \tag{6.87}$$

The Transfer Function concept can be generalized to any ac circuit. It refers to the ratio of the output voltage phasor to the input voltage phasor. It incorporates all the relevant information on the details of the circuit. It is the standard form for representing the response of a circuit to a single sinusoidal function input.

Homework Problem

Pb. 6.55 Plot the magnitude and the phase of the Transfer Function given in Eq. (6.87) as a function of ω, for LC = 1, RC = 3.

6.8.2 The Infinite LC Ladder

The LC ladder consists of an infinite repetition of the basic elements shown in Figure 6.2.

FIGURE 6.2 The circuit of an infinite LC ladder. (Labels: node voltages $V_n$, $V_{n+1}$; series impedances $Z_1$ carrying the currents $I_n$, $I_{n+1}$; shunt impedances $Z_2$, with the current $(I_n - I_{n+1})$ in the shunt branch.)

Using the definition of impedances, the phasors of the n and (n + 1) voltages and currents are related through:

$$\tilde{V}_n - \tilde{V}_{n+1} = Z_1\tilde{I}_n \tag{6.88}$$

$$\tilde{V}_{n+1} = (\tilde{I}_n - \tilde{I}_{n+1})Z_2 \tag{6.89}$$

From Eq. (6.88), we deduce the following expressions for $\tilde{I}_n$ and $\tilde{I}_{n+1}$:

$$\tilde{I}_n = \frac{\tilde{V}_n - \tilde{V}_{n+1}}{Z_1} \tag{6.90}$$

$$\tilde{I}_{n+1} = \frac{\tilde{V}_{n+1} - \tilde{V}_{n+2}}{Z_1} \tag{6.91}$$

Substituting these values for the currents in Eq.
(6.89), we deduce a second-order difference equation for the voltage phasor:

$$\tilde{V}_{n+2} - \left(\frac{Z_1}{Z_2} + 2\right)\tilde{V}_{n+1} + \tilde{V}_n = 0 \tag{6.92}$$

The solution of this difference equation can be directly obtained by the techniques discussed in Chapter 2 for obtaining solutions of homogeneous difference equations. The physically meaningful solution is given by:

$$\lambda = 1 + \frac{1}{2}\frac{Z_1}{Z_2} - \left(\frac{Z_1^2}{4Z_2^2} + \frac{Z_1}{Z_2}\right)^{1/2} \tag{6.93}$$

and the voltage phasor at node n is then given by:

$$\tilde{V}_n = \tilde{V}_s\lambda^n \tag{6.94}$$

We consider the model where $Z_1 = j\omega L$ and $Z_2 = 1/(j\omega C)$, respectively, for an inductor and a capacitor. The expression for λ then takes the following form:

$$\lambda = 1 - \frac{\upsilon^2}{2} - j\left(\upsilon^2 - \frac{\upsilon^4}{4}\right)^{1/2} \tag{6.95}$$

where the normalized frequency is defined by $\upsilon = \omega/\omega_0 = \omega\sqrt{LC}$.

We plot in Figure 6.3 the magnitude and the phase of the root λ as a function of the normalized frequency. As can be directly observed from an examination of Figure 6.3, the magnitude of λ is equal to 1 (i.e., the magnitude of $\tilde{V}_n$, normalized to that of the source, is also 1) for $\upsilon < \upsilon_{cutoff} = 2$, while it drops precipitously after that, with the dropoff in the potential much steeper with increasing node number. Physically, this represents extremely short penetration through the ladder for signals with frequencies larger than the cutoff frequency. Furthermore, note that for $\upsilon < \upsilon_{cutoff} = 2$, the phase of $\tilde{V}_n$ increases linearly with the index n; and because it is negative, it corresponds to a delay in the signal as it propagates down the ladder, which corresponds to a finite velocity of propagation for the signal.

Before we leave this ladder circuit, it is worth addressing a practical concern. While it is impossible to realize an infinite-dimensional ladder, the above conclusions do not change by much if we replace the infinite ladder by a finite ladder and terminate it after a while by a resistor with resistance equal to $\sqrt{L/C}$.

In-Class Exercise

Pb. 6.56 Repeat the analysis given above for the LC ladder circuit, if instead we were to:
a.
Interchange the positions of the inductors and the capacitors in the ladder circuit. Based on this result and the above LC result, can you design a bandpass filter with a flat response?
b. Replace the inductor elements by resistors. In particular, compute the input impedance of this circuit.

FIGURE 6.3 The magnitude (left panel) and the phase (right panel) of the characteristic root of the infinite LC ladder.

6.9 Transfer Function for a Difference Equation with Constant Coefficients*

In Section 6.8.1, we found the Transfer Function for what essentially was a simple ODE. In this section, we generalize the technique to find the Transfer Function of a difference equation with constant coefficients. The form of the difference equation is given by:

$$y(k) = b_0u(k) + b_1u(k-1) + \ldots + b_mu(k-m) - a_1y(k-1) - a_2y(k-2) - \ldots - a_ny(k-n) \tag{6.96}$$

Along the same route that we followed in the phasor treatment of ODEs, assume that both the input and output are of the form:

$$u(k) = Ue^{j\Omega k} \quad\text{and}\quad y(k) = Ye^{j\Omega k} \tag{6.97}$$

where Ω is a normalized frequency; typically, in electrical engineering applications, the real frequency multiplied by the sampling time. Replacing these expressions in the difference equation, we obtain:

$$\frac{Y}{U} = \frac{\sum_{l=0}^{m} b_le^{-j\Omega l}}{1 + \sum_{l=1}^{n} a_le^{-j\Omega l}} = \frac{\sum_{l=0}^{m} b_lz^{-l}}{1 + \sum_{l=1}^{n} a_lz^{-l}} \equiv H(z) \tag{6.98}$$

where, by convention, $z = e^{j\Omega}$.

Example 6.10
Find the Transfer Function of the following difference equation:

$$y(k) = u(k) + \frac{2}{3}y(k-1) - \frac{1}{3}y(k-2) \tag{6.99}$$

Solution: By direct substitution into Eq. (6.98), we find:

$$H(z) = \frac{1}{1 - \frac{2}{3}z^{-1} + \frac{1}{3}z^{-2}} = \frac{z^2}{z^2 - \frac{2}{3}z + \frac{1}{3}} \tag{6.100}$$

It is to be noted that the Transfer Function is a ratio of two polynomials. The zeros of the numerator are called the zeros of the Transfer Function, while the zeros of the denominator are called its poles.
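As a quick numerical illustration of these definitions, the zeros and poles of the Transfer Function of Example 6.10 can be computed with the MATLAB command roots. This is only a minimal sketch; the variable names num, den, zerosH, polesH, and polemag are chosen here for illustration and do not appear in the text.

```matlab
% Sketch: zeros and poles of H(z) = z^2/(z^2 - (2/3)z + 1/3), Eq. (6.100).
% All variable names below are illustrative assumptions.
num = [1 0 0];          % coefficients of the numerator z^2
den = [1 -2/3 1/3];     % coefficients of the denominator
zerosH = roots(num)     % zeros of H(z): a double zero at z = 0
polesH = roots(den)     % poles of H(z): a complex-conjugate pair
polemag = abs(polesH)   % moduli of the two poles
```

For this example, the two poles have the same modulus, $\sqrt{1/3}$, so both lie inside the unit circle of the z-plane.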
If the coefficients of the difference equation are real, then by the Fundamental Theorem of Algebra, the zeros and the poles are either real or are pairs of complex conjugate numbers.

The Transfer Function fully describes any linear system. As will be shown in linear systems courses, the inverse z-transform of the Transfer Function gives the weights for the solution of the difference equation, while the values of the poles of the Transfer Function determine what are called the system modes of the solution. These are the modes intrinsic to the circuit, and they do not depend on the specific form of the input function.

Furthermore, it is worth noting that the study of recursive filters, the backbone of digital signal processing, can be simply reduced to a study of the Transfer Function under different configurations. In Applications 2 and 3 that follow, we briefly illustrate two particular digital filters in wide use.

Application 1
Using the Transfer Function formalism, we want to estimate the accuracy of the three integrating schemes discussed in Chapter 4. We want to compare the Transfer Function of each of those algorithms to that of the exact result, obtained upon integrating exactly the function $e^{j\omega t}$.

The exact result for integrating the function $e^{j\omega t}$ is, of course, $\frac{e^{j\omega t}}{j\omega}$, thus giving for the exact Transfer Function for integration the expression:

$$H_{exact} = \frac{1}{j\omega} \tag{6.101}$$

Before proceeding with the computation of the transfer function for the different numerical schemes, let us pause for a moment and consider what we are actually doing when we numerically integrate a function. We go through the following steps:
1. We discretize the time interval over which we integrate; that is, we define the sampling time ∆t, such that the abscissae of the discrete points are given by k(∆t), where k is an integer.
2.
We write a difference equation for the integral relating its values at the discrete points with its values and that of the integrand at discrete points with equal or smaller indices.
3. We obtain the value of the integral by iterating the defining difference equation.

The test function used for the estimation of the integration methods' accuracy is written at the discrete points as:

$$y(k) = e^{jk\omega(\Delta t)} \tag{6.102}$$

The difference equations associated with each of the numerical integration schemes are:

$$I_T(k+1) = I_T(k) + \frac{\Delta t}{2}\left(y(k+1) + y(k)\right) \tag{6.103}$$

$$I_{MP}(k+1) = I_{MP}(k) + \Delta t\,y(k + 1/2) \tag{6.104}$$

$$I_S(k+1) = I_S(k-1) + \frac{\Delta t}{3}\left(y(k+1) + 4y(k) + y(k-1)\right) \tag{6.105}$$

leading to the following expressions for the respective Transfer Functions:

$$H_T = \frac{\Delta t}{2}\,\frac{e^{j\omega(\Delta t)} + 1}{e^{j\omega(\Delta t)} - 1} \tag{6.106}$$

$$H_{MP} = \Delta t\,\frac{e^{j\omega(\Delta t)/2}}{e^{j\omega(\Delta t)} - 1} \tag{6.107}$$

$$H_S = \frac{\Delta t}{3}\,\frac{\left(e^{j\omega(\Delta t)} + 4 + e^{-j\omega(\Delta t)}\right)}{e^{j\omega(\Delta t)} - e^{-j\omega(\Delta t)}} \tag{6.108}$$

The measures of accuracy of the integration schemes are the ratios of these Transfer Functions to the exact expression. These are given, respectively, by:

$$R_T = (\omega\Delta t/2)\,\frac{\cos(\omega\Delta t/2)}{\sin(\omega\Delta t/2)} \tag{6.109}$$

$$R_{MP} = \frac{(\omega\Delta t/2)}{\sin(\omega\Delta t/2)} \tag{6.110}$$

$$R_S = \frac{\omega\Delta t}{3}\,\frac{\cos(\omega\Delta t) + 2}{\sin(\omega\Delta t)} \tag{6.111}$$

Table 6.1 gives the value of this ratio as a function of the number of sampling points, per oscillation period, selected in implementing the different integration subroutines:

TABLE 6.1 Accuracy of the Different Elementary Numerical Integrating Methods

   Number of Sampling Points in a Period    R_T       R_MP      R_S
   100                                      0.9997    1.0002    1.0000
   50                                       0.9986    1.0007    1.0000
   40                                       0.9978    1.0011    1.0000
   30                                       0.9961    1.0020    1.0000
   20                                       0.9909    1.0046    1.0001
   10                                       0.9591    1.0206    1.0014
   5                                        0.7854    1.1107    1.0472

As can be noted, the error is less than 1% for any of the discussed methods as long as the number of points in one oscillation period is larger than 20, although the degree of accuracy is best, as we expected based on geometrical arguments, for Simpson's rule.
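The entries of Table 6.1 can be checked by evaluating Eqs. (6.109) to (6.111) directly. The sketch below makes one assumption that the text does not state explicitly: that Npts sampling points in a period span Npts − 1 intervals, so that ω∆t = 2π/(Npts − 1). Under this convention, the five-point entries come out as π/4 for the trapezoid rule and π/3 for Simpson's rule, in agreement with the last row of the table. The variable names Npts, x, RT, RMP, and RS are illustrative.

```matlab
% Sketch: evaluate the accuracy ratios of Eqs. (6.109)-(6.111).
% Assumption (not stated in the text): Npts points per period span
% Npts-1 intervals, i.e., omega*dt = 2*pi/(Npts-1).
Npts = [100 50 40 30 20 10 5];       % sampling points in one period
x = 2*pi./(Npts-1);                  % x = omega*dt
RT  = (x/2).*cos(x/2)./sin(x/2);     % trapezoid rule, Eq. (6.109)
RMP = (x/2)./sin(x/2);               % midpoint rule, Eq. (6.110)
RS  = (x/3).*(cos(x)+2)./sin(x);     % Simpson's rule, Eq. (6.111)
disp([Npts; RT; RMP; RS]')           % one row per value of Npts
```

Each displayed row lists the number of points followed by the three ratios, in the same order as the columns of Table 6.1.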
In a particular application, where a finite number of frequencies are simultaneously present, the choice of (∆t) for achieving a specified level of accuracy in the integration subroutine should ideally be determined using the shortest of the periods present in the integrand.

Application 2
As mentioned earlier, the Transfer Function technique is the prime tool for the analysis and design of digital filters. In this and the following application, we illustrate its use in the design of a low-pass digital filter and a digital prototype bandpass filter.

The low-pass filter, as its name indicates, filters out the high-frequency components from a signal. Its defining difference equation is given by:

$$y(k) = (1 - a)y(k-1) + au(k) \tag{6.112}$$

giving for its Transfer Function the expression:

$$H(z) = \frac{a}{1 - (1 - a)z^{-1}} \tag{6.113}$$

Written as a function of the normalized frequency, it is given by:

$$H(e^{j\Omega}) = \frac{ae^{j\Omega}}{e^{j\Omega} - (1 - a)} \tag{6.114}$$

We plot, in Figure 6.4, the magnitude and the phase of the transfer function as a function of the normalized frequency for the value of a = 0.1. Note that the gain is equal to 1 for Ω = 0, and decreases monotonically thereafter.

To appreciate the operation of this filter, consider a sinusoidal signal that has been contaminated by the addition of noise. We can simulate the noise by adding to the original signal an array consisting of random numbers with maximum amplitude equal to a fraction of that of the original signal (30% in the script that follows). The top panel of Figure 6.5 represents the contaminated signal. If we pass this signal through a low-pass filter, the lower panel of Figure 6.5 shows the filtered output signal. As can be observed, the noise, which is a high-frequency signal, has been filtered out and the signal has been almost restored to the shape it had before the noise was added.
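Separately from the noise simulation, the statement made above about the gain of the low-pass filter can be checked numerically from Eq. (6.114). This is a minimal sketch for a = 0.1; the names W, z, H, gain, and phase are illustrative and are not taken from the text.

```matlab
% Sketch: gain and phase of the low-pass filter of Eq. (6.114), a = 0.1.
% All variable names below are illustrative assumptions.
a = 0.1;
W = linspace(0,pi,500);      % normalized frequency Omega
z = exp(1j*W);               % z = e^{j*Omega}
H = a*z./(z-(1-a));          % Transfer Function H(e^{j*Omega})
gain = abs(H);               % gain response
phase = angle(H);            % phase response
gainends = [gain(1) gain(end)]   % gain at Omega = 0 and at Omega = pi
```

The gain starts at 1 for Ω = 0 and falls to a/(2 − a) at Ω = π; the commands plot(W,gain) and plot(W,phase) then trace curves of the kind described for Figure 6.4.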
The following script M-file simulates the above operations:

   t=linspace(0,4*pi,300);
   N=length(t);
   s=sin(t);                 % original signal
   n=0.3*rand(1,N);          % noise
   u=s+n;                    % contaminated signal
   y=zeros(1,N);             % preallocate the filter output
   y(1)=u(1);
   for k=2:N
      y(k)=+0.9*y(k-1)+0.1*u(k);   % low-pass filter with a = 0.1
   end
   subplot(2,1,1)
   plot(t,u)
   axis([0 4*pi -1.5 1.5]);
   title('Noisy Signal')
   subplot(2,1,2)
   plot(t,y)
   title('Filtered Signal')
   axis([0 4*pi -1.5 1.5]);

FIGURE 6.4 The gain (top panel) and phase (bottom panel) responses of a low-pass filter as a function of the frequency.

FIGURE 6.5 The action of a low-pass filter. Top panel: Profile of the signal contaminated by noise. Bottom panel: Profile of the filtered signal.

Application 3
The digital prototype bandpass filter ideally filters out from a signal all frequencies lower than a given frequency and higher than another frequency. In practice, the cutoffs are not so sharp and the lower and higher cutoff frequencies of the bandpass are defined as those at which the gain curve (i.e., the magnitude of the Transfer Function as a function of the frequency) is at $(1/\sqrt{2})$ of its maximum value.

The difference equation that describes this prototype filter is

$$y(k) = \left\{(1 - r)\sqrt{1 - 2r\cos(2\Omega_0) + r^2}\right\}u(k) + 2r\cos(\Omega_0)y(k-1) - r^2y(k-2) \tag{6.115}$$

where $\Omega_0$ is the normalized frequency with maximum gain and r is a number close to 1.

The purpose of the following analysis is, given the lower and higher cutoff normalized frequencies, to find the quantities $\Omega_0$ and r in the above difference equation.
The Transfer Function for the above difference equation is given by:

$$H(z) = \frac{g_0z^2}{z^2 - 2r\cos(\Omega_0)z + r^2} \tag{6.116}$$

where

$$g_0 = (1 - r)\sqrt{1 - 2r\cos(2\Omega_0) + r^2} \tag{6.117}$$

and $z = e^{j\Omega}$.

The gain of this filter, or equivalently the magnitude of the Transfer Function, is

$$|H(e^{j\Omega})| = (1 - r)\sqrt{\frac{1 - 2r\cos(2\Omega_0) + r^2}{(1 + Ar + Br^2 + Ar^3 + r^4)}} \tag{6.118}$$

where

$$A = -4\cos(\Omega)\cos(\Omega_0) \tag{6.119}$$

$$B = 4\cos^2(\Omega) + 4\cos^2(\Omega_0) - 2 \tag{6.120}$$

The lower and upper cutoff frequencies are defined, as previously noted, by the condition:

$$|H(e^{j\Omega_{(1,2)}})| = \frac{1}{\sqrt{2}} \tag{6.121}$$

Substituting condition (6.121) in the gain expression (6.118) leads to the conclusion that the cutoff frequencies are obtained from the solutions of the following quadratic equation:

$$\cos^2(\Omega) - \frac{(1 + r^2)\cos(\Omega_0)}{r}\cos(\Omega) + \frac{(1 - r)^2}{4r^2}\left[4r\cos(2\Omega_0) - (1 - r)^2\right] + \cos^2(\Omega_0) = 0 \tag{6.122}$$

Adding and subtracting the roots of this equation, we deduce, after some straightforward algebra, the following determining equations for $\Omega_0$ and r:

1. r is the root in the interval [0, 1] of the following eighth-degree polynomial:

$$r^8 + (a - b)r^6 - 8ar^5 + (14a - 2b - 2)r^4 - 8ar^3 + (a - b)r^2 + 1 = 0 \tag{6.123}$$

where

$$a = (\cos(\Omega_1) + \cos(\Omega_2))^2 \tag{6.124}$$

$$b = (\cos(\Omega_1) - \cos(\Omega_2))^2 \tag{6.125}$$

2. $\Omega_0$ is given by:

$$\Omega_0 = \cos^{-1}\left[\frac{ra^{1/2}}{1 + r^2}\right] \tag{6.126}$$

Example 6.12
Write a program to determine the parameters r and $\Omega_0$ of a prototype bandpass filter if the cutoff frequencies and the sampling time are given.
Solution: The following script M-file implements the above target:

   f1= ;    %enter the lower cutoff
   f2= ;    %enter the upper cutoff
   tau= ;   %enter the sampling time
   w1=2*pi*f1*tau;           % normalized lower cutoff frequency
   w2=2*pi*f2*tau;           % normalized upper cutoff frequency
   a=(cos(w1)+cos(w2))^2;
   b=(cos(w1)-cos(w2))^2;
   p=[1 0 a-b -8*a 14*a-2*b-2 -8*a a-b 0 1];   % coefficients of Eq. (6.123)
   rr=roots(p);
   % keep the real root in (0,1); a small tolerance guards against round-off
   r=real(rr(find(real(rr)>0 & real(rr)<1 & abs(imag(rr))<1e-10)))
   w0=acos((r*a^(1/2))/(1+r^2));
   f0=(1/(2*pi*tau))*w0

In Figure 6.6, we show the gain and phase response for this filter, for the case that the cutoff frequencies are chosen to be 1000 Hz and 1200 Hz, and the sampling time is 10 µs.

To test the action of this filter, we input into it a signal that consists of a mixture of a sinusoid having a frequency at the frequency of the maximum gain of this filter and a number of its harmonics; for example,

$$u(t) = \sin(2\pi f_0t) + 0.5\sin(4\pi f_0t) + 0.6\sin(6\pi f_0t) \tag{6.127}$$

We show in Figure 6.7 the input and the filtered signals. As expected from an analysis of the gain curve, only the fundamental frequency signal has survived. The amplitude of the filtered signal settles to that of the fundamental frequency signal following a short transient period.

FIGURE 6.6 The transfer function of a prototype bandpass filter. Top panel: Plot of the gain curve as a function of the normalized frequency. Bottom panel: Plot of the phase curve as a function of the normalized frequency.

FIGURE 6.7 The filtering action of a prototype bandpass filter. Top panel: Input signal consists of a combination of a fundamental frequency signal (equal to the frequency corresponding to the filter maximum gain) and two of its harmonics. Bottom panel: Filtered signal.

NOTE Before leaving this topic, it is worth noting that the above prototype bandpass filter can have sharper cutoff features (i.e., decreasing the value of the gain curve for frequencies below the lower cutoff and higher than the upper cutoff) through having many of these prototype filters in cascade.
This will be a topic of study in future linear system or filter design courses.

In-Class Exercises

Pb. 6.59 Work out the missing algebraic steps in the derivation leading to Eqs. (6.123) through (6.126).

Pb. 6.60 Given the following values for the lower and upper cutoff frequencies and the sampling time:

$$f_1 = 200\ \text{Hz}; \quad f_2 = 400\ \text{Hz}; \quad \tau = 10^{-5}\ \text{s}$$

find $f_0$ and plot the gain curve as a function of the normalized frequency for the bandpass prototype filter.

6.10 MATLAB Commands Review

   abs     Computes the modulus of a complex number.
   angle   Computes the argument of a complex number.
   conj    Computes the complex conjugate of a complex number.
   find    Finds the locations of elements in an array that satisfy certain specified conditions.
   imag    Computes the imaginary part of a complex number.
   real    Computes the real part of a complex number.

7 Vectors

7.1 Vectors in Two Dimensions (2-D)

A vector in 2-D is defined by its length and the angle it makes with a reference axis (usually the x-axis). This vector is represented graphically by an arrow. The tail of the arrow is called the initial point of the vector and the tip of the arrow is the terminal point. Two vectors are equal when both their length and angle with a reference axis are equal.

7.1.1 Addition

The sum of two vectors $\vec{u} + \vec{v} = \vec{w}$ is a vector constructed graphically as follows. At the tip of the first vector, draw a vector equal to the second vector, such that its tail coincides with the tip of the first vector. The resultant vector has as its tail that of the first vector, and as its tip, the tip of the just-drawn second vector (the Parallelogram Rule) (see Figure 7.1).

The negative of a vector is that vector whose tip and tail have been exchanged from those of the vector. This leads to the conclusion that the difference of two vectors is the other diagonal in the parallelogram (Figure 7.2).
7.1.2 Multiplication of a Vector by a Real Number

If we multiply a vector $\vec{v}$ by a real number k, the result is a vector whose length is |k| times the length of $\vec{v}$, and whose direction is that of $\vec{v}$ if k is positive, and opposite if k is negative.

7.1.3 Cartesian Representation

It is most convenient for a vector to be described by its projections on the x-axis and on the y-axis, respectively; these are denoted by $(v_1, v_2)$ or $(v_x, v_y)$. In this representation:

$$\vec{u} = (u_1, u_2) = (u_1)\hat{e}_1 + (u_2)\hat{e}_2 \tag{7.1}$$

where $\hat{e}_1$ and $\hat{e}_2$ are the unit vectors (length is 1) parallel to the x-axis and y-axis, respectively.

FIGURE 7.1 Sum of two vectors.

FIGURE 7.2 Difference of two vectors.

In terms of this representation, we can write the zero vector, the sum of two vectors, and the multiplication of a vector by a real number as follows:

$$\vec{0} = (0, 0) = 0\hat{e}_1 + 0\hat{e}_2 \tag{7.2}$$

$$\vec{u} + \vec{v} = \vec{w} = (u_1 + v_1, u_2 + v_2) = (u_1 + v_1)\hat{e}_1 + (u_2 + v_2)\hat{e}_2 \tag{7.3}$$

$$k\vec{u} = (ku_1, ku_2) = (ku_1)\hat{e}_1 + (ku_2)\hat{e}_2 \tag{7.4}$$

Preparatory Exercise

Pb. 7.1 Using the above definitions and properties, prove the following identities:

$$\vec{u} + \vec{v} = \vec{v} + \vec{u}$$
$$(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$$
$$\vec{u} + \vec{0} = \vec{0} + \vec{u} = \vec{u}$$
$$\vec{u} + (-\vec{u}) = \vec{0}$$
$$k(l\vec{u}) = (kl)\vec{u}$$
$$k(\vec{u} + \vec{v}) = k\vec{u} + k\vec{v}$$
$$(k + l)\vec{u} = k\vec{u} + l\vec{u}$$

The norm of a vector is the length of this vector. Using the Pythagorean theorem, its square is:

$$\|\vec{u}\|^2 = u_1^2 + u_2^2 \tag{7.5}$$

and therefore the unit vector in the $\vec{u}$ direction, denoted by $\hat{e}_u$, is given by:

$$\hat{e}_u = \frac{1}{\sqrt{u_1^2 + u_2^2}}(u_1, u_2) \tag{7.6}$$

All of the above can be generalized to 3-D, or for that matter to n-dimensions. For example:

$$\hat{e}_u = \frac{1}{\sqrt{u_1^2 + u_2^2 + \ldots + u_n^2}}(u_1, u_2, \ldots, u_n) \tag{7.7}$$

7.1.4 MATLAB Representation of the Above Results

MATLAB distinguishes between two kinds of vectors: the column vector and the row vector.
As long as the components of the vectors are all real, the difference between the two is in the structure of the array. In the column vector case, the array representation is vertical and in the row vector case, the array representation is horizontal. This distinction is made for the purpose of including in a consistent structure the formulation of the dot product and the definition of matrix multiplication.

Example 7.1
Type and execute the following commands, while interpreting the output at each step:

    V=[1 3 5 7]
    W=[1;3;5;7]
    V'
    U=3*V
    Z=U+V
    Y=V+W    %you cannot add a row vector and a column vector

You would have observed that:

1. The difference in the representation of the column and row vectors is in the manner their components are separated inside the square brackets.
2. The single quotation mark following a vector with real components changes that vector from being a column vector to a row vector, and vice versa.
3. Multiplying a vector by a scalar simply multiplies each component of this vector by this scalar.
4. You can add two vectors of the same kind, and the components are added pair by pair.
5. You cannot add two vectors of different kinds; the computer will give you an error message alerting you that you are adding two quantities of different dimensions. (MATLAB releases from R2016b onward instead expand the two operands implicitly and return a 4 × 4 array; in either case, the row and column vectors are not added component by component.)

The MATLAB command for obtaining the norm of a vector is norm. Using this notation, it is a simple matter to define the unit vector in the same direction as a given vector.

Example 7.2
Find the length of the vector u = [1 5 3 2] and the unit vector parallel to it.

    u=[1 5 3 2]
    lengthu=norm(u)            %length of vector u
    unitu=u/(norm(u))          %unit vector parallel to u
    lengthunitu=norm(unitu)    %verify length of unit vector

FIGURE 7.3 The geometry of the generalized Pythagorean theorem.
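As a quick cross-check of Example 7.2, the following sketch compares the built-in norm command with a direct evaluation of the square root of the sum of squares that appears in Eq. (7.7). The variable names lengthFormula, unitFormula, and tol, and the tolerance value itself, are illustrative assumptions and not part of the original example.

```matlab
% Sketch: compare norm(u) with the explicit formula of Eq. (7.7).
u = [1 5 3 2];
lengthFormula = sqrt(sum(u.^2));   % sqrt(u1^2 + u2^2 + ... + un^2)
unitFormula = u/lengthFormula;     % unit vector parallel to u
tol = 1e-12;                       % assumed tolerance for round-off error
agreeLength = abs(lengthFormula - norm(u)) < tol
agreeUnit = abs(norm(unitFormula) - 1) < tol
```

Both comparisons should return logical true, which indicates that, for real components, norm evaluates the Euclidean length of Eq. (7.7).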
7.2 Dot (or Scalar) Product

If the angle between the vectors $\vec{u}$ and $\vec{v}$ is $\theta$, then the dot product of the two vectors is:

$$\vec{u} \cdot \vec{v} = \|\vec{u}\|\,\|\vec{v}\| \cos(\theta) \tag{7.8}$$

The dot product can also be expressed as a function of the vectors' components. From trigonometry, we know the relation between the length of one side of a triangle, the lengths of the other two sides, and the cosine of the angle between those two sides. This relation is the generalized Pythagorean theorem (the law of cosines). Referring to Figure 7.3, this gives:

$$\|\overrightarrow{PQ}\|^2 = \|\vec{u}\|^2 + \|\vec{v}\|^2 - 2\|\vec{u}\|\,\|\vec{v}\| \cos(\theta) \tag{7.9}$$

but since:

$$\overrightarrow{PQ} = \vec{v} - \vec{u} \tag{7.10}$$

$$\Rightarrow \|\vec{u}\|\,\|\vec{v}\| \cos(\theta) = \frac{1}{2}\left(\|\vec{u}\|^2 + \|\vec{v}\|^2 - \|\vec{v} - \vec{u}\|^2\right) \tag{7.11}$$

and the dot product can be written as:

$$\vec{u} \cdot \vec{v} = \frac{1}{2}\left(u_1^2 + u_2^2 + v_1^2 + v_2^2 - (v_1 - u_1)^2 - (v_2 - u_2)^2\right) = u_1 v_1 + u_2 v_2 \tag{7.12}$$

In an n-dimensional space, the above expression is generalized to:

$$\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \dots + u_n v_n \tag{7.13}$$

and the norm square of the vector can be written as the dot product of the vector with itself; that is,

$$\|\vec{u}\|^2 = \vec{u} \cdot \vec{u} = u_1^2 + u_2^2 + \dots + u_n^2 \tag{7.14}$$

Example 7.3
Parallelism and orthogonality of two vectors in a plane. Let the vectors $\vec{u}$ and $\vec{v}$ be given by $\vec{u} = 3\hat{e}_1 + 4\hat{e}_2$ and $\vec{v} = a\hat{e}_1 + 7\hat{e}_2$. What is the value of $a$ if the vectors are parallel, and if the vectors are orthogonal?

Solution:
Case 1: If the vectors are parallel, they make the same angle with the x-axis. The tangent of this angle is the ratio of the vector's y-component to its x-component, so the two vectors have equal component ratios. This means that:

$$\frac{a}{7} = \frac{3}{4} \Rightarrow a = 21/4$$

Case 2: If the vectors are orthogonal, the angle between them is 90°, and their dot product is zero because the cosine of that angle is zero. This implies that:

$$3a + 28 = 0 \Rightarrow a = -28/3$$

Example 7.4
Find the unit vector in 2-D that is perpendicular to the line $ax + by + c = 0$.

Solution: Choose two arbitrary points on this line.
Denote their coordinates by $(x_1, y_1)$ and $(x_2, y_2)$; being on the line, they satisfy the equation of the line:

$$ax_1 + by_1 + c = 0$$
$$ax_2 + by_2 + c = 0$$

Subtracting the first equation from the second equation, we obtain:

$$a(x_2 - x_1) + b(y_2 - y_1) = 0$$

which means that $(a, b) \perp (x_2 - x_1, y_2 - y_1)$, and the unit vector perpendicular to the line is:

$$\hat{e}_\perp = \left(\frac{a}{\sqrt{a^2 + b^2}}, \frac{b}{\sqrt{a^2 + b^2}}\right)$$

Example 7.5
Find the angle that the lines $3x + 2y + 2 = 0$ and $2x - y + 1 = 0$ make with each other.

Solution: The angle between two lines is equal to the angle between their normal unit vectors. The unit vectors normal to each of the lines are, respectively:

$$\hat{n}_1 = \left(\frac{3}{\sqrt{13}}, \frac{2}{\sqrt{13}}\right) \quad \text{and} \quad \hat{n}_2 = \left(\frac{2}{\sqrt{5}}, \frac{-1}{\sqrt{5}}\right)$$

Having the two normal unit vectors, it is a simple matter to compute the angle between them:

$$\cos(\theta) = \hat{n}_1 \cdot \hat{n}_2 = \frac{4}{\sqrt{65}} \Rightarrow \theta = 1.0517 \text{ radians}$$

7.2.1 MATLAB Representation of the Dot Product

The dot product is written as the product of a row vector by a column vector of the same length.

Example 7.6
Find the dot product of the vectors:

u = [1 5 3 7] and v = [2 4 6 8]

Solution: Type and execute each of the following commands, while interpreting each output:

    u=[1 5 3 7]
    v=[2 4 6 8]
    u*v'
    v'*u
    u*v        %you cannot multiply two rows
    u'*v
    u*u'
    (norm(u))^2

As observed from the above results, in MATLAB, the dot product can be obtained only by the multiplication of a row on the left and a column of the same length on the right. If the order of the row and the column is exchanged, we obtain a two-dimensional array structure (i.e., a matrix, the subject of Chapter 8). On the other hand, if we multiply two rows, MATLAB gives an error message about the non-matching of dimensions. Observe further, as pointed out previously, the relation between the length of a vector and its dot product with itself.

In-Class Exercises

Pb.
7.2 Generalize the analytical technique, as previously used in Example 7.4 for finding the normal to a line in 2-D, to find the unit vector in 3-D that is perpendicular to the plane:

$$ax + by + cz + d = 0$$

(Hint: A vector is perpendicular to a plane if it is perpendicular to two noncollinear vectors in that plane.)

Pb. 7.3 Find, in 2-D, the distance of the point $P(x_0, y_0)$ from the line $ax + by + c = 0$. (Hint: Remember the geometric definition of the dot product.)

Pb. 7.4 Prove the following identities:

$$\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}, \quad \vec{u} \cdot (\vec{v} + \vec{w}) = \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}, \quad k(\vec{u} \cdot \vec{v}) = (k\vec{u}) \cdot \vec{v}$$

7.3 Components, Direction Cosines, and Projections

7.3.1 Components

The components of a vector are the values of each element in the defining n-tuplet representation. For example, consider the vector $\vec{u}$ = [1 5 3 7] in real 4-D. We say that its first, second, third, and fourth components are 1, 5, 3, and 7, respectively. (We are maintaining, in this section, the arrow notation for the vectors, irrespective of the dimension of the space.)

The simplest basis of an n-dimensional vector space is the collection of n unit vectors, each having only one non-zero component, with the location of this non-zero element different for each of these basis vectors. This choice of basis is not the only possible one. For example, in 4-D space, the canonical four orthonormal basis vectors are given, respectively, by:

$$\hat{e}_1 = [1\ 0\ 0\ 0] \tag{7.15}$$
$$\hat{e}_2 = [0\ 1\ 0\ 0] \tag{7.16}$$
$$\hat{e}_3 = [0\ 0\ 1\ 0] \tag{7.17}$$
$$\hat{e}_4 = [0\ 0\ 0\ 1] \tag{7.18}$$

and the vector $\vec{u}$ can be written as a linear combination of the basis vectors:

$$\vec{u} = u_1\hat{e}_1 + u_2\hat{e}_2 + u_3\hat{e}_3 + u_4\hat{e}_4 \tag{7.19}$$

The basis vectors are chosen to be orthonormal, which means that in addition to requiring each one of them to have unit length, they are also orthogonal two by two to each other.
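The orthonormality just described can be checked numerically. The sketch below stores the canonical basis vectors of Eqs. (7.15) through (7.18) as the rows of the identity matrix returned by the MATLAB command eye (matrices are treated properly in Chapter 8; here the array is only a convenient container), and then rebuilds a sample vector from Eq. (7.19). The names E, G, isOrthonormal, and uRebuilt are illustrative assumptions.

```matlab
% Sketch: the canonical basis of 4-D space as the rows of the identity matrix.
E = eye(4);        % row m of E is the basis vector e_m of Eqs. (7.15)-(7.18)
G = E*E';          % entry (m,n) of G is the dot product of e_m with e_n
isOrthonormal = isequal(G, eye(4))
u = [1 5 3 7];
% Linear combination of the basis vectors, Eq. (7.19)
uRebuilt = u(1)*E(1,:) + u(2)*E(2,:) + u(3)*E(3,:) + u(4)*E(4,:)
```

The array G collects all the pairwise dot products at once: ones on the diagonal express the unit lengths, and zeros elsewhere express the mutual orthogonality.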
These properties of the basis vectors lead us to the following important result: the mth component of a vector is obtained by taking the dot product of the vector with the corresponding unit vector, that is,

$$u_m = \hat{e}_m \cdot \vec{u} \tag{7.20}$$

7.3.2 Direction Cosines

The direction cosines are defined by:

$$\cos(\gamma_m) = \frac{u_m}{\|\vec{u}\|} = \frac{\hat{e}_m \cdot \vec{u}}{\|\vec{u}\|} \tag{7.21}$$

In 2-D or 3-D, these quantities have the geometrical interpretation of being the cosines of the angles that the vector $\vec{u}$ makes with the x, y, and z axes.

7.3.3 Projections

The projection of a vector $\vec{u}$ over a vector $\vec{a}$ is a vector whose magnitude is the dot product of the vector $\vec{u}$ with the unit vector in the direction of $\vec{a}$, denoted by $\hat{e}_a$, and whose orientation is in the direction of $\hat{e}_a$:

$$\text{proj}_{\vec{a}}(\vec{u}) = (\vec{u} \cdot \hat{e}_a)\hat{e}_a = \left(\frac{\vec{u} \cdot \vec{a}}{\|\vec{a}\|}\right)\frac{\vec{a}}{\|\vec{a}\|} = \frac{\vec{u} \cdot \vec{a}}{\|\vec{a}\|^2}\,\vec{a} \tag{7.22}$$

The component of $\vec{u}$ that is perpendicular to $\vec{a}$ is obtained by subtracting from $\vec{u}$ the projection vector of $\vec{u}$ over $\vec{a}$.

MATLAB Example
Assume that we have the vector $\vec{u} = \hat{e}_1 + 5\hat{e}_2 + 3\hat{e}_3 + 7\hat{e}_4$ and the vector $\vec{a} = 2\hat{e}_1 + 3\hat{e}_2 + \hat{e}_3 + 4\hat{e}_4$. We desire to obtain the components of each vector, the projection of $\vec{u}$ over $\vec{a}$, and the component of $\vec{u}$ orthogonal to $\vec{a}$. Type, execute, and interpret at each step, each of the following commands using the above definitions:

    u=[1 5 3 7]
    a=[2 3 1 4]
    u(1)
    a(2)
    prjuovera=((u*a')/(norm(a)^2))*a
    orthoutoa=u-prjuovera
    prjuovera*orthoutoa'

The last command should give you an answer that is zero, up to machine round-off errors, because the projection of $\vec{u}$ over $\vec{a}$ and the component of $\vec{u}$ orthogonal to $\vec{a}$ are perpendicular.

7.4 The Dirac Notation and Some General Theorems*

Thus far, we have established some key practical results in real finite-dimensional vector spaces; namely:

1. A vector can be decomposed into a linear combination of the basis vectors.
2.
The dot product of two vectors can be written as the multiplication of a row vector by a column vector, each of whose elements are the components of the respective vectors.
3. The norm of a vector, a non-negative quantity, is the square root of the dot product of the vector with itself.
4. The unit vector parallel to a specific vector is that vector divided by its norm.
5. The projection of a vector on another can be deduced from the dot product of the two vectors.

To facilitate the statement of these results in a notation that will be suitable for infinite-dimensional vector spaces (which are very briefly introduced in Section 7.8), Dirac, in his elegant formulation of quantum mechanics, introduced a simple notation that we now present.

The Dirac notation represents the row vector by what he called the "bra-vector" and the column vector by what he called the "ket-vector," such that when a dot product is obtained by joining the two vectors, the result will be the scalar "bra-ket" quantity. Specifically:

$$\text{Column vector } \vec{u} \Rightarrow |u\rangle \tag{7.23}$$

$$\text{Row vector } \vec{v} \Rightarrow \langle v| \tag{7.24}$$

$$\text{Dot product } \vec{v} \cdot \vec{u} \Rightarrow \langle v|u\rangle \tag{7.25}$$

The orthonormality of the basis vectors is written as:

$$\langle m|n\rangle = \delta_{m,n} \tag{7.26}$$

where the basis vectors are referred to by their indices, and where $\delta_{m,n}$ is the Kronecker delta, equal to 1 when its indices are equal, and zero otherwise.

The norm of a vector, a non-negative quantity, is given by:

$$(\text{norm of } |u\rangle)^2 = \|u\|^2 = \langle u|u\rangle \tag{7.27}$$

The Decomposition rule is written as:

$$|u\rangle = \sum_n c_n |n\rangle \tag{7.28}$$

where the components are obtained by multiplying Eq. (7.28) on the left by $\langle m|$. Using Eq. (7.26), we deduce:

$$\langle m|u\rangle = \sum_n c_n \langle m|n\rangle = \sum_n c_n \delta_{m,n} = c_m \tag{7.29}$$

Next, using the Dirac notation, we present the proofs of two key theorems of vector algebra: the Cauchy-Schwarz inequality and the triangle inequality.
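Before turning to those proofs, the decomposition rule of Eqs. (7.28) and (7.29) can be illustrated numerically with a basis other than the canonical one. The sketch below assumes, purely for illustration, a 2-D orthonormal basis obtained by rotating the canonical basis through an angle alpha; the kets are stored as column vectors and the bras are obtained with the single quotation mark. The names alpha, ket1, ket2, ketu, and the sample values are assumptions made here.

```matlab
% Sketch: decomposition rule, Eqs. (7.28)-(7.29), in a rotated 2-D basis.
alpha = pi/6;                        % assumed rotation angle
ket1 = [cos(alpha); sin(alpha)];     % basis ket |1> (column vector)
ket2 = [-sin(alpha); cos(alpha)];    % basis ket |2>
ketu = [3; 4];                       % the ket |u> to be decomposed
c1 = ket1'*ketu;                     % c_1 = <1|u>, Eq. (7.29)
c2 = ket2'*ketu;                     % c_2 = <2|u>
ketuRebuilt = c1*ket1 + c2*ket2;     % sum over n of c_n |n>, Eq. (7.28)
rebuildError = norm(ketuRebuilt - ketu)
normCheck = abs(c1^2 + c2^2 - ketu'*ketu)
```

Both displayed quantities should be zero up to round-off error: the first confirms that the coefficients rebuild the original ket, and the second that the norm square of Eq. (7.27) equals the sum of the squares of the components in this real basis.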
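7.4.1 Cauchy-Schwarz Inequality

Let $|u\rangle$ and $|v\rangle$ be any non-zero vectors; then:

$$|\langle u|v\rangle|^2 \le \langle u|u\rangle \langle v|v\rangle \tag{7.30}$$

PROOF Let $\varepsilon = \pm 1$ ($\varepsilon^2 = 1$); then

$$\langle u|v\rangle = \varepsilon\,|\langle u|v\rangle| \quad \text{such that} \quad \begin{cases} \varepsilon = 1 & \text{if } \langle u|v\rangle \ge 0 \\ \varepsilon = -1 & \text{if } \langle u|v\rangle \le 0 \end{cases} \tag{7.31}$$

Now, consider the ket $|\varepsilon u + tv\rangle$; its norm is always non-negative. Computing this norm square, we obtain:

$$\langle \varepsilon u + tv|\varepsilon u + tv\rangle = \varepsilon^2 \langle u|u\rangle + \varepsilon t \langle u|v\rangle + t\varepsilon \langle v|u\rangle + t^2 \langle v|v\rangle = \langle u|u\rangle + 2\varepsilon t \langle u|v\rangle + t^2 \langle v|v\rangle = \langle u|u\rangle + 2t\,|\langle u|v\rangle| + t^2 \langle v|v\rangle \tag{7.32}$$

The RHS of this quantity is a non-negative quadratic polynomial in $t$, and can be written in the standard form:

$$at^2 + bt + c \ge 0 \tag{7.33}$$

The non-negativity of this quadratic polynomial means that it can have at most one real root. This means that the discriminant must satisfy the inequality:

$$b^2 - 4ac \le 0 \tag{7.34}$$

Replacing $a$, $b$, $c$ by their values from Eq. (7.32), we obtain:

$$4|\langle u|v\rangle|^2 - 4\langle u|u\rangle \langle v|v\rangle \le 0 \tag{7.35}$$

$$\Rightarrow |\langle u|v\rangle|^2 \le \langle u|u\rangle \langle v|v\rangle \tag{7.36}$$

which is the desired result. Note that the equality holds if and only if the two vectors are linearly dependent (i.e., one vector is equal to a scalar multiplied by the other vector).

Example 7.7
Show that for any three positive numbers, $u_1$, $u_2$, and $u_3$, the following inequality always holds:

$$9 \le (u_1 + u_2 + u_3)\left(\frac{1}{u_1} + \frac{1}{u_2} + \frac{1}{u_3}\right) \tag{7.37}$$

PROOF Choose the vectors $|v\rangle$ and $|w\rangle$ such that:

$$|v\rangle = \left(u_1^{1/2}, u_2^{1/2}, u_3^{1/2}\right) \tag{7.38}$$

$$|w\rangle = \left(\left(\frac{1}{u_1}\right)^{1/2}, \left(\frac{1}{u_2}\right)^{1/2}, \left(\frac{1}{u_3}\right)^{1/2}\right) \tag{7.39}$$

then:

$$\langle v|w\rangle = 3 \tag{7.40}$$

$$\langle v|v\rangle = u_1 + u_2 + u_3 \tag{7.41}$$

$$\langle w|w\rangle = \frac{1}{u_1} + \frac{1}{u_2} + \frac{1}{u_3} \tag{7.42}$$

Applying the Cauchy-Schwarz inequality in Eq. (7.36) establishes the desired result.

The above inequality can be trivially generalized to n elements, which leads to the following important result for the equivalent resistance for resistors all in series or all in parallel.

Application
The equivalent resistance of n resistors all in series and the equivalent resistance of the same n resistors all in parallel obey the relation:

$$n^2 \le \frac{R_{\text{series}}}{R_{\text{parallel}}} \tag{7.43}$$

PROOF The proof is straightforward. Using Eq.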
(7.37) and recalling Ohm's law for n resistors $\{R_1, R_2, \dots, R_n\}$, the equivalent resistances for this combination, when all resistors are in series or are all in parallel, are given respectively by:

$$R_{\text{series}} = R_1 + R_2 + \dots + R_n \tag{7.44}$$

and

$$\frac{1}{R_{\text{parallel}}} = \frac{1}{R_1} + \frac{1}{R_2} + \dots + \frac{1}{R_n} \tag{7.45}$$

Question: Can you derive a similar theorem for capacitors all in series and all in parallel? (Remember that the equivalent capacitance law is different for capacitors than for resistors.)

7.4.2 Triangle Inequality

This is, as the name implies, a generalization of a theorem from Euclidean geometry in 2-D that states that the length of one side of a triangle is smaller than or equal to the sum of the lengths of the other two sides. Its generalization is

$$\|u + v\| \le \|u\| + \|v\| \tag{7.46}$$

PROOF Using the relation between the norm and the dot product, we have:

$$\|u + v\|^2 = \langle u + v|u + v\rangle = \langle u|u\rangle + 2\langle u|v\rangle + \langle v|v\rangle = \|u\|^2 + 2\langle u|v\rangle + \|v\|^2 \le \|u\|^2 + 2|\langle u|v\rangle| + \|v\|^2 \tag{7.47}$$

Using the Cauchy-Schwarz inequality for the dot product appearing in the previous inequality, we deduce that:

$$\|u + v\|^2 \le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 = \left(\|u\| + \|v\|\right)^2 \tag{7.48}$$

which establishes the theorem.

Homework Problems

Pb. 7.5 Using the Dirac notation, generalize to n dimensions the 2-D geometry Parallelogram theorem, which states that the sum of the squares of the diagonals of a parallelogram is equal to twice the sum of the squares of the sides; or that:

$$\|\vec{u} + \vec{v}\|^2 + \|\vec{u} - \vec{v}\|^2 = 2\|\vec{u}\|^2 + 2\|\vec{v}\|^2$$

Pb. 7.6 Referring to the inequality of Eq. (7.43), which relates the equivalent resistances of n resistors in series and in parallel, under what conditions does the equality hold?

7.5 Cross Product and Scalar Triple Product*

In this section and in Sections 7.6 and 7.7, we restrict our discussions to vectors in a 3-D space, and use the more familiar conventional vector notation.
7.5.1 Cross Product

DEFINITION If two vectors are given by $\vec{u} = (u_1, u_2, u_3)$ and $\vec{v} = (v_1, v_2, v_3)$, then their cross product, denoted by $\vec{u} \times \vec{v}$, is a vector given by:

$$\vec{u} \times \vec{v} = (u_2 v_3 - u_3 v_2,\ u_3 v_1 - u_1 v_3,\ u_1 v_2 - u_2 v_1) \tag{7.49}$$

By simple substitution, we can infer the following properties for the cross product as summarized in the preparatory exercises below.

Preparatory Exercises

Pb. 7.7 Show, using the above definition for the cross product, that:
a. $\vec{u} \cdot (\vec{u} \times \vec{v}) = \vec{v} \cdot (\vec{u} \times \vec{v}) = 0 \Rightarrow \vec{u} \times \vec{v}$ is orthogonal to both $\vec{u}$ and $\vec{v}$
b. $\|\vec{u} \times \vec{v}\|^2 = \|\vec{u}\|^2 \|\vec{v}\|^2 - (\vec{u} \cdot \vec{v})^2$ (called the Lagrange Identity)
c. $\vec{u} \times \vec{v} = -(\vec{v} \times \vec{u})$ (noncommutativity)
d. $\vec{u} \times (\vec{v} + \vec{w}) = \vec{u} \times \vec{v} + \vec{u} \times \vec{w}$ (distributive property)
e. $k(\vec{u} \times \vec{v}) = (k\vec{u}) \times \vec{v} = \vec{u} \times (k\vec{v})$
f. $\vec{u} \times \vec{0} = \vec{0}$
g. $\vec{u} \times \vec{u} = \vec{0}$

Pb. 7.8 Verify the following relations for the basis unit vectors:

$$\hat{e}_1 \times \hat{e}_2 = \hat{e}_3; \quad \hat{e}_2 \times \hat{e}_3 = \hat{e}_1; \quad \hat{e}_3 \times \hat{e}_1 = \hat{e}_2$$

Pb. 7.9 Ask your instructor to show you how the Right Hand rule is used to determine the direction of a vector equal to the cross product of two other vectors.

7.5.2 Geometric Interpretation of the Cross Product

As noted in Pb. 7.7a, the cross product is a vector that is perpendicular to its two constituents. This determines the resultant vector's direction. To determine its magnitude, consider the Lagrange Identity. If the angle between $\vec{u}$ and $\vec{v}$ is $\theta$, then:

$$\|\vec{u} \times \vec{v}\|^2 = \|\vec{u}\|^2 \|\vec{v}\|^2 - \|\vec{u}\|^2 \|\vec{v}\|^2 \cos^2(\theta) \tag{7.50}$$

and

$$\|\vec{u} \times \vec{v}\| = \|\vec{u}\|\,\|\vec{v}\|\,|\sin(\theta)| \tag{7.51}$$

that is, the magnitude of the cross product of two vectors is the area of the parallelogram formed by these vectors.

7.5.3 Scalar Triple Product

DEFINITION If $\vec{u}$, $\vec{v}$, and $\vec{w}$ are vectors in 3-D, then $\vec{u} \cdot (\vec{v} \times \vec{w})$ is called the scalar triple product of $\vec{u}$, $\vec{v}$, and $\vec{w}$.

PROPERTY

$$\vec{u} \cdot (\vec{v} \times \vec{w}) = \vec{v} \cdot (\vec{w} \times \vec{u}) = \vec{w} \cdot (\vec{u} \times \vec{v}) \tag{7.52}$$

This property can be trivially proven by writing out the component expansions of the three quantities.
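As a complement to the component expansion, the cyclic property of Eq. (7.52) can also be checked numerically. The sketch below uses the MATLAB commands cross and dot on three sample vectors; the vectors and the names t1, t2, and t3 are arbitrary illustrative choices, and a numerical check on one set of vectors is of course not a proof.

```matlab
% Sketch: numerical check of the cyclic property, Eq. (7.52).
u = [2 1 0];
v = [0 3 0];
w = [1 2 3];
t1 = dot(u, cross(v, w));   % u . (v x w)
t2 = dot(v, cross(w, u));   % v . (w x u)
t3 = dot(w, cross(u, v));   % w . (u x v)
[t1 t2 t3]                  % the three entries should be identical
```

For these particular vectors, all three entries equal 18. Exchanging any two vectors inside one of the products, rather than permuting them cyclically, changes the sign of the result, in agreement with property c of Pb. 7.7.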
7.5.3.1 Geometric Interpretation of the Scalar Triple Product

If the initial points of the vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ are brought to the same origin, these three vectors define a parallelepiped. The absolute value of the scalar triple product can then be interpreted as the volume of this parallelepiped. We have shown earlier that $\vec{v} \times \vec{w}$ is a vector that is perpendicular to both $\vec{v}$ and $\vec{w}$, and whose magnitude is the area of the base parallelogram. From the definition of the scalar product, dotting this vector with $\vec{u}$ will give a scalar that is the product of the area of the parallelepiped base multiplied by the parallelepiped height, whose magnitude is exactly the volume of the parallelepiped.

The circular permutation property of Eq. (7.52) then has a very simple geometric interpretation: in computing the volume of a parallelepiped, it does not matter which surface we call the base.

MATLAB Representation
The cross product of the vectors $\vec{u} = (u_1, u_2, u_3)$ and $\vec{v} = (v_1, v_2, v_3)$ is found using the cross(u,v) command.

The triple scalar product of the vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ is found through the det([u;v;w]) command. Make sure that the vectors defined as arguments of these functions are defined as 3-D vectors, so that the commands work and the results make sense.

Example 7.8
Given the vectors $\vec{u} = (2, 1, 0)$, $\vec{v} = (0, 3, 0)$, $\vec{w} = (1, 2, 3)$, find the cross product of the separate pairs of these vectors, and the volume of the parallelepiped formed by the three vectors.

Solution: Type, execute, and interpret at each step, each of the following commands, using the above definitions:

    u=[2 1 0]
    v=[0 3 0]
    w=[1 2 3]
    ucrossv=cross(u,v)
    ucrossw=cross(u,w)
    vcrossw=cross(v,w)
    paralvol=abs(det([u;v;w]))
    paralvol2=abs(cross(u,v)*w')

Question: Verify that the last command is an alternate way of writing the volume of the parallelepiped expression.

In-Class Exercises

Pb. 7.10 Compute the shortest distance from New York to London.
(Hint: (1) A great circle is the shortest path between two points on a sphere; (2) the angle between the radial unit vectors passing through each of the cities can be obtained from their respective latitude and longitude.)

Pb. 7.11 Find two unit vectors that are orthogonal to both vectors given by:

$$\vec{a} = (2, -1, 2) \quad \text{and} \quad \vec{b} = (1, 2, -3)$$

Pb. 7.12 Find the area of the triangle with vertices at the points:

$$A(0, -1, 1),\ B(3, 1, 0),\ \text{and}\ C(-2, 0, 2)$$

Pb. 7.13 Find the volume of the parallelepiped formed by the three vectors:

$$\vec{u} = (1, 2, 0), \quad \vec{v} = (0, 3, 0), \quad \vec{w} = (1, 2, 3)$$

Pb. 7.14 Determine the equation of a plane that passes through the point (1, 1, 1) and is normal to the vector (2, 1, 2).

Pb. 7.15 Find the angle of intersection of the planes:

$$x + y - z = 0 \quad \text{and} \quad x - 3y + z - 1 = 0$$

Pb. 7.16 Find the distance between the point (3, 1, −2) and the plane $z = 2x - 3y$.

Pb. 7.17 Find the equation of the line that contains the point (3, 2, 1) and is perpendicular to the plane $x + 2y - 2z = 2$. Write the parametric equation for this line.

Pb. 7.18 Find the point of intersection of the plane $2x - 3y + z = 6$ and the line

$$\frac{x - 1}{3} = \frac{y + 1}{1} = \frac{z - 2}{2}$$

Pb. 7.19 Show that the points (1, 5), (3, 11), and (5, 17) are collinear.

Pb. 7.20 Show that the three vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ are coplanar:

$$\vec{u} = (2, 3, 5); \quad \vec{v} = (2, 8, 1); \quad \vec{w} = (8, 22, 12)$$

Pb. 7.21 Find the unit vector normal to the plane determined by the points (0, 0, 1), (0, 1, 0), and (1, 0, 0).

Homework Problem

Pb. 7.22 Determine the tetrahedron with the largest surface area whose vertices $P_0$, $P_1$, $P_2$, and $P_3$ are on the unit sphere $x^2 + y^2 + z^2 = 1$.

(Hints: (1) Designate the point $P_0$ as north pole and confine $P_1$ to the zero meridian.
With this choice, the coordinates of the vertices are given by:

$$P_0 = (\theta_0 = \pi/2,\ \varphi_0 = 0)$$
$$P_1 = (\theta_1,\ \varphi_1 = 0)$$
$$P_2 = (\theta_2,\ \varphi_2)$$
$$P_3 = (\theta_3,\ \varphi_3)$$

(2) From symmetry, the optimal tetrahedron will have a base ($P_1$, $P_2$, $P_3$) that is an equilateral triangle in a plane parallel to the equatorial plane. The latitude of ($P_1$, $P_2$, $P_3$) is $\theta$, while their longitudes are (0, 2π/3, −2π/3), respectively. (3) The area of the tetrahedron is the sum of the areas of the four triangles (012), (023), (031), (123), where we are indicating each point by its subscript. (4) Express the area as a function of $\theta$. Find the value of $\theta$ that maximizes this quantity.)

7.6 Vector Valued Functions

As you may recall, in Chapter 1 we described curves in 2-D and 3-D by parametric equations. Essentially, we gave each of the coordinates as a function of a parameter. In effect, we generated a vector valued function because the position of the point describing the curve can be written as:

$$\vec{R}(t) = x(t)\hat{e}_1 + y(t)\hat{e}_2 + z(t)\hat{e}_3 \tag{7.53}$$

If the parameter $t$ is chosen to be time, then the tip of the vector $\vec{R}(t)$ is the position of a point on that curve as a function of time. In mechanics, finding $\vec{R}(t)$ is ultimately the goal of any problem in the dynamics of a point particle. In many problems of electrical engineering design of tubes and other microwave engineering devices, we need to determine the position of electrons whose motion we control by a variety of electric and magnetic field geometries. The following are the kinematics variables of the problem. The dynamics form the subject of mechanics courses.

To help visualize the shape of a curve generated by the tip of the position vector $\vec{R}(t)$, we introduce the tangent vector and the normal vector to the curve and the curvature of the curve.
The velocity vector field associated with the above position vector is defined through:

$$\frac{d\vec{R}(t)}{dt} = \frac{dx(t)}{dt}\hat{e}_1 + \frac{dy(t)}{dt}\hat{e}_2 + \frac{dz(t)}{dt}\hat{e}_3 \tag{7.54}$$

and the unit vector tangent to the curve is given by:

$$\hat{T}(t) = \frac{\dfrac{d\vec{R}(t)}{dt}}{\left\|\dfrac{d\vec{R}(t)}{dt}\right\|} \tag{7.55}$$

This is, of course, the unit vector that is always in the direction of the velocity of the particle.

LEMMA If a vector valued function $\vec{V}(t)$ has a constant length, then its derivative $\dfrac{d\vec{V}(t)}{dt}$ is orthogonal to it.

PROOF The proof of this lemma is straightforward. If the length of the vector is constant, then its dot product with itself is a constant; that is, $\vec{V}(t) \cdot \vec{V}(t) = C$. Differentiating both sides of this equation gives $\dfrac{d\vec{V}(t)}{dt} \cdot \vec{V}(t) = 0$, and the orthogonality between the two vectors is thus established.

The tangential unit vector $\hat{T}(t)$ is, by definition, constructed to have unit length. We construct the normal to the curve by taking the unit vector in the direction of the time-derivative of the tangential vector; that is,

$$\hat{N}(t) = \frac{\dfrac{d\hat{T}(t)}{dt}}{\left\|\dfrac{d\hat{T}(t)}{dt}\right\|} \tag{7.56}$$

The curvature of the curve is

$$\kappa = \frac{\left\|\dfrac{d\hat{T}(t)}{dt}\right\|}{\left\|\dfrac{d\vec{R}(t)}{dt}\right\|} \tag{7.57}$$

Example 7.9
Find the tangent, normal, and curvature of the trajectory of a particle moving in uniform circular motion of radius $a$ and with angular frequency $\omega$.

Solution: The parametric equation of motion is

$$\vec{R}(t) = a\cos(\omega t)\hat{e}_1 + a\sin(\omega t)\hat{e}_2$$

The velocity vector is

$$\frac{d\vec{R}(t)}{dt} = -a\omega\sin(\omega t)\hat{e}_1 + a\omega\cos(\omega t)\hat{e}_2$$

and its magnitude is $a\omega$. The tangent vector is therefore:

$$\hat{T}(t) = -\sin(\omega t)\hat{e}_1 + \cos(\omega t)\hat{e}_2$$

The normal vector is

$$\hat{N}(t) = -\cos(\omega t)\hat{e}_1 - \sin(\omega t)\hat{e}_2$$

The curvature is

$$\kappa(t) = \frac{\left\|\dfrac{d\hat{T}(t)}{dt}\right\|}{\left\|\dfrac{d\vec{R}(t)}{dt}\right\|} = \frac{\left\|-\omega\cos(\omega t)\hat{e}_1 - \omega\sin(\omega t)\hat{e}_2\right\|}{\left\|-a\omega\sin(\omega t)\hat{e}_1 + a\omega\cos(\omega t)\hat{e}_2\right\|} = \frac{1}{a} = \text{constant}$$

so that the radius of curvature, the inverse of the curvature, is the radius $a$ of the circle.

Homework Problems

Pb. 7.23 Show that in 2-D the curvature can be written as:

$$\kappa = \frac{|x'y'' - y'x''|}{\left((x')^2 + (y')^2\right)^{3/2}}$$

where the prime refers to the first derivative with respect to time, and the double-prime refers to the second derivative with respect to time.
Pb. 7.24 Using the parametric equations for an ellipse given in Example 1.13, find the curvature of the ellipse as a function of $t$.
a. At what points is the curvature a minimum, and at what points is it a maximum?
b. What does the velocity do at the points of minimum and maximum curvature?
c. On what dates of the year does the planet Earth pass through these points on its trajectory around the sun?

7.7 Line Integral

As you may have already learned in an elementary physics course: if a force $\vec{F}$ is applied to a particle that moves by an infinitesimal distance $\Delta\vec{l}$, then the infinitesimal work done by the force on the particle is the scalar product of the force by the displacement; that is,

$$\Delta W = \vec{F} \cdot \Delta\vec{l} \tag{7.58}$$

Now, to calculate the work done when the particle moves along a curve $C$, located in a plane, we need to define the concept of a line integral.

Suppose that the curve is described parametrically [i.e., $x(t)$ and $y(t)$ are given]. Furthermore, suppose that the vector field representing the force is given by:

$$\vec{F} = P(x, y)\hat{e}_x + Q(x, y)\hat{e}_y \tag{7.59}$$

The displacement element is given by:

$$\Delta\vec{l} = \Delta x\,\hat{e}_x + \Delta y\,\hat{e}_y \tag{7.60}$$

The infinitesimal element of work, which is the dot product of the above two quantities, can then be written as:

$$\Delta W = P\Delta x + Q\Delta y \tag{7.61}$$

This expression can be simplified if the curve is written in parametric form. Assuming the parameter is $t$, then $\Delta W$ can be written as a function of the single parameter $t$:

$$\Delta W = P(t)\frac{dx}{dt}\Delta t + Q(t)\frac{dy}{dt}\Delta t = \left[P(t)\frac{dx}{dt} + Q(t)\frac{dy}{dt}\right]\Delta t \tag{7.62}$$

and the total work can be written as an integral over the single variable $t$:

$$W = \int_{t_0}^{t_1}\left[P(t)\frac{dx}{dt} + Q(t)\frac{dy}{dt}\right]dt \tag{7.63}$$

Homework Problems

Pb. 7.25 How much work is done in moving the particle from the point (0, 0) to the point (3, 9) in the presence of the force $\vec{F}$ along the following two different paths?
a. The parabola $y = x^2$.
b. The line $y = 3x$.
The force is given by:

$$\vec{F} = xy\,\hat{e}_x + (x^2 + y^2)\hat{e}_y$$

Pb. 7.26 Let $\vec{F} = y\hat{e}_x + x\hat{e}_y$.
Calculate the work done in moving from (0, 0) to (1, 1) along each of the following curves:
a. The straight line $y = x$.
b. The parabola $y = x^2$.
c. The curve $C$ described by the parametric equations:

$$x(t) = t^{3/2} \quad \text{and} \quad y(t) = t^5$$

A vector field such as the present one, whose line integral is independent of the path chosen between fixed initial and final points, is said to be conservative. In your vector calculus course, you will establish the necessary and sufficient conditions for a vector field to be conservative. The importance of conservative fields lies in the fact that they can be derived from a scalar potential. More about this topic will be discussed in courses on electromagnetism.

7.8 Infinite Dimensional Vector Spaces*

This section introduces some preliminary ideas on infinite-dimensional vector spaces. We assume that the components of this vector space are complex numbers rather than real numbers, to which we have restricted ourselves thus far. Using these ideas, we discuss, in a very preliminary fashion, Fourier series and Legendre polynomials.

We use the Dirac notation to stress the commonalities that unite the finite- and infinite-dimensional vector spaces. We, at this level, sacrifice mathematical rigor for the sake of simplicity, and even commit a few sins in our treatment of limits. A more formal and rigorous treatment of this subject can be found in many books on functional analysis, to which we refer the interested reader for further details.

A Hilbert space is much the same type of mathematical object as the vector spaces that you have been introduced to in the preceding sections of this chapter. Its elements are functions, instead of n-dimensional vectors. It is infinite-dimensional because the function has a value, say a component, at each point in space, and space is continuous with an infinite number of points.

The Hilbert space has the following properties:

1. The space is linear under the two conditions that:
a.
If $a$ is a constant and $|\varphi\rangle$ is any element in the space, then $a|\varphi\rangle$ is also an element of the space; and
b. If $a$ and $b$ are constants, and $|\varphi\rangle$ and $|\psi\rangle$ are elements belonging to the space, then $a|\varphi\rangle + b|\psi\rangle$ is also an element of the space.

2. There is an inner (dot) product for any two elements in the space. The definition adopted here for this inner product for functions defined in the interval $t_{\min} \le t \le t_{\max}$ is:

$$\langle\psi|\varphi\rangle = \int_{t_{\min}}^{t_{\max}} \overline{\psi(t)}\,\varphi(t)\,dt \tag{7.64}$$

3. Any element of the space has a norm ("length") that is positive and related to the inner product as follows:

$$\|\varphi\|^2 = \langle\varphi|\varphi\rangle = \int_{t_{\min}}^{t_{\max}} \overline{\varphi(t)}\,\varphi(t)\,dt \tag{7.65}$$

Note that it is the requirement of a positive norm that necessitates the complex conjugation (denoted by the overbar) in the definition of the bra-vector.

4. The Hilbert space is complete; or loosely speaking, the Hilbert space contains all its limit points. This condition is too technical and will not be further discussed here.

In this Hilbert space, we define similar concepts to those in finite-dimensional vector spaces:

• Orthogonality. Two vectors are orthogonal if:

$$\langle\psi|\varphi\rangle = \int_{t_{\min}}^{t_{\max}} \overline{\psi(t)}\,\varphi(t)\,dt = 0 \tag{7.66}$$

• Basis vectors. Any function in Hilbert space can be expanded in a linear combination of the basis vectors $\{|u_n\rangle\}$, such that:

$$|\varphi\rangle = \sum_n c_n |u_n\rangle \tag{7.67}$$

and such that the elements of the basis vectors obey the orthonormality relations:

$$\langle u_m|u_n\rangle = \delta_{m,n} \tag{7.68}$$

• Decomposition rule. To find the $c_n$'s, we follow the same procedure adopted for finite-dimensional vector spaces; that is, take the inner product of the expansion in Eq. (7.67) with the bra $\langle u_m|$. We obtain, using the orthonormality relations [Eq. (7.68)], the following:

$$\langle u_m|\varphi\rangle = \sum_n c_n \langle u_m|u_n\rangle = \sum_n c_n \delta_{m,n} = c_m \tag{7.69}$$

Said differently, $c_m$ is the projection of the ket $|\varphi\rangle$ onto the bra $\langle u_m|$.

• The norm as a function of the components. The norm of a vector can be expressed as a function of its components. Using Eqs.
(7.67) and (7.68), we obtain:

$$\|\varphi\|^2 = \langle\varphi|\varphi\rangle = \sum_n \sum_m \overline{c_n}\,c_m \langle u_n|u_m\rangle = \sum_n \sum_m \overline{c_n}\,c_m \delta_{n,m} = \sum_n |c_n|^2 \tag{7.70}$$

Said differently, the norm square of a vector is equal to the sum of the magnitude squares of the components.

Application 1: The Fourier Series
The theory of Fourier series, as covered in your calculus course, states that a function that is periodic, with period equal to 1 in some normalized units, can be expanded as a linear combination of the sequence {exp(j2πnt)}, where n is an integer that goes from minus infinity to plus infinity. The purpose here is to recast the familiar Fourier series results within the language and notations of the above formalism.

Basis:

$$|u_n\rangle = \exp(j2\pi nt) \quad \text{and} \quad \langle u_n| = \exp(-j2\pi nt) \tag{7.71}$$

Orthonormality of the basis vectors:

$$\langle u_m|u_n\rangle = \int_{-1/2}^{1/2} \exp(-j2\pi mt)\exp(j2\pi nt)\,dt = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \ne n \end{cases} \tag{7.72}$$

Decomposition rule:

$$|\varphi\rangle = \sum_{n=-\infty}^{\infty} c_n |u_n\rangle = \sum_{n=-\infty}^{\infty} c_n \exp(j2\pi nt) \tag{7.73}$$

where

$$c_n = \langle u_n|\varphi\rangle = \int_{-1/2}^{1/2} \exp(-j2\pi nt)\,\varphi(t)\,dt \tag{7.74}$$

Parseval's identity:

$$\|\varphi\|^2 = \langle\varphi|\varphi\rangle = \int_{-1/2}^{1/2} \overline{\varphi(t)}\,\varphi(t)\,dt = \int_{-1/2}^{1/2} |\varphi(t)|^2\,dt = \sum_{n=-\infty}^{\infty} |c_n|^2 \tag{7.75}$$

Example 7.9
Derive the analytic expression for the potential difference across the capacitor in the RLC circuit of Figure 4.5 if the temporal profile of the source potential is a periodic function, of period 1, in some normalized units.

Solution:
1. Because the potential is periodic with period 1, it can be expanded using Eq. (7.73) in a Fourier series with basis functions $\{e^{j2\pi nt}\}$:

$$V_s(t) = \text{Re}\left[\sum_n \tilde{V}_s^n e^{j2\pi nt}\right] \tag{7.76}$$

where $\tilde{V}_s^n$ is the phasor associated with the frequency mode (2πn). (Note that n in the expressions for the phasors is a superscript and not a power.)

2. We find $\tilde{V}_c^n$, the capacitor response phasor associated with the $\tilde{V}_s^n$ excitation. This can be found by noting that the voltage across the capacitor is equal to the capacitor impedance multiplied by the current phasor, giving:

$$\tilde{V}_c^n = Z_c^n \tilde{I}^n = \frac{Z_c^n \tilde{V}_s^n}{Z_c^n + Z_R^n + Z_L^n} \tag{7.77}$$

where from the results of Section 6.8, particularly Eqs.
(6.83) through (6.85), we have:

$$Z_c^n = \frac{1}{j2\pi nC} \tag{7.78}$$

$$Z_L^n = j2\pi nL \tag{7.79}$$

$$Z_R^n = R \tag{7.80}$$

3. Finally, we use the linearity of the ODE system and write the solution as the linear superposition of the solutions corresponding to the response to each of the basis functions; that is,

$$V_c(t) = \text{Re}\left[\sum_n \frac{Z_c^n \tilde{V}_s^n}{Z_c^n + Z_R^n + Z_L^n}\,e^{j2\pi nt}\right] \tag{7.81}$$

leading to the expression:

$$V_c(t) = \text{Re}\left[\sum_n \frac{\tilde{V}_s^n}{1 - (2\pi n)^2 LC + j(2\pi n)RC}\,e^{j2\pi nt}\right] \tag{7.82}$$

Homework Problem

Pb. 7.27 Consider the RLC circuit. Assume the same notation as in Section 6.5.3, but now assume that the source potential is given by:

$$V_s = V_0 \cos^6(\omega t)$$

a. Find analytically the potential difference across the capacitor. (Hint: Write the power of the trigonometric function as a function of the different multiples of the angle.)
b. Find numerically the steady-state solution to this problem using the techniques of Chapter 4, and assume, in some normalized units, the following values for the parameters:

$$LC = 1, \quad RC = 1, \quad \omega = 2\pi$$

c. Compare your numerical results with the analytical results.

Application 2: The Legendre Polynomials
We propose to show that the Legendre polynomials are an orthonormal basis for all functions of compact support over the interval −1 ≤ x ≤ 1. Thus far, we have encountered the Legendre polynomials twice before. They were defined through their recursion relations in Pb. 2.25, and in Section 4.7.1 through their defining ODE. In this application, we define the Legendre polynomials through their generating function; show how their definitions through their recursion relation, or through their ODE, can be deduced from their definition through their generating function; and show that they constitute an orthonormal basis for functions defined on the interval −1 ≤ x ≤ 1.

1. The generating function for the Legendre polynomials is given by the simple form:

$$G(x, t) = \frac{1}{\sqrt{1 - 2xt + t^2}} = \sum_{l=0}^{\infty} P_l(x)\,t^l \tag{7.83}$$

2.
The lowest orders of $P_l(x)$ can be obtained from the small-t expansion of G(x, t); therefore, expanding Eq. (7.83) to first order in t gives:

$$1 + xt + O(t^2) = P_0(x) + tP_1(x) + O(t^2) \tag{7.84}$$

from which we can deduce that:

$$P_0(x) = 1 \tag{7.85}$$

$$P_1(x) = x \tag{7.86}$$

3. It is straightforward to verify by substitution that the generating function satisfies the equation:

$$(1 - 2xt + t^2)\frac{\partial G}{\partial t} + (t - x)G = 0 \tag{7.87}$$

Because power series can be differentiated term by term, Eq. (7.87) gives:

$$(1 - 2xt + t^2)\sum_{l=0}^{\infty} l\,P_l(x)\,t^{l-1} + (t - x)\sum_{l=0}^{\infty} P_l(x)\,t^l = 0 \tag{7.88}$$

Since this equation should hold true for all values of t, the coefficient of each power of t must be zero; collecting the coefficient of $t^l$ therefore gives:

$$(l+1)P_{l+1}(x) - 2lxP_l(x) + (l-1)P_{l-1}(x) + P_{l-1}(x) - xP_l(x) = 0 \tag{7.89}$$

or, collecting terms, this can be written as:

$$(l+1)P_{l+1}(x) - (2l+1)xP_l(x) + lP_{l-1}(x) = 0 \tag{7.90}$$

This is the recursion relation of Pb. 2.25.

4. By substitution in the explicit expression of the generating function, we can also verify that:

$$(1 - 2xt + t^2)\frac{\partial G}{\partial x} - tG = 0 \tag{7.91}$$

which leads to:

$$(1 - 2xt + t^2)\sum_{l=0}^{\infty} \frac{dP_l(x)}{dx}\,t^l - \sum_{l=0}^{\infty} P_l(x)\,t^{l+1} = 0 \tag{7.92}$$

Again, looking at the coefficients of the same power of t permits us to obtain another recursion relation:

$$\frac{dP_{l+1}(x)}{dx} - 2x\frac{dP_l(x)}{dx} + \frac{dP_{l-1}(x)}{dx} - P_l(x) = 0 \tag{7.93}$$

Differentiating Eq. (7.90) and using Eq. (7.93) to eliminate from the resulting equation first $dP_{l-1}(x)/dx$ and then, alternatively, $dP_{l+1}(x)/dx$, we obtain two new recursion relations:

$$\frac{dP_{l+1}(x)}{dx} - x\frac{dP_l(x)}{dx} = (l+1)P_l(x) \tag{7.94}$$

and

$$x\frac{dP_l(x)}{dx} - \frac{dP_{l-1}(x)}{dx} = lP_l(x) \tag{7.95}$$

Adding Eqs. (7.94) and (7.95), we obtain the more symmetric formula:

$$\frac{dP_{l+1}(x)}{dx} - \frac{dP_{l-1}(x)}{dx} = (2l+1)P_l(x) \tag{7.96}$$

Replacing l by l − 1 in Eq. (7.94) and eliminating $P'_{l-1}(x)$ by means of Eq. (7.95), we find that:

$$(1 - x^2)\frac{dP_l(x)}{dx} = lP_{l-1}(x) - lxP_l(x) \tag{7.97}$$

Differentiating Eq. (7.97) and using Eq.
(7.95), we obtain:

$$\frac{d}{dx}\left[(1 - x^2)\frac{dP_l(x)}{dx}\right] + l(l+1)P_l(x) = 0 \tag{7.98a}$$

which can be written in the equivalent form:

$$(1 - x^2)\frac{d^2P_l(x)}{dx^2} - 2x\frac{dP_l(x)}{dx} + l(l+1)P_l(x) = 0 \tag{7.98b}$$

which is the ODE for the Legendre polynomial, as previously pointed out in Section 4.7.1.

5. Next, we want to show that if l ≠ m, we have orthogonality between any two elements (with different indices) of the basis; that is,

$$\int_{-1}^{1} P_l(x)P_m(x)\,dx = 0 \tag{7.99}$$

To show this relation, we multiply Eq. (7.98a) on the left by $P_m(x)$ and integrate to obtain:

$$\int_{-1}^{1} P_m(x)\left\{\frac{d}{dx}\left[(1 - x^2)\frac{dP_l(x)}{dx}\right] + l(l+1)P_l(x)\right\}dx = 0 \tag{7.100}$$

Integrating the first term by parts (the boundary term vanishes because $1 - x^2 = 0$ at x = ±1), we obtain:

$$\int_{-1}^{1}\left[(x^2 - 1)\frac{dP_m(x)}{dx}\frac{dP_l(x)}{dx} + l(l+1)P_m(x)P_l(x)\right]dx = 0 \tag{7.101}$$

Similarly, we can write the ODE for $P_m(x)$ and multiply on the left by $P_l(x)$; this results in the equation:

$$\int_{-1}^{1}\left[(x^2 - 1)\frac{dP_l(x)}{dx}\frac{dP_m(x)}{dx} + m(m+1)P_l(x)P_m(x)\right]dx = 0 \tag{7.102}$$

Now, subtracting Eq. (7.101) from Eq. (7.102), we obtain:

$$[m(m+1) - l(l+1)]\int_{-1}^{1} P_l(x)P_m(x)\,dx = 0 \tag{7.103}$$

But because l ≠ m, this can only be satisfied if the integral is zero, which is the result that we are after.

6. Finally, we compute the normalization of the basis functions; that is, we compute:

$$\int_{-1}^{1} P_l(x)P_l(x)\,dx = N_l^2 \tag{7.104}$$

From Eq. (7.90), with l replaced by l − 1, we can write:

$$lP_l(x) - (2l-1)xP_{l-1}(x) + (l-1)P_{l-2}(x) = 0 \tag{7.105}$$

If we multiply this equation by $(2l+1)P_l(x)$ and subtract from it Eq. (7.90) multiplied by $(2l-1)P_{l-1}(x)$, the terms in x cancel and we obtain:

$$l(2l+1)P_l^2(x) + (2l+1)(l-1)P_l(x)P_{l-2}(x) - (l+1)(2l-1)P_{l-1}(x)P_{l+1}(x) - l(2l-1)P_{l-1}^2(x) = 0 \tag{7.106}$$

Now, integrating over the interval [−1, 1] and using the orthogonality result of Eq. (7.103) to drop the two cross terms, we obtain, for l = 2, 3, …:

$$\int_{-1}^{1} P_l^2(x)\,dx = \frac{(2l-1)}{(2l+1)}\int_{-1}^{1} P_{l-1}^2(x)\,dx \tag{7.107}$$

Repeated applications of this formula and the use of Eq.
(7.86) yield:

$$\int_{-1}^{1} P_l^2(x)\,dx = \frac{3}{(2l+1)}\int_{-1}^{1} P_1^2(x)\,dx = \frac{2}{(2l+1)} \tag{7.108}$$

Direct calculations show that this is also valid for l = 0 and l = 1. Therefore, the orthonormal basis functions are given by:

$$u_l = \sqrt{l + \tfrac{1}{2}}\;P_l(x) \tag{7.109}$$

The general theorem that summarizes the decomposition of a function into the Legendre polynomials basis states:

THEOREM If the real function f(x) defined over the interval [−1, 1] is piecewise smooth and if the integral $\int_{-1}^{1} f^2(x)\,dx < \infty$, then the series:

$$f(x) = \sum_{l=0}^{\infty} c_l P_l(x) \tag{7.110}$$

where

$$c_l = \left(l + \tfrac{1}{2}\right)\int_{-1}^{1} f(x)P_l(x)\,dx \tag{7.111}$$

converges to f(x) at every continuity point of the function. The proof of this theorem is not given here.

Example 7.10
Find the decomposition into Legendre polynomials of the following function:

$$f(x) = \begin{cases} 0 & \text{for } -1 \le x \le a \\ 1 & \text{for } a < x \le 1 \end{cases} \tag{7.112}$$

Solution: The conditions for the above theorem are satisfied, and

$$c_l = \left(l + \tfrac{1}{2}\right)\int_{a}^{1} P_l(x)\,dx \tag{7.113}$$

From Eq. (7.96), and noting that $P_l(1) = 1$, we find that:

$$c_0 = \tfrac{1}{2}(1 - a) \tag{7.114}$$

and, for l ≥ 1,

$$c_l = -\tfrac{1}{2}\left[P_{l+1}(a) - P_{l-1}(a)\right] \tag{7.115}$$

We show in Figure 7.4 the sum of the truncated decomposition for Example 7.10 for different values of $l_{max}$.

FIGURE 7.4 The plot of the truncated Legendre polynomials expansion of the discontinuous function given by Eq. (7.112), for a = 0.25. Top panel: $l_{max}$ = 4. Middle panel: $l_{max}$ = 8. Bottom panel: $l_{max}$ = 16.

7.9 MATLAB Commands Review

'        Transposition (i.e., for vectors with real components, this changes a row into a column).
norm     Computes the Euclidean length of a vector.
cross    Calculates the cross product of two 3-D vectors.
det      Determinant; used here to compute the triple scalar product.

8 Matrices

8.1 Setting up Matrices

DEFINITION A matrix is a collection of numbers arranged in a two-dimensional (2-D) array structure. Each element of the matrix, call it $M_{i,j}$, occupies the ith row and jth column.
$$M = \begin{pmatrix} M_{11} & M_{12} & M_{13} & \cdots & M_{1n} \\ M_{21} & M_{22} & M_{23} & \cdots & M_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ M_{m1} & M_{m2} & M_{m3} & \cdots & M_{mn} \end{pmatrix} \tag{8.1}$$

We say that M is an (m ⊗ n) matrix, which means that it has m rows and n columns. If m = n, we call the matrix square. If m = 1, the matrix is a row vector; and if n = 1, the matrix is a column vector.

8.1.1 Creating Matrices in MATLAB

8.1.1.1 Entering the Elements

In this method, the different elements of the matrix are keyed in; for example:

    M=[1 3 5 7 11; 13 17 19 23 29; 31 37 41 47 53]

gives

    M =
         1     3     5     7    11
        13    17    19    23    29
        31    37    41    47    53

To find the size of the matrix (i.e., the number of rows and columns), enter:

    size(M)

gives

    ans =
         3     5

To view a particular element, for example, the (2, 4) element, enter:

    M(2,4)

gives

    ans =
        23

To view a particular row such as the 3rd row, enter:

    M(3,:)

gives

    ans =
        31    37    41    47    53

To view a particular column such as the 4th column, enter:

    M(:,4)

gives

    ans =
         7
        23
        47

If we wanted to construct a submatrix of the original matrix, for example, one that includes the block from the 2nd to the 3rd row (included) and from the 2nd column to the 4th column (included), enter:

    M(2:3,2:4)

gives

    ans =
        17    19    23
        37    41    47

8.1.1.2 Retrieving Special Matrices from the MATLAB Library

MATLAB has some commonly used specialized matrices in its library that can be called as needed.
For example:

• The matrix of size (m ⊗ n) with all elements being zero is M=zeros(m,n). For example:

    M=zeros(3,4)

gives

    M =
         0     0     0     0
         0     0     0     0
         0     0     0     0

• The matrix of size (m ⊗ n) with all elements equal to 1 is N=ones(m,n). For example:

    N=ones(4,3)

produces

    N =
         1     1     1
         1     1     1
         1     1     1
         1     1     1

• The matrix of size (n ⊗ n) with only the diagonal elements equal to one, otherwise zero, is P=eye(n,n). For example:

    P=eye(4,4)

gives

    P =
         1     0     0     0
         0     1     0     0
         0     0     1     0
         0     0     0     1

• The matrix of size (n ⊗ n) with elements randomly chosen from the interval [0, 1], such as:

    Q=rand(4,4)

gives, in one instance:

    Q =
        0.9708    0.9901    0.7889    0.4387
        0.4983    0.2140    0.6435    0.3200
        0.9601    0.7266    0.4120    0.7446
        0.2679    0.4399    0.9334    0.6833

• We can extract the upper triangular part of the Q matrix, assigning the value zero to all the elements below the diagonal:

    upQ=triu(Q)

produces

    upQ =
        0.9708    0.9901    0.7889    0.4387
             0    0.2140    0.6435    0.3200
             0         0    0.4120    0.7446
             0         0         0    0.6833

or extract the lower triangular part of the Q matrix, assigning the value zero to all the elements above the diagonal:

    loQ=tril(Q)

produces

    loQ =
        0.9708         0         0         0
        0.4983    0.2140         0         0
        0.9601    0.7266    0.4120         0
        0.2679    0.4399    0.9334    0.6833

• The single quotation mark (') after the name of a matrix changes the matrix rows into its columns, and vice versa, if the elements are all real. If the matrix has complex numbers as elements, it also takes their complex conjugate in addition to the transposition.

• Other specialized matrices, including the whole family of sparse matrices, are also included in the MATLAB library. You can find more information about them in the help documentation.

8.1.1.3 Functional Construction of Matrices

The third method for generating matrices is to give, if it exists, an algorithm that generates each element of the matrix. For example, suppose we want to generate a Hilbert-type matrix of size (n ⊗ n), where n = 4 and the functional form of the elements is $M_{mn} = \dfrac{1}{m+n}$. (The matrix customarily called the Hilbert matrix has elements 1/(m + n − 1); the form used here differs from it only by a shift of the index.) The routine for generating this matrix will be as follows:

    M=zeros(4,4);
    for m=1:4
        for n=1:4
            M(m,n)=1/(m+n);
        end
    end
    M

We can also create new matrices by appending known matrices. For example, let the matrices A and B be given by:

    A=[1 2 3 4];
    B=[5 6 7 8];

We want to expand the matrix A by the matrix B along the horizontal (this is allowed only if both matrices have the same number of rows). Enter:

    C=[A B]

gives

    C =
         1     2     3     4     5     6     7     8

Or, we may want to expand A by stacking it on top of B (this is allowed only if both matrices have the same number of columns). Enter:

    D=[A;B]

produces

    D =
         1     2     3     4
         5     6     7     8

We illustrate the appending operations for larger matrices: define E as the (2 ⊗ 3) matrix with one for all its elements, which we desire to append horizontally to D. This is allowed because both have the same number of rows (= 2). Enter:

    E=ones(2,3)

produces

    E =
         1     1     1
         1     1     1

Enter:

    F=[D E]

produces

    F =
         1     2     3     4     1     1     1
         5     6     7     8     1     1     1

Or, we may want to stack two matrices in a vertical configuration. This requires that the two matrices have the same number of columns. Enter:

    G=ones(2,4)

gives

    G =
         1     1     1     1
         1     1     1     1

Enter:

    H=[D;G]

produces

    H =
         1     2     3     4
         5     6     7     8
         1     1     1     1
         1     1     1     1

Finally, the command sum applied to a matrix gives a row whose mth element is the sum of all the elements of the mth column of the original matrix. For example, entering:

    sum(H)

produces

    ans =
         8    10    12    14

8.2 Adding Matrices

Adding two matrices is only possible if they have equal numbers of rows and equal numbers of columns; or, said differently, they both have the same size. The addition operation is the obvious one. That is, the (m, n) element of the sum (A+B) is the sum of the (m, n) elements of A and B, respectively:

$$(A + B)_{mn} = A_{mn} + B_{mn} \tag{8.2}$$

Entering

    A=[1 2 3 4];
    B=[5 6 7 8];
    A+B

produces

    ans =
         6     8    10    12

Subtracting two matrices uses the same syntax as above, with the minus sign between the matrices.
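As a minimal sketch of Eq. (8.2), the script below checks that the two matrices have the same size, forms their sum and difference with the built-in operators, and repeats the sum element by element. The variable names sameSize, S, Dm, and S2 are illustrative choices, not MATLAB keywords.

```matlab
% Sketch: element-wise addition and subtraction of equal-size matrices.
A = [1 2 3 4];
B = [5 6 7 8];

% Addition as defined in Eq. (8.2) requires identical sizes.
sameSize = isequal(size(A), size(B));

S  = A + B;     % built-in sum
Dm = B - A;     % built-in difference

% The same sum, written out element by element as in Eq. (8.2).
S2 = zeros(size(A));
for m = 1:size(A,1)
    for n = 1:size(A,2)
        S2(m,n) = A(m,n) + B(m,n);
    end
end

disp(S)
disp(Dm)
disp(isequal(S, S2))
```

The explicit loop is shown only to mirror the definition; in practice the single statement A+B is both shorter and faster.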
8.3 Multiplying a Matrix by a Scalar

If we multiply a matrix by a number, each element of the matrix is multiplied by that number. Entering:

    3*A

produces

    ans =
         3     6     9    12

Entering:

    3*(A+B)

produces

    ans =
        18    24    30    36

8.4 Multiplying Matrices

Two matrices A(m ⊗ n) and B(r ⊗ s) can be multiplied only if n = r. The size of the product matrix is (m ⊗ s). An element of the product matrix is obtained from those of the constituent matrices through the following rule:

$$(AB)_{kl} = \sum_h A_{kh}B_{hl} \tag{8.3}$$

This result can also be interpreted by observing that the (k, l) element of the product is the dot product of the kth row of A and the lth column of B. In MATLAB, we denote the product of the matrices A and B by A*B.

Example 8.1
Write the different routines for performing the matrix multiplication from the different definitions of the matrix product.

Solution: Edit and execute the following script M-file:

    D=[1 2 3; 4 5 6];
    E=[3 6 9 12; 4 8 12 16; 5 10 15 20];
    F=D*E
    F1=zeros(2,4);
    for i=1:2
        for j=1:4
            for k=1:3
                F1(i,j)=F1(i,j)+D(i,k)*E(k,j);
            end
        end
    end
    F1
    F2=zeros(2,4);
    for i=1:2
        for j=1:4
            F2(i,j)=D(i,:)*E(:,j);
        end
    end
    F2

The result F is the one obtained using the MATLAB built-in matrix multiplication; the result F1 is that obtained from Eq. (8.3); and F2 is the answer obtained by performing, for each element of the matrix product, the dot product of the appropriate row from the first matrix with the appropriate column from the second matrix. Of course, all three results should give the same answer, which they do.

8.5 Inverse of a Matrix

In this section, we assume that we are dealing with square matrices (n ⊗ n) because these are the only class of matrices for which we can define an inverse.

DEFINITION A matrix $M^{-1}$ is called the inverse of matrix M if the following conditions are satisfied:

$$MM^{-1} = M^{-1}M = I \tag{8.4}$$

(The identity matrix is the (n ⊗ n) matrix with ones on the diagonal and zero everywhere else; the matrix eye(n,n) in MATLAB.)
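The defining conditions of Eq. (8.4) can be checked numerically. The sketch below uses an arbitrary illustrative (3 ⊗ 3) matrix and compares both products with the identity; because the inverse is computed in floating-point arithmetic, the comparison is made against a small tolerance (the value 1e-12 is an assumption chosen for this example) rather than by exact equality.

```matlab
% Sketch: numerical check of M*inv(M) = inv(M)*M = I, Eq. (8.4).
M    = [1 3 5; 7 11 13; 17 19 23];   % illustrative, non-singular matrix
Minv = inv(M);
I3   = eye(3);

% Size of the departure from the identity for each product
errRight = norm(M*Minv - I3);
errLeft  = norm(Minv*M - I3);

tol = 1e-12;                          % assumed round-off tolerance
isInverse = (errRight < tol) && (errLeft < tol);
disp(isInverse)
```

Testing with a tolerance rather than with == is the safer habit, since round-off typically leaves residuals of the order of the machine precision.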
EXISTENCE The existence of an inverse of a matrix hinges on the condition that the determinant of this matrix is non-zero [det(M) in MATLAB]. We leave the proof of this theorem to future courses in linear algebra. For now, the formula for generating the value of the determinant is given here.

• The determinant of a square matrix M, of size (n ⊗ n), is a number equal to:

$$\det(M) = \sum_P (-1)^P M_{1k_1}M_{2k_2}M_{3k_3}\cdots M_{nk_n} \tag{8.5}$$

where the sum runs over the n! permutations $P = (k_1, k_2, k_3, \ldots, k_n)$ of the first n integers. The sign in front of each term is positive if the number of transpositions relating $(1, 2, 3, \ldots, n)$ and $(k_1, k_2, k_3, \ldots, k_n)$ is even, while the sign is negative otherwise.

Example 8.2
Using the definition for a determinant, as given in Eq. (8.5), find the expression for the determinant of a (2 ⊗ 2) and a (3 ⊗ 3) matrix.

Solution:
a. If n = 2, there are only two possibilities for permuting these two numbers, giving the following: (1, 2) and (2, 1). In the first permutation, no transposition was necessary; that is, the multiplying factor in Eq. (8.5) is 1. In the second term, one transposition is needed; that is, the multiplying factor in Eq. (8.5) is −1, giving for the determinant the value:

$$\Delta = M_{11}M_{22} - M_{12}M_{21} \tag{8.6}$$

b.
If n = 3, there are six permutations of the sequence (1, 2, 3): namely, (1, 2, 3), (2, 3, 1), and (3, 1, 2), each of which is an even permutation, and (3, 2, 1), (2, 1, 3), and (1, 3, 2), which are odd permutations, thereby giving for the determinant the value:

$$\Delta = M_{11}M_{22}M_{33} + M_{12}M_{23}M_{31} + M_{13}M_{21}M_{32} - (M_{13}M_{22}M_{31} + M_{12}M_{21}M_{33} + M_{11}M_{23}M_{32}) \tag{8.7}$$

MATLAB Representation

Compute the determinant and the inverse of the matrices M and N, as keyed below:

    M=[1 3 5; 7 11 13; 17 19 23];
    detM=det(M)
    invM=inv(M)

gives

    detM =
       -84
    invM =
       -0.0714   -0.3095    0.1905
       -0.7143    0.7381   -0.2619
        0.6429   -0.3810    0.1190

On the other hand, entering:

    N=[2 4 6; 3 5 7; 5 9 13];
    detN=det(N)
    invN=inv(N)

produces

    detN =
         0

and, for invN, the message:

    Warning: Matrix is close to singular or badly scaled.

(Here the third row of N is the sum of the first two, so the determinant vanishes and no inverse exists.)

Homework Problems

Pb. 8.1 As earlier defined, a square matrix in which all elements above (below) the diagonal are zeros is called a lower (upper) triangular matrix. Show that the determinant of a triangular n ⊗ n matrix is

$$\det(T) = T_{11}T_{22}T_{33}\cdots T_{nn}$$

Pb. 8.2 If M is an n ⊗ n matrix and k is a constant, show that:

$$\det(kM) = k^n \det(M)$$

Pb. 8.3 Assuming the following result, which will be proven to you in linear algebra courses:

$$\det(MN) = \det(M) \times \det(N)$$

prove that if the inverse of the matrix M exists, then:

$$\det(M^{-1}) = \frac{1}{\det(M)}$$

8.6 Solving a System of Linear Equations

Let us assume that we have a system of n linear equations in n unknowns that we want to solve:

$$\begin{aligned} M_{11}x_1 + M_{12}x_2 + M_{13}x_3 + \cdots + M_{1n}x_n &= b_1 \\ M_{21}x_1 + M_{22}x_2 + M_{23}x_3 + \cdots + M_{2n}x_n &= b_2 \\ &\;\;\vdots \\ M_{n1}x_1 + M_{n2}x_2 + M_{n3}x_3 + \cdots + M_{nn}x_n &= b_n \end{aligned} \tag{8.8}$$

The above equations can be readily written in matrix notation:

$$\begin{pmatrix} M_{11} & M_{12} & M_{13} & \cdots & M_{1n} \\ M_{21} & M_{22} & M_{23} & \cdots & M_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ M_{n1} & M_{n2} & M_{n3} & \cdots & M_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{8.9}$$

or

$$MX = B \tag{8.10}$$

where the columns of b's and x's are denoted by B and X.
Multiplying, on the left, both sides of this matrix equation by $M^{-1}$, we find that:

$$X = M^{-1}B \tag{8.11}$$

As pointed out previously, remember that the condition for the existence of this (unique) solution is a non-zero value for the determinant of M.

Example 8.3
Use MATLAB to solve the system of equations given by:

$$\begin{aligned} x_1 + 3x_2 + 5x_3 &= 22 \\ 7x_1 + 11x_2 - 13x_3 &= -10 \\ 17x_1 + 19x_2 - 23x_3 &= -14 \end{aligned}$$

Solution: Edit and execute the following script M-file:

    M=[1 3 5; 7 11 -13; 17 19 -23];
    B=[22;-10;-14];
    detM=det(M);
    invM=inv(M);
    X=inv(M)*B

Verify that the vector X could also have been obtained using the backslash (left-division) notation: X=M\B.

NOTE In this and the immediately preceding chapter sections, we said very little about the algorithm used for computing essentially the inverse of a matrix. This is a subject that will be amply covered in your linear algebra courses. What the interested reader needs to know at this stage is that the Gaussian elimination technique (and its different refinements) is essentially the numerical method of choice for the built-in algorithms of numerical software, including MATLAB. The following two examples are essential building blocks in such constructions.

Example 8.4
Without using the MATLAB inverse command, solve the system of equations:

$$LX = B \tag{8.12}$$

where L is a lower triangular matrix.

Solution: In matrix form, the system of equations to be solved is

$$\begin{pmatrix} L_{11} & 0 & 0 & \cdots & 0 \\ L_{21} & L_{22} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & L_{n3} & \cdots & L_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{8.13}$$

The solution of this system can be directly obtained if we proceed iteratively.
That is, we find in the following order: $x_1, x_2, \ldots, x_n$, obtaining:

$$\begin{aligned} x_1 &= \frac{b_1}{L_{11}} \\ x_2 &= \frac{b_2 - L_{21}x_1}{L_{22}} \\ &\;\;\vdots \\ x_k &= \frac{b_k - \sum_{j=1}^{k-1} L_{kj}x_j}{L_{kk}} \end{aligned} \tag{8.14}$$

The above solution can be implemented by executing the following script M-file:

    L=[ ];              % enter the L matrix
    b=[ ];              % enter the B column
    n=length(b);
    x=zeros(n,1);
    x(1)=b(1)/L(1,1);
    for k=2:n
        x(k)=(b(k)-L(k,1:k-1)*x(1:k-1))/L(k,k);
    end
    x

Example 8.5
Solve the system of equations: UX = B, where U is an upper triangular matrix.

Solution: The matrix form of the problem becomes:

$$\begin{pmatrix} U_{11} & U_{12} & U_{13} & \cdots & U_{1n} \\ 0 & U_{22} & U_{23} & \cdots & U_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & U_{n-1\,n-1} & U_{n-1\,n} \\ 0 & 0 & 0 & \cdots & U_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix} \tag{8.15}$$

In this case, the solution of this system can also be directly obtained if we proceed iteratively, but this time in the backward order $x_n, x_{n-1}, \ldots, x_1$, obtaining:

$$\begin{aligned} x_n &= \frac{b_n}{U_{nn}} \\ x_{n-1} &= \frac{b_{n-1} - U_{n-1\,n}x_n}{U_{n-1\,n-1}} \\ &\;\;\vdots \\ x_k &= \frac{b_k - \sum_{j=k+1}^{n} U_{kj}x_j}{U_{kk}} \end{aligned} \tag{8.16}$$

The corresponding script M-file is

    U=[ ];              % enter the U matrix
    b=[ ];              % enter the B column
    n=length(b);
    x=zeros(n,1);
    x(n)=b(n)/U(n,n);
    for k=n-1:-1:1
        x(k)=(b(k)-U(k,k+1:n)*x(k+1:n))/U(k,k);
    end
    x

8.7 Application of Matrix Methods

This section provides seven representative applications that illustrate the immense power that matrix formulation and tools can provide to diverse problems of common interest in electrical engineering.

8.7.1 dc Circuit Analysis

Example 8.6
Find the voltages and currents for the circuit given in Figure 8.1.

FIGURE 8.1 Circuit of Example 8.6. [Circuit diagram not reproduced; the element values and the node and current labels are those appearing in the equations below.]

Solution: Using Kirchhoff's current and voltage laws and Ohm's law, we can write the following equations for the voltages and currents in the circuit, assuming that $R_L$ = 2 Ω:

$$\begin{aligned} V_1 &= 5 \\ V_1 - V_2 &= 50I_1 \\ V_2 - V_3 &= 100I_2 \\ V_2 &= 300I_3 \\ V_3 &= 2I_2 \\ I_1 &= I_2 + I_3 \end{aligned}$$

NOTE These equations can be greatly simplified if we use the method of elimination of variables.
This is essentially the method of node analysis covered in circuit theory courses. At this time, our purpose is to show a direct numerical method for obtaining the solutions.

If we form the column vector VI, the top three components referring to the voltages $V_1$, $V_2$, $V_3$, and the bottom three components referring to the currents $I_1$, $I_2$, $I_3$, then the following script M-file provides the solution to the above circuit:

    M=[1 0 0 0 0 0;1 -1 0 -50 0 0;0 1 -1 0 -100 0;...
    0 1 0 0 0 -300;0 0 1 0 -2 0;0 0 0 1 -1 -1];
    Vs=[5;0;0;0;0;0];
    VI=M\Vs

In-Class Exercise

Pb. 8.4 Use the same technique as shown in Example 8.6 to solve for the potentials and currents in the circuit given in Figure 8.2.

FIGURE 8.2 Circuit of Pb. 8.4. [Circuit diagram not reproduced. Labels recovered from the figure: nodes V1 to V4; currents I1 to I5; sources 7 V and 10 V; resistors 100 Ω, 50 Ω, 200 Ω, 100 Ω, and 300 kΩ.]

8.7.2 dc Circuit Design

In design problems, we are usually faced with the reverse of the direct analysis problem, such as the one solved in Section 8.7.1.

Example 8.7
Find the value of the lamp resistor in Figure 8.1, so that the current flowing through it is given, a priori.

Solution: We approach this problem by defining a function file for the relevant current. In this case, it is

    function ilamp=circuit872(RL)
    M=[1 0 0 0 0 0;1 -1 0 -50 0 0;0 1 -1 0 -100 0;...
    0 1 0 0 0 -300;0 0 1 0 -RL 0;0 0 0 1 -1 -1];
    Vs=[5;0;0;0;0;0];
    VI=M\Vs;
    ilamp=VI(5);

Then, from the command window, we proceed by calling this function and plotting the current in the lamp as a function of the resistance. Then we read graphically the value of RL that gives the desired current value.

In-Class Exercise

Pb. 8.5 For the circuit of Figure 8.1, find RL that gives a 22-mA current in the lamp. (Hint: Plot the current as a function of the load resistor.)
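The graphical procedure described above can be sketched as follows. To keep the script self-contained, the matrix of Example 8.7 is rebuilt inside a loop instead of calling the function file; the sweep range of 1 Ω to 300 Ω and the number of points are assumptions made for this illustration, and the variable names RL, ilamp, and q are arbitrary.

```matlab
% Sketch: lamp current of the circuit of Figure 8.1 versus load resistance.
RL    = linspace(1, 300, 600);     % assumed sweep of the lamp resistance (ohms)
ilamp = zeros(size(RL));
Vs    = [5;0;0;0;0;0];

for q = 1:length(RL)
    % Same six equations as in Example 8.7, with the lamp resistance RL(q)
    M = [1 0 0 0 0 0; 1 -1 0 -50 0 0; 0 1 -1 0 -100 0; ...
         0 1 0 0 0 -300; 0 0 1 0 -RL(q) 0; 0 0 0 1 -1 -1];
    VI = M\Vs;
    ilamp(q) = VI(5);              % fifth unknown is the lamp current I2
end

plot(RL, 1000*ilamp)
xlabel('R_L (\Omega)')
ylabel('Lamp current (mA)')
grid on
```

Reading the curve at the desired ordinate gives the design value of the lamp resistor; the grid makes this graphical reading easier.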
8.7.3 ac Circuit Analysis

Conceptually, there is no difference between performing a steady-state analysis of a circuit with purely resistive elements, as was done in Subsection 8.7.1, and performing the ac analysis for a circuit that includes capacitors and inductors, if we adopt the tool of impedance introduced in Section 6.8 and write the circuit equations with phasors instead. The only modification from an all-resistors circuit is that matrices now have complex numbers as elements, and the impedances have frequency dependence. For convenience, we illustrate again the relationships of the voltage-current phasors across resistors, inductors, and capacitors:

$$\tilde{V}_R = \tilde{I}R \tag{8.17}$$

$$\tilde{V}_L = \tilde{I}(j\omega L) \tag{8.18}$$

$$\tilde{V}_C = \frac{\tilde{I}}{(j\omega C)} \tag{8.19}$$

and restate Kirchhoff's laws again:

• Kirchhoff's voltage law: The sum of all voltage drops around a closed loop is balanced by the sum of all voltage sources around the same loop.
• Kirchhoff's current law: The algebraic sum of all currents entering (exiting) a circuit node must be zero.

In-Class Exercise

Pb. 8.6 In a bridged-T filter, the voltage $V_s(t)$ is the input voltage, and the output voltage is that across the load resistor $R_L$. The circuit is given in Figure 8.3.

FIGURE 8.3 Bridged-T filter. Circuit of Pb. 8.6. [Circuit diagram not reproduced. Labels recovered from the figure: source Vs, inductor L, resistors R1 and R2, capacitor C, load RL, output Vout.]

Assuming that $R_1 = R_2$ = 3 Ω, $R_L$ = 2 Ω, C = 0.25 F, and L = 1 H:
a. Write the equations for the phasors of the voltages and currents.
b. Form the matrix representation for the equations found in part (a).
c. Plot the magnitude and phase of $\tilde{V}_{out}/\tilde{V}_S$ as a function of the frequency.
d.
Compare the results obtained in part (c) with the analytical results of the problem, given by:

$$\frac{\tilde{V}_{out}}{\tilde{V}_S} = \frac{N(\omega)}{D(\omega)}$$

$$N(\omega) = R_2R_L(R_1 + R_2) + j\omega R_2^2(L + CR_1R_L)$$

$$D(\omega) = R_2[R_1R_L + R_2R_L - \omega^2LCR_1(R_2 + R_L)] + j\omega[L(R_1R_2 + R_1R_L + R_2R_L) + CR_1R_2^2R_L]$$

8.7.4 Accuracy of a Truncated Taylor Series

In this subsection and Subsection 8.7.5, we illustrate the use of matrices as a convenient constructional tool to state and manipulate problems with two indices. In this application, we desire to verify the accuracy of the truncated Taylor series $S = \sum_{n=0}^{N-1} \dfrac{x^n}{n!}$, which keeps the first N terms, as an approximation to the function y = exp(x) over the interval 0 ≤ x ≤ 1. Because this application's purpose is to illustrate a constructional scheme, we write the code lines as we proceed with the different computational steps:

1. We start by dividing the (0, 1) interval into equally spaced segments. This array is given by:

    x=[0:0.01:1];
    M=length(x);

2. Assume that we are truncating the series after N = 10 terms:

    N=10;

3. Construct the matrix W, with M rows and N columns, having the following form:

$$W = \begin{pmatrix} 1 & x_1 & \dfrac{x_1^2}{2!} & \dfrac{x_1^3}{3!} & \cdots & \dfrac{x_1^{N-1}}{(N-1)!} \\ 1 & x_2 & \dfrac{x_2^2}{2!} & \dfrac{x_2^3}{3!} & \cdots & \dfrac{x_2^{N-1}}{(N-1)!} \\ 1 & x_3 & \dfrac{x_3^2}{2!} & \dfrac{x_3^3}{3!} & \cdots & \dfrac{x_3^{N-1}}{(N-1)!} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_M & \dfrac{x_M^2}{2!} & \dfrac{x_M^3}{3!} & \cdots & \dfrac{x_M^{N-1}}{(N-1)!} \end{pmatrix} \tag{8.20}$$

Specify the size of W, and then give the induction rule to go from one column to the next:

$$W(i,j) = \frac{x(i)\,W(i,j-1)}{j-1} \tag{8.21}$$

This is implemented in the code as follows:

    W=ones(M,N);
    for i=1:M
        for j=2:N
            W(i,j)=x(i)*W(i,j-1)/(j-1);
        end
    end

4. The value of the truncated series at a specific point is the sum of the row elements corresponding to its index; however, since the MATLAB command sum acting on a matrix adds the column elements, we take the sum of the adjoint of W (the matrix obtained, for real elements, by changing the rows to columns and vice versa) to obtain our result. Consequently, add to the code:

    serexp=sum(W');

5.
Finally, compare the values of the truncated series with those of the exponential function:

    y=exp(x);
    plot(x,serexp,x,y,'--')

In examining the plot resulting from executing the above instructions, we observe that the truncated series gives a very good approximation to the exponential over the whole interval. If you would also like to check the error of the approximation as a function of x, enter:

    dy=abs(y-serexp);
    semilogy(x,dy)

Examining the output graph, you will find, as expected, that the error increases with an increase in the value of x. However, the approximation of the exponential by the partial sum of the first ten elements of the truncated Taylor series is accurate over the whole domain considered, to an accuracy of better than one part per million.

Question: Could you have estimated the maximum error in the above computed value of dy by evaluating the first neglected term in the Taylor series at x = 1?

In-Class Exercise

Pb. 8.7 Verify the accuracy of truncating at the fifth element the following Taylor series, in a domain that you need to specify, so the error is everywhere less than one part in 10,000:

a. $\ln(1 + x) = \sum_{n=1}^{\infty} (-1)^{n+1}\dfrac{x^n}{n}$

b. $\sin(x) = \sum_{n=0}^{\infty} (-1)^n\dfrac{x^{2n+1}}{(2n+1)!}$

c. $\cos(x) = \sum_{n=0}^{\infty} (-1)^n\dfrac{x^{2n}}{(2n)!}$

8.7.5 Reconstructing a Function from Its Fourier Components

From the results of Section 7.9, where we discussed the Fourier series, it is a simple matter to show that any even periodic function with period 2π can be written in the form of a cosine series, and that an odd periodic function can be written in the form of a sine series of the fundamental frequency and its higher harmonics.

Knowing the coefficients of its Fourier series, we would like to plot the function over a period. The purpose of the following example is two-fold:

1. On the mechanistic side, to illustrate again the setting up of a two-index problem in matrix form.
2.
On the mathematical contents side, to examine the effects of truncating a Fourier series on the resulting curve.

Example 8.8
Plot $y(x) = \sum_{k=1}^{M} C_k\cos(kx)$, if $C_k = \dfrac{(-1)^k}{k^2 + 1}$. Choose successively for M the values 5, 20, and 40.

Solution: Edit and execute the following script M-file:

    M= ;                    % enter the cutoff value
    p=500;
    k=1:M;
    n=0:p;
    x=(2*pi/p)*n;
    a=cos((2*pi/p)*n'*k);
    c=((-1).^k)./(k.^2+1);
    y=a*c';
    plot(x,y)
    axis([0 2*pi -1 1.2])

Draw in your notebook the approximate shape of the resulting curve for different values of M.

In-Class Exercises

Pb. 8.8 For different values of the cutoff, plot the resulting curves for the functions given by the following Fourier series:

$$y_1(x) = \frac{8}{\pi^2}\sum_{k=1}^{\infty} \frac{1}{(2k-1)^2}\cos((2k-1)x)$$

$$y_2(x) = \frac{4}{\pi}\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{(2k-1)}\cos((2k-1)x)$$

$$y_3(x) = \frac{2}{\pi}\sum_{k=1}^{\infty} \frac{1}{(2k-1)}\sin((2k-1)x)$$

Pb. 8.9 The purpose of this problem is to explore the Gibbs phenomenon. This phenomenon occurs as a result of truncating the Fourier series of a discontinuous function. Examine, for example, this phenomenon in detail for the function $y_3(x)$ given in Pb. 8.8. The function under consideration is given analytically by:

$$y_3(x) = \begin{cases} 0.5 & \text{for } 0 < x < \pi \\ -0.5 & \text{for } \pi < x < 2\pi \end{cases}$$

a. Find the value where the truncated Fourier series overshoots the value of 0.5. (Answer: The limiting value of this first maximum is 0.58949.)
b. Find the limiting value of the first local minimum. (Answer: The limiting value of this first minimum is 0.45142.)
c. Derive, from first principles, the answers to parts (a) and (b). (Hint: Look up in a standard integral table the sine integral function.)

NOTE An important goal of filter theory is to find methods to smooth these kinds of oscillations.
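The overshoot quoted in Pb. 8.9(a) can be observed numerically with the same matrix construction as in Example 8.8. The following is a minimal sketch, assuming a cutoff of K = 100 odd harmonics and a fine grid restricted to the neighborhood of the discontinuity at x = 0; both choices, and the variable names K, peak, and xPeak, are illustrative.

```matlab
% Sketch: first maximum of the truncated sine series y3(x) of Pb. 8.8.
K = 100;                          % assumed number of retained odd harmonics
k = 1:K;
x = linspace(0, 0.2, 4001);       % fine grid near the jump at x = 0

% Rows index the points x, columns index the harmonics (2k-1)
a  = sin(x' * (2*k-1));
c  = 1./(2*k-1);
y3 = (2/pi) * a * c';

[peak, idx] = max(y3);            % height and location of the first maximum
xPeak = x(idx);
disp([xPeak peak])
```

With these settings the peak should sit close to the limiting value stated in the problem, at a position that moves toward the discontinuity as K is increased while the height stays essentially unchanged; this persistence of the overshoot is the signature of the Gibbs phenomenon.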
8.7.6 Interpolating the Coefficients of an (n – 1)-degree Polynomial from n Points

The problem at hand can be posed as follows: Given the coordinates of n points: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we want to find the polynomial of degree (n – 1), denoted by $p_{n-1}(x)$, whose curve passes through these points.

Let us assume that the polynomial has the following form:

$$p_{n-1}(x) = a_1 + a_2x + a_3x^2 + \cdots + a_nx^{n-1} \tag{8.22}$$

From a knowledge of the column vectors X and Y, we can formulate this problem in the standard linear system form. In particular, in matrix form, we can write:

$$V * A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = Y \tag{8.23}$$

Knowing the matrix V and the column Y, it is then a trivial matter to deduce the column A:

$$A = V^{-1} * Y \tag{8.24}$$

What remains to be done is to generate in an efficient manner the matrix V using the column vector X as input. We note the following recursion relation for the elements of V:

$$V(k, j) = x(k) * V(k, j - 1) \tag{8.25}$$

Furthermore, the first column of V has all its elements equal to 1. The following routine computes A:

    X=[x1;x2;x3;.......;xn];
    Y=[y1;y2;y3;.......;yn];
    n=length(X);
    V=ones(n,n);
    for j=2:n
        V(:,j)=X.*V(:,j-1);
    end
    A=V\Y

In-Class Exercises

Find the polynomials that are defined through:

Pb. 8.10 The points (1, 5), (2, 11), and (3, 19).
Pb. 8.11 The points (1, 8), (2, 39), (3, 130), (4, 341), and (5, 756).

8.7.7 Least Square Fit of Data

In Section 8.7.6, we found the polynomial of degree (n – 1) that was uniquely determined by the coordinates of n points on its curve. However, when data fitting is the tool used by experimentalists to verify a theoretical prediction, many more points than the minimum are measured in order to minimize the effects of random errors generated in the acquisition of the data.
But this over-determination of the system parameters presents us with the dilemma of what confidence level to give to the accuracy of specific data points, and which data points to accept or reject. A priori, one takes all data points, and resorts to a determination of the vector A whose corresponding polynomial comes closest to all the experimental points. Closeness is defined through the Euclidean distance between the experimental points and the predicted curve. This method for minimizing the sum of the squares of the Euclidean distances between the optimal curve and the experimental points is referred to as the least-square fit of the data.

To have a geometrical understanding of what we are attempting to do, consider the conceptually analogous problem in 3-D of having to find the plane with the least total square distance from five given data points. So what do we do? Using the projection procedure derived in Chapter 7, we deduce each point's distance from the plane; then we go ahead and adjust the parameters of the plane equation to obtain the smallest total square distance between the points and the plane.

In linear algebra courses, using generalized optimization techniques, you will be shown that the best fit to A (i.e., the one called least-square fit) is given (using the notation of the previous subsection) by:

$$A_N = (V^TV)^{-1}V^TY \tag{8.26}$$

A MATLAB routine to fit a number (n) of points to a polynomial of degree (m – 1) now reads:

    X=[x1;x2;x3;.......;xn];
    Y=[y1;y2;y3;.......;yn];
    n=length(X);
    m= ;                    % (m-1) is the degree of the polynomial
    V=ones(n,m);
    for j=2:m
        V(:,j)=X.*V(:,j-1);
    end
    AN=inv(V'*V)*(V'*Y)

MATLAB also has a built-in command to achieve the least-square fit of data. Look up the polyfit function in your help documentation, learn its use, and point out what difference exists between its notation and that of the above routine.

In-Class Exercise

Pb.
8.12 Find the second-degree polynomial that best fits the data points: (1, 8.1), (2, 24.8), (3, 52.5), (4, 88.5), (5, 135.8), and (6, 193.4).

8.8 Eigenvalues and Eigenvectors*

DEFINITION If M is a square n ⊗ n matrix, then a nonzero vector $|v\rangle$ is called an eigenvector and λ, a scalar, is called an eigenvalue, if they satisfy the relation:

\[ M |v\rangle = \lambda |v\rangle \tag{8.27} \]

that is, the vector $M|v\rangle$ is a scalar multiplied by the vector $|v\rangle$.

8.8.1 Finding the Eigenvalues of a Matrix

To find the eigenvalues, note that the above definition of eigenvectors and eigenvalues can be rewritten in the following form:

\[ (M - \lambda I) |v\rangle = 0 \tag{8.28} \]

where I is the identity n ⊗ n matrix. The above set of homogeneous equations admits a nontrivial solution only if the determinant of the matrix multiplying the vector $|v\rangle$ is zero. Therefore, the eigenvalues are the roots of the polynomial p(λ), defined as follows:

\[ p(\lambda) = \det(M - \lambda I) \tag{8.29} \]

Setting this polynomial equal to zero gives what is called the characteristic equation of the matrix M. It is of degree n in λ. (This last assertion can be proven by noting that the product of the diagonal elements of the matrix (M – λI) contributes a term proportional to $\lambda^n$ to the expression of the determinant.)
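As a minimal numerical sketch of the statement that the eigenvalues are the roots of the characteristic polynomial, the lines below compare the two for an arbitrarily chosen 2 ⊗ 2 matrix. The matrix and the variable names are illustrative assumptions, not taken from the text. Note that the built-in poly returns the coefficients of det(λI – M), which differs from Eq. (8.29) only by the overall factor $(-1)^n$ and therefore has the same roots.

```matlab
% Illustrative matrix (an arbitrary choice, used only for this check)
M = [4 1; 2 3];

% Coefficients of det(lambda*I - M), highest power of lambda first
c = poly(M);

% Roots of the characteristic polynomial, sorted in ascending order
lambda_roots = sort(roots(c));

% Eigenvalues computed directly, sorted the same way
lambda_eig = sort(eig(M));

% Display the two results side by side for comparison
disp([lambda_roots lambda_eig])
```

The two displayed columns should coincide up to round-off; the same comparison can be repeated for any square matrix of modest size.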
Example 8.9 Find the eigenvalues and the eigenvectors of the matrix M, defined as follows:

\[ M = \begin{pmatrix} 2 & 4 \\ 1/2 & 3 \end{pmatrix} \]

Solution: The characteristic polynomial for this matrix is given by:

\[ p(\lambda) = (2 - \lambda)(3 - \lambda) - (4)(1/2) = \lambda^2 - 5\lambda + 4 \]

The roots of this polynomial (i.e., the eigenvalues of the matrix) are, respectively, $\lambda_1 = 1$ and $\lambda_2 = 4$.

To find the eigenvectors corresponding to the above eigenvalues, which we shall denote respectively by $|v_1\rangle$ and $|v_2\rangle$, we must satisfy the following two equations separately:

\[ \begin{pmatrix} 2 & 4 \\ 1/2 & 3 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = 1 \begin{pmatrix} a \\ b \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2 & 4 \\ 1/2 & 3 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = 4 \begin{pmatrix} c \\ d \end{pmatrix} \]

From the first set of equations, we deduce that b = –a/4; and from the second set of equations that d = c/2. Because each eigenvector is defined only up to an arbitrary multiplicative constant (which absorbs the overall sign), we can write for the eigenvectors $|v_1\rangle$ and $|v_2\rangle$ the following expressions:

\[ |v_1\rangle = a \begin{pmatrix} -1 \\ 1/4 \end{pmatrix} \qquad |v_2\rangle = c \begin{pmatrix} -1 \\ -1/2 \end{pmatrix} \]

It is common to give the eigenvectors in the normalized form; that is, to fix a and c to make $\langle v_1 | v_1 \rangle = \langle v_2 | v_2 \rangle = 1$, thus giving for $|v_1\rangle$ and $|v_2\rangle$ the normalized values:

\[ |v_1\rangle = \sqrt{\frac{16}{17}} \begin{pmatrix} -1 \\ 1/4 \end{pmatrix} = \begin{pmatrix} -0.9701 \\ 0.2425 \end{pmatrix} \qquad |v_2\rangle = \sqrt{\frac{4}{5}} \begin{pmatrix} -1 \\ -1/2 \end{pmatrix} = \begin{pmatrix} -0.8944 \\ -0.4472 \end{pmatrix} \]

8.8.2 Finding the Eigenvalues and Eigenvectors Using MATLAB

Given a matrix M, the MATLAB command to find the eigenvectors and eigenvalues is given by [V,D]=eig(M); the columns of V are the eigenvectors and D is a diagonal matrix whose elements are the eigenvalues. Entering the matrix M and the eigensystem commands gives:

    V =
       -0.9701   -0.8944
        0.2425   -0.4472
    D =
         1     0
         0     4

Finding the matrices V and D is referred to as diagonalizing the matrix M. It should be noted that this is not always possible. For example, a matrix may fail to be diagonalizable when the characteristic polynomial has repeated roots and there are too few linearly independent eigenvectors to form an invertible V. In courses of linear algebra, you will study the necessary and sufficient conditions for M to be diagonalizable.

In-Class Exercises

Pb. 8.13 Show that if $M|v\rangle = \lambda|v\rangle$, then $M^n|v\rangle = \lambda^n|v\rangle$. That is, the eigenvalues of $M^n$ are $\lambda^n$; however, the eigenvectors $|v\rangle$ remain the same as those of M.
Verify this theorem using the choice in Example 8.9 for the matrix M.

Pb. 8.14 Find the eigenvalues of the upper triangular matrix:

\[ T = \begin{pmatrix} 1/4 & -1 & 2 \\ 0 & 1/2 & -3 \\ 0 & 0 & 1 \end{pmatrix} \]

Generalize your result to prove analytically that the eigenvalues of any triangular matrix are its diagonal elements. (Hint: Use the previously derived result in Pb. 8.1 for the expression of the determinant of a triangular matrix.)

Pb. 8.15 A general theorem, which will be proven to you in linear algebra courses, states that if a matrix is diagonalizable, then, using the above notation:

\[ V D V^{-1} = M \]

Verify this theorem for the matrix M of Example 8.9.

a. Using this theorem, show that:

\[ \det(M) = \det(D) = \prod_{i=1}^{n} \lambda_i \]

b. Also show that:

\[ V D^n V^{-1} = M^n \]

c. Apply this theorem to compute the matrix $M^5$, for the matrix M of Example 8.9.

Pb. 8.16 Find the non-zero eigenvalues of the 2 ⊗ 2 matrix A that satisfies the equation:

\[ A = A^3 \]

Homework Problems

The function of a matrix can formally be defined through a Taylor series expansion. For example, the exponential of a matrix M can be defined through:

\[ \exp(M) = \sum_{n=0}^{\infty} \frac{M^n}{n!} \]

Pb. 8.17 Use the results from Pb. 8.15 to show that:

\[ \exp(M) = V \exp(D) V^{-1} \]

where, for any diagonal matrix:

\[ \exp \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = \begin{pmatrix} \exp(\lambda_1) & 0 & \cdots & 0 \\ 0 & \exp(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \exp(\lambda_n) \end{pmatrix} \]

Pb. 8.18 Using the results from Pb. 8.17, we deduce a direct technique for solving the initial value problem for any system of coupled linear ODEs with constant coefficients. Find and plot the solutions in the interval 0 ≤ t ≤ 1 for the following set of ODEs:

\[ \frac{dx_1}{dt} = x_1 + 2x_2 \qquad \frac{dx_2}{dt} = 2x_1 - 2x_2 \]

with the initial conditions: $x_1(0) = 1$ and $x_2(0) = 3$. (Hint: The solution of $\frac{dX}{dt} = AX$ is $X(t) = \exp(At)X(0)$, where X is a time-dependent vector and A is a time-independent matrix.)

Pb. 8.19 MATLAB has a shortcut for computing the exponential of a matrix.
While the command exp(M) takes the exponential of each element of the matrix, the command expm(M) computes the matrix exponential. Verify your results for Pb. 8.18 using this built-in function.

8.9 The Cayley-Hamilton and Other Analytical Techniques*

In Section 8.8, we presented the general techniques for computing the eigenvalues and eigenvectors of square matrices, and showed their power in solving systems of coupled linear differential equations. In this section, we add to our arsenal of analytical tools some techniques that are particularly powerful when elegant solutions are desired in low-dimensional problems. We start with the Cayley-Hamilton theorem.

8.9.1 Cayley-Hamilton Theorem

The matrix M satisfies its own characteristic equation.

PROOF As per Eq. (8.29), the characteristic equation for a matrix is given by:

\[ p(\lambda) = \det(M - \lambda I) = 0 \tag{8.30} \]

Let us now form the polynomial of the matrix M having the same coefficients as that of the characteristic equation, p(M). Using the result from Pb. 8.15, and assuming that the matrix is diagonalizable, we can write for this polynomial:

\[ p(M) = V p(D) V^{-1} \tag{8.31} \]

where

\[ p(D) = \begin{pmatrix} p(\lambda_1) & 0 & \cdots & 0 \\ 0 & p(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p(\lambda_n) \end{pmatrix} \tag{8.32} \]

However, we know that $\lambda_1, \lambda_2, \ldots, \lambda_{n-1}, \lambda_n$ are all roots of the characteristic equation. Therefore,

\[ p(\lambda_1) = p(\lambda_2) = \cdots = p(\lambda_{n-1}) = p(\lambda_n) = 0 \tag{8.33} \]

thus giving:

\[ p(D) = 0 \tag{8.34} \]

\[ \Rightarrow \; p(M) = 0 \tag{8.35} \]

Example 8.10 Using the Cayley-Hamilton theorem, find the inverse of the matrix M given in Example 8.9.
Solution: The characteristic equation for this matrix is given by:

\[ p(M) = M^2 - 5M + 4I = 0 \]

Now multiply this equation by $M^{-1}$ to obtain:

\[ M - 5I + 4M^{-1} = 0 \]

and

\[ \Rightarrow \; M^{-1} = 0.25(5I - M) = \begin{pmatrix} 3/4 & -1 \\ -1/8 & 1/2 \end{pmatrix} \]

Example 8.11 Reduce the following fourth-order polynomial in M, where M is given in Example 8.9, to a first-order polynomial in M:

\[ P(M) = M^4 + M^3 + M^2 + M + I \]

Solution: From the results of Example 8.10, we have:

\[ M^2 = 5M - 4I \]
\[ M^3 = 5M^2 - 4M = 5(5M - 4I) - 4M = 21M - 20I \]
\[ M^4 = 21M^2 - 20M = 21(5M - 4I) - 20M = 85M - 84I \]
\[ \Rightarrow \; P(M) = 112M - 107I \]

Verify the answer numerically using MATLAB.

8.9.2 Solution of Equations of the Form dX/dt = AX

We sketched a technique in Pb. 8.17 that uses the eigenvectors matrix and solves this equation. In Example 8.12, we solve the same problem using the Cayley-Hamilton technique.

Example 8.12 Using the Cayley-Hamilton technique, solve the system of equations:

\[ \frac{dx_1}{dt} = x_1 + 2x_2 \qquad \frac{dx_2}{dt} = 2x_1 - 2x_2 \]

with the initial conditions: $x_1(0) = 1$ and $x_2(0) = 3$.

Solution: The matrix A for this system is given by:

\[ A = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix} \]

and the solution of this system is given by:

\[ X(t) = e^{At} X(0) \]

Given that A is a 2 ⊗ 2 matrix, we know from the Cayley-Hamilton result that the exponential function of A can be written as a first-order polynomial in A; thus:

\[ P(A) = e^{At} = aI + bA \]

To determine a and b, we note that the polynomial equation holds as well for the eigenvalues of A, which are equal to –3 and 2; therefore:

\[ e^{-3t} = a - 3b \quad \text{and} \quad e^{2t} = a + 2b \]

giving:

\[ a = \frac{2}{5} e^{-3t} + \frac{3}{5} e^{2t} \quad \text{and} \quad b = \frac{1}{5} e^{2t} - \frac{1}{5} e^{-3t} \]

\[ e^{At} = \begin{pmatrix} \frac{1}{5} e^{-3t} + \frac{4}{5} e^{2t} & \frac{2}{5} e^{2t} - \frac{2}{5} e^{-3t} \\ \frac{2}{5} e^{2t} - \frac{2}{5} e^{-3t} & \frac{4}{5} e^{-3t} + \frac{1}{5} e^{2t} \end{pmatrix} \]

Therefore, the solution of the system of equations is

\[ X(t) = \begin{pmatrix} 2e^{2t} - e^{-3t} \\ e^{2t} + 2e^{-3t} \end{pmatrix} \]

8.9.3 Solution of Equations of the Form dX/dt = AX + B(t)

Multiplying this equation on the left by $e^{-At}$, we obtain:

\[ e^{-At} \frac{dX}{dt} = e^{-At} A X + e^{-At} B(t) \tag{8.36} \]

Rearranging terms, we write this equation as:

\[ e^{-At} \frac{dX}{dt} - e^{-At} A X = e^{-At} B(t) \tag{8.37} \]

We note that the LHS of this
equation is the derivative of $e^{-At}X$. Therefore, we can now write Eq. (8.37) as:

\[ \frac{d}{dt} \left[ e^{-At} X(t) \right] = e^{-At} B(t) \tag{8.38} \]

This can be directly integrated to give:

\[ \left[ e^{-At} X(t) \right]_0^t = \int_0^t e^{-A\tau} B(\tau)\, d\tau \tag{8.39} \]

or, written differently as:

\[ e^{-At} X(t) - X(0) = \int_0^t e^{-A\tau} B(\tau)\, d\tau \tag{8.40a} \]

which leads to the standard form of the solution:

\[ X(t) = e^{At} X(0) + \int_0^t e^{A(t-\tau)} B(\tau)\, d\tau \tag{8.40b} \]

We illustrate the use of this solution in finding the classical motion of an electron in the presence of both an electric field and a magnetic flux density.

Example 8.13 Find the motion of an electron in the presence of a constant electric field and a constant magnetic flux density that are parallel.

Solution: Let the electric field and the magnetic flux density be given by:

\[ \vec{E} = E_0 \hat{e}_3 \qquad \vec{B} = B_0 \hat{e}_3 \]

Newton's equation of motion in the presence of both an electric field and a magnetic flux density is written as:

\[ m \frac{d\vec{v}}{dt} = q(\vec{E} + \vec{v} \times \vec{B}) \]

where $\vec{v}$ is the velocity of the electron, and m and q are its mass and charge, respectively. Writing this equation in component form, it reduces to the following matrix equation:

\[ \frac{d}{dt} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \alpha \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \]

where $\alpha = \dfrac{qB_0}{m}$ and $\beta = \dfrac{qE_0}{m}$.

This equation can be put in the above standard form for an inhomogeneous first-order equation if we make the following identifications:

\[ A = \alpha \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \beta \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \]

First, we note that the matrix A is block diagonal; that is, all off-diagonal elements with 3 as either the row or column index are zero, and therefore we can separately do the exponentiation of the third component, giving $e^0 = 1$; the exponentiation of the top block can be performed along the same steps, using the Cayley-Hamilton techniques from Example 8.12, giving finally:

\[ e^{At} = \begin{pmatrix} \cos(\alpha t) & \sin(\alpha t) & 0 \\ -\sin(\alpha t) & \cos(\alpha t) & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

Therefore, we can write the solutions for the electron's velocity components as follows:

\[ \begin{pmatrix} v_1(t) \\ v_2(t) \\ v_3(t) \end{pmatrix} = \begin{pmatrix} \cos(\alpha t) & \sin(\alpha t) & 0 \\ -\sin(\alpha t) & \cos(\alpha t) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1(0) \\ v_2(0) \\ v_3(0) \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 0 \\ t \end{pmatrix} \]

or equivalently:

\[ v_1(t) = v_1(0) \cos(\alpha t) + v_2(0) \sin(\alpha t) \]
\[ v_2(t) = -v_1(0) \sin(\alpha t) + v_2(0) \cos(\alpha t) \]
\[ v_3(t) = v_3(0) + \beta t \]

In-Class Exercises

Pb. 8.20 Plot the 3-D curve, with time as parameter, for the tip of the velocity vector of an electron with an initial velocity $\vec{v} = v_0 \hat{e}_1$, where $v_0 = 10^5$ m/s, entering a region of space where a constant electric field and a constant magnetic flux density are present and are described by: $\vec{E} = E_0 \hat{e}_3$, where $E_0 = -10^4$ V/m, and $\vec{B} = B_0 \hat{e}_3$, where $B_0 = 10^{-2}$ Wb/m². The mass of the electron is $m_e = 9.1094 \times 10^{-31}$ kg, and the magnitude of the electron charge is $e = 1.6022 \times 10^{-19}$ C.

Pb. 8.21 Integrate the expression of the velocity vector in Pb. 8.20 to find the parametric equations of the electron position vector for the preceding problem configuration, and plot its 3-D curve. Let the origin of the axes be fixed to where the electron enters the region of the electric and magnetic fields.

Pb. 8.22 Find the parametric equations for the electron velocity if the electric field and the magnetic flux density are still parallel, the magnetic flux density is still constant, but the electric field is now described by $\vec{E} = E_0 \cos(\omega t) \hat{e}_3$.
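The closed-form results of Example 8.13 can be checked numerically. The sketch below is only an illustration under stated assumptions: the values of alpha, beta, the initial velocity, and the test time are arbitrary dimensionless numbers chosen for the check (they are not the physical values of Pb. 8.20), and the helper names eAt and v are hypothetical. It compares the closed-form matrix exponential with MATLAB's expm, and verifies by a central finite difference that the closed-form velocity satisfies dX/dt = AX + B.

```matlab
% Illustrative (dimensionless) parameters, chosen only for this check
alpha = 2.0;
beta  = 0.5;
A  = alpha*[0 1 0; -1 0 0; 0 0 0];
B  = beta*[0; 0; 1];
v0 = [1; 2; 3];                      % arbitrary initial velocity

% Closed-form matrix exponential and velocity from Example 8.13
eAt = @(t) [ cos(alpha*t) sin(alpha*t) 0; ...
            -sin(alpha*t) cos(alpha*t) 0; ...
             0            0            1];
v = @(t) eAt(t)*v0 + beta*[0; 0; t];

t = 0.7;                             % arbitrary test time

% Check 1: closed-form exponential against the built-in expm
err_exp = norm(expm(A*t) - eAt(t));

% Check 2: the closed-form velocity satisfies dv/dt = A*v + B
h = 1e-5;                            % step for the central difference
dvdt = (v(t+h) - v(t-h))/(2*h);
err_ode = norm(dvdt - (A*v(t) + B));

fprintf('expm mismatch: %g, ODE residual: %g\n', err_exp, err_ode);
```

Both printed numbers should be small compared with the size of the velocity components; the ODE residual is limited by the finite-difference step rather than by the formula itself.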
Example 8.14 Find the motion of an electron in the presence of a constant electric field and a constant magnetic flux density perpendicular to it.

Solution: Let the electric field and the magnetic flux density be given by:

\[ \vec{E} = E_0 \hat{e}_3 \qquad \vec{B} = B_0 \hat{e}_1 \]

The matrix A is given in this instance by:

\[ A = \alpha \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} \]

while the vector B is still given by:

\[ B = \beta \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \]

The matrix $e^{At}$ is now given by:

\[ e^{At} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha t) & \sin(\alpha t) \\ 0 & -\sin(\alpha t) & \cos(\alpha t) \end{pmatrix} \]

and the solution for the velocity vector is for this configuration given, using Eq. (8.40b), by:

\[ \begin{pmatrix} v_1(t) \\ v_2(t) \\ v_3(t) \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha t) & \sin(\alpha t) \\ 0 & -\sin(\alpha t) & \cos(\alpha t) \end{pmatrix} \begin{pmatrix} v_1(0) \\ v_2(0) \\ v_3(0) \end{pmatrix} + \int_0^t \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos[\alpha(t - \tau)] & \sin[\alpha(t - \tau)] \\ 0 & -\sin[\alpha(t - \tau)] & \cos[\alpha(t - \tau)] \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ \beta \end{pmatrix} d\tau \]

leading to the following parametric representation for the velocity vector:

\[ v_1(t) = v_1(0) \]
\[ v_2(t) = v_2(0) \cos(\alpha t) + v_3(0) \sin(\alpha t) + \frac{\beta}{\alpha} [1 - \cos(\alpha t)] \]
\[ v_3(t) = -v_2(0) \sin(\alpha t) + v_3(0) \cos(\alpha t) + \frac{\beta}{\alpha} \sin(\alpha t) \]

Homework Problems

Pb. 8.23 Plot the 3-D curve, with time as parameter, for the tip of the velocity vector of an electron with an initial velocity $\vec{v}(0) = \frac{v_0}{\sqrt{3}} (\hat{e}_1 + \hat{e}_2 + \hat{e}_3)$, where $v_0 = 10^5$ m/s, entering a region of space where the electric field and the magnetic flux density are constant and described by $\vec{E} = E_0 \hat{e}_3$, where $E_0 = -10^4$ V/m; and $\vec{B} = B_0 \hat{e}_1$, where $B_0 = 10^{-2}$ Wb/m².

Pb. 8.24 Find the parametric equations for the position vector for Pb. 8.23, assuming that the origin of the axes is where the electron enters the region of the force fields. Plot the 3-D curve that describes the position of the electron.

8.9.4 Pauli Spinors

We have shown thus far in this section the power of the Cayley-Hamilton theorem in helping us avoid the explicit computation of the eigenvectors while still analytically solving a number of problems of linear algebra where the dimension of the matrices was essentially 2 ⊗ 2, or in some special cases 3 ⊗ 3.
In this subsection, we discuss another analytical technique for matrix manipulation, one that is based on a generalized underlying abstract algebraic structure: the Pauli spin matrices. This is the prototype and precursor to more advanced computational techniques from a field of mathematics called Group Theory. The Pauli matrices are 2 ⊗ 2 matrices given by:

\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \tag{8.41a} \]

\[ \sigma_2 = j \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \tag{8.41b} \]

\[ \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{8.41c} \]

These matrices have the following properties, which can be easily verified by inspection:

Property 1:

\[ \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = I \tag{8.42} \]

where I is the 2 ⊗ 2 identity matrix.

Property 2:

\[ \sigma_1 \sigma_2 + \sigma_2 \sigma_1 = \sigma_1 \sigma_3 + \sigma_3 \sigma_1 = \sigma_2 \sigma_3 + \sigma_3 \sigma_2 = 0 \tag{8.43} \]

Property 3:

\[ \sigma_1 \sigma_2 = j\sigma_3 \; ; \quad \sigma_2 \sigma_3 = j\sigma_1 \; ; \quad \sigma_3 \sigma_1 = j\sigma_2 \tag{8.44} \]

If we define the quantity $\vec{\sigma} \cdot \vec{v}$ to mean:

\[ \vec{\sigma} \cdot \vec{v} = \sigma_1 v_1 + \sigma_2 v_2 + \sigma_3 v_3 \tag{8.45} \]

that is, $\vec{v} = (v_1, v_2, v_3)$, where the parameters $v_1, v_2, v_3$ are represented as the components of a vector, the following theorem is valid.

THEOREM

\[ (\vec{\sigma} \cdot \vec{v})(\vec{\sigma} \cdot \vec{w}) = (\vec{v} \cdot \vec{w}) I + j \vec{\sigma} \cdot (\vec{v} \times \vec{w}) \tag{8.46} \]

where the vectors' dot and cross products have the standard definition.

PROOF The left side of this equation can be expanded as follows:

\[ \begin{aligned} (\vec{\sigma} \cdot \vec{v})(\vec{\sigma} \cdot \vec{w}) &= (\sigma_1 v_1 + \sigma_2 v_2 + \sigma_3 v_3)(\sigma_1 w_1 + \sigma_2 w_2 + \sigma_3 w_3) \\ &= (\sigma_1^2 v_1 w_1 + \sigma_2^2 v_2 w_2 + \sigma_3^2 v_3 w_3) + (\sigma_1 \sigma_2 v_1 w_2 + \sigma_2 \sigma_1 v_2 w_1) \\ &\quad + (\sigma_1 \sigma_3 v_1 w_3 + \sigma_3 \sigma_1 v_3 w_1) + (\sigma_2 \sigma_3 v_2 w_3 + \sigma_3 \sigma_2 v_3 w_2) \end{aligned} \tag{8.47} \]

Using property 1 of the Pauli matrices, the first parenthesis on the RHS of Eq. (8.47) can be written as:

\[ (\sigma_1^2 v_1 w_1 + \sigma_2^2 v_2 w_2 + \sigma_3^2 v_3 w_3) = (v_1 w_1 + v_2 w_2 + v_3 w_3) I = (\vec{v} \cdot \vec{w}) I \tag{8.48} \]

Using properties 2 and 3 of the Pauli matrices, the second, third, and fourth parentheses on the RHS of Eq. (8.47) can respectively be written as:

\[ (\sigma_1 \sigma_2 v_1 w_2 + \sigma_2 \sigma_1 v_2 w_1) = j\sigma_3 (v_1 w_2 - v_2 w_1) \tag{8.49} \]

\[ (\sigma_1 \sigma_3 v_1 w_3 + \sigma_3 \sigma_1 v_3 w_1) = j\sigma_2 (-v_1 w_3 + v_3 w_1) \tag{8.50} \]

\[ (\sigma_2 \sigma_3 v_2 w_3 + \sigma_3 \sigma_2 v_3 w_2) = j\sigma_1 (v_2 w_3 - v_3 w_2) \tag{8.51} \]

Recalling that the cross product of two vectors $(\vec{v} \times \vec{w})$ can be written from Eq.
(7.49) in components form as:

\[ (\vec{v} \times \vec{w}) = (v_2 w_3 - v_3 w_2, \; -v_1 w_3 + v_3 w_1, \; v_1 w_2 - v_2 w_1) \]

the second, third, and fourth parentheses on the RHS of Eq. (8.47) can be combined to give $j \vec{\sigma} \cdot (\vec{v} \times \vec{w})$, thus completing the proof of the theorem.

COROLLARY If ê is a unit vector, then:

\[ (\vec{\sigma} \cdot \hat{e})^2 = I \]

PROOF Using Eq. (8.46), we have:

\[ (\vec{\sigma} \cdot \hat{e})^2 = (\hat{e} \cdot \hat{e}) I + j \vec{\sigma} \cdot (\hat{e} \times \hat{e}) = I \tag{8.52} \]

where, in the last step, we used the fact that the norm of a unit vector is one and that the cross product of any vector with itself is zero.

A direct result of this corollary is that:

\[ (\vec{\sigma} \cdot \hat{e})^{2m} = I \tag{8.53} \]

and

\[ (\vec{\sigma} \cdot \hat{e})^{2m+1} = (\vec{\sigma} \cdot \hat{e}) \tag{8.54} \]

From the above results, we are led to the theorem:

THEOREM

\[ \exp(j \vec{\sigma} \cdot \hat{e}\, \phi) = \cos(\phi)\, I + j (\vec{\sigma} \cdot \hat{e}) \sin(\phi) \tag{8.55} \]

PROOF If we Taylor expand the exponential function, we obtain:

\[ \exp(j \vec{\sigma} \cdot \hat{e}\, \phi) = \sum_{m=0}^{\infty} \frac{[j \phi (\vec{\sigma} \cdot \hat{e})]^m}{m!} \tag{8.56} \]

Now separating the even-power and odd-power terms, using the just derived result for the odd and even powers of $(\vec{\sigma} \cdot \hat{e})$, and the Taylor expansions of the cosine and sine functions, we obtain the desired result.

Example 8.15 Find the time development of the spin state of an electron in a constant magnetic flux density.

Solution: [Readers not interested in the physical background of this problem can jump immediately to the paragraph following Eq. (8.59).]

Physical Background: In addition to the spatio-temporal dynamics, the electron and all other elementary particles of nature also have internal degrees of freedom, which means that even if the particle has no translational motion, its state may still evolve in time. The spin of a particle is such an internal degree of freedom. The electron spin internal degree of freedom requires for its representation a two-dimensional vector; that is, two fundamental states are possible.
As may be familiar to you from your elementary chemistry courses, the up and down states of the electron are required to satisfactorily describe the number of electrons in the different orbitals of the atoms. For the up state, the eigenvalue of the spin matrix is positive; while for the down state, the eigenvalue is negative (respectively $\hbar/2$ and $-\hbar/2$, where $\hbar = 1.0546 \times 10^{-34}$ J.s $= h/(2\pi)$, and h is Planck's constant).

Due to spin, the quantum mechanical dynamics of an electron in a magnetic flux density includes not only the quantum mechanical counterpart of the classical motion that we described in Examples 8.13 and 8.14; it also includes the precession of the spin around the external magnetic flux density, similar to that experienced by a small magnetic dipole in the presence of a magnetic flux density.

The magnetic dipole moment due to the spin internal degree of freedom of an electron is proportional to the Pauli spin matrices; specifically:

\[ \vec{\mu} = -\mu_B \vec{\sigma} \tag{8.57} \]

where $\mu_B = 0.927 \times 10^{-23}$ J/Tesla. In the same notation, the electron spin angular momentum is given by:

\[ \vec{S} = \frac{\hbar}{2} \vec{\sigma} \tag{8.58} \]

The interaction of the electron magnetic dipole, due to spin, with the magnetic flux density is described by the potential:

\[ V = \mu_B \vec{\sigma} \cdot \vec{B} \tag{8.59} \]

and the dynamics of the electron spin state in the magnetic flux density is described by Schrodinger's equation:

\[ j\hbar \frac{d}{dt} |\psi\rangle = \mu_B \vec{\sigma} \cdot \vec{B}\, |\psi\rangle \tag{8.60} \]

where, as previously mentioned, the Dirac ket-vector is two-dimensional.

Mathematical Problem: To put the problem in purely mathematical form, we are asked to find the time development of the two-dimensional vector $|\psi\rangle$ if this vector obeys the system of equations:

\[ \frac{d}{dt} \begin{pmatrix} a(t) \\ b(t) \end{pmatrix} = -j \frac{\Omega}{2} (\vec{\sigma} \cdot \hat{e}) \begin{pmatrix} a(t) \\ b(t) \end{pmatrix} \tag{8.61} \]

where $\dfrac{\Omega}{2} = \dfrac{\mu_B B_0}{\hbar}$, Ω is called the Larmor frequency, and the magnetic flux density is given by $\vec{B} = B_0 \hat{e}$. The solution of Eq. (8.61) can be immediately written because the magnetic flux density is constant.
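The solution at an arbitrary time is related to the state at the origin of time through:

\[ \begin{pmatrix} a(t) \\ b(t) \end{pmatrix} = \exp\left[ -j \frac{\Omega}{2} (\vec{\sigma} \cdot \hat{e})\, t \right] \begin{pmatrix} a(0) \\ b(0) \end{pmatrix} \tag{8.62} \]

which from Eq. (8.55) can be simplified to read:

\[ \begin{pmatrix} a(t) \\ b(t) \end{pmatrix} = \left[ \cos\left( \frac{\Omega}{2} t \right) I - j (\vec{\sigma} \cdot \hat{e}) \sin\left( \frac{\Omega}{2} t \right) \right] \begin{pmatrix} a(0) \\ b(0) \end{pmatrix} \tag{8.63} \]

If we choose the magnetic flux density to point in the z-direction, then the solution takes the very simple form:

\[ \begin{pmatrix} a(t) \\ b(t) \end{pmatrix} = \begin{pmatrix} e^{-j\Omega t/2}\, a(0) \\ e^{j\Omega t/2}\, b(0) \end{pmatrix} \tag{8.64} \]

Physically, the above result can be interpreted as the precession of the electron spin around the direction of the magnetic flux density. To understand this statement, let us find the eigenvectors of the $\sigma_x$ and $\sigma_y$ matrices (that is, $\sigma_1$ and $\sigma_2$). These are given by:

\[ |\alpha_x\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad |\beta_x\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \tag{8.65a} \]

\[ |\alpha_y\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ j \end{pmatrix} \quad \text{and} \quad |\beta_y\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -j \end{pmatrix} \tag{8.65b} \]

The eigenvalues of $\sigma_x$ and $\sigma_y$ corresponding to the eigenvectors α are equal to 1, while those corresponding to the eigenvectors β are equal to –1.

Now, assume that the electron was initially in the state $|\alpha_x\rangle$:

\[ \begin{pmatrix} a(0) \\ b(0) \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = |\alpha_x\rangle \tag{8.66} \]

By substitution in Eq. (8.64), we can compute the electron spin state at different times. Thus, for the time indicated, the electron spin state is given by the second column in the list below:

\[ t = \frac{\pi}{2\Omega} \;\Rightarrow\; |\psi\rangle = e^{-j\pi/4} |\alpha_y\rangle \tag{8.67} \]

\[ t = \frac{\pi}{\Omega} \;\Rightarrow\; |\psi\rangle = e^{-j\pi/2} |\beta_x\rangle \tag{8.68} \]

\[ t = \frac{3\pi}{2\Omega} \;\Rightarrow\; |\psi\rangle = e^{-j3\pi/4} |\beta_y\rangle \tag{8.69} \]

\[ t = \frac{2\pi}{\Omega} \;\Rightarrow\; |\psi\rangle = e^{-j\pi} |\alpha_x\rangle \tag{8.70} \]

In examining the above results, we note that, up to an overall phase, the electron spin state returns to its original state following a cycle. During this cycle, the electron "pointed" successively in the positive x-axis, the positive y-axis, the negative x-axis, and the negative y-axis before returning again to the positive x-axis, thus mimicking the hand of a clock moving in the counterclockwise direction. It is this "motion" that is referred to as the electron spin precession around the direction of the magnetic flux density.

In-Class Exercises

Pb. 8.25 Find the Larmor frequency for an electron in a magnetic flux density of 100 Gauss ($10^{-2}$ Tesla).

Pb.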
8.26 Similar to the electron, the proton and the neutron also have spin as one of their internal degrees of freedom and, attached to this spin, each has a magnetic moment. The magnetic moments attached to the proton and the neutron have, respectively, the values $\mu_p = 2.79\, \mu_N$ and $\mu_n = -1.91\, \mu_N$, where $\mu_N$ is called the nuclear magneton and is equal to $\mu_N = 0.505 \times 10^{-26}$ Joule/Tesla. Find the precession frequency of the proton spin if the proton is in the presence of a magnetic flux density of strength 1 Tesla.

Homework Problem

Pb. 8.27 Magnetic resonance imaging (MRI) is one of the most accurate techniques in biomedical imaging. Its principle of operation is as follows. A strong dc magnetic flux density aligns in one of two possible orientations the spins of the protons of the hydrogen nuclei in the water of the tissues (we say that it polarizes them). The other molecules in the system have zero magnetic moments and are therefore not affected. In thermal equilibrium and at room temperature, there are slightly more protons aligned parallel to the magnetic flux density because this is the lowest energy level in this case. A weaker rotating ac transverse flux density attempts to flip these aligned spins. The energy of the transverse field absorbed by the biological system, which is proportional to the number of spin flips, is the quantity measured in an MRI scan. It is a function of the density of the polarized particles present in that specific region of the image, and of the frequency of the ac transverse flux density.

In this problem, we want to find the frequency of the transverse field that will induce the maximum number of spin flips. The ODE describing the spin system dynamics in this case is given by:

\[ \frac{d}{dt} |\psi\rangle = j \left[ \Omega_\perp \cos(\omega t)\, \sigma_1 + \Omega_\perp \sin(\omega t)\, \sigma_2 + \Omega\, \sigma_3 \right] |\psi\rangle \]

where $\Omega = \dfrac{\mu_p B_0}{\hbar}$, $\Omega_\perp = \dfrac{\mu_p B_\perp}{\hbar}$, $\mu_p$ is given in Pb.
8.26, and the magnetic flux density is given by:

\[ \vec{B} = B_\perp \cos(\omega t)\, \hat{e}_1 + B_\perp \sin(\omega t)\, \hat{e}_2 + B_0\, \hat{e}_3 \]

Assume for simplicity the initial state $|\psi(t = 0)\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and denote the state of the system at time t by $|\psi(t)\rangle = \begin{pmatrix} a(t) \\ b(t) \end{pmatrix}$:

a. Find numerically at which frequency ω the magnitude of b(t) is maximum.

b. Once you have determined the optimal ω, go back and examine what strategy you should adopt in the choice of $\Omega_\perp$ to ensure maximum resolution.

c. Verify your numerical answers with the analytical solution of this problem, which is given by:

\[ |b(t)|^2 = \frac{\Omega_\perp^2}{\tilde{\omega}^2} \sin^2(\tilde{\omega} t) \]

where $\tilde{\omega}^2 = (\Omega - \omega/2)^2 + \Omega_\perp^2$.

8.10 Special Classes of Matrices*

8.10.1 Hermitian Matrices

Hermitian matrices of finite or infinite dimensions (operators) play a key role in quantum mechanics, the primary tool for understanding and solving physical problems at the atomic and subatomic scales. In this section, we define these matrices and find key properties of their eigenvalues and eigenvectors.

DEFINITION The Hermitian adjoint of a matrix M, denoted by $M^\dagger$, is equal to the complex conjugate of its transpose:

\[ M^\dagger = \overline{M}^{\,T} \tag{8.71} \]

For example, in complex vector spaces, the bra-vector will be the Hermitian adjoint of the corresponding ket-vector:

\[ \langle v | = (|v\rangle)^\dagger \tag{8.72} \]

LEMMA

\[ (AB)^\dagger = B^\dagger A^\dagger \tag{8.73} \]

PROOF From the definition of matrix multiplication and Hermitian adjoint, we have:

\[ [(AB)^\dagger]_{ij} = \overline{(AB)}_{ji} = \sum_k \overline{A}_{jk} \overline{B}_{ki} = \sum_k (A^\dagger)_{kj} (B^\dagger)_{ik} = \sum_k (B^\dagger)_{ik} (A^\dagger)_{kj} = (B^\dagger A^\dagger)_{ij} \]

DEFINITION A matrix is Hermitian if it is equal to its Hermitian adjoint; that is

\[ H^\dagger = H \tag{8.74} \]

THEOREM 1 The eigenvalues of a Hermitian matrix are real.

PROOF Let $\lambda_m$ be an eigenvalue of H and let $|v_m\rangle$ be the corresponding eigenvector; then:

\[ H |v_m\rangle = \lambda_m |v_m\rangle \tag{8.75} \]

Taking the Hermitian adjoints of both sides, using the above lemma, and remembering that H is Hermitian, we successively obtain:

\[ (H |v_m\rangle)^\dagger = \langle v_m | H^\dagger = \langle v_m | H = \langle v_m | \overline{\lambda}_m \tag{8.76} \]

If we now multiply (in an inner-product sense) Eq. (8.75) on the left with the bra $\langle v_m |$ and Eq.
(8.76) on the right by the ket-vector $|v_m\rangle$, we obtain:

\[ \langle v_m | H | v_m \rangle = \lambda_m \langle v_m | v_m \rangle = \overline{\lambda}_m \langle v_m | v_m \rangle \;\Rightarrow\; \lambda_m = \overline{\lambda}_m \tag{8.77} \]

THEOREM 2 The eigenvectors of a Hermitian matrix corresponding to different eigenvalues are orthogonal; that is, given that:

\[ H |v_m\rangle = \lambda_m |v_m\rangle \tag{8.78} \]

\[ H |v_n\rangle = \lambda_n |v_n\rangle \tag{8.79} \]

and

\[ \lambda_m \neq \lambda_n \tag{8.80} \]

then:

\[ \langle v_n | v_m \rangle = \langle v_m | v_n \rangle = 0 \tag{8.81} \]

PROOF Because the eigenvalues are real, we can write:

\[ \langle v_n | H = \langle v_n | \lambda_n \tag{8.82} \]

Dot this quantity on the right by the ket $|v_m\rangle$ to obtain:

\[ \langle v_n | H | v_m \rangle = \langle v_n | \lambda_n | v_m \rangle = \lambda_n \langle v_n | v_m \rangle \tag{8.83} \]

On the other hand, if we dot Eq. (8.78) on the left with the bra-vector $\langle v_n |$, we obtain:

\[ \langle v_n | H | v_m \rangle = \langle v_n | \lambda_m | v_m \rangle = \lambda_m \langle v_n | v_m \rangle \tag{8.84} \]

Now compare Eqs. (8.83) and (8.84). Their left-hand sides are identical, so that:

\[ \lambda_m \langle v_n | v_m \rangle = \lambda_n \langle v_n | v_m \rangle \tag{8.85} \]

However, because $\lambda_m \neq \lambda_n$, this equality can only be satisfied if $\langle v_n | v_m \rangle = 0$, which is the desired result.

In-Class Exercises

Pb. 8.28 Show that any Hermitian 2 ⊗ 2 matrix has a unique decomposition into the Pauli spin matrices and the identity matrix.

Pb. 8.29 Find the multiplication rule for two 2 ⊗ 2 Hermitian matrices that have been decomposed into the Pauli spin matrices and the identity matrix; that is:

If: $M = a_0 I + a_1 \sigma_1 + a_2 \sigma_2 + a_3 \sigma_3$ and $N = b_0 I + b_1 \sigma_1 + b_2 \sigma_2 + b_3 \sigma_3$

Find: the p-components in: $P = MN = p_0 I + p_1 \sigma_1 + p_2 \sigma_2 + p_3 \sigma_3$

Homework Problem

Pb. 8.30 The Calogero and Perelomov matrices of dimensions n ⊗ n are given by:

\[ M_{lk} = (1 - \delta_{lk}) \left\{ 1 + j \cot\left[ \frac{(l - k)\pi}{n} \right] \right\} \]

a. Verify that their eigenvalues are given by:

\[ \lambda_s = 2s - n - 1 \]

where s = 1, 2, 3, …, n.

b. Verify that their eigenvectors matrices are given by:

\[ V_{ls} = \exp\left( -j \frac{2\pi}{n} l s \right) \]

c. Use the above results to derive the Diophantine summation rule:

\[ \sum_{l=1}^{n-1} \cot\left( \frac{l\pi}{n} \right) \sin\left( \frac{2sl\pi}{n} \right) = n - 2s \]

where s = 1, 2, 3, …, n – 1.

8.10.2 Unitary Matrices

DEFINITION A unitary matrix has the property that its Hermitian adjoint is equal to its inverse:

\[ U^\dagger = U^{-1} \tag{8.86} \]

An example of a unitary matrix would be the matrix $e^{jHt}$, if H is Hermitian and t is real.
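The example just quoted lends itself to a quick numerical check. The sketch below is a minimal illustration under stated assumptions: the Hermitian matrix H and the value of t are arbitrary choices made only for this check, and the variable names are hypothetical. It uses the built-in expm (the matrix exponential) and the MATLAB apostrophe operator, which returns the Hermitian adjoint (conjugate transpose) of a matrix.

```matlab
% An arbitrary 2-by-2 Hermitian matrix (illustrative choice)
H = [2, 1-1j; 1+1j, -1];
is_hermitian = isequal(H, H');        % H equals its conjugate transpose

% Eigenvalues of a Hermitian matrix should be real (Theorem 1, Section 8.10.1)
max_imag = max(abs(imag(eig(H))));

% Build U = exp(jHt) with the matrix exponential, for an arbitrary real t
t = 0.3;
U = expm(1j*H*t);

% Unitarity: the Hermitian adjoint of U should equal its inverse, Eq. (8.86)
err_unitary = norm(U'*U - eye(2));

fprintf('Hermitian: %d, max |imag(eig)|: %g, ||U''U - I||: %g\n', ...
        is_hermitian, max_imag, err_unitary);
```

Replacing expm by the element-wise exp in this sketch would generally not give a unitary matrix, which is one way to appreciate the distinction between the two commands.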
THEOREM 1 The eigenvalues of a unitary matrix all have magnitude one.

PROOF The eigenvalues and eigenvectors of the unitary matrix satisfy the usual equations for these quantities; that is:

\[ U |v_n\rangle = \lambda_n |v_n\rangle \tag{8.87} \]

Taking the Hermitian conjugate of this equation, we obtain:

\[ \langle v_n | U^\dagger = \langle v_n | U^{-1} = \langle v_n | \overline{\lambda}_n \tag{8.88} \]

Multiplying Eq. (8.87) on the left by Eq. (8.88), we obtain:

\[ \langle v_n | U^{-1} U | v_n \rangle = \langle v_n | v_n \rangle = |\lambda_n|^2 \langle v_n | v_n \rangle \tag{8.89} \]

from which we deduce the desired result that $|\lambda_n|^2 = 1$.

A direct corollary of the above theorem is that $|\det(U)| = 1$. This can be proven directly if we remember the result of Pb. 8.15, which states that the determinant of any diagonalizable matrix is the product of its eigenvalues, and the above theorem that proved that each of these eigenvalues has unit magnitude.

THEOREM 2 A transformation represented by a unitary matrix keeps invariant the scalar (dot, or inner) product of two vectors.

PROOF The matrix U acting on the vectors $|\varphi\rangle$ and $|\psi\rangle$ results in two new vectors, denoted by $|\varphi'\rangle$ and $|\psi'\rangle$ and such that:

\[ |\varphi'\rangle = U |\varphi\rangle \tag{8.90} \]

\[ |\psi'\rangle = U |\psi\rangle \tag{8.91} \]

Taking the Hermitian adjoint of Eq. (8.90), we obtain:

\[ \langle \varphi' | = \langle \varphi | U^\dagger = \langle \varphi | U^{-1} \tag{8.92} \]

Multiplying Eq. (8.91) on the left by Eq. (8.92), we obtain:

\[ \langle \varphi' | \psi' \rangle = \langle \varphi | U^{-1} U | \psi \rangle = \langle \varphi | \psi \rangle \tag{8.93} \]

which is the result that we are after. In particular, note that the norm of the vector under this matrix multiplication remains invariant. We will have the opportunity to study a number of examples of such transformations in Chapter 9.

8.10.3 Unimodular Matrices

DEFINITION A unimodular matrix has the defining property that its determinant is equal to one.

In the remainder of this section, we restrict our discussion to 2 ⊗ 2 unimodular matrices, as these form the tools for the matrix formulation of ray optics and Gaussian optics, which are two of the major sub-fields of photonics engineering.

Example 8.16 Find the eigenvalues and eigenvectors of the 2 ⊗ 2 unimodular matrix.
Solution: Let the matrix M be given by the following expression:

\[ M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{8.94} \]

The unimodularity condition is then written as:

\[ \det(M) = ad - bc = 1 \tag{8.95} \]

Using Eq. (8.95), the eigenvalues of this matrix are given by:

\[ \lambda_\pm = \frac{1}{2} \left[ (a + d) \pm \sqrt{(a + d)^2 - 4} \right] \tag{8.96} \]

Depending on the value of (a + d), these eigenvalues can be parameterized in a simple expression. We choose, here, the range –2 ≤ (a + d) ≤ 2 for illustrative purposes. Under this constraint, the following parameterization is convenient:

\[ \cos(\theta) = \frac{1}{2} (a + d) \tag{8.97} \]

(For the ranges below –2 and above 2, the hyperbolic cosine function will be more appropriate and similar steps to the ones that we will follow can be repeated.)

Having found the eigenvalues, which can now be expressed in the simple form:

\[ \lambda_\pm = e^{\pm j\theta} \tag{8.98} \]

let us proceed to find the matrix V, defined as:

\[ M = V D V^{-1} \quad \text{or} \quad MV = VD \tag{8.99} \]

and where D is the diagonal matrix of the eigenvalues. By direct substitution in the matrix equation defining V, Eq. (8.99), the following relations can be directly obtained:

\[ \frac{V_{11}}{V_{21}} = \frac{\lambda_+ - d}{c} \tag{8.100} \]

and

\[ \frac{V_{12}}{V_{22}} = \frac{\lambda_- - d}{c} \tag{8.101} \]

If we choose for convenience $V_{21} = V_{22} = c$ (which is always possible because each eigenvector can have the value of one of its components arbitrarily chosen, with the other components expressed as functions of it), the matrix V can be written as:

\[ V = \begin{pmatrix} e^{j\theta} - d & e^{-j\theta} - d \\ c & c \end{pmatrix} \tag{8.102} \]

Its determinant is $\det(V) = 2jc \sin(\theta)$, and the matrix M can then be written as:

\[ M = \begin{pmatrix} e^{j\theta} - d & e^{-j\theta} - d \\ c & c \end{pmatrix} \begin{pmatrix} e^{j\theta} & 0 \\ 0 & e^{-j\theta} \end{pmatrix} \frac{1}{2jc \sin(\theta)} \begin{pmatrix} c & d - e^{-j\theta} \\ -c & e^{j\theta} - d \end{pmatrix} \tag{8.103} \]

Homework Problem

Pb. 8.31 Use the decomposition given by Eq. (8.103) and the results of Pb. 8.15 to prove the Sylvester theorem for the unimodular matrix, which states that:

\[ M^n = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^n = \begin{pmatrix} \dfrac{\sin[(n + 1)\theta] - d \sin(n\theta)}{\sin(\theta)} & \dfrac{b \sin(n\theta)}{\sin(\theta)} \\[2ex] \dfrac{c \sin(n\theta)}{\sin(\theta)} & \dfrac{d \sin(n\theta) - \sin[(n - 1)\theta]}{\sin(\theta)} \end{pmatrix} \]

where θ is defined in Eq. (8.97).
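Before attempting the analytical proof, it can be reassuring to test Sylvester's theorem numerically. The sketch below is only a spot check under stated assumptions: the entries a, b, c, d and the power n are arbitrary illustrative choices satisfying ad – bc = 1 and –2 < (a + d) < 2, and the variable names are hypothetical. It compares the direct matrix power with the closed-form expression of Pb. 8.31.

```matlab
% Illustrative unimodular matrix: ad - bc = 0.25 + 0.75 = 1, and a + d = 1
a = 0.5;  b = 1.5;  c = -0.5;  d = 0.5;
M = [a b; c d];

% Angle defined through Eq. (8.97)
theta = acos((a + d)/2);

n = 7;                                 % arbitrary positive integer power

% Direct computation of the n-th power
Mn_direct = M^n;

% Closed form stated by Sylvester's theorem
Mn_sylvester = [ sin((n+1)*theta) - d*sin(n*theta),  b*sin(n*theta); ...
                 c*sin(n*theta),  d*sin(n*theta) - sin((n-1)*theta) ] / sin(theta);

err_sylvester = norm(Mn_direct - Mn_sylvester);
fprintf('det(M) = %g, mismatch = %g\n', det(M), err_sylvester);
```

A numerical agreement of this kind does not replace the proof requested in the problem; it only guards against sign or index slips when manipulating the formula.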
Application: Dynamics of the Trapping of an Optical Ray in an Optical Fiber

Optical fibers, the main waveguides of land-based optical broadband networks, are hair-thin glass fibers that transmit light pulses over very long distances with very small losses. Their waveguiding property is due to a quadratic index of refraction radial profile built into the fiber. This profile is implemented in the fiber manufacturing process, through doping the glass with different concentrations of impurities at different radial distances.

The purpose of this application is to explain how waveguiding can be achieved if the index of refraction inside the fiber has the following profile:

\[ n = n_0 \left( 1 - \frac{n_2^2}{2} r^2 \right) \tag{8.104} \]

where r is the radial distance from the fiber axis and $n_2^2 r^2$ is a number smaller than 0.01 everywhere inside the fiber.

This problem can, of course, be solved by finding the solution of Maxwell equations, or the differential equation of geometrical optics for ray propagation in a non-uniform medium. However, we will not do this in this application. Here, we use only two tools: Sylvester's theorem, derived in Pb. 8.31, and Snell's law of refraction (see Figure 8.4), which states that at the boundary between two transparent materials with two different indices of refraction, light refracts such that the product of the index of refraction of each medium multiplied by the sine of the angle that the ray makes with the normal to the interface in each medium is constant.

FIGURE 8.4 Parameters of Snell's law of refraction.

Let us describe a light ray going through the fiber at any point z along its length, by the distance r that the ray is displaced from the fiber axis, and by the small angle α that the ray's direction makes with the fiber axis. Now consider two points on the fiber axis separated by the small distance δz. We want to find r(z + δz) and α(z + δz), knowing r(z) and α(z).
We are looking for the iteration relation whose successive applications will permit us to find the ray displacement r and slope α at any point inside the fiber, if we know their values at the fiber entrance plane.

We solve the problem in two steps. We first assume that there is no bending of the ray, and find the ray's transverse displacement following a small advance δz along the axis. This is straightforward from the definition of the slope of the ray:

$$\delta r = \alpha(z)\,\delta z \tag{8.105}$$

Because the angle α is small, we approximated the tangent of the angle by the value of the angle in radians. Therefore, if we represent the position and slope of the ray as a column matrix, Eq. (8.105) can be written in the following matrix form:

$$\begin{pmatrix} r(z + \delta z) \\ \alpha(z + \delta z) \end{pmatrix} = \begin{pmatrix} 1 & \delta z \\ 0 & 1 \end{pmatrix} \begin{pmatrix} r(z) \\ \alpha(z) \end{pmatrix} \tag{8.106}$$

Next, we want to find the bending experienced by the ray in advancing through the distance δz. Because the angles that should be used in Snell's law are the complementary angles to those that the ray forms with the axis of the fiber, and recalling that the glass index of refraction changes only in the radial direction, we deduce from Snell's law that:

$$n(r + \delta r)\cos(\alpha + \delta\alpha) = n(r)\cos(\alpha) \tag{8.107}$$

Taking the leading terms of a Taylor expansion of both sides of this equation leads us to:

$$\left[n(r) + \frac{dn(r)}{dr}\,\delta r\right]\left[1 - \frac{(\alpha + \delta\alpha)^2}{2}\right] \approx n(r)\left[1 - \frac{\alpha^2}{2}\right] \tag{8.108}$$

Further simplification of this equation gives, to first order in the variations:

$$\delta\alpha \approx \frac{1}{\alpha\, n(r)}\,\frac{dn(r)}{dr}\,\delta r \approx \frac{1}{n_0}\left(-n_0 n_2^2 r\right)\delta z = -(n_2^2\,\delta z)\,r \tag{8.109}$$

where we used δr = α δz from Eq. (8.105) and the profile of Eq. (8.104). This can be expressed in matrix form as:

$$\begin{pmatrix} r(z + \delta z) \\ \alpha(z + \delta z) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -n_2^2\,\delta z & 1 \end{pmatrix} \begin{pmatrix} r(z) \\ \alpha(z) \end{pmatrix} \tag{8.110}$$

The total variation in the values of the position and slope of the ray can be obtained by taking the product of the two matrices in Eqs. (8.106) and (8.110), giving:

$$\begin{pmatrix} r(z + \delta z) \\ \alpha(z + \delta z) \end{pmatrix} = \begin{pmatrix} 1 - (n_2\,\delta z)^2 & \delta z \\ -n_2^2\,\delta z & 1 \end{pmatrix} \begin{pmatrix} r(z) \\ \alpha(z) \end{pmatrix} \tag{8.111}$$

Equation (8.111) provides us with the required recursion relation to numerically iterate the progress of the ray inside the fiber.
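The recursion of Eq. (8.111) can be sketched in a few lines of MATLAB. This is only an illustration under assumed values: the fiber parameter n2, the step dz, the propagation length, and the entrance conditions below are hypothetical choices made for the sketch, as are all the variable names.

```matlab
% Iterate a ray through the fiber with the one-step matrix of Eq. (8.111).
% All numerical values are illustrative assumptions.
n2 = 1e3;             % index-profile parameter, in 1/m
dz = 1e-6;            % step along the fiber axis, in m (n2*dz << 1)
Lz = 5e-3;            % propagation length, in m
N  = round(Lz/dz);    % number of steps

% One-step matrix: displacement matrix (8.106) times bending matrix (8.110)
S = [1 dz; 0 1]*[1 0; -n2^2*dz 1];
detS = det(S);        % equals 1 up to round-off

ray = zeros(2, N+1);  % row 1: r in m, row 2: alpha in rad
ray(:,1) = [10e-6; 0];        % ray enters 10 micrometers off axis, parallel to it
for k = 1:N
    ray(:,k+1) = S*ray(:,k);  % advance the ray by one step dz
end

z = (0:N)*dz;
plot(z, ray(1,:))
xlabel('z (m)'); ylabel('r (m)')
```

With these values the displacement oscillates about the fiber axis without growing, which is the trapping behavior that this application sets out to explain. The step dz was chosen so that n2*dz is much smaller than 1; a coarser step degrades the accuracy of the trace.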
Thus, the ray distance from the fiber axis and the angle that it makes with this axis can be computed at any z in the fiber if we know the values of the ray transverse coordinate and its slope at the entrance plane. The problem can also be solved analytically if we note that the determinant of this matrix is 1 (the matrix is unimodular). Sylvester's theorem provides the means to obtain the following result:

$$\begin{pmatrix} r(z) \\ \alpha(z) \end{pmatrix} = \begin{pmatrix} \cos(n_2 z) & \dfrac{\sin(n_2 z)}{n_2} \\ -n_2\sin(n_2 z) & \cos(n_2 z) \end{pmatrix} \begin{pmatrix} r(0) \\ \alpha(0) \end{pmatrix} \tag{8.112}$$

Homework Problems

Pb. 8.32 Consider an optical fiber of radius a = 30 µm, n0 = 4/3, and n2 = 10³ m⁻¹. Three rays enter this fiber parallel to the fiber axis at distances of 5 µm, 10 µm, and 15 µm from the fiber's axis.
a. Write a MATLAB program to follow the progress of the rays through the fiber, properly choosing the δz increment.
b. Trace these rays going through the fiber. Figure 8.5 shows the answer that you should obtain for a fiber length of 3 cm.

FIGURE 8.5 Traces of rays, originally parallel to the fiber's axis, when propagating inside an optical fiber.

Pb. 8.33 Using Sylvester's theorem, derive Eq. (8.112). (Hint: Define the angle θ such that $\sin(\theta/2) = n_2\,\delta z/2$, which follows from applying Eq. (8.97) to the matrix of Eq. (8.111), and recall that while δz goes to zero, its product with the number of iterations is finite and is equal to the distance of propagation inside the fiber.)

Pb. 8.34 Find the maximum angle that an incoming ray can have so that it does not escape from the fiber. (Remember to include the refraction at the entrance of the fiber.)

8.11 MATLAB Commands Review

det            Compute the determinant of a matrix.
expm           Compute the matrix exponential.
eye            Identity matrix.
inv            Find the inverse of a matrix.
ones           Matrix with all elements equal to 1.
polyfit        Fit polynomial to data.
triu           Extract upper triangle of a matrix.
tril           Extract lower triangle of a matrix.
zeros          Matrix with all elements equal to zero.
[V,D]=eig(M)   Find the eigenvalues and eigenvectors of a matrix.
9 Transformations

The theory of transformations concerns itself with changes in the coordinates and shapes of objects upon the action of geometrical operations, dynamical boosts, or other operators. In this chapter, we deal only with linear transformations, using examples from both plane geometry and relativistic dynamics (space-time geometry). We also show how transformation techniques play an important role in image processing. We formulate both the problems and their solutions in the language of matrices. Matrices are still denoted by boldface type and matrix multiplication by an asterisk.

9.1 Two-Dimensional (2-D) Geometric Transformations

We first concern ourselves with the operations of inversion about the origin of axes, reflection about the coordinate axes, rotation around the origin, scaling, and translation. But prior to going into the details of these transformations, we need to learn how to draw closed polygonal figures in MATLAB so that we can implement and graph the different cases.

9.1.1 Polygonal Figures Construction

Consider a polygonal figure whose vertices are located at the points:

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

The polygonal figure can then be thought of as line segments (edges) connecting the vertices in a given order, including the edge connecting the last point to the initial point to ensure that we obtain a closed figure. The implementation of the steps leading to the drawing of the figure follows:
1. Label all vertex points.
2. Label the path you follow.
3. Construct a (2 ⊗ (n + 1)) matrix, the G matrix, where the elements of the first row consist of the ordered (n + 1)-tuplet (x1, x2, x3, …, xn, x1), and those of the second row consist of the corresponding (n + 1)-tuplet of y coordinates.
4. Plot the second row of G as a function of its first row.
Example 9.1
Plot the trapezoid whose vertices are located at the points (2, 1), (6, 1), (5, 3), and (3, 3).

Solution: Enter and execute the following commands:

    G=[2 6 5 3 2; 1 1 3 3 1];
    plot(G(1,:),G(2,:))

To ensure that the exact geometrical shape is properly reproduced, remember to instruct your computer to choose the axes such that you have equal x-range and y-range and an aspect ratio of 1. If you would like to add any text anywhere in the figure, use the command gtext.

9.1.2 Inversion about the Origin and Reflection about the Coordinate Axes

We concern ourselves here with inversion with respect to the origin and with reflection about the x- or y-axis. Inversion about other points, or reflection about lines other than the coordinate axes, can be deduced from a composition of the present transformations and those discussed later.

• The inversion about the origin changes the coordinates as follows:

$$x' = -x, \qquad y' = -y \tag{9.1}$$

In matrix form, this transformation can be represented by:

$$\mathbf{P} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \tag{9.2}$$

• For the reflection about the x-axis, denoted by Px, and the reflection about the y-axis, denoted by Py, the transformation matrices are given by:

$$\mathbf{P}_x = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{9.3}$$

$$\mathbf{P}_y = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \tag{9.4}$$

In-Class Exercises

Pb. 9.1 Using the trapezoid of Example 9.1, obtain all the transformed G's as a result of the action of each of the three transformations defined in Eqs. (9.2) through (9.4), and plot the transformed figures on the same graph.

Pb. 9.2 In drawing the original trapezoid, we followed the counterclockwise direction in the sequencing of the different vertices. What is the sequencing of the respective points in each of the transformed G's?

Pb. 9.3 Show that the quantity (x² + y²) is invariant under the action of each of Px, Py, and P separately.

9.1.3 Rotation around the Origin

The new coordinates of a point in the x-y plane rotated by an angle θ around the z-axis can be directly derived through some elementary trigonometry.
Here, instead, we derive the new coordinates using results from the complex numbers chapter (Chapter 6). Recall that every point in a 2-D plane represents a complex number, and multiplication by a complex number of modulus 1 and argument θ results in a rotation of angle θ of the original point. Therefore:

$$z' = z e^{j\theta}$$
$$x' + jy' = (x + jy)(\cos(\theta) + j\sin(\theta)) = (x\cos(\theta) - y\sin(\theta)) + j(x\sin(\theta) + y\cos(\theta)) \tag{9.5}$$

Equating separately the real parts and the imaginary parts, we deduce the action of rotation on the coordinates of a point:

$$x' = x\cos(\theta) - y\sin(\theta), \qquad y' = x\sin(\theta) + y\cos(\theta) \tag{9.6}$$

The above transformation can also be written in matrix form. That is, if the point is represented by a size 2 column vector, then the new vector is related to the old one through the following transformation:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \mathbf{R}(\theta) \begin{pmatrix} x \\ y \end{pmatrix} \tag{9.7}$$

The convention for the sign of the angle is the same as that used in Chapter 6, namely that it is measured positive when in the counterclockwise direction.

Preparatory Exercises

Using the above form for the rotation matrix, verify the following properties:

Pb. 9.4 Its determinant is equal to 1.

Pb. 9.5 $\mathbf{R}(-\theta) = [\mathbf{R}(\theta)]^{-1} = [\mathbf{R}(\theta)]^{T}$

Pb. 9.6 $\mathbf{R}(\theta_1) * \mathbf{R}(\theta_2) = \mathbf{R}(\theta_1 + \theta_2) = \mathbf{R}(\theta_2) * \mathbf{R}(\theta_1)$

Pb. 9.7 $(x')^2 + (y')^2 = x^2 + y^2$

Pb. 9.8 Show that P = R(θ = π). Also show that there is no rotation that can reproduce Px or Py.

In-Class Exercises

Pb. 9.9 Find the coordinates of the image of the point (x, y) obtained by reflection about the line y = x. Test your results using MATLAB.

Pb. 9.10 Find the transformation matrix corresponding to a rotation of –π/3, followed by an inversion around the origin. Solve the problem in two different ways.

Pb. 9.11 By what angle should you rotate the trapezoid so that point (6, 1) of the trapezoid of Example 9.1 is now on the y-axis?
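The rotation matrix of Eq. (9.7) can be tried out on the trapezoid of Example 9.1. The following is a minimal sketch: the angle π/3 is an arbitrary illustrative choice, and the names GR, detR, and normChange are introduced here only for the illustration.

```matlab
% Rotate the trapezoid of Example 9.1 about the origin with R(theta), Eq. (9.7).
theta = pi/3;                       % illustrative angle, counterclockwise
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
G = [2 6 5 3 2; 1 1 3 3 1];         % closed trapezoid, one vertex per column
GR = R*G;                           % rotated vertices

% Numerical look at two of the properties listed in the exercises above
detR = det(R);                                    % determinant of R
normChange = max(abs(sum(GR.^2) - sum(G.^2)));    % change in x^2 + y^2 per vertex

plot(G(1,:), G(2,:), 'b', GR(1,:), GR(2,:), 'r')
axis equal
```

The determinant should come out equal to 1 and normChange should be at round-off level, in line with Pb. 9.4 and Pb. 9.7. A numerical check for one angle does not replace the general verification asked for in those problems.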
9.1.4 Scaling

If the x-coordinate of each point in the plane is multiplied by a positive constant sx, then the effect of this transformation is to expand or compress each plane figure in the x-direction. If 0 < sx < 1, the result is a compression; and if sx > 1, the result is an expansion. The same can also be done along the y-axis. This class of transformations is called scaling.

The matrices corresponding to these transformations, in 2-D, are respectively:

$$\mathbf{S}_x = \begin{pmatrix} s_x & 0 \\ 0 & 1 \end{pmatrix} \tag{9.8}$$

$$\mathbf{S}_y = \begin{pmatrix} 1 & 0 \\ 0 & s_y \end{pmatrix} \tag{9.9}$$

In-Class Exercises

Pb. 9.12 Find the transformation matrix for simultaneously compressing the x-coordinate by a factor of 2, while expanding the y-coordinate by a factor of 2. Apply this transformation to the trapezoid of Example 9.1 and plot the result.

Pb. 9.13 Find the inverse matrices for Sx and Sy.

9.1.5 Translation

A translation is defined by a vector T = (tx, ty), and the transformation of the coordinates is given simply by:

$$x' = x + t_x, \qquad y' = y + t_y \tag{9.10}$$

or, written in matrix form:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} \tag{9.11}$$

The effect of translation over the matrix G is described by the relation:

$$\mathbf{G}_T = \mathbf{G} + \mathbf{T} * \mathrm{ones}(1, n + 1) \tag{9.12}$$

where n is the number of points being translated.

In-Class Exercise

Pb. 9.14 Translate the trapezoid of Example 9.1 by a vector of length 5 that makes an angle of 30° with the x-axis.

9.2 Homogeneous Coordinates

As we have seen in Section 9.1, inversion about the origin, reflection about the coordinate axes, rotation, and scaling are operations that can be represented by a multiplicative matrix, and therefore the composite operation of acting successively on a figure by one or more of these operations can be described by a product of matrices. The translation operation, on the other hand, is represented by an addition, and thus cannot be incorporated, as yet, into the matrix multiplication scheme; consequently, the expression for composite operations becomes less tractable.
We illustrate this situation with the following example:

Example 9.2
Find the new G that results from rotating the trapezoid of Example 9.1 by a π/4 angle around the point Q (–5, 5).

Solution: Because we have thus far defined the rotation matrix only around the origin, our task here is to generalize this result. We solve the problem by reducing it to a combination of elementary operations thus far defined. The strategy for solving the problem goes as follows:
1. Perform a translation to place Q at the origin of a new coordinate system.
2. Perform a π/4 rotation around the new origin, using the above form for rotation.
3. Translate back the origin to its initial location.

Written in matrix form, the above operations can be written sequentially as follows:

1. $$\mathbf{G}_1 = \mathbf{G} + \mathbf{T} * \mathrm{ones}(1, n + 1) \tag{9.13}$$

where

$$\mathbf{T} = \begin{pmatrix} 5 \\ -5 \end{pmatrix} \tag{9.14}$$

and n = 4.

2. $$\mathbf{G}_2 = \mathbf{R}(\pi/4) * \mathbf{G}_1 \tag{9.15}$$

3. $$\mathbf{G}_3 = \mathbf{G}_2 - \mathbf{T} * \mathrm{ones}(1, n + 1) \tag{9.16}$$

and the final result can be written as:

$$\mathbf{G}_3 = \mathbf{R}(\pi/4) * \mathbf{G} + [(\mathbf{R}(\pi/4) - \mathbf{I}) * \mathbf{T}] * \mathrm{ones}(1, n + 1) \tag{9.17}$$

where I is the (2 ⊗ 2) identity matrix.

We can implement the above sequence of transformations through the following script M-file:

    plot(-5,5,'*')                 % mark the center of rotation Q
    hold on
    G=[2 6 5 3 2; 1 1 3 3 1];
    plot(G(1,:),G(2,:),'b')        % original trapezoid
    T=[5;-5];
    G1=G+T*ones(1,5);              % translate Q to the origin
    plot(G1(1,:),G1(2,:),'r')
    R=[cos(pi/4) -sin(pi/4);sin(pi/4) cos(pi/4)];
    G2=R*G1;                       % rotate about the new origin
    plot(G2(1,:),G2(2,:),'g')
    G3=G2-T*ones(1,5);             % translate back
    plot(G3(1,:),G3(2,:),'k')
    axis([-12 12 -12 12])
    axis square
    hold off

Although the above formulation of the problem is absolutely correct, the number of terms in the final expression for the image can wind up, in more involved problems, being large and cumbersome because of the existence of sums and products in the intermediate steps. Thus, the question becomes: can we incorporate all the transformations discussed thus far into only multiplicative matrices? The answer comes from an old trick that mapmakers have used successfully; namely, the technique of homogeneous coordinates.
In this technique, as applied to the present case, we append to any column vector a row with value 1; that is, the point (xm, ym) is now represented by the column vector:

$$\begin{pmatrix} x_m \\ y_m \\ 1 \end{pmatrix} \tag{9.18}$$

Similarly, in the definition of G, we should append to the old definition a row with all elements being 1. In this coordinate representation, the different transformations thus far discussed are now multiplicative and take the following forms:

$$\mathbf{P} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{9.19}$$

$$\mathbf{P}_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{9.20}$$

$$\mathbf{P}_y = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{9.21}$$

$$\mathbf{S} = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{9.22}$$

$$\mathbf{R}(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{9.23}$$

$$\mathbf{T} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \tag{9.24}$$

The composite matrix of any two transformations can now be written as the product of the matrices representing the constituent transformations. Of course, this economizes on the writing of expressions and makes the calculations less prone to trivial errors originating in the expansion of products of sums.

Example 9.3
Repeat Example 9.2, but now use the homogeneous coordinates.

Solution: The following script M-file implements the required task:

    plot(-5,5,'*')
    hold on
    G=[2 6 5 3 2; 1 1 3 3 1;1 1 1 1 1];
    plot(G(1,:),G(2,:),'b')
    T=[1 0 5;0 1 -5;0 0 1];
    G1=T*G;
    plot(G1(1,:),G1(2,:),'r')
    R=[cos(pi/4) -sin(pi/4) 0;sin(pi/4) cos(pi/4) 0;...
       0 0 1];
    G2=R*G1;
    plot(G2(1,:),G2(2,:),'g')
    G3=inv(T)*G2;
    plot(G3(1,:),G3(2,:),'k')
    axis([-12 12 -12 12])
    axis square
    hold off

9.3 Manipulation of 2-D Images

Currently, more and more images are being stored or transmitted in digital form. What does this mean? To simplify the discussion, consider a black and white image and assume that it has a square boundary. The optics of the detecting system (i.e., the camera) form the image on a plane containing a 2-D array of detectors, instead of on the traditional photographic film. Each of these detectors, called a pixel (picture element), measures the intensity of light falling on it.
The image is then represented by a matrix having the same size as the detectors' 2-D array structure, and such that the value of each of the matrix elements is proportional to the intensity of the light falling on the associated detector element. Of course, the resolution of the picture increases as the number of detectors in the array increases.

9.3.1 Geometrical Manipulation of Images

Having the image represented by a matrix, it is now possible to perform all kinds of manipulations on it in MATLAB. For example, we could flip it in the left/right direction (fliplr), or in the up/down direction (flipud), or rotate it by 90° (rot90), or for that matter transform it by any matrix transformation. In the remainder of this section, we explore some of the techniques commonly employed in the handling and manipulation of digital images.

Let us explore and observe the structure of a matrix subjected to the above elementary transformations. For this purpose, execute and observe the outputs from each of the following commands:

    M=(1/25)*[1 2 3 4 5;6 7 8 9 10;11 12 13 14 15;16 17 18 19 20;21 22 23 24 25]
    lrM=fliplr(M)
    udM=flipud(M)
    Mr90=rot90(M)

A careful examination of the resulting matrix elements will indicate the general features of each of these transformations. You can also see, in a visually more suggestive form, how each of the transformations changed the image of the original matrix if we render the image of M and its transforms in false colors; that is, we assign a color to each number. To perform this task, choose the colormap(hot) command to obtain the images. In this mapping, the program assigns a color to each pixel, varying from black through red and yellow to white, depending on the magnitude of the intensity at the corresponding detector.
Enter, in the following sequence, each of the following commands, and at each step note the color distribution of the image:

    colormap(hot)
    imagesc(M,[0 1])
    imagesc(lrM,[0 1])
    imagesc(udM,[0 1])
    imagesc(Mr90,[0 1])

The command imagesc produces an intensity image of a data matrix that spans a given range of values.

9.3.2 Digital Image Processing

A typical problem in digital image processing involves the analysis of the raw data of an image that was subject, during acquisition, to a blur due to the movement of the camera or to other sources of noise. An example of this situation occurs in the analysis of aerial images; the images are blurred due, inter alia, to the motion of the plane while the camera shutter is open. The question is, can we do anything to obtain a crisper image from the raw data if we know the speed and altitude of the plane when it took the photograph? The answer is affirmative.

We consider for our example the photograph of a rectangular board. Construct this image by entering:

    N=64;
    A=zeros(N,N);
    A(15:35,15:45)=1;
    colormap(gray);
    imagesc(A,[0 1])

where (N × N) is the size of the image (here, N = 64).

FIGURE 9.1 The raw and processed images of a rectangular board photographed from a moving plane. Top panel: Raw (blurred) image. Bottom panel: Processed image.

Now assume that the camera that took the image had moved, while the shutter was open, by a distance that would correspond in the image plane to L pixels. What will the image look like now? (See Figure 9.1.) The blurring operation was modeled here by the matrix B.
The blurred image is simulated through the matrix product:

$$\mathbf{A}_1 = \mathbf{A} * \mathbf{B} \tag{9.25}$$

where B, the blurring matrix, is given by the following Toeplitz matrix:

    L=9;
    B=toeplitz([ones(L,1);zeros(N-L,1)],[1;zeros(N-1,1)])/L;

Here, the blur length was L = 9, and the blurred image A1 was obtained by executing the following commands:

    A1=A*B;
    imagesc(A1,[0 1])

To bring back the unblurred picture, simply multiply the matrix A1 on the right by inv(B) and obtain the original image.

In practice, one is given the blurred image and asked to reconstruct it while correcting for the blur. What to do?
1. Compute the blur length from the plane speed and height.
2. Construct the Toeplitz matrix, and take its inverse.
3. Apply the inverse of the Toeplitz matrix to the blurred image matrix, obtaining the processed image.

9.3.3 Encrypting an Image

If, for any reason, two individuals desire to exchange an image but want to keep its contents only to themselves, they may agree beforehand on a scrambling matrix that the first individual applies to scramble the sent image, while the second individual applies the inverse of the scrambling matrix to unscramble the received image.

Given that an average quality image currently has a minimum size of about (1000×1000) pixels, reconstructing the scrambling matrix, if it is chosen cleverly, would be inaccessible except to the most powerful and specialized computers. The purpose of the following problems is to illustrate an efficient method for building a scrambling matrix.

In-Class Exercises

Assume for simplicity that the 2-D array size is (10×10), and that the scrambling matrix is chosen such that each row has one element equal to 1, while the others are 0, and no two rows are equal.

Pb. 9.15 For the (10×10) matrix dimension, how many possible scrambling matrices S, constructed as per the above prescription, are there? If the matrix size is (1000×1000), how many such scrambling matrices will there be?

Pb. 9.16 An original figure was scrambled by the scrambling matrix S to obtain the image shown in Figure 9.2. The matrix S is (10×10) and has all its elements equal to zero, except S(1, 6) = S(2, 3) = S(3, 2) = S(4, 1) = S(5, 9) = S(6, 4) = S(7, 10) = S(8, 7) = S(9, 8) = S(10, 5) = 1. Find the original image.

FIGURE 9.2 Scrambled image of Pb. 9.16.

9.4 Lorentz Transformation*

9.4.1 Space-Time Coordinates

Einstein's theory of special relativity studies the relationship between the descriptions of the dynamics of a system in two coordinate systems moving with constant speed relative to each other. The theory of special relativity does not assume, as classical mechanics does, that there exists an absolute time common to all coordinate systems. It associates with each coordinate system a four-dimensional space (three space coordinates and one time coordinate), and it associates a space-time transformation to go between two coordinate systems moving uniformly with respect to each other. Each real point event (e.g., the arrival of a light flash on a screen) will be measured in both systems. If we distinguish by primes the data of the second observer from those of the first, then the first observer will ascribe to the event the coordinates (x, y, z, t), while the second observer will ascribe to it the coordinates (x′, y′, z′, t′); that is, there is no absolute time. The Lorentz transformation gives the rules for going from one coordinate system to the other.
Assume that the velocity v between the two systems has the same direction as the positive x-axis, that the x-axis direction continuously coincides with that of the x′-axis, and furthermore that the origin of the spatial coordinates of one system at time t = 0 coincides with the origin of the other system at time t′ = 0. Einstein, on the basis of two postulates, derived the following transformation relating the coordinates of the two systems:

$$x' = \frac{x - vt}{\sqrt{1 - \dfrac{v^2}{c^2}}}, \qquad y' = y, \qquad z' = z, \qquad t' = \frac{t - \dfrac{v}{c^2}\,x}{\sqrt{1 - \dfrac{v^2}{c^2}}} \tag{9.26}$$

where c is the velocity of light in vacuum. The derivation of these formulae is detailed in electromagnetic theory or modern physics courses and is not the subject of discussion here. Our purpose here is to show that, knowing the above transformations, we can deduce many interesting physical observations.

Preparatory Exercise

Pb. 9.17 Show that, upon a Lorentz transformation, we have the equality:

$$x'^2 + y'^2 + z'^2 - c^2 t'^2 = x^2 + y^2 + z^2 - c^2 t^2$$

This is referred to as the Lorentz invariance of the norm of the space-time four-vectors. What is the equivalent invariant in 3-D Euclidean geometry?

If we rename our coordinates such that:

$$x_1 = x, \qquad x_2 = y, \qquad x_3 = z, \qquad x_4 = jct \tag{9.27}$$

the Lorentz transformation takes the following matricial form:

$$\mathbf{L}_\beta = \begin{pmatrix} \dfrac{1}{\sqrt{1 - \beta^2}} & 0 & 0 & \dfrac{j\beta}{\sqrt{1 - \beta^2}} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\dfrac{j\beta}{\sqrt{1 - \beta^2}} & 0 & 0 & \dfrac{1}{\sqrt{1 - \beta^2}} \end{pmatrix} \tag{9.28}$$

where β = v/c, and the relations that were given earlier relating the primed and unprimed coordinates can be summarized by:

$$\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sqrt{1 - \beta^2}} & 0 & 0 & \dfrac{j\beta}{\sqrt{1 - \beta^2}} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\dfrac{j\beta}{\sqrt{1 - \beta^2}} & 0 & 0 & \dfrac{1}{\sqrt{1 - \beta^2}} \end{pmatrix} * \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} \tag{9.29}$$

In-Class Exercises

Pb. 9.18 Write the above transformation for the case that the two coordinate systems are moving relative to each other at half the speed of light, and find (x′, y′, z′, t′) if

x = 2, y = 3, z = 4, ct = 3

Pb. 9.19 Find the determinant of Lβ.

Pb. 9.20 Find the multiplicative inverse of Lβ, and compare it to the transpose.

Pb. 9.21 Find the approximate expression of Lβ for β << 1. Give a physical interpretation to your result using Newtonian mechanics.

9.4.2 Addition Theorem for Velocities

The physical problem of interest here is: assuming that a point mass is moving in the primed system in the x′-y′ plane with uniform speed u′ and that its trajectory makes an angle θ′ with the x′-axis, what is the speed of this particle, as viewed in the unprimed system, and what is the angle that its trajectory makes with the x-axis, as observed in the unprimed system?

In the unprimed and primed systems, respectively, the parametric equations for the point particle motion are given by:

$$x = ut\cos(\theta), \qquad y = ut\sin(\theta) \tag{9.30}$$

$$x' = u't'\cos(\theta'), \qquad y' = u't'\sin(\theta') \tag{9.31}$$

where u and u′ are the speeds of the particle in the unprimed and primed systems, respectively. Note that if the primed system moves with velocity v with respect to the unprimed system, then the unprimed system moves with a velocity –v with respect to the primed system, and using the Lorentz transformation, we can write the following equalities:

$$ut\cos(\theta) = \frac{(u'\cos(\theta') + v)}{\sqrt{1 - \beta^2}}\,t' \tag{9.32}$$

$$ut\sin(\theta) = u't'\sin(\theta') \tag{9.33}$$

$$t = \frac{[1 + (u'v/c^2)\cos(\theta')]}{\sqrt{1 - \beta^2}}\,t' \tag{9.34}$$

Dividing Eqs. (9.32) and (9.33) by Eq. (9.34), we obtain:

$$u\cos(\theta) = \frac{(u'\cos(\theta') + v)}{[1 + (u'v/c^2)\cos(\theta')]} \tag{9.35}$$

$$u\sin(\theta) = \frac{u'\sin(\theta')\sqrt{1 - \beta^2}}{[1 + (u'v/c^2)\cos(\theta')]} \tag{9.36}$$

From this we can deduce the magnitude and direction of the velocity of the particle, as measured in the unprimed system:

$$u^2 = \frac{u'^2 + v^2 + 2u'v\cos(\theta') - (u'^2 v^2/c^2)\sin^2(\theta')}{[1 + (u'v/c^2)\cos(\theta')]^2} \tag{9.37}$$

$$\tan(\theta) = \frac{u'\sin(\theta')\sqrt{1 - \beta^2}}{u'\cos(\theta') + v} \tag{9.38}$$

Preparatory Exercises

Pb. 9.22 Find the velocity of a photon (the quantum of light) in the unprimed system if its velocity in the primed system is u′ = c. (Note the constancy of the velocity of light, if measured from either the primed or the unprimed system.
As previously mentioned, this constituted one of only two postulates in Einstein's formulation of the theory of special relativity, which determined uniquely the form of the dynamical boost transformation.)

Pb. 9.23 Show that if u′ is parallel to the x′-axis, then the velocity addition formula takes the following simple form:

$$u = \frac{u' + v}{1 + \dfrac{u'v}{c^2}}$$

Pb. 9.24 Find the approximate form of the above expression for u when β << 1, and show that it reduces to the expression of velocity addition in Newtonian mechanics.

In-Class Exercises

Pb. 9.25 Find the angle θ, if θ′ = π/2 and u′ = v = c/2.

Pb. 9.26 Plot the angle θ as a function of θ′ when v/c = 0.99 and u′/c = 1.

Pb. 9.27 Let the variable φ be defined such that tanh(φ) = β. Write the Lorentz transformation matrix as a function of φ. Can you give the Lorentz transformation a geometric interpretation in non-Euclidean geometry?

Pb. 9.28 Using the result of Pb. 9.27, write the resultant transformation from a boost with parameter φ1, followed by another boost with parameter φ2. Does this rule for composition of Lorentz transformations remind you of a similar transformation that you studied previously in this chapter?

9.5 MATLAB Commands Review

colormap   Control the color mix of an image.
fliplr     Flip a matrix left to right.
flipud     Flip a matrix in the up-to-down direction.
imagesc    Create a pixel intensity map from data stored in a matrix.
load       Import data files from outside MATLAB.
rot90      Rotate a matrix by 90°.
toeplitz   Specialized matrix constructor that describes, inter alia, the operation of a blur in an image.

10 A Taste of Probability Theory*

10.1 Introduction

In addition to its everyday use in all aspects of our public, personal, and leisure lives, probability plays an important role in electrical engineering practice. It is the mathematical tool to deal with at least three broad areas:

1. The problems associated with the inherent uncertainty in the input of certain systems. The random arrival time of certain inputs to a system cannot be predetermined; for example, the log-on and log-off times of terminals and workstations connected to a computer network, or the arrival times of data packets at a computer network node.

2. The problems associated with the distortion of a signal due to noise. The effects of noise have to be dealt with satisfactorily at each stage of a communication system, from the generation, to the transmission, to the detection phases. The source of this noise may be either fluctuations inherent in the physics of the problem (e.g., quantum effects and thermal effects) or random distortions due to externally generated uncontrollable parameters (e.g., weather, geography, etc.).

3. The problems associated with inherent human and computing machine limitations while solving very complex systems. Individual treatment of the dynamics of a very large number of molecules in a material, in which more than 10²² molecules may exist in a quart-size container, is not possible at this time, and we have to rely on statistical averages when describing the behavior of such systems. This is the field of statistical physics and thermodynamics.

Furthermore, probability theory provides the necessary mathematical tools for error analysis in all experimental sciences. It permits estimation of the error bars and the confidence level for any experimentally obtained result, through a methodical analysis and reduction of the raw data.

In future courses in probability, random variables, stochastic processes (which is random variables theory with time as a parameter), information theory, and statistical physics, you will study techniques and solutions to the different types of problems from the above list.
In this very brief introduction to the subject, we introduce only the very fundamental ideas and results, where more advanced courses seem to almost always start.

10.2 Basics

Probability theory is best developed mathematically based on a set of axioms from which a well-defined deductive theory can be constructed. This is referred to as the axiomatic approach. We concentrate, in this section, on developing the basics of probability theory, using a physical description of the underlying concepts of probability and related simple examples, to lead us intuitively to what is usually the starting point of the set theoretic axiomatic approach.

Assume that we conduct n independent trials under identical conditions, in each of which, depending on chance, a particular event A of particular interest either occurs or does not occur. Let n(A) be the number of experiments in which A occurs. Then, the ratio n(A)/n, called the relative frequency of the event A in a series of experiments, clusters for n → ∞ about some constant. This constant is called the probability of the event A, and is denoted by:

$$P(A) = \lim_{n \to \infty} \frac{n(A)}{n} \tag{10.1}$$

From this definition, we know specifically what is meant by the statement that the probability for obtaining a head in the flip of a fair coin is 1/2.

Let us consider the rolling of a single die as our prototype experiment:

1. The possible outcomes of this experiment are elements belonging to the set:

$$S = \{1, 2, 3, 4, 5, 6\} \tag{10.2}$$

If the die is fair, the probability for each of the elementary elements of this set to occur in the roll of a die is equal to:

$$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6} \tag{10.3}$$

2. The observer may be interested not only in the occurrence of the elementary elements, but in finding the probability of a certain event which may consist of a set of elementary outcomes; for example:

a. An event may consist of "obtaining an even number of spots on the upward face of a randomly rolled die." This event then consists of all successful trials having as experimental outcomes any member of the set:

$$E = \{2, 4, 6\} \tag{10.4}$$

b. Another event may consist of "obtaining three or more spots" (henceforth, we will use this form of abbreviated statement, and not keep repeating: on the upward face of a randomly rolled die). Then, this event consists of all successful trials having as experimental outcomes any member of the set:

$$B = \{3, 4, 5, 6\} \tag{10.5}$$

Note that, in general, events may have overlapping elementary elements.

For a fair die, using the definition of the probability as the limit of a relative frequency, it is possible to conclude, based on experimental trials, that:

$$P(E) = P(2) + P(4) + P(6) = \frac{1}{2} \tag{10.6}$$

while

$$P(B) = P(3) + P(4) + P(5) + P(6) = \frac{2}{3} \tag{10.7}$$

and

$$P(S) = 1 \tag{10.8}$$

The last equation [Eq. (10.8)] is the mathematical expression for the statement that the probability of the event that includes all possible elementary outcomes is 1 (i.e., certainty).

It should be noted that if we define the events O and C to mean the events of "obtaining an odd number" and "obtaining a number smaller than 3," respectively, we can obtain these events' probabilities by enumerating the elements of the subsets of S that represent these events; namely:

$$P(O) = P(1) + P(3) + P(5) = \frac{1}{2} \tag{10.9}$$

$$P(C) = P(1) + P(2) = \frac{1}{3} \tag{10.10}$$

However, we also could have obtained these same results by noting that the events E and O (B and C) are disjoint and that their union spans the set S. Therefore, the probabilities for events O and C could have been deduced, as well, through the relations:

$$P(O) = 1 - P(E) \tag{10.11}$$

$$P(C) = 1 - P(B) \tag{10.12}$$

From the above and similar observations, it would be a satisfactory representation of the physical world if the above results were codified and elevated to the status of axioms for a formal theory of probability.
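The relative-frequency definition of Eq. (10.1) and the die probabilities of Eqs. (10.6) through (10.12) can be explored with a simulated die. The following is a minimal sketch, not a prescribed method: the number of trials is an arbitrary illustrative choice, and the variable names are introduced here only for the illustration.

```matlab
% Relative-frequency estimates for a simulated fair die, in the spirit of Eq. (10.1).
nTrials = 1e5;                            % illustrative number of rolls
rolls = floor(6*rand(1, nTrials)) + 1;    % integers 1 to 6, equally likely

PE = sum(mod(rolls, 2) == 0)/nTrials;     % even number of spots, event E
PB = sum(rolls >= 3)/nTrials;             % three or more spots, event B
PO = sum(mod(rolls, 2) == 1)/nTrials;     % odd number of spots, event O
PC = sum(rolls < 3)/nTrials;              % fewer than three spots, event C

fprintf('P(E) ~ %.4f   P(B) ~ %.4f\n', PE, PB)
fprintf('P(O) + P(E) = %.4f   P(C) + P(B) = %.4f\n', PO + PE, PC + PB)
```

The estimates of P(E) and P(B) fluctuate from run to run around 1/2 and 2/3 and settle closer to these values as nTrials grows, while the sums P(O) + P(E) and P(C) + P(B) equal 1 in every run, because each pair of events is disjoint and covers all outcomes.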
However, the question becomes how many of these basic results (the axioms) one really needs to assume, such that it will be possible to derive all other results of the theory from this seed. This is the starting point for the formal approach to probability theory. The following axioms were proven to be a satisfactory starting point.

Assign to each event A, consisting of elementary occurrences from the set S, a number P(A), which is designated as the probability of the event A, and such that:

1. $$0 \le P(A) \tag{10.13}$$

2. $$P(S) = 1 \tag{10.14}$$

3. If $A \cap B = \emptyset$, where $\emptyset$ is the empty set, then:

$$P(A \cup B) = P(A) + P(B) \tag{10.15}$$

In the following examples, we illustrate some common techniques for finding the probabilities for certain events. Look around, and you will find plenty more.

Example 10.1
Find the probability for getting three sixes in a roll of three dice.

Solution: First, compute the number of elements in the total sample space. We can describe each roll of the dice by a 3-tuplet (a, b, c), where a, b, and c can take the values 1, 2, 3, 4, 5, 6. There are $6^3 = 216$ possible 3-tuplets. The event that we are seeking is realized only in the single elementary occurrence when the 3-tuplet (6, 6, 6) is obtained; therefore, the probability for this event, for fair dice, is

$$P(A) = \frac{1}{216}$$

Example 10.2
Find the probability of getting only two sixes in a roll of three dice.

Solution: The event in this case consists of all elementary occurrences having the following forms:

(a, 6, 6), (6, b, 6), (6, 6, c)

where a = 1, …, 5; b = 1, …, 5; and c = 1, …, 5. Therefore, the event A consists of elements corresponding to 15 elementary occurrences, and its probability is

$$P(A) = \frac{15}{216}$$

Example 10.3
Find the probability that, if three individuals are asked to guess a number from 1 to 10, their guesses will be different numbers.

Solution: There are 1000 distinct equiprobable 3-tuplets (a, b, c), where each component of the 3-tuplet can have any value from 1 to 10.
The event A occurs when all components have unequal values. Therefore, while a can have any of 10 possible values, b can have only 9, and c can have only 8. Therefore, n(A) = 8 × 9 × 10, and the probability for the event A is

$$P(A) = \frac{8 \times 9 \times 10}{1000} = 0.72$$

Example 10.4
An inspector checks a batch of 100 microprocessors, 5 of which are defective. He examines ten items selected at random. If none of the ten items is defective, he accepts the batch. What is the probability that he will accept the batch?

Solution: The number of ways of selecting 10 items from a batch of 100 items is:

$$N = \frac{100!}{10!\,(100 - 10)!} = \frac{100!}{10!\,90!} = C_{10}^{100}$$

where $C_k^n$ is the binomial coefficient and represents the number of combinations of n objects taken k at a time without regard to order. It is equal to $\frac{n!}{k!\,(n - k)!}$. All these combinations are equally probable.

If the event A is that where the batch is accepted by the inspector, then A occurs when all ten items selected belong to the set of acceptable quality units. The number of elements in A is

$$N(A) = \frac{95!}{10!\,85!} = C_{10}^{95}$$

and the probability for the event A is

$$P(A) = \frac{C_{10}^{95}}{C_{10}^{100}} = \frac{86 \times 87 \times 88 \times 89 \times 90}{96 \times 97 \times 98 \times 99 \times 100} \approx 0.5838$$

In-Class Exercises

Pb. 10.1 A cube whose faces are colored is split into 125 smaller cubes of equal size.
a. Find the probability that a cube drawn at random from the batch of randomly mixed smaller cubes will have three colored faces.
b. Find the probability that a cube drawn from this batch will have two colored faces.

Pb. 10.2 An urn has three blue balls and six red balls. One ball was randomly drawn from the urn and then a second ball, which was blue. What is the probability that the first ball drawn was blue?

Pb. 10.3 Find the probability that the last two digits of the cube of a random integer are 1. Solve the problem analytically, and then compare your result to a numerical experiment that you will conduct and where you compute the cubes of all numbers from 1 to 1000.

Pb.
10.4 From a lot of n resistors, p are defective. Find the probability that k resistors out of a sample of m selected at random are found defective.

Pb. 10.5 Three cards are drawn from a deck of cards.
a. Find the probability that these cards are the Ace, the King, and the Queen of Hearts.
b. Would the answer change if the statement of the problem was “an Ace, a King, and a Queen”?

Pb. 10.6 Show that:

$$P(\bar{A}) = 1 - P(A)$$

where $\bar{A}$, the complement of A, consists of all elements of S that do not belong to A.

NOTE In solving certain categories of probability problems, it is often convenient to solve for P(A) by computing the probability of its complement and then applying the above relation.

Pb. 10.7 Show that if $A_1, A_2, \ldots, A_n$ are mutually exclusive events, then:

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$$

(Hint: Use mathematical induction and Eq. (10.15).)

10.3 Addition Laws for Probabilities

We start by reminding the reader of the key results of elementary set theory:

• The Commutative laws state that:

$$A \cap B = B \cap A \tag{10.16}$$

$$A \cup B = B \cup A \tag{10.17}$$

• The Distributive laws are written as:

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \tag{10.18}$$

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{10.19}$$

• The Associative laws are written as:

$$(A \cup B) \cup C = A \cup (B \cup C) = A \cup B \cup C \tag{10.20}$$

$$(A \cap B) \cap C = A \cap (B \cap C) = A \cap B \cap C \tag{10.21}$$

• De Morgan’s laws are:

$$\overline{A \cup B} = \bar{A} \cap \bar{B} \tag{10.22}$$

$$\overline{A \cap B} = \bar{A} \cup \bar{B} \tag{10.23}$$

• The Duality principle states that: If in an identity, we replace unions by intersections, intersections by unions, S by ∅, and ∅ by S, then the identity is preserved.
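The set identities above can be spot-checked on the die events of Section 10.2. The following is a minimal MATLAB sketch and not part of the original text: the choice of the sets A, B, C, the helper comp (complement relative to S), and the variable names are all assumptions made for illustration. A check on one example does not, of course, replace a proof.

```matlab
% Spot-check of the distributive and De Morgan laws on subsets of S = {1,...,6}.
S = 1:6;
A = [2 4 6];                   % "even number of spots"
B = [3 4 5 6];                 % "three or more spots"
C = [1 2];                     % "fewer than three spots"
comp = @(X) setdiff(S, X);     % hypothetical helper: complement relative to S

% Distributive laws, Eqs. (10.18) and (10.19)
dist1 = isequal(intersect(A, union(B, C)), union(intersect(A, B), intersect(A, C)));
dist2 = isequal(union(A, intersect(B, C)), intersect(union(A, B), union(A, C)));

% De Morgan's laws, Eqs. (10.22) and (10.23)
dm1 = isequal(comp(union(A, B)), intersect(comp(A), comp(B)));
dm2 = isequal(comp(intersect(A, B)), union(comp(A), comp(B)));

fprintf('distributive: %d %d, De Morgan: %d %d\n', dist1, dist2, dm1, dm2);
```

The built-in functions union, intersect, and setdiff return sorted vectors without repetitions, which is why isequal can be used to compare the two sides of each identity.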
THEOREM 1
If we define the difference of two events $A_1 - A_2$ to mean the event in which $A_1$ occurs but not $A_2$, the following equalities are valid:

$$P(A_1 - A_2) = P(A_1) - P(A_1 \cap A_2) \tag{10.24}$$

$$P(A_2 - A_1) = P(A_2) - P(A_1 \cap A_2) \tag{10.25}$$

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) \tag{10.26}$$

PROOF From the basic set theory algebra results, we can deduce the following equalities:

$$A_1 = (A_1 - A_2) \cup (A_1 \cap A_2) \tag{10.27}$$

$$A_2 = (A_2 - A_1) \cup (A_1 \cap A_2) \tag{10.28}$$

$$A_1 \cup A_2 = (A_1 - A_2) \cup (A_2 - A_1) \cup (A_1 \cap A_2) \tag{10.29}$$

Further note that the events $(A_1 - A_2)$, $(A_2 - A_1)$, and $(A_1 \cap A_2)$ are mutually exclusive. Using the results from Pb. 10.7, Eqs. (10.27) and (10.28), and the preceding comment, we can write:

$$P(A_1) = P(A_1 - A_2) + P(A_1 \cap A_2) \tag{10.30}$$

$$P(A_2) = P(A_2 - A_1) + P(A_1 \cap A_2) \tag{10.31}$$

which establish Eqs. (10.24) and (10.25). Next, consider Eq. (10.29); because the events represented by the parentheses on its RHS are mutually exclusive, we can use the results of Pb. 10.7 to write:

$$P(A_1 \cup A_2) = P(A_1 - A_2) + P(A_2 - A_1) + P(A_1 \cap A_2) \tag{10.32}$$

Using Eqs. (10.30) and (10.31), this can be reduced to Eq. (10.26).

THEOREM 2
Given any n events $A_1, A_2, \ldots, A_n$ and defining $P_1, P_2, P_3, \ldots, P_n$ to mean:

$$P_1 = \sum_{i=1}^{n} P(A_i)$$

$$P_2 = \sum_{1 \le i < j \le n} P(A_i \cap A_j)$$

$$P_3 = \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k)$$

etc. …, then: […]

[…] n, find the probability that there will be at least one letter in each drawer.
b. Plot this probability for n = 12, and 15 ≤ m ≤ 50.
(Hint: Take the event $A_j$ to mean that no letter is filed in the jth drawer and use the result of Pb. 10.11.)

10.4 Conditional Probability

The conditional probability of an event A assuming C, denoted by $P(A \mid C)$, is, by definition, the ratio:

$$P(A \mid C) = \frac{P(A \cap C)}{P(C)} \tag{10.37}$$

Example 10.7
Considering the events E, O, B, C as defined in Section 10.2 and the above definition for conditional probability, find the probability that the number of spots showing on the die is even, assuming that it is equal to or greater than 3.
Solution: In the above notation, we are asked to find the quantity $P(E \mid B)$. Using Eq. (10.37), this is equal to:

$$P(E \mid B) = \frac{P(E \cap B)}{P(B)} = \frac{P(\{4, 6\})}{P(\{3, 4, 5, 6\})} = \frac{2/6}{4/6} = \frac{1}{2}$$

In this case, $P(E \mid B) = P(E)$. When this happens, we say that the two events E and B are independent.

Example 10.8
Find the probability that the number of spots showing on the die is even, assuming that it is larger than 3.

Solution: Call D the event of having the number of spots larger than 3. Using Eq. (10.37), $P(E \mid D)$ is equal to:

$$P(E \mid D) = \frac{P(E \cap D)}{P(D)} = \frac{P(\{4, 6\})}{P(\{4, 5, 6\})} = \frac{2/6}{3/6} = \frac{2}{3}$$

In this case, $P(E \mid D) \ne P(E)$; and thus the two events E and D are not independent.

Example 10.9
Find the probability of picking a blue ball first, then a red ball from an urn that contains five red balls and four blue balls.

Solution: From the definition of conditional probability [Eq. (10.37)], we can write:

$$P(\text{Blue ball first and Red ball second}) = P(\text{Red ball second} \mid \text{Blue ball first}) \times P(\text{Blue ball first})$$

The probability of picking a blue ball first is

$$P(\text{Blue ball first}) = \frac{\text{Original number of Blue balls}}{\text{Total number of balls}} = \frac{4}{9}$$

The conditional probability is given by:

$$P(\text{Red ball second} \mid \text{Blue ball first}) = \frac{\text{Number of Red balls}}{\text{Number of balls remaining after first pick}} = \frac{5}{8}$$

giving:

$$P(\text{Blue ball first and Red ball second}) = \frac{4}{9} \times \frac{5}{8} = \frac{5}{18}$$

10.4.1 Total Probability and Bayes Theorems

TOTAL PROBABILITY THEOREM
If $[A_1, A_2, \ldots, A_n]$ is a partition of the total elementary occurrences set S, that is,

$$\bigcup_{i=1}^{n} A_i = S \quad \text{and} \quad A_i \cap A_j = \emptyset \ \text{for} \ i \ne j$$

and B is an arbitrary event, then:

$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n) \tag{10.38}$$

PROOF From the algebra of sets, and the definition of a partition, we can write the following equalities:

$$B = B \cap S = B \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n) \tag{10.39}$$

Since the events $(B \cap A_i)$ and $(B \cap A_j)$ are mutually exclusive for i ≠ j, then using the results of Pb.
10.7, we can deduce that:

$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n) \tag{10.40}$$

Now, using the conditional probability definition [Eq. (10.37)], Eq. (10.40) can be written as:

$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n) \tag{10.41}$$

This result is known as the Total Probability theorem.

BAYES THEOREM

$$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)} \tag{10.42}$$

PROOF From the definition of the conditional probability [Eq. (10.37)], we can write:

$$P(B \cap A_i) = P(A_i \mid B)P(B) \tag{10.43}$$

Again, using Eqs. (10.37) and (10.43), we have:

$$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{P(B)} \tag{10.44}$$

Now, substituting Eq. (10.41) in the denominator of Eq. (10.44), we obtain Eq. (10.42).

Example 10.10
A digital communication channel transmits the signal as a collection of ones (1s) and zeros (0s). Assume (statistically) that 40% of the 1s and 33% of the 0s are changed upon transmission. Suppose that, in a message, the ratio between the transmitted 1s and the transmitted 0s was 5/3. What is the probability that the received signal is the same as the transmitted signal if:
a. The received signal was a 1?
b. The received signal was a 0?

Solution: Let O be the event that 1 was received, and Z be the event that 0 was received. If $H_1$ is the hypothesis that 1 was transmitted and $H_0$ is the hypothesis that 0 was transmitted, then from the statement of the problem, we know that:

$$\frac{P(H_1)}{P(H_0)} = \frac{5}{3} \quad \text{and} \quad P(H_1) + P(H_0) = 1$$

giving:

$$P(H_1) = \frac{5}{8} \quad \text{and} \quad P(H_0) = \frac{3}{8}$$

Furthermore, from the text of the problem (taking 33% as 1/3), we know that:

$$P(O \mid H_1) = \frac{3}{5} \quad \text{and} \quad P(Z \mid H_1) = \frac{2}{5}$$

$$P(O \mid H_0) = \frac{1}{3} \quad \text{and} \quad P(Z \mid H_0) = \frac{2}{3}$$

From the total probability result [Eq.
(10.41)], we obtain:

$$P(O) = P(O \mid H_1)P(H_1) + P(O \mid H_0)P(H_0) = \frac{3}{5} \times \frac{5}{8} + \frac{1}{3} \times \frac{3}{8} = \frac{1}{2}$$

and

$$P(Z) = P(Z \mid H_1)P(H_1) + P(Z \mid H_0)P(H_0) = \frac{2}{5} \times \frac{5}{8} + \frac{2}{3} \times \frac{3}{8} = \frac{1}{2}$$

The probability that the transmitted signal was 1, given that the received signal is 1, follows from Bayes theorem:

$$P(H_1 \mid O) = \frac{P(H_1)P(O \mid H_1)}{P(O)} = \frac{\frac{5}{8} \times \frac{3}{5}}{\frac{1}{2}} = \frac{3}{4}$$

Similarly, we can obtain the probability that the transmitted signal was 0, given that the received signal is 0:

$$P(H_0 \mid Z) = \frac{P(H_0)P(Z \mid H_0)}{P(Z)} = \frac{\frac{3}{8} \times \frac{2}{3}}{\frac{1}{2}} = \frac{1}{2}$$

In-Class Exercises

Pb. 10.13 Show that when two events A and B are independent, the addition law for probability becomes:

$$P(A \cup B) = P(A) + P(B) - P(A)P(B)$$

Pb. 10.14 Consider four boxes, each containing 1000 resistors. Box 1 contains 100 defective items; Box 2 contains 400 defective items; Box 3 contains 50 defective items; and Box 4 contains 80 defective items.
a. What is the probability that a resistor chosen at random from any of the boxes is defective?
b. What is the probability that if the resistor is found defective, it came from Box 2?
(Hint: The randomness in the selection of the box means that: $P(B_1) = P(B_2) = P(B_3) = P(B_4) = 0.25$.)

10.5 Repeated Trials

Bernoulli trials refer to identical, successive, and independent trials, in which an elementary event A can occur with probability:

$$p = P(A) \tag{10.45}$$

or fail to occur with probability:

$$q = 1 - p \tag{10.46}$$

In the case of n consecutive Bernoulli trials, each elementary event can be described by a sequence of 0s and 1s, such as in the following:

$$\omega = \underbrace{1\,1\,0\,0\,0\,1 \ldots 0\,1}_{n \text{ digits},\ k \text{ ones}} \tag{10.47}$$

where n is the number of trials, k is the number of successes, and (n − k) is the number of failures. Because the trials are independent, the probability for the above single occurrence is:

$$P(\omega) = p^k q^{n-k} \tag{10.48}$$

The total probability for the event with k successes in n trials is going to be the probability of the single event multiplied by the number of configurations with a given number of digits and a given number of 1s.
The number of such configurations is given by the binomial coefficient $C_k^n$. Therefore:

$$P(k \text{ successes in } n \text{ trials}) = C_k^n\, p^k q^{n-k} \tag{10.49}$$

Example 10.11
Find the probability that the number 3 will appear twice in five independent rolls of a die.

Solution: In a single trial, the probability of success (i.e., 3 showing up) is

$$p = \frac{1}{6}$$

Therefore, the probability that it appears twice in five independent rolls will be

$$P(2 \text{ successes in } 5 \text{ trials}) = C_2^5\, p^2 q^3 = \frac{5!}{2!\,3!} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3 = 0.16075$$

Example 10.12
Find the probability that in a roll of two dice, three occurrences of snake-eyes (one spot on each die) are obtained in ten rolls of the two dice.

Solution: The space S of the roll of two dice consists of 36 elementary elements (6 × 6), only one of which results in a snake-eyes configuration; therefore:

p = 1/36; k = 3; n = 10

and

$$P(3 \text{ successes in } 10 \text{ trials}) = C_3^{10}\, p^3 q^7 = \frac{10!}{3!\,7!} \left(\frac{1}{36}\right)^3 \left(\frac{35}{36}\right)^7 = 0.00211$$

In-Class Exercises

Pb. 10.15 Assuming that a batch of manufactured components has an 80% chance of passing an inspection, what is the chance that at least 16 batches in a lot of 20 would pass the inspection?

Pb. 10.16 In an experiment, we keep rolling a fair die until it comes up showing three spots. What are the probabilities that this will take:
a. Exactly four rolls?
b. At least four rolls?
c. At most four rolls?

Pb. 10.17 Let X be the number of successes in a Bernoulli trials experiment with n trials and the probability of success p in each trial. If the mean number of successes m, also called average value $\bar{X}$ and expectation value E(X), is defined as:

$$m \equiv \bar{X} \equiv E(X) \equiv \sum X\, P(X)$$

and the variance is defined as:

$$V(X) \equiv E\big((X - \bar{X})^2\big)$$

show that:

$$\bar{X} = np \quad \text{and} \quad V(X) = np(1 - p)$$

10.5.1 Generalization of Bernoulli Trials

In the above Bernoulli trials, we considered the case of whether or not a single event A was successful (i.e., two choices). This was the simplest partition of the set S.
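Before generalizing, the two-outcome formula of Eq. (10.49) can be checked numerically against Examples 10.11 and 10.12. The following is a minimal MATLAB sketch and not part of the original text: the anonymous function binom and the variable names are hypothetical helpers introduced only for this illustration.

```matlab
% Binomial probability of k successes in n Bernoulli trials, Eq. (10.49).
binom = @(k, n, p) nchoosek(n, k) * p^k * (1 - p)^(n - k);   % hypothetical helper

P11 = binom(2, 5, 1/6);       % Example 10.11: the number 3 appears twice in five rolls
P12 = binom(3, 10, 1/36);     % Example 10.12: three snake-eyes in ten rolls of two dice

% The probabilities of all possible numbers of successes must add up to 1.
total = 0;
for k = 0:5
    total = total + binom(k, 5, 1/6);
end

fprintf('P11 = %.5f, P12 = %.5f, sum over k = %.6f\n', P11, P12, total);
```

The built-in nchoosek(n, k) returns the binomial coefficient written $C_k^n$ in the text; the final loop is a sanity check that the distribution is normalized.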
In cases where we partition the set S into r subsets, $S = \{A_1, A_2, \ldots, A_r\}$, and the probabilities for these single events are, respectively, $\{p_1, p_2, \ldots, p_r\}$, where $p_1 + p_2 + \cdots + p_r = 1$, it can be easily proven that the probability in n independent trials for the event $A_1$ to occur $k_1$ times, the event $A_2$ to occur $k_2$ times, etc., is given by:

$$P(k_1, k_2, \ldots, k_r; n) = \frac{n!}{k_1!\,k_2! \cdots k_r!}\, p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r} \tag{10.50}$$

where $k_1 + k_2 + \cdots + k_r = n$.

Example 10.13
Consider the sum of the spots in a roll of two dice. We partition the set of outcomes {2, 3, …, 11, 12} into the three events $A_1 = \{2, 3, 4, 5\}$, $A_2 = \{6, 7\}$, $A_3 = \{8, 9, 10, 11, 12\}$. Find P(1, 7, 2; 10).

Solution: The probabilities for each of the events are, respectively:

$$p_1 = \frac{10}{36}, \quad p_2 = \frac{11}{36}, \quad p_3 = \frac{15}{36}$$

and

$$P(1, 7, 2; 10) = \frac{10!}{1!\,7!\,2!} \left(\frac{10}{36}\right)^1 \left(\frac{11}{36}\right)^7 \left(\frac{15}{36}\right)^2 \approx 0.00432$$

10.6 The Poisson and the Normal Distributions

In this section, we obtain approximate expressions for the binomial distribution in different limits. We start by considering the expression for the probability of k successes in n Bernoulli trials with two choices for outputs; that is, Eq. (10.49).

10.6.1 The Poisson Distribution

Consider the limit when p << 1, but np ≡ a ≈ O(1). Then:

$$P(k = 0) = \frac{n!}{0!\,n!}\, p^0 (1 - p)^n = \left(1 - \frac{a}{n}\right)^n \tag{10.51}$$

But in the limit n → ∞,

$$\lim_{n \to \infty} \left(1 - \frac{a}{n}\right)^n = e^{-a} \tag{10.52}$$

giving:

$$P(k = 0) = e^{-a} \tag{10.53}$$

Now consider P(k = 1); it is equal to:

$$\lim_{n \to \infty} P(k = 1) = \frac{n!}{1!\,(n - 1)!}\, p^1 (1 - p)^{n-1} \approx a \left(1 - \frac{a}{n}\right)^n \approx a e^{-a} \tag{10.54}$$

For P(k = 2), we obtain:

$$\lim_{n \to \infty} P(k = 2) = \frac{n!}{2!\,(n - 2)!}\, p^2 (1 - p)^{n-2} \approx \frac{a^2}{2!} \left(1 - \frac{a}{n}\right)^n \approx \frac{a^2}{2!}\, e^{-a} \tag{10.55}$$

Similarly,

$$\lim_{n \to \infty} P(k) \approx \frac{a^k e^{-a}}{k!} \tag{10.56}$$

We compare in Figure 10.1 the exact with the approximate expression for the probability distribution, in the region of validity of the Poisson approximation.
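A comparison of this kind can be reproduced with a few lines of MATLAB. The following is a minimal sketch and not the author's original script: it assumes the parameter values quoted in the figure (n = 100, p = 0.03, k from 0 to 9), and the variable names and plotting choices are illustrative only.

```matlab
% Exact binomial distribution, Eq. (10.49), versus Poisson approximation, Eq. (10.56).
n = 100;  p = 0.03;           % values quoted for Figure 10.1
a = n * p;                    % Poisson parameter a = np
k = 0:9;                      % range of k shown in the figure

Pexact = zeros(size(k));
for i = 1:numel(k)
    Pexact(i) = nchoosek(n, k(i)) * p^(k(i)) * (1 - p)^(n - k(i));
end
Ppoisson = a.^k .* exp(-a) ./ factorial(k);

stem(k, Pexact);              % stems: exact distribution
hold on
plot(k, Ppoisson, '*');       % asterisks: Poisson approximation
hold off
xlabel('k');  ylabel('P(k)');
title('Poisson Distribution : n = 100 ; p = 0.03');

maxDiff = max(abs(Pexact - Ppoisson));   % largest pointwise discrepancy
```

For these parameter values the two sets of points differ by less than 0.01 at every k, which gives a feeling for the quality of the approximation when p << 1.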
FIGURE 10.1 The Poisson distribution: P(k) versus k for n = 100 and p = 0.03. Stems: exact distribution; asterisks: Poisson approximation.

Example 10.14
A massive parallel computer system contains 1000 processors. Each processor fails independently of all others and the probability of its failure is 0.002 over a year. Find the probability that the system has no failures during one year of operation.

Solution: This is a problem of Bernoulli trials with n = 1000 and p = 0.002:

$$P(k = 0) = C_0^{1000}\, p^0 (1 - p)^{1000} = (0.998)^{1000} = 0.13506$$

or, using the Poisson approximate formula, with a = np = 2:

$$P(k = 0) \approx e^{-a} = e^{-2} \approx 0.13533$$

Example 10.15
Due to the random vibrations affecting its supporting platform, a recording head introduces glitches on the recording medium at the rate of n = 100 glitches per minute. What is the probability that k = 3 glitches are introduced in the recording over any interval of time ∆t = 1 s?

Solution: If we choose an interval of time equal to 1 minute, the probability for an elementary event to occur in the subinterval ∆t in this 1 minute interval is

$$p = \frac{1}{60}$$

The problem reduces to finding the probability of k = 3 in n = 100 trials. The Poisson formula gives this probability as:

$$P(3) = \frac{1}{3!} \left(\frac{100}{60}\right)^3 \exp\left(-\frac{100}{60}\right) = 0.14573$$

where a = 100/60. (For comparison purposes, the exact value for this probability, obtained using the binomial distribution expression, is 0.1466.)

Homework Problem

Pb. 10.18 Let $A_1, A_2, \ldots, A_{m+1}$ be a partition of the set S, and let $p_1, p_2, \ldots, p_{m+1}$ be the probabilities associated with each of these events. Assuming that n Bernoulli trials are repeated, show, using Eq. (10.50), that the probability that the event $A_1$ occurs $k_1$ times, the event $A_2$ occurs $k_2$ times, etc., is given in the limit n → ∞ by:

$$\lim_{n \to \infty} P(k_1, k_2, \ldots, k_{m+1}; n) = \frac{(a_1)^{k_1} e^{-a_1}}{k_1!}\, \frac{(a_2)^{k_2} e^{-a_2}}{k_2!} \cdots \frac{(a_m)^{k_m} e^{-a_m}}{k_m!}$$

where $a_i = np_i$.
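The numerical values quoted in Examples 10.14 and 10.15 can be verified directly. The following is a minimal MATLAB sketch and not part of the original text; the variable names are assumptions chosen for illustration.

```matlab
% Example 10.14: no failure among 1000 processors, each failing with probability 0.002.
n1 = 1000;  p1 = 0.002;
Pexact14   = (1 - p1)^n1;          % exact binomial value for k = 0
Ppoisson14 = exp(-n1 * p1);        % Poisson approximation with a = np = 2

% Example 10.15: k = 3 glitches in n = 100 trials with p = 1/60.
n2 = 100;  p2 = 1/60;  k = 3;
a = n2 * p2;                                           % a = 100/60
Ppoisson15 = a^k * exp(-a) / factorial(k);             % Eq. (10.56)
Pexact15   = nchoosek(n2, k) * p2^k * (1 - p2)^(n2 - k);   % Eq. (10.49)

fprintf('Example 10.14: exact %.5f, Poisson %.5f\n', Pexact14, Ppoisson14);
fprintf('Example 10.15: exact %.4f, Poisson %.5f\n', Pexact15, Ppoisson15);
```

In both examples p << 1 while a = np remains of order 1, which is the regime in which the Poisson formula was derived.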
10.6.2 The Normal Distribution

Prior to considering the derivation of the normal distribution, let us recall Stirling’s formula, which is the approximation of n! when n → ∞:

$$\lim_{n \to \infty} n! \approx \sqrt{2\pi n}\; n^n e^{-n} \tag{10.57}$$

We seek the approximate form of the binomial distribution in the limit of very large n and npq >> 1. Using Eq. (10.57), the expression for the probability given in Eq. (10.49) reduces to:

$$P(k \text{ successes in } n \text{ trials}) = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{k(n - k)}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n - k}\right)^{n-k} \tag{10.58}$$

Now examine this expression in the neighborhood of the mean (see Pb. 10.17). We define the distance from this mean, normalized to the square root of the variance, as:

$$x = \frac{k - np}{\sqrt{npq}} \tag{10.59}$$

Using the leading two terms of the power expansion $\ln(1 + \varepsilon) = \varepsilon - \varepsilon^2/2 + \ldots$, the natural logarithm of each of the two parentheses on the RHS of Eq. (10.58) can be approximated by:

$$\ln\left(\frac{k}{np}\right)^{-k} \approx -\left(np + \sqrt{npq}\,x\right) \left(\sqrt{\frac{q}{np}}\,x - \frac{1}{2}\,\frac{q}{np}\,x^2\right) \tag{10.60}$$

$$\ln\left(\frac{n - k}{nq}\right)^{-(n-k)} \approx -\left(nq - \sqrt{npq}\,x\right) \left(-\sqrt{\frac{p}{nq}}\,x - \frac{1}{2}\,\frac{p}{nq}\,x^2\right) \tag{10.61}$$

Adding Eqs. (10.60) and (10.61), and keeping only the terms that survive as n → ∞, we deduce that:

$$\lim_{n \to \infty} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n - k}\right)^{n-k} = e^{-x^2/2} \tag{10.62}$$

Furthermore, we can approximate the square root term on the RHS of Eq. (10.58) by its value at the mean; that is

$$\sqrt{\frac{n}{k(n - k)}} \approx \frac{1}{\sqrt{npq}} \tag{10.63}$$

Combining Eqs. (10.62) and (10.63), we can approximate Eq. (10.58), in this limit, by the Gaussian distribution:

$$P(k \text{ successes in } n \text{ trials}) = \frac{1}{\sqrt{2\pi npq}} \exp\left[-\frac{(k - np)^2}{2npq}\right] \tag{10.64}$$

This result is known as the De Moivre-Laplace theorem. We compare in Figure 10.2 the binomial distribution and its Gaussian approximation in the region of the validity of the approximation.

FIGURE 10.2 The normal (Gaussian) distribution: P(k) versus k for n = 100 and p = 0.5. Stems: exact distribution; asterisks: Gaussian approximation.

Example 10.16
A fair die is rolled 400 times. Find the probability that an even number of spots show up 200 times, 210 times, 220 times, and 230 times.
Solution: In this case, n = 400; p = 0.5; np = 200; and $\sqrt{npq} = 10$. Using Eq. (10.64), we get:

P(200 even) = 0.03989; P(210 even) = 0.02419
P(220 even) = 0.00540; P(230 even) = 4.43 × 10^−4

Homework Problems

Pb. 10.19 Using the results of Pb. 4.34, relate in the region of validity of the Gaussian approximation the quantity:

$$\sum_{k = k_1}^{k_2} P(k \text{ successes in } n \text{ trials})$$

to the Gaussian integral, specifying each of the parameters appearing in your expression. (Hint: First show that in this limit, the summation can be approximated by an integration.)

Pb. 10.20 Let $A_1, A_2, \ldots, A_r$ be a partition of the set S, and let $p_1, p_2, \ldots, p_r$ be the probabilities associated with each of these events. Assuming n Bernoulli trials are repeated, show that, in the limit n → ∞ and where the $k_i$ are in the vicinity of $np_i >> 1$, the following approximation is valid:

$$P(k_1, k_2, \ldots, k_r; n) = \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{(k_1 - np_1)^2}{np_1} + \cdots + \dfrac{(k_r - np_r)^2}{np_r}\right]\right\}}{\sqrt{(2\pi n)^{r-1}\, p_1 \cdots p_r}}$$

Supplement: Review of Elementary Functions

In this supplement, we review the basic features and characteristics of the simple elementary functions.

S.1 Affine Functions

By an affine function, we mean an expression of the form

$$y(x) = ax + b \tag{S.1}$$

In the special case where b = 0, we say that y is a linear function of x. We can interpret the parameters in the above function as representing the slope-intercept form of a straight line. Here, a is the slope, which is a measure of the steepness of a line; and b is the y-intercept (i.e., the line intersects the y-axis at the point (0, b)). The following cases illustrate the different possibilities:

1. a = 0: this specifies a horizontal line at a height b above the x-axis and that has zero slope.
2. a > 0: the height of a point on the line (i.e., the y-value) increases as the value of x increases.
3. a < 0: the height of the line decreases as the value of x increases.
4. b > 0: the line y-intercept is positive.
5. b < 0: the line y-intercept is negative.
6.
x = k: this equation represents a vertical line passing through the point (k, 0).

It should be noted that:

• If two lines have the same slope, they are parallel.
• Two nonvertical lines are perpendicular if and only if their slopes are negative reciprocals of each other. (It is easy to deduce this property if you remember the relationship that you learned in trigonometry relating the sine and cosine of two angles that differ by π/2.) See Section S.4 for more details.

FIGURE S.1 Graph of the line y = ax + b (a = 2, b = 5).

S.2 Quadratic Functions

Parabola

A quadratic parabolic function is an expression of the form:

$$y(x) = ax^2 + bx + c \quad \text{where } a \ne 0 \tag{S.2}$$

Any x for which $ax^2 + bx + c = 0$ is called a root or a zero of the quadratic function. The graphs of quadratic functions are called parabolas. If we plot these parabolas, we note the following characteristics:

1. For a > 0, the parabola opens up (convex curve) as shown in Figure S.2.
2. For a < 0, the parabola opens down (concave curve) as shown in Figure S.2.
3. The parabola does not always intersect the x-axis; but where it does, this point’s abscissa is a real root of the quadratic equation.

FIGURE S.2 Graph of a quadratic parabolic (second-order polynomial) function with 0 or 2 roots.

A parabola can cross the x-axis in either 0 or 2 points, or the x-axis can be tangent to it at one point. If the vertex of the parabola is above the x-axis and the parabola opens up, there is no intersection, and hence, no real roots. If, on the other hand, the parabola opens down, the curve will intersect the x-axis at two values of x equidistant from the vertex position. If the vertex is below the x-axis, we reverse the convexity conditions for the existence of two real roots.

We recall that the roots of a quadratic equation are given by:

$$x_{\pm} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{S.3}$$

When $b^2 - 4ac < 0$, the parabola does not intersect the x-axis.
There are no real roots; the roots are said to be complex conjugates. When $b^2 - 4ac = 0$, the x-axis is tangent to the parabola and we have one double root.

Geometrical Description of a Parabola

The parabola can also be described through the following geometric construction: a parabola is the locus of all points P in a plane that are equidistant from a fixed line (called the directrix) and a fixed point (called the focus) not situated on the line.

$$d_1 = d_2 \tag{S.4}$$

FIGURE S.3 Graph of a parabola defined through geometric parameters. (Parameter values: h = 2, k = 2, p = 1.)

The algebraic expression for the parabola, using the above geometric parameters (vertex at (h, k), focus at (h, k + p), and directrix y = k − p), can be obtained by specifically writing and equating the expressions for the distances of a point on the parabola from the focus and from the directrix:

$$\sqrt{(x - h)^2 + (y - (k + p))^2} = \left| y - (k - p) \right| \tag{S.5}$$

Squaring both sides of this equation, this equality reduces to:

$$(x - h)^2 = 4p(y - k) \tag{S.6}$$

or in standard form, it can be written:

$$y = \frac{x^2}{4p} - \frac{h}{2p}\,x + \frac{h^2 + 4pk}{4p} \tag{S.7}$$

Ellipse

The standard form of the equation describing an ellipse is given by:

$$\frac{(x - h)^2}{a^2} + \frac{(y - k)^2}{b^2} = 1 \tag{S.8}$$

The ellipse’s center is located at (h, k), and assuming a > b, the major axis length is equal to 2a, the minor axis length is equal to 2b, the foci are located at (h − c, k) and (h + c, k), and the vertices at (h − a, k) and (h + a, k); where

$$c^2 = a^2 - b^2 \tag{S.9}$$

Geometric Definition of an Ellipse

An ellipse is the locus of all points P such that the sum of the distances between P and two distinct points (called the foci) is constant and greater than the distance between the two foci.

$$d_1 + d_2 = 2a \tag{S.10}$$

The center of the ellipse is the midpoint between foci, and the two points of intersection of the line through the foci and the ellipse are called the vertices.
The eccentricity of an ellipse is the ratio of the distance between the center and a focus over the distance between the center and a vertex; that is

$$\varepsilon = c/a \tag{S.11}$$

FIGURE S.4 Graph of an ellipse defined through geometric parameters. (Parameter values: h = 2, k = 2, a = 3, b = 2.)

Hyperbola

The standard form of the equation describing a hyperbola is given by:

$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1 \tag{S.12}$$

The center of the hyperbola is located at (h, k), the transverse axis length is equal to 2a, the conjugate axis length is equal to 2b, the foci are located at (h − c, k) and (h + c, k), and the vertices at (h − a, k) and (h + a, k). In this case, c > a > 0 and c > b > 0 and

$$c^2 = a^2 + b^2 \tag{S.13}$$

Geometric Definition of a Hyperbola

A hyperbola is the locus of all points P in a plane such that the absolute value of the difference of the distances between P and the two foci is constant and is less than the distance between the two foci; that is

$$\left| d_1 - d_2 \right| = 2a \tag{S.14}$$

FIGURE S.5 Graph of a hyperbola defined through geometric parameters. (Parameter values: h = 2, k = 2, a = 1, b = 3.)

The center of the hyperbola is the midpoint between foci, and the two points of intersection of the line through the foci and the hyperbola are called the vertices.

S.3 Polynomial Functions

A polynomial function is an expression of the form:

$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \tag{S.15}$$

where $a_n \ne 0$ for an nth degree polynomial.

The Fundamental Theorem of Algebra states that, for the above polynomial, there are exactly n complex roots (counted with their multiplicities); furthermore, if all the polynomial coefficients are real, then the nonreal roots always come in pairs consisting of a complex number and its complex conjugate.

S.4 Trigonometric Functions

The trigonometric circle is defined as the circle with center at the origin of the coordinate axes and having radius 1.
The trigonometric functions are defined as functions of the components of a point P on the trigonometric circle. Specifically, if we define the angle θ as the angle between the x-axis and the line OP, then:

• cos(θ) is the x-component of the point P.
• sin(θ) is the y-component of the point P.

FIGURE S.6 The trigonometric circle.

Using the Pythagorean theorem in the right angle triangle OQP, one deduces that:

$$\sin^2(\theta) + \cos^2(\theta) = 1 \tag{S.16}$$

Using the above definitions for the sine and cosine functions and elementary geometry, it is easy to note the following properties for the trigonometric functions:

$$\sin(-\theta) = -\sin(\theta) \quad \text{and} \quad \cos(-\theta) = \cos(\theta) \tag{S.17}$$

$$\sin(\theta + \pi) = -\sin(\theta) \quad \text{and} \quad \cos(\theta + \pi) = -\cos(\theta) \tag{S.18}$$

$$\sin(\theta + \pi/2) = \cos(\theta) \quad \text{and} \quad \cos(\theta + \pi/2) = -\sin(\theta) \tag{S.19}$$

$$\sin(\pi/2 - \theta) = \cos(\theta) \quad \text{and} \quad \cos(\pi/2 - \theta) = \sin(\theta) \tag{S.20}$$

The tangent and cotangent functions are defined as:

$$\tan(\theta) = \frac{\sin(\theta)}{\cos(\theta)} \quad \text{and} \quad \cot(\theta) = \frac{1}{\tan(\theta)} \tag{S.21}$$

Other important trigonometric relations relate the angles and sides of a triangle. These are the so-called Law of Cosines and Law of Sines in a triangle:

$$c^2 = a^2 + b^2 - 2ab\cos(\gamma) \tag{S.22}$$

$$\frac{\sin(\alpha)}{a} = \frac{\sin(\beta)}{b} = \frac{\sin(\gamma)}{c} \tag{S.23}$$

where the sides of the triangle are a, b, c, and the angles opposite each of these sides are denoted, respectively, by α, β, γ.

S.5 Inverse Trigonometric Functions

The inverse of a function y = f(x) is a function, denoted by $x = f^{-1}(y)$, having the property that $y = f(f^{-1}(y))$. It is important to note that a function f(x) that is single-valued (i.e., to each element x in its domain, there corresponds one, and only one, element y in its range) may have an inverse that is multi-valued (i.e., many x values may correspond to the same y). Typical examples of multi-valued inverse functions are the inverse trigonometric functions. In such instances, a single-valued inverse function can be defined if the range of the inverse function is defined on a more limited region of space.
For example, the $\cos^{-1}$ function (called arc cosine) is single-valued if 0 ≤ x ≤ π.

Note that the above notation for the inverse of a function should not be confused with the negative-one power of the function f(x), which should be written as:

$(f(x))^{-1}$ or 1/f(x)

Also note that because the inverse function reverses the role of the x- and y-coordinates, the graphs of y = f(x) and $y = f^{-1}(x)$ are symmetric with respect to the line y = x (i.e., the first bisector of the coordinate axes).

S.6 The Natural Logarithmic Function

The natural logarithmic function is defined by the following integral:

$$\ln(x) = \int_1^x \frac{1}{t}\,dt \tag{S.24}$$

The following properties of the logarithm can be directly deduced from the above definition:

$$\ln(ab) = \ln(a) + \ln(b) \tag{S.25}$$

$$\ln(a^r) = r\ln(a) \tag{S.26}$$

$$\ln\left(\frac{1}{a}\right) = -\ln(a) \tag{S.27}$$

$$\ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b) \tag{S.28}$$

To illustrate the technique for deriving any of the above relations, let us consider the first of them:

$$\ln(ab) = \int_1^{ab} \frac{1}{t}\,dt = \int_1^{a} \frac{1}{t}\,dt + \int_a^{ab} \frac{1}{t}\,dt \tag{S.29}$$

The first term on the RHS is ln(a), while the second term through the substitution u = t/a reduces to the definition of ln(b). Note that:

$$\ln(1) = 0 \tag{S.30}$$

$$\ln(e) = 1 \tag{S.31}$$

where e = 2.71828.

S.7 The Exponential Function

The exponential function is defined as the inverse function of the natural logarithmic function; that is

$$\exp(\ln(x)) = x \quad \text{for all } x > 0 \tag{S.32}$$

$$\ln(\exp(y)) = y \quad \text{for all } y \tag{S.33}$$

The following properties of the exponential function hold for all real numbers:

$$\exp(a)\exp(b) = \exp(a + b) \tag{S.34}$$

$$(\exp(a))^b = \exp(ab) \tag{S.35}$$

$$\exp(-a) = \frac{1}{\exp(a)} \tag{S.36}$$

$$\frac{\exp(a)}{\exp(b)} = \exp(a - b) \tag{S.37}$$

It should be pointed out that any of the above properties can be directly obtained from the definition of the exponential function and the properties of the logarithmic function.
For example, the first of these relations can be derived as follows:

$$\ln(\exp(a)\exp(b)) = \ln(\exp(a)) + \ln(\exp(b)) = a + b \tag{S.38}$$

Taking the exponential of both sides of this equation, we obtain:

$$\exp(\ln(\exp(a)\exp(b))) = \exp(a)\exp(b) = \exp(a + b) \tag{S.39}$$

which is the desired result.

Useful Features of the Exponential Function

If the exponential function is written in the form:

$$y(x) = \exp(-bx) \tag{S.40}$$

the following features are apparent:

1. If b > 0, then the function converges to zero as x goes to +∞.
2. If b < 0, then the function grows without bound as x goes to +∞.
3. If b = 0, then the function is everywhere equal to the constant y = 1.
4. The exponential functions are monotonically increasing for b < 0, and monotonically decreasing for b > 0.
5. If b1 > b2 > 0, then everywhere on the positive x-axis, y1(x) < y2(x).
6. The exponential function has no roots.
7. For b > 0, the product of the exponential function with any polynomial goes to zero as x goes to +∞.

We plot in Figures S.7 and S.8 examples of the exponential function for different values of the parameter b. The first six properties above are clearly exhibited in these figures.

FIGURE S.7 The graph of the function y = exp(–bx), for different positive values of b.

FIGURE S.8 The graph of the function y = exp(–bx), for different negative values of b.

S.8 The Hyperbolic Functions

The hyperbolic cosine function is defined by:

$$\cosh(x) = \frac{\exp(x) + \exp(-x)}{2} \tag{S.41}$$

and the hyperbolic sine function is defined by:

$$\sinh(x) = \frac{\exp(x) - \exp(-x)}{2} \tag{S.42}$$

Using the above definitions, it is straightforward to derive the following relations:

$$\cosh^2(x) - \sinh^2(x) = 1 \tag{S.43}$$

$$1 - \tanh^2(x) = \operatorname{sech}^2(x) \tag{S.44}$$

S.9 The Inverse Hyperbolic Functions

The inverse hyperbolic sine function is defined by:

$$y = \sinh^{-1}(x) \quad \text{if} \quad x = \sinh(y) \tag{S.45}$$

Using the definition of the hyperbolic functions, we can write the inverse hyperbolic functions in terms of logarithmic functions.
For example, substituting the definition (S.42) into x = sinh(y) for the inverse hyperbolic sine function above, we obtain:

$$e^{y} - 2x - e^{-y} = 0 \tag{S.46}$$

Multiplying by $e^{y}$ everywhere, we obtain a second-degree equation in $e^{y}$:

$$e^{2y} - 2x e^{y} - 1 = 0 \tag{S.47}$$

Solving this quadratic equation, and choosing the plus sign in front of the square root of the discriminant, since $e^{y}$ is everywhere positive, we obtain:

$$e^{y} = x + \sqrt{x^2 + 1} \tag{S.48}$$

giving, for the inverse hyperbolic sine function, the expression:

$$y = \sinh^{-1}(x) = \ln\left(x + \sqrt{x^2 + 1}\right) \tag{S.49}$$

In a similar manner, one can show the following other identities:

$$\cosh^{-1}(x) = \ln\left(x + \sqrt{x^2 - 1}\right) \tag{S.50}$$

$$\tanh^{-1}(x) = \frac{1}{2}\ln\left(\frac{1 + x}{1 - x}\right) \tag{S.51}$$

$$\operatorname{sech}^{-1}(x) = \ln\left(\frac{1 + \sqrt{1 - x^2}}{x}\right) \tag{S.52}$$

Appendix: Some Useful Formulae

Sum of Integers and Their Powers

$$\sum_{k=1}^{n} k = \frac{n(n + 1)}{2}$$

$$\sum_{k=1}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}$$

$$\sum_{k=1}^{n} k^3 = \left[\frac{n(n + 1)}{2}\right]^2$$

$$\sum_{k=1}^{n} k^4 = \frac{n(n + 1)(2n + 1)(3n^2 + 3n - 1)}{30}$$

$$\sum_{k=1}^{n} (2k - 1) = n^2$$

$$\sum_{k=1}^{n} (2k - 1)^2 = \frac{n(4n^2 - 1)}{3}$$

$$\sum_{k=1}^{n} (2k - 1)^3 = n^2(2n^2 - 1)$$

$$\sum_{k=1}^{n} k(k + 1)^2 = \frac{n(n + 1)(n + 2)(3n + 5)}{12}$$

Arithmetic Series

$$\sum_{k=0}^{n-1} (a + kr) = \frac{n}{2}\left[2a + (n - 1)r\right]$$

Geometric Series

$$\sum_{k=1}^{n} a q^{k-1} = \frac{a(q^n - 1)}{q - 1}, \quad q \neq 1$$

Arithmo-Geometric Series

$$\sum_{k=0}^{n-1} (a + kr) q^k = \frac{a - [a + (n - 1)r]q^n}{1 - q} + \frac{rq(1 - q^{n-1})}{(1 - q)^2}, \quad q \neq 1$$

Taylor's Series

$$f(x + a) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!}\, a^k$$

$$f(x + a, y + b) = f(x, y) + a\frac{\partial f}{\partial x} + b\frac{\partial f}{\partial y} + \frac{1}{2!}\left[a^2\frac{\partial^2 f}{\partial x^2} + b^2\frac{\partial^2 f}{\partial y^2} + 2ab\frac{\partial^2 f}{\partial x\,\partial y}\right] + \cdots$$

Trigonometric Functional Relations

$$\sin(x) \pm \sin(y) = 2\sin\left(\tfrac{1}{2}(x \pm y)\right)\cos\left(\tfrac{1}{2}(x \mp y)\right)$$

$$\cos(x) + \cos(y) = 2\cos\left(\tfrac{1}{2}(x + y)\right)\cos\left(\tfrac{1}{2}(x - y)\right)$$

$$\cos(x) - \cos(y) = 2\sin\left(\tfrac{1}{2}(x + y)\right)\sin\left(\tfrac{1}{2}(y - x)\right)$$

$$\sin\left(\tfrac{1}{2}x\right) = \pm\sqrt{\tfrac{1}{2}\left(1 - \cos(x)\right)}$$

$$\cos\left(\tfrac{1}{2}x\right) = \pm\sqrt{\tfrac{1}{2}\left(1 + \cos(x)\right)}$$

$$\sin(2x) = 2\sin(x)\cos(x)$$

$$\sin(3x) = 3\sin(x) - 4\sin^3(x)$$

$$\sin(4x) = \cos(x)\left[4\sin(x) - 8\sin^3(x)\right]$$

$$\cos(2x) = 2\cos^2(x) - 1$$

$$\cos(3x) = 4\cos^3(x) - 3\cos(x)$$

$$\cos(4x) = 8\cos^4(x) - 8\cos^2(x) + 1$$

Relation of Trigonometric and Hyperbolic Functions

$$\sin(x) = -j\sinh(jx)$$

$$\cos(x) = \cosh(jx)$$

$$\tan(x) = \frac{1}{j}\tanh(jx)$$

Expansion of Elementary Functions in Power Series

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$

$$\sin(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k + 1)!}$$

$$\cos(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!}$$

$$\sinh(x) = \sum_{k=0}^{\infty} \frac{x^{2k+1}}{(2k + 1)!}$$

$$\cosh(x) = \sum_{k=0}^{\infty} \frac{x^{2k}}{(2k)!}$$
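The power series listed above can be checked numerically by comparing truncated sums with library values. The following is a minimal Python sketch of such a check. The function names, the default of 20 terms, and the test point x = 1.5 are illustrative assumptions, not taken from the text.

```python
import math


def exp_series(x, terms=20):
    """Partial sum of sum_{k>=0} x^k / k! (first `terms` terms)."""
    return sum(x**k / math.factorial(k) for k in range(terms))


def sin_series(x, terms=20):
    """Partial sum of sum_{k>=0} (-1)^k x^(2k+1) / (2k+1)!."""
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))


def cos_series(x, terms=20):
    """Partial sum of sum_{k>=0} (-1)^k x^(2k) / (2k)!."""
    return sum((-1)**k * x**(2 * k) / math.factorial(2 * k)
               for k in range(terms))


def sinh_series(x, terms=20):
    """Partial sum of sum_{k>=0} x^(2k+1) / (2k+1)!."""
    return sum(x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))


def cosh_series(x, terms=20):
    """Partial sum of sum_{k>=0} x^(2k) / (2k)!."""
    return sum(x**(2 * k) / math.factorial(2 * k) for k in range(terms))


if __name__ == "__main__":
    x = 1.5  # arbitrary test point chosen for illustration
    checks = [
        ("exp", exp_series, math.exp),
        ("sin", sin_series, math.sin),
        ("cos", cos_series, math.cos),
        ("sinh", sinh_series, math.sinh),
        ("cosh", cosh_series, math.cosh),
    ]
    for name, series, exact in checks:
        # Print the truncated series next to the library value.
        print(f"{name}: series = {series(x):.12f}, library = {exact(x):.12f}")
```

Because the factorial in the denominator grows faster than any power of x, a modest number of terms is usually enough for moderate values of |x|. For large |x|, more terms are needed, and the alternating series for the sine and cosine can lose accuracy through cancellation.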