2. Computer Arithmetic
2.1 Introduction
In computer arithmetic two fundamental design principles are of great impor-
tance: number representation and the implementation of algebraic operations
[25, 26, 27, 28, 29]. We will first discuss possible number representations,
(e.g., fixed-point or floating-point), then basic operations like adder and mul-
tiplier, and finally efficient implementation of more difficult operations such
as square roots, and the computation of trigonometric functions using the
CORDIC algorithm or MAC calls.
FPGAs allow a wide variety of computer arithmetic implementations for
the desired digital signal processing algorithms, because of the physical bit-
level programming architecture. This contrasts with the programmable dig-
ital signal processors (PDSPs), with the fixed multiply accumulator core.
Careful choice of the bit width in FPGA design can result in substantial
savings.
NUMBER SYSTEMS
Fixed−point
Floating−point
conventional
Unsigned integer
Two’s complement
One’s complement
Signed magnitude
Diminished by one
unconventional
Signed digit
Fractional
Logarithmic
RNS
conventional
32−bit IEEE
64−bit IEEE
unconventional
18 Bit
Splash II
format
Fig. 2.1.
Survey of number representations.
54
2. Computer Arithmetic
2.2 Number Representation
Deciding whether fixed- or floating-point is more appropriate for the problem
must be done carefully, preferably at an early phase in the project. In general,
it can be assumed that fixed-point implementations have higher speed and
lower cost, while floating-point has higher dynamic range and no need for
scaling, which may be attractive for more complicated algorithms. Figure 2.1
is a survey of conventional and less conventional fixed- and floating-point
number representations. Both systems are covered by a number of standards
but may, if desired, be implemented in a proprietary form.
2.2.1 Fixed-Point Numbers
We will first review the fixed-point number systems shown in Fig. 2.1. Table
2.1 shows the 3-bit coding for the 5 different integer representations.
Unsigned Integer
Let
X
be an
N
-bit unsigned binary number. Then the range is [0, 2
N
−
1]
and the representation is given by:
N
−1
X
=
n=0
x
n
2
n
,
(2.1)
where
x
n
is the
n
th
binary digit of
X
(i.e.,
x
n
∈
[0, 1]). The digit
x
0
is called
the least significant bit (LSB) and has a relative weight of unity. The digit
x
N
−1
is the most significant bit (MSB) and has a relative weight of 2
N
−1
.
Signed-Magnitude (SM)
In signed-magnitude systems the magnitude and the sign are represented
separately. The first bit
x
N
−1
(i.e., the MSB) represents the sign and the
remaining
N
−
1 bits the magnitude. The representation becomes:
X
=
−
N
−2
n
n=0
x
n
2
N
−2
n
n=0
x
n
2
X
≥
0
X<
0.
(2.2)
The range of this representation is [−(2
N
−1
−
1), 2
N
−1
−
1]. The advantage of
the signed-magnitude representation is simplified prevention of overflows, but
the disadvantage is that addition must be split depending on which operand
is larger.
2.2 Number Representation
55
Two’s Complement (2C)
An
N
-bit two’s complement representation of a signed integer, over the range
[−2
N
−1
,
2
N
−1
−
1], is given by:
N
−2
n
n=0
x
n
2
−2
N
−1
+
N
−2
n=0
X
=
x
n
2
n
X
≥
0
X<
0.
(2.3)
The two’s complement (2C) system is by far the most popular signed
numbering system in DSP use today. This is because it is possible to add
several signed numbers, and as long as the final sum is in the
N
-bit range,
we can ignore
any
overflow in the arithmetic. For instance, if we add two
3-bit numbers as follows
3
10
−2
10
1
10
←→
0 1 1
2C
←→
1 1 0
2C
←→
1. 0 0 1
2C
the overflow can be ignored. All computations are modulo 2
N
.
It follows that
it is possible to have intermediate values that can not be correctly repre-
sented, but if the final value is valid then the result is correct. For instance,
if we add the 3-bit numbers 2 + 2
−
3, we would have an intermediate value of
010 + 010 = 100
2C
,
i.e.,
−4
10
, but the result 100
−
011 = 100 + 101 = 001
2C
is correct.
Two’s complement numbers can also be used to implement modulo 2
N
arithmetic without any change in the arithmetic. This is what we will use in
Chap. 5 to design CIC filters.
One’s Complement (1C)
An
N
-bit one’s complement system (1C) can represent integers over the range
[−(2
N
−1
+ 1), 2
N
−1
−
1]. In a one’s complement code, positive and negative
numbers have bit-by-bit complement representations including for the sign
bit. There is, in fact a redundant representation of zero (see Table 2.1). The
representation of signed numbers in a 1C system is formally given by:
N
−2
n
n=0
x
n
2
N
−1
X
=
−2
+1+
N
−2
n=0
x
n
2
n
X
≥
0
X<
0.
(2.4)
For example, the three-bit 1C representation of the numbers
−3
to 3 is shown
in the third column of Table 2.1.
From the following simple example
56
2. Computer Arithmetic
3
10
−2
10
1
10
Carry
1
10
←→
←→
←→
←→
0 1 1
1C
1 0 1
1C
1. 0 0 0
1C
→ → →
1
1C
0 0 1
1C
we remember that in one’s complement a “carry wrap-around” addition is
needed. A carry occurring at the MSB must be added to the LSB to get the
correct final result.
The system can, however, efficiently be used to implement modulo 2
N
−
1
arithmetic without correction. As a result, one’s complement has specialized
value in implementing selected DSP algorithms (e.g., Mersenne transforms
over the integer ring 2
N
−
1; see Chap. 7).
Diminished One System (D1)
A diminished one (D1) system is a biased system. The positive numbers are,
compared with the 2C, diminished by 1. The range for (N +1)-bit D1 numbers
is [−2
N
−1
,
2
N
−1
], excluding 0. The coding rule for a D1 system is defined as
follows:
⎧
⎨
X
=
⎩
N
−2
n
n=0
x
n
2 + 1
−2
N
−1
+
N
−2
n=0
N
x
n
2
n
2
X>
0
X<
0
X=
0.
(2.5)
From adding two D1 numbers
3
10
−2
10
1
10
Carry
1
10
←→
←→
←→
←→
0 1 0
D1
1 1 0
D1
1.
0 0 0
D1
→ × −
1
→
0
D1
0 0 0
D1
we see that, for D1 a complement and add of the
inverted
carry must be
computed.
D1 numbers can efficiently be used to implement modulo 2
N
+1 arithmetic
without any change in the arithmetic. This fact will be used in Chap. 7 to
implement Fermat NTTs in the ring 2
N
+ 1.
Bias System
The biased number system has a bias for all numbers. The bias value is
usually in the middle of the binary range, i.e., bias = 2
N
−1
−
1. For a 3-bit
system, for instance the bias would be 2
3−1
−
1 = 3. The range for
N
-bit
biased numbers is [−2
N
−1
−
1, 2
N
−1
]. Zero is coded as the bias. The coding
rule for a biased system is defined as follows:
2.2 Number Representation
Table 2.1.
Conventional coding of signed binary numbers.
Binary
011
010
001
000
111
110
101
100
1000
2C
3
2
1
0
−1
−2
−3
−4
−
1C
3
2
1
0
−0
−1
−2
−3
−
D1
4
3
2
1
−1
−2
−3
−4
0
SM
3
2
1
0
−3
−2
−1
−0
−
Bias
0
−1
−2
−3
4
3
2
1
−
57
N
−1
X
=
n=0
x
n
2
n
−
bias.
(2.6)
From adding two biased numbers
3
10
+(−2
10
)
4
10
−bias
1
10
←→
←→
←→
←→
←→
1
0
1
0
1
1 0
bias
0 1
bias
1 1
bias
1 1
bias
0 0
bias
we see that, for each addition the bias needs to be subtracted, while for every
subtraction the bias needs to be added.
Bias numbers can efficiently be used to simplify comparison of numbers.
This fact will be used in Sect. 2.2.3 (p. 71) for coding the exponent of floating-
point numbers.
2.2.2 Unconventional Fixed-Point Numbers
In the following we continue the review of number systems according to
Fig. 2.1 (p. 53). The unconventional fixed-point number systems discussed in
the following are not as often used as for instance the 2C system, but can
yield significant improvements for particular applications or problems.
Signed Digit Numbers (SD)
The signed digit (SD) system differs from the traditional binary systems
presented in the previous section in the fact that it is ternary valued (i.e.,
digits have the value
{0,
1,
−1},
where
−1
is sometimes denoted as 1).
SD numbers have proven to be useful in carry-free adders or multipliers
with less complexity, because the effort in multiplication can typically be
评论