
pdf

computer architecture

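  • 1 star
  • 2015-10-17
  • 922.65KB
  • Requires 1 point
  • 0 downloads
Tags: computer architecture

Computer architecture

Chapter 5 of Computer Architecture, a good book

Document excerpt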


Solutions to Case Studies and Exercises
Chapter 5 Solutions
Case Study 1: Single-Chip Multicore Multiprocessor
5.1
a.
P0: read 120 → P0.B0: (S, 120, 0020) returns 0020
b.
P0: write 120 ← 80 → P0.B0: (M, 120, 0080)
P3.B0: (I, 120, 0020)
c.
P3: write 120 ← 80 → P3.B0: (M, 120, 0080)
d.
P1: read 110 → P1.B2: (S, 110, 0010) returns 0010
e.
P0: write 108 ← 48 → P0.B1: (M, 108, 0048)
P3.B1: (I, 108, 0008)
f.
P0: write 130 ← 78 → P0.B2: (M, 130, 0078)
M: 110 ← 0030 (writeback to memory)
g.
P3: write 130 ← 78 → P3.B2: (M, 130, 0078)
5.2
a.
P0: read 120, Read miss, satisfied by memory
P0: read 128, Read miss, satisfied by P1’s cache
P0: read 130, Read miss, satisfied by memory, writeback 110
Implementation 1: 100 + 40 + 10 + 100 + 10 = 260 stall cycles
Implementation 2: 100 + 130 + 10 + 100 + 10 = 350 stall cycles
b.
P0: read 100, Read miss, satisfied by memory
P0: write 108 ← 48, Write hit, sends invalidate
P0: write 130 ← 78, Write miss, satisfied by memory, write back 110
Implementation 1: 100 + 15 + 10 + 100 = 225 stall cycles
Implementation 2: 100 + 15 + 10 + 100 = 225 stall cycles
c.
P1: read 120, Read miss, satisfied by memory
P1: read 128, Read hit
P1: read 130, Read miss, satisfied by memory
Implementation 1: 100 + 0 + 100 = 200 stall cycles
Implementation 2: 100 + 0 + 100 = 200 stall cycles
d.
P1: read 100, Read miss, satisfied by memory
P1: write 108 ← 48, Write miss, satisfied by memory, write back 128
P1: write 130 ← 78, Write miss, satisfied by memory
Implementation 1: 100 + 100 + 10 + 100 = 310 stall cycles
Implementation 2: 100 + 100 + 10 + 100 = 310 stall cycles
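The totals in 5.2 can be re-added mechanically. The following is a minimal Python sketch of that check. The per-event terms are copied from the sums above; the name STALL_TERMS and the dictionary layout are illustrative assumptions, not part of the original solutions.

```python
# Stall-cycle terms for exercise 5.2, copied from the sums in the text.
# Keys are (part, implementation); this layout is an illustrative choice.
STALL_TERMS = {
    ("a", 1): [100, 40, 10, 100, 10],
    ("a", 2): [100, 130, 10, 100, 10],
    ("b", 1): [100, 15, 10, 100],
    ("b", 2): [100, 15, 10, 100],
    ("c", 1): [100, 0, 100],
    ("c", 2): [100, 0, 100],
    ("d", 1): [100, 100, 10, 100],
    ("d", 2): [100, 100, 10, 100],
}


def total_stalls(part, implementation):
    """Return the summed stall cycles for one part and one implementation."""
    return sum(STALL_TERMS[(part, implementation)])


if __name__ == "__main__":
    for part, impl in sorted(STALL_TERMS):
        cycles = total_stalls(part, impl)
        print(f"5.2{part}, implementation {impl}: {cycles} stall cycles")
```

Only part a differs between the two implementations, because it is the only part with a read miss satisfied by another cache (40 cycles versus 130).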
5.3
See Figure S.28
Copyright © 2012 Elsevier, Inc. All rights reserved.
[Figure not reproduced. States: Invalid, Shared, Owned, Modified. Legible transition labels: CPU read (place read miss on bus); CPU write (place write miss on bus); CPU write (place invalidate on bus); CPU read hit; CPU write hit; read miss; write miss for this block; write miss or invalidate for this block; writeback block, abort memory access.]
Figure S.28
Protocol diagram.
5.4
(Showing results for implementation 1)
a.
P1: read 110, Read miss, P0’s cache
P3: read 110, Read miss, MSI satisfies in memory, MOSI satisfies in P0’s cache
P0: read 110, Read hit
MSI: 40 + 10 + 100 + 0 = 150 stall cycles
MOSI: 40 + 10 + 40 + 10 + 0 = 100 stall cycles
b.
P1: read 120, Read miss, satisfied in memory
P3: read 120, Read hit
P0: read 120, Read miss, satisfied in memory
Both protocols: 100 + 0 + 100 = 200 stall cycles
c.
P0: write 120 ← 80, Write miss, invalidates P3
P3: read 120, Read miss, P0’s cache
P0: read 120, Read hit
Both protocols: 100 + 40 + 10 + 0 = 150 stall cycles
d.
P0: write 108 ← 88, Send invalidate, invalidate P3
P3: read 108, Read miss, P0’s cache
P0: write 108 ← 98, Send invalidate, invalidate P3
Both protocols: 15 + 40 + 10 + 15 = 80 stall cycles
5.5
See Figure S.29
[Figure not reproduced. States: Invalid, Shared, Excl., Modified. Legible transition labels: CPU read, other shared block (place read miss on bus); CPU read, no shared block (place read miss on bus); CPU write (place invalidate on bus); CPU write (place write miss on bus); CPU read hit; CPU write hit; read miss; write miss for this block; write miss or invalidate for this block; writeback block, abort memory access.]
Figure S.29
Diagram for a MESI protocol.
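The transitions named in Figure S.29 can be sketched as a small state function. This is a minimal Python sketch, not the figure itself: the function names, the others_share flag, and the event strings are hypothetical. The Excl. to Invalid arc on a remote write miss is an assumption based on standard MESI behavior, since that arc is not legible in the figure text. Replacement (eviction) transitions are left out.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


def on_cpu_read(state, others_share):
    """Return (next_state, bus_action) for a CPU read of this block."""
    if state is State.INVALID:
        # Read miss: load as Shared if another cache holds the block,
        # otherwise as Exclusive.
        nxt = State.SHARED if others_share else State.EXCLUSIVE
        return nxt, "place read miss on bus"
    return state, None  # read hit in S, E, or M


def on_cpu_write(state):
    """Return (next_state, bus_action) for a CPU write of this block."""
    if state is State.INVALID:
        return State.MODIFIED, "place write miss on bus"
    if state is State.SHARED:
        return State.MODIFIED, "place invalidate on bus"
    # E -> M is silent; M stays M on a write hit.
    return State.MODIFIED, None


def on_bus_event(state, event):
    """Return (next_state, action) when another cache's request is snooped.

    event is 'read miss', 'write miss', or 'invalidate' for this block.
    """
    if state is State.MODIFIED and event in ("read miss", "write miss"):
        nxt = State.SHARED if event == "read miss" else State.INVALID
        return nxt, "write back block; abort memory access"
    if state is State.EXCLUSIVE:
        # Assumption: a remote write miss invalidates an Exclusive block.
        if event == "read miss":
            return State.SHARED, None
        if event == "write miss":
            return State.INVALID, None
    if state is State.SHARED and event in ("write miss", "invalidate"):
        return State.INVALID, None
    return state, None


if __name__ == "__main__":
    state, action = on_cpu_read(State.INVALID, others_share=False)
    print(state, action)
    print(on_cpu_write(state))
```

The silent E to M transition in on_cpu_write is what saves the invalidate in the MESI cases of exercise 5.6.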
5.6
a.
P0: read 100, Read miss, satisfied in memory, no sharers MSI: S, MESI: E
P0: write 100 ← 40, MSI: send invalidate, MESI: silent transition from E to M
MSI: 100 + 15 = 115 stall cycles
MESI: 100 + 0 = 100 stall cycles
b.
P0: read 120, Read miss, satisfied in memory, sharers both to S
P0: write 120 ← 60, Both send invalidates
Both: 100 + 15 = 115 stall cycles
c.
P0: read 100, Read miss, satisfied in memory, no sharers MSI: S, MESI: E
P0: read 120, Read miss, memory, silently replace 100 from S or E
Both: 100 + 100 = 200 stall cycles, silent replacement from E
d.
P0: read 100, Read miss, satisfied in memory, no sharers MSI: S, MESI: E
P1: write 100 ← 60, Write miss, satisfied in memory regardless of protocol
Both: 100 + 100 = 200 stall cycles, don’t supply data in E state (some protocols do)
e.
P0: read 100, Read miss, satisfied in memory, no sharers MSI: S, MESI: E
P0: write 100 ← 60, MSI: send invalidate, MESI: silent transition from E to M
P1: write 100 ← 40, Write miss, P0’s cache, writeback data to memory
MSI: 100 + 15 + 40 + 10 = 165 stall cycles
MESI: 100 + 0 + 40 + 10 = 150 stall cycles
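The MSI and MESI totals in 5.6 differ only where MESI can load a block as Exclusive and later upgrade it silently. The following minimal Python sketch reproduces parts a, b, and e under that reading. The constant and function names are hypothetical, and the latencies are inferred from the sums above rather than taken from a latency table.

```python
# Latencies inferred from the sums quoted in 5.6 (assumed values).
READ_MISS_MEMORY = 100       # read miss satisfied in memory
UPGRADE = 15                 # invalidate sent when writing a Shared block
REMOTE_WRITE_MISS = 40 + 10  # write miss served by another cache, plus writeback


def read_then_write_stalls(protocol, other_sharers):
    """Stall cycles when one processor misses on a read, then writes the block."""
    if protocol not in ("MSI", "MESI"):
        raise ValueError(f"unknown protocol: {protocol}")
    stalls = READ_MISS_MEMORY
    if protocol == "MESI" and not other_sharers:
        stalls += 0  # block was loaded Exclusive, so E -> M is silent
    else:
        stalls += UPGRADE  # block is Shared, so an invalidate is sent
    return stalls


def part_e_stalls(protocol):
    """Part e: P0 reads then writes, then P1 writes the same block."""
    return read_then_write_stalls(protocol, other_sharers=False) + REMOTE_WRITE_MISS


if __name__ == "__main__":
    for protocol in ("MSI", "MESI"):
        print(protocol, "5.6a:", read_then_write_stalls(protocol, False))
        print(protocol, "5.6b:", read_then_write_stalls(protocol, True))
        print(protocol, "5.6e:", part_e_stalls(protocol))
```

In this model the 15-cycle gap between the protocols in parts a and e comes entirely from the invalidate that MESI avoids.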
5.7
a.
Assume the processors acquire the lock in order. P0 acquires it first, incurring
100 stall cycles to retrieve the block from memory. P1 and P3 stall
(ping-ponging the block back and forth) until P0’s critical section ends 1000
cycles later. P0 then stalls for about 40 cycles while it fetches the block to
invalidate it, and P1 takes 40 cycles to acquire it. P1’s critical section is 1000
cycles, plus 40 to handle the write miss at release. Finally, P3 grabs the block
for a final 40 cycles of stall. So P0 stalls for 100 cycles to acquire the lock,
10 to give the block to P1, 40 to release the lock, and a final 10 to hand it off
to P1, for a total of 160 stall cycles. P1 essentially stalls until P0 releases
the lock, which takes 100 + 1000 + 10 + 40 = 1150 cycles, plus 40 to get the
lock, 10 to give it to P3, 40 to get it back to release the lock, and a final 10
to hand it back to P3, for a total of 1250 stall cycles. P3 stalls until P1 hands
off the released lock, which takes 1150 + 40 + 10 + 1000 + 40 = 2240 cycles.
P3 gets the lock 40 cycles later, so it stalls for a total of 2280 cycles.
b.
The optimized spin lock incurs far fewer stall cycles than the regular spin
lock because it spends most of the critical section sitting in a spin loop
(which, while useless, is not counted as a stall cycle). Using the analysis in
part d for the interconnect transactions, the stall cycles are 3 read misses to
memory (300), 1 upgrade (15), 1 write miss to a cache (40 + 10), 1 write miss
to memory (100), 1 read miss to a cache (40 + 10), 1 write miss to memory
(100), 1 read miss to a cache and 1 read miss to memory (40 + 10 + 100),
followed by an upgrade (15) and a write miss to a cache (40 + 10), and finally
a write miss to a cache (40 + 10) followed by a read miss to a cache (40 + 10)
and an upgrade (15). That is approximately 945 cycles in total.
c.
Approximately 31 interconnect transactions. The first processor to win
arbitration for the interconnect gets the block on its first try (1); the other
two ping-pong the block back and forth during the critical section. Because the
latency is 40 cycles and the critical section lasts 1000 cycles, this occurs
about 25 times (25). The first processor does a write to release the lock,
causing another bus transaction (1), and the second processor does a
transaction to perform its test and set (1). The last processor gets the block
(1) and spins on it until the second processor releases it (1). Finally, the
last processor grabs the block (1).
d.
Approximately 15 interconnect transactions. Assume processors acquire the
lock in order. All three processors do a test, causing a read miss, then a test
and set, causing the first processor to upgrade and the other two to write
miss (6). The losers sit in the test loop, and one of them needs to get back a
shared block first (1). When the first processor releases the lock, it takes a
write miss (1) and then the two losers take read misses (2). Both have their
test succeed, so the new winner does an upgrade and the new loser takes a
write miss (2). The loser spins on an exclusive block until the winner releases
the lock (1). The loser first tests the block (1) and then test-and-sets it, which
requires an upgrade (1).
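The arithmetic in 5.7 can be re-checked with a short script. This is a minimal Python sketch in which every term is taken from the text above. The constant and function names are hypothetical, and reading each 10-cycle term as a block handoff is an interpretation of the wording in part a.

```python
MEMORY_MISS = 100        # miss satisfied by memory
CACHE_MISS = 40          # miss satisfied by another cache
HANDOFF = 10             # supplying the block to another processor
CRITICAL_SECTION = 1000  # cycles spent holding the lock


def part_a_stalls():
    """Per-processor stall cycles for the simple spin lock (part a)."""
    p0 = MEMORY_MISS + HANDOFF + CACHE_MISS + HANDOFF
    p0_released = MEMORY_MISS + CRITICAL_SECTION + HANDOFF + CACHE_MISS
    p1 = p0_released + CACHE_MISS + HANDOFF + CACHE_MISS + HANDOFF
    p1_released = (p0_released + CACHE_MISS + HANDOFF
                   + CRITICAL_SECTION + CACHE_MISS)
    p3 = p1_released + CACHE_MISS
    return {"P0": p0, "P1": p1, "P3": p3}


def part_b_stalls():
    """Total stall cycles for the optimized spin lock (part b)."""
    terms = [300, 15, 40 + 10, 100, 40 + 10, 100, 40 + 10 + 100,
             15, 40 + 10, 40 + 10, 40 + 10, 15]
    return sum(terms)


def part_c_transactions():
    """Interconnect transactions for the simple spin lock (part c)."""
    ping_pongs = CRITICAL_SECTION // CACHE_MISS
    return 1 + ping_pongs + 1 + 1 + 1 + 1 + 1


def part_d_transactions():
    """Interconnect transactions for the optimized spin lock (part d)."""
    return 6 + 1 + 1 + 2 + 2 + 1 + 1 + 1


if __name__ == "__main__":
    print("5.7a:", part_a_stalls())
    print("5.7b:", part_b_stalls())
    print("5.7c:", part_c_transactions())
    print("5.7d:", part_d_transactions())
```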
5.8
Latencies in implementation 1 of Figure 5.36 are used.
a.
P0: write 110 ← 80, Hit in P0’s cache, no stall cycles for either TSO or SC
P0: read 108, Hit in P0’s cache, no stall cycles for either TSO or SC
b.
P0: write 100 ← 80, Miss, TSO satisfies write in write buffer (0 stall cycles); SC must wait until it receives the data (100 stall cycles)
P0: read 108, Hit, but must wait for preceding operation: TSO = 0, SC = 100
c.
P0: write 110 ← 80, Hit in P0’s cache, no stall cycles for either TSO or SC
P0: write 100 ← 90, Miss, TSO satisfies write in write buffer (0 stall cycles); SC must wait until it receives the data (100 stall cycles)
d.
P0: write 100 ← 80, Miss, TSO satisfies write in write buffer (0 stall cycles); SC must wait until it receives the data (100 stall cycles)
P0: write 110 ← 90, Hit, but must wait for preceding operation: TSO = 0, SC = 100
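The per-operation figures in 5.8 follow two rules: under TSO a write retires into the write buffer without stalling, and under SC an operation waits for its own miss and for a preceding operation that is still outstanding. The following is a minimal Python sketch of those two rules. The function name and the (kind, hit) encoding are hypothetical, and the treatment of TSO read misses is an assumption because 5.8 has none. The sketch mirrors the per-operation listing in the solution and does not derive an overall total.

```python
MISS_LATENCY = 100  # memory miss latency, inferred from the figures quoted in 5.8


def per_op_stalls(ops, model):
    """Return the stall cycles listed for each operation.

    ops is a list of (kind, hit) pairs, with kind 'read' or 'write'.
    model is 'TSO' or 'SC'.
    """
    if model not in ("TSO", "SC"):
        raise ValueError(f"unknown model: {model}")
    stalls = []
    prev_outstanding = 0  # cycles the preceding operation still needs (SC only)
    for kind, hit in ops:
        if model == "TSO":
            # A write goes to the write buffer, so it never stalls here.
            # Assumption: a read miss would stall for the full latency.
            own = 0 if (hit or kind == "write") else MISS_LATENCY
            stalls.append(own)
        else:
            own = 0 if hit else MISS_LATENCY
            # Under SC a hit still waits for the preceding operation.
            stalls.append(max(own, prev_outstanding))
            prev_outstanding = own
    return stalls


if __name__ == "__main__":
    parts = {
        "a": [("write", True), ("read", True)],
        "b": [("write", False), ("read", True)],
        "c": [("write", True), ("write", False)],
        "d": [("write", False), ("write", True)],
    }
    for name, ops in parts.items():
        print(name, "TSO", per_op_stalls(ops, "TSO"), "SC", per_op_stalls(ops, "SC"))
```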
Case Study 2: Simple Directory-Based Coherence
5.9
a.
P0,0: read 100, L1 hit returns 0x0010, state unchanged (M)
b.
P0,0: read 128, L1 miss and L2 miss will replace B1 in L1 and B1 in L2, which has address 108.
L1 will have 128 in B1 (shared), L2 also will have it (DS, P0,0)
Memory directory entry for 108 will become <DS, C1>
Memory directory entry for 128 will become <DS, C0>
c, d, …, h: follow same approach
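The block placements quoted in these solutions (120 in B0, 108 and 128 in B1, 110 and 130 in B2) are consistent with a direct-mapped cache of four 8-byte blocks indexed by hexadecimal address. The following minimal Python sketch shows that mapping. The block size, the block count, and the name block_index are assumptions inferred from those placements, not values stated in the excerpt.

```python
BLOCK_BYTES = 8  # assumption: addresses 100, 108, 110, ... are hex, 8 bytes apart
NUM_BLOCKS = 4   # assumption: blocks B0 through B3


def block_index(address):
    """Return the direct-mapped block number (0 for B0, 1 for B1, ...)."""
    return (address // BLOCK_BYTES) % NUM_BLOCKS


if __name__ == "__main__":
    for address in (0x100, 0x108, 0x110, 0x120, 0x128, 0x130):
        print(f"address {address:x} maps to B{block_index(address)}")
```

Under this mapping a read of 128 lands in B1, the block that holds 108, which matches the replacement described in 5.9b.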