[20/20/20/20] <5.3> The performance of a snooping cache-coherent multiprocessor depends on many detailed implementation issues that determine how quickly a cache responds with data in an exclusive or M state block. In some implementations, a CPU read miss to a cache block that is exclusive in another processor's cache is faster than a miss to a block in memory. This is because caches are smaller, and thus faster, than main memory.

[Figure 5.35 Multicore (point-to-point) multiprocessor: the initial coherency state, address tag, and data for blocks B0–B3 of the caches of P0, P1, and P3, plus the memory contents, connected by an on-chip interconnect with a coherency manager.]

414 ■ Chapter Five Thread-Level Parallelism

Conversely, in some implementations, misses satisfied by memory are faster than those satisfied by caches. This is because caches are generally optimized for "front side" or CPU references, rather than "back side" or snooping accesses. For the multiprocessor illustrated in Figure 5.35, consider the execution of a sequence of operations on a single CPU where

■ CPU read and write hits generate no stall cycles.
■ CPU read and write misses generate Nmemory and Ncache stall cycles if satisfied by memory and cache, respectively.
■ CPU write hits that generate an invalidate incur Ninvalidate stall cycles.
■ A write-back of a block, due to either a conflict or another processor's request to an exclusive block, incurs an additional Nwriteback stall cycles.

Consider two implementations with different performance characteristics summarized in Figure 5.36. Consider the following sequence of operations assuming the initial cache state in Figure 5.35.
For simplicity, assume that the second operation begins after the first completes (even though they are on different processors):

P1: read 110
P3: read 110

For Implementation 1, the first read generates 50 stall cycles because the read is satisfied by P0's cache. P1 stalls for 40 cycles while it waits for the block, and P0 stalls for 10 cycles while it writes the block back to memory in response to P1's request. Thus, the second read by P3 generates 100 stall cycles because its miss is satisfied by memory, and this sequence generates a total of 150 stall cycles. For the following sequences of operations, how many stall cycles are generated by each implementation?
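The stall accounting in the worked example above can be sketched in code. This is a minimal model, not the book's full protocol: the Implementation 1 parameters are inferred from the example itself (a miss satisfied by another cache costs 40 cycles, the accompanying write-back costs 10 cycles charged to the supplier, and a miss satisfied by memory costs 100 cycles); the full Figure 5.36 values for both implementations are not reproduced here, and the `owner` bookkeeping is a simplifying assumption covering only read misses to clean or modified blocks.

```python
# Stall-cycle parameters for Implementation 1, inferred from the
# worked example in the text (assumed, not the full Figure 5.36 table).
N_MEMORY = 100     # stall cycles for a miss satisfied by memory
N_CACHE = 40       # stall cycles for a miss satisfied by another cache
N_WRITEBACK = 10   # extra stall cycles to write the supplied block back

def sequence_stalls(ops, modified_blocks):
    """Count total stall cycles for a list of (cpu, op, addr) read misses.

    modified_blocks maps an address to the CPU holding that block in the
    modified/exclusive state. When that cache supplies the block it also
    writes it back, so later misses to the same block go to memory.
    """
    total = 0
    owner = dict(modified_blocks)
    for cpu, op, addr in ops:
        if addr in owner and owner[addr] != cpu:
            # Miss satisfied by the owning cache; the owner stalls for
            # the write-back, the requester stalls for the transfer.
            total += N_CACHE + N_WRITEBACK
            del owner[addr]          # block is now clean in memory
        else:
            total += N_MEMORY        # miss satisfied by memory
    return total

# The sequence from the text: block 110 starts modified in P0's cache.
ops = [("P1", "read", 0x110), ("P3", "read", 0x110)]
print(sequence_stalls(ops, {0x110: "P0"}))  # → 150
```

The first miss contributes 40 + 10 = 50 cycles and leaves block 110 clean in memory, so the second miss contributes 100 cycles, matching the 150-cycle total in the example.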
