Chapter 10. The cache
In the previous chapter, I described two aspects of the ongoing development of new CPUs – increased
clock frequencies and the increasing number of transistors being used. Now it is time to look at a very
different yet related technology – the processor’s connection to the RAM, and the use of the L1 and L2
caches.
Speed conflict
The CPU works internally at very high clock frequencies (like 3200 MHz), and no RAM can keep up with
these.
The most common RAM speeds are between 266 and 533 MHz. And these are just a fraction of the
CPU’s working speed. So there is a great chasm between the machine (the CPU) which slaves away at
perhaps 3200 MHz, and the “conveyor belt”, which might only work at 333 MHz, and which has to ship
the data to and from the RAM. These two subsystems are simply poorly matched to each other.
If nothing could be done about this problem, there would be no reason to develop faster CPUs. If the
CPU had to wait for a bus which worked at one sixth of its speed, the CPU would be idle five sixths of
the time. And that would be pure waste.
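The "five sixths" figure can be checked with a little arithmetic. The sketch below is only an illustration: the function name idle_fraction is made up for this example, and it assumes the deliberately oversimplified case where the CPU needs one bus transfer per clock tick and nothing is cached.

```python
def idle_fraction(cpu_mhz: float, bus_mhz: float) -> float:
    """Fraction of CPU clock ticks spent waiting for the bus.

    Assumption for this sketch: the CPU needs one bus transfer per tick
    and there is no cache at all.
    """
    if cpu_mhz <= 0 or bus_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    # The bus can supply data on at most bus_mhz / cpu_mhz of the CPU's ticks.
    busy = min(1.0, bus_mhz / cpu_mhz)
    return 1.0 - busy


# 533 MHz is roughly one sixth of 3200 MHz, so the CPU idles about 5/6 of the time.
print(f"{idle_fraction(3200, 533):.2f}")  # 0.83
print(f"{idle_fraction(3200, 333):.2f}")  # 0.90
```

With the 333 MHz "conveyor belt" mentioned above, the same simple model leaves the CPU waiting about nine tenths of the time.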
The solution is to insert small, intermediate stores of high-speed RAM. These buffers (cache RAM)
provide a much more efficient transition between the fast CPU and the slow RAM. Cache RAM operates
at higher clock frequencies than normal RAM. Data can therefore be read more quickly from the cache.
Data is constantly being moved
The cache delivers its data to the CPU registers. These are tiny storage units which are placed right
inside the processor core, and they are the absolute fastest RAM there is. The size and number of the
registers are designed very specifically for each type of CPU.
Fig. 68. Cache RAM is much faster than normal RAM.
The CPU can move data in different sized packets, such as bytes (8 bits), words (16 bits), dwords (32
bits) or blocks (larger groups of bits), and this often involves the registers. The different data packets
are constantly moving back and forth:
● from the CPU registers to the Level 1 cache.
● from the L1 cache to the registers.
● from one register to another.
● from L1 cache to L2 cache, and so on…
The cache stores are a central bridge between the RAM and the registers which exchange data with the
processor’s execution units.
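The packet sizes named above (byte, word, dword) can be checked with a short sketch. It uses Python's standard struct module with the "<" prefix, which selects fixed standard sizes; the PACKETS table and the packet_bits helper are names invented for this illustration.

```python
import struct

# "<" selects standard, platform-independent sizes:
# B = unsigned 8-bit, H = unsigned 16-bit, I = unsigned 32-bit.
PACKETS = {"byte": "<B", "word": "<H", "dword": "<I"}


def packet_bits(name: str) -> int:
    """Return the width in bits of the named packet."""
    return struct.calcsize(PACKETS[name]) * 8


for name in PACKETS:
    print(name, packet_bits(name))
# byte 8
# word 16
# dword 32
```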
The optimal situation is if the CPU is able to constantly work and fully utilize all clock ticks. This would
mean that the registers would have to always be able to fetch the data which the execution units
require. But this is not the reality, as the CPU typically only utilizes 35% of its clock ticks. However,
without a cache, this utilization would be even lower.
Bottlenecks
CPU caches are a remedy against a very specific set of “bottleneck” problems. There are lots of
“bottlenecks” in the PC – transitions between fast and slower systems, where the fast device has to
wait before it can deliver or receive its data. These bottlenecks can have a very detrimental effect on
the PC’s total performance, so they must be minimised.
Fig. 69. A cache increases the CPU’s capacity to fetch the right data from RAM.
The absolute worst bottleneck exists between the CPU and RAM. It is here that we have the heaviest
data traffic, and it is in this area that PC manufacturers are expending a lot of energy on new
development. Every new generation of CPU brings improvements relating to the front side bus.
The CPU’s cache is “intelligent”, so that it can reduce the data traffic on the front side bus. The cache
controller constantly monitors the CPU’s work, and always tries to read in precisely the data the CPU
needs. When it is successful, this is called a cache hit. When the cache does not contain the desired
data, this is called a cache miss.
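The difference between a cache hit and a cache miss can be made concrete with a toy model. This is only a sketch: the TinyCache class is hypothetical, real cache controllers work on whole cache lines in hardware, and the least-recently-used eviction rule used here is an assumption, not something described in this chapter.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache holding at most `capacity` addresses.

    Assumption: the least recently used entry is evicted when the cache is full.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address: int, ram: dict) -> int:
        if address in self.lines:
            # Cache hit: the data is already in the fast store.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Cache miss: fetch from the slow RAM and keep a copy.
        self.misses += 1
        value = ram[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


ram = {addr: addr * 10 for addr in range(8)}
cache = TinyCache(capacity=2)
for addr in [0, 1, 0, 1, 2, 0]:
    cache.read(addr, ram)
print(cache.hits, cache.misses)  # 2 4
```

In this run the repeated reads of addresses 0 and 1 are hits, while the first read of each address, and the read of address 0 after it has been evicted, are misses.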
Two levels of cache
The idea behind cache is that it should function as a “near store” of fast RAM, a store from which the
CPU can always be supplied.
In practice there are always at least two close stores. They are called Level 1, Level 2, and (if
applicable) Level 3 cache. Some processors (like the Intel Itanium) have three levels of cache, but
these are only used for very special server applications. In standard PCs we find processors with L1
and L2 cache.
Fig. 70. The cache system tries to ensure that relevant data is constantly being fetched from RAM, so
that the CPU (ideally) never has to wait for data.
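How two cache levels sit between the CPU and the RAM can be sketched as a simple lookup chain. The function fetch and the plain dictionaries standing in for L1, L2 and RAM are assumptions made for this illustration; the sketch ignores capacity limits and timing and only shows the order in which the stores are consulted.

```python
def fetch(address, l1, l2, ram):
    """Return (value, source), where source names the store that served the request."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        value = l2[address]
        l1[address] = value  # copy into the faster level for next time
        return value, "L2"
    # Missed in both caches: go all the way to RAM and fill both levels.
    value = ram[address]
    l2[address] = value
    l1[address] = value
    return value, "RAM"


ram = {0x10: 42}
l1, l2 = {}, {}
print(fetch(0x10, l1, l2, ram))  # (42, 'RAM')
print(fetch(0x10, l1, l2, ram))  # (42, 'L1')
```

The first request has to go to RAM; the second is served from L1, because the value was copied into both caches on the way in.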
L1 cache
Level 1 cache is built into the actual processor core. It is a piece of RAM, typically 8, 16, 20, 32, 64 or
128 Kbytes, which operates at the same clock frequency as the rest of the CPU. Thus you could say the
L1 cache is part of the processor.
L1 cache is normally divided into two sections, one for data and one for instructions. For example, an
Athlon processor may have a 32 KB data cache and a 32 KB instruction cache. If the cache is common
for both data and instructions, it is called a unified cache.
Copyright Michael Karbo and ELI Aps., Denmark, Europe.
