Allows a program to request that a cache block be fetched before the program actually needs it.
Bits | Value |
---|---|
0-5 | 31 |
6-8 | /// |
9-10 | TH |
11-15 | RA |
16-20 | RB |
21-30 | 278 |
31 | / |
PowerPC | |
---|---|
dcbt | RA, RB, TH |
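As a rough illustration of the field layout above, the following C sketch packs the operands into a 32-bit instruction word. The helper name `encode_dcbt` is hypothetical and not part of any assembler API. It assumes the bit positions listed in the table, with bit 0 as the most significant bit, and leaves the reserved bits (6-8 and 31) as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the dcbt fields as laid out in the table.
 * Bit 0 is the most significant bit, so a field whose last bit is k
 * is shifted left by (31 - k). Reserved bits stay 0. */
uint32_t encode_dcbt(unsigned ra, unsigned rb, unsigned th)
{
    assert(ra < 32 && rb < 32 && th < 4);
    return (31u << 26)             /* bits 0-5:   primary opcode 31    */
         | ((uint32_t)th << 21)    /* bits 9-10:  TH                   */
         | ((uint32_t)ra << 16)    /* bits 11-15: RA                   */
         | ((uint32_t)rb << 11)    /* bits 16-20: RB                   */
         | (278u << 1);            /* bits 21-30: extended opcode 278  */
}
```

Under these assumptions, `encode_dcbt(0, 4, 0)`, which corresponds to `dcbt 0,4`, yields `0x7C00222C`.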
The dcbt instruction may improve performance by anticipating a load from the addressed byte. The block containing the byte addressed by the effective address (EA) is fetched into the data cache before the block is needed by the program. The program can later perform loads from the block and may not experience the added delay caused by fetching the block into the cache. Executing the dcbt instruction does not invoke the system error handler.
If general-purpose register (GPR) RA is not 0, the effective address (EA) is the sum of the content of GPR RA and the content of GPR RB. Otherwise, the EA is the content of GPR RB.
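The EA rule above can be modeled in a few lines of C. This is only a sketch: the function name `dcbt_effective_address` and the `gpr` array standing in for the register file are hypothetical, and 64-bit registers are assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EA computation.
 * gpr[] stands in for the 32 general-purpose registers (64-bit assumed). */
uint64_t dcbt_effective_address(const uint64_t gpr[32], unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    /* An RA field of 0 means "use the value 0", not "the content of GPR 0". */
    uint64_t base = (ra != 0) ? gpr[ra] : 0;
    return base + gpr[rb];
}
```

Note that with RA equal to 0 the content of GPR 0 is ignored, so `dcbt 0,4` addresses exactly the byte whose address is in GPR 4.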
Consider the following when using the dcbt instruction:
The Touch Hint field (TH) is used to provide a hint that the program will probably load soon from the storage locations specified by the EA and the TH field. The hint is ignored for locations that are caching-inhibited or guarded. The encodings of the TH field are as follows:
TH | Description |
---|---|
00 | The storage location is the byte addressed by EA. |
01 | The storage locations are the block containing the byte addressed by EA and sequentially following blocks (that is, the blocks containing the bytes addressed by EA + n * block_size, where n = 0, 1, 2...). |
10 | Reserved. |
11 | The storage locations are the block containing the byte addressed by EA and sequentially preceding blocks (that is, the blocks containing the bytes addressed by EA - n * block_size, where n = 0, 1, 2...). |
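To make the address arithmetic in the TH table concrete, the sketch below computes the base address of the n-th block named by a hint. The function name `dcbt_hinted_block` and its signature are hypothetical. The block size is passed in as an assumption (it must be a power of two), and how many blocks an implementation actually fetches for a stream hint is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the TH encodings: stores the base address of the
 * n-th block covered by the hint in *block_base.
 * Returns 0 on success, -1 for the reserved encoding or an n that does
 * not apply (TH = 00 names a single location, so only n = 0 is valid). */
int dcbt_hinted_block(uint64_t ea, unsigned th, uint64_t n,
                      uint64_t block_size, uint64_t *block_base)
{
    /* Assumption: block_size is a nonzero power of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    uint64_t addr;
    switch (th) {
    case 0:                         /* 00: the byte addressed by EA */
        if (n != 0)
            return -1;
        addr = ea;
        break;
    case 1:                         /* 01: EA + n * block_size */
        addr = ea + n * block_size;
        break;
    case 3:                         /* 11: EA - n * block_size */
        addr = ea - n * block_size;
        break;
    default:                        /* 10 is reserved */
        return -1;
    }
    *block_base = addr & ~(block_size - 1);   /* round down to block start */
    return 0;
}
```

For example, with an assumed 32-byte block size and EA = 0x1004, n = 2 gives block 0x1040 for TH = 01 and block 0xFC0 for TH = 11.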
The dcbt instruction serves as both a basic and extended mnemonic. The dcbt mnemonic with three operands is the basic form, and the dcbt with two operands is the extended form. In the extended form, the TH field is omitted and assumed to be 0b00.
The dcbt instruction has one syntax form and does not affect the Condition Register field 0 or the Fixed-Point Exception register.
Item | Description |
---|---|
RA | Specifies source general-purpose register for EA computation. |
RB | Specifies source general-purpose register for EA computation. |
TH | Indicates when a sequence of data cache blocks might be needed. |
The following code sums the content of a one-dimensional vector:
    # Assume that GPR 4 contains the address of the first element
    # of the sum.
    # Assume 49 elements are to be summed.
    # Assume the data cache block size is 32 bytes.
    # Assume the elements are word aligned and the addresses
    # are multiples of 4.
            dcbt    0,4            # Issue hint to fetch first
                                   # cache block.
            addi    5,4,32         # Compute address of second
                                   # cache block.
            addi    8,0,6          # Set outer loop count.
            addi    7,0,8          # Set inner loop counter.
            dcbt    0,5            # Issue hint to fetch second
                                   # cache block.
            lwz     3,0(4)         # Set sum = element number 1.
    bigloop:
            addic.  8,8,-1         # Decrement outer loop count
                                   # and set CR field 0.
            mtspr   CTR,7          # Set counter (CTR) for
                                   # inner loop.
            addi    5,5,32         # Compute address for next
                                   # touch.
    lttlloop:
            lwzu    6,4(4)         # Fetch element.
            add     3,3,6          # Add to sum.
            bc      16,0,lttlloop  # Decrement CTR and branch
                                   # if result is not equal to 0.
            dcbt    0,5            # Issue hint to fetch next
                                   # cache block.
            bc      4,2,bigloop    # Branch if outer loop count is
                                   # not equal to 0.
    end:                           # Summation complete.
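For comparison, the same prefetch-ahead pattern can be sketched in C. This is not a translation of the assembler example, only an analogous sketch: the function name `sum_with_prefetch` and the 32-byte `BLOCK_BYTES` constant are assumptions, and `__builtin_prefetch` is a GCC/Clang extension that a compiler targeting PowerPC may (but need not) turn into a touch instruction such as dcbt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block size, matching the 32 bytes assumed in the assembler example. */
#define BLOCK_BYTES 32u
#define WORDS_PER_BLOCK (BLOCK_BYTES / sizeof(int32_t))

/* Hypothetical helper: sum n words, hinting one block ahead of the loads. */
int64_t sum_with_prefetch(const int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Once per block of words, hint the next block. The index test
         * treats v[0] as the start of a block, which is only an
         * approximation if v is not block aligned; a hint never changes
         * the result. The bounds check keeps the address inside the array. */
        if (i % WORDS_PER_BLOCK == 0 && i + WORDS_PER_BLOCK < n)
            __builtin_prefetch(&v[i + WORDS_PER_BLOCK], 0, 3); /* 0 = read */
        sum += v[i];
    }
    return sum;
}
```

As with dcbt itself, the prefetch is only a hint: removing it changes timing at most, never the computed sum.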
Related instructions: clcs (Cache Line Compute Size), clf (Cache Line Flush), cli (Cache Line Invalidate), dcbf (Data Cache Block Flush), dcbi (Data Cache Block Invalidate), dcbst (Data Cache Block Store), dcbtst (Data Cache Block Touch for Store), dcbz or dclz (Data Cache Block Set to Zero), dclst (Data Cache Line Store), icbi (Instruction Cache Block Invalidate), and sync (Synchronize) or dcs (Data Cache Synchronize).