RE: [suse-programming-e] Hardware access to RAM

I will explain what happens on i386.
I understand that if I have a pointer, ptr, which points to an array of some structure then ptr++ will point to the next element. But that is all done in software. I want to optimise my software to match what happens in hardware.
The compiler translates ++ptr into adding the element size to the logical address stored in ptr. If ptr was declared as a pointer to int, then ++ptr would add 4 (sizeof(int) on i386) to ptr.
If I have a pointer pointing to address 256 and then look at address 257, by how many BYTES have I jumped in physical memory?
2. With "physical memory" I assume you mean the following: through hardware/software mechanisms, logical addresses are translated to linear addresses (segmentation) and then to physical addresses (paging). With Linux, both addresses 256 and 257 lie in the same page (each page is 4096 bytes). All the bytes within a single page map to a single physical page frame of the same size. The kernel assigns (almost) arbitrary page frames to pages, so if the two addresses were on opposite sides of a page boundary, you could not know the difference between the physical addresses.
In old-style memory of 32-bit words it would jump one word, that is 4 BYTES, but in a machine with 16-bit words it would jump one word of 2 BYTES. That is, always one machine word. Also, if your machine was up to date, it always had a matching data bus to access a full word (machine word, not a Bill Gates 'word') in one go. (Cheap machines, with a cheap data bus, would take two mouthfuls.)
That does not make sense to me. Are you talking about the size of the memory fetches the CPU caches use when transferring data to/from memory?
An equivalent question would be: if I have a string, Str(16), of bytes Str(0) to Str(15) containing "ABCDEFGHIJKLMNOn" (n = NUL) in a 32-bit machine, does the request for Str(9), i.e. "J", electronically load that character directly into the least significant byte of the register (I doubt it)? Or does assembler code (generated by the compiler) load the characters "IJKL" into the register, shift right ("00IJ") and mask ("000J")? That is: does the electronic system access one byte directly? Or perhaps the load/shift/mask is done in hardware (I doubt it)?
One byte (assuming you are using 1 byte characters).
For me the important issue is, for a 64-bit machine: I suspect that access to IntArray(23) is faster if defined as int64_t IntArray(128), i.e. as 64-bit words with the top bits generally being zero, than if defined as int32_t IntArray(128), i.e. as 32-bit words, because, for a 64-bit machine, the latter will store two values into each physical address.
The CPU caches would rather see your array in the smallest possible memory region. When you want access to Str(9), the CPU fetches, say, 128 bits around that value into the cache. Then when you ask for Str(10), it is already in the cache. But some operations may be faster for 64 bits than for 32 bits. I guess multiplication and division/remainder might be faster (ask AMD). So whether int32_t or int64_t is fastest is probably very dependent on your application.
Regards,
Håkon Hallingstad
Software Developer, EDB
+47 2252 8218
hakon.hallingstad@edb.com
www.edb.com
"IT er ikke alt - men det hjelper" ("IT isn't everything - but it helps")

On Sunday 13 March 2005 20:21, Hallingstad Håkon wrote:
I will explain what happens on i386.
I understand that if I have a pointer, ptr, which points to an array of some structure then ptr++ will point to the next element. But that is all done in software. I want to optimise my software to match what happens in hardware.
The compiler translates ++ptr into adding the element size to the logical address stored in ptr. If ptr was declared as a pointer to int, then ++ptr would add 4 (sizeof(int) on i386) to ptr.
Yes, as I mentioned, I understand this.
If I have a pointer pointing to address 256 and then look at address 257, by how many BYTES have I jumped in physical memory?
2
2? Are you sure? It would be two in an old-fashioned 16-bit machine with a 16-bit data bus; but this is a 64-bit machine, and the original definition of a word was a hardware word (not a Bill Gates 16-bit thingy).
With "physical memory" I assume you mean the following: through hardware/software mechanisms, logical addresses are translated to linear addresses (segmentation) and then to physical addresses (paging).
Segmentation was an old-fashioned idea forced by the limited address size of a 16-bit machine. Do we still need to reference it?
With Linux, both addresses 256 and 257 lie in the same page (each page is 4096 bytes). All the bytes within a single page map to a single physical page frame of the same size.
Ah! This is one of the ideas I seek: does address 256 point to byte number 256 - 1 = 255, and 257 to the very next character in the string? Or does 256 point to byte number 8 x 256 - 1 = 2048 - 1, and 257 to 8 x 257 - 1, i.e. the last byte plus 8 bytes, because the memory is 64-bit (8 bytes)?
The kernel assigns (almost) arbitrary page frames to pages. So if the two addresses were on opposite sides of a page boundary, you could not know the difference between the physical addresses.
In old-style memory of 32-bit words it would jump one word, that is 4 BYTES, but in a machine with 16-bit words it would jump one word of 2 BYTES. That is, always one machine word. Also, if your machine was up to date, it always had a matching data bus to access a full word (machine word, not a Bill Gates 'word') in one go. (Cheap machines, with a cheap data bus, would take two mouthfuls.)
That does not make sense to me. Are you talking about the size of the memory fetches the CPU caches use when transferring data to/from memory?
Yes - see the note above regarding your answer of '2'.
An equivalent question would be: if I have a string, Str(16), of bytes Str(0) to Str(15) containing "ABCDEFGHIJKLMNOn" (n = NUL) in a 32-bit machine, does the request for Str(9), i.e. "J", electronically load that character directly into the least significant byte of the register (I doubt it)? Or does assembler code (generated by the compiler) load the characters "IJKL" into the register, shift right ("00IJ") and mask ("000J")? That is: does the electronic system access one byte directly? Or perhaps the load/shift/mask is done in hardware (I doubt it)?
One byte (assuming you are using 1 byte characters).
If you say this, you are implying that the wires connected to bits 9-16 (1-based, MS storage mode) can be directly connected to bits 1-8 of the register. Are you sure you meant this?
For me the important issue is, for a 64-bit machine: I suspect that access to IntArray(23) is faster if defined as int64_t IntArray(128), i.e. as 64-bit words with the top bits generally being zero, than if defined as int32_t IntArray(128), i.e. as 32-bit words, because, for a 64-bit machine, the latter will store two values into each physical address.
The CPU caches would rather see your array in the smallest possible memory region. When you want access to Str(9), the CPU fetches, say, 128 bits around that value into the cache. Then when you ask for Str(10), it is already in the cache.
Okay - this agrees with what you said earlier. I can accept this as a good idea. In turn it means that loading a 'packed array' would be faster. But is getting from cache to register as fast as if it were defined as 64-bit? Your idea here would mean that 32-bit versus 64-bit access would gain or lose depending on whether the integer is an individual value or part of an array.
But some operations may be faster for 64 bits than for 32 bits. I guess multiplication and division/remainder might be faster (ask AMD).
Yes, I agree. Except that the maths co-processor often has more bits than 64.
So whether int32_t or int64_t is fastest is probably very dependent on your application.
Thanks Hakon for your discussion. Regards, Colin
participants (2)
- Colin Carter
- Hallingstad Håkon