Hardware, RAM, databus

13 Mar 2005

      Greetings,
What I really need to know is how modern 64 bit memory is addressed.

I already understand that if I have a pointer, ptr, which points to an array
of some structure, then ptr++ will point to the next element.
That is, ptr++ advances the address by the size of one element, a whole
number of bytes.
But that is all done in software.
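A small C sketch of the point above: ptr + 1 (the same step as ptr++) moves the address forward by sizeof(*ptr) bytes. The struct sample type and the stride_in_bytes helper are hypothetical names chosen for illustration; measuring through char pointers works because a char pointer always steps one byte at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type: any structure behaves the same way. */
struct sample {
    double value;
    int id;
};

/* Returns how many bytes ptr + 1 lies beyond ptr, measured with char
   pointers, which step exactly one byte per increment. */
static size_t stride_in_bytes(const struct sample *ptr)
{
    const struct sample *next = ptr + 1;  /* same step as ptr++ */
    return (size_t)((const char *)next - (const char *)ptr);
}
```

The result equals sizeof(struct sample), padding included, so the scaling is worked out by the compiler from the element type.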
I want to optimise my software to match what happens in hardware.

If I have a pointer pointing to address 256 and then look at address 257,
by how many BYTES have I jumped in physical memory?
In old style memory of 16 bit words it would jump one word, that
is 2 BYTES, but in a machine with 8 bit words it would jump one word
of 1 BYTE.  That is, always one machine word.
Also, if your machine was up to date, it always had the matching
data bus to access a full word (machine word, not a Bill Gates 'word')
in one take.  That is, a 16 bit machine had a 16 bit databus.
Cheap machines, with a cheap data bus, would take two mouthfuls.
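For the address 256 versus 257 question: on byte-addressed machines, which include x86, AMD64 and ARM, every address names a single byte, so the step from 256 to 257 is one byte whatever the word or bus width. The C sketch below measures the gap between consecutive array elements through uintptr_t. It assumes a flat address space in which that conversion yields the numeric address (true on those platforms, but implementation-defined in ISO C); byte_gap_char and byte_gap_u64 are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Byte distance between buf[0] and buf[1] for a byte array.
   Assumes pointer-to-uintptr_t conversion gives the numeric address. */
static uintptr_t byte_gap_char(const unsigned char *buf)
{
    return (uintptr_t)&buf[1] - (uintptr_t)&buf[0];
}

/* Same measurement for an array of 64-bit words. */
static uintptr_t byte_gap_u64(const uint64_t *buf)
{
    return (uintptr_t)&buf[1] - (uintptr_t)&buf[0];
}
```

On such platforms the first gap is 1 and the second is 8: the index is scaled by the element size, but the addresses themselves still count bytes.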

An equivalent question would be:
If I have a string, Str(16), of bytes Str(0) to Str(15)
containing "ABCDEFGHIJKLMNOn" (n=NUL)
in a 32 bit machine,
does the request for Str(9), i.e. "J",
electronically load this character directly into the least significant byte
of the register (doubt it),
or, by assembler code (generated by the compiler), do the following:
load the characters "IJKL" into the register, shift right ("00IJ")
and mask ("000J")?
That is: does the electronic system access one byte directly,
or perhaps the load/shift/mask is done in hardware? (doubt it)
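The two alternatives can be written out in C. byte_direct simply indexes the string; on x86-64 a compiler typically turns it into a single byte load such as MOVZX, so at the instruction level one byte goes straight into the low part of a register (the cache and memory behind it still move larger blocks). byte_via_word is a software emulation of the load/shift/mask scheme for a hypothetical machine that can only read aligned 32-bit words; both names are invented for illustration. It assembles the word in little-endian order, as x86 stores it, so Str(9) needs a shift of 8 bits rather than the 16 bits suggested by the big-endian picture "IJKL".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct byte access: usually compiled to one byte-sized load. */
static unsigned char byte_direct(const char *str, size_t i)
{
    return (unsigned char)str[i];
}

/* Emulated "load word, shift, mask" for a hypothetical machine that can
   only read aligned 32-bit words. Alignment here is relative to the
   start of str. The word is built in little-endian order explicitly,
   so the result does not depend on the host's byte order. */
static unsigned char byte_via_word(const char *str, size_t i)
{
    size_t base = i & ~(size_t)3;             /* start of the 4-byte group */
    const unsigned char *p = (const unsigned char *)str + base;
    uint32_t word = (uint32_t)p[0]
                  | (uint32_t)p[1] << 8
                  | (uint32_t)p[2] << 16
                  | (uint32_t)p[3] << 24;     /* "IJKL" when i == 9 */
    unsigned shift = (unsigned)(i - base) * 8; /* byte position in word */
    return (unsigned char)((word >> shift) & 0xFFu); /* shift, then mask */
}
```

Both functions return 'J' for Str(9); the difference is only in how many instructions the work takes.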

For me the important issue is, for a 64 bit machine:
I suspect that access to IntArray(23) is
faster if defined as int64_t IntArray(128),
    i.e. as 64 bit words with the top bits generally being zero,
than if defined as int32_t IntArray(128),
    i.e. as 32 bit words,
because, on a 64 bit machine, the latter packs two values into each
64 bit memory word.
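A rough way to test the int64_t versus int32_t suspicion is to time both versions. The sketch below assumes clock() is precise enough once the summation is repeated many times; time_sum32, time_sum64 and N_ELEMS are hypothetical names. One difference holds regardless of timing: the int32_t array occupies 512 bytes and the int64_t array 1024, i.e. 8 versus 16 cache lines if lines are 64 bytes (a typical but assumed size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define N_ELEMS 128  /* same length as IntArray(128) in the question */

/* Sums a 32-bit array `reps` times and returns elapsed CPU seconds.
   The total is written to *out so the work cannot simply be discarded. */
static double time_sum32(const int32_t *a, size_t n, long reps, int64_t *out)
{
    clock_t start = clock();
    int64_t total = 0;
    for (long r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            total += a[i];
    *out = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* The same loop over a 64-bit array. */
static double time_sum64(const int64_t *a, size_t n, long reps, int64_t *out)
{
    clock_t start = clock();
    int64_t total = 0;
    for (long r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            total += a[i];
    *out = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Fill both arrays with the same values, call each function with a large reps (say 10 million) and compare the returned times; the two totals should be identical. Treat the numbers with care: an optimising compiler may vectorise the inner loop or fold the repetitions together, so it is worth checking the generated assembly (for example with gcc -O2 -S) before drawing conclusions.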

(I must experiment with int_fast32_t.)
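For the int_fast32_t experiment, a first step is simply to see what the compiler chose. C99's <stdint.h> defines int_fast32_t as the fastest signed integer type with at least 32 bits and leaves its width to the implementation: glibc on x86-64 makes it 8 bytes, while Microsoft's C library makes it 4. The struct int_sizes and query_int_sizes names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Storage sizes, in bytes, that the implementation picked. */
struct int_sizes {
    size_t i32;      /* int32_t: exactly 32 bits */
    size_t fast32;   /* int_fast32_t: "fastest" with >= 32 bits */
    size_t least32;  /* int_least32_t: smallest with >= 32 bits */
    size_t i64;      /* int64_t: exactly 64 bits */
};

static struct int_sizes query_int_sizes(void)
{
    struct int_sizes s = { sizeof(int32_t), sizeof(int_fast32_t),
                           sizeof(int_least32_t), sizeof(int64_t) };
    return s;
}
```

Printing these values with printf("%zu", ...) on the target machine shows which width the implementation considers fastest; that choice reflects the library authors' judgement rather than a measurement on your particular CPU.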

(In a 32 bit machine everybody knows that it requires two bites of
the cherry to get a 64 bit integer or float value.)
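The "two bites" can be modelled in C by storing a 64-bit value as two 32-bit halves and reassembling it, which is roughly what a compiler generates for a uint64_t on a 32-bit target that only has 32-bit integer registers. The low-half-first order matches little-endian x86; the function names are hypothetical. Note that 64-bit doubles are usually a single load even on 32-bit x86, via x87 (FLD) or SSE2 (MOVSD), so the two-step picture applies mainly to integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit machine: a 64-bit value is read as two 32-bit
   words, low half first (little-endian order). */
static uint64_t load_u64_as_two_halves(const uint32_t halves[2])
{
    uint32_t low  = halves[0];  /* first 32-bit read */
    uint32_t high = halves[1];  /* second 32-bit read */
    return ((uint64_t)high << 32) | low;
}

/* The matching store: two 32-bit writes. */
static void store_u64_as_two_halves(uint32_t halves[2], uint64_t v)
{
    halves[0] = (uint32_t)v;          /* low half */
    halves[1] = (uint32_t)(v >> 32);  /* high half */
}
```

Because the halves are separate accesses, another thread could in principle observe a value that is half old and half new, which is one reason 64-bit atomics need special instructions on 32-bit x86.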

The significance of this last question would be that there may be no value in
having data 'nn byte aligned'.
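Alignment can still matter on byte-addressed machines because the hardware moves data in larger units, most visibly cache lines, and a value that straddles the boundary between two such units may cost two accesses instead of one. The sketch below checks both properties; is_aligned and straddles are hypothetical helpers, and the pointer-to-uintptr_t conversion is assumed to yield the numeric address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of `align` bytes (align: a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Nonzero if an object of `size` bytes starting at p crosses a boundary
   between `chunk`-byte blocks (for example 64-byte cache lines). */
static int straddles(const void *p, size_t size, size_t chunk)
{
    uintptr_t first = (uintptr_t)p / chunk;
    uintptr_t last  = ((uintptr_t)p + size - 1) / chunk;
    return first != last;
}
```

For example, an 8-byte int64_t at offset 60 from a 64-byte boundary spans two 64-byte lines, whereas the same value at offset 56 fits in one; naturally aligned data (an 8-byte value at a multiple of 8) can never straddle a 64-byte line.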

Finally, I am aware that modern machines have a cache.
When I ask for the value of K, as in  M = N + K, does the machine
load a whole page to get N and another page to get K?
If these are int32_t, does the memory do a 32 bit pulse/read of one
address?  Or perhaps a 64 bit pulse/read and mask?  Or repeated
reads for a whole page?
Does the AMD64 have a 32 bit databus or a 64 bit databus?
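On current x86-64 processors data moves between memory and cache in whole cache lines, commonly 64 bytes, rather than in pages; pages (typically 4 KiB) are the unit of virtual-memory mapping. Whether N and K cost one line fill or two therefore depends on whether they share a line. The sketch below assumes a 64-byte line (LINE_BYTES is a hypothetical constant, not queried from the hardware) and the usual pointer-to-uintptr_t behaviour; line_of and same_line are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; real hardware may differ. */
#define LINE_BYTES 64u

/* Index of the (assumed) cache line containing the byte at p. */
static uintptr_t line_of(const void *p)
{
    return (uintptr_t)p / LINE_BYTES;
}

/* Nonzero if a and b start in the same cache line. If neither object
   straddles a line, reading both (as in M = N + K) then needs at most
   one line fill. */
static int same_line(const void *a, const void *b)
{
    return line_of(a) == line_of(b);
}
```

Two adjacent int32_t struct members will usually share a line; two variables more than 64 bytes apart cannot.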

Thanks to any hardware boffins,
Colin

Colin Carter
