Greetings,

What I really need to know is how memory is addressed on a modern 64-bit machine. I already understand that if I have a pointer, ptr, which points to an array of some structure, then ptr++ will point to the next element. That is, ptr++ advances the address by a multiple of bytes (the size of the structure). But that is all done in software. I want to optimise my software to match what happens in hardware.

If I have a pointer pointing to address 256 and then look at address 257, by how many BYTES have I jumped in physical memory? In old-style memory of 16-bit words it would jump one word, that is 2 BYTES, but in a machine with 8-bit words it would jump one word of 1 BYTE. That is, always one machine word. Also, if your machine was up to date, it always had the matching data bus to access a full word (machine word, not a Bill Gates 'word') in one take. That is, a 16-bit machine had a 16-bit data bus. Cheap machines, with a cheap data bus, would take two mouthfuls.

An equivalent question would be: if I have a string, Str(16), of bytes Str(0) to Str(15) containing "ABCDEFGHIJKLMNOn" (n = NUL) in a 32-bit machine, does the request for Str(9), i.e. "J", electronically load this character directly into the least significant byte of the register (doubt it), or does assembler code (generated by the compiler) do the following: load the characters "IJKL" into the register, shift right ("00IJ") and mask ("000J")? That is: does the electronic system access one byte directly? Or perhaps the load/shift/mask is done in hardware? (Doubt it.)

For me the important issue is, for a 64-bit machine: I suspect that access to IntArray(23) is faster if defined as int64_t IntArray(128), i.e. as 64-bit words with the top bits generally being zero, than if defined as int32_t IntArray(128), i.e. as 32-bit words, because, for a 64-bit machine, the latter will store two values into each physical address.
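To pin down what I am asking, here is a small C sketch of the software view only. The function names are my own inventions for illustration, not anything from a library. It measures how far apart, in bytes, element 0 and element n sit in an int32_t array and in an int64_t array, and it spells out the load/shift/mask idea for Str(9) by hand. It says nothing about what the hardware actually does underneath, which is the part I am unsure of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte distance from element 0 to element n of an int32_t array. */
static size_t offset_in_int32_array(size_t n)
{
    static int32_t a[128];
    assert(n < 128);
    return (size_t)((const char *)&a[n] - (const char *)&a[0]);
}

/* Byte distance from element 0 to element n of an int64_t array. */
static size_t offset_in_int64_array(size_t n)
{
    static int64_t a[128];
    assert(n < 128);
    return (size_t)((const char *)&a[n] - (const char *)&a[0]);
}

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
static int is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Hand-written "load the 4-byte group, shift, mask" for byte i of s.
   This is only my guess at the idea, written out in software. */
static unsigned char byte_via_word(const char *s, size_t len, size_t i)
{
    size_t base = i & ~(size_t)3;   /* start of the 4-byte group holding s[i] */
    uint32_t word;
    unsigned shift;

    assert(base + 4 <= len);
    memcpy(&word, s + base, sizeof word);   /* "IJKL" when i == 9 */

    /* Which end of the word holds s[base] depends on the byte order. */
    shift = is_little_endian() ? 8u * (unsigned)(i & 3)
                               : 8u * (3u - (unsigned)(i & 3));
    return (unsigned char)((word >> shift) & 0xFFu);
}
```

With char str[16] = "ABCDEFGHIJKLMNO", byte_via_word(str, sizeof str, 9) gives 'J', the same as str[9]. Element 23 sits 92 bytes into the int32_t array and 184 bytes into the int64_t array.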
(I must experiment with int_fast32_t.) (In a 32-bit machine everybody knows that it requires two bites of the cherry to get a 64-bit integer or float value.) The significance of this last question is that there may be no value in having data 'nn-byte aligned'.

Finally, I am aware that modern machines have a cache. When I ask for the value of K, as in M = N + K, does the machine load a whole page to get N and another page to get K? If these are int32_t, does the memory do a 32-bit pulse/read of one address? Or perhaps a 64-bit pulse/read and mask? Or repeated reads for a whole page? Does the AMD64 have a 32-bit data bus or a 64-bit data bus?

Thanks to any hardware boffins,
Colin