Mailinglist Archive: opensuse-programming (55 mails)

RE: [suse-programming-e] Hardware access to RAM
  • From: Hallingstad Håkon <hakon.hallingstad@xxxxxxx>
  • Date: Sun, 13 Mar 2005 10:21:30 +0100
  • Message-id: <2F75935E9D97D411928A00508BAF7BBD04D6DCFF@xxxxxxxxxxxxxxxxxxxxx>
I will explain what happens on i386.

> I understand that if I have a pointer, ptr, which points to an
> array of some structure then ptr++ will point to the next element.
> But that is all done in software. I want to optimise my software
> to match what happens in hardware.

The compiler translates ++ptr into adding the element size to the logical
address stored in ptr. If ptr was declared as a pointer to int (int *ptr),
then ++ptr adds sizeof(int), which is 4 on i386, to ptr.

> If I have a pointer pointing to address 256 and then look at address
> 257, by how many BYTES have I jumped in physical memory?

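1

On i386, memory is byte-addressable: every address names one byte, so
addresses 256 and 257 are adjacent bytes.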

By "physical memory" I assume you mean physical memory in the following
sense: through a combination of hardware and software mechanisms, logical
addresses are translated to linear addresses (segmentation) and then to
physical addresses (paging).

With Linux, addresses 256 and 257 lie in the same page (pages are 4096
bytes each). All the bytes within a single page map to a single physical
page frame of the same size, at the same offset, so the two physical
addresses also differ by exactly one byte.

The kernel assigns (almost) arbitrary page frames to pages. So if the two
addresses were on either side of a page boundary, you could not know the
difference between their physical addresses.

> In old style
> memory of 16 bit words it would jump one word, that is 4 BYTES, but in
> a machine with 8 bit words it would jump one word of 2 BYTES. That is,
> always one machine word. Also, if your machine was up to date, it always
> had the matching data bus to access a full word (machine word, not a Bill
> Gates 'word') in one take. (Cheap machines, with a cheap data-bus, would
> take two mouths full.)

That does not make sense to me. Are you talking about the size of the
memory transfers the CPU caches make to and from main memory?

> An equivalent question would be:
> If I have a string, Str(16), of bytes Str(0) to Str(15) containing "ABCDEFGHIJKLMNOn"
> (n=NUL) in a 32 bit machine, does the request for Str(9) ie "J" electronically load this
> character directly into the least significant byte of the register, (doubt it) or, by
> assembler code (generated by the compiler) load the characters "IJKL" into the register,
> shift right, ("00IJ") and mask ("000J")? That is: does the electronic system access one
> byte directly? or perhaps the load/shift/mask is done in hardware? (doubt it)

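One byte (assuming you are using 1-byte characters). The x86 has byte-sized
load instructions, so the compiler loads Str(9) directly with a single
instruction (e.g. a MOV into an 8-bit register, or MOVZX); no load/shift/mask
sequence is needed.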

> For me the important issue is, for a 64 bit machine:
> I suspect that access to IntArray(23) is
> faster if defined as int64_t IntArray(128)
> ie as 64 bit words with the top bits generally being zero than if defined as int32_t IntArray(128)
> ie as 32 bit words
> because, for a 64 bit machine, the latter will store two values into each
> physical address.

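The CPU caches would rather see your array in the smallest possible memory
region. When you access Str(9), the cache fetches the whole cache line
containing it (typically 32 or 64 bytes on x86), so when you then ask for
Str(10), it is already in the cache. An int32_t array packs twice as many
elements into each cache line as an int64_t array, so walking it touches
half as many lines.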

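But the cost of some operations depends on operand width. Multiplication
and especially division/remainder can have different latencies for 32-bit
and 64-bit operands; on many x86-64 CPUs, 64-bit division is actually slower
than 32-bit division, so check the vendor's optimization guide (ask AMD).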

So whether int32_t or int64_t is faster depends heavily on your
application.

Regards,
Håkon Hallingstad
Software Developer, EDB
+47 2252 8218
hakon.hallingstad@xxxxxxx
www.edb.com
"IT is not everything, but it helps"
