Which endian is up?
I'm trying to parse an ELF file to build a human readable representation. I know it's been done before, and there are tools such as objdump, nm, readelf, c++filt, etc. I can look at the source to see how they are implemented. In most cases, that means reading C, not C++. I do have an example of C++ code that works. I thought it might be a good idea to take a different approach than that code takes.

Rather than processing the data from a std::ifstream as ELFIO does, I wanted to read it into a buffer and work on it from there. One option is to use a std::stringstream, treat it like a stream, and do basically the same thing ELFIO does with the std::ifstream. What I currently have is a std::vector<char>, which I thought would be a good thing to use. But if I try to do that, I have to find a different way to do things such as

    m_pStream->read( reinterpret_cast<char*>( &m_header ), sizeof( m_header ) );

My intent was to work on the file from the point of view of it just being data, rather than a stream. Is that simply "not the way it's done"?

Another question I have is this: The ELF file starts with an identification tag which carries such information as encoding type (big endian, little endian), the class of the machine (32 or 64-bit), etc. That's all in specifically ordered bytes. That leaves a question open. Doesn't the host encoding determine which byte is treated as 0? IOW, the first four bytes are specified as '0x7f', 'E', 'L', 'F'. If that were read by a machine that uses the opposite endianness, would that not come across as 'F', 'L', 'E', '0x7f'?

Even more perplexing is this: the rest of the ELF header is formed of heterogeneous data types. See below. Does that mean some of the data might be put into the wrong location by the reinterpret_cast<> above?

    // Platform specific definitions.
    typedef unsigned long  Elf32_Addr;
    typedef unsigned short Elf32_Half;
    typedef unsigned long  Elf32_Off;
    typedef signed long    Elf32_Sword;
    typedef unsigned long  Elf32_Word;

    // ELF file header
    struct Elf32_Ehdr {
        unsigned char e_ident[EI_NIDENT];
        Elf32_Half    e_type;
        Elf32_Half    e_machine;
        Elf32_Word    e_version;
        Elf32_Addr    e_entry;
        Elf32_Off     e_phoff;
        Elf32_Off     e_shoff;
        Elf32_Word    e_flags;
        Elf32_Half    e_ehsize;
        Elf32_Half    e_phentsize;
        Elf32_Half    e_phnum;
        Elf32_Half    e_shentsize;
        Elf32_Half    e_shnum;
        Elf32_Half    e_shstrndx;
    };

--
Regards,
Steven
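On the buffer question, one way to get the effect of the stream read() from a std::vector<char> is to copy the bytes out with std::memcpy after a bounds check. The following is only a minimal sketch, assuming C++17; Ehdr32, kIdentSize and read_at are hypothetical names that do not come from ELFIO, and the struct uses fixed-width integers in place of the typedefs above so that its size does not depend on the width of long. It copies raw bytes only and does no byte-order conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>
#include <vector>

constexpr std::size_t kIdentSize = 16;  // value of EI_NIDENT in the ELF specification

// Hypothetical fixed-width mirror of the Elf32_Ehdr quoted above.
struct Ehdr32 {
    unsigned char e_ident[kIdentSize];
    std::uint16_t e_type;
    std::uint16_t e_machine;
    std::uint32_t e_version;
    std::uint32_t e_entry;
    std::uint32_t e_phoff;
    std::uint32_t e_shoff;
    std::uint32_t e_flags;
    std::uint16_t e_ehsize;
    std::uint16_t e_phentsize;
    std::uint16_t e_phnum;
    std::uint16_t e_shentsize;
    std::uint16_t e_shnum;
    std::uint16_t e_shstrndx;
};
static_assert(sizeof(Ehdr32) == 52, "unexpected padding in Ehdr32");

// Copy sizeof(T) bytes starting at `offset` out of the buffer.
// Returns an empty optional if the requested range is outside the buffer.
template <typename T>
std::optional<T> read_at(const std::vector<char>& buf, std::size_t offset) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be a plain data type");
    if (offset > buf.size() || sizeof(T) > buf.size() - offset) {
        return std::nullopt;
    }
    T out;
    std::memcpy(&out, buf.data() + offset, sizeof(T));
    return out;
}
```

A call such as read_at<Ehdr32>(buffer, 0) would then play the role of the read() line. The multi-byte fields still hold whatever byte order the file uses, so they need the same treatment as with the stream version.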
Disclaimer: I still get confused when I have to deal with endian issues. Reality trumps anything I claim below.

To make a stab at some of your questions:

Character data is stored in "normal" order in both cases, so text is not a problem.

The problem arises when you have a multi-byte entity, such as a long. Then, if the file format and the processor format differ, reading from the file will cause "backwards" bytes.

If you read a structure, things start to get complicated. Character arrays should end up in the right place, multi-byte numbers will be backwards.

It gets ugly when you have smaller fields packed into a structure. I believe that you can end up with all the fields within a word reversed. I remember having to come up with two definitions for structs to account for this. This may be compiler dependent.

In your example below, I think all the data ends up in the right fields, you just have to swap the bytes within all the individual fields (if the endianness differs) except for char e_ident. (Is there a problem if EI_NIDENT isn't divisible by four? I don't think so, but...)

Rich

On 8/5/05, Steven T. Hatton <hattons@globalsymmetry.com> wrote:
> [snip: full quote of the original message]
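The point about swapping bytes within the individual fields can also be handled without reference to the host's byte order, by assembling each multi-byte field from its bytes according to e_ident[EI_DATA]. The e_ident bytes themselves are single bytes read in file order, so the magic always comes out as 0x7f, 'E', 'L', 'F'. What follows is a rough sketch rather than how ELFIO or any particular tool does it: EI_DATA, ELFDATA2LSB and ELFDATA2MSB are the names and values from the ELF specification, while load16 and load32 are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

// Index and values of the data-encoding byte in e_ident (ELF specification).
constexpr std::size_t   EI_DATA     = 5;
constexpr unsigned char ELFDATA2LSB = 1;  // file stores least significant byte first
constexpr unsigned char ELFDATA2MSB = 2;  // file stores most significant byte first

// Hypothetical helper: build a 16-bit value from two bytes in file order.
inline std::uint16_t load16(const unsigned char* p, bool file_is_big_endian) {
    const unsigned hi = file_is_big_endian ? p[0] : p[1];
    const unsigned lo = file_is_big_endian ? p[1] : p[0];
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Hypothetical helper: build a 32-bit value from four bytes in file order.
inline std::uint32_t load32(const unsigned char* p, bool file_is_big_endian) {
    const std::uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    return file_is_big_endian
        ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
        : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
}
```

With the file contents in an unsigned char buffer named data (an assumed name), the flag would come from data[EI_DATA] == ELFDATA2MSB, and e_type, which sits at offset 16 in the 32-bit header, would be load16(data + 16, big). Because the shifts operate on values rather than on memory layout, the same code gives the same result on big-endian and little-endian hosts.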
Participants (2): Rich Wilson, Steven T. Hatton