Data Types and Moving Data in Assembly

I’m still following the Assembly Primer for Hackers from Vivek Ramachandran of SecurityTube in preparation for Penetration Testing with BackTrack. In this review I’ll cover data types and how to move bytes, numbers, pointers and strings between labels and registers.

Data Types

Variables (data/labels) are defined in the .data segment of your assembly program. Here are some of the available data types you’ll commonly use.

Data types in assembly; photo credit to Vivek Ramachandran

Example code

# Demo program to show how to use Data types and MOVx instructions

.data
	HelloWorld:
		.ascii "Hello World!"

	ByteLocation:
		.byte 10

	Int32:
		.int 2
	Int16:
		.short 3
	Float:
		.float 10.23

	IntegerArray:
		.int 10,20,30,40,50

.bss
	.comm LargeBuffer, 10000

.text
	.globl _start

	_start:
		nop
		# Exit syscall to exit the program

		movl $1, %eax
		movl $0, %ebx
		int $0x80

Moving numbers in assembly

Introduction to mov

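The mov family of operations copies data from a source (the first operand) to a destination (the second operand). By appending b, w or l you choose whether to move 8 bits (a byte), 16 bits (a word) or 32 bits (a long) of data. To demonstrate these operations, we’ll be using the example above.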

Moving a byte into a register

movb $0, %al

This will move the integer 0 into the lower 8 bits of the EAX register.

Moving a word into a register

movw $10, %ax

This will move the integer 10 into the lower 16 bits of the EAX register.

Moving a long into a register

movl $20, %eax

This will move the integer 20 into the 32-bit EAX register.
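
To see what the b, w and l suffixes do to the register, it can help to model EAX in C. This is only an illustrative sketch, not something from the primer: the helper names write_al, write_ax and write_eax are made up, and the code assumes the usual 32-bit x86 behaviour where a write to AL or AX leaves the upper bits of EAX untouched.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the new value of EAX after the write. */

/* movb $v, %al : only the low 8 bits of EAX change */
uint32_t write_al(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFFFF00u) | v;
}

/* movw $v, %ax : only the low 16 bits of EAX change */
uint32_t write_ax(uint32_t eax, uint16_t v)
{
    return (eax & 0xFFFF0000u) | v;
}

/* movl $v, %eax : all 32 bits are replaced */
uint32_t write_eax(uint32_t eax, uint32_t v)
{
    (void)eax; /* the old value is discarded entirely */
    return v;
}
```

Starting from EAX = 0xAABBCCDD, write_al(eax, 0) gives 0xAABBCC00, write_ax(eax, 10) gives 0xAABB000A and write_eax(eax, 20) gives 0x00000014.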

Moving a word into a label

movw $50, Int16

This will move the integer 50 into the 16-bit label Int16.

Moving a label into a register

movl Int32, %eax

This will move the contents of the Int32 label into the 32-bit EAX register.

Moving a register into a label

movb %al, ByteLocation

This will move the contents of the 8-bit AL register into the 8-bit ByteLocation label.
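
If it helps, here is a rough C picture of the three label moves above, run one after another starting from the values in the demo program. The function name run_label_moves is made up for illustration, and the C variables are just stand-ins for the labels in the .data segment.

```c
#include <stdint.h>

/* Stand-ins for the labels defined in the .data segment of the demo. */
int16_t Int16 = 3;
int32_t Int32 = 2;
uint8_t ByteLocation = 10;

/* Runs the three moves in order and returns the final value of EAX. */
uint32_t run_label_moves(void)
{
    uint32_t eax;

    Int16 = 50;                   /* movw $50, Int16        */
    eax = (uint32_t)Int32;        /* movl Int32, %eax       */
    ByteLocation = (uint8_t)eax;  /* movb %al, ByteLocation */

    return eax;
}
```

After the call, Int16 holds 50, EAX holds 2 (the contents of Int32) and ByteLocation holds 2 (the low byte of EAX).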

Accessing memory locations (using pointers)

In C we have the concept of pointers. A pointer is simply a variable that points to a location in memory. Typically that memory location holds some data that is important to us and that’s why we’re keeping a pointer to it so we can access the data later. This same concept can be achieved in assembly.

Moving a label’s memory address into a register (creating a pointer)

movl $Int32, %eax

This will move the memory address of the Int32 label into the EAX register. In effect the EAX register is now a pointer to the data held by the Int32 label. Notice that we use movl because memory addresses are 4 bytes (32 bits) on 32-bit x86. Also notice that to get the memory address of a label, rather than its contents, you prepend the $ character.

Dereferencing a pointer (accessing the contents of a memory address)

Moving a long into a dereferenced location

movl $9, (%eax)

This will move the integer 9 into the memory location held in EAX. In other words, if this were C, %eax would be considered a pointer and (%eax) would be the way we dereference that pointer to change the contents of the location it points to. The equivalent in C would look something like this:

int Int32 = 2;
int *eax;
eax = &Int32;
*eax = 9;

The only difference in the C example is that we had to define eax as an int pointer before we could copy the address of Int32. In assembly we can just copy the address of Int32 directly into the EAX register, circumventing the need for an additional variable. But line 4 of this C example is the equivalent of the assembly example shown above.

So to clarify one more time, EAX does not change at all in this example; EAX still points to the same location! However, the data at that location has changed. So if EAX contains the location of the Int32 label, then Int32 now contains 9. So it’s Int32 that has changed, not EAX.

Notice that we use the parentheses to access the memory location stored in the register (dereference the pointer).

Moving a dereferenced value into a register

movl (%eax), %ebx

Keeping in mind that EAX contains the location of the Int32 label and that Int32 now contains 9, this will move 9 into EBX. In other words, the parentheses tell the CPU to read the data stored at the address held in EAX rather than copying the address itself.

In effect the EBX register now holds a copy of the data that EAX points to; EBX is not itself a pointer. Notice that to access the memory location held in the register we’re again enclosing the register name in parentheses.
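
Putting the three pointer instructions from this section together, a C sketch might look like the following. The struct regs and the function run_pointer_demo are hypothetical names used only for this illustration; EAX is modelled as a pointer and EBX as a plain value.

```c
#include <stdint.h>

int32_t Int32 = 2;   /* stand-in for the Int32 label */

/* Hypothetical snapshot of the two registers after the three instructions. */
struct regs {
    int32_t *eax;    /* used as a pointer */
    int32_t  ebx;    /* used as a plain value */
};

struct regs run_pointer_demo(void)
{
    struct regs r;

    r.eax = &Int32;  /* movl $Int32, %eax : EAX = address of Int32       */
    *r.eax = 9;      /* movl $9, (%eax)   : store 9 where EAX points     */
    r.ebx = *r.eax;  /* movl (%eax), %ebx : load the value EAX points to */

    return r;
}
```

Afterwards EAX still holds the address of Int32, Int32 holds 9, and EBX holds a copy of the value 9 rather than an address.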

Moving strings in assembly

I can imagine that reading this you might be thinking, “hey, strings are just bytes of data so why can’t I just move them using the same instructions I just learned?” And the answer to that question is you can! The problem is that strings are oftentimes much larger. A string might be 1 byte, 5 bytes, or 100 bytes. And none of the mov instructions discussed above cover anything larger than 4 bytes. So let’s discuss the string operations that are available to alleviate the pains of copying large strings of data.

A key difference between the standard mov operations and the string series of movs, stos and lods operations is the number of operands. With mov, you specify the source and destination via 2 operands. However, with the movs instructions, the source and destination addresses are placed into the ESI and EDI registers respectively. And with stos and lods, the operations interact directly with the EAX register. This will become more clear with some examples.

The DF flag

DF stands for direction flag. This is a flag stored in the CPU that determines whether to increment or decrement a string’s memory address when string operations are called. When DF is 0 (cleared) the addresses are incremented. When DF is 1 (set) the addresses are decremented. In our examples the DF flag will always be cleared.

The usefulness of the DF flag will make more sense in the examples.

Clearing the DF flag

cld

DF is set to 0. Addresses are incremented where applicable.

Setting the DF flag

std

DF is set to 1. Addresses are decremented where applicable.
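
One detail worth spelling out is how far the address moves: a string operation steps by the size of the data it handles (1, 2 or 4 bytes). A tiny C sketch of that rule, using a made-up helper name next_offset and treating the address as a simple offset:

```c
/* Hypothetical helper: offset of the address after one string operation.
 * size is the operand size in bytes (1, 2 or 4); df is the direction flag. */
long next_offset(long offset, int size, int df)
{
    return df ? offset - size : offset + size;
}
```

With DF cleared, a 1-byte operation at offset 0 moves on to offset 1; with DF set, a 4-byte operation at offset 7 moves back to offset 3.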

Example code

In the example below, the following variables have been defined:

.data
	HelloWorldString:
		.asciz "Hello World of Assembly!"

.bss
	.lcomm Destination, 100

movs: Moving a string from one memory location to another memory location

source: %esi; should contain a memory address where the data to be copied resides;
	the data at this address is not modified, but the address stored in the %esi register
	is incremented or decremented according to the DF flag
destination: %edi; should contain a memory address where the data will be copied to;
	after copying, the address stored in the %edi register is incremented or decremented
	according to the DF flag

Variations

movsb: move a single byte
movsw: move 2 bytes
movsl: move 4 bytes

Example

movl $HelloWorldString, %esi
movl $Destination, %edi

movsb
movsw
movsl

In this example, we first move the address of HelloWorldString into the ESI register (the source string). Then we move the address of Destination into EDI (the destination buffer).

When movsb is called, it tells the CPU to move 1 byte from the source to the destination, so the ‘H’ is copied to the first byte in the Destination label. However, that is not the only thing that happens during this operation. You may have noticed that I pointed out how the addresses stored in the %esi and %edi registers are both incremented or decremented according to the DF flag. Since the DF flag is cleared, both %esi and %edi are incremented by 1 byte.

But why is this useful? Well, what it means is that the next string operation to be called will start copying from the 2nd byte of the source string instead of the first byte. In other words, rather than copying the ‘H’ a second time, we’ll start by copying the ‘e’ in the HelloWorldString instead. This is what makes the movs series of operations far more useful than the mov operations when dealing with strings.

So, as you might imagine, when calling movsw the next 2 bytes are copied and Destination now holds “Hel”. And finally the movsl operation copies 4 bytes into Destination, which makes it “Hello W”.

Of course, the memory locations held in both %esi and %edi have now been incremented by 7 bytes each (1 + 2 + 4). So the final values are:

%esi: $HelloWorldString+7
%edi: $Destination+7
HelloWorldString: "Hello World of Assembly!"
Destination: "Hello W"
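
The walkthrough above can be modelled in C to double-check the final values. This is a sketch under assumptions, not how the CPU is implemented: struct string_regs and the functions movsb, movsw and movsl are hypothetical C stand-ins for the registers and instructions of the same name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the state the movs instructions touch. */
struct string_regs {
    const uint8_t *esi;  /* source address      */
    uint8_t       *edi;  /* destination address */
    int            df;   /* direction flag: 0 = increment, 1 = decrement */
};

/* Copies size bytes from ESI to EDI, then moves both addresses by size. */
static void movs(struct string_regs *r, size_t size)
{
    memcpy(r->edi, r->esi, size);
    if (r->df) {
        r->esi -= size;
        r->edi -= size;
    } else {
        r->esi += size;
        r->edi += size;
    }
}

void movsb(struct string_regs *r) { movs(r, 1); }
void movsw(struct string_regs *r) { movs(r, 2); }
void movsl(struct string_regs *r) { movs(r, 4); }
```

Pointing esi at "Hello World of Assembly!" and edi at a zeroed 100-byte buffer with df = 0, then calling movsb, movsw and movsl in that order, leaves both pointers 7 bytes further along and the buffer holding "Hello W".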

lods: Moving a string from a memory location into the EAX register

source: %esi; should contain a memory address where the data to be copied resides;
	the data at this address is not modified, but the address stored in the %esi register
	is incremented or decremented according to the DF flag
destination: %eax; the contents of this register are discarded because the data is copied
	directly into the register, NOT to any memory address residing in the register; no
	incrementing or decrementing occurs because the destination is a register and not a
	memory location

Variations

lodsb: move a single byte into AL
lodsw: move 2 bytes into AX
lodsl: move 4 bytes into EAX

stos: Moving a string from the EAX register to a memory location

source: %eax; the contents of this register are copied, NOT the contents of any memory
	address residing in the register; no incrementing or decrementing occurs because the
	source is a register and not a memory location
destination: %edi; should contain a memory address where the data will be copied to;
	after copying, the address stored in the %edi register is incremented or decremented
	according to the DF flag

Variations

stosb: move a single byte from AL
stosw: move 2 bytes from AX
stosl: move 4 bytes from EAX
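
Since lods and stos have no worked example here, a small C model of the byte-sized variants may help. It is a sketch with made-up names (struct cpu, and C functions lodsb and stosb standing in for the instructions), and it assumes DF is cleared so the addresses are incremented.

```c
#include <stdint.h>

/* Hypothetical model of the registers used by lods and stos. */
struct cpu {
    uint32_t       eax;
    const uint8_t *esi;  /* source address for lods      */
    uint8_t       *edi;  /* destination address for stos */
};

/* lodsb with DF = 0: load one byte from (ESI) into AL, then ESI += 1. */
void lodsb(struct cpu *c)
{
    c->eax = (c->eax & 0xFFFFFF00u) | *c->esi;  /* only AL is replaced */
    c->esi += 1;
}

/* stosb with DF = 0: store AL at (EDI), then EDI += 1. */
void stosb(struct cpu *c)
{
    *c->edi = (uint8_t)(c->eax & 0xFFu);        /* EAX itself is unchanged */
    c->edi += 1;
}
```

Calling lodsb and then stosb copies a single byte from the source to the destination by way of AL, which is handy when you want to inspect or change each byte in between.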

rep: Repeating an operation so you can move strings more easily

rep movsb

The rep prefix keeps executing the movsb operation, decrementing the ECX register each time, until ECX equals 0. So if you wanted to copy a string in its entirety, you could follow this pseudo-code:

* set ESI to the memory address of the source string
* set EDI to the memory address of the destination string
* set ECX to the length of the source string
* clear the DF flag so ESI and EDI will be incremented for each call to movsb
* call rep movsb

Example

movl $HelloWorldString, %esi
movl $Destination, %edi
movl $25, %ecx # because HelloWorldString contains 24 characters + a null terminator
cld
rep movsb

Here we have movsb being called 25 times (the initial value of ECX). Because movsb increments both the ESI and EDI registers you don’t have to concern yourself with the memory handling at all. So at the end of the example, the values are:

%esi: $HelloWorldString+25
%edi: $Destination+25
%ecx: 0
DF: 0
HelloWorldString: "Hello World of Assembly!"
Destination: "Hello World of Assembly!"
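
For comparison, here is roughly what rep movsb does, written as a C loop. This is a sketch under assumptions: struct rep_regs and rep_movsb are hypothetical names, and DF is assumed to be cleared so both addresses move upwards.

```c
#include <stdint.h>

/* Hypothetical model of the registers used by rep movsb (DF assumed 0). */
struct rep_regs {
    const uint8_t *esi;  /* source address          */
    uint8_t       *edi;  /* destination address     */
    uint32_t       ecx;  /* number of bytes to copy */
};

void rep_movsb(struct rep_regs *r)
{
    while (r->ecx != 0) {   /* rep: stop once ECX reaches 0        */
        *r->edi = *r->esi;  /* movsb: copy one byte                */
        r->esi += 1;        /* DF = 0, so both addresses move up   */
        r->edi += 1;
        r->ecx -= 1;        /* rep: one fewer repetition remaining */
    }
}
```

With ecx set to 25, the loop copies all 24 characters of the string plus the null terminator, leaves both pointers 25 bytes further along, and finishes with ecx at 0. If ecx starts at 0, nothing is copied.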

More to Come

I hope you enjoyed reviewing data types and mov operations. Stay tuned for more assembly tips!

