博客园  :: 首页  :: 新随笔  :: 联系 :: 订阅 订阅  :: 管理

Designing a Hex Dump Filter—Using Successive Refinement

Posted on 2011-03-22 18:28  天地玄黄  阅读(420)  评论(0编辑  收藏  举报

Define the Program

        把文件中的所有字符都变成用十六进制书写的格式。

Starting with Pseudo-Code

1、最初的伪代码设计如下:

        Read a char from the input file

        convert the character to hex string

        Write the hex string to the output file

        Repeat until done.

2、第一次改进,增添一些细节:

        Read a character from standard input (stdin)

        Apart the 8-bit character into two nybbles, change all of them to hex character, and then put them together

        Witer the hex string to standard output (stdout)

        Repeat until done.

        Exit the program by calling sys_exit.

3、EOF

        Read a character from standard input (stdin)

        Test if we have reached End Of File (EOF)

        If we have reached EOF, we’re done, so jump to exit

        Apart the 8-bit character into two nybbles, change all of them to hex character, and then put them together

        Witer the hex string to standard output (stdout)

        Go back and read another character.

        Exit the program by calling sys_exit.

4、加上标签

Read: Set up registers for the sys_read kernel call.
          Call sys_read to read from stdin.
          Test for EOF.
          If we are at EOF, jump to Exit

          Apart the 8-bit character into two nybbles
          Change all of them into Hex character
          Put the two Hex char together

Write: Set up registers for the sys_write kernel call.
           Call sys_write to write stdout.
           Jump back to Read and get another character

Exit:    Set up registers for termianting the program via sys_exit.
           Call sys_exit.

5、进一步细化如何把nybble转换成Hex character

           Set a looking up table for mapping Decimal to Hex:            Digits “0123456789ABCDEF”
           Set a output table for putting two Hex char together:        HexStr “ 00”

Read:  Set up registers for the sys_read kernel call.
           Call sys_read to read from stdin.
           Test for EOF.
           If we are at EOF, jump to Exit

           Apart the 8-bit character into two nybbles
           For each nybble, the decimal value of this nybble plus the address of Digits, this byte in the Digits is the Hex character
           Put the Least Significant Hex in [HexStr+2].
           Put the Most Significant Hex in [HexStr+1].

Write: Set up registers for the sys_write kernel call.
           Call sys_write to write stdout.
           Jump back to Read and get another character

Exit:    Set up registers for termianting the program via sys_exit.
           Call sys_exit.

这一步细化之后的代码,完全按照这一步的细化方案来写的:

; Executable name	: myhexdump1
; Version		: 1.0
; Created data		: 3/22/2011
; Last update		: 3/22/2001
; Author		: Eric Wang
; Description		: A simple program in assembly for Linux, using NASM 2.05,
;	Demonstrating the conversion of binary values to hexadecial strings.
;	It acts as a very simple hex dump utility for files.
;
; Run it this way:
;	myhexdump1 < (input file)
;
; Build using these commands:
;	nasm -f elf -g -F dwarf myhexdump1.asm
;	ld -o myhexdump1 myhexdump1.o
;

section .bss			; Section containing uninitialized data
	Buff:	resb 1		; Text buffer itself

section .data			; Section containing initialized data
	Digits: db "0123456789ABCDEF"
	HexStr: db " 00"
	HEXLEN: equ $-HexStr
	
section .text			; Section containing code
	global _start		; Linker needs this to find the entry point!

_start:
	nop			; This no-op keeps gdb happy...

; Read from stdin
Read:
	mov eax,3		; Specify sys_read call
	mov ebx,0		; Specify File Descriptor 0: Standard Input
	mov ecx,Buff		; Pass address of the buffer to read to
	mov edx,1		; Tell sys_read to read one char from stdin
	int 80h			; Call sys_read

	cmp eax,0		; If eax==0, sys_read reached EOF on stdin
	je Done			; Jump If Equal (to 0, from compare)

; Apart the 8-bit char into two nybbles
	xor eax,eax			; Clear eax to 0
	mov al,byte [Buff]		; Put the char into eax register, for low nybble
	mov ebx,eax			; For high nybble

	and al,0fh			; mask the high 4-bits in this character
	mov al,byte [Digits+eax]	; Look up table for finding the Hex char
	mov byte [HexStr+2],al		; Put the Hex char into HexStr at least significant place
	
	shr bl,4			; Shift the high 4-bits to the low 4-bits
	mov bl,byte [Digits+ebx]	; Loop up table for finding the Hex char
	mov byte [HexStr+1],bl		; Put the Hex char into HexStr at most significant place

Write:
	mov eax,4		; Specify sys_write call
	mov ebx,1		; Specify File Descriptor 1: Standard Output
	mov ecx,HexStr		; Pass address of the character to write
	mov edx,HEXLEN		; Pass number of characters to write
	int 80h			; Call sys_write

	jmp Read		; Jump back to Read and get another character

; All done!
Done:
	mov eax,1		; Specify sys_exit call
	mov ebx,0		; Return code of zero to Linux
	int 80h			; Call sys_exit

6、Buffered file scan

           Set a 16 bytes buffer.          

           Set a looking up table for mapping Decimal to Hex:            Digits “0123456789ABCDEF”
           Set a output table for putting two Hex char together:        HexStr “ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00”,10

Read:  Set up registers for the sys_read kernel call.
           Call sys_read to read a buffer full of characters from stdin.
           Test for EOF.
           If we are at EOF, jump to Exit

           Set up registers as a pointer to scan the buffer
Scan:  Apart the 8-bit character at buffer pointer into two nybbles
           For each nybble, the decimal value of this nybble plus the address of Digits, this byte in the Digits is the Hex character
           Put the Least Significant Hex in [HexStr+bufferPointer*3+2].
           Put the Most Significant Hex in [HexStr+bufferPointer*3+1].
           Increment buffer pointer.
           If we still have characters in the buffer, jump to Scan.

Write: Set up registers for the sys_write kernel call.
           Call sys_write to write the processed HexStr to stdout.
           Jump back to Read and get another buffer full of characters.

Exit:    Set up registers for termianting the program via sys_exit.
           Call sys_exit.

7、Start talking specifics: which register do what:

           Set a 16 bytes buffer.          

           Set a looking up table for mapping Decimal to Hex:            Digits: db “0123456789ABCDEF”
           Set a output table for putting two Hex char together:        HexStr: db “ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00”,10

Read:  Set up registers for the sys_read kernel call.
           Call sys_read to read a buffer full of characters from stdin.
           Store the number of characters read in esi
           Test for EOF (eax==0).
           If we are at EOF, jump to Exit.

           Set ebp to 0 to point to the first character address in buffer
Scan:  Move the byte at [Buff+ebp] to al
           Copy the value of eax to ebx (Then the low 4-bits in al and the high 4-bits in bl are two nybbles)
           Mask the al with 0Fh, plus this value with the address of Digits, this byte in the Digits is the Least Significant Hex character
           Shift right for 4 bits of the bl, plus this value with the Digits’ address, this byte in Digits is the Most Significant Hex character
           Put the Least Significant Hex in [HexStr+bufferPointer*3+2].
           Put the Most Significant Hex in [HexStr+bufferPointer*3+1].
           Increment buffer pointer.
           If ebp < esi, jump to Scan.

Write: Set up registers for the sys_write kernel call.
           Call sys_write to write the processed HexStr to stdout.
           Jump back to Read and get another buffer full of characters.

Exit:    Set up registers for termianting the program via sys_exit.
           Call sys_exit.

代码,粗体部分是修改的:

; Executable name	: myhexdump2
; Version		: 2.0
; Created data		: 3/22/2011
; Last update		: 3/22/2001
; Author		: Eric Wang
; Description		: A simple program in assembly for Linux, using NASM 2.05,
;	Demonstrating the conversion of binary values to hexadecial strings.
;	It acts as a very simple hex dump utility for files with Buffer Scanning.
;
; Run it this way:
;	myhexdump2 < (input file)
;
; Build using these commands:
;	nasm -f elf -g -F dwarf myhexdump2.asm
;	ld -o myhexdump2 myhexdump2.o
;

section .bss			; Section containing uninitialized data
	BUFFLEN: equ 16		; Our buffer lenth is 16
	Buff:	resb BUFFLEN	; Text buffer itself, resb may means "reset to byte"

section .data			; Section containing initialized data
	Digits: db "0123456789ABCDEF"
	HexStr: db " 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",10
	HEXLEN: equ $-HexStr
	
section .text			; Section containing code
	global _start		; Linker needs this to find the entry point!

_start:
	nop			; This no-op keeps gdb happy...

; Read from stdin
Read:
	mov eax,3		; Specify sys_read call
	mov ebx,0		; Specify File Descriptor 0: Standard Input
	mov ecx,Buff		; Pass address of the buffer to read to
	mov edx,BUFFLEN		; Tell sys_read to read BUFFLEN characters from stdin
	int 80h			; Call sys_read

	mov esi,eax		; Store the number of characters read in esi
	cmp eax,0		; If eax==0, sys_read reached EOF on stdin
	je Done			; Jump If Equal (to 0, from compare)

	xor ebp,ebp		; set ebp to 0 to point to the first char in buffer
; Apart the 8-bit char into two nybbles
Scan:
	xor eax,eax			; Clear eax to 0
	mov al,byte [Buff+ebp]		; Put the char into eax register, for low nybble
	mov ebx,eax			; For high nybble

	and al,0fh			; mask the high 4-bits in this character
	mov al,byte [Digits+eax]	; Look up table for finding the Hex char
	mov ecx,ebp			; The first step for X3. ecx = ebp
	shl ecx,1			; ecx = ecx * 2, so ecx = ebp * 2
	add ecx,ebp			; ecx = ebp + ecx, so ecx = ebp * 3
	mov byte [HexStr+ecx+2],al	; Put the Hex char into HexStr at least significant place
	
	shr bl,4			; Shift the high 4-bits to the low 4-bits
	mov bl,byte [Digits+ebx]	; Loop up table for finding the Hex char
	mov ecx,ebp			; The first step for X3. ecx = ebp
	shl ecx,1			; ecx = ecx * 2, so ecx = ebp * 2
	add ecx,ebp			; ecx = ebp + ecx, so ecx = ebp * 3
	mov byte [HexStr+ecx+1],bl	; Put the Hex char into HexStr at most significant place

	inc ebp				; increse buffer pointer
	cmp ebp,esi			; Compare ebp and esi
	jb Scan				; Jump if Below (if ebp < esi, jump to Scan)

Write:
	mov eax,4		; Specify sys_write call
	mov ebx,1		; Specify File Descriptor 1: Standard Output
	mov ecx,HexStr		; Pass address of the character to write
	mov edx,HEXLEN		; Pass number of characters to write
	int 80h			; Call sys_write

	jmp Read		; Jump back to Read and get another character

; All done!
Done:
	mov eax,1		; Specify sys_exit call
	mov ebx,0		; Return code of zero to Linux
	int 80h			; Call sys_exit

最后,这个程序还有一个问题:如果最后一次从文件读入的数据不足16个byte,那么HexStr的最后结果就会和上一次的尾部相同。更改的部分如下:

Write:
	mov eax,4		; Specify sys_write call
	mov ebx,1		; Specify File Descriptor 1: Standard Output
	mov ecx,HexStr		; Pass address of the character to write

	cmp esi,BUFFLEN		; For the last read.
	je Normal		; If esi = HEXLEN, this is not the last read, and the buffer is full.
	mov edx,esi		; The real characters in the buffer
	shl edx,1		; edx = esi * 2
	add edx,esi		; edx = esi * 3
	mov byte [HexStr+edx],10	; New line
	inc edx			; edx plus one
	jmp Call		; Do not mov edx,HEXLEN
Normal:	mov edx,HEXLEN		; Pass number of characters to write
Call:	int 80h			; Call sys_write

	jmp Read		; Jump back to Read and get another character