Assembly Language Programming Reading Note

Assembly Language Step-by-Step: Programming with DOS and Linux, Second Edition: Reading Note

Chapter 7: Following Your Instructions: Meeting Machine Instructions up Close and Personal

 

Rally Round the Flags, Boys!

Adding and Subtracting One with INC and DEC

1.      Among the simplest instructions are INC and DEC, which increment and decrement an operand by one, respectively.

2.      Both INC and DEC take only one operand.

3.      Unlike ADD and SUB, INC and DEC leave the Carry flag untouched, so don't try to use them to perform multidigit arithmetic.
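The point about the Carry flag can be seen by stepping through a small program in DEBUG. What follows is only a sketch, not a listing from the book: the file name incdemo.asm and the label names are made up for illustration, and it assumes NASM producing a DOS .COM file (for example with nasm -f bin incdemo.asm -o incdemo.com).

```nasm
; incdemo.asm (hypothetical name): compare ADD and INC on the same value.
        [BITS 16]
        [ORG 0100H]

START:
        mov     al, 0FFh        ; AL = 255, the largest 8-bit value
        add     al, 1           ; AL wraps to 0; ADD sets both CF and ZF
        clc                     ; clear CF so the next steps start from NC
        mov     al, 0FFh        ; MOV does not change any flags
        inc     al              ; AL wraps to 0; ZF is set, but CF stays clear
        dec     al              ; AL = 0FFh again; SF is set, CF is still clear

        mov     ax, 4C00h       ; DOS function 4Ch: exit, return code 0
        int     21h
```

Tracing this one instruction at a time, the flag display should show CY after the ADD but NC after the INC, even though both wrapped AL around to zero.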

Using DEBUG to Watch the Flags

1.      Eight of the nine 8086/8088 flags are represented by two-character symbols.

2.      The odd flag out is Trap flag TF, which is reserved for exclusive use by DEBUG itself and cannot be examined while DEBUG has control of the machine.

3.      DEBUG’s Flag State Symbols

FLAG    NAME                     SET SYMBOL    CLEAR SYMBOL
OF      Overflow flag            OV            NV
DF      Direction flag           DN            UP
IF      Interrupt enable flag    EI            DI
SF      Sign flag                NG            PL
ZF      Zero flag                ZR            NZ
AF      Auxiliary carry flag     AC            NA
PF      Parity flag              PE            PO
CF      Carry flag               CY            NC

4.      When you first run DEBUG, the flags are set to their default values, which are these:

  NV UP EI PL NZ NA PO NC

You'll note that all these symbols are clear symbols except for EI, which must be set to allow interrupts to happen. Whether you are aware of it or not, interrupts are happening constantly within your PC. Each keystroke you type on the keyboard triggers an interrupt. Every 55 milliseconds, the system clock triggers an interrupt to allow the BIOS software to update the time and date values kept in memory as long as the PC has power. If you disabled interrupts for any period of time, your real-time clock would stop and your keyboard would freeze up. Needless to say, the interrupt enable flag must be kept set nearly all the time.

5.      One thing to keep in mind is that even when a flag doesn't change state from display to display, it was still affected by the previously executed instruction.

Using Type Specifiers

1.      Telling an instruction the size of its operand is what BYTE and WORD do. Used in this way, BYTE and WORD are what we call type specifiers. They belong to the broad class of things we call directives. Directives give instructions to the assembler. In this case, they tell the assembler how large the operand is when there is no other way for the assembler to know.

Types in Assembly Language

1.      Unlike nearly all high-level languages such as Pascal and C++, the notion of type in assembly language is almost wholly a question of size. A word is a type, as is a byte, a double word, a quad word, and so on.

2.      The assembler is unconcerned with what an assembly language variable means. (Keeping track of such things is totally up to you.) The assembler only worries about how big it is.

3.      Register data always has a fixed and obvious type, since a register's size cannot be changed.

4.      The type of immediate data depends on the magnitude of the immediate value.

In brief terms, you can define named variables in your assembly language programs using such directives as DB and DW. It looks like this:

  Counter     DB 0
  MixTag      DW 32

By using DB, you give variable Counter a type and hence a size. You must match this type when you use the variable name Counter in an instruction to indicate memory data:

  MOV BL,BYTE [Counter]

5.      So, although NASM uses the DB directive to allocate one byte of memory for the variable Counter, it does not remember that Counter takes up only one byte when you insert Counter as an operand in a machine instruction. You must build that specification into your source code by using the BYTE directive. This will force you to think a little bit more about what you're doing at every point that you do it; that is, right where you use variable names as instruction operands. Doing so may help you avoid certain really stupid mistakes, like the ones I used to make all the time while I was working with MASM, most of which came out of trying to let the assembler do my thinking for me.
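As a rough illustration of where the type specifier matters most, here is a small sketch built around the Counter and MixTag variables from the notes. The surrounding program is an assumption added for illustration; it assumes NASM assembling to a DOS .COM file with -f bin.

```nasm
        [BITS 16]
        [ORG 0100H]

        [SECTION .text]
START:
        mov     bl, byte [Counter]  ; size stated explicitly, as the notes recommend
        mov     byte [Counter], 1   ; neither operand implies a size, so BYTE is required
        inc     word [MixTag]       ; memory is the only operand, so WORD is required
        inc     byte [MixTag]       ; same address treated as a byte: only the low byte changes

        mov     ax, 4C00h           ; DOS function 4Ch: exit, return code 0
        int     21h

        [SECTION .data]
Counter db      0                   ; one byte
MixTag  dw      32                  ; one word (two bytes)
```

Removing the specifier from either INC line, or from the MOV that stores 1, leaves NASM with no way to know the operand size, and it should refuse to assemble the line.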

 

Chapter 8: Our Object All Sublime Creating Programs that Work

Overview

This is the best way to learn to assemble: By pulling apart programs written by those who know what they're doing.

The Bones of an Assembly Language Program

1.      This issue of comprehensibility is utterly central to quality assembly language programming.

2.      One of the aims of assembly language coding is to use as few instructions as possible in getting the job done. This does not mean creating as short a source code file as possible.

3.      Comments are neither time nor space wasted. IBM used to say, "One line of comments per line of code." That's good, and should be considered a minimum for assembly language work.

    ; Source name       : EAT.ASM
    ; Executable name   : EAT.COM
    ; Code model        : Real mode flat model
    ; Version           : 1.0
    ; Created date      : 6/4/1999
    ; Last update       : 9/10/1999
    ; Author            : Jeff Duntemann
    ; Description       : A simple example of a DOS .COM file programmed using
    ;                     NASM-IDE 1.1 and NASM 0.98

    [BITS 16]               ; Set 16 bit code generation
    [ORG 0100H]             ; Set code start address to 100h (COM file)

    [SECTION .text]         ; Section containing code
    START:
        mov     dx, eatmsg  ; Mem data ref without [] loads the ADDRESS!
        mov     ah, 9       ; Function 9 displays text to standard output.
        int     21H         ; INT 21H makes the call into DOS.

        mov     ax, 04C00H  ; This DOS function exits the program
        int     21H         ; and returns control to DOS.

    [SECTION .data]         ; Section containing initialized data
    eatmsg  db "Eat at Joe's!", 13, 10, "$"    ; Here's our message


The Simplicity of Flat Model

1.      I recommend placing a summary comment block like this at the top of every source code file you create.

2.      Beneath the comment block is a short sequence of commands directed to the assembler. These commands are placed in square brackets so that NASM knows that they are for its use, and are not to be interpreted as part of the program.

3.      The BITS command tells NASM that the program it's assembling is intended to be run in real mode, which is a 16-bit mode. Using [BITS 32] instead would have brought into play all the marvelous 32-bit protected mode goodies introduced with the 386 and later x86 CPUs.

4.      "ORG" is an abbreviation of origin, and what it specifies is sometimes called the origin of the program, which is where code execution begins. Code execution begins at 0100H for this program. The 0100h value (the h and H are interchangeable) is loaded into the instruction pointer IP by DOS when the program is loaded and run. Why 0100H? The real mode flat model (which is often called the .COM file model) has a 256-byte prefix at the beginning of its single segment. This is the Program Segment Prefix (PSP) and it has several uses that I won't be explaining here. The PSP is basically a data buffer and contains no code. The code cannot begin until after the PSP, so the 0100H value is there to tell DOS to skip those first 256 bytes.

5.      NASM divides your programs into what it calls sections. These sections are less important in real mode flat model than in real mode segmented model, where sections map onto segments. (More on this later.) In flat model, you have only one segment. But the SECTION commands tell NASM where to look for particular types of things. In the .text section, NASM expects to find program code. In the .data section, NASM expects to find the definitions for your initialized variables. A third section is possible, the .bss section, which contains uninitialized data.
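A skeleton that uses all three sections might look like the following. This is a sketch rather than anything from the book: the names fill and buffer are invented for illustration, and RESB is NASM's directive for reserving uninitialized bytes.

```nasm
        [BITS 16]
        [ORG 0100H]

        [SECTION .text]             ; program code
START:
        mov     al, byte [fill]     ; read an initialized variable from .data
        mov     byte [buffer], al   ; write into reserved space in .bss

        mov     ax, 4C00h           ; DOS function 4Ch: exit, return code 0
        int     21h

        [SECTION .data]             ; initialized data: values are stored in the file
fill    db      '*'

        [SECTION .bss]              ; uninitialized data: space only, no stored values
buffer  resb    16                  ; reserve 16 bytes
```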

Labels

1.    A label is a sort of bookmark, holding a place in the program code and giving it a name that's easier to remember than a memory address. The START: label indicates where the program begins. Technically speaking, the START: label isn't necessary in EAT.ASM. You could eliminate the START: label and the program would still assemble and run. However, I think that every program should have a START: label as a matter of discipline. That's why EAT.ASM has one.

2.    Labels are used to indicate where JMP instructions should jump to.

3.    The only distinguishing characteristic of labels is that they're followed by colons. Some rules govern what constitutes a valid label:

·         Labels must begin with a letter or with an underscore, period, or question mark. These last three have special meanings (especially the period), so I recommend sticking with letters until you're way further along in your study of assembly language and NASM.

·         Labels must be followed by a colon when they are defined. This is basically what tells NASM that the identifier being defined is a label. NASM will punt if no colon is there and will not flag an error, but the colon nails it, and prevents a misspelled mnemonic from being mistaken for a label. So use the colon!

·         Labels are case sensitive. So yikes:, Yikes:, and YIKES: are three completely different labels. This differs from practice in a lot of languages (Pascal particularly) so keep it in mind.

4.    The colon is only placed where the label is defined, not where it is referenced.
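The label rules above can be sketched with a short loop. The program is a variation on EAT.ASM written for illustration only. The again label and the count variable are made-up names, and JNZ is a conditional jump (jump if the Zero flag is clear) that these notes have not covered yet.

```nasm
        [BITS 16]
        [ORG 0100H]

        [SECTION .text]
START:
again:                              ; colon: this is where the label is defined
        mov     dx, eatmsg          ; address of the message
        mov     ah, 9               ; DOS function 9: display a $-terminated string
        int     21h
        dec     byte [count]        ; DEC sets ZF when count reaches zero
        jnz     again               ; no colon: the label is only referenced here
                                    ; (Again or AGAIN would be different labels)

        mov     ax, 4C00h           ; DOS function 4Ch: exit, return code 0
        int     21h

        [SECTION .data]
eatmsg  db      "Eat at Joe's!", 13, 10, "$"
count   db      3                   ; number of times to print the message
```

Keeping the counter in memory rather than in a register is a deliberate choice here: it avoids relying on which registers the DOS call leaves unchanged.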

Variables for Initialized Data

1.    The identifier eatmsg defines a variable. Specifically, eatmsg is a string variable (more on which follows) but still, as with all variables, it's one of a class of items we call initialized data: something that comes with a value, and not just a box that will accept a value at some future time. A variable is defined by associating an identifier with a data definition directive. Data definition directives look like this:

  MyByte      DB 07H            ; 8 bits in size
  MyWord      DW 0FFFFH         ; 16 bits in size
  MyDouble    DD 0B8000000H     ; 32 bits in size

Think of the DB directive as "Define Byte." DB sets aside one byte of memory for data storage. Think of the DW directive as "Define Word." DW sets aside one word of memory for data storage. Think of the DD directive as "Define Double." DD sets aside a double word in memory for storage, typically for full 32-bit addresses.

 

2.    I find it useful to put some recognizable value in a variable whenever I can, even if the value is to be replaced during the program's run.

String Variables

1.    String variables are an interesting special case. A string is just that: a sequence or string of characters, all in a row in memory.

2.    Strings are a slight exception to the rule that a data definition directive sets aside a particular quantity of memory. The DB directive ordinarily sets aside one byte only. However, a string may be any length you like, as long as it remains on a single line of your source code file. Because there is no data directive that sets aside 16 bytes, or 42, strings are defined simply by associating a label with the place where the string starts. The eatmsg label and its DB directive specify one byte in memory as the string's starting point. The number of characters in the string is what tells the assembler how many bytes of storage to set aside for that string.

3.    Either single quote (') or double quote (") characters may be used to delineate a string, and the choice is up to you, unless you're defining a string value that itself contains one or more quote characters.

4.    You may combine several separate substrings into a single string variable by separating the substrings with commas.

5.    A dollar sign ($) in quotes is added to the end of the main string data. The dollar sign is used to mark the end of the string for the mechanism that displays the string to the screen.

6.    What, then, of the "13,10" in eatmsg? Inherited from the ancient world of electromechanical Teletype machines, these two characters (carriage return, 13, and line feed, 10) are recognized by DOS as meaning the end of a line of text that is output to the screen. If anything further is output to the screen, it will begin at the left margin of the next line below. You can concatenate such individual numbers within a string, but you must remember that they will not appear as numbers.

7.    A string is a string of characters. A number appended to a string will be interpreted by most operating system routines as an ASCII character.
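The string rules above can be tried out with a sketch like this one. The message text and the name msg are invented for illustration. The consecutive DB lines simply continue the same run of bytes in memory, so DOS function 9 treats them as one string ending at the dollar sign.

```nasm
        [BITS 16]
        [ORG 0100H]

        [SECTION .text]
START:
        mov     dx, msg             ; address of the first byte of the string
        mov     ah, 9               ; DOS function 9: display up to the "$"
        int     21h

        mov     ax, 4C00h           ; DOS function 4Ch: exit, return code 0
        int     21h

        [SECTION .data]
msg     db      "Joe's menu:", 13, 10       ; double quotes enclose an apostrophe
        db      'Today: "soup"', 13, 10     ; single quotes enclose double quotes
        db      65, 66, 67, 13, 10          ; bare numbers are displayed as ASCII: ABC
        db      "$"                         ; end-of-string marker for function 9
```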

Directives versus Instruction Mnemonics

1.    Data definition directives look a little like machine instruction mnemonics, but they are emphatically not machine instructions.

2.    There is no binary opcode for DW, DB, and the other directives.

3.    Machine instructions, as the name implies, are instructions to the CPU itself. Directives, by contrast, are instructions to the assembler.

4.    Understanding directives is easier when you understand the nature of the assembler's job. The assembler scans your source code text file, and as it scans your source code file it builds an object code file on disk. It builds this object code file step by step, one byte at a time, starting at the beginning of the file and working its way through to the end. When it encounters a machine instruction mnemonic, it figures out what binary opcode is represented by that mnemonic and writes that binary opcode (which may be anywhere from one to six actual bytes) to the object code file. When the assembler encounters a directive such as DW, it does not write any opcode to the object code file. DW is a kind of signpost to the assembler, reading "Set aside two bytes of memory right here, for the value that follows." The DW directive specifies an initial value for the variable, and so the assembler writes the bytes corresponding to that value in the two bytes it set aside. The assembler writes the address of the allocated space into a table, beside the label that names the variable. Then the assembler moves on, to the next directive (if there are further directives) or to whatever comes next in the source code file.
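One way to see that the assembler is only laying down bytes one after another is to write a couple of instructions as raw data. The sketch below is an illustration, not the author's code: it replaces two lines of EAT.ASM with DB directives carrying the bytes NASM would emit for them (B4h 09h for mov ah, 9 and CDh 21h for int 21h on the 8086).

```nasm
        [BITS 16]
        [ORG 0100H]

        [SECTION .text]
START:
        mov     dx, eatmsg          ; assembled from a mnemonic as usual
        db      0B4h, 09h           ; the same two bytes as: mov ah, 9
        db      0CDh, 21h           ; the same two bytes as: int 21h

        mov     ax, 4C00h           ; DOS function 4Ch: exit, return code 0
        int     21h

        [SECTION .data]
eatmsg  db      "Eat at Joe's!", 13, 10, "$"
```

The CPU cannot tell how the bytes got into the file. DB contributes no opcode of its own; it only writes the values it is given at the current position.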

The Difference between a Variable's Address and Its Contents

1.    When you place a variable's identifier in a MOV instruction, you are accessing the variable's address, as explained previously. By contrast, if you want to work with the value stored in that variable, you must place the variable's identifier in square brackets.

2.    There are many situations in which you need to move the address of a variable into a register rather than the contents of the variable. In fact, you may find yourself moving the addresses of variables around more than the contents of the variables, especially if you make a lot of calls to DOS and BIOS services.

3.    In assembly language, knowing where a variable is located is essential in order to do lots of important things.
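The difference between the two forms can be sketched by changing the message before printing it. This is an illustrative variation on EAT.ASM, not code from the book, and the BYTE specifiers follow the habit recommended in the Chapter 7 notes.

```nasm
        [BITS 16]
        [ORG 0100H]

        [SECTION .text]
START:
        mov     al, byte [eatmsg]   ; brackets: AL = the first character, 'E' (45h)
        inc     al                  ; 'E' + 1 = 'F' (46h)
        mov     byte [eatmsg], al   ; brackets: store the new character back

        mov     dx, eatmsg          ; no brackets: DX = the ADDRESS of the string
        mov     ah, 9               ; DOS function 9 wants that address in DX
        int     21h

        mov     ax, 4C00h           ; DOS function 4Ch: exit, return code 0
        int     21h

        [SECTION .data]
eatmsg  db      "Eat at Joe's!", 13, 10, "$"
```

The first three instructions work on the contents of the variable, while the DOS call needs only to know where the string lives.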

posted @ 2007-01-21 12:56  Freedom