Conversion Between Signed and Unsigned

C allows casting between different numeric data types.

For example, suppose variable x is declared as int and u as unsigned.

The expression (unsigned) x converts the value of x to an unsigned value, and (int) u converts the value of u back to a signed int.

From a mathematical perspective, one can imagine several different conventions.

Clearly, we want to preserve any value that can be represented in both forms.

On the other hand, converting a negative value to unsigned might yield zero.

Converting an unsigned value that is too large to be represented in two's-complement form might yield TMax.

For most implementations of C, however, the answer to this question is based on a bit-level perspective, rather than on a numeric one.
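In numeric terms (a standard consequence of the bit-level rule, stated here for concreteness), converting a signed value x to unsigned in a w-bit word gives

    T2U(x) = x + 2^w  if x < 0
    T2U(x) = x        if x >= 0

so, for example, the 16-bit value -12345 converts to -12345 + 65536 = 53191.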

         For example, consider the following code:

#include <stdio.h>
int main(){
    short int v = -12345;
    unsigned short uv = (unsigned short)v;
    printf("v = %d, uv = %u\n", v, uv);
    return 0;
}

When run on a two's-complement machine, it generates the following output:

v = -12345, uv = 53191

What we see here is that the effect of casting is to keep the bit values identical but change how these bits are interpreted.

We saw in Figure 2.14 that the 16-bit two's-complement representation of -12,345 is identical to the 16-bit unsigned representation of 53,191.

Casting from short int to unsigned short changed the numeric value, but not the bit representation.
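As a quick cross-check (this snippet is not in the original notes; it uses the h length modifier to print the low 16 bits in hex), we can display both bit patterns and see that they coincide:

#include <stdio.h>
int main(){
    short v = -12345;
    unsigned short uv = (unsigned short)v;
    /* Both lines print the same bit pattern, 0xcfc7 */
    printf("v  = %d, bits = 0x%.4hx\n", v, (unsigned short)v);
    printf("uv = %u, bits = 0x%.4hx\n", uv, uv);
    return 0;
}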

         Similarly, consider the following code:

#include <stdio.h>
int main(){
    unsigned u = 4294967295u;
    int tu = (int)u;
    printf("u = %u, tu = %d\n", u, tu);
    return 0;
}

When run on a two's-complement machine, it generates the following output:

  u = 4294967295, tu = -1


 

Why? Because the cast preserves the bit pattern: 4294967295 is 0xFFFFFFFF, and those 32 bits read as a two's-complement int are -1. (See p. 100.)

The numeric values might change, but the bit patterns do not.

#include <stdio.h>
int main(){
    int x = -1;
    unsigned u = 2147483648; /* 2 to the 31st */
    printf("x = %u = %d\n", x, x); /* prints: x = 4294967295 = -1 */
    printf("u = %u = %d\n", u, u); /* prints: u = 2147483648 = -2147483648 */
    return 0;
}

 

To make sense of this, you need to understand what "%u" means: printf performs no conversion on its argument. The directive %d tells it to interpret the argument's bits as a signed int, while %u interprets the same bits as unsigned. That is how x can print as both 4294967295 and -1.

 


These problems help you think about the relation between Boolean operations and typical ways that programmers apply masking operations. Here is the code:

/* Declarations of functions implementing operations bis and bic */
int bis(int x, int m);
int bic(int x, int m);

/* Compute x|y using only calls to functions bis and bic */
int bool_or(int x, int y)
{
    int result = bis(x, y);
    return result;
}

/* Compute x^y using only calls to functions bis and bic */
int bool_xor(int x, int y)
{
    int result = bis(bic(x, y), bic(y, x));
    return result;
}

The bis operation is equivalent to Boolean OR: a bit is set in the result z if either this bit is set in x or it is set in m. On the other hand, bic(x, m) is equivalent to x & ~m;

we want the result bit to equal 1 only when the corresponding bit of x is 1 and of m is 0.

    Given that, we can implement | with a single call to bis. To implement ^, we take advantage of the property

                                              x ^ y = (x & ~y) | (~x & y).
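To experiment with this, here is a minimal self-contained sketch (the bodies of bis and bic are reference implementations of my own; the exercise treats them as given):

#include <stdio.h>
#include <assert.h>

int bis(int x, int m) { return x | m;  }  /* "bit set": copy x, setting the bits where m is 1   */
int bic(int x, int m) { return x & ~m; }  /* "bit clear": copy x, clearing the bits where m is 1 */

int bool_or(int x, int y)  { return bis(x, y); }
int bool_xor(int x, int y) { return bis(bic(x, y), bic(y, x)); }

int main(){
    int x = 0x69, y = 0x55;
    assert(bool_or(x, y)  == (x | y));
    assert(bool_xor(x, y) == (x ^ y));
    printf("checks passed for x=0x%x, y=0x%x\n", x, y);
    return 0;
}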


Machine-Level Representation of Programs

Computers execute machine code, sequences of bytes encoding the low-level operations that manipulate data, manage memory, read and write data on storage devices, and communicate over networks.

GCC invokes both an assembler and a linker to generate the executable machine code from the assembly code.

The type checking provided by a compiler helps detect many program errors and makes sure we reference and manipulate data in consistent ways.

Best of all, a program written in a high-level language can be compiled and executed on a number of different machines, whereas assembly code is highly machine specific.

Why should we spend our time learning machine code?

By invoking GCC with appropriate command-line parameters (e.g., the -S option), we can have it generate a file showing its output in assembly-code form.
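For instance (the file name and flags here are illustrative; -O1 and -S are standard GCC options), compiling the file below with gcc -O1 -S code.c stops after the compilation stage and leaves the assembly in code.s:

/* code.c -- compile with: gcc -O1 -S code.c */
int accum = 0;

int sum(int x, int y)
{
    int t = x + y;
    accum += t;
    return t;
}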

By reading this code, we can understand the optimization capabilities of the compiler and analyze the underlying inefficiencies in the code.

Furthermore, there are times when the layer of abstraction provided by a high-level language hides information about the run-time behavior of a program that we need to understand.

The need for programmers to learn assembly code has shifted over the years from one of being able to write programs directly in assembly to one of being able to read and understand the code generated by compilers.

Relative to the computations expressed in the C code, optimizing compilers can rearrange execution order, eliminate unneeded computations, replace slow operations with faster ones, and even change recursive computations into iterative ones. Understanding the relation between source code and the generated assembly can often be a challenge; it's much like putting together a puzzle having a slightly different design than the picture on the box. It is a form of reverse engineering: trying to understand the process by which a system was created by studying the system and working backward. In this case, the system is a machine-generated assembly-language program, rather than something designed by a human.

Each successive version of the GCC compiler implements more sophisticated optimization algorithms, and these can radically transform a program to the point where it is difficult to understand the relationship between the original source code and the generated machine-level program.

One approach is to write entire functions in assembly code and combine them with C functions during the linking stage.

A second is to use GCC's support for embedding assembly code directly within C programs.
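A minimal sketch of this second approach (the function and constraint strings are my own illustration; __asm__ with output/input operand lists is GCC's extended-asm syntax, and addl is IA32-specific):

/* Add b into a using one embedded assembly instruction.
   "+r"(a): keep a in a register, read-write; "r"(b): b in a register. */
int add_asm(int a, int b)
{
    __asm__("addl %1, %0" : "+r"(a) : "r"(b));
    return a;
}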

8086: One of the first single-chip, 16-bit microprocessors.

80286: The original platform for MS Windows.

i386: This was the first machine in the series that could support a Unix operating system.

i486: Improved performance and integrated the floating-point unit onto the processor chip but did not significantly change the instruction set.

Pentium: Improved performance, but only added minor extensions to the instruction set.

PentiumPro: Introduced a radically new processor design, internally known as the P6 microarchitecture, and added a class of "conditional move" instructions to the instruction set.

Pentium II: Continuation of the P6 microarchitecture.

Pentium III: Introduced SSE, a class of instructions for manipulating vectors of integer or floating-point data.

Pentium 4: Extended SSE to SSE2, adding new data types, along with 144 new instructions for these formats.

Pentium 4E: Added hyperthreading, a method to run two programs simultaneously on a single processor.

Core 2: The first multi-core Intel microprocessor; it did not support hyperthreading.

Core i7: Incorporated both hyperthreading and multi-core, with the initial version supporting two executing programs on each core and up to four cores on each chip.

Each successive processor has been designed to be backward compatible: able to run code compiled for any earlier version.

Only by giving specific command-line options, or by compiling for 64-bit operation, will the compiler make use of the more recent extensions.

 

The GCC command actually invokes a sequence of programs to turn the source code into executable code.

The compiler does most of the work in the overall compilation sequence, transforming programs expressed in the relatively abstract execution model provided by C into the very elementary instructions that the processor executes.

Whereas C provides a model in which objects of different data types can be declared and allocated in memory, machine code views the memory as simply a large, byte-addressable array. The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory. A single machine instruction performs only a very elementary operation.

The compiler must generate sequences of such instructions to implement program constructs such as arithmetic expression evaluation, loops, or procedure calls and returns.
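For example (a sketch for intuition; the exact instruction sequence depends on compiler and options), even a small loop has no single-instruction counterpart and must be lowered into initialize, compare, conditional-jump, add, and jump-back steps:

/* Sum the integers 1..n. The compiler must expand the for loop into
   a sequence of elementary instructions: initialize i, compare i
   with n, conditionally jump past the loop, add, increment, and
   jump back. */
int sum_to(int n)
{
    int s = 0;
    int i;
    for (i = 1; i <= n; i++)
        s += i;
    return s;
}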

The program actually executed by the machine is simply a sequence of bytes encoding a series of instructions. The machine has very little information about the source code from which these instructions were generated.

C declaration    Intel data type      Assembly-code suffix   Size (bytes)
char             Byte                 b                      1
short            Word                 w                      2
int              Double word          l                      4
long int         Double word          l                      4
long long int    -                    -                      8
char *           Double word          l                      4
float            Single precision     s                      4
double           Double precision     l                      8
long double      Extended precision   t                      10/12

As the table indicates, most assembly-code instructions generated by GCC have a single-character suffix denoting the size of the operand.

The different operand possibilities can be classified into three types. The first type, immediate, is for constant values. In ATT-format assembly code, these are written with a '$' followed by an integer using standard C notation, for example, $-577 or $0x1F. Any value that fits into a 32-bit word can be used, although the assembler will use 1- or 2-byte encodings when possible. The second type, register, denotes the contents of one of the registers: either one of the eight 32-bit registers (e.g., %eax) for a double-word operation, one of the eight 16-bit registers (e.g., %ax) for a word operation, or one of the eight single-byte register elements (e.g., %al) for a byte operation. The third type of operand is a memory reference, in which we access some memory location according to a computed address, often called the effective address.

As we will see, the more complex addressing modes are useful when referencing array and structure elements.

Among the most heavily used instructions are those that copy data from one location to another. The generality of the operand notation allows a simple data movement instruction to perform what in many machines would require a number of instructions.
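As an illustration (the assembly in the comments is a plausible IA32 translation, assumed for this sketch; the register assignments are arbitrary), a single movl instruction can copy between a register and an arbitrary memory location in either direction:

/* Swap-style copy: read through a pointer, then write through it.
   A plausible translation, with xp in %edx and y in %ecx:        */
int exchange(int *xp, int y)
{
    int x = *xp;  /* movl (%edx), %eax    memory -> register */
    *xp = y;      /* movl %ecx, (%edx)    register -> memory */
    return x;     /* result returned in %eax */
}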

For most other machines, the long double data type will be represented using the same 8-byte format as the ordinary double data type.


Note that the assembly code uses the suffix 'l' to denote both a 4-byte integer as well as an 8-byte double-precision floating-point number.

This causes no ambiguity, since floating point involves an entirely different set of instructions and registers.

Accessing Information

An IA32 central processing unit (CPU) contains a set of eight registers storing 32-bit values. These registers are used to store integer data as well as pointers. Figure 3.2 diagrams the eight registers (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp). Their names all begin with %e, but otherwise they have peculiar names.

Operand Specifiers

Most instructions have one or more operands, specifying the source values to reference in performing an operation and the destination location into which to place the result.

