Linux assembly tools: comparing GAS (AT&T syntax) and NASM (Intel syntax)

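Preface: The assembly I learned at school used Intel syntax, with MASM as the assembler, on DOS. I am embarrassed to say that I never studied assembly language in any depth back then, and many points have stayed hazy for me ever since. Recently I have been reading 《使用開源軟體-自己寫作業系統》 (Writing Your Own Operating System with Open-Source Software, http://code.google.com/p/writeos/) and 《自己動手寫作業系統》 (Write Your Own Operating System), which mention the GNU as assembler and the NASM assembler. That prompted me to review assembly programming, and I now understand it somewhat better.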

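In the Linux 0.11 kernel source, bootsect.s and setup.s are 16-bit code that runs in real mode. They use a roughly Intel-style assembly syntax and have to be built with the Intel 8086 assembler and linker, as86 and ld86. head.s, by contrast, is written in the GNU assembler format, runs in protected mode, and has to be assembled with GNU as (gas), which uses AT&T syntax.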

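Linus used two assemblers because the assembler he had at the time could not assemble 16-bit real-mode code. Starting with kernel 2.4.x, bootsect.s and setup.s are written entirely for as as well. For how to use GNU as, see the GNU assembler manual, "Using as-The GNU Assembler". The takeaway is that an assembly dialect, its syntax, and its assembler go hand in hand. It looks like I should also read up on compiler principles.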

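Assembly programming under DOS:

After installing DOS, download and install the MASM611 assembler, and the DOS assembly environment is ready. MASM uses Intel syntax, which is what I used at school, so I am fairly familiar with it and find it easy to read.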

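Assembly programming under Linux:

Most GNU/Linux systems ship with the GNU Assembler already installed, so it can be used directly without a separate installation. GAS uses AT&T syntax.

There is also another assembler, NASM, which can be used on both Linux and Windows. It uses Intel syntax, similar to MASM.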

Differences between Intel syntax and AT&T syntax:

The following is a description of the differences between the two, taken from:

http://www1.imada.sdu.dk/~kslarsen/dm516/Litteratur/IntelnATT.htm

Intel and AT&T Syntax.

Intel and AT&T syntax Assembly language are very different from each other in appearance, and this will lead to confusion when one first comes across AT&T syntax after having learnt Intel syntax first, or vice versa. So let's start with the basics.

In Intel syntax there are no register prefixes or immed prefixes. In AT&T however registers are prefixed with a '%' and immed's are prefixed with a '$'. Intel syntax hexadecimal or binary immed data are suffixed with 'h' and 'b' respectively. Also if the first hexadecimal digit is a letter then the value is prefixed by a '0'.

Example:

Intel Syntax

mov     eax,1
mov     ebx,0ffh
int     80h

AT&T Syntax

movl    $1,%eax
movl    $0xff,%ebx
int     $0x80
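
The immediate rules above can be sketched as a small converter. This is only an illustration, not part of either assembler: the function name intel_imm_to_att is hypothetical, and it assumes the input is a bare numeric immediate (no labels or expressions).

```python
def intel_imm_to_att(token: str) -> str:
    """Rewrite an Intel-style immediate (e.g. '0ffh') in AT&T form ('$0xff').

    Hypothetical helper for illustration; handles plain numbers only.
    """
    text = token.strip().lower()
    if text.endswith("h"):
        # Hex: drop the 'h' suffix. int() also absorbs the leading '0'
        # that Intel syntax requires when the first digit is a letter.
        return f"$0x{int(text[:-1], 16):x}"
    if len(text) > 1 and text.endswith("b") and set(text[:-1]) <= {"0", "1"}:
        # Binary: drop the 'b' suffix and use the 0b prefix instead.
        return f"$0b{int(text[:-1], 2):b}"
    # Anything else is treated as decimal.
    return f"${int(text, 10)}"


for imm in ("1", "0ffh", "80h"):
    print(imm, "->", intel_imm_to_att(imm))
```

The three inputs are the immediates from the example, and the results match the AT&T listing: $1, $0xff and $0x80.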

The direction of the operands in Intel syntax is opposite from that of AT&T syntax. In Intel syntax the first operand is the destination, and the second operand is the source whereas in AT&T syntax the first operand is the source and the second operand is the destination. The advantage of AT&T syntax in this situation is obvious. We read from left to right, we write from left to right, so this way is only natural.

Example:

Intel Syntax

instr   dest,source
mov     eax,[ecx]

AT&T Syntax

instr   source,dest
movl    (%ecx),%eax
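
As a rough sketch of the operand-order rule alone, the snippet below reverses the operands of an Intel-style line. The name reorder_for_att is hypothetical, and the function deliberately does nothing else: it adds no '%' or '$' prefixes and no size suffix. It relies on the fact that Intel operands written as in the example contain no commas of their own.

```python
def reorder_for_att(intel_line: str) -> str:
    """Reverse the operand order of an Intel-style instruction line.

    Hypothetical helper: it only reorders operands (dest,source ->
    source,dest) and leaves every operand's spelling unchanged.
    """
    mnemonic, _, operands = intel_line.strip().partition(" ")
    parts = [p.strip() for p in operands.split(",") if p.strip()]
    if not parts:
        return mnemonic  # instruction without operands
    return f"{mnemonic} {','.join(reversed(parts))}"


print(reorder_for_att("instr   dest,source"))  # instr source,dest
print(reorder_for_att("mov     eax,[ecx]"))    # mov [ecx],eax
```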

Memory operands as seen above are different also. In Intel syntax the base register is enclosed in '[' and ']' whereas in AT&T syntax it is enclosed in '(' and ')'.

Example:

Intel Syntax

mov     eax,[ebx]
mov     eax,[ebx+3]

AT&T Syntax

movl    (%ebx),%eax
movl    3(%ebx),%eax

The AT&T form for instructions involving complex operations is very obscure compared to Intel syntax. The Intel syntax form of these is segreg:[base+index*scale+disp]. The AT&T syntax form is %segreg:disp(base,index,scale).

Index/scale/disp/segreg are all optional and can simply be left out. Scale, if not specified and index is specified, defaults to 1. Segreg depends on the instruction and whether the app is being run in real mode or pmode. In real mode it depends on the instruction whereas in pmode it's unnecessary. Immediate data should not be '$' prefixed in AT&T when used for scale/disp.

Example:

Intel Syntax

instr   foo,segreg:[base+index*scale+disp]
mov     eax,[ebx+20h]
add     eax,[ebx+ecx*2h]
lea     eax,[ebx+ecx]
sub     eax,[ebx+ecx*4h-20h]

AT&T Syntax

instr   %segreg:disp(base,index,scale),foo
movl    0x20(%ebx),%eax
addl    (%ebx,%ecx,0x2),%eax
leal    (%ebx,%ecx),%eax
subl    -0x20(%ebx,%ecx,0x4),%eax

As you can see, AT&T is very obscure. [base+index*scale+disp] makes more sense at a glance than disp(base,index,scale).
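
To make the %segreg:disp(base,index,scale) layout concrete, here is a minimal sketch that builds an AT&T memory operand from its parts. The function att_mem_operand and its parameter names are assumptions made for this illustration. It writes the scale in decimal rather than hex and omits a scale of 1, as the text describes.

```python
def att_mem_operand(base=None, index=None, scale=1, disp=0, segreg=None):
    """Format an AT&T memory operand: %segreg:disp(base,index,scale).

    Hypothetical helper; register names are passed without the '%' prefix.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Displacement is written without '$'; it is omitted when zero.
    disp_text = f"{'-' if disp < 0 else ''}0x{abs(disp):x}" if disp else ""
    regs = f"%{base}" if base else ""
    if index:
        regs += f",%{index}"
        if scale != 1:  # a scale of 1 is the default and can be left out
            regs += f",{scale}"
    body = f"{disp_text}({regs})" if regs else (disp_text or "0")
    return f"%{segreg}:{body}" if segreg else body


print(att_mem_operand("ebx", disp=0x20))          # 0x20(%ebx)
print(att_mem_operand("ebx", "ecx", 2))           # (%ebx,%ecx,2)
print(att_mem_operand("ebx", "ecx"))              # (%ebx,%ecx)
print(att_mem_operand("ebx", "ecx", 4, -0x20))    # -0x20(%ebx,%ecx,4)
```

The four calls correspond to the four memory operands in the example above.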

As you may have noticed, the AT&T syntax mnemonics have a suffix. The significance of this suffix is that of operand size. 'l' is for long, 'w' is for word, and 'b' is for byte. Intel syntax has similar directives for use with memory operands, i.e. byte ptr, word ptr, dword ptr. "dword" of course corresponding to "long". This is similar to type casting in C but it doesn't seem to be necessary since the size of registers used is the assumed datatype.

Example:

Intel Syntax

mov     al,bl
mov     ax,bx
mov     eax,ebx
mov     eax, dword ptr [ebx]

AT&T Syntax

movb    %bl,%al
movw    %bx,%ax
movl    %ebx,%eax
movl    (%ebx),%eax
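
The suffix rule can be sketched as two lookup tables, one keyed by Intel size directive and one by register name. The names PTR_SUFFIX, REG_SUFFIX and att_mnemonic are hypothetical, and the register table only covers the 8-, 16- and 32-bit general-purpose registers.

```python
# Intel size directives and the AT&T suffix each one corresponds to.
PTR_SUFFIX = {"byte ptr": "b", "word ptr": "w", "dword ptr": "l"}

# Operand size implied by a register name (general-purpose registers only).
REG_SUFFIX = {}
for reg in ("al", "bl", "cl", "dl", "ah", "bh", "ch", "dh"):
    REG_SUFFIX[reg] = "b"          # 8-bit registers -> byte
for reg in ("ax", "bx", "cx", "dx", "si", "di", "bp", "sp"):
    REG_SUFFIX[reg] = "w"          # 16-bit registers -> word
    REG_SUFFIX["e" + reg] = "l"    # 32-bit registers -> long


def att_mnemonic(mnemonic: str, register: str) -> str:
    """Append the AT&T size suffix implied by a register operand."""
    return mnemonic + REG_SUFFIX[register.lower()]


print(att_mnemonic("mov", "al"))    # movb
print(att_mnemonic("mov", "ax"))    # movw
print(att_mnemonic("mov", "eax"))   # movl
print("mov" + PTR_SUFFIX["dword ptr"])  # movl
```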

Official sites and online documentation

NASM (Netwide Assembler)

The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed for portability and modularity. It supports a range of object file formats, including Linux and *BSD a.out, ELF, COFF, Mach-O, Microsoft 16-bit OBJ, Win32 and Win64. It will also output plain binary files. Its syntax is designed to be simple and easy to understand, similar to Intel's but less complex. It supports all currently known x86 architectural extensions, and has strong support for macros.

The Netwide Assembler grew out of an idea on comp.lang.asm.x86 (or possibly alt.lang.asm - I forget which), which was essentially that there didn't seem to be a good free x86-series assembler around, and that maybe someone ought to write one.

  • a86 is good, but not free, and in particular you don't get any 32-bit capability until you pay. It's DOS only, too.
  • gas is free, and ports over to DOS and Unix, but it's not very good, since it's designed to be a back end to gcc, which always feeds it correct code. So its error checking is minimal. Also, its syntax is horrible, from the point of view of anyone trying to actually write anything in it. Plus you can't write 16-bit code in it (properly.)
  • as86 is specific to Minix and Linux, and (my version at least) doesn't seem to have much (or any) documentation.
  • MASM isn't very good, and it's (was) expensive, and it runs only under DOS.
  • TASM is better, but still strives for MASM compatibility, which means millions of directives and tons of red tape. And its syntax is essentially MASM's, with the contradictions and quirks that entails (although it sorts out some of those by means of Ideal mode.) It's expensive too. And it's DOS-only.

GNU Assembler