
ARM Stack and Function Calls, with a Comparison to SPARC

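This post describes the stack and the function call process on ARM processors and compares them with SPARC processors.

The ARM material comes from the page below, a good site for learning ARM assembly. I have annotated that article and added a comparison with SPARC at the end.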

https://azeria-labs.com/functions-and-the-stack-part-7/

SPARC background: SPARC V8 assembly instructions, register windows, the stack, and function calls

STACK AND FUNCTIONS

In this part we will look into a special memory region of the process called the Stack. This chapter covers Stack’s purpose and operations related to it. Additionally, we will go through the implementation, types and differences of functions in ARM.

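The stack is a special memory region of a process, and how it is used differs between processor implementations. This part covers the stack's implementation, its types, and ...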

 STACK

Generally speaking, the Stack is a memory region within the program/process. This part of the memory gets allocated when a process is created. We use the Stack for storing temporary data such as local variables of some function, environment variables which help us to transition between the functions, etc. We interact with the stack using PUSH and POP instructions. As explained in Part 4: Memory Instructions: Load And Store, PUSH and POP are aliases to some other memory-related instructions rather than real instructions, but we use PUSH and POP for simplicity.

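A stack belongs to a particular program or process, and its memory is allocated when the process is created. It stores local variables, the environment data that lets us move between functions, and so on. For simplicity the stack is accessed with PUSH and POP, much like the mnemonics on SPARC.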

Before we look into a practical example it is important for us to know that the Stack can be implemented in various ways. First, when we say that the Stack grows, we mean that an item (32 bits of data) is put onto the Stack. The stack can grow toward lower addresses (when it is implemented in a Descending fashion) or toward higher addresses (when it is implemented in an Ascending fashion). The actual location where the next (32-bit) piece of information will be put is defined by the Stack Pointer, or to be precise, the memory address stored in the SP register. Here again, the address could be pointing to the current (last) item in the stack or to the next available memory slot for the item. If the SP is currently pointing to the last item in the stack (Full stack implementation), the SP will be decreased (in case of a Descending Stack) or increased (in case of an Ascending Stack) and only then will the item be placed in the Stack. If the SP is currently pointing to the next empty slot in the Stack (Empty stack implementation), the data will first be placed and only then will the SP be decreased (Descending Stack) or increased (Ascending Stack).

In our examples we will use the Full Descending Stack. Let’s take a quick look at a simple exercise which deals with such a Stack and its Stack Pointer.

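  Depending on the growth direction and on where SP points, there are four kinds of stack. The examples use a Full Descending stack, the second kind in the original article's figure: the stack grows toward lower addresses and SP points at the last item pushed.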
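
The four combinations can be sketched in a few lines of C. This is only a toy model under stated assumptions: the type `stack_model` and the helpers `model_push` and `model_pop` are made up for illustration, and SP is an array index rather than a byte address (a real SP moves by 4 per 32-bit item).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the stack is an array of 32-bit words and sp is an
   index into it (a real SP holds a byte address and moves by 4). */
typedef struct {
    uint32_t mem[16];
    int sp;    /* current stack pointer, as a word index */
    int step;  /* -1 = descending (toward lower addresses), +1 = ascending */
    int full;  /* 1 = sp points at the last item, 0 = sp points at the next free slot */
} stack_model;

static void model_push(stack_model *s, uint32_t value) {
    if (s->full) {
        s->sp += s->step;       /* full: move SP first, then store */
        s->mem[s->sp] = value;
    } else {
        s->mem[s->sp] = value;  /* empty: store first, then move SP */
        s->sp += s->step;
    }
}

static uint32_t model_pop(stack_model *s) {
    uint32_t value;
    if (s->full) {
        value = s->mem[s->sp];  /* full: load first, then move SP back */
        s->sp -= s->step;
    } else {
        s->sp -= s->step;       /* empty: move SP back first, then load */
        value = s->mem[s->sp];
    }
    return value;
}

/* Walks through a Full Descending stack, the type used in the article. */
void demo_full_descending(void) {
    stack_model fd = { {0}, 8, -1, 1 };
    model_push(&fd, 0x11);
    assert(fd.sp == 7 && fd.mem[7] == 0x11);
    model_push(&fd, 0x22);
    assert(fd.sp == 6 && fd.mem[6] == 0x22);
    assert(model_pop(&fd) == 0x22);
    assert(fd.sp == 7);
}
```

Calling `demo_full_descending()` from a `main` function runs the checks. Changing `step` and `full` in the initializer gives the other three stack types.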

 The article includes many well-made GIFs; one of them shows how the stack and the registers change in a simple example.

 We will see that functions take advantage of the Stack for saving local variables, preserving register state, etc. To keep everything organized, functions use Stack Frames, a localized memory portion within the stack which is dedicated to a specific function. A stack frame gets created in the prologue (more about this in the next section) of a function. The Frame Pointer (FP) is set to the bottom of the stack frame and then the stack buffer for the Stack Frame is allocated. The stack frame (starting from its bottom) generally contains the return address (previous LR), previous Frame Pointer, any registers that need to be preserved, function parameters (in case the function accepts more than 4), local variables, etc. While the actual contents of the Stack Frame may vary, the ones outlined before are the most common. Finally, the Stack Frame gets destroyed during the epilogue of a function.

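To keep stack usage organized, functions use stack frames: a stack frame is the part of the stack dedicated to one function. The stack belongs to the whole process or task; a stack frame belongs to a single function.

The frame is allocated in the function's prologue. FP is set to the bottom of the frame, and SP ends up at the top of the frame (the lowest address, on a descending stack).

A stack frame generally holds the return address (the previous LR), the previous FP, any registers that must be preserved, function parameters (when the function takes more than 4), local variables, and so on.

The frame is released in the function's epilogue.

An example: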

/* azeria@labs:~$ gcc func.c -o func && gdb func */

/* Prototypes, so that max and do_nothing are declared before they are used */
int max(int a, int b);
int do_nothing(void);

int main(void)
{
    int res = 0;
    int a = 1;
    int b = 2;
    res = max(a, b);
    return res;
}

int max(int a, int b)
{
    do_nothing();
    if (a < b)
    {
        return b;
    }
    else
    {
        return a;
    }
}

int do_nothing(void)
{
    return 0;
}

We can see in the picture above that currently we are about to leave the function max (see the arrow in the disassembly at the bottom). At this state, the FP (R11) points to 0xbefff254 which is the bottom of our Stack Frame. This address on the Stack (green addresses) stores 0x00010418 which is the return address (previous LR). 4 bytes above this (at 0xbefff250) we have a value 0xbefff26c, which is the address of a previous Frame Pointer. The 0x1 and 0x2 at addresses 0xbefff24c and 0xbefff248 are local variables (in fact, copies of the input parameters) which were used during the execution of the function max. So the Stack Frame which we just analyzed had only LR, FP and two local variables.

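push  {r11, lr}  Before this instruction sp = 0xbefff258; after it, sp = 0xbefff250.

add  r11, sp, #4  r11 = 0xbefff254, i.e. fp.

sub  sp, sp, #8  Two stack slots are already in use and two more are needed to hold max's input parameters, so sp = sp - 8 = 0xbefff248.

  The stack frame of max therefore spans 0xbefff248 to 0xbefff254.

str  r0, [r11, #-8]  Stores input parameter 1 (passed in r0) into max's stack frame.

str  r1, [r11, #-12]  Stores input parameter 2 (passed in r1) into max's stack frame.

...

sub  sp, r11, #4  Sets sp = r11 - 4 = 0xbefff250, the slot holding the saved r11. Subtracting is correct here: sp has to point at the lowest saved register before the pop, and the pop itself then moves sp back up to main's value, 0xbefff258.

pop  {r11, pc}  Restores fp and loads the lr saved by the first instruction of max into pc.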
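
The address arithmetic of this walkthrough can be checked with a small C model. It is a sketch under stated assumptions: `cpu_model`, `word_at` and `trace_max_frame` are hypothetical names, only the registers involved are modelled, and the addresses and saved values are the ones quoted from the article's GDB session.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file and stack memory for the trace. */
#define STACK_TOP 0xbefff260u  /* assumed upper bound of the modelled region */
#define WORDS 16

typedef struct {
    uint32_t r0, r1, r11, sp, lr, pc;
    uint32_t mem[WORDS];       /* covers STACK_TOP - 4*WORDS up to STACK_TOP */
} cpu_model;

/* Maps a byte address inside the modelled region to its word slot. */
static uint32_t *word_at(cpu_model *c, uint32_t addr) {
    return &c->mem[(addr - (STACK_TOP - 4u * WORDS)) / 4u];
}

void trace_max_frame(void) {
    cpu_model c = { .r0 = 1, .r1 = 2, .r11 = 0xbefff26c,
                    .sp = 0xbefff258, .lr = 0x00010418 };

    /* push {r11, lr}: lr lands at the higher address, r11 just below it */
    c.sp -= 4; *word_at(&c, c.sp) = c.lr;
    c.sp -= 4; *word_at(&c, c.sp) = c.r11;
    assert(c.sp == 0xbefff250);

    c.r11 = c.sp + 4;                      /* add r11, sp, #4 */
    assert(c.r11 == 0xbefff254);
    c.sp -= 8;                             /* sub sp, sp, #8 */
    assert(c.sp == 0xbefff248);
    *word_at(&c, c.r11 - 8) = c.r0;        /* str r0, [r11, #-8] */
    *word_at(&c, c.r11 - 12) = c.r1;       /* str r1, [r11, #-12] */

    c.sp = c.r11 - 4;                      /* sub sp, r11, #4 */
    assert(c.sp == 0xbefff250);            /* points at the saved r11 */

    /* pop {r11, pc}: the lowest address goes to r11, the next one to pc */
    c.r11 = *word_at(&c, c.sp); c.sp += 4;
    c.pc  = *word_at(&c, c.sp); c.sp += 4;
    assert(c.r11 == 0xbefff26c && c.pc == 0x00010418);
    assert(c.sp == 0xbefff258);            /* back to main's sp */
}
```

The assertion after `sub sp, r11, #4` is the point of the trace: sp is 0xbefff250 at that step, and it only reaches 0xbefff258 once the pop has consumed the two saved words.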

 

FUNCTIONS

To understand functions in ARM we first need to get familiar with the structural parts of a function, which are:

  1. Prologue
  2. Body
  3. Epilogue

The purpose of the prologue is to save the previous state of the program (by storing values of LR and R11 onto the Stack) and set up the Stack for the local variables of the function. While the implementation of the prologue may differ depending on a compiler that was used, generally this is done by using PUSH/ADD/SUB instructions. An example of a prologue would look like this:

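Function prologue:

(1) Save the previous state (push LR and R11 onto the stack, the first instruction below).

(2) Set up fp. In the compiled example above this is fp = sp + 4, so that fp points at the saved LR, the higher of the two slots just pushed; the hand-written example below uses fp = sp instead.

(3) Set up sp. The push has already moved sp by two slots, so only the remaining space needed is subtracted.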

push   {r11, lr}    /* Start of the prologue. Saving Frame Pointer and LR onto the stack */
add    r11, sp, #0  /* Setting up the bottom of the stack frame */
sub    sp, sp, #16  /* End of the prologue. Allocating some buffer on the stack. This also allocates space for the Stack Frame */

The body part of the function is usually responsible for some kind of unique and specific task. This part of the function may contain various instructions, branches (jumps) to other functions, etc. An example of a body section of a function can be as simple as the following few instructions:

mov    r0, #1       /* setting up local variables (a=1). This also serves as setting up the first parameter for the function max */
mov    r1, #2       /* setting up local variables (b=2). This also serves as setting up the second parameter for the function max */
bl     max          /* Calling/branching to function max */

The sample code above shows a snippet of a function which sets up local variables and then branches to another function. This piece of code also shows us that the parameters of a function (in this case function max) are passed via registers. In some cases, when there are more than 4 parameters to be passed, we would additionally use the Stack to store the remaining parameters. It is also worth mentioning that the result of a function is returned via the register R0. So whatever the result of a function (max) turns out to be, we should be able to pick it up from the register R0 right after the return from the function. One more thing to point out is that in certain situations the result might be 64 bits in length (exceeding the size of a 32-bit register). In that case we can use R0 combined with R1 to return a 64-bit result.

Up to 4 input parameters are passed in registers; any beyond the fourth must be passed on the stack. The return value is likewise passed back in R0.
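
Two small C functions are enough to observe both rules. The names `sum6` and `widen_mul` are hypothetical. The register assignments in the comments are what the 32-bit ARM procedure call standard prescribes, not something the C code itself can show; compiling with an ARM toolchain and reading the generated assembly is the way to confirm them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function with six int parameters. On 32-bit ARM, a..d are
   expected to arrive in r0..r3, while e and f are passed on the stack. */
int sum6(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;
}

/* Hypothetical function with a 64-bit result. The value does not fit in
   r0 alone, so on 32-bit ARM it is returned in the pair r0 and r1. */
uint64_t widen_mul(uint32_t x, uint32_t y) {
    return (uint64_t)x * y;
}
```

For example, `sum6(1, 2, 3, 4, 5, 6)` returns 21 in R0, and `widen_mul(0xFFFFFFFF, 2)` returns 0x1FFFFFFFE, which needs both registers.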

The last part of the function, the epilogue, is used to restore the program’s state to its initial one (before the function call) so that it can continue from where it left off. For that we need to readjust the Stack Pointer. This is done by using the Frame Pointer register (R11) as a reference and performing an add or sub operation. Once we readjust the Stack Pointer, we restore the previously (in the prologue) saved register values by popping them from the Stack into their respective registers. Depending on the function type, the POP instruction might be the final instruction of the epilogue. However, it might be that after restoring the register values we use the BX instruction for leaving the function. An example of an epilogue looks like this:

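Function epilogue, restoring the initial state:

(1) Set sp from r11 (fp). In the compiled example above, where fp = sp + 4, this is sp = r11 - 4; in the example below, where fp = sp, it is sp = r11.

(2) Restore the saved r11 (fp) and lr from the stack into r11 and PC.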

sub    sp, r11, #0  /* Start of the epilogue. Readjusting the Stack Pointer */
pop    {r11, pc}    /* End of the epilogue. Restoring Frame Pointer from the Stack, jumping to previously saved LR via direct load into PC. The Stack Frame of a function is finally destroyed at this step. */

So now we know, that:

  1. Prologue sets up the environment for the function;
  2. Body implements the function’s logic and stores result to R0;
  3. Epilogue restores the state so that the program can resume from where it left off before calling the function.

Another key point to know about functions is their types: leaf and non-leaf. A leaf function is one which does not call/branch to another function from itself. A non-leaf function is one which, in addition to its own logic, does call/branch to another function. The implementations of these two kinds of functions are similar. However, they have some differences. To analyze the differences we will use the following piece of code:

In short, functions are either leaf or non-leaf: a leaf function calls no other function, while a non-leaf function does.

/* azeria@labs:~$ as func.s -o func.o && gcc func.o -o func && gdb func */
.global main

main:
    push   {r11, lr}    /* Start of the prologue. Saving Frame Pointer and LR onto the stack */
    add    r11, sp, #0  /* Setting up the bottom of the stack frame */
    sub    sp, sp, #16  /* End of the prologue. Allocating some buffer on the stack */
    mov    r0, #1       /* setting up local variables (a=1). This also serves as setting up the first parameter for the max function */
    mov    r1, #2       /* setting up local variables (b=2). This also serves as setting up the second parameter for the max function */
    bl     max          /* Calling/branching to function max */
    sub    sp, r11, #0  /* Start of the epilogue. Readjusting the Stack Pointer */
    pop    {r11, pc}    /* End of the epilogue. Restoring Frame pointer from the stack, jumping to previously saved LR via direct load into PC */

max:
    push   {r11}        /* Start of the prologue. Saving Frame Pointer onto the stack */
    add    r11, sp, #0  /* Setting up the bottom of the stack frame */
    sub    sp, sp, #12  /* End of the prologue. Allocating some buffer on the stack */
    cmp    r0, r1       /* Implementation of if(a<b) */
    movlt  r0, r1       /* if r0 was lower than r1, store r1 into r0 */
    add    sp, r11, #0  /* Start of the epilogue. Readjusting the Stack Pointer */
    pop    {r11}        /* restoring frame pointer */
    bx     lr           /* End of the epilogue. Jumping back to main via LR register */

The example above contains two functions: main, which is a non-leaf function, and max, a leaf function. As mentioned before, the non-leaf function calls/branches to another function, which is true in our case, because we branch to the function max from the function main. The function max in this case does not branch to another function within its body, which makes it a leaf function.

Another key difference is the way the prologues and epilogues are implemented. The following example shows a comparison of the prologues of a non-leaf and a leaf function. The main difference here is that the entry of the prologue in the non-leaf function saves more registers onto the stack. The reason behind this is that, by the nature of the non-leaf function, the LR gets modified during the execution of such a function and therefore the value of this register needs to be preserved so that it can be restored later. Generally, the prologue could save even more registers if necessary.

Prologue: a non-leaf function goes on to call other functions, which changes LR, so it must push both r11 and LR at its start. A leaf function calls nothing else, so LR does not change and need not be pushed.

/* A prologue of a non-leaf function */
push   {r11, lr}    /* Start of the prologue. Saving Frame Pointer and LR onto the stack */
add    r11, sp, #0  /* Setting up the bottom of the stack frame */
sub    sp, sp, #16  /* End of the prologue. Allocating some buffer on the stack */

/* A prologue of a leaf function */
push   {r11}        /* Start of the prologue. Saving Frame Pointer onto the stack */
add    r11, sp, #0  /* Setting up the bottom of the stack frame */
sub    sp, sp, #12  /* End of the prologue. Allocating some buffer on the stack */

The comparison of the epilogues of the leaf and non-leaf functions, which we see below, shows us that the program’s flow is controlled in different ways: by branching to an address stored in the LR register in the leaf function’s case and by direct POP to PC register in the non-leaf function.

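Epilogue: a leaf function can return with bx lr directly, since LR is unchanged. BX stands for Branch and eXchange, referring to the ARM/Thumb mode switch.

A non-leaf function has to load the previously saved LR into PC to continue execution.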

/* An epilogue of a leaf function */
add    sp, r11, #0  /* Start of the epilogue. Readjusting the Stack Pointer */
pop    {r11}        /* restoring frame pointer */
bx     lr           /* End of the epilogue. Jumping back to main via LR register */

/* An epilogue of a non-leaf function */
sub    sp, r11, #0  /* Start of the epilogue. Readjusting the Stack Pointer */
pop    {r11, pc}    /* End of the epilogue. Restoring Frame pointer from the stack, jumping to previously saved LR via direct load into PC */

Finally, it is important to understand the use of the BL and BX instructions here. In our example, we branched to a leaf function by using a BL instruction. We use the label of a function as a parameter to initiate branching. During the compilation process, the label gets replaced with a memory address. Before jumping to that location, the address of the next instruction is saved (linked) to the LR register so that we can return to where we left off when the function max is finished.

At the BL, the address of the instruction following the call has already been saved (linked) in the LR register.
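
Why a non-leaf function has to save LR can be seen in a small C model. This is a sketch, not real ARM semantics: `link_model`, `bl` and `nonleaf_return` are hypothetical names, the addresses are arbitrary, and instructions are assumed to be 4 bytes long (ARM state).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model holding only the link register and the program counter. */
typedef struct {
    uint32_t lr;
    uint32_t pc;
} link_model;

/* bl target: save the address of the next instruction in LR, then jump. */
static void bl(link_model *m, uint32_t bl_addr, uint32_t target) {
    m->lr = bl_addr + 4;
    m->pc = target;
}

/* Returns the address a non-leaf function ends up returning to,
   depending on whether its prologue saved LR. */
uint32_t nonleaf_return(int save_lr) {
    link_model m = { 0, 0 };
    uint32_t stack_slot = 0;

    bl(&m, 0x1000, 0x2000);          /* caller at 0x1000 calls the non-leaf function */
    if (save_lr) {
        stack_slot = m.lr;           /* push {r11, lr} */
    }
    bl(&m, 0x2008, 0x3000);          /* the non-leaf function calls a leaf: LR is overwritten */
    m.pc = m.lr;                     /* the leaf returns with bx lr, to 0x200c */

    /* pop {r11, pc} uses the saved copy; bx lr would use the stale LR */
    return save_lr ? stack_slot : m.lr;
}
```

With the save, the function returns to 0x1004, the instruction after the original call. Without it, `nonleaf_return(0)` yields 0x200c, an address inside the non-leaf function itself.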

The BX instruction, which is used to leave the leaf function, takes the LR register as a parameter. As mentioned earlier, before jumping to function max the BL instruction saved the address of the next instruction of the function main into the LR register. Due to the fact that the leaf function is not supposed to change the value of the LR register during its execution, this register can now be used to return to the parent (main) function. As explained in the previous chapter, the BX instruction can eXchange between the ARM/Thumb modes during the branching operation. In this case, it is done by inspecting the last bit of the LR register: if the bit is set to 1, the CPU will change (or keep) the mode to Thumb; if it’s set to 0, the mode will be changed (or kept) to ARM. This is a nice design feature which allows calling functions from different modes.

The lowest bit of LR in BX LR also selects between ARM and Thumb mode.
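
The bit test can be written out as a few lines of C. The helper `bx` and the types around it are hypothetical and model only the mode selection described above; alignment checks and everything else the CPU does are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_ARM = 0, MODE_THUMB = 1 } cpu_mode;

typedef struct {
    uint32_t pc;    /* address execution continues at */
    cpu_mode mode;  /* instruction set selected by the branch */
} branch_result;

/* Hypothetical model of "bx rm": bit 0 of the register picks the mode,
   and the remaining bits form the branch target. */
branch_result bx(uint32_t rm) {
    branch_result r;
    r.mode = (rm & 1u) ? MODE_THUMB : MODE_ARM;
    r.pc = rm & ~1u;
    return r;
}
```

For instance, `bx(0x00010418)` continues at 0x00010418 in ARM mode, while `bx(0x00010419)` continues at the same address in Thumb mode.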

 

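Finally, the article gives one more example of leaf and non-leaf functions. Its GIF animation is long; a GIF editor can be used to pause it and step through.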

 

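Comparing ARM and SPARC

My earlier notes on SPARC (principles, SPARC V8 assembly instructions, register windows, the stack, and function calls): https://www.cnblogs.com/yanhc/p/12255886.html

Function call and return

ARM

ARM branches to a function with the BL instruction, Branch with Link (saves PC+4 in LR and jumps to the function): it first stores the address of the instruction after the branch in LR, so that the callee can find its return address, and then performs the jump.

In a non-leaf callee, the prologue pushes both LR and r11 (fp) onto the stack, and the epilogue pops them into PC and r11.
In a leaf callee, the prologue pushes only r11 (fp); the epilogue pops it back into r11 and returns with bx lr.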

Sparc

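On SPARC, call label copies the PC into o7 (r15, the address of the call instruction). The instruction is known as call and link, where link means the same as in ARM's BL: recording a link back to the caller. The difference is that ARM saves the address of the instruction after the branch, while SPARC saves the address of the call itself. That is not a problem: SPARC only has to add an offset on return. The offset is 8 rather than 4, because the instruction right after the call sits in the delay slot and has already executed.

At the start of a non-leaf callee, save rotates the register window. This effectively preserves o7 (the counterpart of LR), and the previous window's sp becomes the current window's fp. A leaf function does not rotate the register window.

At the end of the callee, the return is ret or retl (the l stands for leaf). Since call put the PC in o7, returning only requires a jump to o7+8.
Leaf and non-leaf functions differ slightly:
a leaf function has executed no save, so the window has not rotated and retl is a jmpl to o7+8;
a non-leaf function has executed save, so the old o7 is now i7 and ret is a jmpl to i7+8.
In addition, a non-leaf function on SPARC executes restore to rotate the window back; a leaf function has no restore.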
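
The two return-address conventions side by side, as a minimal C sketch. The function names are hypothetical, instructions are assumed to be 4 bytes long on both architectures (ARM state on the ARM side), and the call address used in the example is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* ARM: bl stores the address of the next instruction in LR,
   and the callee returns to LR unchanged. */
uint32_t arm_return_address(uint32_t bl_addr) {
    uint32_t lr = bl_addr + 4;
    return lr;
}

/* SPARC: call stores its own address in %o7. The instruction at +4 is the
   delay slot, so the callee returns to %o7 + 8 (%i7 + 8 after a save). */
uint32_t sparc_return_address(uint32_t call_addr) {
    uint32_t o7 = call_addr;
    return o7 + 8;
}
```

For a call placed at 0x2000, ARM resumes at 0x2004 and SPARC at 0x2008, one instruction further on because of the delay slot.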

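Saving the frame pointer (fp) across a call

On ARM, r11 is fp. The callee:

(1) pushes fp and lr onto the stack: push {r11, lr};

(2) sets fp = sp: add r11, sp, #0;

(3) subtracts the frame size from sp: sub sp, sp, #16.

On SPARC, fp = i6 and sp = o6. When the callee executes save %sp, -1024, %sp, the register window rotates, which achieves the same three things:

(1) fp and the return address (o7, the counterpart of lr) are preserved (the register windows not currently in use play part of the role of the stack);

(2) fp = sp (the caller's o6 becomes the callee's i6 because of the way the windows overlap);

(3) the frame size is subtracted from sp (save also performs an add).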
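
The three effects of save can be reproduced with a toy register-window model in C. Everything here is an assumption made for illustration: the names `sparc_model`, `out_reg`, `in_reg`, `save`, `restore` and `demo_save`, the window count of 8, and the register values. Global registers and window overflow or underflow traps are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS 8  /* assumed window count; real implementations vary */

/* Hypothetical model: each window owns 8 outs and 8 locals, and its ins
   are the outs of the neighbouring window, which creates the overlap. */
typedef struct {
    uint32_t regs[NWINDOWS * 16];
    int cwp;  /* current window pointer */
} sparc_model;

static uint32_t *out_reg(sparc_model *m, int n) {
    return &m->regs[m->cwp * 16 + n];
}

static uint32_t *in_reg(sparc_model *m, int n) {
    /* the ins of window cwp are the outs of window cwp + 1 */
    return &m->regs[((m->cwp + 1) % NWINDOWS) * 16 + n];
}

/* save %sp, imm, %sp: the add reads the old window, the result is
   written into the new window. */
void save(sparc_model *m, int32_t imm) {
    uint32_t sum = *out_reg(m, 6) + (uint32_t)imm;  /* old %sp (%o6) + imm */
    m->cwp = (m->cwp - 1 + NWINDOWS) % NWINDOWS;     /* rotate the window */
    *out_reg(m, 6) = sum;                            /* new %sp */
}

/* restore, with no operands: rotate back to the caller's window. */
void restore(sparc_model *m) {
    m->cwp = (m->cwp + 1) % NWINDOWS;
}

void demo_save(void) {
    sparc_model m = { {0}, 3 };
    *out_reg(&m, 6) = 0x40001000u;  /* caller's %sp, an arbitrary example value */
    *out_reg(&m, 7) = 0x00002000u;  /* %o7: address of the call instruction */

    save(&m, -1024);
    assert(*in_reg(&m, 7) == 0x00002000u);           /* (1) return link kept, now in %i7 */
    assert(*in_reg(&m, 6) == 0x40001000u);           /* (2) %fp is the caller's %sp */
    assert(*out_reg(&m, 6) == 0x40001000u - 1024u);  /* (3) new %sp = old %sp - 1024 */

    restore(&m);
    assert(*out_reg(&m, 6) == 0x40001000u);          /* caller's %sp is visible again */
}
```

Nothing is copied during save in this model; only `cwp` changes, which is why the window rotation can stand in for the push of fp and lr on ARM.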

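Summary of function call and return

On a call, the return address must be saved: ARM's BL saves it in LR, SPARC's call saves it in o7.

At the start of the callee, the return address and the caller's frame (via fp) must be preserved and a new frame set up. ARM pushes r11 (fp) and LR onto the stack and then subtracts from sp; SPARC uses save to rotate the register window, which preserves o7 and fp and subtracts from sp at the same time.

At the end of the callee, control jumps to the return address and the caller's frame is restored. ARM sets sp from r11 and pops the saved r11 and LR into r11 and pc; SPARC returns with ret (jmpl i7+8, or retl with jmpl o7+8 in a leaf function) and uses restore to rotate the register window back, restoring sp=f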