
[Computer Architecture] Self-modifying code

Self-modifying code

In computer science, self-modifying code is code that alters its own instructions while it is executing, usually to reduce the instruction path length and improve performance, or simply to reduce otherwise repetitively similar code and thus simplify maintenance. Self-modification is an alternative to the method of "flag setting" and conditional program branching, used primarily to reduce the number of times a condition needs to be tested. The term usually applies only to code whose self-modification is intentional, not to situations where code accidentally modifies itself due to an error such as a buffer overflow.

The method is frequently used for conditionally invoking test/debugging code without incurring additional computational overhead for every input/output cycle.

The modifications may be performed:
1 Only during initialization, based on input parameters. Alteration of program entry pointers is an equivalent indirect method of self-modification, but it requires the co-existence of one or more alternative instruction paths, increasing the program size.
2 Throughout execution ("on-the-fly"), based on particular program states reached during execution.
In either case, the modifications may be performed directly on the machine code instructions themselves, by overlaying new instructions over the existing ones (for example, altering a compare-and-branch to an unconditional branch, or changing a "NOP").

Application in low and high level languages

Self-modification can be accomplished in a variety of ways, depending on the programming language and its support for pointers and/or access to dynamic compiler or interpreter "engines":
1 Overlay of existing instructions (or parts of instructions, such as the opcode, register, flags or address fields)
2 Direct creation of whole instructions, or sequences of instructions, in memory
3 Creation or modification of source code statements, followed by a "mini compile" or dynamic interpretation
4 Creating an entire program dynamically and then executing it

Assembly language

Self-modifying code is quite straightforward to implement in assembly language. Instructions can be dynamically created in memory, in a sequence equivalent to the object code that a standard compiler might generate. On modern processors, unintended side effects on the CPU cache must be considered. The method was frequently used for testing "first time" conditions, as in this suitably commented IBM/360 assembler example.

SUBRTN NOP OPENED      FIRST TIME HERE?
* The NOP is x'4700'<Address_of_opened>
       OI    SUBRTN+1,X'F0'  YES, CHANGE NOP TO UNCONDITIONAL BRANCH (47F0...)
       OPEN   INPUT               AND  OPEN THE INPUT FILE SINCE IT'S THE FIRST TIME THRU
OPENED GET    INPUT        NORMAL PROCESSING RESUMES HERE
      ...

It uses instruction overlay to reduce the instruction path length by (N×1)−1, where N is the number of records on the file (−1 being the overhead to perform the overlay).

Alternative code might instead test a "flag" on every pass. The unconditional branch is slightly faster than a compare instruction, and it also reduces the overall path length. In later operating systems, this technique could not be used for programs residing in protected storage, so a pointer to the subroutine would be changed instead. The pointer would reside in dynamic storage and could be altered at will after the first pass to bypass the OPEN.
(Protected storage: memory protection is a way to control memory access rights on a computer, and is part of most modern instruction set architectures and operating systems.)
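The pointer-swap technique above can be sketched in Python, where rebinding a function reference plays the role of overwriting the subroutine pointer. This is a minimal sketch; the names `first_time`, `steady_state` and `handler` are illustrative, not from the original.

```python
log = []

def first_time(record):
    """Runs only on the first call: performs the one-time OPEN,
    then replaces itself so later calls bypass it entirely."""
    global handler
    log.append("OPEN")          # one-time initialization (the OPEN)
    handler = steady_state      # overwrite the "subroutine pointer"
    steady_state(record)

def steady_state(record):
    log.append("GET %d" % record)   # normal processing

handler = first_time
for r in range(3):
    handler(r)

print(log)   # the OPEN happens exactly once, with no flag test per call
```

As in the assembler version, the cost of the test is paid once rather than on every record.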

Optimizing a state-dependent loop

Pseudocode example:

repeat N times {
   if STATE is 1
      increase A by one
   else
      decrease A by one
   do something with A
}

Self-modifying code, in this case, would simply be a matter of rewriting the loop like this:

 repeat N times {
    increase A by one
    do something with A
    when STATE has to switch {
       replace the opcode "increase" above with the opcode to decrease, or vice versa
    }
 }

Note that the 2-state replacement of the opcode can be easily written as ‘xor var at address with the value “opcodeOf(inc) xor opcodeOf(dec)”’.
Choosing this solution must depend on the value of ‘N’ and the frequency of state changing.
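A minimal Python sketch of this idea, using function rebinding in place of a true opcode overwrite; the opcode byte values used to demonstrate the XOR toggle are hypothetical, chosen only for illustration.

```python
def run(n, switch_after):
    """Loop that 'rewrites' its operation when STATE switches,
    instead of testing STATE on every iteration."""
    inc = lambda a: a + 1
    dec = lambda a: a - 1
    op = inc                    # stands in for the overwritable opcode
    a, out = 0, []
    for i in range(n):
        a = op(a)
        out.append(a)           # "do something with A"
        if i == switch_after:   # STATE has to switch
            op = dec if op is inc else inc
    return out

print(run(5, 1))                # -> [1, 2, 1, 0, -1]

# The two-state XOR replacement trick, with hypothetical opcode bytes:
INC_OP, DEC_OP = 0x40, 0x48     # assumed byte values, illustration only
TOGGLE = INC_OP ^ DEC_OP        # xor-ing with TOGGLE flips between them
assert INC_OP ^ TOGGLE == DEC_OP and DEC_OP ^ TOGGLE == INC_OP
```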

Self-referential machine learning systems

Traditional machine learning systems have a fixed, pre-programmed learning algorithm to adjust their parameters. However, since the 1980s Jürgen Schmidhuber has published several self-modifying systems with the ability to change their own learning algorithm. They avoid the danger of catastrophic self-rewrites by ensuring that self-modifications survive only if they are useful according to a user-given fitness, error or reward function.

Operating systems

An OS feature called W^X (for "write xor execute") has been developed that prohibits a program from making any page of memory both writable and executable. Some systems prevent a writable page from ever being changed to be executable, even if write permission is removed. Other systems provide a "back door" of sorts, allowing multiple mappings of a page of memory to have different permissions.

A relatively portable way to bypass W^X is to create a file with all permissions and then map the file into memory twice. On Linux, one may use an undocumented SysV shared memory flag to get executable shared memory without needing to create a file.

Interaction of cache and self-modifying code

On architectures without coupled data and instruction caches, cache synchronization must be explicitly performed by the modifying code (flush the data cache and invalidate the instruction cache for the modified memory area).

In some cases, short sections of self-modifying code execute more slowly on modern processors. This is because a modern processor usually tries to keep blocks of code in its cache memory. Each time the program rewrites a part of itself, the rewritten part must be loaded into the cache again, which results in a slight delay if the modified codelet shares a cache line with the modifying code, as is the case when the modified memory address is located within a few bytes of that of the modifying code.

The cache invalidation issue on modern processors usually means that self-modifying code is faster only when the modification occurs rarely, such as a state switch inside an inner loop.

Most modern processors load the machine code before they execute it, which means that if an instruction that is too near the instruction pointer is modified, the processor will not notice, but will instead execute the code as it was before it was modified. See prefetch input queue (PIQ). PC processors must handle self-modifying code correctly for backwards-compatibility reasons, but they are far from efficient at doing so.

Massalin’s Synthesis kernel

The Synthesis kernel presented in Henry Massalin's Ph.D. thesis[7] is a tiny Unix kernel that takes a structured, even object-oriented, approach to self-modifying code, in which code is created for individual objects such as file handles; generating code for specific tasks allows the Synthesis kernel to (as a JIT interpreter might) apply a number of optimizations such as constant folding or common subexpression elimination.

Advantages
1 Fast paths can be established for a program's execution, reducing otherwise repetitive conditional branches.
2 Self-modifying code can improve algorithmic efficiency.

Disadvantages
1 Self-modifying code is harder to read and maintain, because the instructions in the source program listing are not necessarily the instructions that will be executed. Self-modification that consists of substituting function pointers may be less cryptic, if it is clear that the names of the functions to be called are placeholders for functions to be identified later.
2 Self-modifying code can be rewritten as code that tests a flag and branches to alternative sequences based on the outcome of the test, but self-modifying code usually runs faster.
3 On modern processors with an instruction pipeline, code that modifies itself frequently may run more slowly if it modifies instructions that the processor has already read from memory into the pipeline. On some such processors, the only way to ensure that the modified instructions are executed correctly is to flush the pipeline and re-read many instructions.
4 Self-modifying code cannot be used at all in some environments, for example:
4.1 Operating systems that enforce W^X.
4.2 Many Harvard architecture microcontrollers.
4.3 Multithreaded applications, where several threads may execute the same section of self-modifying code, possibly resulting in computation errors and application failures.

Original Wikipedia text

In computer science, self-modifying code is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance. Self-modification is an alternative to the method of “flag setting” and conditional program branching, used primarily to reduce the number of times a condition needs to be tested. The term is usually only applied to code where the self-modification is intentional, not in situations where code accidentally modifies itself due to an error such as a buffer overflow.

The method is frequently used for conditionally invoking test/debugging code without requiring additional computational overhead for every input/output cycle.

The modifications may be performed:

only during initialization – based on input parameters (when the process is more commonly described as software ‘configuration’ and is somewhat analogous, in hardware terms, to setting jumpers for printed circuit boards). Alteration of program entry pointers is an equivalent indirect method of self-modification, but requiring the co-existence of one or more alternative instruction paths, increasing the program size.
throughout execution (‘on-the-fly’) – based on particular program states that have been reached during the execution
In either case, the modifications may be performed directly to the machine code instructions themselves, by overlaying new instructions over the existing ones (for example: altering a compare and branch to an unconditional branch or alternatively a ‘NOP’).

In the IBM/360 and Z/Architecture instruction set, an EXECUTE (EX) instruction logically overlays the second byte of its target instruction with the low-order 8 bits of register 1. This provides the effect of self-modification although the actual instruction in storage is not altered.

Application in low and high level languages
Self-modification can be accomplished in a variety of ways depending upon the programming language and its support for pointers and/or access to dynamic compiler or interpreter ‘engines’:

overlay of existing instructions (or parts of instructions such as opcode, register, flags or address) or
direct creation of whole instructions or sequences of instructions in memory
creating or modification of source code statements followed by a ‘mini compile’ or a dynamic interpretation (see eval statement)
creating an entire program dynamically and then executing it
Assembly language
Self-modifying code is quite straightforward to implement when using assembly language. Instructions can be dynamically created in memory (or else overlaid over existing code in non-protected program storage), in a sequence equivalent to the ones that a standard compiler may generate as the object code. With modern processors, there can be unintended side effects on the CPU cache that must be considered. The method was frequently used for testing ‘first time’ conditions, as in this suitably commented IBM/360 assembler example. It uses instruction overlay to reduce the instruction path length by (N×1)−1 where N is the number of records on the file (−1 being the overhead to perform the overlay).

SUBRTN NOP OPENED      FIRST TIME HERE?
* The NOP is x'4700'<Address_of_opened>
       OI    SUBRTN+1,X'F0'  YES, CHANGE NOP TO UNCONDITIONAL BRANCH (47F0...)
       OPEN   INPUT               AND  OPEN THE INPUT FILE SINCE IT'S THE FIRST TIME THRU
OPENED GET    INPUT        NORMAL PROCESSING RESUMES HERE
      ...

Alternative code might involve testing a “flag” each time through. The unconditional branch is slightly faster than a compare instruction, as well as reducing the overall path length. In later operating systems for programs residing in protected storage this technique could not be used and so changing the pointer to the subroutine would be used instead. The pointer would reside in dynamic storage and could be altered at will after the first pass to bypass the OPEN (having to load a pointer first instead of a direct branch & link to the subroutine would add N instructions to the path length – but there would be a corresponding reduction of N for the unconditional branch that would no longer be required).

Below is an example in Zilog Z80 assembly language. The code increments register “B” in range [0,5]. The “CP” compare instruction is modified on each loop.

;======================================================================
ORG 0H
CALL FUNC00
HALT
;======================================================================
FUNC00:
LD A,6
LD HL,label01+1
LD B,(HL)
label00:
INC B
LD (HL),B
label01:
CP $0
JP NZ,label00
RET
;======================================================================

Self-modifying code is sometimes used to overcome limitations in a machine’s instruction set. For example, in the Intel 8080 instruction set, one cannot input a byte from an input port that is specified in a register. The input port is statically encoded in the instruction itself, as the second byte of a two byte instruction. Using self-modifying code, it is possible to store a register’s contents into the second byte of the instruction, then execute the modified instruction in order to achieve the desired effect.

High level languages
Some compiled languages explicitly permit self-modifying code. For example, the ALTER verb in COBOL may be implemented as a branch instruction that is modified during execution.[1] One batch programming technique is to use self-modifying code.[2] Clipper and SPITBOL also provide facilities for explicit self-modification. The Algol compiler on B6700 systems offered an interface to the operating system whereby executing code could pass a text string or a named disc file to the Algol compiler and was then able to invoke the new version of a procedure.

With interpreted languages, the “machine code” is the source text and may be susceptible to editing on-the-fly: in SNOBOL the source statements being executed are elements of a text array. Other languages, such as Perl and Python, allow programs to create new code at run-time and execute it using an eval function, but do not allow existing code to be mutated. The illusion of modification (even though no machine code is really being overwritten) is achieved by modifying function pointers, as in this JavaScript example:

    var f = function (x) {return x + 1};

    // assign a new definition to f:
    f = new Function('x', 'return x + 2');

Lisp macros also allow runtime code generation without parsing a string containing program code.
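The same effect can be shown in Python, where `exec` compiles new source text at run time into a fresh namespace. This is a minimal sketch; the function name `add2` is ours, not from the original.

```python
src = "def add2(x):\n    return x + 2\n"

namespace = {}
exec(src, namespace)        # compile and run the new source at run time
add2 = namespace["add2"]

print(add2(40))             # -> 42

# As in the JavaScript example, "modification" is really rebinding a name:
f = lambda x: x + 1
f = add2                    # the old code still exists, just unreferenced
print(f(40))                # -> 42
```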

The Push programming language is a genetic programming system that is explicitly designed for creating self-modifying programs. While not a high level language, it is not as low level as assembly language.[3]

Compound modification
Prior to the advent of multiple windows, command-line systems might offer a menu system involving the modification of a running command script. Suppose a DOS script (or “batch”) file Menu.bat contains the following:

  :StartAfresh                <-A line starting with a colon marks a label.
   ShowMenu.exe

Upon initiation of Menu.bat from the command line, ShowMenu presents an on-screen menu, with possible help information, example usages and so forth. Eventually the user makes a selection that requires a command somename to be performed: ShowMenu exits after rewriting the file Menu.bat to contain

   :StartAfresh
   ShowMenu.exe
   CALL C:\Commands\somename.bat
   GOTO StartAfresh

Because the DOS command interpreter does not compile a script file and then execute it, nor does it read the entire file into memory before starting execution, nor yet rely on the content of a record buffer, when ShowMenu exits, the command interpreter finds a new command to execute (it is to invoke the script file somename, in a directory location and via a protocol known to ShowMenu), and after that command completes, it goes back to the start of the script file and reactivates ShowMenu ready for the next selection. Should the menu choice be to quit, the file would be rewritten back to its original state. Although this starting state has no use for the label, it, or an equivalent amount of text is required, because the DOS command interpreter recalls the byte position of the next command when it is to start the next command, thus the re-written file must maintain alignment for the next command start point to indeed be the start of the next command.

Aside from the convenience of a menu system (and possible auxiliary features), this scheme means that the ShowMenu.exe system is not in memory when the selected command is activated, a significant advantage when memory is limited.

Control tables

Control table interpreters can be considered to be, in one sense, ‘self-modified’ by data values extracted from the table entries (rather than specifically hand coded in conditional statements of the form “IF inputx = ‘yyy’”).

History
The IBM SSEC, demonstrated in January 1948, had the ability to modify its instructions or otherwise treat them exactly like data. However, the capability was rarely used in practice.[4] In the early days of computers, self-modifying code was often used to reduce use of limited memory, or improve performance, or both. It was also sometimes used to implement subroutine calls and returns when the instruction set only provided simple branching or skipping instructions to vary the control flow. This use is still relevant in certain ultra-RISC architectures, at least theoretically; see for example one instruction set computer. Donald Knuth’s MIX architecture also used self-modifying code to implement subroutine calls.

Usage

Self-modifying code can be used for various purposes:

Semi-automatic optimizing of a state dependent loop.
1 Run-time code generation, or specialization of an algorithm in runtime or loadtime (which is popular, for example, in the domain of real-time graphics) such as a general sort utility – preparing code to perform the key comparison described in a specific invocation.
2 Altering of inlined state of an object, or simulating the high-level construction of closures.
3 Patching of subroutine (pointer) address calling, usually as performed at load/initialization time of dynamic libraries, or else on each invocation, patching the subroutine’s internal references to its parameters so as to use their actual addresses. (i.e. Indirect ‘self-modification’).
4 Evolutionary computing systems such as genetic programming.
5 Hiding of code to prevent reverse engineering (by use of a disassembler or debugger) or to evade detection by virus/spyware scanning software and the like.
6 Filling 100% of memory (in some architectures) with a rolling pattern of repeating opcodes, to erase all programs and data, or to burn-in hardware.
7 Compressing code to be decompressed and executed at runtime, e.g., when memory or disk space is limited.
8 Some very limited instruction sets leave no option but to use self-modifying code to perform certain functions. For example, a one instruction set computer (OISC) machine that uses only the subtract-and-branch-if-negative “instruction” cannot do an indirect copy (something like the equivalent of “*a = **b” in the C language) without using self-modifying code.
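To make the OISC claim concrete, the following Python sketch pairs a tiny subleq interpreter with a hand-assembled program that performs an indirect load by patching the address field of one of its own instructions. The memory layout and addresses are our own illustrative construction, not taken from any particular OISC implementation.

```python
def subleq(mem):
    """Minimal subleq VM: mem[b] -= mem[a]; jump to c if the result <= 0,
    otherwise fall through. A negative jump target halts the machine."""
    pc = 0
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# Data cells: Z (scratch zero), T (temp), PTR (holds an address), SRC, DST.
# The A-field of the "load" instruction lives at address 12; the program
# patches that cell with the address stored in PTR before executing it.
Z, T, PTR, SRC, DST = 24, 25, 26, 27, 28
prog = [
    12,  12,  3,    # clear the patchable A-field (self-modification target)
    PTR,  Z,  6,    # Z = -mem[PTR]
    Z,   12,  9,    # A-field -= Z, i.e. A-field = mem[PTR]: now patched
    Z,    Z, 12,    # Z = 0 again
    0,    T, 15,    # (patched) T = -mem[mem[PTR]]: the indirect load
    DST, DST, 18,   # DST = 0
    T,  DST, 21,    # DST = mem[mem[PTR]]
    Z,    Z, -1,    # halt
    0, 0, SRC, 42, 0,   # cells Z, T, PTR (-> SRC), SRC value, DST
]
mem = subleq(list(prog))
print(mem[DST])   # -> 42: *DST = **PTR, with no native indirect addressing
```

Without the self-patching step at address 6, the machine has no way to dereference PTR at all, which is exactly the point made above.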
9 Booting. Early microcomputers often used self-modifying code in their bootloaders. Since the bootloader was keyed in via the front panel at every power-on, it did not matter if the bootloader modified itself.

However, even today many bootstrap loaders are self-relocating, and a few are even self-modifying.
Altering instructions for fault-tolerance.[5]

Optimizing a state-dependent loop

Pseudocode example:

repeat N times {
   if STATE is 1
      increase A by one
   else
      decrease A by one
   do something with A
}

Self-modifying code, in this case, would simply be a matter of rewriting the loop like this:

 repeat N times {
    increase A by one
    do something with A
    when STATE has to switch {
       replace the opcode "increase" above with the opcode to decrease, or vice versa
    }
 }

Note that 2-state replacement of the opcode can be easily written as ‘xor var at address with the value “opcodeOf(Inc) xor opcodeOf(dec)”’.

Choosing this solution must depend on the value of ‘N’ and the frequency of state changing.

Specialization

Suppose a set of statistics such as average, extrema, location of extrema, standard deviation, etc. are to be calculated for some large data set. In a general situation, there may be an option of associating weights with the data, so each xi is associated with a wi and rather than test for the presence of weights at every index value, there could be two versions of the calculation, one for use with weights and one not, with one test at the start. Now consider a further option, that each value may have associated with it a boolean to signify whether that value is to be skipped or not. This could be handled by producing four batches of code, one for each permutation and code bloat results. Alternatively, the weight and the skip arrays could be merged into a temporary array (with zero weights for values to be skipped), at the cost of processing and still there is bloat. However, with code modification, to the template for calculating the statistics could be added as appropriate the code for skipping unwanted values, and for applying weights. There would be no repeated testing of the options and the data array would be accessed once, as also would the weight and skip arrays, if involved.
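This specialization idea can be sketched in Python by generating only the needed variant of the inner loop as source text and compiling it with `exec`. The helper `make_mean` and its two options are illustrative, not from the original.

```python
def make_mean(weighted, skippable):
    """Emit a mean() specialized for the chosen options, so the inner
    loop contains no per-element tests for features not in use."""
    lines = [
        "def mean(xs, ws=None, skip=None):",
        "    total = 0.0",
        "    count = 0.0",
        "    for i, x in enumerate(xs):",
    ]
    if skippable:
        lines.append("        if skip[i]: continue")
    if weighted:
        lines.append("        total += ws[i] * x")
        lines.append("        count += ws[i]")
    else:
        lines.append("        total += x")
        lines.append("        count += 1")
    lines.append("    return total / count")
    namespace = {}
    exec("\n".join(lines), namespace)   # the 'mini compile' of one variant
    return namespace["mean"]

weighted_mean = make_mean(weighted=True, skippable=False)
print(weighted_mean([1, 2, 3], ws=[1, 1, 2]))         # -> 2.25

plain_mean = make_mean(weighted=False, skippable=True)
print(plain_mean([1, 2, 3, 100], skip=[0, 0, 0, 1]))  # -> 2.0
```

Each generated variant tests its options zero times per element, instead of once per element per option.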

Use as camouflage

Self-modifying code was used to hide copy protection instructions in 1980s disk-based programs for platforms such as IBM PC and Apple II. For example, on an IBM PC (or compatible), the floppy disk drive access instruction ‘int 0x13’ would not appear in the executable program’s image but it would be written into the executable’s memory image after the program started executing.

Self-modifying code is also sometimes used by programs that do not want to reveal their presence, such as computer viruses and some shellcodes. Viruses and shellcodes that use self-modifying code mostly do this in combination with polymorphic code. Modifying a piece of running code is also used in certain attacks, such as buffer overflows.

Self-referential machine learning systems

Traditional machine learning systems have a fixed, pre-programmed learning algorithm to adjust their parameters. However, since the 1980s Jürgen Schmidhuber has published several self-modifying systems with the ability to change their own learning algorithm. They avoid the danger of catastrophic self-rewrites by making sure that self-modifications will survive only if they are useful according to a user-given fitness, error or reward function.[6]

Operating systems

Because of the security implications of self-modifying code, all of the major operating systems are careful to remove such vulnerabilities as they become known. The concern is typically not that programs will intentionally modify themselves, but that they could be maliciously changed by an exploit.

As consequence of the troubles that can be caused by these exploits, an OS feature called W^X (for “write xor execute”) has been developed that prohibits a program from making any page of memory both writable and executable. Some systems prevent a writable page from ever being changed to be executable, even if write permission is removed. Other systems provide a ‘back door’ of sorts, allowing multiple mappings of a page of memory to have different permissions. A relatively portable way to bypass W^X is to create a file with all permissions, then map the file into memory twice. On Linux, one may use an undocumented SysV shared memory flag to get executable shared memory without needing to create a file.[citation needed]

Regardless, at a meta-level, programs can still modify their own behavior by changing data stored elsewhere (see metaprogramming) or via use of polymorphism.

Interaction of cache and self-modifying code
On architectures without coupled data and instruction cache (some ARM and MIPS cores) the cache synchronization must be explicitly performed by the modifying code (flush data cache and invalidate instruction cache for the modified memory area).

In some cases short sections of self-modifying code execute more slowly on modern processors. This is because a modern processor will usually try to keep blocks of code in its cache memory. Each time the program rewrites a part of itself, the rewritten part must be loaded into the cache again, which results in a slight delay, if the modified codelet shares the same cache line with the modifying code, as is the case when the modified memory address is located within a few bytes to the one of the modifying code.

The cache invalidation issue on modern processors usually means that self-modifying code would still be faster only when the modification will occur rarely, such as in the case of a state switching inside an inner loop.[citation needed]

Most modern processors load the machine code before they execute it, which means that if an instruction that is too near the instruction pointer is modified, the processor will not notice, but instead execute the code as it was before it was modified. See prefetch input queue (PIQ). PC processors must handle self-modifying code correctly for backwards compatibility reasons but they are far from efficient at doing so.[citation needed]

Massalin’s Synthesis kernel

The Synthesis kernel presented in Henry Massalin’s Ph.D. thesis[7] is a tiny Unix kernel that takes a structured, or even object oriented, approach to self-modifying code, where code is created for individual quajects, like filehandles; generating code for specific tasks allows the Synthesis kernel to (as a JIT interpreter might) apply a number of optimizations such as constant folding or common subexpression elimination.

The Synthesis kernel was very fast, but was written entirely in assembly. The resulting lack of portability has prevented Massalin’s optimization ideas from being adopted by any production kernel. However, the structure of the techniques suggests that they could be captured by a higher level language, albeit one more complex than existing mid-level languages. Such a language and compiler could allow development of faster operating systems and applications.

Paul Haeberli and Bruce Karsh have objected to the “marginalization” of self-modifying code, and optimization in general, in favor of reduced development costs.[8]

Advantages

Fast paths can be established for a program’s execution, reducing some otherwise repetitive conditional branches.
Self-modifying code can improve algorithmic efficiency.

Disadvantages

Self-modifying code is harder to read and maintain because the instructions in the source program listing are not necessarily the instructions that will be executed. Self-modification that consists of substitution of function pointers might not be as cryptic, if it is clear that the names of functions to be called are placeholders for functions to be identified later.

Self-modifying code can be rewritten as code that tests a flag and branches to alternative sequences based on the outcome of the test, but self-modifying code typically runs faster.

On modern processors with an instruction pipeline, code that modifies itself frequently may run more slowly, if it modifies instructions that the processor has already read from memory into the pipeline. On some such processors, the only way to ensure that the modified instructions are executed correctly is to flush the pipeline and reread many instructions.

Self-modifying code cannot be used at all in some environments, such as the following:

1 Application software running under an operating system with strict W^X security cannot execute instructions in pages it is allowed to write to—only the operating system is allowed to both write instructions to memory and later execute those instructions.
2 Many Harvard architecture microcontrollers cannot execute instructions in read-write memory, but only instructions in memory that it cannot write to, ROM or non-self-programmable flash memory.
3 A multithreaded application may have several threads executing the same section of self-modifying code, possibly resulting in computation errors and application failures.