Linux kernel data structures: Bitmap
https://github.com/0xAX/linux-insides/blob/master/DataStructures/bitmap.md
Data Structures in the Linux Kernel
Bit arrays and bit operations in the Linux kernel
Besides the various linked-list and tree-based data structures, the Linux kernel provides an API for bit arrays, also called bitmaps. Bit arrays are heavily used in the Linux kernel, and the following source code files contain the common API for them:

- lib/bitmap.c
- include/linux/bitmap.h

Besides these two files, there is also an architecture-specific header file which provides optimized bit operations for a particular architecture. We consider the x86_64 architecture, so in our case it is the arch/x86/include/asm/bitops.h header file. As mentioned above, bitmaps are heavily used in the Linux kernel. For example, a bit array stores the set of allocated IRQs during initialization of the Linux kernel, and so on.
So, the main goal of this part is to see how bit arrays
are implemented in the Linux kernel. Let's start.
Declaration of bit array
Before we look at the API for bitmap manipulation, we must know how to declare a bitmap in the Linux kernel. There are two common ways to declare your own bit array. The first, simple way is to declare an array of unsigned long. For example:

    unsigned long my_bitmap[8];

The second way is to use the DECLARE_BITMAP macro, which is defined in the include/linux/types.h header file:

    #define DECLARE_BITMAP(name,bits) \
        unsigned long name[BITS_TO_LONGS(bits)]

We can see that the DECLARE_BITMAP macro takes two parameters:

- name - the name of the bitmap;
- bits - the number of bits in the bitmap;

and just expands to the definition of an unsigned long array with BITS_TO_LONGS(bits) elements, where the BITS_TO_LONGS macro converts a given number of bits to the number of longs needed to hold them, rounding up. In other words, on x86_64 it calculates how many 8-byte elements are needed for bits:

    #define BITS_PER_BYTE           8
    #define DIV_ROUND_UP(n,d)       (((n) + (d) - 1) / (d))
    #define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))

So, for example, DECLARE_BITMAP(my_bitmap, 64) will produce:

    >>> (((64) + (64) - 1) // (64))
    1

and:

    unsigned long my_bitmap[1];
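
To check the rounding behaviour outside the kernel, the macros can be copied into an ordinary C program. The following is a userspace sketch, not kernel code; bitmap_words and declare_bitmap_example are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the kernel macros quoted above. */
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

/* Hypothetical helper: number of unsigned long words needed for `bits` bits. */
size_t bitmap_words(size_t bits)
{
    return BITS_TO_LONGS(bits);
}

/* Usage example: a 100-bit bitmap is rounded up to whole words. */
void declare_bitmap_example(void)
{
    DECLARE_BITMAP(my_bitmap, 100);

    assert(sizeof(my_bitmap) == bitmap_words(100) * sizeof(unsigned long));
}
```

A bitmap with exactly as many bits as one long needs one word, and a single extra bit already requires a second word.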
Now that we are able to declare a bit array, we can start to use it.
Architecture-specific bit operations
We already saw above a couple of source code and header files which provide an API for manipulating bit arrays. The most important and widely used part of this API is architecture-specific and located, as we already know, in the arch/x86/include/asm/bitops.h header file.
First of all let's look at the two most important functions:

- set_bit;
- clear_bit.

There is no need to explain what these functions do; it should already be clear from their names. Let's look at their implementation. If you look into the arch/x86/include/asm/bitops.h header file, you will notice that each of these functions comes in two variants: atomic and non-atomic. Before we dive into the implementations of these functions, we must first know a little about atomic operations.
In simple words, an atomic operation is guaranteed to execute as a single indivisible step, so two or more operations will not be performed on the same data concurrently. The x86 architecture provides instructions that are atomic by themselves, for example the xchg instruction with a memory operand. Other read-modify-write instructions, such as cmpxchg, can be made atomic with the help of the lock prefix.
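
As a rough userspace analogy (an assumption made for illustration, not what the kernel itself does), C11's <stdatomic.h> offers atomic read-modify-write operations. The hypothetical helper atomic_set_bit_word below sets one bit of a single word with atomic_fetch_or; on x86, compilers typically implement such operations with lock-prefixed instructions.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically set bit `nr` in one word and report
 * whether it was already set. `nr` must be smaller than the number of
 * bits in unsigned long.
 */
int atomic_set_bit_word(unsigned int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    /* One indivisible read-modify-write; returns the previous value. */
    unsigned long old = atomic_fetch_or(word, mask);

    return (old & mask) != 0;
}

/* Usage example: the second call observes the bit set by the first. */
void atomic_set_bit_word_example(void)
{
    atomic_ulong word;

    atomic_init(&word, 0UL);
    assert(atomic_set_bit_word(3, &word) == 0);
    assert(atomic_set_bit_word(3, &word) == 1);
    assert(atomic_load(&word) == 8UL);
}
```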
That is enough about atomic operations for now, so we can begin to consider the implementation of the set_bit and clear_bit functions.
First of all, let's consider the non-atomic variants of these functions. The names of the non-atomic set_bit and clear_bit start with a double underscore. As we already know, all of these functions are defined in the arch/x86/include/asm/bitops.h header file, and the first function is __set_bit:

    static inline void __set_bit(long nr, volatile unsigned long *addr)
    {
        asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
    }

As we can see, it takes two arguments:

- nr - the number of the bit in a bit array;
- addr - the address of the bit array in which we need to set the bit.

Note that the addr parameter is declared with the volatile keyword, which tells the compiler that the value at the given address may change outside of its view. The implementation of __set_bit is pretty easy. As we can see, it contains just one line of inline assembler code. In our case we are using the bts instruction, which selects the bit specified by the first operand (nr in our case) from the bit array, stores the value of the selected bit in the CF flag of the flags register, and sets this bit.
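
The effect of bts on a bit array can be modelled in portable C. This is only a sketch of the semantics described above, for non-negative bit numbers: model_bts and WORD_BITS are hypothetical names, the return value stands in for the CF flag, and nothing here is atomic.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical portable model of `bts` on a bit array: returns the
 * previous value of bit `nr` (what the CPU would leave in CF) and then
 * sets that bit.
 */
int model_bts(unsigned long nr, unsigned long *addr)
{
    unsigned long *word = addr + nr / WORD_BITS;  /* word holding the bit */
    unsigned long mask = 1UL << (nr % WORD_BITS); /* position inside it   */
    int old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Usage example: setting the same bit twice reports 0, then 1. */
void model_bts_example(void)
{
    unsigned long bitmap[2] = { 0, 0 };

    assert(model_bts(5, bitmap) == 0);
    assert(model_bts(5, bitmap) == 1);
    assert(bitmap[0] == (1UL << 5));
}
```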
Note that we can see the usage of nr, but there is no addr here. You might already guess that the secret is in ADDR. ADDR is a macro which is defined in the same header file and expands to a string containing the +m constraint followed by the dereferenced address:

    #define ADDR                BITOP_ADDR(addr)
    #define BITOP_ADDR(x)       "+m" (*(volatile long *) (x))

Besides +m, we can see other constraints in the __set_bit function. Let's look at them and try to understand what they mean:

- +m - represents a memory operand, where + tells the compiler that the given operand is both an input and an output operand;
- I - represents an integer constant (on x86, in the range 0 to 31);
- r - represents a register operand.

Besides these constraints, we can also see the memory keyword in the clobber list, which tells the compiler that this code changes values in memory. That's all. Now let's look at the atomic variant of the same function. It looks more complex than its non-atomic variant:

    static __always_inline void
    set_bit(long nr, volatile unsigned long *addr)
    {
        if (IS_IMMEDIATE(nr)) {
            asm volatile(LOCK_PREFIX "orb %1,%0"
                : CONST_MASK_ADDR(nr, addr)
                : "iq" ((u8)CONST_MASK(nr))
                : "memory");
        } else {
            asm volatile(LOCK_PREFIX "bts %1,%0"
                : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
        }
    }

First of all, note that this function takes the same set of parameters as __set_bit, but is additionally marked with the __always_inline attribute. __always_inline is a macro which is defined in include/linux/compiler-gcc.h and just expands to the always_inline attribute:

    #define __always_inline inline __attribute__((always_inline))

which means that this function will always be inlined, avoiding the overhead of a function call for what is essentially a single instruction. Now let's try to understand the implementation of the set_bit function. First of all, we check the given bit number at the beginning of the set_bit function. The IS_IMMEDIATE macro is defined in the same header file and expands to a call of the GCC builtin function:

    #define IS_IMMEDIATE(nr)        (__builtin_constant_p(nr))

The __builtin_constant_p builtin function returns 1 if the given parameter is known to be a constant at compile time and returns 0 otherwise. We do not need to use the slower bts instruction to set a bit if the given bit number is a compile-time constant. We can just apply a bitwise or to the byte at the given address which contains the given bit, using a mask in which only the target bit within that byte is 1 and all other bits are 0. Otherwise, if the given bit number is not known to be a constant at compile time, we do the same as we did in the __set_bit function, this time with LOCK_PREFIX.
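
The behaviour of __builtin_constant_p can be observed in a small GCC or Clang program. This is a sketch under the assumption that one of those compilers is used; literal_is_immediate and runtime_is_immediate are hypothetical names introduced here.

```c
#include <assert.h>

/* Same definition as in the kernel header quoted above (GCC/Clang builtin). */
#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))

/* A literal is a compile-time constant, so the "orb" path would be chosen. */
int literal_is_immediate(void)
{
    return IS_IMMEDIATE(9);
}

/*
 * A volatile object has to be read at run time, so the builtin cannot
 * treat it as a constant and the "bts" path would be chosen.
 */
int runtime_is_immediate(void)
{
    volatile long nr = 9;

    return IS_IMMEDIATE(nr);
}

/* Usage example covering both branches of the check. */
void is_immediate_example(void)
{
    assert(literal_is_immediate() == 1);
    assert(runtime_is_immediate() == 0);
}
```

For ordinary non-volatile variables the result can depend on the optimization level, because the compiler may propagate a constant into an inlined function.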
The CONST_MASK_ADDR macro:

    #define CONST_MASK_ADDR(nr, addr)   BITOP_ADDR((void *)(addr) + ((nr)>>3))

expands to the given address plus the offset of the byte which contains the given bit. For example, suppose we have the address 0x1000 and the bit number 0x9. As 0x9 is one byte + one bit, our address will be addr + 1:

    >>> hex(0x1000 + (0x9 >> 3))
    '0x1001'

The CONST_MASK macro represents the given bit number as a byte in which the bit at position nr & 7 is 1 and the other bits are 0:

    #define CONST_MASK(nr)          (1 << ((nr) & 7))

    >>> bin(1 << (0x9 & 7))
    '0b10'

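In the end we just apply a bitwise or to these values. So, for example, if the bitmap holds the value 0x4097 and we need to set bit 0x9, the mask 0b10 is applied to the byte at offset 1 (0x40 on little-endian x86), which is equivalent to:

    >>> bin(0x4097)
    '0b100000010010111'
    >>> bin(0x4097 | ((1 << (0x9 & 7)) << ((0x9 >> 3) * 8)))
    '0b100001010010111'

and bit 9 (counting from zero) will be set.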
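
The address and mask arithmetic of the constant path can be modelled in plain C. This is a non-atomic sketch, not the kernel's code: model_set_bit_bytewise is a hypothetical name, and the example uses a byte array so that it does not depend on endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Same mask computation as the kernel macro quoted above. */
#define CONST_MASK(nr) (1 << ((nr) & 7))

/*
 * Hypothetical non-atomic model of the constant-nr path of set_bit:
 * pick the byte that holds bit `nr`, then OR in a one-bit mask.
 */
void model_set_bit_bytewise(unsigned long nr, void *addr)
{
    uint8_t *byte = (uint8_t *)addr + (nr >> 3); /* like CONST_MASK_ADDR */

    *byte |= (uint8_t)CONST_MASK(nr);            /* like "orb %1,%0"     */
}

/* Usage example: bit 9 lives in byte 1 and has the mask 0b10 there. */
void model_set_bit_bytewise_example(void)
{
    uint8_t bytes[2] = { 0x97, 0x40 };

    model_set_bit_bytewise(9, bytes);
    assert(bytes[0] == 0x97);
    assert(bytes[1] == 0x42);
}
```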
Note that all of these operations are marked with LOCK_PREFIX, which expands to the lock prefix and guarantees the atomicity of the operation.
As we already know, besides the set_bit and __set_bit operations, the Linux kernel provides two inverse functions to clear a bit in atomic and non-atomic contexts. They are clear_bit and __clear_bit. Both of these functions are defined in the same header file and take the same set of arguments. But not only the arguments are similar: in general these functions are very similar to set_bit and __set_bit. Let's look at the implementation of the non-atomic __clear_bit function:

    static inline void __clear_bit(long nr, volatile unsigned long *addr)
    {
        asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
    }

Yes. As we can see, it takes the same set of arguments and contains a very similar block of inline assembler. It just uses the btr instruction instead of bts. As we can understand from the function's name, it clears the given bit at the given address. The btr instruction acts like bts. This instruction also selects the bit specified by the first operand, stores its value in the CF flag of the flags register, and clears this bit in the bit array specified by the second operand.
The atomic variant of __clear_bit is clear_bit:

    static __always_inline void
    clear_bit(long nr, volatile unsigned long *addr)
    {
        if (IS_IMMEDIATE(nr)) {
            asm volatile(LOCK_PREFIX "andb %1,%0"
                : CONST_MASK_ADDR(nr, addr)
                : "iq" ((u8)~CONST_MASK(nr)));
        } else {
            asm volatile(LOCK_PREFIX "btr %1,%0"
                : BITOP_ADDR(addr)
                : "Ir" (nr));
        }
    }

and as we can see it is very similar to set_bit and contains just two differences. The first difference is that it uses the btr instruction to clear a bit where set_bit uses the bts instruction to set a bit. The second difference is that it uses a negated mask and the and instruction to clear a bit in the given byte where set_bit uses the or instruction.
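
The negated-mask idea can be sketched in userspace with C11 atomics. This is an illustrative assumption rather than the kernel's implementation: sketch_clear_bit is a hypothetical helper that works on a single word instead of a byte.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically clear bit `nr` of one word.
 * ~(1UL << nr) has every bit set except bit `nr`, so the AND keeps all
 * other bits unchanged. `nr` must be smaller than the number of bits
 * in unsigned long.
 */
void sketch_clear_bit(unsigned int nr, atomic_ulong *word)
{
    atomic_fetch_and(word, ~(1UL << nr));
}

/* Usage example: clearing bit 3 of 0xff leaves 0xf7. */
void sketch_clear_bit_example(void)
{
    atomic_ulong word;

    atomic_init(&word, 0xffUL);
    sketch_clear_bit(3, &word);
    assert(atomic_load(&word) == 0xf7UL);
}
```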
That's all. Now we can set and clear a bit in any bit array, and we can move on to other operations on bitmasks.
The most widely used operations on bit arrays in the Linux kernel are setting and clearing a bit. But besides these operations, it is useful to do additional operations on a bit array. Yet another widely used operation in the Linux kernel is to check whether a given bit is set in a bit array. We can achieve this with the help of the test_bit macro. This macro is defined in the arch/x86/include/asm/bitops.h header file and expands to a call of constant_test_bit or variable_test_bit depending on the bit number:

    #define test_bit(nr, addr)                  \
        (__builtin_constant_p((nr))             \
         ? constant_test_bit((nr), (addr))      \
         : variable_test_bit((nr), (addr)))

So, if nr is a compile-time constant, test_bit expands to a call of the constant_test_bit function, and to variable_test_bit otherwise. Now let's look at the implementations of these functions. Let's start with variable_test_bit:

    static inline int variable_test_bit(long nr, volatile const unsigned long *addr)
    {
        int oldbit;

        asm volatile("bt %2,%1\n\t"
                     "sbb %0,%0"
                     : "=r" (oldbit)
                     : "m" (*(unsigned long *)addr), "Ir" (nr));

        return oldbit;
    }

The variable_test_bit function takes a similar set of arguments to set_bit and the other functions. We can also see inline assembly code here which executes the bt and sbb instructions. The bt, or bit test, instruction selects the bit specified by the first operand from the bit array specified by the second operand and stores its value in the CF bit of the flags register. The second instruction, sbb, subtracts the first operand from the second and additionally subtracts the value of CF. So here we write the value of the given bit from the given bit array to the CF bit of the flags register and execute the sbb instruction, which calculates 00000000 - CF and writes the result to oldbit. The result is therefore 0 if the bit was clear and -1 (all ones, that is, non-zero) if it was set.
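
The bt and sbb pair can be imitated in portable C to see which values oldbit can take. The sketch below is a model of the described semantics for non-negative bit numbers, not the kernel's code; model_variable_test_bit and WORD_BITS are hypothetical names, and the carry variable plays the role of CF.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical model of "bt" followed by "sbb %0,%0":
 * `carry` is what bt would leave in CF, and the return value is
 * oldbit - oldbit - CF, i.e. 0 or -1.
 */
int model_variable_test_bit(unsigned long nr, const unsigned long *addr)
{
    int carry = (int)((addr[nr / WORD_BITS] >> (nr % WORD_BITS)) & 1UL);

    return 0 - carry;
}

/* Usage example: 0x5 has bits 0 and 2 set, bit 1 clear. */
void model_variable_test_bit_example(void)
{
    unsigned long bitmap[1] = { 0x5UL };

    assert(model_variable_test_bit(0, bitmap) == -1);
    assert(model_variable_test_bit(1, bitmap) == 0);
    assert(model_variable_test_bit(2, bitmap) != 0);
}
```

Callers are expected to treat the result as a truth value, so -1 and 1 are equivalent for them.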
The constant_test_bit function uses a mask, similar to what we saw in set_bit:

    static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr)
    {
        return ((1UL << (nr & (BITS_PER_LONG-1))) &
            (addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
    }

It generates a mask of type unsigned long in which only the bit at position nr & (BITS_PER_LONG-1) is 1 (similar to what we saw in CONST_MASK, but for a whole word instead of a byte) and applies a bitwise and to the word of the array which contains the given bit.
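
The same computation works in userspace once stand-ins for the kernel constants are supplied. In this sketch BITS_PER_LONG is derived from sizeof, and a division replaces the shift by _BITOPS_LONG_SHIFT; both are assumptions made for portability, and sketch_constant_test_bit is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userspace version of constant_test_bit: select the word
 * that holds bit `nr`, mask the bit inside it, and normalize to 0 or 1.
 */
int sketch_constant_test_bit(unsigned long nr, const unsigned long *addr)
{
    unsigned long mask = 1UL << (nr & (BITS_PER_LONG - 1));

    /* nr / BITS_PER_LONG equals nr >> _BITOPS_LONG_SHIFT in the kernel. */
    return (mask & addr[nr / BITS_PER_LONG]) != 0;
}

/* Usage example: a bit in the second word of a two-word bitmap. */
void sketch_constant_test_bit_example(void)
{
    unsigned long bitmap[2] = { 0UL, 4UL };

    assert(sketch_constant_test_bit(BITS_PER_LONG + 2, bitmap) == 1);
    assert(sketch_constant_test_bit(2, bitmap) == 0);
}
```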
The next widely used bit-array operation is to change (toggle) a bit in a bit array. The Linux kernel provides two helpers for this:

- __change_bit;
- change_bit.

As you can already guess, these two variants are non-atomic and atomic respectively, just like __set_bit and set_bit. To start, let's look at the implementation of the __change_bit function:

    static inline void __change_bit(long nr, volatile unsigned long *addr)
    {
        asm volatile("btc %1,%0" : ADDR : "Ir" (nr));
    }

Pretty easy, isn't it? The implementation of __change_bit is the same as __set_bit, but instead of the bts instruction we are using btc. This instruction selects the given bit from the given bit array, stores its value in CF, and changes its value by applying the complement operation. So a bit with value 1 will become 0 and vice versa:

    >>> int(not 1)
    0
    >>> int(not 0)
    1
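
In C the complement of a single bit is usually written with XOR. The following non-atomic sketch models what btc does to a bit array for non-negative bit numbers; model_btc and WORD_BITS are hypothetical names, and the return value stands in for CF.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical portable model of `btc`: returns the previous value of
 * bit `nr` and flips it with XOR, leaving every other bit unchanged.
 */
int model_btc(unsigned long nr, unsigned long *addr)
{
    unsigned long *word = addr + nr / WORD_BITS;
    unsigned long mask = 1UL << (nr % WORD_BITS);
    int old = (*word & mask) != 0;

    *word ^= mask;
    return old;
}

/* Usage example: toggling twice restores the original value. */
void model_btc_example(void)
{
    unsigned long bitmap[1] = { 0x10UL };

    assert(model_btc(4, bitmap) == 1);
    assert(bitmap[0] == 0UL);
    assert(model_btc(4, bitmap) == 0);
    assert(bitmap[0] == 0x10UL);
}
```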

The atomic version of __change_bit is the change_bit function:

    static inline void change_bit(long nr, volatile unsigned long *addr)
    {
        if (IS_IMMEDIATE(nr)) {
            asm volatile(LOCK_PREFIX "xorb %1,%0"
                : CONST_MASK_ADDR(nr, addr)
                : "iq" ((u8)CONST_MASK(nr)));
        } else {
            asm volatile(LOCK_PREFIX "btc %1,%0"
                : BITOP_ADDR(addr)
                : "Ir" (nr));
        }
    }

It is similar to the set_bit function, but has two differences. The first difference is the xor operation instead of or, and the second is btc instead of bts.
At this point we know the most important architecture-specific operations on bit arrays. It is time to look at the generic bitmap API.
Common bit operations
Besides the architecture-specific API from the arch/x86/include/asm/bitops.h header file, the Linux kernel provides a common API for manipulating bit arrays. As we know from the beginning of this part, we can find it in the include/linux/bitmap.h header file and additionally in the lib/bitmap.c source code file. But before these source code files, let's look into the include/linux/bitops.h header file, which provides a set of useful macros. Let's look at some of them.
First of all, let's look at the following four macros:
fo