Memory is allocated to applications using the malloc subsystem. The malloc subsystem is a memory management API that consists of the following subroutines:
The malloc subsystem manages a logical memory object called a heap. The heap is a region of memory that resides in the application's address space between the last byte of data allocated by the compiler and the end of the data region. The heap is the memory object from which memory is allocated and to which memory is returned by the malloc subsystem API.
The malloc subsystem performs the following fundamental memory operations:
Performed by the malloc and calloc subroutines.
Performed by the free subroutine.
Performed by the realloc subroutine.
The mallopt and mallinfo subroutines are supported for System V compatibility. The mallinfo subroutine can be used during program development to obtain information about the heap managed by the malloc subroutine. The mallopt subroutine can be used to disclaim page-aligned, page-sized free memory, and to enable and disable the default allocator. Similar to the malloc subroutine, the valloc subroutine is provided for compatibility with the Berkeley Compatibility Library.
For additional information, see the following sections:
A 32-bit application program running on the system has an address space that is divided into the following segments:
A 64-bit application program running on the system has an address space that is divided into the following segments:
0x0000 0000 0000 0000 to 0x0000 0000 0fff ffff | Contains the kernel |
0x0000 0000 f000 0000 to 0x0000 0000 0fff ffff | Reserved |
0x0000 0001 0000 0000 to 0x07ff ffff ffff ffff | Contains the application program text and application program data, and shared memory or mmap services |
0x0800 0000 0000 0000 to 0x08ff ffff ffff ffff | Privately loaded objects |
0x0900 0000 0000 0000 to 0x09ff ffff ffff ffff | Shared library text and data |
0x0f00 0000 0000 0000 to 0x0fff ffff ffff ffff | Application stack |
The _edata location is an identifier that points to the first byte following the last byte of program data. The heap is created by the malloc subsystem when the first block of data is allocated. The malloc subroutine creates the heap by calling the sbrk subroutine to move up the _edata location to make room for the heap. The malloc subroutine then expands the heap as the needs of the application dictate. Space for the heap is acquired in increments determined by the BRKINCR value. This value can be examined using the mallinfo subroutine.
The heap is divided into allocated blocks and freed blocks. The free pool consists of the memory available for subsequent allocation. An allocation is completed by first removing a block from the free pool and then returning to the free pool a pointer to this block. A reallocation is completed by allocating a block of storage of the new size, moving the data to the new block, and freeing the original block. The allocated blocks consist of the pieces of the heap being used by the application. Because the memory blocks are not physically removed from the heap (they simply change state from free to in-use), the size of the heap does not decrease when memory is freed by the application.
The allocation policy refers to the set of data structures and algorithms employed to represent the heap and to implement allocation, deallocation, and reallocation. The malloc subsystem supports two allocation policies: the default allocation policy and the malloc 3.1 allocation policy. The interface to the malloc subsystem is identical for both allocation policies.
The default allocation policy is generally more efficient and is the preferred choice for the majority of applications. The malloc 3.1 allocation policy has some unique behavioral characteristics that may be beneficial in specific circumstances, as described under Comparing the Default and malloc 3.1 Allocation Policies . However, the malloc 3.1 allocation policy is only available for use with 32-bit applications. It is not supported for 64-bit applications.
The default allocation policy maintains the free space in the heap as a free tree. The free tree is a binary tree in which nodes are sorted vertically by length and horizontally by address. The data structure imposes no limitation on the number of block sizes supported by the tree, allowing a wide range of potential block sizes. Tree-reorganization techniques optimize access times for node location, insertion, and deletion, and also protect against fragmentation.
The default allocation policy provides support for the following optional capabilities:
The number of bytes required for a block is calculated using a roundup function. The equation is as follows:
If x mod y = 0, then Roundup(x,y) = x else, Roundup(x,y) = (x/y rounded down to the nearest integer + 1)y p = sizeof(prefix)=8 pad = Roundup(len + p,16)
The leftmost node of the tree that is greater than or equal to the size of the malloc subroutine len parameter value is removed from the tree. If the block found is larger than the needed size, the block is divided into two blocks: one of the needed size, and the second a remainder. The second block, called the runt, is returned to the free tree for future allocation. The first block is returned to the caller.
If a block of sufficient size is not found in the free tree, the heap is expanded, a block the size of the acquired extension is added to the free tree, and allocation continues as previously described.
Memory blocks deallocated with the free subroutine are returned to the tree, at the root. Each node along the path to the insertion point for the new node is examined to see if it adjoins the node being inserted. If it does, the two nodes are merged and the newly merged node is relocated in the tree. Length determines the depth of a node in the tree. If no adjoining block is found, the node is simply inserted at the appropriate place in the tree. Merging adjacent blocks can significantly reduce heap fragmentation.
If the size of the reallocated block will be larger than the original block, the original block is returned to the free tree with the free subroutine so that any possible coalescence can occur. A new block of the requested size is then allocated, the data is moved from the original block to the new block, and the new block is returned to the caller.
If the size of the reallocated block is smaller than the original block, the block is split and the smaller one is returned to the free tree.
The malloc 3.1 allocation policy can be invoked by entering:
MALLOCTYPE=3.1; export MALLOCTYPE
Thereafter, all 32-bit programs run by the shell will use the malloc 3.1 allocation policy (64-bit programs will continue to use the default allocation policy). Setting the MALLOCTYPE environment variable to any value other than 3.1 causes the default allocation policy to be used.
The malloc 3.1 allocation policy maintains the heap as a set of 28 hash buckets, each of which points to a linked list. Hashing is a method of transforming a search key into an address for the purpose of storing and retrieving items of data. The method is designed to minimize average search time. A bucket is one or more fields in which the result of an operation is kept. Each linked list contains blocks of a particular size. The index into the hash buckets indicates the size of the blocks in the linked list. The size of the block is calculated using the following formula:
size = 2 i + 4
where i identifies the bucket. This means that the blocks in the list anchored by bucket zero are 20+4 = 16 bytes long. Therefore, given that a prefix is 8 bytes in size, these blocks can satisfy requests for blocks between 0 and 8 bytes long. The following table illustrates how requested sizes are distributed among the buckets.
Bucket | Block Size | Sizes Mapped | Pages Used |
---|---|---|---|
0 | 16 | 0 ... 8 |
|
1 | 32 | 9 ... 24 |
|
2 | 64 | 25 ... 56 |
|
3 | 128 | 57 ... 120 |
|
4 | 256 | 121 ... 248 |
|
5 | 512 | 249 ... 504 |
|
6 | 1K | 505 ... 1K-8 |
|
7 | 2K | 1K-7 ... 2K-8 |
|
8 | 4K | 2K-7 ... 4K-8 | 2 |
9 | 8K | 4K-7 ... 8K-8 | 3 |
10 | 16K | 8K-7 ... 16K-8 | 5 |
11 | 32K | 16K-7 ... 32K-8 | 9 |
12 | 64K | 32K-7 ... 64K-8 | 17 |
13 | 128K | 64K-7 ... 128K-8 | 33 |
14 | 256K | 128K-7 ... 256K-8 | 65 |
15 | 512K | 256K-7 ... 512K-8 | 129 |
16 | 1M | 256K-7 ... 1M-8 | 257 |
17 | 2M | 1M-7 ... 2M-8 | 513 |
18 | 4M | 2M-7 ... 4M-8 | 1K + 1 |
19 | 8M | 4M-7 ... 8M-8 | 2K + 1 |
20 | 16M | 8M-7 ... 16M-8 | 4K + 1 |
21 | 32M | 16M-7 ... 32M-8 | 8K + 1 |
22 | 64M | 32M-7 ... 64M-8 | 16K + 1 |
23 | 128M | 64M-7 ... 128M-8 | 32K + 1 |
24 | 256M | 128M-7 ... 256M-8 | 64K + 1 |
25 | 512M | 256M-7 ... 512M-8 | 128K + 1 |
26 | 1024M | 512M-7 ... 1024M-8 | 256K + 1 |
27 | 2048M | 1024M-7 ... 2048M-8 | 512K + 1 |
A block is allocated from the free pool by first converting the requested bytes to an index in the bucket array, using the following equation:
needed = requested + 8 If needed <= 16, then bucket = 0 If needed > 16, then bucket = (log(needed)/log(2) rounded down to the nearest integer) - 3
The size of each block in the list anchored by the bucket is block size = 2 bucket + 4. If the list in the bucket is null, memory is allocated using the sbrk subroutine to add blocks to the list. If the block size is less than a page, then a page is allocated using the sbrk subroutine, and the number of blocks arrived at by dividing the block size into the page size are added to the list. If the block size is equal to or greater than a page, needed memory is allocated using the sbrk subroutine, and a single block is added to the free list for the bucket. If the free list is not empty, the block at the head of the list is returned to the caller. The next block on the list then becomes the new head.
When a block of memory is returned to the free pool, the bucket index is calculated as with allocation. The block to be freed is then added to the head of the free list for the bucket.
When a block of memory is reallocated, the needed size is compared against the existing size of the block. Because of the wide variance in sizes handled by a single bucket, the new block size often maps to the same bucket as the original block size. In these cases, the length of the prefix is updated to reflect the new size and the same block is returned. If the needed size is greater than the existing block, the block is freed, a new block is allocated from the new bucket, and the data is moved from the old block to the new block.
The malloc 3.1 allocation policy is available for use with 32-bit applications only. If MALLOCTYPE=3.1 is specified for a 64-bit application, the default allocation policy is used instead.
The malloc 3.1 allocation policy does not support any of the following capabilities:
Because the malloc 3.1 allocation policy rounds up the size of each allocation request to the next power of 2, it can produce considerable virtual- and real-memory fragmentation and poor locality of reference. The default allocation policy is generally a better choice because it allocates exactly the amount of space requested and is more efficient about reclaiming previously used blocks of memory.
Unfortunately, some application programs may depend inadvertently on side effects of the malloc 3.1 allocation policy for acceptable performance or even for correct functioning. For example, a program that overruns the end of an array may function correctly when using the malloc 3.1 allocator only because of the additional space provided by the rounding-up process. The same program is likely to experience erratic behavior or even fail when used with default allocator because the default allocator allocates only the number of bytes requested.
As another example, because of the inefficient space reclamation of the malloc 3.1 allocation algorithm, the application program almost always receives space that has been set to zeros (when a process touches a given page in its working segment for the first time, that page is set to zeros). Applications may depend on this side effect for correct execution. In fact, zeroing out of the allocated space is not a specified function of the malloc subroutine and would result in an unnecessary performance penalty for programs that initialize only as required and possibly not to zeros. Because the default allocator is more aggressive about reusing space, programs that are dependent on receiving zeroed storage from malloc will probably fail when the default allocator is used.
Similarly, if a program continually reallocates a structure to a slightly greater size, the malloc 3.1 allocator may not need to move the structure very often. In many cases, the realloc subroutine can make use of the extra space provided by the rounding implicit in the malloc 3.1 allocation algorithm. The default allocator will usually have to move the structure to a slightly larger area because of the likelihood that something else has been called by the malloc subroutine just above it. This may present the appearance of a degradation in realloc subroutine performance when the default allocator is used instead of the malloc 3.1 allocator. In reality, it is the surfacing of a cost that is implicit in the application program's structure.
To reduce the overhead of glink code, the function descriptor for the malloc subsystem functions are overwritten with the function descriptor for the actual implementation. Because some programs might not work when function pointers are modified, the no_overwrite option can be used to disable this optimization.
To disable optimization, set the MALLOCTYPE environment variable as follows:
MALLOCTYPE=no_overwrite