[ Bottom of Page | Previous Page | Next Page | Contents | Index | Library Home | Legal | Search ]

General Programming Concepts:
Writing and Debugging Programs

System Memory Allocation Using the malloc Subsystem

Memory is allocated to applications using the malloc subsystem. The malloc subsystem is a memory management API that consists of the following subroutines:

The malloc subsystem manages a logical memory object called a heap. The heap is a region of memory that resides in the application's address space between the last byte of data allocated by the compiler and the end of the data region. The heap is the memory object from which memory is allocated and to which memory is returned by the malloc subsystem API.

The malloc subsystem performs the following fundamental memory operations:

The mallopt and mallinfo subroutines are supported for System V compatibility. The mallinfo subroutine can be used during program development to obtain information about the heap managed by the malloc subroutine. The mallopt subroutine can be used to disclaim page-aligned, page-sized free memory, and to enable and disable the default allocator. Similar to the malloc subroutine, the valloc subroutine is provided for compatibility with the Berkeley Compatibility Library.

For additional information, see the following sections:

Working with the Heap in 32-bit Applications

A 32-bit application program running on the system has an address space that is divided into the following segments:

0x00000000 to 0x0fffffff Contains the kernel
0x10000000 to 0x1fffffff Contains the application program text
0x20000000 to 0x2fffffff Contains the application program data and the application stack
0x30000000 to 0xcfffffff Available for use by shared memory or mmap services
0xd0000000 to 0xdfffffff Contains shared library text
0xe0000000 to 0xefffffff Available for use by shared memory or mmap services
0xf0000000 to 0xffffffff Contains the application shared library data

Working with the Heap in 64-bit Applications

A 64-bit application program running on the system has an address space that is divided into the following segments:

0x0000 0000 0000 0000 to 0x0000 0000 0fff ffff Contains the kernel
0x0000 0000 f000 0000 to 0x0000 0000 0fff ffff Reserved
0x0000 0001 0000 0000 to 0x07ff ffff ffff ffff Contains the application program text and application program data, and shared memory or mmap services
0x0800 0000 0000 0000 to 0x08ff ffff ffff ffff Privately loaded objects
0x0900 0000 0000 0000 to 0x09ff ffff ffff ffff Shared library text and data
0x0f00 0000 0000 0000 to 0x0fff ffff ffff ffff Application stack

The _edata location is an identifier that points to the first byte following the last byte of program data. The heap is created by the malloc subsystem when the first block of data is allocated. The malloc subroutine creates the heap by calling the sbrk subroutine to move up the _edata location to make room for the heap. The malloc subroutine then expands the heap as the needs of the application dictate. Space for the heap is acquired in increments determined by the BRKINCR value. This value can be examined using the mallinfo subroutine.

The heap is divided into allocated blocks and freed blocks. The free pool consists of the memory available for subsequent allocation. An allocation is completed by first removing a block from the free pool and then returning to the free pool a pointer to this block. A reallocation is completed by allocating a block of storage of the new size, moving the data to the new block, and freeing the original block. The allocated blocks consist of the pieces of the heap being used by the application. Because the memory blocks are not physically removed from the heap (they simply change state from free to in-use), the size of the heap does not decrease when memory is freed by the application.

Understanding System Allocation Policy

The allocation policy refers to the set of data structures and algorithms employed to represent the heap and to implement allocation, deallocation, and reallocation. The malloc subsystem supports two allocation policies: the default allocation policy and the malloc 3.1 allocation policy. The interface to the malloc subsystem is identical for both allocation policies.

The default allocation policy is generally more efficient and is the preferred choice for the majority of applications. The malloc 3.1 allocation policy has some unique behavioral characteristics that may be beneficial in specific circumstances, as described under Comparing the Default and malloc 3.1 Allocation Policies . However, the malloc 3.1 allocation policy is only available for use with 32-bit applications. It is not supported for 64-bit applications.

Understanding the Default Allocation Policy

The default allocation policy maintains the free space in the heap as a free tree. The free tree is a binary tree in which nodes are sorted vertically by length and horizontally by address. The data structure imposes no limitation on the number of block sizes supported by the tree, allowing a wide range of potential block sizes. Tree-reorganization techniques optimize access times for node location, insertion, and deletion, and also protect against fragmentation.

The default allocation policy provides support for the following optional capabilities:

Allocation

The number of bytes required for a block is calculated using a roundup function. The equation is as follows:

If x mod y = 0, then
Roundup(x,y) = x
else,
Roundup(x,y) = (x/y rounded down to the nearest integer + 1)y

p = sizeof(prefix)=8

pad = Roundup(len + p,16)

The leftmost node of the tree that is greater than or equal to the size of the malloc subroutine len parameter value is removed from the tree. If the block found is larger than the needed size, the block is divided into two blocks: one of the needed size, and the second a remainder. The second block, called the runt, is returned to the free tree for future allocation. The first block is returned to the caller.

If a block of sufficient size is not found in the free tree, the heap is expanded, a block the size of the acquired extension is added to the free tree, and allocation continues as previously described.

Deallocation

Memory blocks deallocated with the free subroutine are returned to the tree, at the root. Each node along the path to the insertion point for the new node is examined to see if it adjoins the node being inserted. If it does, the two nodes are merged and the newly merged node is relocated in the tree. Length determines the depth of a node in the tree. If no adjoining block is found, the node is simply inserted at the appropriate place in the tree. Merging adjacent blocks can significantly reduce heap fragmentation.

Reallocation

If the size of the reallocated block will be larger than the original block, the original block is returned to the free tree with the free subroutine so that any possible coalescence can occur. A new block of the requested size is then allocated, the data is moved from the original block to the new block, and the new block is returned to the caller.

If the size of the reallocated block is smaller than the original block, the block is split and the smaller one is returned to the free tree.

Understanding the malloc 3.1 Allocation Policy

The malloc 3.1 allocation policy can be invoked by entering:

MALLOCTYPE=3.1; export MALLOCTYPE

Thereafter, all 32-bit programs run by the shell will use the malloc 3.1 allocation policy (64-bit programs will continue to use the default allocation policy). Setting the MALLOCTYPE environment variable to any value other than 3.1 causes the default allocation policy to be used.

The malloc 3.1 allocation policy maintains the heap as a set of 28 hash buckets, each of which points to a linked list. Hashing is a method of transforming a search key into an address for the purpose of storing and retrieving items of data. The method is designed to minimize average search time. A bucket is one or more fields in which the result of an operation is kept. Each linked list contains blocks of a particular size. The index into the hash buckets indicates the size of the blocks in the linked list. The size of the block is calculated using the following formula:

size = 2 i + 4

where i identifies the bucket. This means that the blocks in the list anchored by bucket zero are 20+4 = 16 bytes long. Therefore, given that a prefix is 8 bytes in size, these blocks can satisfy requests for blocks between 0 and 8 bytes long. The following table illustrates how requested sizes are distributed among the buckets.

Note
This algorithm can use as much as twice the amount of memory actually allocated by the application. An extra page is required for buckets larger than 4096 bytes because objects of a page in size or larger are page-aligned. Because the prefix immediately precedes the block, an entire page is required solely for the prefix.
Bucket Block Size Sizes Mapped Pages Used
0 16 0 ... 8
1 32 9 ... 24
2 64 25 ... 56
3 128 57 ... 120
4 256 121 ... 248
5 512 249 ... 504
6 1K 505 ... 1K-8
7 2K 1K-7 ... 2K-8
8 4K 2K-7 ... 4K-8 2
9 8K 4K-7 ... 8K-8 3
10 16K 8K-7 ... 16K-8 5
11 32K 16K-7 ... 32K-8 9
12 64K 32K-7 ... 64K-8 17
13 128K 64K-7 ... 128K-8 33
14 256K 128K-7 ... 256K-8 65
15 512K 256K-7 ... 512K-8 129
16 1M 256K-7 ... 1M-8 257
17 2M 1M-7 ... 2M-8 513
18 4M 2M-7 ... 4M-8 1K + 1
19 8M 4M-7 ... 8M-8 2K + 1
20 16M 8M-7 ... 16M-8 4K + 1
21 32M 16M-7 ... 32M-8 8K + 1
22 64M 32M-7 ... 64M-8 16K + 1
23 128M 64M-7 ... 128M-8 32K + 1
24 256M 128M-7 ... 256M-8 64K + 1
25 512M 256M-7 ... 512M-8 128K + 1
26 1024M 512M-7 ... 1024M-8 256K + 1
27 2048M 1024M-7 ... 2048M-8 512K + 1

Allocation

A block is allocated from the free pool by first converting the requested bytes to an index in the bucket array, using the following equation:

needed = requested + 8

If needed <= 16,
then
bucket = 0

If needed > 16,
then
bucket = (log(needed)/log(2) rounded down to the nearest integer) - 3

The size of each block in the list anchored by the bucket is block size = 2 bucket + 4. If the list in the bucket is null, memory is allocated using the sbrk subroutine to add blocks to the list. If the block size is less than a page, then a page is allocated using the sbrk subroutine, and the number of blocks arrived at by dividing the block size into the page size are added to the list. If the block size is equal to or greater than a page, needed memory is allocated using the sbrk subroutine, and a single block is added to the free list for the bucket. If the free list is not empty, the block at the head of the list is returned to the caller. The next block on the list then becomes the new head.

Deallocation

When a block of memory is returned to the free pool, the bucket index is calculated as with allocation. The block to be freed is then added to the head of the free list for the bucket.

Reallocation

When a block of memory is reallocated, the needed size is compared against the existing size of the block. Because of the wide variance in sizes handled by a single bucket, the new block size often maps to the same bucket as the original block size. In these cases, the length of the prefix is updated to reflect the new size and the same block is returned. If the needed size is greater than the existing block, the block is freed, a new block is allocated from the new bucket, and the data is moved from the old block to the new block.

Limitations

The malloc 3.1 allocation policy is available for use with 32-bit applications only. If MALLOCTYPE=3.1 is specified for a 64-bit application, the default allocation policy is used instead.

The malloc 3.1 allocation policy does not support any of the following capabilities:

Comparing the Default and malloc 3.1 Allocation Policies

Because the malloc 3.1 allocation policy rounds up the size of each allocation request to the next power of 2, it can produce considerable virtual- and real-memory fragmentation and poor locality of reference. The default allocation policy is generally a better choice because it allocates exactly the amount of space requested and is more efficient about reclaiming previously used blocks of memory.

Unfortunately, some application programs may depend inadvertently on side effects of the malloc 3.1 allocation policy for acceptable performance or even for correct functioning. For example, a program that overruns the end of an array may function correctly when using the malloc 3.1 allocator only because of the additional space provided by the rounding-up process. The same program is likely to experience erratic behavior or even fail when used with default allocator because the default allocator allocates only the number of bytes requested.

As another example, because of the inefficient space reclamation of the malloc 3.1 allocation algorithm, the application program almost always receives space that has been set to zeros (when a process touches a given page in its working segment for the first time, that page is set to zeros). Applications may depend on this side effect for correct execution. In fact, zeroing out of the allocated space is not a specified function of the malloc subroutine and would result in an unnecessary performance penalty for programs that initialize only as required and possibly not to zeros. Because the default allocator is more aggressive about reusing space, programs that are dependent on receiving zeroed storage from malloc will probably fail when the default allocator is used.

Similarly, if a program continually reallocates a structure to a slightly greater size, the malloc 3.1 allocator may not need to move the structure very often. In many cases, the realloc subroutine can make use of the extra space provided by the rounding implicit in the malloc 3.1 allocation algorithm. The default allocator will usually have to move the structure to a slightly larger area because of the likelihood that something else has been called by the malloc subroutine just above it. This may present the appearance of a degradation in realloc subroutine performance when the default allocator is used instead of the malloc 3.1 allocator. In reality, it is the surfacing of a cost that is implicit in the application program's structure.

Optimizing Malloc

To reduce the overhead of glink code, the function descriptor for the malloc subsystem functions are overwritten with the function descriptor for the actual implementation. Because some programs might not work when function pointers are modified, the no_overwrite option can be used to disable this optimization.

To disable optimization, set the MALLOCTYPE environment variable as follows:

MALLOCTYPE=no_overwrite
Note
The no_overwrite option can be used with any other MALLOCTYPE option. Separate multiple options with a comma.

[ Top of Page | Previous Page | Next Page | Contents | Index | Library Home | Legal | Search ]