https://web.archive.org/web/20081025112805/http://users.snip.net/~gbooker/AS400/arch.htm#

Notes for storage research

David McKenzie

Contents

1 AS/400 Architecture
1.1 Hardware
1.1.1 Processors
1.1.1.1 Service Processor
1.1.1.2 IOP's
1.1.1.3 System Processor
1.1.1.3.1 Control Storage
1.1.1.3.2 Registers
1.1.1.3.3 VAT - Virtual Address Translator
1.2 Software
1.2.1 Layers
1.3 Objects
1.3.1 MI Objects
1.3.2 OS/400 Objects
1.4 Single Level Storage
1.4.1 Addresses
1.4.1.1 V=R addresses (Virtual = Real)
1.4.1.2 Permanent addresses
1.4.1.3 Temporary addresses
1.4.1.4 Access Group addresses
1.4.2 Address translation
1.4.2.1 RAR's (Resolved Address Registers)
1.4.2.2 Lookaside Buffer
1.4.2.3 Primary Directory
1.4.3 Tagged pointers
2 Storage Management
2.1 Auxiliary Storage Management
2.1.1 Units of storage
2.1.1.1 Sectors
2.1.1.2 Segments
2.1.1.3 Segment groups
2.1.2 Extents
2.1.3 Access Groups
2.1.4 Directories
2.1.4.1 Free space directory
2.1.4.2 Permanent directory
2.1.4.3 Temporary directory
2.1.4.4 Access group member directory
2.2 Main Storage Management
2.2.1 Paging
2.2.1.1 Faults
2.2.1.2 Perform Paging Request instruction
2.2.2 Primary directory
2.2.3 Hash table
2.2.4 Pools

1. Notes for storage research

1.1 AS/400 Architecture

1.1.1 Hardware

1.1.1.1 Processors

The AS/400 houses several processors:

1.1.1.1.1 Service Processor

The service processor is a Motorola 68020 microprocessor that performs various functions when the main system processor is not functional:

- Directs IPL, loading microcode for the system processor.
- Interfaces to the control panel.
- Dumps main memory to disk after a system processor failure.

On stage 2 machines (D models), the service processor function is performed by the Multiple Function I/O Processor card. The service processor is programmed in an IBM-proprietary source language called PL.8, so named because it constitutes about 80% of PL/I.
1.1.1.1.2 IOP's

These are separate cards that control various I/O devices, such as the Magnetic Storage Device Controller and the Multiline Communications Adapter. They are generally 68020 microprocessors programmed in PL.8.

1.1.1.1.3 System Processor

The system processor is a microcoded, as opposed to hard-wired, processor. IBM's term for what is usually called microcode in the industry is HLIC, Horizontal Licensed Internal Code (on the S/38 called HMC--Horizontal Microcode). It runs in control storage (see below).

The code that the industry would call machine code IBM calls VLIC, Vertical Licensed Internal Code (previously VMC). It runs in main memory. HLIC executes the VLIC interpretively; that is, each VLIC instruction is read and executed by HLIC routines. The machine cannot execute VLIC directly. The VLIC closely resembles 370 mainframe machine code, having the same 2, 4, and 6 byte instruction formats (though the indexed formats are missing) and using base-displacement addressing.

IBM's curious directional terminology, horizontal and vertical, for the microcode stems simply from the way the source code appears on the printed page: HLIC instructions print out as boxes running left to right across the page, while VLIC instructions print as traditional assembly code--one instruction after another down the page.

The system processor board contains the following components:

1.1.1.1.3.1 Control Storage

This small, fast memory stores the HLIC. Some infrequently-used HLIC routines are stored in main memory and swapped into control storage to execute. The word length on the original AS/400 models was 42 bits, plus ECC bits, but this has probably increased on the D models. The bits of HLIC words typically control processor hardware directly; for example, gating a hardware register to an ALU input.

1.1.1.1.3.2 Registers

In addition to various hardware registers addressed by HLIC, the processor has 16 48-bit registers that are accessible to VLIC machine code.
They may contain data in byte, half-word (16-bit) and word lengths, or 48-bit addresses. It is important to note that these are virtual addresses, capable of addressing the entire single-level address space, including all data on disk.

1.1.1.1.3.3 VAT - Virtual Address Translator

A significant portion of the real estate on the system processor chips is used for translating virtual addresses to real (physical) memory addresses. See below under "Address Translation."

1.1.2 Software

1.1.2.1 Layers

The OS/400 operating system is organized in three distinct layers, corresponding to the three types of machine code: HLIC, VLIC and MI. Each layer is written in a different assembler language, is compiled by a separate compiler at IBM Rochester, and communicates with the adjacent layer(s) through very specific, well-defined interfaces. The architecture was designed this way so that a given layer could be altered or even completely re-implemented without affecting the other layers, although IBM has not up to now made use of this flexibility. The layers are:

- HLIC: Horizontal Licensed Internal Code. About 310 K bytes running in control storage.
- VLIC: Vertical Licensed Internal Code. About 3.5 M bytes in 5000 modules having names starting with "#" that run in main memory.
- MI: Machine Interface (Operating System/400). About 75 M bytes in 5,500 programs stored in the QSYS library. MI is actually only an abstract machine language. It exists only during compilation when it is translated into VLIC machine code, which is stored as the object program and runs in main memory.

1.1.3 Objects

The operating system organizes storage into objects. Each object has a name and belongs to one of a fixed set of object types. Each object type defines the properties common to objects of that type. The system and user programs that manipulate these objects issue commands and instructions that are specific to the object type, and the operating system ensures that the commands are valid for the type.
This approach is in contrast to operating systems such as DOS and UNIX, in which everything is stored in files of raw bytes and the properties of the objects embodied in the files are determined by the programs that manipulate them. The file-based approach allows errors such as trying to execute a data file as a program.

1.1.3.1 MI Objects

At the MI level there are 25 types, identified by a 1-byte type field, and a total of 196 sub-types, identified by a second 1-byte field. Examples of types are program (02), queue (0A) and space (19). Sub-types indicate further specialization of the types, for example data queue (0A01), and job description (1903), a specialized kind of space.

Each MI object is implemented in storage as one or more segment groups (see below under "Storage Management"). Each object has a virtual address, which is the address of the first or only segment group comprising it.

1.1.3.2 OS/400 Objects

There are 64 types of objects defined at the OS/400 level, such as file, program, job queue, and device description. Some of these objects consist of a single MI object, such as job description, which consists only of a space, type 1903. Many OS/400 types consist of multiple MI objects, for example library, consisting of a context (04), an index (0E) and an "Object Information Repository," which is an MI space (19). The multiple MI objects are tied together by embedded pointers to the other pieces of the OS/400 object.

1.1.4 Single Level Storage

The single-level storage concept is probably the most important technical innovation of the AS/400, and of the S/38 before it--and perhaps one of the least understood.

1.1.4.1 Addresses

All storage, both main memory and disk, is addressed by a 48-bit address. This gives a theoretical address space of 2 ** 48 bytes (256 terabytes, or roughly 281 trillion bytes), but in fact several address bits are used for specific purposes, reducing the address space size.
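As a quick check of the address-space figure above, the arithmetic can be worked out directly. This is only the theoretical upper bound for a 48-bit address; the variable names are illustrative, and the "256" corresponds to binary terabytes (units of 2 ** 40 bytes).

```python
# Theoretical size of a 48-bit byte-addressed space.
ADDRESS_BITS = 48

total_bytes = 2 ** ADDRESS_BITS
binary_terabytes = total_bytes // 2 ** 40   # units of 2**40 bytes

print(total_bytes)        # 281474976710656
print(binary_terabytes)   # 256
```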
1.1.4.1.1 V=R addresses (Virtual = Real)

Virtual addresses in which the high order bits are 1000 are real (physical) memory addresses, used for speed since no translation is required. They address critical operating system data that are "pinned" in main memory; that is, are never paged to disk.

1.1.4.1.2 Permanent addresses

Addresses with the high order bit = 0 address permanent storage, that is, storage segments that will survive an IPL.

1.1.4.1.3 Temporary addresses

Addresses with the high order bit = 1 address temporary objects which the system destroys during the IPL process.

1.1.4.1.4 Access Group addresses

Addresses having 11 in the high order two bits address temporary objects contained in access groups (see below).

1.1.4.2 Address translation

Before data at a virtual address can be accessed, the address must be translated to a real memory address in order to locate the data in memory, if it is in memory. This translation occurs during the execution of a VLIC instruction for the address of the instruction itself, for each operand (up to 2 operands per instruction), and for I/O operations executed by HLIC. To speed up translation, the processor saves recently used address pairs (a virtual address and its corresponding real address) in several places, and looks for a particular virtual address in the following order:

1.1.4.2.1 RAR's (Resolved Address Registers)

These hardware registers save the most recently used virtual address and corresponding resolved (translated) real address. There is one RAR for instruction addresses, 2 for VLIC operands, and an array of RAR's for I/O addresses.

1.1.4.2.2 Lookaside Buffer

The lookaside buffer is an array of hardware registers for recently translated addresses. It operates as an associative memory.

1.1.4.2.3 Primary Directory

This is a directory stored in main memory, containing the virtual addresses of all pages currently in memory.
It is pinned, meaning it is never paged to disk, and occupies one sixteenth of total memory. For further details of address translation, see "Main Storage Management," below.

1.1.4.3 Tagged pointers

Most computers allow programs to compute addresses, for example to access an array. This opens the possibility of a program error causing the wrong storage to be addressed. If data is modified at a wrong address, the results can be disastrous, resulting in a system crash. Operating systems usually provide memory protection facilities to guard against the possibility.

The AS/400's solution for the MI level of code is to keep all addresses in 16-byte pointers stored in memory. Each 16 bytes of system memory has an extra tag bit associated with it, not included in the 16 bytes. If a particular 16 byte area stores a pointer, the tag bit is set to indicate that fact. When MI instructions use the pointer to address data, the hardware verifies that the tag bit is set; if not, an exception is signalled and the user sees the ubiquitous message "Referenced location in a space that does not contain a pointer."

Special MI instructions are provided to manipulate pointers, for example "Set Space Pointer;" these instructions implicitly cause the tag bit to be set. However, they can only be used to set pointers to point to authorized places; they can't be used to address things like operating system code or data structures. It is possible to use normal non-pointer memory manipulation instructions to alter the contents of a pointer in memory, but the hardware implicitly resets the tag bit. Thus, a program attempting, by accident or design, to "counterfeit" a pointer to point to an unauthorized place causes it to become unusable as a pointer.

1.2 Storage Management

The storage management code of the operating system is divided into two components: Auxiliary and Main, corresponding to disk and main memory. These components are part of VLIC.
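The tagged-pointer rules described under "Tagged pointers" can be illustrated with a small model. This is a sketch under stated assumptions, not a description of the actual hardware or MI instruction set: the class `TaggedMemory`, its method names, the exception `PointerTagError` and the big-endian pointer encoding are all hypothetical. Only the behavior comes from the text: pointer instructions set the tag, ordinary stores reset it, and using a pointer requires the tag to be set.

```python
QUAD = 16  # one tag bit guards each 16-byte area


class PointerTagError(Exception):
    """Hypothetical stand-in for the 'does not contain a pointer' exception."""


class TaggedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        # Tag bits are kept outside the data bytes, one per 16-byte area.
        self.tags = [False] * (size // QUAD)

    def set_pointer(self, addr, target):
        """Model of a pointer-setting instruction: stores and sets the tag."""
        if addr % QUAD != 0 or addr + QUAD > len(self.data):
            raise ValueError("pointers occupy one aligned 16-byte area")
        self.data[addr:addr + QUAD] = target.to_bytes(QUAD, "big")
        self.tags[addr // QUAD] = True

    def store_bytes(self, addr, payload):
        """Model of an ordinary store: resets the tag of every area touched."""
        if addr < 0 or addr + len(payload) > len(self.data):
            raise ValueError("store outside memory")
        self.data[addr:addr + len(payload)] = payload
        for quad in range(addr // QUAD, (addr + len(payload) - 1) // QUAD + 1):
            self.tags[quad] = False

    def load_pointer(self, addr):
        """Use a pointer: allowed only if its tag bit is still set."""
        if not self.tags[addr // QUAD]:
            raise PointerTagError("location does not contain a pointer")
        return int.from_bytes(self.data[addr:addr + QUAD], "big")
```

In this model, a program that overwrites even one byte of a stored pointer with `store_bytes` finds that `load_pointer` raises `PointerTagError` afterwards, which mirrors the "counterfeit pointer" case in the text.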
1.2.1 Auxiliary Storage Management

Auxiliary storage management (ASM) is responsible for allocating storage, assigning virtual addresses and maintaining the directories used to locate addresses on disk. To ASM, disk storage appears as multiple drives, each addressed by a unit number, and each containing a number of 520 byte sectors addressed by relative sector number, from 0 to 1 less than the number of sectors on the drive.

1.2.1.1 Units of storage

ASM organizes the virtual storage address space in a hierarchy of units, as follows:

1.2.1.1.1 Sectors

The basic unit of storage is the sector. Sectors are recorded on disk as 520 bytes: 512 bytes of data and an 8 byte header. The header contains the virtual address, size of the extent to which the sector belongs (an extent is a collection of contiguous sectors having contiguous virtual addresses), data specifying where the pointer tag bits for the sector are stored within the sector, and some flags.

Because the virtual address is stored in each sector, the storage management directories can be reconstructed by examining the sectors on disk. Directory recovery is one of the main causes of a long IPL time after an abnormal system shutdown.

1.2.1.1.2 Segments

A segment is up to 64 K of contiguous virtual addresses starting on a 64 K boundary, that is, an address having the last 16 bits zero. The left 4 bytes of a virtual address are called the segment ID (SID). The 128 sectors comprising a segment are not necessarily stored contiguously on disk; they may be scattered at different locations on different drives. They are located using the directories (see below).

1.2.1.1.3 Segment groups

A segment group is up to 16 megabytes of contiguous virtual addresses starting on a 16 M boundary, that is, an address having the last 24 bits zero. In other words, it is 256 segments starting on a 16 M boundary.
The left 24 bits of a virtual address indicate one of up to 16 M segment groups, and the right 24 bits indicate an offset within the segment group. The significance of the segment group is that each MI object occupies a unique segment group; objects larger than 16 megabytes consist of multiple segment groups. This implies that there can be no more than 16 million MI objects on a single AS/400 (in fact the number is smaller).

1.2.1.2 Extents

Sectors on disk are arranged in extents. An extent is a group of contiguous disk sectors having contiguous virtual addresses. The number of sectors in an extent is 1 to 32,768 in powers of two, e.g. 1, 2, 4, 8... sectors. This gives a size from 512 bytes to 16 megabytes. Extents are located on disk at boundaries equal to their size; thus a 1 sector extent can be in any disk sector, an 8 sector extent can start at any sector number that is a multiple of 8, and so forth.

1.2.1.3 Access Groups

An access group is a VLIC-level object that collects other objects together to allow them to be read into memory as a group, rather than being read in page by page through the normal paging process. This reduces the number of disk accesses required.

The main use of access groups is for process access groups (PAG's). Each job on the AS/400 has a PAG containing storage for the fields in the programs that are running in the job, the job's open files and their buffers, and other data related to the job. Thus, all of the data unique to a job can be loaded into memory quickly when the job becomes active after a wait--for example when the user presses the ENTER key.

An access group consists of a table of contents and multiple disk blocks that store the objects contained in the access group. Each block is 32 K bytes (64 sectors) stored contiguously on disk on one track. The table of contents contains the disk address (unit and starting RRN) of each block, and the virtual addresses of each of the sectors in the block.
A separate directory, the Access Group Member Directory, allows the system to locate the access group when a program addresses any sector in any object contained in an access group.

1.2.1.4 Directories

The system maintains several directories to allow it to find storage by virtual address:

1.2.1.4.1 Free space directory

The free space directory is an index containing one entry for every unallocated extent on disk; that is, those blocks of disk not currently allocated to any virtual address. Each entry has the disk unit, size and beginning disk RRN of the extent.

1.2.1.4.2 Permanent directory

This directory is an index used to locate segments contained in permanent objects on disk. A permanent object is one that will survive an IPL. Each entry in the permanent directory describes a range of contiguous virtual addresses; it contains the virtual address of the first page and from one to four extent descriptors. Each extent descriptor describes a contiguous disk block and indicates the disk unit, the RRN and the size of the extent.

1.2.1.4.3 Temporary directory

The temporary directory is an index of all segments belonging to objects that are destroyed during the IPL process; for example, job structure objects like the PAG that are used only while a job is active. Its format is identical to the permanent directory.

1.2.1.4.4 Access group member directory

The access group member directory maps virtual addresses of objects contained in access groups to the virtual address of the access group. It allows storage management to read the entire access group into main storage when a page of an object within it is referenced.

1.2.2 Main Storage Management

Main storage management (MSM) is responsible for bringing pages from disk into main memory when their virtual addresses are referenced by a program, and for writing pages to disk when the data has been changed and the memory they occupy is needed for other pages.
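The address layout described under "Units of storage" and the alignment rule described under "Extents" can be sketched in a few lines. This is an illustration only: the function names `split_address` and `is_valid_extent` and the dictionary keys are hypothetical, and the field boundaries are taken directly from the sizes given in the text (512-byte sectors, 64 K segments, 16 M segment groups, 48-bit addresses).

```python
SECTOR_BYTES = 512
MAX_EXTENT_SECTORS = 32768


def split_address(vaddr):
    """Break a 48-bit virtual address into the units named in the text."""
    if not 0 <= vaddr < 1 << 48:
        raise ValueError("virtual addresses are 48 bits")
    return {
        "segment_group": vaddr >> 24,            # left 24 bits
        "offset_in_group": vaddr & 0xFFFFFF,     # right 24 bits
        "segment_id": vaddr >> 16,               # left 4 bytes (SID)
        "sector_in_segment": (vaddr & 0xFFFF) // SECTOR_BYTES,  # 0..127
        "byte_in_sector": vaddr % SECTOR_BYTES,
    }


def is_valid_extent(start_sector, n_sectors):
    """Check the extent rules: power-of-two size, aligned to its own size."""
    power_of_two = n_sectors >= 1 and n_sectors & (n_sectors - 1) == 0
    return (power_of_two
            and n_sectors <= MAX_EXTENT_SECTORS
            and start_sector % n_sectors == 0)


parts = split_address(0x0123456789AB)
print(hex(parts["segment_group"]), hex(parts["segment_id"]))
print(is_valid_extent(8, 8), is_valid_extent(4, 8))
```

For example, an 8-sector extent starting at sector 8 passes the check, while one starting at sector 4 does not, matching the "multiple of 8" rule above.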
1.2.2.1 Paging

Paging refers to the process of transferring pages of storage between disk and main memory. A page (512 bytes) is the smallest unit of storage manipulated by storage management. Paging occurs as the result of two mechanisms: page faults and the "perform paging request" VLIC machine instruction.

1.2.2.1.1 Faults

A page fault occurs when a program refers to a virtual address in a page that is not currently in main memory. Such a reference can occur, for example, when a VLIC instruction is executed that moves data to or from the address, or when an I/O operation reads or writes to the address. The microcode converts the page fault exception to a "perform paging request" instruction and calls the routine that executes that function.

1.2.2.1.2 Perform Paging Request instruction

This is a VLIC instruction that is used by other VLIC routines to request MSM to perform a paging function. Several options may be requested, such as:

- Bring: Read pages into main storage.
- Bring access group: Read pages of an access group into main storage.
- Exchange bring: Replace allocated main storage pages with other pages from disk.
- Clear: Clear pages to zeros.
- Remove: Delete pages, making the memory they occupied available.
- Purge: Write pages to disk.

1.2.2.2 Primary directory

The primary directory maps virtual addresses to the frames of main memory they occupy. (A frame is the physical 512-byte area of memory occupied by a page of storage.) For maximum speed, the directory is "pinned" in main memory, meaning it remains in memory all the time--never being paged to disk. There is one 32-byte entry for each 512-byte frame of physical memory, which implies that one sixteenth of memory (32/512) is occupied by the primary directory. The directory entries appear in the same order as the corresponding memory frames.
Each entry in the primary directory contains the virtual address of the page currently stored in the frame, its disk address, links for the hash chain and pool chains (see below) and various flags.

1.2.2.3 Hash table

During the address translation process, virtual addresses are looked up in the primary directory using a hashing technique. The hardware calculates a hash value from the virtual address by shifting, reversing and exclusive-or'ing various bits of the address, giving an index into the primary directory. Because the number of pages in physical memory is much smaller than the number of pages in the virtual address space, many virtual addresses will hash to the same index; this is called synonym collision. The first page loaded into memory for a given index is placed at the corresponding location in the directory; subsequent synonym pages are placed in different entries and chained to the first entry using link fields in the directory entries.

1.2.2.4 Pools

Separate memory pools may be configured in OS/400 for different subsystems. The pools are implemented by MSM as chains of frames using link fields in the primary directory entries. There are two chains per pool: a search chain used for finding free frames in the pool, and a chain of changed frames that need to be written to disk.
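The hashed lookup with synonym chains described under "Hash table" can be modeled as follows. This is a simplified sketch, not the real mechanism: the class `PrimaryDirectory`, its field names and the toy `_hash` function are hypothetical (the text does not give the actual bit manipulation), and the model keeps a separate `chain_head` table for clarity, whereas the text describes the first page as sitting directly at the hashed directory index.

```python
PAGE_BYTES = 512


class PrimaryDirectory:
    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.page_of_frame = [None] * n_frames   # virtual page held by each frame
        self.next_in_chain = [None] * n_frames   # link field for synonym chain
        self.chain_head = [None] * n_frames      # hash index -> first frame

    def _hash(self, vpage):
        # Toy shift-and-XOR hash; the real bit selection is not documented here.
        return (vpage ^ (vpage >> 7) ^ (vpage >> 13)) % self.n_frames

    def load(self, vaddr, frame):
        """Record that the page containing vaddr now occupies `frame`."""
        if self.page_of_frame[frame] is not None:
            raise ValueError("frame already in use")
        vpage = vaddr // PAGE_BYTES
        index = self._hash(vpage)
        self.page_of_frame[frame] = vpage
        if self.chain_head[index] is None:
            self.chain_head[index] = frame        # first page for this index
            return
        tail = self.chain_head[index]             # synonym: append to the chain
        while self.next_in_chain[tail] is not None:
            tail = self.next_in_chain[tail]
        self.next_in_chain[tail] = frame

    def translate(self, vaddr):
        """Return the real address, or None to signal a page fault."""
        vpage, offset = divmod(vaddr, PAGE_BYTES)
        frame = self.chain_head[self._hash(vpage)]
        while frame is not None:
            if self.page_of_frame[frame] == vpage:
                return frame * PAGE_BYTES + offset
            frame = self.next_in_chain[frame]
        return None


directory = PrimaryDirectory(n_frames=8)
directory.load(1 * PAGE_BYTES, frame=3)
directory.load(9 * PAGE_BYTES, frame=5)   # pages 1 and 9 are synonyms here
print(directory.translate(9 * PAGE_BYTES + 511))
print(directory.translate(2 * PAGE_BYTES))
```

In this model a miss returns `None`, which stands in for the page fault that would trigger a "perform paging request".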