====== Kernel - Huge Page Table ======

Here is a brief summary of **hugetlbpage** support in the Linux kernel.

  * This support is built on top of the multiple page size support that is provided by most modern architectures.
  * For example, the i386 architecture supports 4K and 4M (2M in PAE mode) page sizes, the ia64 architecture supports 4K, 8K, 64K, 256K, 1M, 4M, 16M and 256M page sizes, and ppc64 supports 4K and 16M.
  * A TLB is a cache of virtual-to-physical translations.
  * Typically this is a very scarce resource on a processor.

Operating systems try to make the best use of the limited number of TLB resources.

  * This optimization is more critical now as bigger and bigger physical memories (several GBs) are more readily available.

Users can use the huge page support in the Linux kernel either via the **mmap** system call or via the standard SYSV shared memory system calls (shmget, shmat).
First the Linux kernel needs to be built with the **CONFIG_HUGETLBFS** (present under "File systems") and **CONFIG_HUGETLB_PAGE** (selected automatically when CONFIG_HUGETLBFS is selected) configuration options.
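
A minimal sketch of the **mmap** path, assuming a 2 MB default huge page size and a kernel with MAP_HUGETLB support (2.6.32 or later); the SYSV path works the same way by passing the SHM_HUGETLB flag to shmget:

<code c>
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define LENGTH (2UL * 1024 * 1024)   /* one 2 MB huge page */

int main(void)
{
    /* MAP_HUGETLB asks the kernel to back this mapping with pages from
     * the huge page pool; mmap() fails if no huge page is available. */
    void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    memset(addr, 0, LENGTH);          /* touch the region to fault it in */
    printf("huge page mapped at %p\n", addr);

    munmap(addr, LENGTH);
    return 0;
}
</code>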

The **/proc/meminfo** file provides information about the total number of persistent hugetlb pages in the kernel's huge page pool.

  * It also displays information about the number of free, reserved and surplus huge pages and the default huge page size.
  * The huge page size is needed for generating the proper alignment and size of the arguments to system calls that map huge page regions.

The output of "cat /proc/meminfo" will include lines like:
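
For example, on a system with a 2 MB default huge page size and 20 pages in the pool (the values are illustrative, and the exact set of fields varies with kernel version):

<code>
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:           40960 kB
</code>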

When adjusting the persistent hugepage count via **nr_hugepages_mempolicy**, any memory policy mode (bind, preferred, local or interleave) may be used. The resulting effect on persistent huge page allocation is as follows (a usage sketch follows the two points below):

1) Regardless of mempolicy mode [see [[Linux:Kernel:Memory Policy|numa_memory_policy]]], persistent huge pages will be distributed across the node or nodes specified in the mempolicy as if "interleave" had been specified. However, if a node in the policy does not contain sufficient contiguous memory for a huge page, the allocation will not "fall back" to the nearest neighbor node with sufficient contiguous memory. To do so would cause an undesirable imbalance in the distribution of the huge page pool, or possibly the allocation of persistent huge pages on nodes not allowed by the task's memory policy.

2) One or more nodes may be specified with the bind or interleave policy. If more than one node is specified with the preferred policy, only the lowest numeric id will be used. Local policy will select the node where the task is running at the time the nodes_allowed mask is constructed. For local policy to be deterministic, the task must be bound to a cpu or cpus in a single node; otherwise, the task could be migrated to some other node at any time after launch, and the resulting node will be indeterminate. Thus, local policy is not very useful for this purpose. Any of the other mempolicy modes may be used to specify a single node.
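
As an illustrative sketch (the node IDs and page count are examples), the persistent pool can be grown under a specific mempolicy with numactl, and the resulting per-node distribution inspected through sysfs:

<code bash>
# Grow the pool to 20 huge pages, interleaved across nodes 0 and 1.
# The write must happen in the task that carries the mempolicy, hence sh -c.
numactl --interleave=0,1 sh -c 'echo 20 > /proc/sys/vm/nr_hugepages_mempolicy'

# Inspect the resulting per-node 2 MB pools.
cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
</code>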