====== Kernel - Huge Page Table ======
  
Here is a brief summary of **hugetlbpage** support in the Linux kernel.

  * This support is built on top of the multiple page size support provided by most modern architectures.
  * For example, the i386 architecture supports 4K and 4M (2M in PAE mode) page sizes, the ia64 architecture supports multiple page sizes (4K, 8K, 64K, 256K, 1M, 4M, 16M, 256M) and ppc64 supports 4K and 16M.
  * A TLB is a cache of virtual-to-physical translations.
  * Typically this is a very scarce resource on a processor.

Operating systems try to make the best use of the limited number of TLB resources.

  * This optimization is more critical now that larger and larger physical memories (several GB) are readily available.
  
Users can use the huge page support in the Linux kernel either through the **mmap** system call or through the standard SYSV shared memory system calls (shmget, shmat).
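
As an illustration of the **mmap** route, here is a minimal C sketch that requests an anonymous MAP_HUGETLB mapping.  It assumes a 2 MB default huge page size and at least one free huge page in the pool; mapping a file from a mounted hugetlbfs, or shmget with SHM_HUGETLB, are the equivalent alternatives.

<code c>
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define LENGTH (2UL * 1024 * 1024)   /* one 2 MB huge page (the size is an assumption) */

int main(void)
{
    /* MAP_HUGETLB asks for huge pages; the pool must already hold free pages. */
    void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    ((char *)addr)[0] = 1;           /* touch the page so it is actually faulted in */

    munmap(addr, LENGTH);
    return 0;
}
</code>

If the pool has no free huge pages, the mmap call fails with ENOMEM instead of falling back to normal pages.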
First the Linux kernel needs to be built with the **CONFIG_HUGETLBFS** (present under "File systems") and **CONFIG_HUGETLB_PAGE** (selected automatically when CONFIG_HUGETLBFS is selected) configuration options.
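
In the resulting kernel **.config** these options appear as the following symbols (shown only for orientation; the exact menu location can differ between kernel versions):

<code>
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
</code>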
  
The **/proc/meminfo** file provides information about the total number of persistent hugetlb pages in the kernel's huge page pool.

  * It also displays information about the number of free, reserved and surplus huge pages and the default huge page size.
  * The huge page size is needed for generating the proper alignment and size of the arguments to system calls that map huge page regions.
  
The output of "cat /proc/meminfo" will include lines like:
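
An illustrative excerpt, assuming a 2 MB default huge page size (the counts are placeholders, not values from a real system):

<code>
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
</code>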
This command will try to adjust the number of default sized huge pages in the huge page pool to 20, allocating or freeing huge pages as required.
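
The same adjustment can also be made from a program by writing the desired count to **/proc/sys/vm/nr_hugepages**.  A minimal C sketch, assuming procfs is mounted at /proc and the process runs with root privileges:

<code c>
#include <stdio.h>

int main(void)
{
    /* Ask the kernel for 20 persistent default-sized huge pages. */
    FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "20\n");
    fclose(f);

    /* Re-read the file afterwards to see how many pages were actually allocated. */
    return 0;
}
</code>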
  
On a NUMA platform, the kernel will attempt to distribute the huge page pool over the set of allowed nodes specified by the NUMA memory policy of the task that modifies nr_hugepages.  When the task has the default memory policy, the allowed nodes are all on-line nodes with memory.  Allowed nodes with insufficient available, contiguous memory for a huge page will be silently skipped when allocating persistent huge pages.  See the discussion below of the interaction of task memory policy, [[Linux:Kernel:CPU Sets|cpusets]] and per-node attributes with the allocation and freeing of persistent huge pages.
  
The success or failure of huge page allocation depends on the amount of physically contiguous memory that is present in the system at the time of the allocation attempt.  If the kernel is unable to allocate huge pages from some nodes in a NUMA system, it will attempt to make up the difference by allocating extra pages on other nodes with sufficient available contiguous memory, if any.
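
To see how the pool actually ended up distributed across nodes, the per-node attributes under sysfs can be read.  A small C sketch, assuming a 2 MB default huge page size (hence the hugepages-2048kB directory name) and node directories numbered consecutively from node0:

<code c>
#include <stdio.h>

int main(void)
{
    char path[128];

    /* Walk node0, node1, ... and stop at the first node directory that is missing. */
    for (int node = 0; ; node++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/node/node%d/hugepages/hugepages-2048kB/nr_hugepages",
                 node);

        FILE *f = fopen(path, "r");
        if (f == NULL)
            break;

        int count = 0;
        if (fscanf(f, "%d", &count) == 1)
            printf("node%d: %d huge pages\n", node, count);
        fclose(f);
    }
    return 0;
}
</code>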
When adjusting the persistent hugepage count via **nr_hugepages_mempolicy**, any memory policy mode--bind, preferred, local or interleave--may be used.  The resulting effect on persistent huge page allocation is described in the numbered points below; a short code sketch follows them.
  
1) Regardless of mempolicy mode [see [[Linux:Kernel:Memory Policy|numa_memory_policy]]], persistent huge pages will be distributed across the node or nodes specified in the mempolicy as if "interleave" had been specified.  However, if a node in the policy does not contain sufficient contiguous memory for a huge page, the allocation will not fall back to the nearest neighbor node with sufficient contiguous memory.  Doing so would cause an undesirable imbalance in the distribution of the huge page pool, or possibly the allocation of persistent huge pages on nodes not allowed by the task's memory policy.
  
2) One or more nodes may be specified with the bind or interleave policy.  If more than one node is specified with the preferred policy, only the lowest numeric id will be used.  Local policy will select the node where the task is running at the time the nodes_allowed mask is constructed.  For local policy to be deterministic, the task must be bound to a cpu or cpus in a single node.  Otherwise, the task could be migrated to some other node at any time after launch and the resulting node will be indeterminate.  Thus, local policy is not very useful for this purpose.  Any of the other mempolicy modes may be used to specify a single node.
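
As a rough end-to-end sketch of the mechanism, the following C program installs an interleave policy over nodes 0 and 1 (an arbitrary choice; use nodes that exist on the machine) with set_mempolicy(2), then writes the target count to **nr_hugepages_mempolicy**.  It needs the libnuma headers (numaif.h), must be linked with -lnuma, and only works as root on a kernel built with NUMA support:

<code c>
#include <stdio.h>
#include <numaif.h>   /* set_mempolicy(), MPOL_INTERLEAVE; link with -lnuma */

int main(void)
{
    /* Interleave over nodes 0 and 1 -- adjust the mask to nodes that exist. */
    unsigned long nodemask = (1UL << 0) | (1UL << 1);

    if (set_mempolicy(MPOL_INTERLEAVE, &nodemask, 8 * sizeof(nodemask)) != 0) {
        perror("set_mempolicy");
        return 1;
    }

    /* With the policy in place, the write below spreads the persistent pool
       over the selected nodes as if "interleave" had been specified. */
    FILE *f = fopen("/proc/sys/vm/nr_hugepages_mempolicy", "w");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "20\n");
    fclose(f);
    return 0;
}
</code>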