mirror of
				https://github.com/torvalds/linux.git
				synced 2025-11-04 10:40:15 +02:00 
			
		
		
		
	
	
		
			1133 commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| 
							 | 
						e46e7b77c9 | 
							
							
								
								mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
							
							
							
							
							
							
							
							The optimistic fast path may use cpuset_current_mems_allowed instead of
of a NULL nodemask supplied by the caller for cpuset allocations.  The
preferred zone is calculated on this basis for statistic purposes and as
a starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic
or being able to ignore watermarks then the starting point in the
zonelist iterator is no longer correct.  This patch resets the zonelist
iterator in the allocator slowpath if the context can ignore memory
policies.  This will alter the zone used for statistics but only after
it is known that it makes sense for that context.  Resetting it before
entering the slowpath would potentially allow an ALLOC_CPUSET allocation
to be accounted for against the wrong zone.  Note that while nodemask is
not explicitly set to the original nodemask, it would only have been
overwritten if cpuset_enabled() and it was reset before the slowpath was
entered.
Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
Fixes: 
							
						 | 
						
							||
| 
							 | 
						0d0bd89435 | 
							
							
								
								mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
							
							
							
							
							
							
							
							Geert Uytterhoeven reported the following problem that bisected to commit  | 
						
							||
| 
							 | 
						83b9355bf6 | 
							
							
								
								mm, page_alloc: prevent infinite loop in buffered_rmqueue()
							
							
							
							
							
							
							
							In DEBUG_VM kernel, we can hit infinite loop for order == 0 in
buffered_rmqueue() when check_new_pcp() returns 1, because the bad page
is never removed from the pcp list.  Fix this by removing the page
before retrying.  Also we don't need to check if page is non-NULL,
because we simply grab it from the list which was just tested for being
non-empty.
Fixes: 
							
						 | 
						
							||
| 
							 | 
						f86e427197 | 
							
							
								
								mm: check the return value of lookup_page_ext for all call sites
							
							
							
							
							
							
							
							Per the discussion with Joonsoo Kim [1], we need check the return value of lookup_page_ext() for all call sites since it might return NULL in some cases, although it is unlikely, i.e. memory hotplug. Tested with ltp with "page_owner=0". [1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE [akpm@linux-foundation.org: fix build-breaking typos] [arnd@arndb.de: fix build problems from lookup_page_ext] Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						e570f56ccc | 
							
							
								
								mm: check_new_page_bad() directly returns in __PG_HWPOISON case
							
							
							
							
							
							
							
							Currently we check page->flags twice for "HWPoisoned" case of check_new_page_bad(), which can cause a race with unpoisoning. This race unnecessarily taints kernel with "BUG: Bad page state". check_new_page_bad() is the only caller of bad_page() which is interested in __PG_HWPOISON, so let's move the hwpoison related code in bad_page() to it. Link: http://lkml.kernel.org/r/20160518100949.GA17299@hori1.linux.bs1.fc.nec.co.jp Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						29b52de182 | 
							
							
								
								mm, kasan: fix to call kasan_free_pages() after poisoning page
							
							
							
							
							
							
							
							When CONFIG_PAGE_POISONING and CONFIG_KASAN is enabled,
free_pages_prepare()'s codeflow is below.
  1)kmemcheck_free_shadow()
  2)kasan_free_pages()
    - set shadow byte of page is freed
  3)kernel_poison_pages()
  3.1) check access to page is valid or not using kasan
    ---> error occur, kasan think it is invalid access
  3.2) poison page
  4)kernel_map_pages()
So kasan_free_pages() should be called after poisoning the page.
Link: http://lkml.kernel.org/r/1463220405-7455-1-git-send-email-iamyooon@gmail.com
Signed-off-by: seokhoon.yoon <iamyooon@gmail.com>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						4b50bcc7ed | 
							
							
								
								mm: use phys_addr_t for reserve_bootmem_region() arguments
							
							
							
							
							
							
							
							Since commit  | 
						
							||
| 
							 | 
						2a138dc7e5 | 
							
							
								
								mm: use existing helper to convert "on"/"off" to boolean
							
							
							
							
							
							
							
							It's more convenient to use existing function helper to convert string "on/off" to boolean. Link: http://lkml.kernel.org/r/1461908824-16129-1-git-send-email-mnghuan@gmail.com Signed-off-by: Minfei Huang <mnghuan@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						31e49bfda1 | 
							
							
								
								mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION
							
							
							
							
							
							
							
							Joonsoo has reported that he is able to trigger OOM for !costly high order requests (heavy fork() workload close the OOM) with the new oom detection rework. This is because we rely only on should_reclaim_retry when the compaction is disabled and it only checks watermarks for the requested order and so we might trigger OOM when there is a lot of free memory. It is not very clear what are the usual workloads when the compaction is disabled. Relying on high order allocations heavily without any mechanism to create those orders except for unbound amount of reclaim is certainly not a good idea. To prevent from potential regressions let's help this configuration some. We have to sacrifice the determinsm though because there simply is none here possible. should_compact_retry implementation for !CONFIG_COMPACTION, which was empty so far, will do watermark check for order-0 on all eligible zones. This will cause retrying until either the reclaim cannot make any further progress or all the zones are depleted even for order-0 pages. This means that the number of retries is basically unbounded for !costly orders but that was the case before the rework as well so this shouldn't regress. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1463051677-29418-3-git-send-email-mhocko@kernel.org Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						86a294a81f | 
							
							
								
								mm, oom, compaction: prevent from should_compact_retry looping for ever for costly orders
							
							
							
							
							
							
							
							"mm: consider compaction feedback also for costly allocation" has
removed the upper bound for the reclaim/compaction retries based on the
number of reclaimed pages for costly orders.  While this is desirable
the patch did miss a mis interaction between reclaim, compaction and the
retry logic.  The direct reclaim tries to get zones over min watermark
while compaction backs off and returns COMPACT_SKIPPED when all zones
are below low watermark + 1<<order gap.  If we are getting really close
to OOM then __compaction_suitable can keep returning COMPACT_SKIPPED a
high order request (e.g.  hugetlb order-9) while the reclaim is not able
to release enough pages to get us over low watermark.  The reclaim is
still able to make some progress (usually trashing over few remaining
pages) so we are not able to break out from the loop.
I have seen this happening with the same test described in "mm: consider
compaction feedback also for costly allocation" on a swapless system.
The original problem got resolved by "vmscan: consider classzone_idx in
compaction_ready" but it shows how things might go wrong when we
approach the oom event horizont.
The reason why compaction requires being over low rather than min
watermark is not clear to me.  This check was there essentially since
							
						 | 
						
							||
| 
							 | 
						7854ea6c28 | 
							
							
								
								mm: consider compaction feedback also for costly allocation
							
							
							
							
							
							
							
							PAGE_ALLOC_COSTLY_ORDER retry logic is mostly handled inside
should_reclaim_retry currently where we decide to not retry after at
least order worth of pages were reclaimed or the watermark check for at
least one zone would succeed after reclaiming all pages if the reclaim
hasn't made any progress.  Compaction feedback is mostly ignored and we
just try to make sure that the compaction did at least something before
giving up.
The first condition was added by 
							
						 | 
						
							||
| 
							 | 
						33c2d21438 | 
							
							
								
								mm, oom: protect !costly allocations some more
							
							
							
							
							
							
							
							should_reclaim_retry will give up retries for higher order allocations if none of the eligible zones has any requested or higher order pages available even if we pass the watermak check for order-0. This is done because there is no guarantee that the reclaimable and currently free pages will form the required order. This can, however, lead to situations where the high-order request (e.g. order-2 required for the stack allocation during fork) will trigger OOM too early - e.g. after the first reclaim/compaction round. Such a system would have to be highly fragmented and there is no guarantee further reclaim/compaction attempts would help but at least make sure that the compaction was active before we go OOM and keep retrying even if should_reclaim_retry tells us to oom if - the last compaction round backed off or - we haven't completed at least MAX_COMPACT_RETRIES active compaction rounds. The first rule ensures that the very last attempt for compaction was not ignored while the second guarantees that the compaction has done some work. Multiple retries might be needed to prevent occasional pigggy backing of other contexts to steal the compacted pages before the current context manages to retry to allocate them. compaction_failed() is taken as a final word from the compaction that the retry doesn't make much sense. We have to be careful though because the first compaction round is MIGRATE_ASYNC which is rather weak as it ignores pages under writeback and gives up too easily in other situations. We therefore have to make sure that MIGRATE_SYNC_LIGHT mode has been used before we give up. With this logic in place we do not have to increase the migration mode unconditionally and rather do it only if the compaction failed for the weaker mode. A nice side effect is that the stronger migration mode is used only when really needed so this has a potential of smaller latencies in some cases. Please note that the compaction doesn't tell us much about how successful it was when returning compaction_made_progress so we just have to blindly trust that another retry is worthwhile and cap the number to something reasonable to guarantee a convergence. If the given number of successful retries is not sufficient for a reasonable workloads we should focus on the collected compaction tracepoints data and try to address the issue in the compaction code. If this is not feasible we can increase the retries limit. [mhocko@suse.com: fix warning] Link: http://lkml.kernel.org/r/20160512061636.GA4200@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						ede3771373 | 
							
							
								
								mm: throttle on IO only when there are too many dirty and writeback pages
							
							
							
							
							
							
							
							wait_iff_congested has been used to throttle allocator before it retried another round of direct reclaim to allow the writeback to make some progress and prevent reclaim from looping over dirty/writeback pages without making any progress. We used to do congestion_wait before commit  | 
						
							||
| 
							 | 
						0a0337e0d1 | 
							
							
								
								mm, oom: rework oom detection
							
							
							
							
							
							
							
							__alloc_pages_slowpath has traditionally relied on the direct reclaim and did_some_progress as an indicator that it makes sense to retry allocation rather than declaring OOM. shrink_zones had to rely on zone_reclaimable if shrink_zone didn't make any progress to prevent from a premature OOM killer invocation - the LRU might be full of dirty or writeback pages and direct reclaim cannot clean those up. zone_reclaimable allows to rescan the reclaimable lists several times and restart if a page is freed. This is really subtle behavior and it might lead to a livelock when a single freed page keeps allocator looping but the current task will not be able to allocate that single page. OOM killer would be more appropriate than looping without any progress for unbounded amount of time. This patch changes OOM detection logic and pulls it out from shrink_zone which is too low to be appropriate for any high level decisions such as OOM which is per zonelist property. It is __alloc_pages_slowpath which knows how many attempts have been done and what was the progress so far therefore it is more appropriate to implement this logic. The new heuristic is implemented in should_reclaim_retry helper called from __alloc_pages_slowpath. It tries to be more deterministic and easier to follow. It builds on an assumption that retrying makes sense only if the currently reclaimable memory + free pages would allow the current allocation request to succeed (as per __zone_watermark_ok) at least for one zone in the usable zonelist. This alone wouldn't be sufficient, though, because the writeback might get stuck and reclaimable pages might be pinned for a really long time or even depend on the current allocation context. Therefore there is a backoff mechanism implemented which reduces the reclaim target after each reclaim round without any progress. This means that we should eventually converge to only NR_FREE_PAGES as the target and fail on the wmark check and proceed to OOM. The backoff is simple and linear with 1/16 of the reclaimable pages for each round without any progress. We are optimistic and reset counter for successful reclaim rounds. Costly high order pages mostly preserve their semantic and those without __GFP_REPEAT fail right away while those which have the flag set will back off after the amount of reclaimable pages reaches equivalent of the requested order. The only difference is that if there was no progress during the reclaim we rely on zone watermark check. This is more logical thing to do than previous 1<<order attempts which were a result of zone_reclaimable faking the progress. [vdavydov@virtuozzo.com: check classzone_idx for shrink_zone] [hannes@cmpxchg.org: separate the heuristic into should_reclaim_retry] [rientjes@google.com: use zone_page_state_snapshot for NR_FREE_PAGES] [rientjes@google.com: shrink_zones doesn't need to return anything] Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						c5d01d0d18 | 
							
							
								
								mm, compaction: simplify __alloc_pages_direct_compact feedback interface
							
							
							
							
							
							
							
							__alloc_pages_direct_compact communicates potential back off by two variables: - deferred_compaction tells that the compaction returned COMPACT_DEFERRED - contended_compaction is set when there is a contention on zone->lock resp. zone->lru_lock locks __alloc_pages_slowpath then backs of for THP allocation requests to prevent from long stalls. This is rather messy and it would be much cleaner to return a single compact result value and hide all the nasty details into __alloc_pages_direct_compact. This patch shouldn't introduce any functional changes. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						ea7ab982b6 | 
							
							
								
								mm, compaction: change COMPACT_ constants into enum
							
							
							
							
							
							
							
							Compaction code is doing weird dances between COMPACT_FOO -> int -> unsigned long But there doesn't seem to be any reason for that. All functions which return/use one of those constants are not expecting any other value so it really makes sense to define an enum for them and make it clear that no other values are expected. This is a pure cleanup and shouldn't introduce any functional changes. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						59dc76b0d4 | 
							
							
								
								mm: vmscan: reduce size of inactive file list
							
							
							
							
							
							
							
							The inactive file list should still be large enough to contain readahead windows and freshly written file data, but it no longer is the only source for detecting multiple accesses to file pages. The workingset refault measurement code causes recently evicted file pages that get accessed again after a shorter interval to be promoted directly to the active list. With that mechanism in place, we can afford to (on a larger system) dedicate more memory to the active file list, so we can actually cache more of the frequently used file pages in memory, and not have them pushed out by streaming writes, once-used streaming file reads, etc. This can help things like database workloads, where only half the page cache can currently be used to cache the database working set. This patch automatically increases that fraction on larger systems, using the same ratio that has already been used for anonymous memory. [hannes@cmpxchg.org: cgroup-awareness] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						4741526b83 | 
							
							
								
								mm, page_alloc: restore the original nodemask if the fast path allocation failed
							
							
							
							
							
							
							
							The page allocator fast path uses either the requested nodemask or cpuset_current_mems_allowed if cpusets are enabled. If the allocation context allows watermarks to be ignored then it can also ignore memory policies. However, on entering the allocator slowpath the nodemask may still be cpuset_current_mems_allowed and the policies are enforced. This patch resets the nodemask appropriately before entering the slowpath. Link: http://lkml.kernel.org/r/20160504143628.GU2858@techsingularity.net Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						4e6118016e | 
							
							
								
								mm, page_alloc: uninline the bad page part of check_new_page()
							
							
							
							
							
							
							
							Bad pages should be rare so the code handling them doesn't need to be inline for performance reasons. Put it to separate function which returns void. This also assumes that the initial page_expected_state() result will match the result of the thorough check, i.e. the page doesn't become "good" in the meanwhile. This matches the same expectations already in place in free_pages_check(). !DEBUG_VM bloat-o-meter: add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140) function old new delta check_new_page_bad - 134 +134 get_page_from_freelist 3468 3194 -274 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						e2769dbdc5 | 
							
							
								
								mm, page_alloc: don't duplicate code in free_pcp_prepare
							
							
							
							
							
							
							
							The new free_pcp_prepare() function shares a lot of code with free_pages_prepare(), which makes this a maintenance risk when some future patch modifies only one of them. We should be able to achieve the same effect (skipping free_pages_check() from !DEBUG_VM configs) by adding a parameter to free_pages_prepare() and making it inline, so the checks (and the order != 0 parts) are eliminated from the call from free_pcp_prepare(). !DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already inlining free_pages_prepare() and the elimination seems to work as expected DEBUG_VM bloat-o-meter: add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257) function old new delta __free_pages_ok 297 1060 +763 free_hot_cold_page 480 752 +272 free_pages_prepare 778 - -778 Here inlining didn't occur before, and added some code, but it's ok for a debug option. [akpm@linux-foundation.org: fix build] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						479f854a20 | 
							
							
								
								mm, page_alloc: defer debugging checks of pages allocated from the PCP
							
							
							
							
							
							
							
							Every page allocated checks a number of page fields for validity.  This
catches corruption bugs of pages that are already freed but it is
expensive.  This patch weakens the debugging check by checking PCP pages
only when the PCP lists are being refilled.  All compound pages are
checked.  This potentially avoids debugging checks entirely if the PCP
lists are never emptied and refilled so some corruption issues may be
missed.  Full checking requires DEBUG_VM.
With the two deferred debugging patches applied, the impact to a page
allocator microbenchmark is
                                             4.6.0-rc3                  4.6.0-rc3
                                           inline-v3r6            deferalloc-v3r7
  Min      alloc-odr0-1               344.00 (  0.00%)           317.00 (  7.85%)
  Min      alloc-odr0-2               248.00 (  0.00%)           231.00 (  6.85%)
  Min      alloc-odr0-4               209.00 (  0.00%)           192.00 (  8.13%)
  Min      alloc-odr0-8               181.00 (  0.00%)           166.00 (  8.29%)
  Min      alloc-odr0-16              168.00 (  0.00%)           154.00 (  8.33%)
  Min      alloc-odr0-32              161.00 (  0.00%)           148.00 (  8.07%)
  Min      alloc-odr0-64              158.00 (  0.00%)           145.00 (  8.23%)
  Min      alloc-odr0-128             156.00 (  0.00%)           143.00 (  8.33%)
  Min      alloc-odr0-256             168.00 (  0.00%)           154.00 (  8.33%)
  Min      alloc-odr0-512             178.00 (  0.00%)           167.00 (  6.18%)
  Min      alloc-odr0-1024            186.00 (  0.00%)           174.00 (  6.45%)
  Min      alloc-odr0-2048            192.00 (  0.00%)           180.00 (  6.25%)
  Min      alloc-odr0-4096            198.00 (  0.00%)           184.00 (  7.07%)
  Min      alloc-odr0-8192            200.00 (  0.00%)           188.00 (  6.00%)
  Min      alloc-odr0-16384           201.00 (  0.00%)           188.00 (  6.47%)
  Min      free-odr0-1                189.00 (  0.00%)           180.00 (  4.76%)
  Min      free-odr0-2                132.00 (  0.00%)           126.00 (  4.55%)
  Min      free-odr0-4                104.00 (  0.00%)            99.00 (  4.81%)
  Min      free-odr0-8                 90.00 (  0.00%)            85.00 (  5.56%)
  Min      free-odr0-16                84.00 (  0.00%)            80.00 (  4.76%)
  Min      free-odr0-32                80.00 (  0.00%)            76.00 (  5.00%)
  Min      free-odr0-64                78.00 (  0.00%)            74.00 (  5.13%)
  Min      free-odr0-128               77.00 (  0.00%)            73.00 (  5.19%)
  Min      free-odr0-256               94.00 (  0.00%)            91.00 (  3.19%)
  Min      free-odr0-512              108.00 (  0.00%)           112.00 ( -3.70%)
  Min      free-odr0-1024             115.00 (  0.00%)           118.00 ( -2.61%)
  Min      free-odr0-2048             120.00 (  0.00%)           125.00 ( -4.17%)
  Min      free-odr0-4096             123.00 (  0.00%)           129.00 ( -4.88%)
  Min      free-odr0-8192             126.00 (  0.00%)           130.00 ( -3.17%)
  Min      free-odr0-16384            126.00 (  0.00%)           131.00 ( -3.97%)
Note that the free paths for large numbers of pages is impacted as the
debugging cost gets shifted into that path when the page data is no
longer necessarily cache-hot.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						4db7548ccb | 
							
							
								
								mm, page_alloc: defer debugging checks of freed pages until a PCP drain
							
							
							
							
							
							
							
							Every page free checks a number of page fields for validity. This catches premature frees and corruptions but it is also expensive. This patch weakens the debugging check by checking PCP pages at the time they are drained from the PCP list. This will trigger the bug but the site that freed the corrupt page will be lost. To get the full context, a kernel rebuild with DEBUG_VM is necessary. [akpm@linux-foundation.org: fix build] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						002f290627 | 
							
							
								
								cpuset: use static key better and convert to new API
							
							
							
							
							
							
							
							An important function for cpusets is cpuset_node_allowed(), which
optimizes on the fact if there's a single root CPU set, it must be
trivially allowed.  But the check "nr_cpusets() <= 1" doesn't use the
cpusets_enabled_key static key the right way where static keys eliminate
branching overhead with jump labels.
This patch converts it so that static key is used properly.  It's also
switched to the new static key API and the checking functions are
converted to return bool instead of int.  We also provide a new variant
__cpuset_zone_allowed() which expects that the static key check was
already done and they key was enabled.  This is needed for
get_page_from_freelist() where we want to also avoid the relatively
slower check when ALLOC_CPUSET is not set in alloc_flags.
The impact on the page allocator microbenchmark is less than expected
but the cleanup in itself is worthwhile.
                                             4.6.0-rc2                  4.6.0-rc2
                                       multcheck-v1r20               cpuset-v1r20
  Min      alloc-odr0-1               348.00 (  0.00%)           348.00 (  0.00%)
  Min      alloc-odr0-2               254.00 (  0.00%)           254.00 (  0.00%)
  Min      alloc-odr0-4               213.00 (  0.00%)           213.00 (  0.00%)
  Min      alloc-odr0-8               186.00 (  0.00%)           183.00 (  1.61%)
  Min      alloc-odr0-16              173.00 (  0.00%)           171.00 (  1.16%)
  Min      alloc-odr0-32              166.00 (  0.00%)           163.00 (  1.81%)
  Min      alloc-odr0-64              162.00 (  0.00%)           159.00 (  1.85%)
  Min      alloc-odr0-128             160.00 (  0.00%)           157.00 (  1.88%)
  Min      alloc-odr0-256             169.00 (  0.00%)           166.00 (  1.78%)
  Min      alloc-odr0-512             180.00 (  0.00%)           180.00 (  0.00%)
  Min      alloc-odr0-1024            188.00 (  0.00%)           187.00 (  0.53%)
  Min      alloc-odr0-2048            194.00 (  0.00%)           193.00 (  0.52%)
  Min      alloc-odr0-4096            199.00 (  0.00%)           198.00 (  0.50%)
  Min      alloc-odr0-8192            202.00 (  0.00%)           201.00 (  0.50%)
  Min      alloc-odr0-16384           203.00 (  0.00%)           202.00 (  0.49%)
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						0b423ca22f | 
							
							
								
								mm, page_alloc: inline pageblock lookup in page free fast paths
							
							
							
							
							
							
							
							The function call overhead of get_pfnblock_flags_mask() is measurable in the page free paths. This patch uses an inlined version that is faster. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						e5b31ac2ca | 
							
							
								
								mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
							
							
							
							
							
							
							
							The original count is never reused so it can be removed. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						da838d4fcb | 
							
							
								
								mm, page_alloc: pull out side effects from free_pages_check
							
							
							
							
							
							
							
							Check without side-effects should be easier to maintain. It also removes the duplicated cpupid and flags reset done in !DEBUG_VM variant of both free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it enables the next patch. It shouldn't result in new branches, thanks to inlining of the check. !DEBUG_VM bloat-o-meter: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27) function old new delta __free_pages_ok 748 739 -9 free_pcppages_bulk 1403 1385 -18 DEBUG_VM: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28) function old new delta free_pages_prepare 806 778 -28 This is also slightly faster because cpupid information is not set on tail pages so we can avoid resets there. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						bb552ac6c6 | 
							
							
								
								mm, page_alloc: un-inline the bad part of free_pages_check
							
							
							
							
							
							
							
							From: Vlastimil Babka <vbabka@suse.cz> !DEBUG_VM size and bloat-o-meter: add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246) function old new delta free_pages_check_bad - 124 +124 free_pcppages_bulk 1288 1171 -117 __free_pages_ok 948 695 -253 DEBUG_VM: add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90) function old new delta free_pages_check_bad - 124 +124 free_pages_prepare 1112 898 -214 [akpm@linux-foundation.org: fix whitespace] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						7bfec6f47b | 
							
							
								
								mm, page_alloc: check multiple page fields with a single branch
							
							
							
							
							
							
							
							Every page allocated or freed is checked for sanity to avoid corruptions
that are difficult to detect later.  A bad page could be due to a number
of fields.  Instead of using multiple branches, this patch combines
multiple fields into a single branch.  A detailed check is only
necessary if that check fails.
                                             4.6.0-rc2                  4.6.0-rc2
                                        initonce-v1r20            multcheck-v1r20
  Min      alloc-odr0-1               359.00 (  0.00%)           348.00 (  3.06%)
  Min      alloc-odr0-2               260.00 (  0.00%)           254.00 (  2.31%)
  Min      alloc-odr0-4               214.00 (  0.00%)           213.00 (  0.47%)
  Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
  Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
  Min      alloc-odr0-32              165.00 (  0.00%)           166.00 ( -0.61%)
  Min      alloc-odr0-64              162.00 (  0.00%)           162.00 (  0.00%)
  Min      alloc-odr0-128             161.00 (  0.00%)           160.00 (  0.62%)
  Min      alloc-odr0-256             170.00 (  0.00%)           169.00 (  0.59%)
  Min      alloc-odr0-512             181.00 (  0.00%)           180.00 (  0.55%)
  Min      alloc-odr0-1024            190.00 (  0.00%)           188.00 (  1.05%)
  Min      alloc-odr0-2048            196.00 (  0.00%)           194.00 (  1.02%)
  Min      alloc-odr0-4096            202.00 (  0.00%)           199.00 (  1.49%)
  Min      alloc-odr0-8192            205.00 (  0.00%)           202.00 (  1.46%)
  Min      alloc-odr0-16384           205.00 (  0.00%)           203.00 (  0.98%)
Again, the benefit is marginal but avoiding excessive branches is
important.  Ideally the paths would not have to check these conditions
at all but regrettably abandoning the tests would make use-after-free
bugs much harder to detect.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						93ea9964d1 | 
							
							
								
								mm, page_alloc: remove field from alloc_context
							
							
							
							
							
							
							
							The classzone_idx can be inferred from preferred_zoneref so remove the unnecessary field and save stack space. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						c33d6c06f6 | 
							
							
								
								mm, page_alloc: avoid looking up the first zone in a zonelist twice
							
							
							
							
							
							
							
							The allocator fast path looks up the first usable zone in a zonelist and
then get_page_from_freelist does the same job in the zonelist iterator.
This patch preserves the necessary information.
                                             4.6.0-rc2                  4.6.0-rc2
                                        fastmark-v1r20             initonce-v1r20
  Min      alloc-odr0-1               364.00 (  0.00%)           359.00 (  1.37%)
  Min      alloc-odr0-2               262.00 (  0.00%)           260.00 (  0.76%)
  Min      alloc-odr0-4               214.00 (  0.00%)           214.00 (  0.00%)
  Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
  Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
  Min      alloc-odr0-32              165.00 (  0.00%)           165.00 (  0.00%)
  Min      alloc-odr0-64              161.00 (  0.00%)           162.00 ( -0.62%)
  Min      alloc-odr0-128             159.00 (  0.00%)           161.00 ( -1.26%)
  Min      alloc-odr0-256             168.00 (  0.00%)           170.00 ( -1.19%)
  Min      alloc-odr0-512             180.00 (  0.00%)           181.00 ( -0.56%)
  Min      alloc-odr0-1024            190.00 (  0.00%)           190.00 (  0.00%)
  Min      alloc-odr0-2048            196.00 (  0.00%)           196.00 (  0.00%)
  Min      alloc-odr0-4096            202.00 (  0.00%)           202.00 (  0.00%)
  Min      alloc-odr0-8192            206.00 (  0.00%)           205.00 (  0.49%)
  Min      alloc-odr0-16384           206.00 (  0.00%)           205.00 (  0.49%)
The benefit is negligible and the results are within the noise but each
cycle counts.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						48ee5f3696 | 
							
							
								
								mm, page_alloc: shortcut watermark checks for order-0 pages
							
							
							
							
							
							
							
							Watermarks have to be checked on every allocation including the number
of pages being allocated and whether reserves can be accessed.  The
reserves only matter if memory is limited and the free_pages adjustment
only applies to high-order pages.  This patch adds a shortcut for
order-0 pages that avoids numerous calculations if there is plenty of
free memory yielding the following performance difference in a page
allocator microbenchmark;
                                             4.6.0-rc2                  4.6.0-rc2
                                         optfair-v1r20             fastmark-v1r20
  Min      alloc-odr0-1               380.00 (  0.00%)           364.00 (  4.21%)
  Min      alloc-odr0-2               273.00 (  0.00%)           262.00 (  4.03%)
  Min      alloc-odr0-4               227.00 (  0.00%)           214.00 (  5.73%)
  Min      alloc-odr0-8               196.00 (  0.00%)           186.00 (  5.10%)
  Min      alloc-odr0-16              183.00 (  0.00%)           173.00 (  5.46%)
  Min      alloc-odr0-32              173.00 (  0.00%)           165.00 (  4.62%)
  Min      alloc-odr0-64              169.00 (  0.00%)           161.00 (  4.73%)
  Min      alloc-odr0-128             169.00 (  0.00%)           159.00 (  5.92%)
  Min      alloc-odr0-256             180.00 (  0.00%)           168.00 (  6.67%)
  Min      alloc-odr0-512             190.00 (  0.00%)           180.00 (  5.26%)
  Min      alloc-odr0-1024            198.00 (  0.00%)           190.00 (  4.04%)
  Min      alloc-odr0-2048            204.00 (  0.00%)           196.00 (  3.92%)
  Min      alloc-odr0-4096            209.00 (  0.00%)           202.00 (  3.35%)
  Min      alloc-odr0-8192            213.00 (  0.00%)           206.00 (  3.29%)
  Min      alloc-odr0-16384           214.00 (  0.00%)           206.00 (  3.74%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						305347550b | 
							
							
								
								mm, page_alloc: reduce cost of fair zone allocation policy retry
							
							
							
							
							
							
							
							The fair zone allocation policy is not without cost but it can be
reduced slightly.  This patch removes an unnecessary local variable,
checks the likely conditions of the fair zone policy first, uses a bool
instead of a flags check and falls through when a remote node is
encountered instead of doing a full restart.  The benefit is marginal
but it's there
                                             4.6.0-rc2                  4.6.0-rc2
                                         decstat-v1r20              optfair-v1r20
  Min      alloc-odr0-1               377.00 (  0.00%)           380.00 ( -0.80%)
  Min      alloc-odr0-2               273.00 (  0.00%)           273.00 (  0.00%)
  Min      alloc-odr0-4               226.00 (  0.00%)           227.00 ( -0.44%)
  Min      alloc-odr0-8               196.00 (  0.00%)           196.00 (  0.00%)
  Min      alloc-odr0-16              183.00 (  0.00%)           183.00 (  0.00%)
  Min      alloc-odr0-32              175.00 (  0.00%)           173.00 (  1.14%)
  Min      alloc-odr0-64              172.00 (  0.00%)           169.00 (  1.74%)
  Min      alloc-odr0-128             170.00 (  0.00%)           169.00 (  0.59%)
  Min      alloc-odr0-256             183.00 (  0.00%)           180.00 (  1.64%)
  Min      alloc-odr0-512             191.00 (  0.00%)           190.00 (  0.52%)
  Min      alloc-odr0-1024            199.00 (  0.00%)           198.00 (  0.50%)
  Min      alloc-odr0-2048            204.00 (  0.00%)           204.00 (  0.00%)
  Min      alloc-odr0-4096            210.00 (  0.00%)           209.00 (  0.48%)
  Min      alloc-odr0-8192            213.00 (  0.00%)           213.00 (  0.00%)
  Min      alloc-odr0-16384           214.00 (  0.00%)           214.00 (  0.00%)
The benefit is marginal at best but one of the most important benefits,
avoiding a second search when falling back to another node is not
triggered by this particular test so the benefit for some corner cases
is understated.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						4fcb097117 | 
							
							
								
								mm, page_alloc: shorten the page allocator fast path
							
							
							
							
							
							
							
							The page allocator fast path checks page multiple times unnecessarily. This patch avoids all the slowpath checks if the first allocation attempt succeeds. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						3777999dd4 | 
							
							
								
								mm, page_alloc: check once if a zone has isolated pageblocks
							
							
							
							
							
							
							
							When bulk freeing pages from the per-cpu lists the zone is checked for isolated pageblocks on every release. This patch checks it once per drain. [mgorman@techsingularity.net: fix locking radce, per Vlastimil] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						83d4ca8148 | 
							
							
								
								mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
							
							
							
							
							
							
							
							__GFP_HARDWALL only has meaning in the context of cpusets but the fast
path always applies the flag on the first attempt.  Move the
manipulations into the cpuset paths where they will be masked by a
static branch in the common case.
With the other micro-optimisations in this series combined, the impact
on a page allocator microbenchmark is
                                             4.6.0-rc2                  4.6.0-rc2
                                         decstat-v1r20                micro-v1r20
  Min      alloc-odr0-1               381.00 (  0.00%)           377.00 (  1.05%)
  Min      alloc-odr0-2               275.00 (  0.00%)           273.00 (  0.73%)
  Min      alloc-odr0-4               229.00 (  0.00%)           226.00 (  1.31%)
  Min      alloc-odr0-8               199.00 (  0.00%)           196.00 (  1.51%)
  Min      alloc-odr0-16              186.00 (  0.00%)           183.00 (  1.61%)
  Min      alloc-odr0-32              179.00 (  0.00%)           175.00 (  2.23%)
  Min      alloc-odr0-64              174.00 (  0.00%)           172.00 (  1.15%)
  Min      alloc-odr0-128             172.00 (  0.00%)           170.00 (  1.16%)
  Min      alloc-odr0-256             181.00 (  0.00%)           183.00 ( -1.10%)
  Min      alloc-odr0-512             193.00 (  0.00%)           191.00 (  1.04%)
  Min      alloc-odr0-1024            201.00 (  0.00%)           199.00 (  1.00%)
  Min      alloc-odr0-2048            206.00 (  0.00%)           204.00 (  0.97%)
  Min      alloc-odr0-4096            212.00 (  0.00%)           210.00 (  0.94%)
  Min      alloc-odr0-8192            215.00 (  0.00%)           213.00 (  0.93%)
  Min      alloc-odr0-16384           216.00 (  0.00%)           214.00 (  0.93%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						5bb1b16975 | 
							
							
								
								mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
							
							
							
							
							
							
							
							page is guaranteed to be set before it is read with or without the initialisation. [akpm@linux-foundation.org: fix warning] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						be06af002f | 
							
							
								
								mm, page_alloc: remove unnecessary initialisation in get_page_from_freelist
							
							
							
							
							
							
							
							Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						4dfa6cd8fd | 
							
							
								
								mm, page_alloc: remove unnecessary local variable in get_page_from_freelist
							
							
							
							
							
							
							
							zonelist here is a copy of a struct field that is used once. Ditch it. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						fa379b9586 | 
							
							
								
								mm, page_alloc: convert nr_fair_skipped to bool
							
							
							
							
							
							
							
							The number of zones skipped to a zone expiring its fair zone allocation quota is irrelevant. Convert to bool. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						c603844bdc | 
							
							
								
								mm, page_alloc: convert alloc_flags to unsigned
							
							
							
							
							
							
							
							alloc_flags is a bitmask of flags but it is signed which does not necessarily generate the best code depending on the compiler. Even without an impact, it makes more sense that this be unsigned. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						f75fb889d1 | 
							
							
								
								mm, page_alloc: avoid unnecessary zone lookups during pageblock operations
							
							
							
							
							
							
							
							Pageblocks have an associated bitmap to store migrate types and whether the pageblock should be skipped during compaction. The bitmap may be associated with a memory section or a zone but the zone is looked up unconditionally. The compiler should optimise this away automatically so this is a cosmetic patch only in many cases. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						754078eb45 | 
							
							
								
								mm, page_alloc: use __dec_zone_state for order-0 page allocation
							
							
							
							
							
							
							
							__dec_zone_state is cheaper to use for removing an order-0 page as it
has fewer conditions to check.
The performance difference on a page allocator microbenchmark is;
                                             4.6.0-rc2                  4.6.0-rc2
                                         optiter-v1r20              decstat-v1r20
  Min      alloc-odr0-1               382.00 (  0.00%)           381.00 (  0.26%)
  Min      alloc-odr0-2               282.00 (  0.00%)           275.00 (  2.48%)
  Min      alloc-odr0-4               233.00 (  0.00%)           229.00 (  1.72%)
  Min      alloc-odr0-8               203.00 (  0.00%)           199.00 (  1.97%)
  Min      alloc-odr0-16              188.00 (  0.00%)           186.00 (  1.06%)
  Min      alloc-odr0-32              182.00 (  0.00%)           179.00 (  1.65%)
  Min      alloc-odr0-64              177.00 (  0.00%)           174.00 (  1.69%)
  Min      alloc-odr0-128             175.00 (  0.00%)           172.00 (  1.71%)
  Min      alloc-odr0-256             184.00 (  0.00%)           181.00 (  1.63%)
  Min      alloc-odr0-512             197.00 (  0.00%)           193.00 (  2.03%)
  Min      alloc-odr0-1024            203.00 (  0.00%)           201.00 (  0.99%)
  Min      alloc-odr0-2048            209.00 (  0.00%)           206.00 (  1.44%)
  Min      alloc-odr0-4096            214.00 (  0.00%)           212.00 (  0.93%)
  Min      alloc-odr0-8192            218.00 (  0.00%)           215.00 (  1.38%)
  Min      alloc-odr0-16384           219.00 (  0.00%)           216.00 (  1.37%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						682a3385e7 | 
							
							
								
								mm, page_alloc: inline the fast path of the zonelist iterator
							
							
							
							
							
							
							
							The page allocator iterates through a zonelist for zones that match the
addressing limitations and nodemask of the caller but many allocations
will not be restricted.  Despite this, there is always functional call
overhead which builds up.
This patch inlines the optimistic basic case and only calls the iterator
function for the complex case.  A hindrance was the fact that
cpuset_current_mems_allowed is used in the fastpath as the allowed
nodemask even though all nodes are allowed on most systems.  The patch
handles this by only considering cpuset_current_mems_allowed if a cpuset
exists.  As well as being faster in the fast-path, this removes some
junk in the slowpath.
The performance difference on a page allocator microbenchmark is;
                                             4.6.0-rc2                  4.6.0-rc2
                                      statinline-v1r20              optiter-v1r20
  Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
  Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
  Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
  Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
  Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
  Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
  Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
  Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
  Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
  Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
  Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
  Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
  Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
  Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
  Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)
perf indicated that next_zones_zonelist disappeared in the profile and
__next_zones_zonelist did not appear.  This is expected as the
micro-benchmark would hit the inlined fast-path every time.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						060e74173f | 
							
							
								
								mm, page_alloc: inline zone_statistics
							
							
							
							
							
							
							
							zone_statistics has one call-site but it's a public function.  Make it
static and inline.
The performance difference on a page allocator microbenchmark is;
                                             4.6.0-rc2                  4.6.0-rc2
                                      statbranch-v1r20           statinline-v1r20
  Min      alloc-odr0-1               419.00 (  0.00%)           412.00 (  1.67%)
  Min      alloc-odr0-2               305.00 (  0.00%)           301.00 (  1.31%)
  Min      alloc-odr0-4               250.00 (  0.00%)           247.00 (  1.20%)
  Min      alloc-odr0-8               219.00 (  0.00%)           215.00 (  1.83%)
  Min      alloc-odr0-16              203.00 (  0.00%)           199.00 (  1.97%)
  Min      alloc-odr0-32              195.00 (  0.00%)           191.00 (  2.05%)
  Min      alloc-odr0-64              191.00 (  0.00%)           187.00 (  2.09%)
  Min      alloc-odr0-128             189.00 (  0.00%)           185.00 (  2.12%)
  Min      alloc-odr0-256             198.00 (  0.00%)           193.00 (  2.53%)
  Min      alloc-odr0-512             210.00 (  0.00%)           207.00 (  1.43%)
  Min      alloc-odr0-1024            216.00 (  0.00%)           213.00 (  1.39%)
  Min      alloc-odr0-2048            221.00 (  0.00%)           220.00 (  0.45%)
  Min      alloc-odr0-4096            227.00 (  0.00%)           226.00 (  0.44%)
  Min      alloc-odr0-8192            232.00 (  0.00%)           229.00 (  1.29%)
  Min      alloc-odr0-16384           232.00 (  0.00%)           229.00 (  1.29%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						175145748d | 
							
							
								
								mm, page_alloc: use new PageAnonHead helper in the free page fast path
							
							
							
							
							
							
							
							The PageAnon check always checks for compound_head but this is a
relatively expensive check if the caller already knows the page is a
head page.  This patch creates a helper and uses it in the page free
path which only operates on head pages.
With this patch and "Only check PageCompound for high-order pages", the
performance difference on a page allocator microbenchmark is;
                                             4.6.0-rc2                  4.6.0-rc2
                                               vanilla           nocompound-v1r20
  Min      alloc-odr0-1               425.00 (  0.00%)           417.00 (  1.88%)
  Min      alloc-odr0-2               313.00 (  0.00%)           308.00 (  1.60%)
  Min      alloc-odr0-4               257.00 (  0.00%)           253.00 (  1.56%)
  Min      alloc-odr0-8               224.00 (  0.00%)           221.00 (  1.34%)
  Min      alloc-odr0-16              208.00 (  0.00%)           205.00 (  1.44%)
  Min      alloc-odr0-32              199.00 (  0.00%)           199.00 (  0.00%)
  Min      alloc-odr0-64              195.00 (  0.00%)           193.00 (  1.03%)
  Min      alloc-odr0-128             192.00 (  0.00%)           191.00 (  0.52%)
  Min      alloc-odr0-256             204.00 (  0.00%)           200.00 (  1.96%)
  Min      alloc-odr0-512             213.00 (  0.00%)           212.00 (  0.47%)
  Min      alloc-odr0-1024            219.00 (  0.00%)           219.00 (  0.00%)
  Min      alloc-odr0-2048            225.00 (  0.00%)           225.00 (  0.00%)
  Min      alloc-odr0-4096            230.00 (  0.00%)           231.00 ( -0.43%)
  Min      alloc-odr0-8192            235.00 (  0.00%)           234.00 (  0.43%)
  Min      alloc-odr0-16384           235.00 (  0.00%)           234.00 (  0.43%)
  Min      free-odr0-1                215.00 (  0.00%)           191.00 ( 11.16%)
  Min      free-odr0-2                152.00 (  0.00%)           136.00 ( 10.53%)
  Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
  Min      free-odr0-8                106.00 (  0.00%)            96.00 (  9.43%)
  Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
  Min      free-odr0-32                91.00 (  0.00%)            83.00 (  8.79%)
  Min      free-odr0-64                89.00 (  0.00%)            81.00 (  8.99%)
  Min      free-odr0-128               88.00 (  0.00%)            80.00 (  9.09%)
  Min      free-odr0-256              106.00 (  0.00%)            95.00 ( 10.38%)
  Min      free-odr0-512              116.00 (  0.00%)           111.00 (  4.31%)
  Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
  Min      free-odr0-2048             133.00 (  0.00%)           126.00 (  5.26%)
  Min      free-odr0-4096             136.00 (  0.00%)           130.00 (  4.41%)
  Min      free-odr0-8192             138.00 (  0.00%)           130.00 (  5.80%)
  Min      free-odr0-16384            137.00 (  0.00%)           130.00 (  5.11%)
There is a sizable boost to the free allocator performance.  While there
is an apparent boost on the allocation side, it's likely a co-incidence
or due to the patches slightly reducing cache footprint.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						d61f859039 | 
							
							
								
								mm, page_alloc: only check PageCompound for high-order pages
							
							
							
							
							
							
							
							Another year, another round of page allocator optimisations focusing
this time on the alloc and free fast paths.  This should be of help to
workloads that are allocator-intensive from kernel space where the cost
of zeroing is not nceessraily incurred.
The series is motivated by the observation that page alloc
microbenchmarks on multiple machines regressed between 3.12.44 and 4.4.
Second, there is discussions before LSF/MM considering the possibility
of adding another page allocator which is potentially hazardous but a
patch series improving performance is better than whining.
After the series is applied, there are still hazards.  In the free
paths, the debugging checking and page zone/pageblock lookups dominate
but there was not an obvious solution to that.  In the alloc path, the
major contributers are dealing with zonelists, new page preperation, the
fair zone allocation and numerous statistic updates.  The fair zone
allocator is removed by the per-node LRU series if that gets merged so
it's nor a major concern at the moment.
On normal userspace benchmarks, there is little impact as the zeroing
cost is significant but it's visible
  aim9
                                 4.6.0-rc3             4.6.0-rc3
                                   vanilla         deferalloc-v3
  Min      page_test   828693.33 (  0.00%)   887060.00 (  7.04%)
  Min      brk_test   4847266.67 (  0.00%)  4966266.67 (  2.45%)
  Min      exec_test     1271.00 (  0.00%)     1275.67 (  0.37%)
  Min      fork_test    12371.75 (  0.00%)    12380.00 (  0.07%)
The overall impact on a page allocator microbenchmark for a range of orders
and number of pages allocated in a batch is
                                            4.6.0-rc3                  4.6.0-rc3
                                               vanilla            deferalloc-v3r7
  Min      alloc-odr0-1               428.00 (  0.00%)           316.00 ( 26.17%)
  Min      alloc-odr0-2               314.00 (  0.00%)           231.00 ( 26.43%)
  Min      alloc-odr0-4               256.00 (  0.00%)           192.00 ( 25.00%)
  Min      alloc-odr0-8               222.00 (  0.00%)           166.00 ( 25.23%)
  Min      alloc-odr0-16              207.00 (  0.00%)           154.00 ( 25.60%)
  Min      alloc-odr0-32              197.00 (  0.00%)           148.00 ( 24.87%)
  Min      alloc-odr0-64              193.00 (  0.00%)           144.00 ( 25.39%)
  Min      alloc-odr0-128             191.00 (  0.00%)           143.00 ( 25.13%)
  Min      alloc-odr0-256             203.00 (  0.00%)           153.00 ( 24.63%)
  Min      alloc-odr0-512             212.00 (  0.00%)           165.00 ( 22.17%)
  Min      alloc-odr0-1024            221.00 (  0.00%)           172.00 ( 22.17%)
  Min      alloc-odr0-2048            225.00 (  0.00%)           179.00 ( 20.44%)
  Min      alloc-odr0-4096            232.00 (  0.00%)           185.00 ( 20.26%)
  Min      alloc-odr0-8192            235.00 (  0.00%)           187.00 ( 20.43%)
  Min      alloc-odr0-16384           236.00 (  0.00%)           188.00 ( 20.34%)
  Min      alloc-odr1-1               519.00 (  0.00%)           450.00 ( 13.29%)
  Min      alloc-odr1-2               391.00 (  0.00%)           336.00 ( 14.07%)
  Min      alloc-odr1-4               313.00 (  0.00%)           268.00 ( 14.38%)
  Min      alloc-odr1-8               277.00 (  0.00%)           235.00 ( 15.16%)
  Min      alloc-odr1-16              256.00 (  0.00%)           218.00 ( 14.84%)
  Min      alloc-odr1-32              252.00 (  0.00%)           212.00 ( 15.87%)
  Min      alloc-odr1-64              244.00 (  0.00%)           206.00 ( 15.57%)
  Min      alloc-odr1-128             244.00 (  0.00%)           207.00 ( 15.16%)
  Min      alloc-odr1-256             243.00 (  0.00%)           207.00 ( 14.81%)
  Min      alloc-odr1-512             245.00 (  0.00%)           209.00 ( 14.69%)
  Min      alloc-odr1-1024            248.00 (  0.00%)           214.00 ( 13.71%)
  Min      alloc-odr1-2048            253.00 (  0.00%)           220.00 ( 13.04%)
  Min      alloc-odr1-4096            258.00 (  0.00%)           224.00 ( 13.18%)
  Min      alloc-odr1-8192            261.00 (  0.00%)           229.00 ( 12.26%)
  Min      alloc-odr2-1               560.00 (  0.00%)           753.00 (-34.46%)
  Min      alloc-odr2-2               424.00 (  0.00%)           351.00 ( 17.22%)
  Min      alloc-odr2-4               339.00 (  0.00%)           393.00 (-15.93%)
  Min      alloc-odr2-8               298.00 (  0.00%)           246.00 ( 17.45%)
  Min      alloc-odr2-16              276.00 (  0.00%)           227.00 ( 17.75%)
  Min      alloc-odr2-32              271.00 (  0.00%)           221.00 ( 18.45%)
  Min      alloc-odr2-64              264.00 (  0.00%)           217.00 ( 17.80%)
  Min      alloc-odr2-128             264.00 (  0.00%)           217.00 ( 17.80%)
  Min      alloc-odr2-256             264.00 (  0.00%)           218.00 ( 17.42%)
  Min      alloc-odr2-512             269.00 (  0.00%)           223.00 ( 17.10%)
  Min      alloc-odr2-1024            279.00 (  0.00%)           230.00 ( 17.56%)
  Min      alloc-odr2-2048            283.00 (  0.00%)           235.00 ( 16.96%)
  Min      alloc-odr2-4096            285.00 (  0.00%)           239.00 ( 16.14%)
  Min      alloc-odr3-1               629.00 (  0.00%)           505.00 ( 19.71%)
  Min      alloc-odr3-2               472.00 (  0.00%)           374.00 ( 20.76%)
  Min      alloc-odr3-4               383.00 (  0.00%)           301.00 ( 21.41%)
  Min      alloc-odr3-8               341.00 (  0.00%)           266.00 ( 21.99%)
  Min      alloc-odr3-16              316.00 (  0.00%)           248.00 ( 21.52%)
  Min      alloc-odr3-32              308.00 (  0.00%)           241.00 ( 21.75%)
  Min      alloc-odr3-64              305.00 (  0.00%)           241.00 ( 20.98%)
  Min      alloc-odr3-128             308.00 (  0.00%)           244.00 ( 20.78%)
  Min      alloc-odr3-256             317.00 (  0.00%)           249.00 ( 21.45%)
  Min      alloc-odr3-512             327.00 (  0.00%)           256.00 ( 21.71%)
  Min      alloc-odr3-1024            331.00 (  0.00%)           261.00 ( 21.15%)
  Min      alloc-odr3-2048            333.00 (  0.00%)           266.00 ( 20.12%)
  Min      alloc-odr4-1               767.00 (  0.00%)           572.00 ( 25.42%)
  Min      alloc-odr4-2               578.00 (  0.00%)           429.00 ( 25.78%)
  Min      alloc-odr4-4               474.00 (  0.00%)           346.00 ( 27.00%)
  Min      alloc-odr4-8               422.00 (  0.00%)           310.00 ( 26.54%)
  Min      alloc-odr4-16              399.00 (  0.00%)           295.00 ( 26.07%)
  Min      alloc-odr4-32              392.00 (  0.00%)           293.00 ( 25.26%)
  Min      alloc-odr4-64              394.00 (  0.00%)           293.00 ( 25.63%)
  Min      alloc-odr4-128             405.00 (  0.00%)           305.00 ( 24.69%)
  Min      alloc-odr4-256             417.00 (  0.00%)           319.00 ( 23.50%)
  Min      alloc-odr4-512             425.00 (  0.00%)           326.00 ( 23.29%)
  Min      alloc-odr4-1024            426.00 (  0.00%)           329.00 ( 22.77%)
  Min      free-odr0-1                216.00 (  0.00%)           178.00 ( 17.59%)
  Min      free-odr0-2                152.00 (  0.00%)           125.00 ( 17.76%)
  Min      free-odr0-4                120.00 (  0.00%)            99.00 ( 17.50%)
  Min      free-odr0-8                106.00 (  0.00%)            85.00 ( 19.81%)
  Min      free-odr0-16                97.00 (  0.00%)            80.00 ( 17.53%)
  Min      free-odr0-32                92.00 (  0.00%)            76.00 ( 17.39%)
  Min      free-odr0-64                89.00 (  0.00%)            74.00 ( 16.85%)
  Min      free-odr0-128               89.00 (  0.00%)            73.00 ( 17.98%)
  Min      free-odr0-256              107.00 (  0.00%)            90.00 ( 15.89%)
  Min      free-odr0-512              117.00 (  0.00%)           108.00 (  7.69%)
  Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
  Min      free-odr0-2048             132.00 (  0.00%)           125.00 (  5.30%)
  Min      free-odr0-4096             135.00 (  0.00%)           130.00 (  3.70%)
  Min      free-odr0-8192             137.00 (  0.00%)           130.00 (  5.11%)
  Min      free-odr0-16384            137.00 (  0.00%)           131.00 (  4.38%)
  Min      free-odr1-1                318.00 (  0.00%)           289.00 (  9.12%)
  Min      free-odr1-2                228.00 (  0.00%)           207.00 (  9.21%)
  Min      free-odr1-4                182.00 (  0.00%)           165.00 (  9.34%)
  Min      free-odr1-8                163.00 (  0.00%)           146.00 ( 10.43%)
  Min      free-odr1-16               151.00 (  0.00%)           135.00 ( 10.60%)
  Min      free-odr1-32               146.00 (  0.00%)           129.00 ( 11.64%)
  Min      free-odr1-64               145.00 (  0.00%)           130.00 ( 10.34%)
  Min      free-odr1-128              148.00 (  0.00%)           134.00 (  9.46%)
  Min      free-odr1-256              148.00 (  0.00%)           137.00 (  7.43%)
  Min      free-odr1-512              151.00 (  0.00%)           140.00 (  7.28%)
  Min      free-odr1-1024             154.00 (  0.00%)           143.00 (  7.14%)
  Min      free-odr1-2048             156.00 (  0.00%)           144.00 (  7.69%)
  Min      free-odr1-4096             156.00 (  0.00%)           142.00 (  8.97%)
  Min      free-odr1-8192             156.00 (  0.00%)           140.00 ( 10.26%)
  Min      free-odr2-1                361.00 (  0.00%)           457.00 (-26.59%)
  Min      free-odr2-2                258.00 (  0.00%)           224.00 ( 13.18%)
  Min      free-odr2-4                208.00 (  0.00%)           223.00 ( -7.21%)
  Min      free-odr2-8                185.00 (  0.00%)           160.00 ( 13.51%)
  Min      free-odr2-16               173.00 (  0.00%)           149.00 ( 13.87%)
  Min      free-odr2-32               166.00 (  0.00%)           145.00 ( 12.65%)
  Min      free-odr2-64               166.00 (  0.00%)           146.00 ( 12.05%)
  Min      free-odr2-128              169.00 (  0.00%)           148.00 ( 12.43%)
  Min      free-odr2-256              170.00 (  0.00%)           152.00 ( 10.59%)
  Min      free-odr2-512              177.00 (  0.00%)           156.00 ( 11.86%)
  Min      free-odr2-1024             182.00 (  0.00%)           162.00 ( 10.99%)
  Min      free-odr2-2048             181.00 (  0.00%)           160.00 ( 11.60%)
  Min      free-odr2-4096             180.00 (  0.00%)           159.00 ( 11.67%)
  Min      free-odr3-1                431.00 (  0.00%)           367.00 ( 14.85%)
  Min      free-odr3-2                306.00 (  0.00%)           259.00 ( 15.36%)
  Min      free-odr3-4                249.00 (  0.00%)           208.00 ( 16.47%)
  Min      free-odr3-8                224.00 (  0.00%)           186.00 ( 16.96%)
  Min      free-odr3-16               208.00 (  0.00%)           176.00 ( 15.38%)
  Min      free-odr3-32               206.00 (  0.00%)           174.00 ( 15.53%)
  Min      free-odr3-64               210.00 (  0.00%)           178.00 ( 15.24%)
  Min      free-odr3-128              215.00 (  0.00%)           182.00 ( 15.35%)
  Min      free-odr3-256              224.00 (  0.00%)           189.00 ( 15.62%)
  Min      free-odr3-512              232.00 (  0.00%)           195.00 ( 15.95%)
  Min      free-odr3-1024             230.00 (  0.00%)           195.00 ( 15.22%)
  Min      free-odr3-2048             229.00 (  0.00%)           193.00 ( 15.72%)
  Min      free-odr4-1                561.00 (  0.00%)           439.00 ( 21.75%)
  Min      free-odr4-2                418.00 (  0.00%)           318.00 ( 23.92%)
  Min      free-odr4-4                339.00 (  0.00%)           269.00 ( 20.65%)
  Min      free-odr4-8                299.00 (  0.00%)           239.00 ( 20.07%)
  Min      free-odr4-16               289.00 (  0.00%)           234.00 ( 19.03%)
  Min      free-odr4-32               291.00 (  0.00%)           235.00 ( 19.24%)
  Min      free-odr4-64               298.00 (  0.00%)           238.00 ( 20.13%)
  Min      free-odr4-128              308.00 (  0.00%)           251.00 ( 18.51%)
  Min      free-odr4-256              321.00 (  0.00%)           267.00 ( 16.82%)
  Min      free-odr4-512              327.00 (  0.00%)           269.00 ( 17.74%)
  Min      free-odr4-1024             326.00 (  0.00%)           271.00 ( 16.87%)
  Min      total-odr0-1               644.00 (  0.00%)           494.00 ( 23.29%)
  Min      total-odr0-2               466.00 (  0.00%)           356.00 ( 23.61%)
  Min      total-odr0-4               376.00 (  0.00%)           291.00 ( 22.61%)
  Min      total-odr0-8               328.00 (  0.00%)           251.00 ( 23.48%)
  Min      total-odr0-16              304.00 (  0.00%)           234.00 ( 23.03%)
  Min      total-odr0-32              289.00 (  0.00%)           224.00 ( 22.49%)
  Min      total-odr0-64              282.00 (  0.00%)           218.00 ( 22.70%)
  Min      total-odr0-128             280.00 (  0.00%)           216.00 ( 22.86%)
  Min      total-odr0-256             310.00 (  0.00%)           243.00 ( 21.61%)
  Min      total-odr0-512             329.00 (  0.00%)           273.00 ( 17.02%)
  Min      total-odr0-1024            346.00 (  0.00%)           290.00 ( 16.18%)
  Min      total-odr0-2048            357.00 (  0.00%)           304.00 ( 14.85%)
  Min      total-odr0-4096            367.00 (  0.00%)           315.00 ( 14.17%)
  Min      total-odr0-8192            372.00 (  0.00%)           317.00 ( 14.78%)
  Min      total-odr0-16384           373.00 (  0.00%)           319.00 ( 14.48%)
  Min      total-odr1-1               838.00 (  0.00%)           739.00 ( 11.81%)
  Min      total-odr1-2               619.00 (  0.00%)           543.00 ( 12.28%)
  Min      total-odr1-4               495.00 (  0.00%)           433.00 ( 12.53%)
  Min      total-odr1-8               440.00 (  0.00%)           382.00 ( 13.18%)
  Min      total-odr1-16              407.00 (  0.00%)           353.00 ( 13.27%)
  Min      total-odr1-32              398.00 (  0.00%)           341.00 ( 14.32%)
  Min      total-odr1-64              389.00 (  0.00%)           336.00 ( 13.62%)
  Min      total-odr1-128             392.00 (  0.00%)           341.00 ( 13.01%)
  Min      total-odr1-256             391.00 (  0.00%)           344.00 ( 12.02%)
  Min      total-odr1-512             396.00 (  0.00%)           349.00 ( 11.87%)
  Min      total-odr1-1024            402.00 (  0.00%)           357.00 ( 11.19%)
  Min      total-odr1-2048            409.00 (  0.00%)           364.00 ( 11.00%)
  Min      total-odr1-4096            414.00 (  0.00%)           366.00 ( 11.59%)
  Min      total-odr1-8192            417.00 (  0.00%)           369.00 ( 11.51%)
  Min      total-odr2-1               921.00 (  0.00%)          1210.00 (-31.38%)
  Min      total-odr2-2               682.00 (  0.00%)           576.00 ( 15.54%)
  Min      total-odr2-4               547.00 (  0.00%)           616.00 (-12.61%)
  Min      total-odr2-8               483.00 (  0.00%)           406.00 ( 15.94%)
  Min      total-odr2-16              449.00 (  0.00%)           376.00 ( 16.26%)
  Min      total-odr2-32              437.00 (  0.00%)           366.00 ( 16.25%)
  Min      total-odr2-64              431.00 (  0.00%)           363.00 ( 15.78%)
  Min      total-odr2-128             433.00 (  0.00%)           365.00 ( 15.70%)
  Min      total-odr2-256             434.00 (  0.00%)           371.00 ( 14.52%)
  Min      total-odr2-512             446.00 (  0.00%)           379.00 ( 15.02%)
  Min      total-odr2-1024            461.00 (  0.00%)           392.00 ( 14.97%)
  Min      total-odr2-2048            464.00 (  0.00%)           395.00 ( 14.87%)
  Min      total-odr2-4096            465.00 (  0.00%)           398.00 ( 14.41%)
  Min      total-odr3-1              1060.00 (  0.00%)           872.00 ( 17.74%)
  Min      total-odr3-2               778.00 (  0.00%)           633.00 ( 18.64%)
  Min      total-odr3-4               632.00 (  0.00%)           510.00 ( 19.30%)
  Min      total-odr3-8               565.00 (  0.00%)           452.00 ( 20.00%)
  Min      total-odr3-16              524.00 (  0.00%)           424.00 ( 19.08%)
  Min      total-odr3-32              514.00 (  0.00%)           415.00 ( 19.26%)
  Min      total-odr3-64              515.00 (  0.00%)           419.00 ( 18.64%)
  Min      total-odr3-128             523.00 (  0.00%)           426.00 ( 18.55%)
  Min      total-odr3-256             541.00 (  0.00%)           438.00 ( 19.04%)
  Min      total-odr3-512             559.00 (  0.00%)           451.00 ( 19.32%)
  Min      total-odr3-1024            561.00 (  0.00%)           456.00 ( 18.72%)
  Min      total-odr3-2048            562.00 (  0.00%)           459.00 ( 18.33%)
  Min      total-odr4-1              1328.00 (  0.00%)          1011.00 ( 23.87%)
  Min      total-odr4-2               997.00 (  0.00%)           747.00 ( 25.08%)
  Min      total-odr4-4               813.00 (  0.00%)           615.00 ( 24.35%)
  Min      total-odr4-8               721.00 (  0.00%)           550.00 ( 23.72%)
  Min      total-odr4-16              689.00 (  0.00%)           529.00 ( 23.22%)
  Min      total-odr4-32              683.00 (  0.00%)           528.00 ( 22.69%)
  Min      total-odr4-64              692.00 (  0.00%)           531.00 ( 23.27%)
  Min      total-odr4-128             713.00 (  0.00%)           556.00 ( 22.02%)
  Min      total-odr4-256             738.00 (  0.00%)           586.00 ( 20.60%)
  Min      total-odr4-512             753.00 (  0.00%)           595.00 ( 20.98%)
  Min      total-odr4-1024            752.00 (  0.00%)           600.00 ( 20.21%)
This patch (of 27):
order-0 pages by definition cannot be compound so avoid the check in the
fast path for those pages.
[akpm@linux-foundation.org: use unlikely(order) in free_pages_prepare(), per Vlastimil]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						3da88fb3ba | 
							
							
								
								mm, oom: move GFP_NOFS check to out_of_memory
							
							
							
							
							
							
							
							__alloc_pages_may_oom is the central place to decide when the
out_of_memory should be invoked.  This is a good approach for most
checks there because they are page allocator specific and the allocation
fails right after for all of them.
The notable exception is GFP_NOFS context which is faking
did_some_progress and keep the page allocator looping even though there
couldn't have been any progress from the OOM killer.  This patch doesn't
change this behavior because we are not ready to allow those allocation
requests to fail yet (and maybe we will face the reality that we will
never manage to safely fail these request).  Instead __GFP_FS check is
moved down to out_of_memory and prevent from OOM victim selection there.
There are two reasons for that
	- OOM notifiers might release some memory even from this context
	  as none of the registered notifier seems to be FS related
	- this might help a dying thread to get an access to memory
          reserves and move on which will make the behavior more
          consistent with the case when the task gets killed from a
          different context.
Keep a comment in __alloc_pages_may_oom to make sure we do not forget
how GFP_NOFS is special and that we really want to do something about
it.
Note to the current oom_notifier users:
The observable difference for you is that oom notifiers cannot depend on
any fs locks because we could deadlock.  Not that this would be allowed
today because that would just lockup machine in most of the cases and
ruling out the OOM killer along the way.  Another difference is that
callbacks might be invoked sooner now because GFP_NOFS is a weaker
reclaim context and so there could be reclaimable memory which is just
not reachable now.  That would require GFP_NOFS only loads which are
really rare and more importantly the observable result would be dropping
of reconstructible object and potential performance drop which is not
such a big deal when we are struggling to fulfill other important
allocation requests.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
							
						 | 
						
							||
| 
							 | 
						fc2bd799c7 | 
							
							
								
								mm/page_alloc: correct highmem memory statistics
							
							
							
							
							
							
							
							ZONE_MOVABLE could be treated as highmem so we need to consider it for accurate statistics. And, in following patches, ZONE_CMA will be introduced and it can be treated as highmem, too. So, instead of manually adding stat of ZONE_MOVABLE, looping all zones and check whether the zone is highmem or not and add stat of the zone which can be treated as highmem. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						ba6b0979e3 | 
							
							
								
								power: add zone range overlapping check
							
							
							
							
							
							
							
							There is a system thats node's pfns are overlapped as follows: -----pfn--------> N0 N1 N2 N0 N1 N2 Therefore, we need to care this overlapping when iterating pfn range. mark_free_pages() iterates requested zone's pfn range and unset all range's bitmap first. And then it marks freepages in a zone to the bitmap. If there is an overlapping zone, above unset could clear previous marked bit and reference to this bitmap in the future will cause the problem. To prevent it, this patch adds a zone check in mark_free_pages(). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  | 
						
							||
| 
							 | 
						b9eb63191a | 
							
							
								
								mm/memory_hotplug: add comment to some functions related to memory hotplug
							
							
							
							
							
							
							
							__offline_isolated_pages() and test_pages_isolated() are used by memory hotplug. These functions require that range is in a single zone but there is no code to do this because memory hotplug checks it before calling these functions. To avoid confusing future user of these functions, this patch adds comments to them. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>  |