	Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2 updates

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
commit 7c225c69f8

250 changed files with 2278 additions and 4086 deletions
		|  | @ -1864,13 +1864,6 @@ | |||
| 			Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, | ||||
| 			the default is off. | ||||
| 
 | ||||
| 	kmemcheck=	[X86] Boot-time kmemcheck enable/disable/one-shot mode | ||||
| 			Valid arguments: 0, 1, 2 | ||||
| 			kmemcheck=0 (disabled) | ||||
| 			kmemcheck=1 (enabled) | ||||
| 			kmemcheck=2 (one-shot mode) | ||||
| 			Default: 2 (one-shot mode) | ||||
| 
 | ||||
| 	kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. | ||||
| 			Default is 0 (don't ignore, but inject #GP) | ||||
| 
 | ||||
|  |  | |||
|  | @ -21,7 +21,6 @@ whole; patches welcome! | |||
|    kasan | ||||
|    ubsan | ||||
|    kmemleak | ||||
|    kmemcheck | ||||
|    gdb-kernel-debugging | ||||
|    kgdb | ||||
|    kselftest | ||||
|  |  | |||
|  | @ -1,733 +0,0 @@ | |||
| Getting started with kmemcheck | ||||
| ============================== | ||||
| 
 | ||||
| Vegard Nossum <vegardno@ifi.uio.no> | ||||
| 
 | ||||
| 
 | ||||
| Introduction | ||||
| ------------ | ||||
| 
 | ||||
| kmemcheck is a debugging feature for the Linux Kernel. More specifically, it | ||||
| is a dynamic checker that detects and warns about some uses of uninitialized | ||||
| memory. | ||||
| 
 | ||||
| Userspace programmers might be familiar with Valgrind's memcheck. The main | ||||
| difference between memcheck and kmemcheck is that memcheck works for userspace | ||||
| programs only, and kmemcheck works for the kernel only. The implementations | ||||
| are of course vastly different. Because of this, kmemcheck is not as accurate | ||||
| as memcheck, but it turns out to be good enough in practice to discover real | ||||
| programmer errors that the compiler is not able to find through static | ||||
| analysis. | ||||
| 
 | ||||
| Enabling kmemcheck on a kernel will probably slow it down to the extent that | ||||
| the machine will not be usable for normal workloads such as e.g. an | ||||
| interactive desktop. kmemcheck will also cause the kernel to use about twice | ||||
| as much memory as normal. For this reason, kmemcheck is strictly a debugging | ||||
| feature. | ||||
| 
 | ||||
| 
 | ||||
| Downloading | ||||
| ----------- | ||||
| 
 | ||||
| As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel. | ||||
| 
 | ||||
| 
 | ||||
| Configuring and compiling | ||||
| ------------------------- | ||||
| 
 | ||||
| kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of | ||||
| configuration variables must have specific settings in order for the kmemcheck | ||||
| menu to even appear in "menuconfig". These are: | ||||
| 
 | ||||
| - ``CONFIG_CC_OPTIMIZE_FOR_SIZE=n`` | ||||
| 	This option is located under "General setup" / "Optimize for size". | ||||
| 
 | ||||
| 	Without this, gcc will use certain optimizations that usually lead to | ||||
| 	false positive warnings from kmemcheck. An example of this is a 16-bit | ||||
| 	field in a struct, where gcc may load 32 bits, then discard the upper | ||||
| 	16 bits. kmemcheck sees only the 32-bit load, and may trigger a | ||||
| 	warning for the upper 16 bits (if they're uninitialized). | ||||
| 
 | ||||
| - ``CONFIG_SLAB=y`` or ``CONFIG_SLUB=y`` | ||||
| 	This option is located under "General setup" / "Choose SLAB | ||||
| 	allocator". | ||||
| 
 | ||||
| - ``CONFIG_FUNCTION_TRACER=n`` | ||||
| 	This option is located under "Kernel hacking" / "Tracers" / "Kernel | ||||
| 	Function Tracer" | ||||
| 
 | ||||
| 	When function tracing is compiled in, gcc emits a call to another | ||||
| 	function at the beginning of every function. This means that when the | ||||
| 	page fault handler is called, the ftrace framework will be called | ||||
| 	before kmemcheck has had a chance to handle the fault. If ftrace then | ||||
| 	modifies memory that was tracked by kmemcheck, the result is an | ||||
| 	endless recursive page fault. | ||||
| 
 | ||||
| - ``CONFIG_DEBUG_PAGEALLOC=n`` | ||||
| 	This option is located under "Kernel hacking" / "Memory Debugging" | ||||
| 	/ "Debug page memory allocations". | ||||
| 
 | ||||
| In addition, I highly recommend turning on ``CONFIG_DEBUG_INFO=y``. This is also | ||||
| located under "Kernel hacking". With this, you will be able to get line number | ||||
| information from the kmemcheck warnings, which is extremely valuable in | ||||
| debugging a problem. This option is not mandatory, however, because it slows | ||||
| down the compilation process and produces a much bigger kernel image. | ||||
| 
 | ||||
| Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory | ||||
| Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows | ||||
| a description of the kmemcheck configuration variables: | ||||
| 
 | ||||
| - ``CONFIG_KMEMCHECK`` | ||||
| 	This must be enabled in order to use kmemcheck at all... | ||||
| 
 | ||||
| - ``CONFIG_KMEMCHECK_``[``DISABLED`` | ``ENABLED`` | ``ONESHOT``]``_BY_DEFAULT`` | ||||
| 	This option controls the status of kmemcheck at boot-time. "Enabled" | ||||
| 	will enable kmemcheck right from the start, "disabled" will boot the | ||||
| 	kernel as normal (but with the kmemcheck code compiled in, so it can | ||||
| 	be enabled at run-time after the kernel has booted), and "one-shot" is | ||||
| 	a special mode which will turn kmemcheck off automatically after | ||||
| 	detecting the first use of uninitialized memory. | ||||
| 
 | ||||
| 	If you are using kmemcheck to actively debug a problem, then you | ||||
| 	probably want to choose "enabled" here. | ||||
| 
 | ||||
| 	The one-shot mode is mostly useful in automated test setups because it | ||||
| 	can prevent floods of warnings and increase the chances of the machine | ||||
| 	surviving in case something is really wrong. In other cases, the one- | ||||
| 	shot mode could actually be counter-productive because it would turn | ||||
| 	itself off at the very first error -- in the case of a false positive | ||||
| 	too -- and this would get in the way of debugging the specific | ||||
| 	problem you were interested in. | ||||
| 
 | ||||
| 	If you would like to use your kernel as normal, but with a chance to | ||||
| 	enable kmemcheck in case of some problem, it might be a good idea to | ||||
| 	choose "disabled" here. When kmemcheck is disabled, most of the run- | ||||
| 	time overhead is not incurred, and the kernel will be almost as fast | ||||
| 	as normal. | ||||
| 
 | ||||
| - ``CONFIG_KMEMCHECK_QUEUE_SIZE`` | ||||
| 	Select the maximum number of error reports to store in an internal | ||||
| 	(fixed-size) buffer. Since errors can occur virtually anywhere and in | ||||
| 	any context, we need a temporary storage area which is guaranteed not | ||||
| 	to generate any other page faults when accessed. The queue will be | ||||
| 	emptied as soon as a tasklet may be scheduled. If the queue is full, | ||||
| 	new error reports will be lost. | ||||
| 
 | ||||
| 	The default value of 64 is probably fine. If some code produces more | ||||
| 	than 64 errors within an irqs-off section, then the code is likely to | ||||
| 	produce many, many more, too, and these additional reports seldom give | ||||
| 	any more information (the first report is usually the most valuable | ||||
| 	anyway). | ||||
| 
 | ||||
| 	This number might have to be adjusted if you are not using serial | ||||
| 	console or similar to capture the kernel log. If you are using the | ||||
| 	"dmesg" command to save the log, then getting a lot of kmemcheck | ||||
| 	warnings might overflow the kernel log itself, and the earlier reports | ||||
| 	will get lost in that way instead. Try setting this to 10 or so on | ||||
| 	such a setup. | ||||
| 
 | ||||
| - ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT`` | ||||
| 	Select the number of shadow bytes to save along with each entry of the | ||||
| 	error-report queue. These bytes indicate what parts of an allocation | ||||
| 	are initialized, uninitialized, etc. and will be displayed when an | ||||
| 	error is detected to help the debugging of a particular problem. | ||||
| 
 | ||||
| 	The number entered here is actually the logarithm of the number of | ||||
| 	bytes that will be saved. So if you pick for example 5 here, kmemcheck | ||||
| 	will save 2^5 = 32 bytes. | ||||
| 
 | ||||
| 	The default value should be fine for debugging most problems. It also | ||||
| 	fits nicely within 80 columns. | ||||
| 
 | ||||
| - ``CONFIG_KMEMCHECK_PARTIAL_OK`` | ||||
| 	This option (when enabled) works around certain GCC optimizations that | ||||
| 	produce 32-bit reads from 16-bit variables where the upper 16 bits are | ||||
| 	thrown away afterwards. | ||||
| 
 | ||||
| 	The default value (enabled) is recommended. This may of course hide | ||||
| 	some real errors, but disabling it would probably produce a lot of | ||||
| 	false positives. | ||||
| 
 | ||||
| - ``CONFIG_KMEMCHECK_BITOPS_OK`` | ||||
| 	This option silences warnings that would be generated for bit-field | ||||
| 	accesses where not all the bits are initialized at the same time. This | ||||
| 	may also hide some real bugs. | ||||
| 
 | ||||
| 	This option is probably obsolete, or it should be replaced with | ||||
| 	the kmemcheck-/bitfield-annotations for the code in question. The | ||||
| 	default value is therefore fine. | ||||
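
As a rough userspace illustration of the widened-load pattern described under
``CONFIG_CC_OPTIMIZE_FOR_SIZE`` and ``CONFIG_KMEMCHECK_PARTIAL_OK`` above: the
struct and values below are made up, and whether the compiler actually widens
the 16-bit load depends on the compiler and flags, so treat this purely as a
sketch::

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct packet {
        uint16_t len;       /* written below */
        uint16_t flags;     /* deliberately left uninitialized */
    };

    int main(void)
    {
        struct packet *p = malloc(sizeof(*p));

        if (!p)
            return 1;
        p->len = 42;        /* only the first 16 bits are initialized */

        /*
         * A size-optimizing compiler may fetch p->len with a 32-bit load
         * and mask off the upper half.  A checker that only sees the
         * 32-bit load would then flag the uninitialized p->flags bytes,
         * even though their value is never used.
         */
        printf("len = %u\n", p->len);

        free(p);
        return 0;
    }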
| 
 | ||||
| Now compile the kernel as usual. | ||||
| 
 | ||||
| 
 | ||||
| How to use | ||||
| ---------- | ||||
| 
 | ||||
| Booting | ||||
| ~~~~~~~ | ||||
| 
 | ||||
| First some information about the command-line options. There is only one | ||||
| option specific to kmemcheck, and this is called "kmemcheck". It can be used | ||||
| to override the default mode as chosen by the ``CONFIG_KMEMCHECK_*_BY_DEFAULT`` | ||||
| option. Its possible settings are: | ||||
| 
 | ||||
| - ``kmemcheck=0`` (disabled) | ||||
| - ``kmemcheck=1`` (enabled) | ||||
| - ``kmemcheck=2`` (one-shot mode) | ||||
| 
 | ||||
| If SLUB debugging has been enabled in the kernel, it may take precedence over | ||||
| kmemcheck in such a way that the slab caches which are under SLUB debugging | ||||
| will not be tracked by kmemcheck. In order to ensure that this doesn't happen | ||||
| (even though it shouldn't by default), use SLUB's boot option ``slub_debug``, | ||||
| like this: ``slub_debug=-`` | ||||
| 
 | ||||
| In fact, this option may also be used for fine-grained control over SLUB vs. | ||||
| kmemcheck. For example, if the command line includes | ||||
| ``kmemcheck=1 slub_debug=,dentry``, then SLUB debugging will be used only | ||||
| for the "dentry" slab cache, and with kmemcheck tracking all the other | ||||
| caches. This is advanced usage, however, and is not generally recommended. | ||||
| 
 | ||||
| 
 | ||||
| Run-time enable/disable | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| 
 | ||||
| When the kernel has booted, it is possible to enable or disable kmemcheck at | ||||
| run-time. WARNING: This feature is still experimental and may cause false | ||||
| positive warnings to appear. Therefore, try not to use this. If you find that | ||||
| it doesn't work properly (e.g. you see an unreasonable amount of warnings), I | ||||
| will be happy to take bug reports. | ||||
| 
 | ||||
| Use the file ``/proc/sys/kernel/kmemcheck`` for this purpose, e.g.:: | ||||
| 
 | ||||
| 	$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck | ||||
| 
 | ||||
| The numbers are the same as for the ``kmemcheck=`` command-line option. | ||||
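
The same toggle can be driven from C instead of the shell; this is only a
small sketch that writes a mode number to the proc file named above and
assumes a kernel built with kmemcheck (and sufficient privileges)::

    #include <stdio.h>

    /* 0 = disabled, 1 = enabled, 2 = one-shot, mirroring the kmemcheck= option. */
    static int set_kmemcheck(int mode)
    {
        FILE *f = fopen("/proc/sys/kernel/kmemcheck", "w");

        if (!f)
            return -1;  /* no kmemcheck in this kernel, or no permission */
        fprintf(f, "%d\n", mode);
        return fclose(f);
    }

    int main(void)
    {
        return set_kmemcheck(0) ? 1 : 0;    /* disable kmemcheck */
    }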
| 
 | ||||
| 
 | ||||
| Debugging | ||||
| ~~~~~~~~~ | ||||
| 
 | ||||
| A typical report will look something like this:: | ||||
| 
 | ||||
|     WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) | ||||
|     80000000000000000000000000000000000000000088ffff0000000000000000 | ||||
|      i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u | ||||
|              ^ | ||||
| 
 | ||||
|     Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A | ||||
|     RIP: 0010:[<ffffffff8104ede8>]  [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 | ||||
|     RSP: 0018:ffff88003cdf7d98  EFLAGS: 00210002 | ||||
|     RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 | ||||
|     RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84 | ||||
|     RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000 | ||||
|     R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e | ||||
|     R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8 | ||||
|     FS:  0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000 | ||||
|     CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033 | ||||
|     CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0 | ||||
|     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | ||||
|     DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 | ||||
|      [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 | ||||
|      [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 | ||||
|      [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 | ||||
|      [<ffffffff8100c7b5>] int_signal+0x12/0x17 | ||||
|      [<ffffffffffffffff>] 0xffffffffffffffff | ||||
| 
 | ||||
| The single most valuable information in this report is the RIP (or EIP on 32- | ||||
| bit) value. This will help us pinpoint exactly which instruction caused the | ||||
| warning. | ||||
| 
 | ||||
| If your kernel was compiled with ``CONFIG_DEBUG_INFO=y``, then all we have to do | ||||
| is give this address to the addr2line program, like this:: | ||||
| 
 | ||||
| 	$ addr2line -e vmlinux -i ffffffff8104ede8 | ||||
| 	arch/x86/include/asm/string_64.h:12 | ||||
| 	include/asm-generic/siginfo.h:287 | ||||
| 	kernel/signal.c:380 | ||||
| 	kernel/signal.c:410 | ||||
| 
 | ||||
| The "``-e vmlinux``" tells addr2line which file to look in. **IMPORTANT:** | ||||
| This must be the vmlinux of the kernel that produced the warning in the | ||||
| first place! If not, the line number information will almost certainly be | ||||
| wrong. | ||||
| 
 | ||||
| The "``-i``" tells addr2line to also print the line numbers of inlined | ||||
| functions.  In this case, the flag was very important, because otherwise, | ||||
| it would only have printed the first line, which is just a call to | ||||
| ``memcpy()``, which could be called from a thousand places in the kernel, and | ||||
| is therefore not very useful.  These inlined functions would not show up in | ||||
| the stack trace above, simply because the kernel doesn't load the extra | ||||
| debugging information. This technique can of course be used with ordinary | ||||
| kernel oopses as well. | ||||
| 
 | ||||
| In this case, it's the caller of ``memcpy()`` that is interesting, and it can be | ||||
| found in ``include/asm-generic/siginfo.h``, line 287:: | ||||
| 
 | ||||
|     281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from) | ||||
|     282 { | ||||
|     283         if (from->si_code < 0) | ||||
|     284                 memcpy(to, from, sizeof(*to)); | ||||
|     285         else | ||||
|     286                 /* _sigchld is currently the largest know union member */ | ||||
|     287                 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld)); | ||||
|     288 } | ||||
| 
 | ||||
| Since this was a read (kmemcheck usually warns about reads only, though it can | ||||
| warn about writes to unallocated or freed memory as well), it was probably the | ||||
| "from" argument which contained some uninitialized bytes. Following the chain | ||||
| of calls, we move upwards to see where "from" was allocated or initialized, | ||||
| ``kernel/signal.c``, line 380:: | ||||
| 
 | ||||
|     359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info) | ||||
|     360 { | ||||
|     ... | ||||
|     367         list_for_each_entry(q, &list->list, list) { | ||||
|     368                 if (q->info.si_signo == sig) { | ||||
|     369                         if (first) | ||||
|     370                                 goto still_pending; | ||||
|     371                         first = q; | ||||
|     ... | ||||
|     377         if (first) { | ||||
|     378 still_pending: | ||||
|     379                 list_del_init(&first->list); | ||||
|     380                 copy_siginfo(info, &first->info); | ||||
|     381                 __sigqueue_free(first); | ||||
|     ... | ||||
|     392         } | ||||
|     393 } | ||||
| 
 | ||||
| Here, it is ``&first->info`` that is being passed on to ``copy_siginfo()``. The | ||||
| variable ``first`` was found on a list -- passed in as the second argument to | ||||
| ``collect_signal()``. We  continue our journey through the stack, to figure out | ||||
| where the item on "list" was allocated or initialized. We move to line 410:: | ||||
| 
 | ||||
|     395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, | ||||
|     396                         siginfo_t *info) | ||||
|     397 { | ||||
|     ... | ||||
|     410                 collect_signal(sig, pending, info); | ||||
|     ... | ||||
|     414 } | ||||
| 
 | ||||
| Now we need to follow the ``pending`` pointer, since that is being passed on to | ||||
| ``collect_signal()`` as ``list``. At this point, we've run out of lines from the | ||||
| "addr2line" output. Not to worry, we just paste the next addresses from the | ||||
| kmemcheck stack dump, i.e.:: | ||||
| 
 | ||||
|      [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 | ||||
|      [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 | ||||
|      [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 | ||||
|      [<ffffffff8100c7b5>] int_signal+0x12/0x17 | ||||
| 
 | ||||
| 	$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \ | ||||
| 		ffffffff8100b87d ffffffff8100c7b5 | ||||
| 	kernel/signal.c:446 | ||||
| 	kernel/signal.c:1806 | ||||
| 	arch/x86/kernel/signal.c:805 | ||||
| 	arch/x86/kernel/signal.c:871 | ||||
| 	arch/x86/kernel/entry_64.S:694 | ||||
| 
 | ||||
| Remember that since these addresses were found on the stack and not as the | ||||
| RIP value, they actually point to the _next_ instruction (they are return | ||||
| addresses). This becomes obvious when we look at the code for line 446:: | ||||
| 
 | ||||
|     422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) | ||||
|     423 { | ||||
|     ... | ||||
|     431                 signr = __dequeue_signal(&tsk->signal->shared_pending, | ||||
|     432						 mask, info); | ||||
|     433			/* | ||||
|     434			 * itimer signal ? | ||||
|     435			 * | ||||
|     436			 * itimers are process shared and we restart periodic | ||||
|     437			 * itimers in the signal delivery path to prevent DoS | ||||
|     438			 * attacks in the high resolution timer case. This is | ||||
|     439			 * compliant with the old way of self restarting | ||||
|     440			 * itimers, as the SIGALRM is a legacy signal and only | ||||
|     441			 * queued once. Changing the restart behaviour to | ||||
|     442			 * restart the timer in the signal dequeue path is | ||||
|     443			 * reducing the timer noise on heavy loaded !highres | ||||
|     444			 * systems too. | ||||
|     445			 */ | ||||
|     446			if (unlikely(signr == SIGALRM)) { | ||||
|     ... | ||||
|     489 } | ||||
| 
 | ||||
| So instead of looking at 446, we should be looking at 431, which is the line | ||||
| that executes just before 446. Here we see that what we are looking for is | ||||
| ``&tsk->signal->shared_pending``. | ||||
| 
 | ||||
| Our next task is now to figure out which function puts items on this | ||||
| ``shared_pending`` list. A crude, but efficient tool, is ``git grep``:: | ||||
| 
 | ||||
| 	$ git grep -n 'shared_pending' kernel/ | ||||
| 	... | ||||
| 	kernel/signal.c:828:	pending = group ? &t->signal->shared_pending : &t->pending; | ||||
| 	kernel/signal.c:1339:	pending = group ? &t->signal->shared_pending : &t->pending; | ||||
| 	... | ||||
| 
 | ||||
| There were more results, but none of them were related to list operations, | ||||
| and these were the only assignments. We inspect the line numbers more closely | ||||
| and find that this is indeed where items are being added to the list:: | ||||
| 
 | ||||
|     816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, | ||||
|     817				int group) | ||||
|     818 { | ||||
|     ... | ||||
|     828		pending = group ? &t->signal->shared_pending : &t->pending; | ||||
|     ... | ||||
|     851		q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && | ||||
|     852						     (is_si_special(info) || | ||||
|     853						      info->si_code >= 0))); | ||||
|     854		if (q) { | ||||
|     855			list_add_tail(&q->list, &pending->list); | ||||
|     ... | ||||
|     890 } | ||||
| 
 | ||||
| and:: | ||||
| 
 | ||||
|     1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) | ||||
|     1310 { | ||||
|     .... | ||||
|     1339	 pending = group ? &t->signal->shared_pending : &t->pending; | ||||
|     1340	 list_add_tail(&q->list, &pending->list); | ||||
|     .... | ||||
|     1347 } | ||||
| 
 | ||||
| In the first case, the list element we are looking for, ``q``, is being | ||||
| returned from the function ``__sigqueue_alloc()``, which looks like an | ||||
| allocation function.  Let's take a look at it:: | ||||
| 
 | ||||
|     187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags, | ||||
|     188						 int override_rlimit) | ||||
|     189 { | ||||
|     190		struct sigqueue *q = NULL; | ||||
|     191		struct user_struct *user; | ||||
|     192 | ||||
|     193		/* | ||||
|     194		 * We won't get problems with the target's UID changing under us | ||||
|     195		 * because changing it requires RCU be used, and if t != current, the | ||||
|     196		 * caller must be holding the RCU readlock (by way of a spinlock) and | ||||
|     197		 * we use RCU protection here | ||||
|     198		 */ | ||||
|     199		user = get_uid(__task_cred(t)->user); | ||||
|     200		atomic_inc(&user->sigpending); | ||||
|     201		if (override_rlimit || | ||||
|     202		    atomic_read(&user->sigpending) <= | ||||
|     203				t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) | ||||
|     204			q = kmem_cache_alloc(sigqueue_cachep, flags); | ||||
|     205		if (unlikely(q == NULL)) { | ||||
|     206			atomic_dec(&user->sigpending); | ||||
|     207			free_uid(user); | ||||
|     208		} else { | ||||
|     209			INIT_LIST_HEAD(&q->list); | ||||
|     210			q->flags = 0; | ||||
|     211			q->user = user; | ||||
|     212		} | ||||
|     213 | ||||
|     214		return q; | ||||
|     215 } | ||||
| 
 | ||||
| We see that this function initializes ``q->list``, ``q->flags``, and | ||||
| ``q->user``. It seems that now is the time to look at the definition of | ||||
| ``struct sigqueue``, e.g.:: | ||||
| 
 | ||||
|     14 struct sigqueue { | ||||
|     15	       struct list_head list; | ||||
|     16	       int flags; | ||||
|     17	       siginfo_t info; | ||||
|     18	       struct user_struct *user; | ||||
|     19 }; | ||||
| 
 | ||||
| And, you might remember, it was a ``memcpy()`` on ``&first->info`` that | ||||
| caused the warning, so this makes perfect sense. It also seems reasonable | ||||
| to assume that it is the caller of ``__sigqueue_alloc()`` that has the | ||||
| responsibility of filling out (initializing) this member. | ||||
| 
 | ||||
| But just which fields of the struct were uninitialized? Let's look at | ||||
| kmemcheck's report again:: | ||||
| 
 | ||||
|     WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) | ||||
|     80000000000000000000000000000000000000000088ffff0000000000000000 | ||||
|      i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u | ||||
| 	     ^ | ||||
| 
 | ||||
| These first two lines are the memory dump of the memory object itself, and | ||||
| the shadow bytemap, respectively. The memory object itself is in this case | ||||
| ``&first->info``. Just beware that the start of this dump is NOT the start | ||||
| of the object itself! The position of the caret (^) corresponds with the | ||||
| address of the read (ffff88003e4a2024). | ||||
| 
 | ||||
| The shadow bytemap dump legend is as follows: | ||||
| 
 | ||||
| - i: initialized | ||||
| - u: uninitialized | ||||
| - a: unallocated (memory has been allocated by the slab layer, but has not | ||||
|   yet been handed off to anybody) | ||||
| - f: freed (memory has been allocated by the slab layer, but has been freed | ||||
|   by the previous owner) | ||||
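
The legend maps naturally onto one state per tracked byte. The snippet below
is only an illustrative sketch of such a shadow bytemap, not kmemcheck's
actual internal representation::

    #include <stdio.h>

    /* One state per tracked byte, mirroring the i/u/a/f legend above. */
    enum shadow_state {
        SHADOW_INITIALIZED   = 'i',
        SHADOW_UNINITIALIZED = 'u',
        SHADOW_UNALLOCATED   = 'a',
        SHADOW_FREED         = 'f',
    };

    int main(void)
    {
        /* Pretend bytes 4..7 of a 16-byte object were never written. */
        enum shadow_state shadow[16];
        int i;

        for (i = 0; i < 16; i++)
            shadow[i] = (i >= 4 && i <= 7) ? SHADOW_UNINITIALIZED
                                           : SHADOW_INITIALIZED;

        for (i = 0; i < 16; i++)
            printf(" %c", shadow[i]);
        printf("\n");
        return 0;
    }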
| 
 | ||||
| In order to figure out where (relative to the start of the object) the | ||||
| uninitialized memory was located, we have to look at the disassembly. For | ||||
| that, we'll need the RIP address again:: | ||||
| 
 | ||||
|     RIP: 0010:[<ffffffff8104ede8>]  [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 | ||||
| 
 | ||||
| 	$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8: | ||||
| 	ffffffff8104edc8:	mov    %r8,0x8(%r8) | ||||
| 	ffffffff8104edcc:	test   %r10d,%r10d | ||||
| 	ffffffff8104edcf:	js     ffffffff8104ee88 <__dequeue_signal+0x168> | ||||
| 	ffffffff8104edd5:	mov    %rax,%rdx | ||||
| 	ffffffff8104edd8:	mov    $0xc,%ecx | ||||
| 	ffffffff8104eddd:	mov    %r13,%rdi | ||||
| 	ffffffff8104ede0:	mov    $0x30,%eax | ||||
| 	ffffffff8104ede5:	mov    %rdx,%rsi | ||||
| 	ffffffff8104ede8:	rep movsl %ds:(%rsi),%es:(%rdi) | ||||
| 	ffffffff8104edea:	test   $0x2,%al | ||||
| 	ffffffff8104edec:	je     ffffffff8104edf0 <__dequeue_signal+0xd0> | ||||
| 	ffffffff8104edee:	movsw  %ds:(%rsi),%es:(%rdi) | ||||
| 	ffffffff8104edf0:	test   $0x1,%al | ||||
| 	ffffffff8104edf2:	je     ffffffff8104edf5 <__dequeue_signal+0xd5> | ||||
| 	ffffffff8104edf4:	movsb  %ds:(%rsi),%es:(%rdi) | ||||
| 	ffffffff8104edf5:	mov    %r8,%rdi | ||||
| 	ffffffff8104edf8:	callq  ffffffff8104de60 <__sigqueue_free> | ||||
| 
 | ||||
| As expected, it's the "``rep movsl``" instruction from the ``memcpy()`` | ||||
| that causes the warning. We know that ``REP MOVSL`` uses the register | ||||
| ``RCX`` to count the number of remaining iterations. By taking a look at the | ||||
| register dump again (from the kmemcheck report), we can figure out how many | ||||
| bytes were left to copy:: | ||||
| 
 | ||||
|     RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 | ||||
| 
 | ||||
| By looking at the disassembly, we also see that ``%ecx`` is being loaded | ||||
| with the value ``$0xc`` just before (ffffffff8104edd8), so we are very | ||||
| lucky. Keep in mind that this is the number of iterations, not bytes. And | ||||
| since this is a "long" operation, we need to multiply by 4 to get the | ||||
| number of bytes. So this means that the uninitialized value was encountered | ||||
| at 4 * (0xc - 0x9) = 12 bytes from the start of the object. | ||||
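
Written out as code, that calculation is nothing more than the constant loaded
into ``%ecx`` and the ``RCX`` value from the register dump above::

    #include <stdio.h>

    int main(void)
    {
        unsigned long initial_iterations   = 0xc;  /* mov $0xc,%ecx        */
        unsigned long remaining_iterations = 0x9;  /* RCX in the dump      */
        unsigned long word_size            = 4;    /* "movsl" copies longs */

        /* Bytes copied before the fault = offset of the bad byte. */
        unsigned long offset = word_size *
                               (initial_iterations - remaining_iterations);

        printf("uninitialized data at offset %lu\n", offset);  /* prints 12 */
        return 0;
    }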
| 
 | ||||
| We can now try to figure out which field of the "``struct siginfo``" | ||||
| was not initialized. This is the beginning of the struct:: | ||||
| 
 | ||||
|     40 typedef struct siginfo { | ||||
|     41	       int si_signo; | ||||
|     42	       int si_errno; | ||||
|     43	       int si_code; | ||||
|     44 | ||||
|     45	       union { | ||||
|     .. | ||||
|     92	       } _sifields; | ||||
|     93 } siginfo_t; | ||||
| 
 | ||||
| On 64-bit, the int is 4 bytes long, so it must be the union member that has | ||||
| not been initialized. We can verify this using gdb:: | ||||
| 
 | ||||
| 	$ gdb vmlinux | ||||
| 	... | ||||
| 	(gdb) p &((struct siginfo *) 0)->_sifields | ||||
| 	$1 = (union {...} *) 0x10 | ||||
| 
 | ||||
| Actually, it seems that the union member is located at offset 0x10 -- which | ||||
| means that gcc has inserted 4 bytes of padding between the members ``si_code`` | ||||
| and ``_sifields``. We can now get a fuller picture of the memory dump:: | ||||
| 
 | ||||
| 		 _----------------------------=> si_code | ||||
| 		/	 _--------------------=> (padding) | ||||
| 	       |	/	 _------------=> _sifields(._kill._pid) | ||||
| 	       |       |	/	 _----=> _sifields(._kill._uid) | ||||
| 	       |       |       |	/ | ||||
| 	-------|-------|-------|-------| | ||||
| 	80000000000000000000000000000000000000000088ffff0000000000000000 | ||||
| 	 i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u | ||||
| 
 | ||||
| This allows us to realize another important fact: ``si_code`` contains the | ||||
| value 0x80. Remember that x86 is little endian, so the first 4 bytes | ||||
| "80000000" are really the number 0x00000080. With a bit of research, we | ||||
| find that this is actually the constant ``SI_KERNEL`` defined in | ||||
| ``include/asm-generic/siginfo.h``:: | ||||
| 
 | ||||
|     144 #define SI_KERNEL	0x80		/* sent by the kernel from somewhere	 */ | ||||
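
The little-endian reading of "80000000" as the number 0x00000080 can be
double-checked with a few lines of standalone C; this assumes a little-endian
host such as x86 and is purely illustrative::

    #include <stdio.h>

    int main(void)
    {
        unsigned int si_code = 0x80;    /* SI_KERNEL */
        unsigned char *p = (unsigned char *)&si_code;

        /* On a little-endian machine this prints "80 00 00 00", as in the dump. */
        printf("%02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);
        return 0;
    }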
| 
 | ||||
| This macro is used in exactly one place in the x86 kernel: In ``send_signal()`` | ||||
| in ``kernel/signal.c``:: | ||||
| 
 | ||||
|     816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, | ||||
|     817				int group) | ||||
|     818 { | ||||
|     ... | ||||
|     828		pending = group ? &t->signal->shared_pending : &t->pending; | ||||
|     ... | ||||
|     851		q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && | ||||
|     852						     (is_si_special(info) || | ||||
|     853						      info->si_code >= 0))); | ||||
|     854		if (q) { | ||||
|     855			list_add_tail(&q->list, &pending->list); | ||||
|     856			switch ((unsigned long) info) { | ||||
|     ... | ||||
|     865			case (unsigned long) SEND_SIG_PRIV: | ||||
|     866				q->info.si_signo = sig; | ||||
|     867				q->info.si_errno = 0; | ||||
|     868				q->info.si_code = SI_KERNEL; | ||||
|     869				q->info.si_pid = 0; | ||||
|     870				q->info.si_uid = 0; | ||||
|     871				break; | ||||
|     ... | ||||
|     890 } | ||||
| 
 | ||||
| Not only does this match with the ``.si_code`` member, it also matches the place | ||||
| we found earlier when looking for where siginfo_t objects are enqueued on the | ||||
| ``shared_pending`` list. | ||||
| 
 | ||||
| So to sum up: It seems that it is the padding introduced by the compiler | ||||
| between two struct fields that is uninitialized, and this gets reported when | ||||
| we do a ``memcpy()`` on the struct. This means that we have identified a false | ||||
| positive warning. | ||||
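
The padding conclusion is easy to reproduce outside the kernel. The struct
below is a simplified stand-in for ``siginfo`` (an assumption made for the
example, not the kernel's real definition), but on a typical LP64 target it
shows the same 4 bytes of padding in front of the union::

    #include <stddef.h>
    #include <stdio.h>

    struct fake_siginfo {
        int si_signo;
        int si_errno;
        int si_code;
        union {
            struct {
                long _pid;  /* forces 8-byte alignment of the union */
                long _uid;
            } _kill;
        } _sifields;
    };

    int main(void)
    {
        /* On LP64 the union is 8-byte aligned, so 4 padding bytes follow si_code. */
        printf("offsetof(_sifields)   = %zu\n",
               offsetof(struct fake_siginfo, _sifields));          /* typically 16 */
        printf("padding after si_code = %zu bytes\n",
               offsetof(struct fake_siginfo, _sifields) - 3 * sizeof(int));
        return 0;
    }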
| 
 | ||||
| Normally, kmemcheck will not report uninitialized accesses in ``memcpy()`` calls | ||||
| when both the source and destination addresses are tracked. (Instead, we copy | ||||
| the shadow bytemap as well). In this case, the destination address clearly | ||||
| was not tracked. We can dig a little deeper into the stack trace from above:: | ||||
| 
 | ||||
| 	arch/x86/kernel/signal.c:805 | ||||
| 	arch/x86/kernel/signal.c:871 | ||||
| 	arch/x86/kernel/entry_64.S:694 | ||||
| 
 | ||||
| And we clearly see that the destination siginfo object is located on the | ||||
| stack:: | ||||
| 
 | ||||
|     782 static void do_signal(struct pt_regs *regs) | ||||
|     783 { | ||||
|     784		struct k_sigaction ka; | ||||
|     785		siginfo_t info; | ||||
|     ... | ||||
|     804		signr = get_signal_to_deliver(&info, &ka, regs, NULL); | ||||
|     ... | ||||
|     854 } | ||||
| 
 | ||||
| And this ``&info`` is what eventually gets passed to ``copy_siginfo()`` as the | ||||
| destination argument. | ||||
| 
 | ||||
| Now, even though we didn't find an actual error here, the example is still a | ||||
| good one, because it shows how one would go about finding out what the report | ||||
| was all about. | ||||
| 
 | ||||
| 
 | ||||
| Annotating false positives | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| 
 | ||||
| There are a few different ways to make annotations in the source code that | ||||
| will keep kmemcheck from checking and reporting certain allocations. Here | ||||
| they are: | ||||
| 
 | ||||
| - ``__GFP_NOTRACK_FALSE_POSITIVE`` | ||||
| 	This flag can be passed to ``kmalloc()`` or ``kmem_cache_alloc()`` | ||||
| 	(therefore also to other functions that end up calling one of | ||||
| 	these) to indicate that the allocation should not be tracked | ||||
| 	because it would lead to a false positive report. This is a "big | ||||
| 	hammer" way of silencing kmemcheck; after all, even if the false | ||||
| 	positive pertains to a particular field in a struct, for example, we | ||||
| 	will now lose the ability to find (real) errors in other parts of | ||||
| 	the same struct. | ||||
| 
 | ||||
| 	Example:: | ||||
| 
 | ||||
| 	    /* No warnings will ever trigger on accessing any part of x */ | ||||
| 	    x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE); | ||||
| 
 | ||||
| - ``kmemcheck_bitfield_begin(name)``/``kmemcheck_bitfield_end(name)`` and | ||||
| 	``kmemcheck_annotate_bitfield(ptr, name)`` | ||||
| 	The first two of these three macros can be used inside struct | ||||
| 	definitions to signal, respectively, the beginning and end of a | ||||
| 	bitfield. Additionally, this will assign the bitfield a name, which | ||||
| 	is given as an argument to the macros. | ||||
| 
 | ||||
| 	Having used these markers, one can later use | ||||
| 	kmemcheck_annotate_bitfield() at the point of allocation, to indicate | ||||
| 	which parts of the allocation are part of a bitfield. | ||||
| 
 | ||||
| 	Example:: | ||||
| 
 | ||||
| 	    struct foo { | ||||
| 		int x; | ||||
| 
 | ||||
| 		kmemcheck_bitfield_begin(flags); | ||||
| 		int flag_a:1; | ||||
| 		int flag_b:1; | ||||
| 		kmemcheck_bitfield_end(flags); | ||||
| 
 | ||||
| 		int y; | ||||
| 	    }; | ||||
| 
 | ||||
| 	    struct foo *x = kmalloc(sizeof *x); | ||||
| 
 | ||||
| 	    /* No warnings will trigger on accessing the bitfield of x */ | ||||
| 	    kmemcheck_annotate_bitfield(x, flags); | ||||
| 
 | ||||
| 	Note that ``kmemcheck_annotate_bitfield()`` can be used even before the | ||||
| 	return value of ``kmalloc()`` is checked -- in other words, passing NULL | ||||
| 	as the first argument is legal (and will do nothing). | ||||
| 
 | ||||
| 
 | ||||
| Reporting errors | ||||
| ---------------- | ||||
| 
 | ||||
| As we have seen, kmemcheck will produce false positive reports. Therefore, it | ||||
| is not very wise to blindly post kmemcheck warnings to mailing lists and | ||||
| maintainers. Instead, I encourage maintainers and developers to find errors | ||||
| in their own code. If you get a warning, you can try to work around it, try | ||||
| to figure out if it's a real error or not, or simply ignore it. Most | ||||
| developers know their own code and will quickly and efficiently determine the | ||||
| root cause of a kmemcheck report. This is therefore also the most efficient | ||||
| way to work with kmemcheck. | ||||
| 
 | ||||
| That said, we (the kmemcheck maintainers) will always be on the lookout for | ||||
| false positives that we can annotate and silence. So whatever you find, | ||||
| please drop us a note privately! Kernel configs and steps to reproduce (if | ||||
| available) are of course a great help too. | ||||
| 
 | ||||
| Happy hacking! | ||||
| 
 | ||||
| 
 | ||||
| Technical description | ||||
| --------------------- | ||||
| 
 | ||||
| kmemcheck works by marking memory pages non-present. This means that whenever | ||||
| somebody attempts to access the page, a page fault is generated. The page | ||||
| fault handler notices that the page was in fact only hidden, and so it calls | ||||
| on the kmemcheck code to make further investigations. | ||||
| 
 | ||||
| When the investigations are completed, kmemcheck "shows" the page by marking | ||||
| it present (as it would be under normal circumstances). This way, the | ||||
| interrupted code can continue as usual. | ||||
| 
 | ||||
| But after the instruction has been executed, we should hide the page again, so | ||||
| that we can catch the next access too! Now kmemcheck makes use of a debugging | ||||
| feature of the processor, namely single-stepping. When the processor has | ||||
| finished the one instruction that generated the memory access, a debug | ||||
| exception is raised. From here, we simply hide the page again and continue | ||||
| execution, this time with the single-stepping feature turned off. | ||||
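
The hide / fault / check / show / single-step / hide-again cycle can be
summarized as a tiny state machine. The helpers below are stubs invented for
this sketch; the real x86 fault and debug-exception code is of course far more
involved::

    #include <stdio.h>

    /* Stubs standing in for the real page-table and trap-flag manipulation. */
    static void mark_page_not_present(void *page) { printf("hide %p\n", page); }
    static void mark_page_present(void *page)     { printf("show %p\n", page); }
    static void enable_single_step(void)          { printf("set TF\n"); }
    static void disable_single_step(void)         { printf("clear TF\n"); }
    static void check_against_shadow(void *addr)  { printf("check %p\n", addr); }

    /* Page-fault handler path for an access to a hidden (tracked) page. */
    static void on_page_fault(void *page, void *addr)
    {
        check_against_shadow(addr);     /* warn if the shadow bytes say 'u' */
        mark_page_present(page);        /* let the faulting instruction run */
        enable_single_step();           /* trap again right after it        */
    }

    /* Debug-exception handler path, once the instruction has retired. */
    static void on_debug_trap(void *page)
    {
        mark_page_not_present(page);    /* re-arm for the next access */
        disable_single_step();
    }

    int main(void)
    {
        char page[4096];

        mark_page_not_present(page);
        on_page_fault(page, page + 0x24);   /* simulated tracked access */
        on_debug_trap(page);
        return 0;
    }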
| 
 | ||||
| kmemcheck requires some assistance from the memory allocator in order to work. | ||||
| The memory allocator needs to | ||||
| 
 | ||||
|   1. Tell kmemcheck about newly allocated pages and pages that are about to | ||||
|      be freed. This allows kmemcheck to set up and tear down the shadow memory | ||||
|      for the pages in question. The shadow memory stores the status of each | ||||
|      byte in the allocation proper, e.g. whether it is initialized or | ||||
|      uninitialized. | ||||
| 
 | ||||
|   2. Tell kmemcheck which parts of memory should be marked uninitialized. | ||||
|      There are actually a few more states, such as "not yet allocated" and | ||||
|      "recently freed". | ||||
| 
 | ||||
| If a slab cache is set up using the SLAB_NOTRACK flag, it will never return | ||||
| memory that can take page faults because of kmemcheck. | ||||
| 
 | ||||
| If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still | ||||
| request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags. | ||||
| This does not prevent the page faults from occurring, however, but marks the | ||||
| object in question as being initialized so that no warnings will ever be | ||||
| produced for this object. | ||||
| 
 | ||||
| Currently, the SLAB and SLUB allocators are supported by kmemcheck. | ||||
|  | @ -250,7 +250,6 @@ Table 1-2: Contents of the status files (as of 4.8) | |||
|  VmExe                       size of text segment | ||||
|  VmLib                       size of shared library code | ||||
|  VmPTE                       size of page table entries | ||||
|  VmPMD                       size of second level page tables | ||||
|  VmSwap                      amount of swap used by anonymous private data | ||||
|                              (shmem swap usage is not included) | ||||
|  HugetlbPages                size of hugetlb memory portions | ||||
|  |  | |||
|  | @ -58,6 +58,7 @@ Currently, these files are in /proc/sys/vm: | |||
| - percpu_pagelist_fraction | ||||
| - stat_interval | ||||
| - stat_refresh | ||||
| - numa_stat | ||||
| - swappiness | ||||
| - user_reserve_kbytes | ||||
| - vfs_cache_pressure | ||||
|  | @ -157,6 +158,10 @@ Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any | |||
| value lower than this limit will be ignored and the old configuration will be | ||||
| retained. | ||||
| 
 | ||||
| Note: dirty_bytes must also be set greater than dirty_background_bytes or | ||||
| the amount of memory corresponding to dirty_background_ratio. | ||||
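
As a rough numeric illustration of this constraint (the 8 GiB of dirtyable
memory and the 10% background ratio below are made-up example values):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long dirtyable_bytes = 8ULL << 30;    /* assumed 8 GiB */
        unsigned int dirty_background_ratio = 10;           /* example value */

        /* dirty_bytes must exceed the memory implied by the background ratio. */
        unsigned long long min_dirty_bytes =
            dirtyable_bytes * dirty_background_ratio / 100;

        printf("dirty_bytes must be greater than %llu bytes\n", min_dirty_bytes);
        return 0;
    }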
| 
 | ||||
| ============================================================== | ||||
| 
 | ||||
| dirty_expire_centisecs | ||||
|  | @ -176,6 +181,9 @@ generating disk writes will itself start writing out dirty data. | |||
| 
 | ||||
| The total available memory is not equal to total system memory. | ||||
| 
 | ||||
| Note: dirty_ratio must be set greater than dirty_background_ratio or the | ||||
| ratio corresponding to dirty_background_bytes. | ||||
| 
 | ||||
| ============================================================== | ||||
| 
 | ||||
| dirty_writeback_centisecs | ||||
|  | @ -622,7 +630,7 @@ oom_dump_tasks | |||
| 
 | ||||
| Enables a system-wide task dump (excluding kernel threads) to be produced | ||||
| when the kernel performs an OOM-killing and includes such information as | ||||
| pid, uid, tgid, vm size, rss, nr_ptes, nr_pmds, swapents, oom_score_adj | ||||
| pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj | ||||
| score, and name.  This is helpful to determine why the OOM killer was | ||||
| invoked, to identify the rogue task that caused it, and to determine why | ||||
| the OOM killer chose the task it did to kill. | ||||
|  | @ -792,6 +800,21 @@ with no ill effects: errors and warnings on these stats are suppressed.) | |||
| 
 | ||||
| ============================================================== | ||||
| 
 | ||||
| numa_stat | ||||
| 
 | ||||
| This interface allows runtime configuration of numa statistics. | ||||
| 
 | ||||
| When page allocation performance becomes a bottleneck and you can tolerate | ||||
| some possible tool breakage and decreased numa counter precision, you can | ||||
| do: | ||||
| 	echo 0 > /proc/sys/vm/numa_stat | ||||
| 
 | ||||
| When page allocation performance is not a bottleneck and you want all | ||||
| tooling to work, you can do: | ||||
| 	echo 1 > /proc/sys/vm/numa_stat | ||||
| 
 | ||||
| ============================================================== | ||||
| 
 | ||||
| swappiness | ||||
| 
 | ||||
| This control is used to define how aggressive the kernel will swap | ||||
|  |  | |||
Documentation/vm/mmu_notifier.txt (new file, 93 lines)
							|  | @ -0,0 +1,93 @@ | |||
| When do you need to notify inside the page table lock? | ||||
| 
 | ||||
| When clearing a pte/pmd we are given a choice to notify the event under the | ||||
| page table lock (the notify versions of *_clear_flush call | ||||
| mmu_notifier_invalidate_range). But that notification is not necessary in | ||||
| all cases. | ||||
| 
 | ||||
| Consider secondary TLBs (non-CPU TLBs), such as an IOMMU TLB or a device TLB | ||||
| (when a device uses something like ATS/PASID to have the IOMMU walk the CPU | ||||
| page tables to access a process virtual address space). There are only two | ||||
| cases when you need to notify those secondary TLBs while holding the page | ||||
| table lock when clearing a pte/pmd: | ||||
| 
 | ||||
|   A) the page backing the address is freed before mmu_notifier_invalidate_range_end() | ||||
|   B) a page table entry is updated to point to a new page (COW, write fault | ||||
|      on zero page, __replace_page(), ...) | ||||
| 
 | ||||
| Case A is obvious: you do not want to take the risk of the device writing to | ||||
| a page that might now be used by some completely different task. | ||||
| 
 | ||||
| Case B is more subtle. For correctness it requires the following sequence to | ||||
| happen: | ||||
|   - take page table lock | ||||
|   - clear page table entry and notify ([pmd/pte]p_huge_clear_flush_notify()) | ||||
|   - set page table entry to point to new page | ||||
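
A rough kernel-style sketch of that sequence for the COW case is shown below;
the helper names follow the ones used in this document, but the signatures and
surrounding locking details are simplified assumptions and the fragment is not
meant to build as-is:

    /* Illustrative only: replace the page behind 'addr' with new_page (case B). */
    static void cow_replace_page(struct vm_area_struct *vma, struct mm_struct *mm,
                                 unsigned long addr, pte_t *ptep, spinlock_t *ptl,
                                 struct page *new_page)
    {
        mmu_notifier_invalidate_range_start(mm, addr, addr + PAGE_SIZE);

        spin_lock(ptl);                           /* take page table lock     */
        ptep_clear_flush_notify(vma, addr, ptep); /* clear entry + notify (B) */
        set_pte_at(mm, addr, ptep,                /* only now point the entry */
                   mk_pte(new_page, vma->vm_page_prot)); /* at the new page   */
        spin_unlock(ptl);

        mmu_notifier_invalidate_range_end(mm, addr, addr + PAGE_SIZE);
    }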
| 
 | ||||
| If clearing the page table entry is not followed by a notify before setting | ||||
| the new pte/pmd value, then you can break the memory model (e.g. C11 or | ||||
| C++11) for the device. | ||||
| 
 | ||||
| Consider the following scenario (the device uses a feature similar to ATS/PASID): | ||||
| 
 | ||||
| Take two addresses, addrA and addrB, such that |addrA - addrB| >= PAGE_SIZE; | ||||
| we assume they are write protected for COW (the other cases of B apply too). | ||||
| 
 | ||||
| [Time N] -------------------------------------------------------------------- | ||||
| CPU-thread-0  {try to write to addrA} | ||||
| CPU-thread-1  {try to write to addrB} | ||||
| CPU-thread-2  {} | ||||
| CPU-thread-3  {} | ||||
| DEV-thread-0  {read addrA and populate device TLB} | ||||
| DEV-thread-2  {read addrB and populate device TLB} | ||||
| [Time N+1] ------------------------------------------------------------------ | ||||
| CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}} | ||||
| CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}} | ||||
| CPU-thread-2  {} | ||||
| CPU-thread-3  {} | ||||
| DEV-thread-0  {} | ||||
| DEV-thread-2  {} | ||||
| [Time N+2] ------------------------------------------------------------------ | ||||
| CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}} | ||||
| CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}} | ||||
| CPU-thread-2  {} | ||||
| CPU-thread-3  {} | ||||
| DEV-thread-0  {} | ||||
| DEV-thread-2  {} | ||||
| [Time N+3] ------------------------------------------------------------------ | ||||
| CPU-thread-0  {preempted} | ||||
| CPU-thread-1  {preempted} | ||||
| CPU-thread-2  {write to addrA which is a write to new page} | ||||
| CPU-thread-3  {} | ||||
| DEV-thread-0  {} | ||||
| DEV-thread-2  {} | ||||
| [Time N+3] ------------------------------------------------------------------ | ||||
| CPU-thread-0  {preempted} | ||||
| CPU-thread-1  {preempted} | ||||
| CPU-thread-2  {} | ||||
| CPU-thread-3  {write to addrB which is a write to new page} | ||||
| DEV-thread-0  {} | ||||
| DEV-thread-2  {} | ||||
| [Time N+4] ------------------------------------------------------------------ | ||||
| CPU-thread-0  {preempted} | ||||
| CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}} | ||||
| CPU-thread-2  {} | ||||
| CPU-thread-3  {} | ||||
| DEV-thread-0  {} | ||||
| DEV-thread-2  {} | ||||
| [Time N+5] ------------------------------------------------------------------ | ||||
| CPU-thread-0  {preempted} | ||||
| CPU-thread-1  {} | ||||
| CPU-thread-2  {} | ||||
| CPU-thread-3  {} | ||||
| DEV-thread-0  {read addrA from old page} | ||||
| DEV-thread-2  {read addrB from new page} | ||||
| 
 | ||||
| So here, because at time N+2 the clearing of the page table entry was not | ||||
| paired with a notification to invalidate the secondary TLB, the device sees | ||||
| the new value for addrB before seeing the new value for addrA. This breaks | ||||
| total memory ordering for the device. | ||||
| 
 | ||||
| When changing a pte to write protect, or to point to a new write-protected | ||||
| page with the same content (KSM), it is fine to delay the | ||||
| mmu_notifier_invalidate_range call to mmu_notifier_invalidate_range_end() | ||||
| outside the page table lock. This is true even if the thread doing the page | ||||
| table update is preempted right after releasing the page table lock but | ||||
| before calling mmu_notifier_invalidate_range_end(). | ||||
MAINTAINERS (10 changes)
							|  | @ -7692,16 +7692,6 @@ F:	include/linux/kdb.h | |||
| F:	include/linux/kgdb.h | ||||
| F:	kernel/debug/ | ||||
| 
 | ||||
| KMEMCHECK | ||||
| M:	Vegard Nossum <vegardno@ifi.uio.no> | ||||
| M:	Pekka Enberg <penberg@kernel.org> | ||||
| S:	Maintained | ||||
| F:	Documentation/dev-tools/kmemcheck.rst | ||||
| F:	arch/x86/include/asm/kmemcheck.h | ||||
| F:	arch/x86/mm/kmemcheck/ | ||||
| F:	include/linux/kmemcheck.h | ||||
| F:	mm/kmemcheck.c | ||||
| 
 | ||||
| KMEMLEAK | ||||
| M:	Catalin Marinas <catalin.marinas@arm.com> | ||||
| S:	Maintained | ||||
|  |  | |||
|  | @ -7,7 +7,6 @@ | |||
| #include <linux/mm_types.h> | ||||
| #include <linux/scatterlist.h> | ||||
| #include <linux/dma-debug.h> | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/kref.h> | ||||
| 
 | ||||
| #define ARM_MAPPING_ERROR		(~(dma_addr_t)0x0) | ||||
|  |  | |||
|  | @ -57,7 +57,7 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) | |||
| extern pgd_t *pgd_alloc(struct mm_struct *mm); | ||||
| extern void pgd_free(struct mm_struct *mm, pgd_t *pgd); | ||||
| 
 | ||||
| #define PGALLOC_GFP	(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) | ||||
| #define PGALLOC_GFP	(GFP_KERNEL | __GFP_ZERO) | ||||
| 
 | ||||
| static inline void clean_pte_table(pte_t *pte) | ||||
| { | ||||
|  |  | |||
|  | @ -141,7 +141,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base) | |||
| 	pte = pmd_pgtable(*pmd); | ||||
| 	pmd_clear(pmd); | ||||
| 	pte_free(mm, pte); | ||||
| 	atomic_long_dec(&mm->nr_ptes); | ||||
| 	mm_dec_nr_ptes(mm); | ||||
| no_pmd: | ||||
| 	pud_clear(pud); | ||||
| 	pmd_free(mm, pmd); | ||||
|  |  | |||
|  | @ -85,7 +85,7 @@ config ARM64 | |||
| 	select HAVE_ARCH_BITREVERSE | ||||
| 	select HAVE_ARCH_HUGE_VMAP | ||||
| 	select HAVE_ARCH_JUMP_LABEL | ||||
| 	select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP && !(ARM64_16K_PAGES && ARM64_VA_BITS_48) | ||||
| 	select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48) | ||||
| 	select HAVE_ARCH_KGDB | ||||
| 	select HAVE_ARCH_MMAP_RND_BITS | ||||
| 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT | ||||
|  |  | |||
|  | @ -26,7 +26,7 @@ | |||
| 
 | ||||
| #define check_pgt_cache()		do { } while (0) | ||||
| 
 | ||||
| #define PGALLOC_GFP	(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) | ||||
| #define PGALLOC_GFP	(GFP_KERNEL | __GFP_ZERO) | ||||
| #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t)) | ||||
| 
 | ||||
| #if CONFIG_PGTABLE_LEVELS > 2 | ||||
|  |  | |||
|  | @ -11,6 +11,7 @@ | |||
|  */ | ||||
| 
 | ||||
| #define pr_fmt(fmt) "kasan: " fmt | ||||
| #include <linux/bootmem.h> | ||||
| #include <linux/kasan.h> | ||||
| #include <linux/kernel.h> | ||||
| #include <linux/sched/task.h> | ||||
|  | @ -35,77 +36,117 @@ static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE); | |||
|  * with the physical address from __pa_symbol. | ||||
|  */ | ||||
| 
 | ||||
| static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr, | ||||
| 					unsigned long end) | ||||
| static phys_addr_t __init kasan_alloc_zeroed_page(int node) | ||||
| { | ||||
| 	void *p = memblock_virt_alloc_try_nid(PAGE_SIZE, PAGE_SIZE, | ||||
| 					      __pa(MAX_DMA_ADDRESS), | ||||
| 					      MEMBLOCK_ALLOC_ACCESSIBLE, node); | ||||
| 	return __pa(p); | ||||
| } | ||||
| 
 | ||||
| static pte_t *__init kasan_pte_offset(pmd_t *pmd, unsigned long addr, int node, | ||||
| 				      bool early) | ||||
| { | ||||
| 	if (pmd_none(*pmd)) { | ||||
| 		phys_addr_t pte_phys = early ? __pa_symbol(kasan_zero_pte) | ||||
| 					     : kasan_alloc_zeroed_page(node); | ||||
| 		__pmd_populate(pmd, pte_phys, PMD_TYPE_TABLE); | ||||
| 	} | ||||
| 
 | ||||
| 	return early ? pte_offset_kimg(pmd, addr) | ||||
| 		     : pte_offset_kernel(pmd, addr); | ||||
| } | ||||
| 
 | ||||
| static pmd_t *__init kasan_pmd_offset(pud_t *pud, unsigned long addr, int node, | ||||
| 				      bool early) | ||||
| { | ||||
| 	if (pud_none(*pud)) { | ||||
| 		phys_addr_t pmd_phys = early ? __pa_symbol(kasan_zero_pmd) | ||||
| 					     : kasan_alloc_zeroed_page(node); | ||||
| 		__pud_populate(pud, pmd_phys, PMD_TYPE_TABLE); | ||||
| 	} | ||||
| 
 | ||||
| 	return early ? pmd_offset_kimg(pud, addr) : pmd_offset(pud, addr); | ||||
| } | ||||
| 
 | ||||
| static pud_t *__init kasan_pud_offset(pgd_t *pgd, unsigned long addr, int node, | ||||
| 				      bool early) | ||||
| { | ||||
| 	if (pgd_none(*pgd)) { | ||||
| 		phys_addr_t pud_phys = early ? __pa_symbol(kasan_zero_pud) | ||||
| 					     : kasan_alloc_zeroed_page(node); | ||||
| 		__pgd_populate(pgd, pud_phys, PMD_TYPE_TABLE); | ||||
| 	} | ||||
| 
 | ||||
| 	return early ? pud_offset_kimg(pgd, addr) : pud_offset(pgd, addr); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_pte_populate(pmd_t *pmd, unsigned long addr, | ||||
| 				      unsigned long end, int node, bool early) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 	unsigned long next; | ||||
| 	pte_t *pte = kasan_pte_offset(pmd, addr, node, early); | ||||
| 
 | ||||
| 	if (pmd_none(*pmd)) | ||||
| 		__pmd_populate(pmd, __pa_symbol(kasan_zero_pte), PMD_TYPE_TABLE); | ||||
| 
 | ||||
| 	pte = pte_offset_kimg(pmd, addr); | ||||
| 	do { | ||||
| 		phys_addr_t page_phys = early ? __pa_symbol(kasan_zero_page) | ||||
| 					      : kasan_alloc_zeroed_page(node); | ||||
| 		next = addr + PAGE_SIZE; | ||||
| 		set_pte(pte, pfn_pte(sym_to_pfn(kasan_zero_page), | ||||
| 					PAGE_KERNEL)); | ||||
| 		set_pte(pte, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL)); | ||||
| 	} while (pte++, addr = next, addr != end && pte_none(*pte)); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_early_pmd_populate(pud_t *pud, | ||||
| 					unsigned long addr, | ||||
| 					unsigned long end) | ||||
| static void __init kasan_pmd_populate(pud_t *pud, unsigned long addr, | ||||
| 				      unsigned long end, int node, bool early) | ||||
| { | ||||
| 	pmd_t *pmd; | ||||
| 	unsigned long next; | ||||
| 	pmd_t *pmd = kasan_pmd_offset(pud, addr, node, early); | ||||
| 
 | ||||
| 	if (pud_none(*pud)) | ||||
| 		__pud_populate(pud, __pa_symbol(kasan_zero_pmd), PMD_TYPE_TABLE); | ||||
| 
 | ||||
| 	pmd = pmd_offset_kimg(pud, addr); | ||||
| 	do { | ||||
| 		next = pmd_addr_end(addr, end); | ||||
| 		kasan_early_pte_populate(pmd, addr, next); | ||||
| 		kasan_pte_populate(pmd, addr, next, node, early); | ||||
| 	} while (pmd++, addr = next, addr != end && pmd_none(*pmd)); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_early_pud_populate(pgd_t *pgd, | ||||
| 					unsigned long addr, | ||||
| 					unsigned long end) | ||||
| static void __init kasan_pud_populate(pgd_t *pgd, unsigned long addr, | ||||
| 				      unsigned long end, int node, bool early) | ||||
| { | ||||
| 	pud_t *pud; | ||||
| 	unsigned long next; | ||||
| 	pud_t *pud = kasan_pud_offset(pgd, addr, node, early); | ||||
| 
 | ||||
| 	if (pgd_none(*pgd)) | ||||
| 		__pgd_populate(pgd, __pa_symbol(kasan_zero_pud), PUD_TYPE_TABLE); | ||||
| 
 | ||||
| 	pud = pud_offset_kimg(pgd, addr); | ||||
| 	do { | ||||
| 		next = pud_addr_end(addr, end); | ||||
| 		kasan_early_pmd_populate(pud, addr, next); | ||||
| 		kasan_pmd_populate(pud, addr, next, node, early); | ||||
| 	} while (pud++, addr = next, addr != end && pud_none(*pud)); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_map_early_shadow(void) | ||||
| static void __init kasan_pgd_populate(unsigned long addr, unsigned long end, | ||||
| 				      int node, bool early) | ||||
| { | ||||
| 	unsigned long addr = KASAN_SHADOW_START; | ||||
| 	unsigned long end = KASAN_SHADOW_END; | ||||
| 	unsigned long next; | ||||
| 	pgd_t *pgd; | ||||
| 
 | ||||
| 	pgd = pgd_offset_k(addr); | ||||
| 	do { | ||||
| 		next = pgd_addr_end(addr, end); | ||||
| 		kasan_early_pud_populate(pgd, addr, next); | ||||
| 		kasan_pud_populate(pgd, addr, next, node, early); | ||||
| 	} while (pgd++, addr = next, addr != end); | ||||
| } | ||||
| 
 | ||||
| /* The early shadow maps everything to a single page of zeroes */ | ||||
| asmlinkage void __init kasan_early_init(void) | ||||
| { | ||||
| 	BUILD_BUG_ON(KASAN_SHADOW_OFFSET != KASAN_SHADOW_END - (1UL << 61)); | ||||
| 	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_START, PGDIR_SIZE)); | ||||
| 	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, PGDIR_SIZE)); | ||||
| 	kasan_map_early_shadow(); | ||||
| 	kasan_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, NUMA_NO_NODE, | ||||
| 			   true); | ||||
| } | ||||
| 
 | ||||
| /* Set up full kasan mappings, ensuring that the mapped pages are zeroed */ | ||||
| static void __init kasan_map_populate(unsigned long start, unsigned long end, | ||||
| 				      int node) | ||||
| { | ||||
| 	kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false); | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  | @ -142,8 +183,8 @@ void __init kasan_init(void) | |||
| 	struct memblock_region *reg; | ||||
| 	int i; | ||||
| 
 | ||||
| 	kimg_shadow_start = (u64)kasan_mem_to_shadow(_text); | ||||
| 	kimg_shadow_end = (u64)kasan_mem_to_shadow(_end); | ||||
| 	kimg_shadow_start = (u64)kasan_mem_to_shadow(_text) & PAGE_MASK; | ||||
| 	kimg_shadow_end = PAGE_ALIGN((u64)kasan_mem_to_shadow(_end)); | ||||
| 
 | ||||
| 	mod_shadow_start = (u64)kasan_mem_to_shadow((void *)MODULES_VADDR); | ||||
| 	mod_shadow_end = (u64)kasan_mem_to_shadow((void *)MODULES_END); | ||||
|  | @ -161,19 +202,8 @@ void __init kasan_init(void) | |||
| 
 | ||||
| 	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); | ||||
| 
 | ||||
| 	vmemmap_populate(kimg_shadow_start, kimg_shadow_end, | ||||
| 			 pfn_to_nid(virt_to_pfn(lm_alias(_text)))); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * vmemmap_populate() has populated the shadow region that covers the | ||||
| 	 * kernel image with SWAPPER_BLOCK_SIZE mappings, so we have to round | ||||
| 	 * the start and end addresses to SWAPPER_BLOCK_SIZE as well, to prevent | ||||
| 	 * kasan_populate_zero_shadow() from replacing the page table entries | ||||
| 	 * (PMD or PTE) at the edges of the shadow region for the kernel | ||||
| 	 * image. | ||||
| 	 */ | ||||
| 	kimg_shadow_start = round_down(kimg_shadow_start, SWAPPER_BLOCK_SIZE); | ||||
| 	kimg_shadow_end = round_up(kimg_shadow_end, SWAPPER_BLOCK_SIZE); | ||||
| 	kasan_map_populate(kimg_shadow_start, kimg_shadow_end, | ||||
| 			   pfn_to_nid(virt_to_pfn(lm_alias(_text)))); | ||||
| 
 | ||||
| 	kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, | ||||
| 				   (void *)mod_shadow_start); | ||||
|  | @ -191,9 +221,9 @@ void __init kasan_init(void) | |||
| 		if (start >= end) | ||||
| 			break; | ||||
| 
 | ||||
| 		vmemmap_populate((unsigned long)kasan_mem_to_shadow(start), | ||||
| 				(unsigned long)kasan_mem_to_shadow(end), | ||||
| 				pfn_to_nid(virt_to_pfn(start))); | ||||
| 		kasan_map_populate((unsigned long)kasan_mem_to_shadow(start), | ||||
| 				   (unsigned long)kasan_mem_to_shadow(end), | ||||
| 				   pfn_to_nid(virt_to_pfn(start))); | ||||
| 	} | ||||
| 
 | ||||
| 	/*
 | ||||
|  |  | |||
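For context on the arm64 kasan_init.c hunks above: KASAN keeps one shadow byte for every eight bytes of kernel address space, which is why the early code can alias the whole shadow region to a single page of zeroes and why kasan_map_populate() must later back the interesting parts of the shadow with real, zeroed pages. The sketch below shows the usual shift-and-offset form of the address-to-shadow mapping; the offset value and the helper names are illustrative placeholders, not taken from this diff.

#include <stdint.h>
#include <stdio.h>

/* Illustrative values only; real kernels derive these from the
 * architecture and configuration (KASAN_SHADOW_OFFSET et al.). */
#define DEMO_SHADOW_SCALE_SHIFT 3
#define DEMO_SHADOW_OFFSET      0xdfffe00000000000ULL

/* Same shape as kasan_mem_to_shadow(): one shadow byte covers
 * (1 << DEMO_SHADOW_SCALE_SHIFT) == 8 bytes of address space. */
static uint64_t demo_mem_to_shadow(uint64_t addr)
{
	return (addr >> DEMO_SHADOW_SCALE_SHIFT) + DEMO_SHADOW_OFFSET;
}

int main(void)
{
	uint64_t a = 0xffff000012345678ULL;

	/* Consecutive 8-byte granules map to consecutive shadow bytes. */
	printf("shadow(%#llx) = %#llx\n",
	       (unsigned long long)a,
	       (unsigned long long)demo_mem_to_shadow(a));
	printf("shadow(%#llx) = %#llx\n",
	       (unsigned long long)(a + 8),
	       (unsigned long long)demo_mem_to_shadow(a + 8));
	return 0;
}

With that mapping in mind, kasan_early_init() above can point every shadow entry at kasan_zero_page, while kasan_map_populate() and kasan_populate_zero_shadow() later decide which parts of the shadow get dedicated, zero-filled backing pages.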
|  | @ -42,21 +42,9 @@ | |||
| #undef DEBUG | ||||
| 
 | ||||
| /*
 | ||||
|  * BAD_PAGE is the page that is used for page faults when linux | ||||
|  * is out-of-memory. Older versions of linux just did a | ||||
|  * do_exit(), but using this instead means there is less risk | ||||
|  * for a process dying in kernel mode, possibly leaving an inode | ||||
|  * unused, etc. | ||||
|  * | ||||
|  * BAD_PAGETABLE is the accompanying page-table: it is initialized | ||||
|  * to point to BAD_PAGE entries. | ||||
|  * | ||||
|  * ZERO_PAGE is a special page that is used for zero-initialized | ||||
|  * data and COW. | ||||
|  */ | ||||
| static unsigned long empty_bad_page_table; | ||||
| static unsigned long empty_bad_page; | ||||
| 
 | ||||
| unsigned long empty_zero_page; | ||||
| EXPORT_SYMBOL(empty_zero_page); | ||||
| 
 | ||||
|  | @ -72,8 +60,6 @@ void __init paging_init(void) | |||
| 	unsigned long zones_size[MAX_NR_ZONES] = {0, }; | ||||
| 
 | ||||
| 	/* allocate some pages for kernel housekeeping tasks */ | ||||
| 	empty_bad_page_table	= (unsigned long) alloc_bootmem_pages(PAGE_SIZE); | ||||
| 	empty_bad_page		= (unsigned long) alloc_bootmem_pages(PAGE_SIZE); | ||||
| 	empty_zero_page		= (unsigned long) alloc_bootmem_pages(PAGE_SIZE); | ||||
| 
 | ||||
| 	memset((void *) empty_zero_page, 0, PAGE_SIZE); | ||||
|  |  | |||
|  | @ -40,20 +40,9 @@ | |||
| #include <asm/sections.h> | ||||
| 
 | ||||
| /*
 | ||||
|  * BAD_PAGE is the page that is used for page faults when linux | ||||
|  * is out-of-memory. Older versions of linux just did a | ||||
|  * do_exit(), but using this instead means there is less risk | ||||
|  * for a process dying in kernel mode, possibly leaving an inode | ||||
|  * unused, etc. | ||||
|  * | ||||
|  * BAD_PAGETABLE is the accompanying page-table: it is initialized | ||||
|  * to point to BAD_PAGE entries. | ||||
|  * | ||||
|  * ZERO_PAGE is a special page that is used for zero-initialized | ||||
|  * data and COW. | ||||
|  */ | ||||
| static unsigned long empty_bad_page_table; | ||||
| static unsigned long empty_bad_page; | ||||
| unsigned long empty_zero_page; | ||||
| 
 | ||||
| /*
 | ||||
|  | @ -78,8 +67,6 @@ void __init paging_init(void) | |||
| 	 * Initialize the bad page table and bad page to point | ||||
| 	 * to a couple of allocated pages. | ||||
| 	 */ | ||||
| 	empty_bad_page_table = (unsigned long)alloc_bootmem_pages(PAGE_SIZE); | ||||
| 	empty_bad_page = (unsigned long)alloc_bootmem_pages(PAGE_SIZE); | ||||
| 	empty_zero_page = (unsigned long)alloc_bootmem_pages(PAGE_SIZE); | ||||
| 	memset((void *)empty_zero_page, 0, PAGE_SIZE); | ||||
| 
 | ||||
|  |  | |||
|  | @ -196,8 +196,8 @@ config TIMER_DIVIDE | |||
| 	default "128" | ||||
| 
 | ||||
| config CPU_BIG_ENDIAN | ||||
|         bool "Generate big endian code" | ||||
| 	default n | ||||
| 	bool | ||||
| 	default !CPU_LITTLE_ENDIAN | ||||
| 
 | ||||
| config CPU_LITTLE_ENDIAN | ||||
|         bool "Generate little endian code" | ||||
|  |  | |||
|  | @ -31,12 +31,7 @@ | |||
|  * tables. Each page table is also a single 4K page, giving 512 (== | ||||
|  * PTRS_PER_PTE) 8 byte ptes. Each pud entry is initialized to point to | ||||
|  * invalid_pmd_table, each pmd entry is initialized to point to | ||||
|  * invalid_pte_table, each pte is initialized to 0. When memory is low, | ||||
|  * and a pmd table or a page table allocation fails, empty_bad_pmd_table | ||||
|  * and empty_bad_page_table is returned back to higher layer code, so | ||||
|  * that the failure is recognized later on. Linux does not seem to | ||||
|  * handle these failures very well though. The empty_bad_page_table has | ||||
|  * invalid pte entries in it, to force page faults. | ||||
|  * invalid_pte_table, each pte is initialized to 0. | ||||
|  * | ||||
|  * Kernel mappings: kernel mappings are held in the swapper_pg_table. | ||||
|  * The layout is identical to userspace except it's indexed with the | ||||
|  | @ -175,7 +170,6 @@ | |||
| 	printk("%s:%d: bad pgd %016lx.\n", __FILE__, __LINE__, pgd_val(e)) | ||||
| 
 | ||||
| extern pte_t invalid_pte_table[PTRS_PER_PTE]; | ||||
| extern pte_t empty_bad_page_table[PTRS_PER_PTE]; | ||||
| 
 | ||||
| #ifndef __PAGETABLE_PUD_FOLDED | ||||
| /*
 | ||||
|  |  | |||
|  | @ -433,14 +433,6 @@ ENTRY(swapper_pg_dir) | |||
| ENTRY(empty_zero_page) | ||||
| 	.space PAGE_SIZE
 | ||||
| 
 | ||||
| 	.balign PAGE_SIZE
 | ||||
| ENTRY(empty_bad_page) | ||||
| 	.space PAGE_SIZE
 | ||||
| 
 | ||||
| 	.balign PAGE_SIZE
 | ||||
| ENTRY(empty_bad_pte_table) | ||||
| 	.space PAGE_SIZE
 | ||||
| 
 | ||||
| 	.balign PAGE_SIZE
 | ||||
| ENTRY(large_page_table) | ||||
| 	.space PAGE_SIZE
 | ||||
|  |  | |||
|  | @ -23,7 +23,6 @@ | |||
|  */ | ||||
| 
 | ||||
| #include <linux/dma-debug.h> | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/dma-mapping.h> | ||||
| 
 | ||||
| extern const struct dma_map_ops or1k_dma_map_ops; | ||||
|  |  | |||
|  | @ -18,7 +18,7 @@ static inline gfp_t pgtable_gfp_flags(struct mm_struct *mm, gfp_t gfp) | |||
| } | ||||
| #endif /* MODULE */ | ||||
| 
 | ||||
| #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) | ||||
| #define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO) | ||||
| 
 | ||||
| #ifdef CONFIG_PPC_BOOK3S | ||||
| #include <asm/book3s/pgalloc.h> | ||||
|  |  | |||
|  | @ -433,6 +433,7 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd, | |||
| 	pud = pud_offset(pgd, start); | ||||
| 	pgd_clear(pgd); | ||||
| 	pud_free_tlb(tlb, pud, start); | ||||
| 	mm_dec_nr_puds(tlb->mm); | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  |  | |||
|  | @ -200,7 +200,7 @@ static void destroy_pagetable_page(struct mm_struct *mm) | |||
| 	/* We allow PTE_FRAG_NR fragments from a PTE page */ | ||||
| 	if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) { | ||||
| 		pgtable_page_dtor(page); | ||||
| 		free_hot_cold_page(page, 0); | ||||
| 		free_unref_page(page); | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
|  |  | |||
|  | @ -404,7 +404,7 @@ void pte_fragment_free(unsigned long *table, int kernel) | |||
| 	if (put_page_testzero(page)) { | ||||
| 		if (!kernel) | ||||
| 			pgtable_page_dtor(page); | ||||
| 		free_hot_cold_page(page, 0); | ||||
| 		free_unref_page(page); | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
|  |  | |||
|  | @ -44,6 +44,8 @@ static inline int init_new_context(struct task_struct *tsk, | |||
| 		mm->context.asce_limit = STACK_TOP_MAX; | ||||
| 		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH | | ||||
| 				   _ASCE_USER_BITS | _ASCE_TYPE_REGION3; | ||||
| 		/* pgd_alloc() did not account this pud */ | ||||
| 		mm_inc_nr_puds(mm); | ||||
| 		break; | ||||
| 	case -PAGE_SIZE: | ||||
| 		/* forked 5-level task, set new asce with new_mm->pgd */ | ||||
|  | @ -59,7 +61,7 @@ static inline int init_new_context(struct task_struct *tsk, | |||
| 		/* forked 2-level compat task, set new asce with new mm->pgd */ | ||||
| 		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH | | ||||
| 				   _ASCE_USER_BITS | _ASCE_TYPE_SEGMENT; | ||||
| 		/* pgd_alloc() did not increase mm->nr_pmds */ | ||||
| 		/* pgd_alloc() did not account this pmd */ | ||||
| 		mm_inc_nr_pmds(mm); | ||||
| 	} | ||||
| 	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm)); | ||||
|  |  | |||
|  | @ -1172,11 +1172,11 @@ static int __init dwarf_unwinder_init(void) | |||
| 
 | ||||
| 	dwarf_frame_cachep = kmem_cache_create("dwarf_frames", | ||||
| 			sizeof(struct dwarf_frame), 0, | ||||
| 			SLAB_PANIC | SLAB_HWCACHE_ALIGN | SLAB_NOTRACK, NULL); | ||||
| 			SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL); | ||||
| 
 | ||||
| 	dwarf_reg_cachep = kmem_cache_create("dwarf_regs", | ||||
| 			sizeof(struct dwarf_reg), 0, | ||||
| 			SLAB_PANIC | SLAB_HWCACHE_ALIGN | SLAB_NOTRACK, NULL); | ||||
| 			SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL); | ||||
| 
 | ||||
| 	dwarf_frame_pool = mempool_create_slab_pool(DWARF_FRAME_MIN_REQ, | ||||
| 						    dwarf_frame_cachep); | ||||
|  |  | |||
|  | @ -101,14 +101,6 @@ empty_zero_page: | |||
| mmu_pdtp_cache: | ||||
| 	.space PAGE_SIZE, 0 | ||||
| 
 | ||||
| 	.global empty_bad_page
 | ||||
| empty_bad_page: | ||||
| 	.space PAGE_SIZE, 0 | ||||
| 
 | ||||
| 	.global empty_bad_pte_table
 | ||||
| empty_bad_pte_table: | ||||
| 	.space PAGE_SIZE, 0 | ||||
| 
 | ||||
| 	.global	fpu_in_use
 | ||||
| fpu_in_use:	.quad	0 | ||||
| 
 | ||||
|  |  | |||
|  | @ -59,7 +59,7 @@ void arch_task_cache_init(void) | |||
| 
 | ||||
| 	task_xstate_cachep = kmem_cache_create("task_xstate", xstate_size, | ||||
| 					       __alignof__(union thread_xstate), | ||||
| 					       SLAB_PANIC | SLAB_NOTRACK, NULL); | ||||
| 					       SLAB_PANIC, NULL); | ||||
| } | ||||
| 
 | ||||
| #ifdef CONFIG_SH_FPU_EMU | ||||
|  |  | |||
|  | @ -231,6 +231,36 @@ extern unsigned long _PAGE_ALL_SZ_BITS; | |||
| extern struct page *mem_map_zero; | ||||
| #define ZERO_PAGE(vaddr)	(mem_map_zero) | ||||
| 
 | ||||
| /* This macro must be updated when the size of struct page grows above 80
 | ||||
|  * or reduces below 64. | ||||
|  * The idea is that the compiler optimizes out the switch() statement and | ||||
|  * only leaves clrx instructions (see the sketch following this file's diff). | ||||
|  */ | ||||
| #define	mm_zero_struct_page(pp) do {					\ | ||||
| 	unsigned long *_pp = (void *)(pp);				\ | ||||
| 									\ | ||||
| 	 /* Check that struct page is either 64, 72, or 80 bytes */	\ | ||||
| 	BUILD_BUG_ON(sizeof(struct page) & 7);				\ | ||||
| 	BUILD_BUG_ON(sizeof(struct page) < 64);				\ | ||||
| 	BUILD_BUG_ON(sizeof(struct page) > 80);				\ | ||||
| 									\ | ||||
| 	switch (sizeof(struct page)) {					\ | ||||
| 	case 80:							\ | ||||
| 		_pp[9] = 0;	/* fallthrough */			\ | ||||
| 	case 72:							\ | ||||
| 		_pp[8] = 0;	/* fallthrough */			\ | ||||
| 	default:							\ | ||||
| 		_pp[7] = 0;						\ | ||||
| 		_pp[6] = 0;						\ | ||||
| 		_pp[5] = 0;						\ | ||||
| 		_pp[4] = 0;						\ | ||||
| 		_pp[3] = 0;						\ | ||||
| 		_pp[2] = 0;						\ | ||||
| 		_pp[1] = 0;						\ | ||||
| 		_pp[0] = 0;						\ | ||||
| 	}								\ | ||||
| } while (0) | ||||
| 
 | ||||
| /* PFNs are real physical page numbers.  However, mem_map only begins to record
 | ||||
|  * per-page information starting at pfn_base.  This is to handle systems where | ||||
|  * the first physical page in the machine is at some huge physical address, | ||||
|  |  | |||
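The mm_zero_struct_page() addition above relies on sizeof(struct page) being a compile-time constant, so the switch with fall-through cases collapses into a straight run of eight to ten zero stores (clrx on sparc64) with no branching. Here is a minimal, self-contained sketch of the same pattern in portable C; the structure and helper names are stand-ins for illustration, not the kernel's struct page.

#include <assert.h>
#include <string.h>

/* Stand-in for an 8-byte-aligned structure of 64..80 bytes. */
struct demo_page {
	unsigned long words[10];	/* 80 bytes on LP64 */
};

/* Same idea as the sparc64 hunk above: sizeof() is known at compile
 * time, so the switch is resolved statically and only the stores
 * survive in the generated code. */
#define demo_zero_struct_page(pp) do {					\
	unsigned long *_p = (void *)(pp);				\
									\
	switch (sizeof(struct demo_page)) {				\
	case 80:							\
		_p[9] = 0;	/* fall through */			\
	case 72:							\
		_p[8] = 0;	/* fall through */			\
	default:							\
		_p[7] = 0;						\
		_p[6] = 0;						\
		_p[5] = 0;						\
		_p[4] = 0;						\
		_p[3] = 0;						\
		_p[2] = 0;						\
		_p[1] = 0;						\
		_p[0] = 0;						\
	}								\
} while (0)

int main(void)
{
	struct demo_page p;

	memset(&p, 0xff, sizeof(p));
	demo_zero_struct_page(&p);
	assert(p.words[0] == 0 && p.words[9] == 0);
	return 0;
}

The BUILD_BUG_ON()s in the real macro are what keep this safe: if struct page ever drifts outside the 64..80-byte, multiple-of-8 window, the build breaks instead of silently zeroing the wrong amount of memory.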
|  | @ -397,7 +397,7 @@ static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, | |||
| 
 | ||||
| 	pmd_clear(pmd); | ||||
| 	pte_free_tlb(tlb, token, addr); | ||||
| 	atomic_long_dec(&tlb->mm->nr_ptes); | ||||
| 	mm_dec_nr_ptes(tlb->mm); | ||||
| } | ||||
| 
 | ||||
| static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud, | ||||
|  | @ -472,6 +472,7 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd, | |||
| 	pud = pud_offset(pgd, start); | ||||
| 	pgd_clear(pgd); | ||||
| 	pud_free_tlb(tlb, pud, start); | ||||
| 	mm_dec_nr_puds(tlb->mm); | ||||
| } | ||||
| 
 | ||||
| void hugetlb_free_pgd_range(struct mmu_gather *tlb, | ||||
|  |  | |||
|  | @ -2540,9 +2540,16 @@ void __init mem_init(void) | |||
| { | ||||
| 	high_memory = __va(last_valid_pfn << PAGE_SHIFT); | ||||
| 
 | ||||
| 	register_page_bootmem_info(); | ||||
| 	free_all_bootmem(); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Must be done after boot memory is put on freelist, because here we | ||||
| 	 * might set fields in deferred struct pages that have not yet been | ||||
| 	 * initialized, and free_all_bootmem() initializes all the reserved | ||||
| 	 * deferred pages for us. | ||||
| 	 */ | ||||
| 	register_page_bootmem_info(); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Set up the zero page, mark it reserved, so that page count | ||||
| 	 * is not manipulated when freeing the page from user ptes. | ||||
|  | @ -2637,30 +2644,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend, | |||
| 	vstart = vstart & PMD_MASK; | ||||
| 	vend = ALIGN(vend, PMD_SIZE); | ||||
| 	for (; vstart < vend; vstart += PMD_SIZE) { | ||||
| 		pgd_t *pgd = pgd_offset_k(vstart); | ||||
| 		pgd_t *pgd = vmemmap_pgd_populate(vstart, node); | ||||
| 		unsigned long pte; | ||||
| 		pud_t *pud; | ||||
| 		pmd_t *pmd; | ||||
| 
 | ||||
| 		if (pgd_none(*pgd)) { | ||||
| 			pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node); | ||||
| 		if (!pgd) | ||||
| 			return -ENOMEM; | ||||
| 
 | ||||
| 			if (!new) | ||||
| 				return -ENOMEM; | ||||
| 			pgd_populate(&init_mm, pgd, new); | ||||
| 		} | ||||
| 
 | ||||
| 		pud = pud_offset(pgd, vstart); | ||||
| 		if (pud_none(*pud)) { | ||||
| 			pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node); | ||||
| 
 | ||||
| 			if (!new) | ||||
| 				return -ENOMEM; | ||||
| 			pud_populate(&init_mm, pud, new); | ||||
| 		} | ||||
| 		pud = vmemmap_pud_populate(pgd, vstart, node); | ||||
| 		if (!pud) | ||||
| 			return -ENOMEM; | ||||
| 
 | ||||
| 		pmd = pmd_offset(pud, vstart); | ||||
| 
 | ||||
| 		pte = pmd_val(*pmd); | ||||
| 		if (!(pte & _PAGE_VALID)) { | ||||
| 			void *block = vmemmap_alloc_block(PMD_SIZE, node); | ||||
|  | @ -2927,7 +2923,7 @@ void __flush_tlb_all(void) | |||
| pte_t *pte_alloc_one_kernel(struct mm_struct *mm, | ||||
| 			    unsigned long address) | ||||
| { | ||||
| 	struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO); | ||||
| 	struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO); | ||||
| 	pte_t *pte = NULL; | ||||
| 
 | ||||
| 	if (page) | ||||
|  | @ -2939,11 +2935,11 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm, | |||
| pgtable_t pte_alloc_one(struct mm_struct *mm, | ||||
| 			unsigned long address) | ||||
| { | ||||
| 	struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO); | ||||
| 	struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO); | ||||
| 	if (!page) | ||||
| 		return NULL; | ||||
| 	if (!pgtable_page_ctor(page)) { | ||||
| 		free_hot_cold_page(page, 0); | ||||
| 		free_unref_page(page); | ||||
| 		return NULL; | ||||
| 	} | ||||
| 	return (pte_t *) page_address(page); | ||||
|  |  | |||
|  | @ -409,7 +409,7 @@ void __homecache_free_pages(struct page *page, unsigned int order) | |||
| 	if (put_page_testzero(page)) { | ||||
| 		homecache_change_page_home(page, order, PAGE_HOME_HASH); | ||||
| 		if (order == 0) { | ||||
| 			free_hot_cold_page(page, false); | ||||
| 			free_unref_page(page); | ||||
| 		} else { | ||||
| 			init_page_count(page); | ||||
| 			__free_pages(page, order); | ||||
|  |  | |||
|  | @ -22,8 +22,6 @@ | |||
| /* allocated in paging_init, zeroed in mem_init, and unchanged thereafter */ | ||||
| unsigned long *empty_zero_page = NULL; | ||||
| EXPORT_SYMBOL(empty_zero_page); | ||||
| /* allocated in paging_init and unchanged thereafter */ | ||||
| static unsigned long *empty_bad_page = NULL; | ||||
| 
 | ||||
| /*
 | ||||
|  * Initialized during boot, and readonly for initializing page tables | ||||
|  | @ -146,7 +144,6 @@ void __init paging_init(void) | |||
| 	int i; | ||||
| 
 | ||||
| 	empty_zero_page = (unsigned long *) alloc_bootmem_low_pages(PAGE_SIZE); | ||||
| 	empty_bad_page = (unsigned long *) alloc_bootmem_low_pages(PAGE_SIZE); | ||||
| 	for (i = 0; i < ARRAY_SIZE(zones_size); i++) | ||||
| 		zones_size[i] = 0; | ||||
| 
 | ||||
|  |  | |||
|  | @ -28,7 +28,7 @@ extern void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd); | |||
| #define pgd_alloc(mm)			get_pgd_slow(mm) | ||||
| #define pgd_free(mm, pgd)		free_pgd_slow(mm, pgd) | ||||
| 
 | ||||
| #define PGALLOC_GFP	(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) | ||||
| #define PGALLOC_GFP	(GFP_KERNEL | __GFP_ZERO) | ||||
| 
 | ||||
| /*
 | ||||
|  * Allocate one PTE table. | ||||
|  |  | |||
|  | @ -97,7 +97,7 @@ void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd) | |||
| 	pte = pmd_pgtable(*pmd); | ||||
| 	pmd_clear(pmd); | ||||
| 	pte_free(mm, pte); | ||||
| 	atomic_long_dec(&mm->nr_ptes); | ||||
| 	mm_dec_nr_ptes(mm); | ||||
| 	pmd_free(mm, pmd); | ||||
| 	mm_dec_nr_pmds(mm); | ||||
| free: | ||||
|  |  | |||
|  | @ -110,9 +110,8 @@ config X86 | |||
| 	select HAVE_ARCH_AUDITSYSCALL | ||||
| 	select HAVE_ARCH_HUGE_VMAP		if X86_64 || X86_PAE | ||||
| 	select HAVE_ARCH_JUMP_LABEL | ||||
| 	select HAVE_ARCH_KASAN			if X86_64 && SPARSEMEM_VMEMMAP | ||||
| 	select HAVE_ARCH_KASAN			if X86_64 | ||||
| 	select HAVE_ARCH_KGDB | ||||
| 	select HAVE_ARCH_KMEMCHECK | ||||
| 	select HAVE_ARCH_MMAP_RND_BITS		if MMU | ||||
| 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT | ||||
| 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT | ||||
|  | @ -1430,7 +1429,7 @@ config ARCH_DMA_ADDR_T_64BIT | |||
| 
 | ||||
| config X86_DIRECT_GBPAGES | ||||
| 	def_bool y | ||||
| 	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK | ||||
| 	depends on X86_64 && !DEBUG_PAGEALLOC | ||||
| 	---help--- | ||||
| 	  Certain kernel features effectively disable kernel | ||||
| 	  linear 1 GB mappings (even if the CPU otherwise | ||||
|  |  | |||
|  | @ -158,11 +158,6 @@ ifdef CONFIG_X86_X32 | |||
| endif | ||||
| export CONFIG_X86_X32_ABI | ||||
| 
 | ||||
| # Don't unroll struct assignments with kmemcheck enabled
 | ||||
| ifeq ($(CONFIG_KMEMCHECK),y) | ||||
| 	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy) | ||||
| endif | ||||
| 
 | ||||
| #
 | ||||
| # If the function graph tracer is used with mcount instead of fentry,
 | ||||
| # '-maccumulate-outgoing-args' is needed to prevent a GCC bug
 | ||||
|  |  | |||
|  | @ -7,7 +7,6 @@ | |||
|  * Documentation/DMA-API.txt for documentation. | ||||
|  */ | ||||
| 
 | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/scatterlist.h> | ||||
| #include <linux/dma-debug.h> | ||||
| #include <asm/io.h> | ||||
|  |  | |||
|  | @ -1,43 +1 @@ | |||
| /* SPDX-License-Identifier: GPL-2.0 */ | ||||
| #ifndef ASM_X86_KMEMCHECK_H | ||||
| #define ASM_X86_KMEMCHECK_H | ||||
| 
 | ||||
| #include <linux/types.h> | ||||
| #include <asm/ptrace.h> | ||||
| 
 | ||||
| #ifdef CONFIG_KMEMCHECK | ||||
| bool kmemcheck_active(struct pt_regs *regs); | ||||
| 
 | ||||
| void kmemcheck_show(struct pt_regs *regs); | ||||
| void kmemcheck_hide(struct pt_regs *regs); | ||||
| 
 | ||||
| bool kmemcheck_fault(struct pt_regs *regs, | ||||
| 	unsigned long address, unsigned long error_code); | ||||
| bool kmemcheck_trap(struct pt_regs *regs); | ||||
| #else | ||||
| static inline bool kmemcheck_active(struct pt_regs *regs) | ||||
| { | ||||
| 	return false; | ||||
| } | ||||
| 
 | ||||
| static inline void kmemcheck_show(struct pt_regs *regs) | ||||
| { | ||||
| } | ||||
| 
 | ||||
| static inline void kmemcheck_hide(struct pt_regs *regs) | ||||
| { | ||||
| } | ||||
| 
 | ||||
| static inline bool kmemcheck_fault(struct pt_regs *regs, | ||||
| 	unsigned long address, unsigned long error_code) | ||||
| { | ||||
| 	return false; | ||||
| } | ||||
| 
 | ||||
| static inline bool kmemcheck_trap(struct pt_regs *regs) | ||||
| { | ||||
| 	return false; | ||||
| } | ||||
| #endif /* CONFIG_KMEMCHECK */ | ||||
| 
 | ||||
| #endif | ||||
|  |  | |||
|  | @ -667,11 +667,6 @@ static inline bool pte_accessible(struct mm_struct *mm, pte_t a) | |||
| 	return false; | ||||
| } | ||||
| 
 | ||||
| static inline int pte_hidden(pte_t pte) | ||||
| { | ||||
| 	return pte_flags(pte) & _PAGE_HIDDEN; | ||||
| } | ||||
| 
 | ||||
| static inline int pmd_present(pmd_t pmd) | ||||
| { | ||||
| 	/*
 | ||||
|  |  | |||
|  | @ -32,7 +32,6 @@ | |||
| 
 | ||||
| #define _PAGE_BIT_SPECIAL	_PAGE_BIT_SOFTW1 | ||||
| #define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW1 | ||||
| #define _PAGE_BIT_HIDDEN	_PAGE_BIT_SOFTW3 /* hidden by kmemcheck */ | ||||
| #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */ | ||||
| #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4 | ||||
| 
 | ||||
|  | @ -79,18 +78,6 @@ | |||
| #define _PAGE_KNL_ERRATUM_MASK 0 | ||||
| #endif | ||||
| 
 | ||||
| #ifdef CONFIG_KMEMCHECK | ||||
| #define _PAGE_HIDDEN	(_AT(pteval_t, 1) << _PAGE_BIT_HIDDEN) | ||||
| #else | ||||
| #define _PAGE_HIDDEN	(_AT(pteval_t, 0)) | ||||
| #endif | ||||
| 
 | ||||
| /*
 | ||||
|  * The same hidden bit is used by kmemcheck, but since kmemcheck | ||||
|  * works on kernel pages while soft-dirty engine on user space, | ||||
|  * they do not conflict with each other. | ||||
|  */ | ||||
| 
 | ||||
| #ifdef CONFIG_MEM_SOFT_DIRTY | ||||
| #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY) | ||||
| #else | ||||
|  |  | |||
|  | @ -179,8 +179,6 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len) | |||
|  *	No 3D Now! | ||||
|  */ | ||||
| 
 | ||||
| #ifndef CONFIG_KMEMCHECK | ||||
| 
 | ||||
| #if (__GNUC__ >= 4) | ||||
| #define memcpy(t, f, n) __builtin_memcpy(t, f, n) | ||||
| #else | ||||
|  | @ -189,13 +187,6 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len) | |||
| 	 ? __constant_memcpy((t), (f), (n))	\ | ||||
| 	 : __memcpy((t), (f), (n))) | ||||
| #endif | ||||
| #else | ||||
| /*
 | ||||
|  * kmemcheck becomes very happy if we use the REP instructions unconditionally, | ||||
|  * because it means that we know both memory operands in advance. | ||||
|  */ | ||||
| #define memcpy(t, f, n) __memcpy((t), (f), (n)) | ||||
| #endif | ||||
| 
 | ||||
| #endif | ||||
| #endif /* !CONFIG_FORTIFY_SOURCE */ | ||||
|  |  | |||
|  | @ -33,7 +33,6 @@ extern void *memcpy(void *to, const void *from, size_t len); | |||
| extern void *__memcpy(void *to, const void *from, size_t len); | ||||
| 
 | ||||
| #ifndef CONFIG_FORTIFY_SOURCE | ||||
| #ifndef CONFIG_KMEMCHECK | ||||
| #if (__GNUC__ == 4 && __GNUC_MINOR__ < 3) || __GNUC__ < 4 | ||||
| #define memcpy(dst, src, len)					\ | ||||
| ({								\ | ||||
|  | @ -46,13 +45,6 @@ extern void *__memcpy(void *to, const void *from, size_t len); | |||
| 	__ret;							\ | ||||
| }) | ||||
| #endif | ||||
| #else | ||||
| /*
 | ||||
|  * kmemcheck becomes very happy if we use the REP instructions unconditionally, | ||||
|  * because it means that we know both memory operands in advance. | ||||
|  */ | ||||
| #define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len)) | ||||
| #endif | ||||
| #endif /* !CONFIG_FORTIFY_SOURCE */ | ||||
| 
 | ||||
| #define __HAVE_ARCH_MEMSET | ||||
|  |  | |||
|  | @ -1,7 +1,4 @@ | |||
| #ifdef CONFIG_KMEMCHECK | ||||
| /* kmemcheck doesn't handle MMX/SSE/SSE2 instructions */ | ||||
| # include <asm-generic/xor.h> | ||||
| #elif !defined(_ASM_X86_XOR_H) | ||||
| #ifndef _ASM_X86_XOR_H | ||||
| #define _ASM_X86_XOR_H | ||||
| 
 | ||||
| /*
 | ||||
|  |  | |||
|  | @ -187,21 +187,6 @@ static void early_init_intel(struct cpuinfo_x86 *c) | |||
| 	if (c->x86 == 6 && c->x86_model < 15) | ||||
| 		clear_cpu_cap(c, X86_FEATURE_PAT); | ||||
| 
 | ||||
| #ifdef CONFIG_KMEMCHECK | ||||
| 	/*
 | ||||
| 	 * P4s have a "fast strings" feature which causes single- | ||||
| 	 * stepping REP instructions to only generate a #DB on | ||||
| 	 * cache-line boundaries. | ||||
| 	 * | ||||
| 	 * Ingo Molnar reported a Pentium D (model 6) and a Xeon | ||||
| 	 * (model 2) with the same problem. | ||||
| 	 */ | ||||
| 	if (c->x86 == 15) | ||||
| 		if (msr_clear_bit(MSR_IA32_MISC_ENABLE, | ||||
| 				  MSR_IA32_MISC_ENABLE_FAST_STRING_BIT) > 0) | ||||
| 			pr_info("kmemcheck: Disabling fast string operations\n"); | ||||
| #endif | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason, | ||||
| 	 * clear the fast string and enhanced fast string CPU capabilities. | ||||
|  |  | |||
|  | @ -57,7 +57,7 @@ | |||
| # error "Need more virtual address space for the ESPFIX hack" | ||||
| #endif | ||||
| 
 | ||||
| #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) | ||||
| #define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO) | ||||
| 
 | ||||
| /* This contains the *bottom* address of the espfix stack */ | ||||
| DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack); | ||||
|  |  | |||
|  | @ -42,7 +42,6 @@ | |||
| #include <linux/edac.h> | ||||
| #endif | ||||
| 
 | ||||
| #include <asm/kmemcheck.h> | ||||
| #include <asm/stacktrace.h> | ||||
| #include <asm/processor.h> | ||||
| #include <asm/debugreg.h> | ||||
|  | @ -749,10 +748,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) | |||
| 	if (!dr6 && user_mode(regs)) | ||||
| 		user_icebp = 1; | ||||
| 
 | ||||
| 	/* Catch kmemcheck conditions! */ | ||||
| 	if ((dr6 & DR_STEP) && kmemcheck_trap(regs)) | ||||
| 		goto exit; | ||||
| 
 | ||||
| 	/* Store the virtualized DR6 value */ | ||||
| 	tsk->thread.debugreg6 = dr6; | ||||
| 
 | ||||
|  |  | |||
|  | @ -29,8 +29,6 @@ obj-$(CONFIG_X86_PTDUMP)	+= debug_pagetables.o | |||
| 
 | ||||
| obj-$(CONFIG_HIGHMEM)		+= highmem_32.o | ||||
| 
 | ||||
| obj-$(CONFIG_KMEMCHECK)		+= kmemcheck/ | ||||
| 
 | ||||
| KASAN_SANITIZE_kasan_init_$(BITS).o := n | ||||
| obj-$(CONFIG_KASAN)		+= kasan_init_$(BITS).o | ||||
| 
 | ||||
|  |  | |||
|  | @ -20,7 +20,6 @@ | |||
| #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/ | ||||
| #include <asm/traps.h>			/* dotraplinkage, ...		*/ | ||||
| #include <asm/pgalloc.h>		/* pgd_*(), ...			*/ | ||||
| #include <asm/kmemcheck.h>		/* kmemcheck_*(), ...		*/ | ||||
| #include <asm/fixmap.h>			/* VSYSCALL_ADDR		*/ | ||||
| #include <asm/vsyscall.h>		/* emulate_vsyscall		*/ | ||||
| #include <asm/vm86.h>			/* struct vm86			*/ | ||||
|  | @ -1256,8 +1255,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, | |||
| 	 * Detect and handle instructions that would cause a page fault for | ||||
| 	 * both a tracked kernel page and a userspace page. | ||||
| 	 */ | ||||
| 	if (kmemcheck_active(regs)) | ||||
| 		kmemcheck_hide(regs); | ||||
| 	prefetchw(&mm->mmap_sem); | ||||
| 
 | ||||
| 	if (unlikely(kmmio_fault(regs, address))) | ||||
|  | @ -1280,9 +1277,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, | |||
| 		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) { | ||||
| 			if (vmalloc_fault(address) >= 0) | ||||
| 				return; | ||||
| 
 | ||||
| 			if (kmemcheck_fault(regs, address, error_code)) | ||||
| 				return; | ||||
| 		} | ||||
| 
 | ||||
| 		/* Can handle a stale RO->RW TLB: */ | ||||
|  |  | |||
|  | @ -92,8 +92,7 @@ __ref void *alloc_low_pages(unsigned int num) | |||
| 		unsigned int order; | ||||
| 
 | ||||
| 		order = get_order((unsigned long)num << PAGE_SHIFT); | ||||
| 		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_NOTRACK | | ||||
| 						__GFP_ZERO, order); | ||||
| 		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order); | ||||
| 	} | ||||
| 
 | ||||
| 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) { | ||||
|  | @ -164,12 +163,11 @@ static int page_size_mask; | |||
| static void __init probe_page_size_mask(void) | ||||
| { | ||||
| 	/*
 | ||||
| 	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will | ||||
| 	 * use small pages. | ||||
| 	 * For pagealloc debugging, identity mapping will use small pages. | ||||
| 	 * This will simplify cpa(), which otherwise needs to support splitting | ||||
| 	 * large pages into small in interrupt context, etc. | ||||
| 	 */ | ||||
| 	if (boot_cpu_has(X86_FEATURE_PSE) && !debug_pagealloc_enabled() && !IS_ENABLED(CONFIG_KMEMCHECK)) | ||||
| 	if (boot_cpu_has(X86_FEATURE_PSE) && !debug_pagealloc_enabled()) | ||||
| 		page_size_mask |= 1 << PG_LEVEL_2M; | ||||
| 	else | ||||
| 		direct_gbpages = 0; | ||||
|  |  | |||
|  | @ -184,7 +184,7 @@ static __ref void *spp_getpage(void) | |||
| 	void *ptr; | ||||
| 
 | ||||
| 	if (after_bootmem) | ||||
| 		ptr = (void *) get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK); | ||||
| 		ptr = (void *) get_zeroed_page(GFP_ATOMIC); | ||||
| 	else | ||||
| 		ptr = alloc_bootmem_pages(PAGE_SIZE); | ||||
| 
 | ||||
|  | @ -1173,12 +1173,18 @@ void __init mem_init(void) | |||
| 
 | ||||
| 	/* clear_bss() already clear the empty_zero_page */ | ||||
| 
 | ||||
| 	register_page_bootmem_info(); | ||||
| 
 | ||||
| 	/* this will put all memory onto the freelists */ | ||||
| 	free_all_bootmem(); | ||||
| 	after_bootmem = 1; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Must be done after boot memory is put on freelist, because here we | ||||
| 	 * might set fields in deferred struct pages that have not yet been | ||||
| 	 * initialized, and free_all_bootmem() initializes all the reserved | ||||
| 	 * deferred pages for us. | ||||
| 	 */ | ||||
| 	register_page_bootmem_info(); | ||||
| 
 | ||||
| 	/* Register memory areas for /proc/kcore */ | ||||
| 	kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR, | ||||
| 			 PAGE_SIZE, KCORE_OTHER); | ||||
|  | @ -1399,7 +1405,6 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start, | |||
| 			vmemmap_verify((pte_t *)pmd, node, addr, next); | ||||
| 			continue; | ||||
| 		} | ||||
| 		pr_warn_once("vmemmap: falling back to regular page backing\n"); | ||||
| 		if (vmemmap_populate_basepages(addr, next, node)) | ||||
| 			return -ENOMEM; | ||||
| 	} | ||||
|  |  | |||
|  | @ -4,12 +4,14 @@ | |||
| #include <linux/bootmem.h> | ||||
| #include <linux/kasan.h> | ||||
| #include <linux/kdebug.h> | ||||
| #include <linux/memblock.h> | ||||
| #include <linux/mm.h> | ||||
| #include <linux/sched.h> | ||||
| #include <linux/sched/task.h> | ||||
| #include <linux/vmalloc.h> | ||||
| 
 | ||||
| #include <asm/e820/types.h> | ||||
| #include <asm/pgalloc.h> | ||||
| #include <asm/tlbflush.h> | ||||
| #include <asm/sections.h> | ||||
| #include <asm/pgtable.h> | ||||
|  | @ -18,7 +20,134 @@ extern struct range pfn_mapped[E820_MAX_ENTRIES]; | |||
| 
 | ||||
| static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE); | ||||
| 
 | ||||
| static int __init map_range(struct range *range) | ||||
| static __init void *early_alloc(size_t size, int nid) | ||||
| { | ||||
| 	return memblock_virt_alloc_try_nid_nopanic(size, size, | ||||
| 		__pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr, | ||||
| 				      unsigned long end, int nid) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 
 | ||||
| 	if (pmd_none(*pmd)) { | ||||
| 		void *p; | ||||
| 
 | ||||
| 		if (boot_cpu_has(X86_FEATURE_PSE) && | ||||
| 		    ((end - addr) == PMD_SIZE) && | ||||
| 		    IS_ALIGNED(addr, PMD_SIZE)) { | ||||
| 			p = early_alloc(PMD_SIZE, nid); | ||||
| 			if (p && pmd_set_huge(pmd, __pa(p), PAGE_KERNEL)) | ||||
| 				return; | ||||
| 			else if (p) | ||||
| 				memblock_free(__pa(p), PMD_SIZE); | ||||
| 		} | ||||
| 
 | ||||
| 		p = early_alloc(PAGE_SIZE, nid); | ||||
| 		pmd_populate_kernel(&init_mm, pmd, p); | ||||
| 	} | ||||
| 
 | ||||
| 	pte = pte_offset_kernel(pmd, addr); | ||||
| 	do { | ||||
| 		pte_t entry; | ||||
| 		void *p; | ||||
| 
 | ||||
| 		if (!pte_none(*pte)) | ||||
| 			continue; | ||||
| 
 | ||||
| 		p = early_alloc(PAGE_SIZE, nid); | ||||
| 		entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL); | ||||
| 		set_pte_at(&init_mm, addr, pte, entry); | ||||
| 	} while (pte++, addr += PAGE_SIZE, addr != end); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_populate_pud(pud_t *pud, unsigned long addr, | ||||
| 				      unsigned long end, int nid) | ||||
| { | ||||
| 	pmd_t *pmd; | ||||
| 	unsigned long next; | ||||
| 
 | ||||
| 	if (pud_none(*pud)) { | ||||
| 		void *p; | ||||
| 
 | ||||
| 		if (boot_cpu_has(X86_FEATURE_GBPAGES) && | ||||
| 		    ((end - addr) == PUD_SIZE) && | ||||
| 		    IS_ALIGNED(addr, PUD_SIZE)) { | ||||
| 			p = early_alloc(PUD_SIZE, nid); | ||||
| 			if (p && pud_set_huge(pud, __pa(p), PAGE_KERNEL)) | ||||
| 				return; | ||||
| 			else if (p) | ||||
| 				memblock_free(__pa(p), PUD_SIZE); | ||||
| 		} | ||||
| 
 | ||||
| 		p = early_alloc(PAGE_SIZE, nid); | ||||
| 		pud_populate(&init_mm, pud, p); | ||||
| 	} | ||||
| 
 | ||||
| 	pmd = pmd_offset(pud, addr); | ||||
| 	do { | ||||
| 		next = pmd_addr_end(addr, end); | ||||
| 		if (!pmd_large(*pmd)) | ||||
| 			kasan_populate_pmd(pmd, addr, next, nid); | ||||
| 	} while (pmd++, addr = next, addr != end); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_populate_p4d(p4d_t *p4d, unsigned long addr, | ||||
| 				      unsigned long end, int nid) | ||||
| { | ||||
| 	pud_t *pud; | ||||
| 	unsigned long next; | ||||
| 
 | ||||
| 	if (p4d_none(*p4d)) { | ||||
| 		void *p = early_alloc(PAGE_SIZE, nid); | ||||
| 
 | ||||
| 		p4d_populate(&init_mm, p4d, p); | ||||
| 	} | ||||
| 
 | ||||
| 	pud = pud_offset(p4d, addr); | ||||
| 	do { | ||||
| 		next = pud_addr_end(addr, end); | ||||
| 		if (!pud_large(*pud)) | ||||
| 			kasan_populate_pud(pud, addr, next, nid); | ||||
| 	} while (pud++, addr = next, addr != end); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_populate_pgd(pgd_t *pgd, unsigned long addr, | ||||
| 				      unsigned long end, int nid) | ||||
| { | ||||
| 	void *p; | ||||
| 	p4d_t *p4d; | ||||
| 	unsigned long next; | ||||
| 
 | ||||
| 	if (pgd_none(*pgd)) { | ||||
| 		p = early_alloc(PAGE_SIZE, nid); | ||||
| 		pgd_populate(&init_mm, pgd, p); | ||||
| 	} | ||||
| 
 | ||||
| 	p4d = p4d_offset(pgd, addr); | ||||
| 	do { | ||||
| 		next = p4d_addr_end(addr, end); | ||||
| 		kasan_populate_p4d(p4d, addr, next, nid); | ||||
| 	} while (p4d++, addr = next, addr != end); | ||||
| } | ||||
| 
 | ||||
| static void __init kasan_populate_shadow(unsigned long addr, unsigned long end, | ||||
| 					 int nid) | ||||
| { | ||||
| 	pgd_t *pgd; | ||||
| 	unsigned long next; | ||||
| 
 | ||||
| 	addr = addr & PAGE_MASK; | ||||
| 	end = round_up(end, PAGE_SIZE); | ||||
| 	pgd = pgd_offset_k(addr); | ||||
| 	do { | ||||
| 		next = pgd_addr_end(addr, end); | ||||
| 		kasan_populate_pgd(pgd, addr, next, nid); | ||||
| 	} while (pgd++, addr = next, addr != end); | ||||
| } | ||||
| 
 | ||||
| static void __init map_range(struct range *range) | ||||
| { | ||||
| 	unsigned long start; | ||||
| 	unsigned long end; | ||||
|  | @ -26,7 +155,7 @@ static int __init map_range(struct range *range) | |||
| 	start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start)); | ||||
| 	end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end)); | ||||
| 
 | ||||
| 	return vmemmap_populate(start, end, NUMA_NO_NODE); | ||||
| 	kasan_populate_shadow(start, end, early_pfn_to_nid(range->start)); | ||||
| } | ||||
| 
 | ||||
| static void __init clear_pgds(unsigned long start, | ||||
|  | @ -189,16 +318,16 @@ void __init kasan_init(void) | |||
| 		if (pfn_mapped[i].end == 0) | ||||
| 			break; | ||||
| 
 | ||||
| 		if (map_range(&pfn_mapped[i])) | ||||
| 			panic("kasan: unable to allocate shadow!"); | ||||
| 		map_range(&pfn_mapped[i]); | ||||
| 	} | ||||
| 
 | ||||
| 	kasan_populate_zero_shadow( | ||||
| 		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), | ||||
| 		kasan_mem_to_shadow((void *)__START_KERNEL_map)); | ||||
| 
 | ||||
| 	vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext), | ||||
| 			(unsigned long)kasan_mem_to_shadow(_end), | ||||
| 			NUMA_NO_NODE); | ||||
| 	kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext), | ||||
| 			      (unsigned long)kasan_mem_to_shadow(_end), | ||||
| 			      early_pfn_to_nid(__pa(_stext))); | ||||
| 
 | ||||
| 	kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), | ||||
| 			(void *)KASAN_SHADOW_END); | ||||
|  |  | |||
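The new kasan_populate_pmd()/kasan_populate_pud() helpers above try to satisfy a whole PMD- or PUD-sized, properly aligned piece of shadow with one huge mapping, and fall back to base pages only when that is not possible (or when the huge allocation fails, in which case the block is handed back via memblock_free()). The toy below models just that decision outside the kernel; the sizes, names, and printf output are illustrative, not part of the patch.

#include <stdint.h>
#include <stdio.h>

#define SMALL_SZ 0x1000ULL		/* 4 KiB base page (illustrative) */
#define BIG_SZ   0x200000ULL		/* 2 MiB huge mapping (illustrative) */

/* Simplified model of the fallback logic: use a huge mapping only when
 * the cursor is aligned and at least one full huge block remains. */
static void demo_populate(uint64_t addr, uint64_t end)
{
	while (addr < end) {
		if ((addr % BIG_SZ) == 0 && (end - addr) >= BIG_SZ) {
			printf("huge  mapping at %#llx\n",
			       (unsigned long long)addr);
			addr += BIG_SZ;
		} else {
			printf("small mapping at %#llx\n",
			       (unsigned long long)addr);
			addr += SMALL_SZ;
		}
	}
}

int main(void)
{
	/* The unaligned head and tail are covered with base pages, the
	 * aligned middle with huge mappings. */
	demo_populate(0x1ff000, 0x601000);
	return 0;
}

In the kernel code above the same check appears as IS_ALIGNED(addr, PMD_SIZE) together with (end - addr) == PMD_SIZE, because the page-table walker has already clamped each call to a single PMD or PUD entry.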
|  | @ -1 +0,0 @@ | |||
| obj-y := error.o kmemcheck.o opcode.o pte.o selftest.o shadow.o | ||||
|  | @ -1,228 +1 @@ | |||
| // SPDX-License-Identifier: GPL-2.0
 | ||||
| #include <linux/interrupt.h> | ||||
| #include <linux/kdebug.h> | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/kernel.h> | ||||
| #include <linux/types.h> | ||||
| #include <linux/ptrace.h> | ||||
| #include <linux/stacktrace.h> | ||||
| #include <linux/string.h> | ||||
| 
 | ||||
| #include "error.h" | ||||
| #include "shadow.h" | ||||
| 
 | ||||
| enum kmemcheck_error_type { | ||||
| 	KMEMCHECK_ERROR_INVALID_ACCESS, | ||||
| 	KMEMCHECK_ERROR_BUG, | ||||
| }; | ||||
| 
 | ||||
| #define SHADOW_COPY_SIZE (1 << CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT) | ||||
| 
 | ||||
| struct kmemcheck_error { | ||||
| 	enum kmemcheck_error_type type; | ||||
| 
 | ||||
| 	union { | ||||
| 		/* KMEMCHECK_ERROR_INVALID_ACCESS */ | ||||
| 		struct { | ||||
| 			/* Kind of access that caused the error */ | ||||
| 			enum kmemcheck_shadow state; | ||||
| 			/* Address and size of the erroneous read */ | ||||
| 			unsigned long	address; | ||||
| 			unsigned int	size; | ||||
| 		}; | ||||
| 	}; | ||||
| 
 | ||||
| 	struct pt_regs		regs; | ||||
| 	struct stack_trace	trace; | ||||
| 	unsigned long		trace_entries[32]; | ||||
| 
 | ||||
| 	/* We compress it to a char. */ | ||||
| 	unsigned char		shadow_copy[SHADOW_COPY_SIZE]; | ||||
| 	unsigned char		memory_copy[SHADOW_COPY_SIZE]; | ||||
| }; | ||||
| 
 | ||||
| /*
 | ||||
|  * Create a ring queue of errors to output. We can't call printk() directly | ||||
|  * from the kmemcheck traps, since this may call the console drivers and | ||||
|  * result in a recursive fault. | ||||
|  */ | ||||
| static struct kmemcheck_error error_fifo[CONFIG_KMEMCHECK_QUEUE_SIZE]; | ||||
| static unsigned int error_count; | ||||
| static unsigned int error_rd; | ||||
| static unsigned int error_wr; | ||||
| static unsigned int error_missed_count; | ||||
| 
 | ||||
| static struct kmemcheck_error *error_next_wr(void) | ||||
| { | ||||
| 	struct kmemcheck_error *e; | ||||
| 
 | ||||
| 	if (error_count == ARRAY_SIZE(error_fifo)) { | ||||
| 		++error_missed_count; | ||||
| 		return NULL; | ||||
| 	} | ||||
| 
 | ||||
| 	e = &error_fifo[error_wr]; | ||||
| 	if (++error_wr == ARRAY_SIZE(error_fifo)) | ||||
| 		error_wr = 0; | ||||
| 	++error_count; | ||||
| 	return e; | ||||
| } | ||||
| 
 | ||||
| static struct kmemcheck_error *error_next_rd(void) | ||||
| { | ||||
| 	struct kmemcheck_error *e; | ||||
| 
 | ||||
| 	if (error_count == 0) | ||||
| 		return NULL; | ||||
| 
 | ||||
| 	e = &error_fifo[error_rd]; | ||||
| 	if (++error_rd == ARRAY_SIZE(error_fifo)) | ||||
| 		error_rd = 0; | ||||
| 	--error_count; | ||||
| 	return e; | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_error_recall(void) | ||||
| { | ||||
| 	static const char *desc[] = { | ||||
| 		[KMEMCHECK_SHADOW_UNALLOCATED]		= "unallocated", | ||||
| 		[KMEMCHECK_SHADOW_UNINITIALIZED]	= "uninitialized", | ||||
| 		[KMEMCHECK_SHADOW_INITIALIZED]		= "initialized", | ||||
| 		[KMEMCHECK_SHADOW_FREED]		= "freed", | ||||
| 	}; | ||||
| 
 | ||||
| 	static const char short_desc[] = { | ||||
| 		[KMEMCHECK_SHADOW_UNALLOCATED]		= 'a', | ||||
| 		[KMEMCHECK_SHADOW_UNINITIALIZED]	= 'u', | ||||
| 		[KMEMCHECK_SHADOW_INITIALIZED]		= 'i', | ||||
| 		[KMEMCHECK_SHADOW_FREED]		= 'f', | ||||
| 	}; | ||||
| 
 | ||||
| 	struct kmemcheck_error *e; | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	e = error_next_rd(); | ||||
| 	if (!e) | ||||
| 		return; | ||||
| 
 | ||||
| 	switch (e->type) { | ||||
| 	case KMEMCHECK_ERROR_INVALID_ACCESS: | ||||
| 		printk(KERN_WARNING "WARNING: kmemcheck: Caught %d-bit read from %s memory (%p)\n", | ||||
| 			8 * e->size, e->state < ARRAY_SIZE(desc) ? | ||||
| 				desc[e->state] : "(invalid shadow state)", | ||||
| 			(void *) e->address); | ||||
| 
 | ||||
| 		printk(KERN_WARNING); | ||||
| 		for (i = 0; i < SHADOW_COPY_SIZE; ++i) | ||||
| 			printk(KERN_CONT "%02x", e->memory_copy[i]); | ||||
| 		printk(KERN_CONT "\n"); | ||||
| 
 | ||||
| 		printk(KERN_WARNING); | ||||
| 		for (i = 0; i < SHADOW_COPY_SIZE; ++i) { | ||||
| 			if (e->shadow_copy[i] < ARRAY_SIZE(short_desc)) | ||||
| 				printk(KERN_CONT " %c", short_desc[e->shadow_copy[i]]); | ||||
| 			else | ||||
| 				printk(KERN_CONT " ?"); | ||||
| 		} | ||||
| 		printk(KERN_CONT "\n"); | ||||
| 		printk(KERN_WARNING "%*c\n", 2 + 2 | ||||
| 			* (int) (e->address & (SHADOW_COPY_SIZE - 1)), '^'); | ||||
| 		break; | ||||
| 	case KMEMCHECK_ERROR_BUG: | ||||
| 		printk(KERN_EMERG "ERROR: kmemcheck: Fatal error\n"); | ||||
| 		break; | ||||
| 	} | ||||
| 
 | ||||
| 	__show_regs(&e->regs, 1); | ||||
| 	print_stack_trace(&e->trace, 0); | ||||
| } | ||||
| 
 | ||||
| static void do_wakeup(unsigned long data) | ||||
| { | ||||
| 	while (error_count > 0) | ||||
| 		kmemcheck_error_recall(); | ||||
| 
 | ||||
| 	if (error_missed_count > 0) { | ||||
| 		printk(KERN_WARNING "kmemcheck: Lost %d error reports because " | ||||
| 			"the queue was too small\n", error_missed_count); | ||||
| 		error_missed_count = 0; | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
| static DECLARE_TASKLET(kmemcheck_tasklet, &do_wakeup, 0); | ||||
| 
 | ||||
| /*
 | ||||
|  * Save the context of an error report. | ||||
|  */ | ||||
| void kmemcheck_error_save(enum kmemcheck_shadow state, | ||||
| 	unsigned long address, unsigned int size, struct pt_regs *regs) | ||||
| { | ||||
| 	static unsigned long prev_ip; | ||||
| 
 | ||||
| 	struct kmemcheck_error *e; | ||||
| 	void *shadow_copy; | ||||
| 	void *memory_copy; | ||||
| 
 | ||||
| 	/* Don't report several adjacent errors from the same EIP. */ | ||||
| 	if (regs->ip == prev_ip) | ||||
| 		return; | ||||
| 	prev_ip = regs->ip; | ||||
| 
 | ||||
| 	e = error_next_wr(); | ||||
| 	if (!e) | ||||
| 		return; | ||||
| 
 | ||||
| 	e->type = KMEMCHECK_ERROR_INVALID_ACCESS; | ||||
| 
 | ||||
| 	e->state = state; | ||||
| 	e->address = address; | ||||
| 	e->size = size; | ||||
| 
 | ||||
| 	/* Save regs */ | ||||
| 	memcpy(&e->regs, regs, sizeof(*regs)); | ||||
| 
 | ||||
| 	/* Save stack trace */ | ||||
| 	e->trace.nr_entries = 0; | ||||
| 	e->trace.entries = e->trace_entries; | ||||
| 	e->trace.max_entries = ARRAY_SIZE(e->trace_entries); | ||||
| 	e->trace.skip = 0; | ||||
| 	save_stack_trace_regs(regs, &e->trace); | ||||
| 
 | ||||
| 	/* Round address down to nearest 16 bytes */ | ||||
| 	shadow_copy = kmemcheck_shadow_lookup(address | ||||
| 		& ~(SHADOW_COPY_SIZE - 1)); | ||||
| 	BUG_ON(!shadow_copy); | ||||
| 
 | ||||
| 	memcpy(e->shadow_copy, shadow_copy, SHADOW_COPY_SIZE); | ||||
| 
 | ||||
| 	kmemcheck_show_addr(address); | ||||
| 	memory_copy = (void *) (address & ~(SHADOW_COPY_SIZE - 1)); | ||||
| 	memcpy(e->memory_copy, memory_copy, SHADOW_COPY_SIZE); | ||||
| 	kmemcheck_hide_addr(address); | ||||
| 
 | ||||
| 	tasklet_hi_schedule_first(&kmemcheck_tasklet); | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  * Save the context of a kmemcheck bug. | ||||
|  */ | ||||
| void kmemcheck_error_save_bug(struct pt_regs *regs) | ||||
| { | ||||
| 	struct kmemcheck_error *e; | ||||
| 
 | ||||
| 	e = error_next_wr(); | ||||
| 	if (!e) | ||||
| 		return; | ||||
| 
 | ||||
| 	e->type = KMEMCHECK_ERROR_BUG; | ||||
| 
 | ||||
| 	memcpy(&e->regs, regs, sizeof(*regs)); | ||||
| 
 | ||||
| 	e->trace.nr_entries = 0; | ||||
| 	e->trace.entries = e->trace_entries; | ||||
| 	e->trace.max_entries = ARRAY_SIZE(e->trace_entries); | ||||
| 	e->trace.skip = 1; | ||||
| 	save_stack_trace(&e->trace); | ||||
| 
 | ||||
| 	tasklet_hi_schedule_first(&kmemcheck_tasklet); | ||||
| } | ||||
|  |  | |||
|  | @ -1,16 +1 @@ | |||
| /* SPDX-License-Identifier: GPL-2.0 */ | ||||
| #ifndef ARCH__X86__MM__KMEMCHECK__ERROR_H | ||||
| #define ARCH__X86__MM__KMEMCHECK__ERROR_H | ||||
| 
 | ||||
| #include <linux/ptrace.h> | ||||
| 
 | ||||
| #include "shadow.h" | ||||
| 
 | ||||
| void kmemcheck_error_save(enum kmemcheck_shadow state, | ||||
| 	unsigned long address, unsigned int size, struct pt_regs *regs); | ||||
| 
 | ||||
| void kmemcheck_error_save_bug(struct pt_regs *regs); | ||||
| 
 | ||||
| void kmemcheck_error_recall(void); | ||||
| 
 | ||||
| #endif | ||||
|  |  | |||
|  | @ -1,658 +0,0 @@ | |||
| /**
 | ||||
|  * kmemcheck - a heavyweight memory checker for the linux kernel | ||||
|  * Copyright (C) 2007, 2008  Vegard Nossum <vegardno@ifi.uio.no> | ||||
|  * (With a lot of help from Ingo Molnar and Pekka Enberg.) | ||||
|  * | ||||
|  * This program is free software; you can redistribute it and/or modify | ||||
|  * it under the terms of the GNU General Public License (version 2) as | ||||
|  * published by the Free Software Foundation. | ||||
|  */ | ||||
| 
 | ||||
| #include <linux/init.h> | ||||
| #include <linux/interrupt.h> | ||||
| #include <linux/kallsyms.h> | ||||
| #include <linux/kernel.h> | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/mm.h> | ||||
| #include <linux/page-flags.h> | ||||
| #include <linux/percpu.h> | ||||
| #include <linux/ptrace.h> | ||||
| #include <linux/string.h> | ||||
| #include <linux/types.h> | ||||
| 
 | ||||
| #include <asm/cacheflush.h> | ||||
| #include <asm/kmemcheck.h> | ||||
| #include <asm/pgtable.h> | ||||
| #include <asm/tlbflush.h> | ||||
| 
 | ||||
| #include "error.h" | ||||
| #include "opcode.h" | ||||
| #include "pte.h" | ||||
| #include "selftest.h" | ||||
| #include "shadow.h" | ||||
| 
 | ||||
| 
 | ||||
| #ifdef CONFIG_KMEMCHECK_DISABLED_BY_DEFAULT | ||||
| #  define KMEMCHECK_ENABLED 0 | ||||
| #endif | ||||
| 
 | ||||
| #ifdef CONFIG_KMEMCHECK_ENABLED_BY_DEFAULT | ||||
| #  define KMEMCHECK_ENABLED 1 | ||||
| #endif | ||||
| 
 | ||||
| #ifdef CONFIG_KMEMCHECK_ONESHOT_BY_DEFAULT | ||||
| #  define KMEMCHECK_ENABLED 2 | ||||
| #endif | ||||
| 
 | ||||
| int kmemcheck_enabled = KMEMCHECK_ENABLED; | ||||
| 
 | ||||
| int __init kmemcheck_init(void) | ||||
| { | ||||
| #ifdef CONFIG_SMP | ||||
| 	/*
 | ||||
| 	 * Limit SMP to use a single CPU. We rely on the fact that this code | ||||
| 	 * runs before SMP is set up. | ||||
| 	 */ | ||||
| 	if (setup_max_cpus > 1) { | ||||
| 		printk(KERN_INFO | ||||
| 			"kmemcheck: Limiting number of CPUs to 1.\n"); | ||||
| 		setup_max_cpus = 1; | ||||
| 	} | ||||
| #endif | ||||
| 
 | ||||
| 	if (!kmemcheck_selftest()) { | ||||
| 		printk(KERN_INFO "kmemcheck: self-tests failed; disabling\n"); | ||||
| 		kmemcheck_enabled = 0; | ||||
| 		return -EINVAL; | ||||
| 	} | ||||
| 
 | ||||
| 	printk(KERN_INFO "kmemcheck: Initialized\n"); | ||||
| 	return 0; | ||||
| } | ||||
| 
 | ||||
| early_initcall(kmemcheck_init); | ||||
| 
 | ||||
| /*
 | ||||
|  * We need to parse the kmemcheck= option before any memory is allocated. | ||||
|  */ | ||||
| static int __init param_kmemcheck(char *str) | ||||
| { | ||||
| 	int val; | ||||
| 	int ret; | ||||
| 
 | ||||
| 	if (!str) | ||||
| 		return -EINVAL; | ||||
| 
 | ||||
| 	ret = kstrtoint(str, 0, &val); | ||||
| 	if (ret) | ||||
| 		return ret; | ||||
| 	kmemcheck_enabled = val; | ||||
| 	return 0; | ||||
| } | ||||
| 
 | ||||
| early_param("kmemcheck", param_kmemcheck); | ||||
| 
 | ||||
| int kmemcheck_show_addr(unsigned long address) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 
 | ||||
| 	pte = kmemcheck_pte_lookup(address); | ||||
| 	if (!pte) | ||||
| 		return 0; | ||||
| 
 | ||||
| 	set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); | ||||
| 	__flush_tlb_one(address); | ||||
| 	return 1; | ||||
| } | ||||
| 
 | ||||
| int kmemcheck_hide_addr(unsigned long address) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 
 | ||||
| 	pte = kmemcheck_pte_lookup(address); | ||||
| 	if (!pte) | ||||
| 		return 0; | ||||
| 
 | ||||
| 	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); | ||||
| 	__flush_tlb_one(address); | ||||
| 	return 1; | ||||
| } | ||||
| 
 | ||||
| struct kmemcheck_context { | ||||
| 	bool busy; | ||||
| 	int balance; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * There can be at most two memory operands to an instruction, but | ||||
| 	 * each address can cross a page boundary -- so we may need up to | ||||
| 	 * four addresses that must be hidden/revealed for each fault. | ||||
| 	 */ | ||||
| 	unsigned long addr[4]; | ||||
| 	unsigned long n_addrs; | ||||
| 	unsigned long flags; | ||||
| 
 | ||||
| 	/* Data size of the instruction that caused a fault. */ | ||||
| 	unsigned int size; | ||||
| }; | ||||
| 
 | ||||
| static DEFINE_PER_CPU(struct kmemcheck_context, kmemcheck_context); | ||||
| 
 | ||||
| bool kmemcheck_active(struct pt_regs *regs) | ||||
| { | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 
 | ||||
| 	return data->balance > 0; | ||||
| } | ||||
| 
 | ||||
| /* Save an address that needs to be shown/hidden */ | ||||
| static void kmemcheck_save_addr(unsigned long addr) | ||||
| { | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 
 | ||||
| 	BUG_ON(data->n_addrs >= ARRAY_SIZE(data->addr)); | ||||
| 	data->addr[data->n_addrs++] = addr; | ||||
| } | ||||
| 
 | ||||
| static unsigned int kmemcheck_show_all(void) | ||||
| { | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 	unsigned int i; | ||||
| 	unsigned int n; | ||||
| 
 | ||||
| 	n = 0; | ||||
| 	for (i = 0; i < data->n_addrs; ++i) | ||||
| 		n += kmemcheck_show_addr(data->addr[i]); | ||||
| 
 | ||||
| 	return n; | ||||
| } | ||||
| 
 | ||||
| static unsigned int kmemcheck_hide_all(void) | ||||
| { | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 	unsigned int i; | ||||
| 	unsigned int n; | ||||
| 
 | ||||
| 	n = 0; | ||||
| 	for (i = 0; i < data->n_addrs; ++i) | ||||
| 		n += kmemcheck_hide_addr(data->addr[i]); | ||||
| 
 | ||||
| 	return n; | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  * Called from the #PF handler. | ||||
|  */ | ||||
| void kmemcheck_show(struct pt_regs *regs) | ||||
| { | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 
 | ||||
| 	BUG_ON(!irqs_disabled()); | ||||
| 
 | ||||
| 	if (unlikely(data->balance != 0)) { | ||||
| 		kmemcheck_show_all(); | ||||
| 		kmemcheck_error_save_bug(regs); | ||||
| 		data->balance = 0; | ||||
| 		return; | ||||
| 	} | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * None of the addresses actually belonged to kmemcheck. Note that | ||||
| 	 * this is not an error. | ||||
| 	 */ | ||||
| 	if (kmemcheck_show_all() == 0) | ||||
| 		return; | ||||
| 
 | ||||
| 	++data->balance; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * The IF needs to be cleared as well, so that the faulting | ||||
| 	 * instruction can run "uninterrupted". Otherwise, we might take | ||||
| 	 * an interrupt and start executing that before we've had a chance | ||||
| 	 * to hide the page again. | ||||
| 	 * | ||||
| 	 * NOTE: In the rare case of multiple faults, we must not override | ||||
| 	 * the original flags: | ||||
| 	 */ | ||||
| 	if (!(regs->flags & X86_EFLAGS_TF)) | ||||
| 		data->flags = regs->flags; | ||||
| 
 | ||||
| 	regs->flags |= X86_EFLAGS_TF; | ||||
| 	regs->flags &= ~X86_EFLAGS_IF; | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  * Called from the #DB handler. | ||||
|  */ | ||||
| void kmemcheck_hide(struct pt_regs *regs) | ||||
| { | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 	int n; | ||||
| 
 | ||||
| 	BUG_ON(!irqs_disabled()); | ||||
| 
 | ||||
| 	if (unlikely(data->balance != 1)) { | ||||
| 		kmemcheck_show_all(); | ||||
| 		kmemcheck_error_save_bug(regs); | ||||
| 		data->n_addrs = 0; | ||||
| 		data->balance = 0; | ||||
| 
 | ||||
| 		if (!(data->flags & X86_EFLAGS_TF)) | ||||
| 			regs->flags &= ~X86_EFLAGS_TF; | ||||
| 		if (data->flags & X86_EFLAGS_IF) | ||||
| 			regs->flags |= X86_EFLAGS_IF; | ||||
| 		return; | ||||
| 	} | ||||
| 
 | ||||
| 	if (kmemcheck_enabled) | ||||
| 		n = kmemcheck_hide_all(); | ||||
| 	else | ||||
| 		n = kmemcheck_show_all(); | ||||
| 
 | ||||
| 	if (n == 0) | ||||
| 		return; | ||||
| 
 | ||||
| 	--data->balance; | ||||
| 
 | ||||
| 	data->n_addrs = 0; | ||||
| 
 | ||||
| 	if (!(data->flags & X86_EFLAGS_TF)) | ||||
| 		regs->flags &= ~X86_EFLAGS_TF; | ||||
| 	if (data->flags & X86_EFLAGS_IF) | ||||
| 		regs->flags |= X86_EFLAGS_IF; | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_show_pages(struct page *p, unsigned int n) | ||||
| { | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	for (i = 0; i < n; ++i) { | ||||
| 		unsigned long address; | ||||
| 		pte_t *pte; | ||||
| 		unsigned int level; | ||||
| 
 | ||||
| 		address = (unsigned long) page_address(&p[i]); | ||||
| 		pte = lookup_address(address, &level); | ||||
| 		BUG_ON(!pte); | ||||
| 		BUG_ON(level != PG_LEVEL_4K); | ||||
| 
 | ||||
| 		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); | ||||
| 		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_HIDDEN)); | ||||
| 		__flush_tlb_one(address); | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
| bool kmemcheck_page_is_tracked(struct page *p) | ||||
| { | ||||
| 	/* This will also check the "hidden" flag of the PTE. */ | ||||
| 	return kmemcheck_pte_lookup((unsigned long) page_address(p)); | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_hide_pages(struct page *p, unsigned int n) | ||||
| { | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	for (i = 0; i < n; ++i) { | ||||
| 		unsigned long address; | ||||
| 		pte_t *pte; | ||||
| 		unsigned int level; | ||||
| 
 | ||||
| 		address = (unsigned long) page_address(&p[i]); | ||||
| 		pte = lookup_address(address, &level); | ||||
| 		BUG_ON(!pte); | ||||
| 		BUG_ON(level != PG_LEVEL_4K); | ||||
| 
 | ||||
| 		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); | ||||
| 		set_pte(pte, __pte(pte_val(*pte) | _PAGE_HIDDEN)); | ||||
| 		__flush_tlb_one(address); | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
| /* Access may NOT cross page boundary */ | ||||
| static void kmemcheck_read_strict(struct pt_regs *regs, | ||||
| 	unsigned long addr, unsigned int size) | ||||
| { | ||||
| 	void *shadow; | ||||
| 	enum kmemcheck_shadow status; | ||||
| 
 | ||||
| 	shadow = kmemcheck_shadow_lookup(addr); | ||||
| 	if (!shadow) | ||||
| 		return; | ||||
| 
 | ||||
| 	kmemcheck_save_addr(addr); | ||||
| 	status = kmemcheck_shadow_test(shadow, size); | ||||
| 	if (status == KMEMCHECK_SHADOW_INITIALIZED) | ||||
| 		return; | ||||
| 
 | ||||
| 	if (kmemcheck_enabled) | ||||
| 		kmemcheck_error_save(status, addr, size, regs); | ||||
| 
 | ||||
| 	if (kmemcheck_enabled == 2) | ||||
| 		kmemcheck_enabled = 0; | ||||
| 
 | ||||
| 	/* Don't warn about it again. */ | ||||
| 	kmemcheck_shadow_set(shadow, size); | ||||
| } | ||||
| 
 | ||||
| bool kmemcheck_is_obj_initialized(unsigned long addr, size_t size) | ||||
| { | ||||
| 	enum kmemcheck_shadow status; | ||||
| 	void *shadow; | ||||
| 
 | ||||
| 	shadow = kmemcheck_shadow_lookup(addr); | ||||
| 	if (!shadow) | ||||
| 		return true; | ||||
| 
 | ||||
| 	status = kmemcheck_shadow_test_all(shadow, size); | ||||
| 
 | ||||
| 	return status == KMEMCHECK_SHADOW_INITIALIZED; | ||||
| } | ||||
| 
 | ||||
| /* Access may cross page boundary */ | ||||
| static void kmemcheck_read(struct pt_regs *regs, | ||||
| 	unsigned long addr, unsigned int size) | ||||
| { | ||||
| 	unsigned long page = addr & PAGE_MASK; | ||||
| 	unsigned long next_addr = addr + size - 1; | ||||
| 	unsigned long next_page = next_addr & PAGE_MASK; | ||||
| 
 | ||||
| 	if (likely(page == next_page)) { | ||||
| 		kmemcheck_read_strict(regs, addr, size); | ||||
| 		return; | ||||
| 	} | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * What we do is basically to split the access across the | ||||
| 	 * two pages and handle each part separately. Yes, this means | ||||
| 	 * that we may now see reads that are 3 + 5 bytes, for | ||||
| 	 * example (and if both are uninitialized, there will be two | ||||
| 	 * reports), but it makes the code a lot simpler. | ||||
| 	 */ | ||||
| 	kmemcheck_read_strict(regs, addr, next_page - addr); | ||||
| 	kmemcheck_read_strict(regs, next_page, next_addr - next_page); | ||||
| } | ||||
| 
 | ||||
| static void kmemcheck_write_strict(struct pt_regs *regs, | ||||
| 	unsigned long addr, unsigned int size) | ||||
| { | ||||
| 	void *shadow; | ||||
| 
 | ||||
| 	shadow = kmemcheck_shadow_lookup(addr); | ||||
| 	if (!shadow) | ||||
| 		return; | ||||
| 
 | ||||
| 	kmemcheck_save_addr(addr); | ||||
| 	kmemcheck_shadow_set(shadow, size); | ||||
| } | ||||
| 
 | ||||
| static void kmemcheck_write(struct pt_regs *regs, | ||||
| 	unsigned long addr, unsigned int size) | ||||
| { | ||||
| 	unsigned long page = addr & PAGE_MASK; | ||||
| 	unsigned long next_addr = addr + size - 1; | ||||
| 	unsigned long next_page = next_addr & PAGE_MASK; | ||||
| 
 | ||||
| 	if (likely(page == next_page)) { | ||||
| 		kmemcheck_write_strict(regs, addr, size); | ||||
| 		return; | ||||
| 	} | ||||
| 
 | ||||
| 	/* See comment in kmemcheck_read(). */ | ||||
| 	kmemcheck_write_strict(regs, addr, next_page - addr); | ||||
| 	kmemcheck_write_strict(regs, next_page, next_addr - next_page); | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  * Copying is hard. We have two addresses, each of which may be split across | ||||
|  * a page (and each page will have different shadow addresses). | ||||
|  */ | ||||
| static void kmemcheck_copy(struct pt_regs *regs, | ||||
| 	unsigned long src_addr, unsigned long dst_addr, unsigned int size) | ||||
| { | ||||
| 	uint8_t shadow[8]; | ||||
| 	enum kmemcheck_shadow status; | ||||
| 
 | ||||
| 	unsigned long page; | ||||
| 	unsigned long next_addr; | ||||
| 	unsigned long next_page; | ||||
| 
 | ||||
| 	uint8_t *x; | ||||
| 	unsigned int i; | ||||
| 	unsigned int n; | ||||
| 
 | ||||
| 	BUG_ON(size > sizeof(shadow)); | ||||
| 
 | ||||
| 	page = src_addr & PAGE_MASK; | ||||
| 	next_addr = src_addr + size - 1; | ||||
| 	next_page = next_addr & PAGE_MASK; | ||||
| 
 | ||||
| 	if (likely(page == next_page)) { | ||||
| 		/* Same page */ | ||||
| 		x = kmemcheck_shadow_lookup(src_addr); | ||||
| 		if (x) { | ||||
| 			kmemcheck_save_addr(src_addr); | ||||
| 			for (i = 0; i < size; ++i) | ||||
| 				shadow[i] = x[i]; | ||||
| 		} else { | ||||
| 			for (i = 0; i < size; ++i) | ||||
| 				shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| 		} | ||||
| 	} else { | ||||
| 		n = next_page - src_addr; | ||||
| 		BUG_ON(n > sizeof(shadow)); | ||||
| 
 | ||||
| 		/* First page */ | ||||
| 		x = kmemcheck_shadow_lookup(src_addr); | ||||
| 		if (x) { | ||||
| 			kmemcheck_save_addr(src_addr); | ||||
| 			for (i = 0; i < n; ++i) | ||||
| 				shadow[i] = x[i]; | ||||
| 		} else { | ||||
| 			/* Not tracked */ | ||||
| 			for (i = 0; i < n; ++i) | ||||
| 				shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| 		} | ||||
| 
 | ||||
| 		/* Second page */ | ||||
| 		x = kmemcheck_shadow_lookup(next_page); | ||||
| 		if (x) { | ||||
| 			kmemcheck_save_addr(next_page); | ||||
| 			for (i = n; i < size; ++i) | ||||
| 				shadow[i] = x[i - n]; | ||||
| 		} else { | ||||
| 			/* Not tracked */ | ||||
| 			for (i = n; i < size; ++i) | ||||
| 				shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| 		} | ||||
| 	} | ||||
| 
 | ||||
| 	page = dst_addr & PAGE_MASK; | ||||
| 	next_addr = dst_addr + size - 1; | ||||
| 	next_page = next_addr & PAGE_MASK; | ||||
| 
 | ||||
| 	if (likely(page == next_page)) { | ||||
| 		/* Same page */ | ||||
| 		x = kmemcheck_shadow_lookup(dst_addr); | ||||
| 		if (x) { | ||||
| 			kmemcheck_save_addr(dst_addr); | ||||
| 			for (i = 0; i < size; ++i) { | ||||
| 				x[i] = shadow[i]; | ||||
| 				shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| 			} | ||||
| 		} | ||||
| 	} else { | ||||
| 		n = next_page - dst_addr; | ||||
| 		BUG_ON(n > sizeof(shadow)); | ||||
| 
 | ||||
| 		/* First page */ | ||||
| 		x = kmemcheck_shadow_lookup(dst_addr); | ||||
| 		if (x) { | ||||
| 			kmemcheck_save_addr(dst_addr); | ||||
| 			for (i = 0; i < n; ++i) { | ||||
| 				x[i] = shadow[i]; | ||||
| 				shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| 			} | ||||
| 		} | ||||
| 
 | ||||
| 		/* Second page */ | ||||
| 		x = kmemcheck_shadow_lookup(next_page); | ||||
| 		if (x) { | ||||
| 			kmemcheck_save_addr(next_page); | ||||
| 			for (i = n; i < size; ++i) { | ||||
| 				x[i - n] = shadow[i]; | ||||
| 				shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| 			} | ||||
| 		} | ||||
| 	} | ||||
| 
 | ||||
| 	status = kmemcheck_shadow_test(shadow, size); | ||||
| 	if (status == KMEMCHECK_SHADOW_INITIALIZED) | ||||
| 		return; | ||||
| 
 | ||||
| 	if (kmemcheck_enabled) | ||||
| 		kmemcheck_error_save(status, src_addr, size, regs); | ||||
| 
 | ||||
| 	if (kmemcheck_enabled == 2) | ||||
| 		kmemcheck_enabled = 0; | ||||
| } | ||||
| 
 | ||||
| enum kmemcheck_method { | ||||
| 	KMEMCHECK_READ, | ||||
| 	KMEMCHECK_WRITE, | ||||
| }; | ||||
| 
 | ||||
| static void kmemcheck_access(struct pt_regs *regs, | ||||
| 	unsigned long fallback_address, enum kmemcheck_method fallback_method) | ||||
| { | ||||
| 	const uint8_t *insn; | ||||
| 	const uint8_t *insn_primary; | ||||
| 	unsigned int size; | ||||
| 
 | ||||
| 	struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); | ||||
| 
 | ||||
| 	/* Recursive fault -- ouch. */ | ||||
| 	if (data->busy) { | ||||
| 		kmemcheck_show_addr(fallback_address); | ||||
| 		kmemcheck_error_save_bug(regs); | ||||
| 		return; | ||||
| 	} | ||||
| 
 | ||||
| 	data->busy = true; | ||||
| 
 | ||||
| 	insn = (const uint8_t *) regs->ip; | ||||
| 	insn_primary = kmemcheck_opcode_get_primary(insn); | ||||
| 
 | ||||
| 	kmemcheck_opcode_decode(insn, &size); | ||||
| 
 | ||||
| 	switch (insn_primary[0]) { | ||||
| #ifdef CONFIG_KMEMCHECK_BITOPS_OK | ||||
| 		/* AND, OR, XOR */ | ||||
| 		/*
 | ||||
| 		 * Unfortunately, these instructions have to be excluded from | ||||
| 		 * our regular checking since they access only some (and not | ||||
| 		 * all) bits. This clears out "bogus" bitfield-access warnings. | ||||
| 		 */ | ||||
| 	case 0x80: | ||||
| 	case 0x81: | ||||
| 	case 0x82: | ||||
| 	case 0x83: | ||||
| 		switch ((insn_primary[1] >> 3) & 7) { | ||||
| 			/* OR */ | ||||
| 		case 1: | ||||
| 			/* AND */ | ||||
| 		case 4: | ||||
| 			/* XOR */ | ||||
| 		case 6: | ||||
| 			kmemcheck_write(regs, fallback_address, size); | ||||
| 			goto out; | ||||
| 
 | ||||
| 			/* ADD */ | ||||
| 		case 0: | ||||
| 			/* ADC */ | ||||
| 		case 2: | ||||
| 			/* SBB */ | ||||
| 		case 3: | ||||
| 			/* SUB */ | ||||
| 		case 5: | ||||
| 			/* CMP */ | ||||
| 		case 7: | ||||
| 			break; | ||||
| 		} | ||||
| 		break; | ||||
| #endif | ||||
| 
 | ||||
| 		/* MOVS, MOVSB, MOVSW, MOVSD */ | ||||
| 	case 0xa4: | ||||
| 	case 0xa5: | ||||
| 		/*
 | ||||
| 		 * These instructions are special because they take two | ||||
| 		 * addresses, but we only get one page fault. | ||||
| 		 */ | ||||
| 		kmemcheck_copy(regs, regs->si, regs->di, size); | ||||
| 		goto out; | ||||
| 
 | ||||
| 		/* CMPS, CMPSB, CMPSW, CMPSD */ | ||||
| 	case 0xa6: | ||||
| 	case 0xa7: | ||||
| 		kmemcheck_read(regs, regs->si, size); | ||||
| 		kmemcheck_read(regs, regs->di, size); | ||||
| 		goto out; | ||||
| 	} | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * If the opcode isn't special in any way, we use the data from the | ||||
| 	 * page fault handler to determine the address and type of memory | ||||
| 	 * access. | ||||
| 	 */ | ||||
| 	switch (fallback_method) { | ||||
| 	case KMEMCHECK_READ: | ||||
| 		kmemcheck_read(regs, fallback_address, size); | ||||
| 		goto out; | ||||
| 	case KMEMCHECK_WRITE: | ||||
| 		kmemcheck_write(regs, fallback_address, size); | ||||
| 		goto out; | ||||
| 	} | ||||
| 
 | ||||
| out: | ||||
| 	data->busy = false; | ||||
| } | ||||
| 
 | ||||
| bool kmemcheck_fault(struct pt_regs *regs, unsigned long address, | ||||
| 	unsigned long error_code) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * XXX: Is it safe to assume that memory accesses from virtual 86 | ||||
| 	 * mode or non-kernel code segments will _never_ access kernel | ||||
| 	 * memory (e.g. tracked pages)? For now, we need this to avoid | ||||
| 	 * invoking kmemcheck for PnP BIOS calls. | ||||
| 	 */ | ||||
| 	if (regs->flags & X86_VM_MASK) | ||||
| 		return false; | ||||
| 	if (regs->cs != __KERNEL_CS) | ||||
| 		return false; | ||||
| 
 | ||||
| 	pte = kmemcheck_pte_lookup(address); | ||||
| 	if (!pte) | ||||
| 		return false; | ||||
| 
 | ||||
| 	WARN_ON_ONCE(in_nmi()); | ||||
| 
 | ||||
| 	if (error_code & 2) | ||||
| 		kmemcheck_access(regs, address, KMEMCHECK_WRITE); | ||||
| 	else | ||||
| 		kmemcheck_access(regs, address, KMEMCHECK_READ); | ||||
| 
 | ||||
| 	kmemcheck_show(regs); | ||||
| 	return true; | ||||
| } | ||||
| 
 | ||||
| bool kmemcheck_trap(struct pt_regs *regs) | ||||
| { | ||||
| 	if (!kmemcheck_active(regs)) | ||||
| 		return false; | ||||
| 
 | ||||
| 	/* We're done. */ | ||||
| 	kmemcheck_hide(regs); | ||||
| 	return true; | ||||
| } | ||||
|  | @ -1,107 +1 @@ | |||
| // SPDX-License-Identifier: GPL-2.0
 | ||||
| #include <linux/types.h> | ||||
| 
 | ||||
| #include "opcode.h" | ||||
| 
 | ||||
| static bool opcode_is_prefix(uint8_t b) | ||||
| { | ||||
| 	return | ||||
| 		/* Group 1 */ | ||||
| 		b == 0xf0 || b == 0xf2 || b == 0xf3 | ||||
| 		/* Group 2 */ | ||||
| 		|| b == 0x2e || b == 0x36 || b == 0x3e || b == 0x26 | ||||
| 		|| b == 0x64 || b == 0x65 | ||||
| 		/* Group 3 */ | ||||
| 		|| b == 0x66 | ||||
| 		/* Group 4 */ | ||||
| 		|| b == 0x67; | ||||
| } | ||||
| 
 | ||||
| #ifdef CONFIG_X86_64 | ||||
| static bool opcode_is_rex_prefix(uint8_t b) | ||||
| { | ||||
| 	return (b & 0xf0) == 0x40; | ||||
| } | ||||
| #else | ||||
| static bool opcode_is_rex_prefix(uint8_t b) | ||||
| { | ||||
| 	return false; | ||||
| } | ||||
| #endif | ||||
| 
 | ||||
| #define REX_W (1 << 3) | ||||
| 
 | ||||
| /*
 | ||||
|  * This is a VERY crude opcode decoder. We only need to find the size of the | ||||
|  * load/store that caused our #PF and this should work for all the opcodes | ||||
|  * that we care about. Moreover, the ones who invented this instruction set | ||||
|  * should be shot. | ||||
|  */ | ||||
| void kmemcheck_opcode_decode(const uint8_t *op, unsigned int *size) | ||||
| { | ||||
| 	/* Default operand size */ | ||||
| 	int operand_size_override = 4; | ||||
| 
 | ||||
| 	/* prefixes */ | ||||
| 	for (; opcode_is_prefix(*op); ++op) { | ||||
| 		if (*op == 0x66) | ||||
| 			operand_size_override = 2; | ||||
| 	} | ||||
| 
 | ||||
| 	/* REX prefix */ | ||||
| 	if (opcode_is_rex_prefix(*op)) { | ||||
| 		uint8_t rex = *op; | ||||
| 
 | ||||
| 		++op; | ||||
| 		if (rex & REX_W) { | ||||
| 			switch (*op) { | ||||
| 			case 0x63: | ||||
| 				*size = 4; | ||||
| 				return; | ||||
| 			case 0x0f: | ||||
| 				++op; | ||||
| 
 | ||||
| 				switch (*op) { | ||||
| 				case 0xb6: | ||||
| 				case 0xbe: | ||||
| 					*size = 1; | ||||
| 					return; | ||||
| 				case 0xb7: | ||||
| 				case 0xbf: | ||||
| 					*size = 2; | ||||
| 					return; | ||||
| 				} | ||||
| 
 | ||||
| 				break; | ||||
| 			} | ||||
| 
 | ||||
| 			*size = 8; | ||||
| 			return; | ||||
| 		} | ||||
| 	} | ||||
| 
 | ||||
| 	/* escape opcode */ | ||||
| 	if (*op == 0x0f) { | ||||
| 		++op; | ||||
| 
 | ||||
| 		/*
 | ||||
| 		 * This is move with zero-extend and sign-extend, respectively; | ||||
| 		 * we don't have to think about 0xb6/0xbe, because this is | ||||
| 		 * already handled in the conditional below. | ||||
| 		 */ | ||||
| 		if (*op == 0xb7 || *op == 0xbf) | ||||
| 			operand_size_override = 2; | ||||
| 	} | ||||
| 
 | ||||
| 	*size = (*op & 1) ? operand_size_override : 1; | ||||
| } | ||||
| 
 | ||||
| const uint8_t *kmemcheck_opcode_get_primary(const uint8_t *op) | ||||
| { | ||||
| 	/* skip prefixes */ | ||||
| 	while (opcode_is_prefix(*op)) | ||||
| 		++op; | ||||
| 	if (opcode_is_rex_prefix(*op)) | ||||
| 		++op; | ||||
| 	return op; | ||||
| } | ||||
|  |  | |||
|  | @ -1,10 +1 @@ | |||
| /* SPDX-License-Identifier: GPL-2.0 */ | ||||
| #ifndef ARCH__X86__MM__KMEMCHECK__OPCODE_H | ||||
| #define ARCH__X86__MM__KMEMCHECK__OPCODE_H | ||||
| 
 | ||||
| #include <linux/types.h> | ||||
| 
 | ||||
| void kmemcheck_opcode_decode(const uint8_t *op, unsigned int *size); | ||||
| const uint8_t *kmemcheck_opcode_get_primary(const uint8_t *op); | ||||
| 
 | ||||
| #endif | ||||
|  |  | |||
|  | @ -1,23 +1 @@ | |||
| // SPDX-License-Identifier: GPL-2.0
 | ||||
| #include <linux/mm.h> | ||||
| 
 | ||||
| #include <asm/pgtable.h> | ||||
| 
 | ||||
| #include "pte.h" | ||||
| 
 | ||||
| pte_t *kmemcheck_pte_lookup(unsigned long address) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 	unsigned int level; | ||||
| 
 | ||||
| 	pte = lookup_address(address, &level); | ||||
| 	if (!pte) | ||||
| 		return NULL; | ||||
| 	if (level != PG_LEVEL_4K) | ||||
| 		return NULL; | ||||
| 	if (!pte_hidden(*pte)) | ||||
| 		return NULL; | ||||
| 
 | ||||
| 	return pte; | ||||
| } | ||||
| 
 | ||||
|  |  | |||
|  | @ -1,11 +1 @@ | |||
| /* SPDX-License-Identifier: GPL-2.0 */ | ||||
| #ifndef ARCH__X86__MM__KMEMCHECK__PTE_H | ||||
| #define ARCH__X86__MM__KMEMCHECK__PTE_H | ||||
| 
 | ||||
| #include <linux/mm.h> | ||||
| 
 | ||||
| #include <asm/pgtable.h> | ||||
| 
 | ||||
| pte_t *kmemcheck_pte_lookup(unsigned long address); | ||||
| 
 | ||||
| #endif | ||||
|  |  | |||
|  | @ -1,71 +1 @@ | |||
| // SPDX-License-Identifier: GPL-2.0
 | ||||
| #include <linux/bug.h> | ||||
| #include <linux/kernel.h> | ||||
| 
 | ||||
| #include "opcode.h" | ||||
| #include "selftest.h" | ||||
| 
 | ||||
| struct selftest_opcode { | ||||
| 	unsigned int expected_size; | ||||
| 	const uint8_t *insn; | ||||
| 	const char *desc; | ||||
| }; | ||||
| 
 | ||||
| static const struct selftest_opcode selftest_opcodes[] = { | ||||
| 	/* REP MOVS */ | ||||
| 	{1, "\xf3\xa4", 		"rep movsb <mem8>, <mem8>"}, | ||||
| 	{4, "\xf3\xa5",			"rep movsl <mem32>, <mem32>"}, | ||||
| 
 | ||||
| 	/* MOVZX / MOVZXD */ | ||||
| 	{1, "\x66\x0f\xb6\x51\xf8",	"movzwq <mem8>, <reg16>"}, | ||||
| 	{1, "\x0f\xb6\x51\xf8",		"movzwq <mem8>, <reg32>"}, | ||||
| 
 | ||||
| 	/* MOVSX / MOVSXD */ | ||||
| 	{1, "\x66\x0f\xbe\x51\xf8",	"movswq <mem8>, <reg16>"}, | ||||
| 	{1, "\x0f\xbe\x51\xf8",		"movswq <mem8>, <reg32>"}, | ||||
| 
 | ||||
| #ifdef CONFIG_X86_64 | ||||
| 	/* MOVZX / MOVZXD */ | ||||
| 	{1, "\x49\x0f\xb6\x51\xf8",	"movzbq <mem8>, <reg64>"}, | ||||
| 	{2, "\x49\x0f\xb7\x51\xf8",	"movzbq <mem16>, <reg64>"}, | ||||
| 
 | ||||
| 	/* MOVSX / MOVSXD */ | ||||
| 	{1, "\x49\x0f\xbe\x51\xf8",	"movsbq <mem8>, <reg64>"}, | ||||
| 	{2, "\x49\x0f\xbf\x51\xf8",	"movsbq <mem16>, <reg64>"}, | ||||
| 	{4, "\x49\x63\x51\xf8",		"movslq <mem32>, <reg64>"}, | ||||
| #endif | ||||
| }; | ||||
| 
 | ||||
| static bool selftest_opcode_one(const struct selftest_opcode *op) | ||||
| { | ||||
| 	unsigned size; | ||||
| 
 | ||||
| 	kmemcheck_opcode_decode(op->insn, &size); | ||||
| 
 | ||||
| 	if (size == op->expected_size) | ||||
| 		return true; | ||||
| 
 | ||||
| 	printk(KERN_WARNING "kmemcheck: opcode %s: expected size %d, got %d\n", | ||||
| 		op->desc, op->expected_size, size); | ||||
| 	return false; | ||||
| } | ||||
| 
 | ||||
| static bool selftest_opcodes_all(void) | ||||
| { | ||||
| 	bool pass = true; | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	for (i = 0; i < ARRAY_SIZE(selftest_opcodes); ++i) | ||||
| 		pass = pass && selftest_opcode_one(&selftest_opcodes[i]); | ||||
| 
 | ||||
| 	return pass; | ||||
| } | ||||
| 
 | ||||
| bool kmemcheck_selftest(void) | ||||
| { | ||||
| 	bool pass = true; | ||||
| 
 | ||||
| 	pass = pass && selftest_opcodes_all(); | ||||
| 
 | ||||
| 	return pass; | ||||
| } | ||||
|  |  | |||
|  | @ -1,7 +1 @@ | |||
| /* SPDX-License-Identifier: GPL-2.0 */ | ||||
| #ifndef ARCH_X86_MM_KMEMCHECK_SELFTEST_H | ||||
| #define ARCH_X86_MM_KMEMCHECK_SELFTEST_H | ||||
| 
 | ||||
| bool kmemcheck_selftest(void); | ||||
| 
 | ||||
| #endif | ||||
|  |  | |||
|  | @ -1,173 +0,0 @@ | |||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/export.h> | ||||
| #include <linux/mm.h> | ||||
| 
 | ||||
| #include <asm/page.h> | ||||
| #include <asm/pgtable.h> | ||||
| 
 | ||||
| #include "pte.h" | ||||
| #include "shadow.h" | ||||
| 
 | ||||
| /*
 | ||||
|  * Return the shadow address for the given address. Returns NULL if the | ||||
|  * address is not tracked. | ||||
|  * | ||||
|  * We need to be extremely careful not to follow any invalid pointers, | ||||
|  * because this function can be called for *any* possible address. | ||||
|  */ | ||||
| void *kmemcheck_shadow_lookup(unsigned long address) | ||||
| { | ||||
| 	pte_t *pte; | ||||
| 	struct page *page; | ||||
| 
 | ||||
| 	if (!virt_addr_valid(address)) | ||||
| 		return NULL; | ||||
| 
 | ||||
| 	pte = kmemcheck_pte_lookup(address); | ||||
| 	if (!pte) | ||||
| 		return NULL; | ||||
| 
 | ||||
| 	page = virt_to_page(address); | ||||
| 	if (!page->shadow) | ||||
| 		return NULL; | ||||
| 	return page->shadow + (address & (PAGE_SIZE - 1)); | ||||
| } | ||||
| 
 | ||||
| static void mark_shadow(void *address, unsigned int n, | ||||
| 	enum kmemcheck_shadow status) | ||||
| { | ||||
| 	unsigned long addr = (unsigned long) address; | ||||
| 	unsigned long last_addr = addr + n - 1; | ||||
| 	unsigned long page = addr & PAGE_MASK; | ||||
| 	unsigned long last_page = last_addr & PAGE_MASK; | ||||
| 	unsigned int first_n; | ||||
| 	void *shadow; | ||||
| 
 | ||||
| 	/* If the memory range crosses a page boundary, stop there. */ | ||||
| 	if (page == last_page) | ||||
| 		first_n = n; | ||||
| 	else | ||||
| 		first_n = page + PAGE_SIZE - addr; | ||||
| 
 | ||||
| 	shadow = kmemcheck_shadow_lookup(addr); | ||||
| 	if (shadow) | ||||
| 		memset(shadow, status, first_n); | ||||
| 
 | ||||
| 	addr += first_n; | ||||
| 	n -= first_n; | ||||
| 
 | ||||
| 	/* Do full-page memset()s. */ | ||||
| 	while (n >= PAGE_SIZE) { | ||||
| 		shadow = kmemcheck_shadow_lookup(addr); | ||||
| 		if (shadow) | ||||
| 			memset(shadow, status, PAGE_SIZE); | ||||
| 
 | ||||
| 		addr += PAGE_SIZE; | ||||
| 		n -= PAGE_SIZE; | ||||
| 	} | ||||
| 
 | ||||
| 	/* Do the remaining page, if any. */ | ||||
| 	if (n > 0) { | ||||
| 		shadow = kmemcheck_shadow_lookup(addr); | ||||
| 		if (shadow) | ||||
| 			memset(shadow, status, n); | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_mark_unallocated(void *address, unsigned int n) | ||||
| { | ||||
| 	mark_shadow(address, n, KMEMCHECK_SHADOW_UNALLOCATED); | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_mark_uninitialized(void *address, unsigned int n) | ||||
| { | ||||
| 	mark_shadow(address, n, KMEMCHECK_SHADOW_UNINITIALIZED); | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  * Fill the shadow memory of the given address such that the memory at that | ||||
|  * address is marked as being initialized. | ||||
|  */ | ||||
| void kmemcheck_mark_initialized(void *address, unsigned int n) | ||||
| { | ||||
| 	mark_shadow(address, n, KMEMCHECK_SHADOW_INITIALIZED); | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(kmemcheck_mark_initialized); | ||||
| 
 | ||||
| void kmemcheck_mark_freed(void *address, unsigned int n) | ||||
| { | ||||
| 	mark_shadow(address, n, KMEMCHECK_SHADOW_FREED); | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_mark_unallocated_pages(struct page *p, unsigned int n) | ||||
| { | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	for (i = 0; i < n; ++i) | ||||
| 		kmemcheck_mark_unallocated(page_address(&p[i]), PAGE_SIZE); | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_mark_uninitialized_pages(struct page *p, unsigned int n) | ||||
| { | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	for (i = 0; i < n; ++i) | ||||
| 		kmemcheck_mark_uninitialized(page_address(&p[i]), PAGE_SIZE); | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_mark_initialized_pages(struct page *p, unsigned int n) | ||||
| { | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	for (i = 0; i < n; ++i) | ||||
| 		kmemcheck_mark_initialized(page_address(&p[i]), PAGE_SIZE); | ||||
| } | ||||
| 
 | ||||
| enum kmemcheck_shadow kmemcheck_shadow_test(void *shadow, unsigned int size) | ||||
| { | ||||
| #ifdef CONFIG_KMEMCHECK_PARTIAL_OK | ||||
| 	uint8_t *x; | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	x = shadow; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Make sure _some_ bytes are initialized. Gcc frequently generates | ||||
| 	 * code to access neighboring bytes. | ||||
| 	 */ | ||||
| 	for (i = 0; i < size; ++i) { | ||||
| 		if (x[i] == KMEMCHECK_SHADOW_INITIALIZED) | ||||
| 			return x[i]; | ||||
| 	} | ||||
| 
 | ||||
| 	return x[0]; | ||||
| #else | ||||
| 	return kmemcheck_shadow_test_all(shadow, size); | ||||
| #endif | ||||
| } | ||||
| 
 | ||||
| enum kmemcheck_shadow kmemcheck_shadow_test_all(void *shadow, unsigned int size) | ||||
| { | ||||
| 	uint8_t *x; | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	x = shadow; | ||||
| 
 | ||||
| 	/* All bytes must be initialized. */ | ||||
| 	for (i = 0; i < size; ++i) { | ||||
| 		if (x[i] != KMEMCHECK_SHADOW_INITIALIZED) | ||||
| 			return x[i]; | ||||
| 	} | ||||
| 
 | ||||
| 	return x[0]; | ||||
| } | ||||
| 
 | ||||
| void kmemcheck_shadow_set(void *shadow, unsigned int size) | ||||
| { | ||||
| 	uint8_t *x; | ||||
| 	unsigned int i; | ||||
| 
 | ||||
| 	x = shadow; | ||||
| 	for (i = 0; i < size; ++i) | ||||
| 		x[i] = KMEMCHECK_SHADOW_INITIALIZED; | ||||
| } | ||||
|  | @ -1,19 +1 @@ | |||
| /* SPDX-License-Identifier: GPL-2.0 */ | ||||
| #ifndef ARCH__X86__MM__KMEMCHECK__SHADOW_H | ||||
| #define ARCH__X86__MM__KMEMCHECK__SHADOW_H | ||||
| 
 | ||||
| enum kmemcheck_shadow { | ||||
| 	KMEMCHECK_SHADOW_UNALLOCATED, | ||||
| 	KMEMCHECK_SHADOW_UNINITIALIZED, | ||||
| 	KMEMCHECK_SHADOW_INITIALIZED, | ||||
| 	KMEMCHECK_SHADOW_FREED, | ||||
| }; | ||||
| 
 | ||||
| void *kmemcheck_shadow_lookup(unsigned long address); | ||||
| 
 | ||||
| enum kmemcheck_shadow kmemcheck_shadow_test(void *shadow, unsigned int size); | ||||
| enum kmemcheck_shadow kmemcheck_shadow_test_all(void *shadow, | ||||
| 						unsigned int size); | ||||
| void kmemcheck_shadow_set(void *shadow, unsigned int size); | ||||
| 
 | ||||
| #endif | ||||
|  |  | |||
|  | @ -753,7 +753,7 @@ static int split_large_page(struct cpa_data *cpa, pte_t *kpte, | |||
| 
 | ||||
| 	if (!debug_pagealloc_enabled()) | ||||
| 		spin_unlock(&cpa_lock); | ||||
| 	base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0); | ||||
| 	base = alloc_pages(GFP_KERNEL, 0); | ||||
| 	if (!debug_pagealloc_enabled()) | ||||
| 		spin_lock(&cpa_lock); | ||||
| 	if (!base) | ||||
|  | @ -904,7 +904,7 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end) | |||
| 
 | ||||
| static int alloc_pte_page(pmd_t *pmd) | ||||
| { | ||||
| 	pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK); | ||||
| 	pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL); | ||||
| 	if (!pte) | ||||
| 		return -1; | ||||
| 
 | ||||
|  | @ -914,7 +914,7 @@ static int alloc_pte_page(pmd_t *pmd) | |||
| 
 | ||||
| static int alloc_pmd_page(pud_t *pud) | ||||
| { | ||||
| 	pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK); | ||||
| 	pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); | ||||
| 	if (!pmd) | ||||
| 		return -1; | ||||
| 
 | ||||
|  | @ -1120,7 +1120,7 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr) | |||
| 	pgd_entry = cpa->pgd + pgd_index(addr); | ||||
| 
 | ||||
| 	if (pgd_none(*pgd_entry)) { | ||||
| 		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK); | ||||
| 		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL); | ||||
| 		if (!p4d) | ||||
| 			return -1; | ||||
| 
 | ||||
|  | @ -1132,7 +1132,7 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr) | |||
| 	 */ | ||||
| 	p4d = p4d_offset(pgd_entry, addr); | ||||
| 	if (p4d_none(*p4d)) { | ||||
| 		pud = (pud_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK); | ||||
| 		pud = (pud_t *)get_zeroed_page(GFP_KERNEL); | ||||
| 		if (!pud) | ||||
| 			return -1; | ||||
| 
 | ||||
|  |  | |||
|  | @ -7,7 +7,7 @@ | |||
| #include <asm/fixmap.h> | ||||
| #include <asm/mtrr.h> | ||||
| 
 | ||||
| #define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | __GFP_ZERO) | ||||
| #define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO) | ||||
| 
 | ||||
| #ifdef CONFIG_HIGHPTE | ||||
| #define PGALLOC_USER_GFP __GFP_HIGHMEM | ||||
|  |  | |||
|  | @ -207,7 +207,7 @@ int __init efi_alloc_page_tables(void) | |||
| 	if (efi_enabled(EFI_OLD_MEMMAP)) | ||||
| 		return 0; | ||||
| 
 | ||||
| 	gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO; | ||||
| 	gfp_mask = GFP_KERNEL | __GFP_ZERO; | ||||
| 	efi_pgd = (pgd_t *)__get_free_page(gfp_mask); | ||||
| 	if (!efi_pgd) | ||||
| 		return -ENOMEM; | ||||
|  |  | |||
|  | @ -2047,7 +2047,7 @@ static int blk_mq_init_hctx(struct request_queue *q, | |||
| 	 * Allocate space for all possible cpus to avoid allocation at | ||||
| 	 * runtime | ||||
| 	 */ | ||||
| 	hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *), | ||||
| 	hctx->ctxs = kmalloc_array_node(nr_cpu_ids, sizeof(void *), | ||||
| 					GFP_KERNEL, node); | ||||
| 	if (!hctx->ctxs) | ||||
| 		goto unregister_cpu_notifier; | ||||
|  |  | |||
|  | @ -122,12 +122,7 @@ calibrate_xor_blocks(void) | |||
| 		goto out; | ||||
| 	} | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Note: Since the memory is not actually used for _anything_ but to | ||||
| 	 * test the XOR speed, we don't really want kmemcheck to warn about | ||||
| 	 * reading uninitialized bytes here. | ||||
| 	 */ | ||||
| 	b1 = (void *) __get_free_pages(GFP_KERNEL | __GFP_NOTRACK, 2); | ||||
| 	b1 = (void *) __get_free_pages(GFP_KERNEL, 2); | ||||
| 	if (!b1) { | ||||
| 		printk(KERN_WARNING "xor: Yikes!  No memory available.\n"); | ||||
| 		return -ENOMEM; | ||||
|  |  | |||
|  | @ -20,6 +20,7 @@ | |||
| #include <linux/radix-tree.h> | ||||
| #include <linux/fs.h> | ||||
| #include <linux/slab.h> | ||||
| #include <linux/backing-dev.h> | ||||
| #ifdef CONFIG_BLK_DEV_RAM_DAX | ||||
| #include <linux/pfn_t.h> | ||||
| #include <linux/dax.h> | ||||
|  | @ -448,6 +449,7 @@ static struct brd_device *brd_alloc(int i) | |||
| 	disk->flags		= GENHD_FL_EXT_DEVT; | ||||
| 	sprintf(disk->disk_name, "ram%d", i); | ||||
| 	set_capacity(disk, rd_size * 2); | ||||
| 	disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO; | ||||
| 
 | ||||
| #ifdef CONFIG_BLK_DEV_RAM_DAX | ||||
| 	queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue); | ||||
|  |  | |||
|  | @ -23,14 +23,14 @@ static const char * const backends[] = { | |||
| #if IS_ENABLED(CONFIG_CRYPTO_LZ4) | ||||
| 	"lz4", | ||||
| #endif | ||||
| #if IS_ENABLED(CONFIG_CRYPTO_DEFLATE) | ||||
| 	"deflate", | ||||
| #endif | ||||
| #if IS_ENABLED(CONFIG_CRYPTO_LZ4HC) | ||||
| 	"lz4hc", | ||||
| #endif | ||||
| #if IS_ENABLED(CONFIG_CRYPTO_842) | ||||
| 	"842", | ||||
| #endif | ||||
| #if IS_ENABLED(CONFIG_CRYPTO_ZSTD) | ||||
| 	"zstd", | ||||
| #endif | ||||
| 	NULL | ||||
| }; | ||||
|  |  | |||
|  | @ -122,14 +122,6 @@ static inline bool is_partial_io(struct bio_vec *bvec) | |||
| } | ||||
| #endif | ||||
| 
 | ||||
| static void zram_revalidate_disk(struct zram *zram) | ||||
| { | ||||
| 	revalidate_disk(zram->disk); | ||||
| 	/* revalidate_disk reset the BDI_CAP_STABLE_WRITES so set again */ | ||||
| 	zram->disk->queue->backing_dev_info->capabilities |= | ||||
| 		BDI_CAP_STABLE_WRITES; | ||||
| } | ||||
| 
 | ||||
| /*
 | ||||
|  * Check if request is within bounds and aligned on zram logical blocks. | ||||
|  */ | ||||
|  | @ -436,7 +428,7 @@ static void put_entry_bdev(struct zram *zram, unsigned long entry) | |||
| 	WARN_ON_ONCE(!was_set); | ||||
| } | ||||
| 
 | ||||
| void zram_page_end_io(struct bio *bio) | ||||
| static void zram_page_end_io(struct bio *bio) | ||||
| { | ||||
| 	struct page *page = bio->bi_io_vec[0].bv_page; | ||||
| 
 | ||||
|  | @ -1373,7 +1365,8 @@ static ssize_t disksize_store(struct device *dev, | |||
| 	zram->comp = comp; | ||||
| 	zram->disksize = disksize; | ||||
| 	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT); | ||||
| 	zram_revalidate_disk(zram); | ||||
| 
 | ||||
| 	revalidate_disk(zram->disk); | ||||
| 	up_write(&zram->init_lock); | ||||
| 
 | ||||
| 	return len; | ||||
|  | @ -1420,7 +1413,7 @@ static ssize_t reset_store(struct device *dev, | |||
| 	/* Make sure all the pending I/O are finished */ | ||||
| 	fsync_bdev(bdev); | ||||
| 	zram_reset_device(zram); | ||||
| 	zram_revalidate_disk(zram); | ||||
| 	revalidate_disk(zram->disk); | ||||
| 	bdput(bdev); | ||||
| 
 | ||||
| 	mutex_lock(&bdev->bd_mutex); | ||||
|  | @ -1539,6 +1532,7 @@ static int zram_add(void) | |||
| 	/* zram devices sort of resembles non-rotational disks */ | ||||
| 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue); | ||||
| 	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, zram->disk->queue); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * To ensure that we always get PAGE_SIZE aligned | ||||
| 	 * and n*PAGE_SIZED sized I/O requests. | ||||
|  | @ -1563,6 +1557,8 @@ static int zram_add(void) | |||
| 	if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE) | ||||
| 		blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX); | ||||
| 
 | ||||
| 	zram->disk->queue->backing_dev_info->capabilities |= | ||||
| 			(BDI_CAP_STABLE_WRITES | BDI_CAP_SYNCHRONOUS_IO); | ||||
| 	add_disk(zram->disk); | ||||
| 
 | ||||
| 	ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj, | ||||
|  |  | |||
|  | @ -259,7 +259,6 @@ | |||
| #include <linux/cryptohash.h> | ||||
| #include <linux/fips.h> | ||||
| #include <linux/ptrace.h> | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/workqueue.h> | ||||
| #include <linux/irq.h> | ||||
| #include <linux/syscalls.h> | ||||
|  |  | |||
|  | @ -553,8 +553,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, | |||
| 				 * invalidated it. Free it and try again | ||||
| 				 */ | ||||
| 				release_pages(e->user_pages, | ||||
| 					      e->robj->tbo.ttm->num_pages, | ||||
| 					      false); | ||||
| 					      e->robj->tbo.ttm->num_pages); | ||||
| 				kvfree(e->user_pages); | ||||
| 				e->user_pages = NULL; | ||||
| 			} | ||||
|  | @ -691,8 +690,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, | |||
| 				continue; | ||||
| 
 | ||||
| 			release_pages(e->user_pages, | ||||
| 				      e->robj->tbo.ttm->num_pages, | ||||
| 				      false); | ||||
| 				      e->robj->tbo.ttm->num_pages); | ||||
| 			kvfree(e->user_pages); | ||||
| 		} | ||||
| 	} | ||||
|  |  | |||
|  | @ -347,7 +347,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data, | |||
| 	return 0; | ||||
| 
 | ||||
| free_pages: | ||||
| 	release_pages(bo->tbo.ttm->pages, bo->tbo.ttm->num_pages, false); | ||||
| 	release_pages(bo->tbo.ttm->pages, bo->tbo.ttm->num_pages); | ||||
| 
 | ||||
| unlock_mmap_sem: | ||||
| 	up_read(¤t->mm->mmap_sem); | ||||
|  |  | |||
|  | @ -659,7 +659,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages) | |||
| 	return 0; | ||||
| 
 | ||||
| release_pages: | ||||
| 	release_pages(pages, pinned, 0); | ||||
| 	release_pages(pages, pinned); | ||||
| 	return r; | ||||
| } | ||||
| 
 | ||||
|  |  | |||
|  | @ -779,7 +779,7 @@ static struct page **etnaviv_gem_userptr_do_get_pages( | |||
| 	up_read(&mm->mmap_sem); | ||||
| 
 | ||||
| 	if (ret < 0) { | ||||
| 		release_pages(pvec, pinned, 0); | ||||
| 		release_pages(pvec, pinned); | ||||
| 		kvfree(pvec); | ||||
| 		return ERR_PTR(ret); | ||||
| 	} | ||||
|  | @ -852,7 +852,7 @@ static int etnaviv_gem_userptr_get_pages(struct etnaviv_gem_object *etnaviv_obj) | |||
| 		} | ||||
| 	} | ||||
| 
 | ||||
| 	release_pages(pvec, pinned, 0); | ||||
| 	release_pages(pvec, pinned); | ||||
| 	kvfree(pvec); | ||||
| 
 | ||||
| 	work = kmalloc(sizeof(*work), GFP_KERNEL); | ||||
|  | @ -886,7 +886,7 @@ static void etnaviv_gem_userptr_release(struct etnaviv_gem_object *etnaviv_obj) | |||
| 	if (etnaviv_obj->pages) { | ||||
| 		int npages = etnaviv_obj->base.size >> PAGE_SHIFT; | ||||
| 
 | ||||
| 		release_pages(etnaviv_obj->pages, npages, 0); | ||||
| 		release_pages(etnaviv_obj->pages, npages); | ||||
| 		kvfree(etnaviv_obj->pages); | ||||
| 	} | ||||
| 	put_task_struct(etnaviv_obj->userptr.task); | ||||
|  |  | |||
|  | @ -1859,7 +1859,7 @@ static void i915_address_space_init(struct i915_address_space *vm, | |||
| 	INIT_LIST_HEAD(&vm->unbound_list); | ||||
| 
 | ||||
| 	list_add_tail(&vm->global_link, &dev_priv->vm_list); | ||||
| 	pagevec_init(&vm->free_pages, false); | ||||
| 	pagevec_init(&vm->free_pages); | ||||
| } | ||||
| 
 | ||||
| static void i915_address_space_fini(struct i915_address_space *vm) | ||||
|  |  | |||
|  | @ -554,7 +554,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) | |||
| 	} | ||||
| 	mutex_unlock(&obj->mm.lock); | ||||
| 
 | ||||
| 	release_pages(pvec, pinned, 0); | ||||
| 	release_pages(pvec, pinned); | ||||
| 	kvfree(pvec); | ||||
| 
 | ||||
| 	i915_gem_object_put(obj); | ||||
|  | @ -668,7 +668,7 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) | |||
| 		__i915_gem_userptr_set_active(obj, true); | ||||
| 
 | ||||
| 	if (IS_ERR(pages)) | ||||
| 		release_pages(pvec, pinned, 0); | ||||
| 		release_pages(pvec, pinned); | ||||
| 	kvfree(pvec); | ||||
| 
 | ||||
| 	return pages; | ||||
|  |  | |||
|  | @ -597,7 +597,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm) | |||
| 	kfree(ttm->sg); | ||||
| 
 | ||||
| release_pages: | ||||
| 	release_pages(ttm->pages, pinned, 0); | ||||
| 	release_pages(ttm->pages, pinned); | ||||
| 	return r; | ||||
| } | ||||
| 
 | ||||
|  |  | |||
|  | @ -1667,8 +1667,9 @@ int qib_setup_eagerbufs(struct qib_ctxtdata *rcd) | |||
| 	} | ||||
| 	if (!rcd->rcvegrbuf_phys) { | ||||
| 		rcd->rcvegrbuf_phys = | ||||
| 			kmalloc_node(chunk * sizeof(rcd->rcvegrbuf_phys[0]), | ||||
| 				GFP_KERNEL, rcd->node_id); | ||||
| 			kmalloc_array_node(chunk, | ||||
| 					   sizeof(rcd->rcvegrbuf_phys[0]), | ||||
| 					   GFP_KERNEL, rcd->node_id); | ||||
| 		if (!rcd->rcvegrbuf_phys) | ||||
| 			goto bail_rcvegrbuf; | ||||
| 	} | ||||
|  |  | |||
|  | @ -238,7 +238,7 @@ int rvt_driver_qp_init(struct rvt_dev_info *rdi) | |||
| 	rdi->qp_dev->qp_table_size = rdi->dparms.qp_table_size; | ||||
| 	rdi->qp_dev->qp_table_bits = ilog2(rdi->dparms.qp_table_size); | ||||
| 	rdi->qp_dev->qp_table = | ||||
| 		kmalloc_node(rdi->qp_dev->qp_table_size * | ||||
| 		kmalloc_array_node(rdi->qp_dev->qp_table_size, | ||||
| 			     sizeof(*rdi->qp_dev->qp_table), | ||||
| 			     GFP_KERNEL, rdi->dparms.node); | ||||
| 	if (!rdi->qp_dev->qp_table) | ||||
|  |  | |||
|  | @ -15,7 +15,6 @@ | |||
| #include <linux/errno.h> | ||||
| #include <linux/err.h> | ||||
| #include <linux/kernel.h> | ||||
| #include <linux/kmemcheck.h> | ||||
| #include <linux/ctype.h> | ||||
| #include <linux/delay.h> | ||||
| #include <linux/idr.h> | ||||
|  | @ -904,7 +903,6 @@ struct c2port_device *c2port_device_register(char *name, | |||
| 		return ERR_PTR(-EINVAL); | ||||
| 
 | ||||
| 	c2dev = kmalloc(sizeof(struct c2port_device), GFP_KERNEL); | ||||
| 	kmemcheck_annotate_bitfield(c2dev, flags); | ||||
| 	if (unlikely(!c2dev)) | ||||
| 		return ERR_PTR(-ENOMEM); | ||||
| 
 | ||||
|  |  | |||
|  | @ -517,7 +517,7 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num) | |||
| 
 | ||||
| 
 | ||||
| 		rc = ena_alloc_rx_page(rx_ring, rx_info, | ||||
| 				       __GFP_COLD | GFP_ATOMIC | __GFP_COMP); | ||||
| 				       GFP_ATOMIC | __GFP_COMP); | ||||
| 		if (unlikely(rc < 0)) { | ||||
| 			netif_warn(rx_ring->adapter, rx_err, rx_ring->netdev, | ||||
| 				   "failed to alloc buffer for rx queue %d\n", | ||||
|  |  | |||
|  | @ -295,7 +295,7 @@ static int xgbe_alloc_pages(struct xgbe_prv_data *pdata, | |||
| 	order = alloc_order; | ||||
| 
 | ||||
| 	/* Try to obtain pages, decreasing order if necessary */ | ||||
| 	gfp = GFP_ATOMIC | __GFP_COLD | __GFP_COMP | __GFP_NOWARN; | ||||
| 	gfp = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN; | ||||
| 	while (order >= 0) { | ||||
| 		pages = alloc_pages_node(node, gfp, order); | ||||
| 		if (pages) | ||||
|  |  | |||
|  | @ -304,8 +304,7 @@ int aq_ring_rx_fill(struct aq_ring_s *self) | |||
| 		buff->flags = 0U; | ||||
| 		buff->len = AQ_CFG_RX_FRAME_MAX; | ||||
| 
 | ||||
| 		buff->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | | ||||
| 					 __GFP_COMP, pages_order); | ||||
| 		buff->page = alloc_pages(GFP_ATOMIC | __GFP_COMP, pages_order); | ||||
| 		if (!buff->page) { | ||||
| 			err = -ENOMEM; | ||||
| 			goto err_exit; | ||||
|  |  | |||
|  | @ -198,7 +198,7 @@ static inline void | |||
| 	struct sk_buff *skb; | ||||
| 	struct octeon_skb_page_info *skb_pg_info; | ||||
| 
 | ||||
| 	page = alloc_page(GFP_ATOMIC | __GFP_COLD); | ||||
| 	page = alloc_page(GFP_ATOMIC); | ||||
| 	if (unlikely(!page)) | ||||
| 		return NULL; | ||||
| 
 | ||||
|  |  | |||
|  | @ -193,7 +193,7 @@ static int mlx4_en_fill_rx_buffers(struct mlx4_en_priv *priv) | |||
| 
 | ||||
| 			if (mlx4_en_prepare_rx_desc(priv, ring, | ||||
| 						    ring->actual_size, | ||||
| 						    GFP_KERNEL | __GFP_COLD)) { | ||||
| 						    GFP_KERNEL)) { | ||||
| 				if (ring->actual_size < MLX4_EN_MIN_RX_SIZE) { | ||||
| 					en_err(priv, "Failed to allocate enough rx buffers\n"); | ||||
| 					return -ENOMEM; | ||||
|  | @ -551,8 +551,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv, | |||
| 	do { | ||||
| 		if (mlx4_en_prepare_rx_desc(priv, ring, | ||||
| 					    ring->prod & ring->size_mask, | ||||
| 					    GFP_ATOMIC | __GFP_COLD | | ||||
| 					    __GFP_MEMALLOC)) | ||||
| 					    GFP_ATOMIC | __GFP_MEMALLOC)) | ||||
| 			break; | ||||
| 		ring->prod++; | ||||
| 	} while (likely(--missing)); | ||||
|  |  | |||
|  | @ -1185,7 +1185,7 @@ static void *nfp_net_rx_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr) | |||
| 	} else { | ||||
| 		struct page *page; | ||||
| 
 | ||||
| 		page = alloc_page(GFP_KERNEL | __GFP_COLD); | ||||
| 		page = alloc_page(GFP_KERNEL); | ||||
| 		frag = page ? page_address(page) : NULL; | ||||
| 	} | ||||
| 	if (!frag) { | ||||
|  |  | |||
|  | @ -1092,8 +1092,7 @@ static int ql_get_next_chunk(struct ql_adapter *qdev, struct rx_ring *rx_ring, | |||
| { | ||||
| 	if (!rx_ring->pg_chunk.page) { | ||||
| 		u64 map; | ||||
| 		rx_ring->pg_chunk.page = alloc_pages(__GFP_COLD | __GFP_COMP | | ||||
| 						GFP_ATOMIC, | ||||
| 		rx_ring->pg_chunk.page = alloc_pages(__GFP_COMP | GFP_ATOMIC, | ||||
| 						qdev->lbq_buf_order); | ||||
| 		if (unlikely(!rx_ring->pg_chunk.page)) { | ||||
| 			netif_err(qdev, drv, qdev->ndev, | ||||
|  |  | |||
|  | @ -163,7 +163,7 @@ static int ef4_init_rx_buffers(struct ef4_rx_queue *rx_queue, bool atomic) | |||
| 	do { | ||||
| 		page = ef4_reuse_page(rx_queue); | ||||
| 		if (page == NULL) { | ||||
| 			page = alloc_pages(__GFP_COLD | __GFP_COMP | | ||||
| 			page = alloc_pages(__GFP_COMP | | ||||
| 					   (atomic ? GFP_ATOMIC : GFP_KERNEL), | ||||
| 					   efx->rx_buffer_order); | ||||
| 			if (unlikely(page == NULL)) | ||||
|  |  | |||
|  | @ -163,7 +163,7 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic) | |||
| 	do { | ||||
| 		page = efx_reuse_page(rx_queue); | ||||
| 		if (page == NULL) { | ||||
| 			page = alloc_pages(__GFP_COLD | __GFP_COMP | | ||||
| 			page = alloc_pages(__GFP_COMP | | ||||
| 					   (atomic ? GFP_ATOMIC : GFP_KERNEL), | ||||
| 					   efx->rx_buffer_order); | ||||
| 			if (unlikely(page == NULL)) | ||||
|  |  | |||
|  | @ -335,7 +335,7 @@ static int xlgmac_alloc_pages(struct xlgmac_pdata *pdata, | |||
| 	dma_addr_t pages_dma; | ||||
| 
 | ||||
| 	/* Try to obtain pages, decreasing order if necessary */ | ||||
| 	gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN; | ||||
| 	gfp |= __GFP_COMP | __GFP_NOWARN; | ||||
| 	while (order >= 0) { | ||||
| 		pages = alloc_pages(gfp, order); | ||||
| 		if (pages) | ||||
|  |  | |||
|  | @ -906,7 +906,7 @@ static int netcp_allocate_rx_buf(struct netcp_intf *netcp, int fdq) | |||
| 		sw_data[0] = (u32)bufptr; | ||||
| 	} else { | ||||
| 		/* Allocate a secondary receive queue entry */ | ||||
| 		page = alloc_page(GFP_ATOMIC | GFP_DMA | __GFP_COLD); | ||||
| 		page = alloc_page(GFP_ATOMIC | GFP_DMA); | ||||
| 		if (unlikely(!page)) { | ||||
| 			dev_warn_ratelimited(netcp->ndev_dev, "Secondary page alloc failed\n"); | ||||
| 			goto fail; | ||||
|  |  | |||
|  | @ -1030,7 +1030,6 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq, | |||
| 	int err; | ||||
| 	bool oom; | ||||
| 
 | ||||
| 	gfp |= __GFP_COLD; | ||||
| 	do { | ||||
| 		if (vi->mergeable_rx_bufs) | ||||
| 			err = add_recvbuf_mergeable(vi, rq, gfp); | ||||
|  |  | |||
|  | @ -23,6 +23,7 @@ | |||
| #include <linux/ndctl.h> | ||||
| #include <linux/fs.h> | ||||
| #include <linux/nd.h> | ||||
| #include <linux/backing-dev.h> | ||||
| #include "btt.h" | ||||
| #include "nd.h" | ||||
| 
 | ||||
|  | @ -1402,6 +1403,8 @@ static int btt_blk_init(struct btt *btt) | |||
| 	btt->btt_disk->private_data = btt; | ||||
| 	btt->btt_disk->queue = btt->btt_queue; | ||||
| 	btt->btt_disk->flags = GENHD_FL_EXT_DEVT; | ||||
| 	btt->btt_disk->queue->backing_dev_info->capabilities |= | ||||
| 			BDI_CAP_SYNCHRONOUS_IO; | ||||
| 
 | ||||
| 	blk_queue_make_request(btt->btt_queue, btt_make_request); | ||||
| 	blk_queue_logical_block_size(btt->btt_queue, btt->sector_size); | ||||
|  |  | |||
|  | @ -31,6 +31,7 @@ | |||
| #include <linux/uio.h> | ||||
| #include <linux/dax.h> | ||||
| #include <linux/nd.h> | ||||
| #include <linux/backing-dev.h> | ||||
| #include "pmem.h" | ||||
| #include "pfn.h" | ||||
| #include "nd.h" | ||||
|  | @ -394,6 +395,7 @@ static int pmem_attach_disk(struct device *dev, | |||
| 	disk->fops		= &pmem_fops; | ||||
| 	disk->queue		= q; | ||||
| 	disk->flags		= GENHD_FL_EXT_DEVT; | ||||
| 	disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO; | ||||
| 	nvdimm_namespace_disk_name(ndns, disk->disk_name); | ||||
| 	set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset) | ||||
| 			/ 512); | ||||
|  |  | |||
|  | @ -1152,7 +1152,7 @@ static int mdc_read_page_remote(void *data, struct page *page0) | |||
| 	} | ||||
| 
 | ||||
| 	for (npages = 1; npages < max_pages; npages++) { | ||||
| 		page = page_cache_alloc_cold(inode->i_mapping); | ||||
| 		page = page_cache_alloc(inode->i_mapping); | ||||
| 		if (!page) | ||||
| 			break; | ||||
| 		page_pool[npages] = page; | ||||
|  |  | |||
|  | @ -308,7 +308,7 @@ static void afs_kill_pages(struct afs_vnode *vnode, bool error, | |||
| 	_enter("{%x:%u},%lx-%lx", | ||||
| 	       vnode->fid.vid, vnode->fid.vnode, first, last); | ||||
| 
 | ||||
| 	pagevec_init(&pv, 0); | ||||
| 	pagevec_init(&pv); | ||||
| 
 | ||||
| 	do { | ||||
| 		_debug("kill %lx-%lx", first, last); | ||||
|  | @ -497,20 +497,13 @@ static int afs_writepages_region(struct address_space *mapping, | |||
| 	_enter(",,%lx,%lx,", index, end); | ||||
| 
 | ||||
| 	do { | ||||
| 		n = find_get_pages_tag(mapping, &index, PAGECACHE_TAG_DIRTY, | ||||
| 				       1, &page); | ||||
| 		n = find_get_pages_range_tag(mapping, &index, end, | ||||
| 					PAGECACHE_TAG_DIRTY, 1, &page); | ||||
| 		if (!n) | ||||
| 			break; | ||||
| 
 | ||||
| 		_debug("wback %lx", page->index); | ||||
| 
 | ||||
| 		if (page->index > end) { | ||||
| 			*_next = index; | ||||
| 			put_page(page); | ||||
| 			_leave(" = 0 [%lx]", *_next); | ||||
| 			return 0; | ||||
| 		} | ||||
| 
 | ||||
| 		/* at this point we hold neither mapping->tree_lock nor lock on
 | ||||
| 		 * the page itself: the page may be truncated or invalidated | ||||
| 		 * (changing page->mapping to NULL), or even swizzled back from | ||||
|  | @ -609,7 +602,7 @@ void afs_pages_written_back(struct afs_vnode *vnode, struct afs_call *call) | |||
| 
 | ||||
| 	ASSERT(wb != NULL); | ||||
| 
 | ||||
| 	pagevec_init(&pv, 0); | ||||
| 	pagevec_init(&pv); | ||||
| 
 | ||||
| 	do { | ||||
| 		_debug("done %lx-%lx", first, last); | ||||
|  |  | |||
|  | @ -3797,7 +3797,7 @@ int btree_write_cache_pages(struct address_space *mapping, | |||
| 	int scanned = 0; | ||||
| 	int tag; | ||||
| 
 | ||||
| 	pagevec_init(&pvec, 0); | ||||
| 	pagevec_init(&pvec); | ||||
| 	if (wbc->range_cyclic) { | ||||
| 		index = mapping->writeback_index; /* Start from prev offset */ | ||||
| 		end = -1; | ||||
|  | @ -3814,8 +3814,8 @@ int btree_write_cache_pages(struct address_space *mapping, | |||
| 	if (wbc->sync_mode == WB_SYNC_ALL) | ||||
| 		tag_pages_for_writeback(mapping, index, end); | ||||
| 	while (!done && !nr_to_write_done && (index <= end) && | ||||
| 	       (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, | ||||
| 			min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { | ||||
| 	       (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, | ||||
| 			tag))) { | ||||
| 		unsigned i; | ||||
| 
 | ||||
| 		scanned = 1; | ||||
|  | @ -3825,11 +3825,6 @@ int btree_write_cache_pages(struct address_space *mapping, | |||
| 			if (!PagePrivate(page)) | ||||
| 				continue; | ||||
| 
 | ||||
| 			if (!wbc->range_cyclic && page->index > end) { | ||||
| 				done = 1; | ||||
| 				break; | ||||
| 			} | ||||
| 
 | ||||
| 			spin_lock(&mapping->private_lock); | ||||
| 			if (!PagePrivate(page)) { | ||||
| 				spin_unlock(&mapping->private_lock); | ||||
|  | @ -3941,7 +3936,7 @@ static int extent_write_cache_pages(struct address_space *mapping, | |||
| 	if (!igrab(inode)) | ||||
| 		return 0; | ||||
| 
 | ||||
| 	pagevec_init(&pvec, 0); | ||||
| 	pagevec_init(&pvec); | ||||
| 	if (wbc->range_cyclic) { | ||||
| 		index = mapping->writeback_index; /* Start from prev offset */ | ||||
| 		end = -1; | ||||
|  | @ -3961,8 +3956,8 @@ static int extent_write_cache_pages(struct address_space *mapping, | |||
| 		tag_pages_for_writeback(mapping, index, end); | ||||
| 	done_index = index; | ||||
| 	while (!done && !nr_to_write_done && (index <= end) && | ||||
| 	       (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, | ||||
| 			min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { | ||||
| 			(nr_pages = pagevec_lookup_range_tag(&pvec, mapping, | ||||
| 						&index, end, tag))) { | ||||
| 		unsigned i; | ||||
| 
 | ||||
| 		scanned = 1; | ||||
|  | @ -3987,12 +3982,6 @@ static int extent_write_cache_pages(struct address_space *mapping, | |||
| 				continue; | ||||
| 			} | ||||
| 
 | ||||
| 			if (!wbc->range_cyclic && page->index > end) { | ||||
| 				done = 1; | ||||
| 				unlock_page(page); | ||||
| 				continue; | ||||
| 			} | ||||
| 
 | ||||
| 			if (wbc->sync_mode != WB_SYNC_NONE) { | ||||
| 				if (PageWriteback(page)) | ||||
| 					flush_fn(data); | ||||
|  |  | |||
|  | @ -1592,7 +1592,7 @@ void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len) | |||
| 	struct buffer_head *head; | ||||
| 
 | ||||
| 	end = (block + len - 1) >> (PAGE_SHIFT - bd_inode->i_blkbits); | ||||
| 	pagevec_init(&pvec, 0); | ||||
| 	pagevec_init(&pvec); | ||||
| 	while (pagevec_lookup_range(&pvec, bd_mapping, &index, end)) { | ||||
| 		count = pagevec_count(&pvec); | ||||
| 		for (i = 0; i < count; i++) { | ||||
|  | @ -3514,7 +3514,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length, | |||
| 	if (length <= 0) | ||||
| 		return -ENOENT; | ||||
| 
 | ||||
| 	pagevec_init(&pvec, 0); | ||||
| 	pagevec_init(&pvec); | ||||
| 
 | ||||
| 	do { | ||||
| 		unsigned nr_pages, i; | ||||
|  |  | |||
Some files were not shown because too many files have changed in this diff.