mirror of
				https://github.com/torvalds/linux.git
				synced 2025-11-04 10:40:15 +02:00 
			
		
		
		
	
	
		
			1483 commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| 
							 | 
						cd4453c5e9 | 
							
							
								
								tracing: Silence warning when chunk allocation fails in trace_pid_write
							
							
							
							
							
							
							
							Syzkaller trigger a fault injection warning:
WARNING: CPU: 1 PID: 12326 at tracepoint_add_func+0xbfc/0xeb0
Modules linked in:
CPU: 1 UID: 0 PID: 12326 Comm: syz.6.10325 Tainted: G U 6.14.0-rc5-syzkaller #0
Tainted: [U]=USER
Hardware name: Google Compute Engine/Google Compute Engine
RIP: 0010:tracepoint_add_func+0xbfc/0xeb0 kernel/tracepoint.c:294
Code: 09 fe ff 90 0f 0b 90 0f b6 74 24 43 31 ff 41 bc ea ff ff ff
RSP: 0018:ffffc9000414fb48 EFLAGS: 00010283
RAX: 00000000000012a1 RBX: ffffffff8e240ae0 RCX: ffffc90014b78000
RDX: 0000000000080000 RSI: ffffffff81bbd78b RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffffffffef
R13: 0000000000000000 R14: dffffc0000000000 R15: ffffffff81c264f0
FS:  00007f27217f66c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e80dff8 CR3: 00000000268f8000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tracepoint_probe_register_prio+0xc0/0x110 kernel/tracepoint.c:464
 register_trace_prio_sched_switch include/trace/events/sched.h:222 [inline]
 register_pid_events kernel/trace/trace_events.c:2354 [inline]
 event_pid_write.isra.0+0x439/0x7a0 kernel/trace/trace_events.c:2425
 vfs_write+0x24c/0x1150 fs/read_write.c:677
 ksys_write+0x12b/0x250 fs/read_write.c:731
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
We can reproduce the warning by following the steps below:
1. echo 8 >> set_event_notrace_pid. Let tr->filtered_pids owns one pid
   and register sched_switch tracepoint.
2. echo ' ' >> set_event_pid, and perform fault injection during chunk
   allocation of trace_pid_list_alloc. Let pid_list with no pid and
assign to tr->filtered_pids.
3. echo ' ' >> set_event_pid. Let pid_list is NULL and assign to
   tr->filtered_pids.
4. echo 9 >> set_event_pid, will trigger the double register
   sched_switch tracepoint warning.
The reason is that syzkaller injects a fault into the chunk allocation
in trace_pid_list_alloc, causing a failure in trace_pid_list_set, which
may trigger double register of the same tracepoint. This only occurs
when the system is about to crash, but to suppress this warning, let's
add failure handling logic to trace_pid_list_set.
Link: https://lore.kernel.org/20250908024658.2390398-1-pulehui@huaweicloud.com
Fixes: 
							
						 | 
						
							||
| 
							 | 
						3d62ab32df | 
							
							
								
								tracing: Fix tracing_marker may trigger page fault during preempt_disable
							
							
							
							
							
							
							
							Both tracing_mark_write and tracing_mark_raw_write call
__copy_from_user_inatomic during preempt_disable. But in some case,
__copy_from_user_inatomic may trigger page fault, and will call schedule()
subtly. And if a task is migrated to other cpu, the following warning will
be trigger:
        if (RB_WARN_ON(cpu_buffer,
                       !local_read(&cpu_buffer->committing)))
An example can illustrate this issue:
process flow						CPU
---------------------------------------------------------------------
tracing_mark_raw_write():				cpu:0
   ...
   ring_buffer_lock_reserve():				cpu:0
      ...
      cpu = raw_smp_processor_id()			cpu:0
      cpu_buffer = buffer->buffers[cpu]			cpu:0
      ...
   ...
   __copy_from_user_inatomic():				cpu:0
      ...
      # page fault
      do_mem_abort():					cpu:0
         ...
         # Call schedule
         schedule()					cpu:0
	 ...
   # the task schedule to cpu1
   __buffer_unlock_commit():				cpu:1
      ...
      ring_buffer_unlock_commit():			cpu:1
	 ...
	 cpu = raw_smp_processor_id()			cpu:1
	 cpu_buffer = buffer->buffers[cpu]		cpu:1
As shown above, the process will acquire cpuid twice and the return values
are not the same.
To fix this problem using copy_from_user_nofault instead of
__copy_from_user_inatomic, as the former performs 'access_ok' before
copying.
Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com
Fixes: 
							
						 | 
						
							||
| 
							 | 
						4013aef2ce | 
							
							
								
								ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
							
							
							
							
							
							
							
							When calling ftrace_dump_one() concurrently with reading trace_pipe,
a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race
condition.
The issue occurs because:
CPU0 (ftrace_dump)                              CPU1 (reader)
echo z > /proc/sysrq-trigger
!trace_empty(&iter)
trace_iterator_reset(&iter) <- len = size = 0
                                                cat /sys/kernel/tracing/trace_pipe
trace_find_next_entry_inc(&iter)
  __find_next_entry
    ring_buffer_empty_cpu <- all empty
  return NULL
trace_printk_seq(&iter.seq)
  WARN_ON_ONCE(s->seq.len >= s->seq.size)
In the context between trace_empty() and trace_find_next_entry_inc()
during ftrace_dump, the ring buffer data was consumed by other readers.
This caused trace_find_next_entry_inc to return NULL, failing to populate
`iter.seq`. At this point, due to the prior trace_iterator_reset, both
`iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal,
the WARN_ON_ONCE condition is triggered.
Move the trace_printk_seq() into the if block that checks to make sure the
return value of trace_find_next_entry_inc() is non-NULL in
ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before
subsequent operations.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@elte.hu>
Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com
Fixes: 
							
						 | 
						
							||
| 
							 | 
						6a909ea83f | 
							
							
								
								tracing: Limit access to parser->buffer when trace_get_user failed
							
							
							
							
							
							
							
							When the length of the string written to set_ftrace_filter exceeds
FTRACE_BUFF_MAX, the following KASAN alarm will be triggered:
BUG: KASAN: slab-out-of-bounds in strsep+0x18c/0x1b0
Read of size 1 at addr ffff0000d00bd5ba by task ash/165
CPU: 1 UID: 0 PID: 165 Comm: ash Not tainted 6.16.0-g6bcdbd62bd56-dirty
Hardware name: linux,dummy-virt (DT)
Call trace:
 show_stack+0x34/0x50 (C)
 dump_stack_lvl+0xa0/0x158
 print_address_description.constprop.0+0x88/0x398
 print_report+0xb0/0x280
 kasan_report+0xa4/0xf0
 __asan_report_load1_noabort+0x20/0x30
 strsep+0x18c/0x1b0
 ftrace_process_regex.isra.0+0x100/0x2d8
 ftrace_regex_release+0x484/0x618
 __fput+0x364/0xa58
 ____fput+0x28/0x40
 task_work_run+0x154/0x278
 do_notify_resume+0x1f0/0x220
 el0_svc+0xec/0xf0
 el0t_64_sync_handler+0xa0/0xe8
 el0t_64_sync+0x1ac/0x1b0
The reason is that trace_get_user will fail when processing a string
longer than FTRACE_BUFF_MAX, but not set the end of parser->buffer to 0.
Then an OOB access will be triggered in ftrace_regex_release->
ftrace_process_regex->strsep->strpbrk. We can solve this problem by
limiting access to parser->buffer when trace_get_user failed.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250813040232.1344527-1-pulehui@huaweicloud.com
Fixes: 
							
						 | 
						
							||
| 
							 | 
						3c4a063b1f | 
							
							
								
								tracing cleanups for v6.17:
							
							
							
							
							
							
							
							- Remove unneeded goto out statements
 
   Over time, the logic was restructured but left a "goto out" where the
   out label simply did a "return ret;". Instead of jumping to this out
   label, simply return immediately and remove the out label.
 
 - Add guard(ring_buffer_nest)
 
   Some calls to the tracing ring buffer can happen when the ring buffer is
   already being written to at the same context (for example, a
   trace_printk() in between a ring_buffer_lock_reserve() and a
   ring_buffer_unlock_commit()).
 
   In order to not trigger the recursion detection, these functions use
   ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard() for
   these functions so that their use cases can be simplified and not need to
   use goto for the release.
 
 - Clean up the tracing code with guard() and __free() logic
 
   There were several locations that were prime candidates for using guard()
   and __free() helpers. Switch them over to use them.
 
 - Fix output of function argument traces for unsigned int values
 
   The function tracer with "func-args" option set will record up to 6 argument
   registers and then use BTF to format them for human consumption when the
   trace file is read. There's several arguments that are "unsigned long" and
   even "unsigned int" that are either and address or a mask. It is easier to
   understand if they were printed using hexadecimal instead of decimal.
   The old method just printed all non-pointer values as signed integers,
   which made it even worse for unsigned integers.
 
   For instance, instead of:
 
     __local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs
 
   Show:
 
    __local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaI9pOBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkhoAQD+moa8M+WWUS9T9utwREytolfyNKEO
 dW0dPVzquX3L6gEAnc7zNla4QZJsdU1bHyhpDTn/Zhu11aMrzoxcBcdrSwI=
 =x79z
 -----END PGP SIGNATURE-----
Merge tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
 - Remove unneeded goto out statements
   Over time, the logic was restructured but left a "goto out" where the
   out label simply did a "return ret;". Instead of jumping to this out
   label, simply return immediately and remove the out label.
 - Add guard(ring_buffer_nest)
   Some calls to the tracing ring buffer can happen when the ring buffer
   is already being written to at the same context (for example, a
   trace_printk() in between a ring_buffer_lock_reserve() and a
   ring_buffer_unlock_commit()).
   In order to not trigger the recursion detection, these functions use
   ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard()
   for these functions so that their use cases can be simplified and not
   need to use goto for the release.
 - Clean up the tracing code with guard() and __free() logic
   There were several locations that were prime candidates for using
   guard() and __free() helpers. Switch them over to use them.
 - Fix output of function argument traces for unsigned int values
   The function tracer with "func-args" option set will record up to 6
   argument registers and then use BTF to format them for human
   consumption when the trace file is read. There are several arguments
   that are "unsigned long" and even "unsigned int" that are either and
   address or a mask. It is easier to understand if they were printed
   using hexadecimal instead of decimal. The old method just printed all
   non-pointer values as signed integers, which made it even worse for
   unsigned integers.
   For instance, instead of:
     __local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs
   show:
     __local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs"
* tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have unsigned int function args displayed as hexadecimal
  ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)
  tracing: Use __free(kfree) in trace.c to remove gotos
  tracing: Add guard() around locks and mutexes in trace.c
  tracing: Add guard(ring_buffer_nest)
  tracing: Remove unneeded goto out logic
							
						 | 
						
							||
| 
							 | 
						8877fcb70f | 
							
							
								
								This is a small set of changes for modules, primarily to extend module users
							
							
							
							
							
							
							
							to use the module data structures in combination with the already no-op stub module functions, even when support for modules is disabled in the kernel configuration. This change follows the kernel's coding style for conditional compilation and allows kunit code to drop all CONFIG_MODULES ifdefs, which is also part of the changes. This should allow others part of the kernel to do the same cleanup. Note that this had a conflict with sysctl changes [1] but should be fixed now as I rebased on top. The remaining changes include a fix for module name length handling which could potentially lead to the removal of an incorrect module, and various cleanups. The module name fix and related cleanup has been in linux-next since Thursday (July 31) while the rest of the changes for a bit more than 3 weeks. Note that this currently has conflicts in next with kbuild's tree [2]. Link: https://lore.kernel.org/all/20250714175916.774e6d79@canb.auug.org.au/ [1] Link: https://lore.kernel.org/all/20250801132941.6815d93d@canb.auug.org.au/ [2] -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE73Ua4R8Pc+G5xjxTQJ6jxB8ZUfsFAmiPQgkACgkQQJ6jxB8Z UfuPTA//XrRguJFBhQh6cUWqVleTNQJuhjiPsOSO5S52aVET4wsrnRNeM2eM5oqw 0+6ELvhIJINQ1LjpOP8D67d8P5Ds1/qM1pbQIkQsoKiEj6E7Q4dXH5N0uyf/BzO3 HaosLG9cpqcomlSorYEiYoPjqy9EChQzsi+YAYWAB+fW6bvU/AdUHTRH88m3ppBJ Y22BTTPOKKyj5/QgfY+kwH8TTnrzCzY8aoOqW7uimLI5h4c9dFQ2PigRJnoMfDG1 11w5VshOTzZJvNFrUk5GVSirwlxdJDbW6dKfG0DD5+eNWK5dfIEc+/EcuhaGoPvO Euwv8VQubdxHTAG6kzHI0MtxAVQUM1gyz8zHiu18eW++GTtnTFs6m8E6H9AC176G nDkUh3qSxJN2HHgxtS9VUExEEZpYqtWeB9Zts8K3oSWvTaQenHWpVHPADkxzS4JU Jvkjq8SiKo+RqHxaOKfyf1RfOtYe5tjMCLrP7zX39d1+cwGxuc6mip/omY9HFDgn op132fYdt24JSHoioJDzRz9mTfvj3nICEmgX4D4WDQx5lP27CUcLugPnBNHPp0fu 5hL+ajy8M8nq4zm/42Y+F7VS74TIA6mSnJKs9dMCknUWueD6HrDEU9xHi1YMpUMZ cBUSpU+P94dCIScwEzkp926vDnHyxCHLbpF1Jsq5qNNdj7AelHk= =4bGB -----END PGP SIGNATURE----- Merge tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull module updates from Daniel Gomez: "This is a small set of changes for modules, primarily to extend module users to use the module data structures in combination with the already no-op stub module functions, even when support for modules is disabled in the kernel configuration. This change follows the kernel's coding style for conditional compilation and allows kunit code to drop all CONFIG_MODULES ifdefs, which is also part of the changes. This should allow others part of the kernel to do the same cleanup. The remaining changes include a fix for module name length handling which could potentially lead to the removal of an incorrect module, and various cleanups" * tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN module: Restore the moduleparam prefix length check module: Remove unnecessary +1 from last_unloaded_module::name size module: Prevent silent truncation of module name in delete_module(2) kunit: test: Drop CONFIG_MODULE ifdeffery module: make structure definitions always visible module: move 'struct module_use' to internal.h  | 
						
							||
| 
							 | 
						12d5189615 | 
							
							
								
								tracing: Use __free(kfree) in trace.c to remove gotos
							
							
							
							
							
							
							
							There's a couple of locations that have goto out in trace.c for the only purpose of freeing a variable that was allocated. These can be replaced with __free(kfree). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203858.040892777@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						debe57fbe1 | 
							
							
								
								tracing: Add guard() around locks and mutexes in trace.c
							
							
							
							
							
							
							
							There's several locations in trace.c that can be simplified by using guards around raw_spin_lock_irqsave, mutexes and preempt disabling. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.879085376@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						788fa4b47c | 
							
							
								
								tracing: Add guard(ring_buffer_nest)
							
							
							
							
							
							
							
							Some calls to the tracing ring buffer can happen when the ring buffer is already being written to by the same context (for example, a trace_printk() in between a ring_buffer_lock_reserve() and a ring_buffer_unlock_commit()). In order to not trigger the recursion detection, these functions use ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard() for these functions so that their use cases can be simplified and not need to use goto for the release. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.710501021@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						c89504a703 | 
							
							
								
								tracing: Remove unneeded goto out logic
							
							
							
							
							
							
							
							Several places in the trace.c file there's a goto out where the out is simply a return. There's no reason to jump to the out label if it's not doing any more logic but simply returning from the function. Replace the goto outs with a return and remove the out labels. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.538726745@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						d6f38c1239 | 
							
							
								
								tracing changes for 6.17
							
							
							
							
							
							
							
							- Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing
 
   When tracefs was first introduced back in 2014, the directory
   /sys/kernel/tracing was added and is the designated location to mount
   tracefs. To keep backward compatibility, tracefs was auto-mounted in
   /sys/kernel/debug/tracing as well.
 
   All distros now mount tracefs on /sys/kernel/tracing. Having it seen in two
   different locations has lead to various issues and inconsistencies.
 
   The VFS folks have to also maintain debugfs_create_automount() for this
   single user.
 
   It's been over 10 years. Tooling and scripts should start replacing the
   debugfs location with the tracefs one. The reason tracefs was created in the
   first place was to allow access to the tracing facilities without the need
   to configure debugfs into the kernel. Using tracefs should now be more
   robust.
 
   A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED
   which is default y, so that the kernel is still built with the automount.
   This config allows those that want to remove the automount from debugfs to
   do so.
 
   When tracefs is accessed from /sys/kernel/debug/tracing, the following
   printk is triggerd:
 
    pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n");
 
   This gives users another 5 years to fix their scripts.
 
 - Use queue_rcu_work() instead of call_rcu() for freeing event filters
 
   The number of filters to be free can be many depending on the number of
   events within an event system. Freeing them from softirq context can
   potentially cause undesired latency. Use the RCU workqueue to free them
   instead.
 
 - Remove pointless memory barriers in latency code
 
   Memory barriers were added to some of the latency code a long time ago with
   the idea of "making them visible", but that's not what memory barriers are
   for. They are to synchronize access between different variables. There was
   no synchronization here making them pointless.
 
 - Remove "__attribute__()" from the type field of event format
 
   When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y and
   PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with the
   following:
 
     field:const char * filename;      offset:24;      size:8; signed:0;
 
   Turns into:
 
     field:const char __attribute__((btf_type_tag("user"))) * filename;      offset:24;      size:8; signed:0;
 
   This confuses parsers. Add code to strip these tags from the strings.
 
 - Add eprobe config option CONFIG_EPROBE_EVENTS
 
   Eprobes were added back in 5.15 but were only enabled when another probe was
   enabled (kprobe, fprobe, uprobe, etc). The eprobes had no config option
   of their own. Add one as they should be a separate entity.
 
   It's default y to keep with the old kernels but still has dependencies on
   TRACING and HAVE_REGS_AND_STACK_ACCESS_API.
 
 - Add eprobe documentation
 
   When eprobes were added back in 5.15 no documentation was added to describe
   them. This needs to be rectified.
 
 - Replace open coded cpumask_next_wrap() in move_to_next_cpu()
 
 - Have preemptirq_delay_run() use off-stack CPU mask
 
 - Remove obsolete comment about pelt_cfs event
 
   DECLARE_TRACE() appends "_tp" to trace events now, but the comment above
   pelt_cfs still mentioned appending it manually.
 
 - Remove EVENT_FILE_FL_SOFT_MODE flag
 
   The SOFT_MODE flag was required when the soft enabling and disabling of
   trace events was first introduced. But there was a bug with this approach
   as it only worked for a single instance. When multiple users required soft
   disabling and disabling the code was changed to have a ref count. The
   SOFT_MODE flag is now set iff the ref count is non zero. This is redundant
   and just reading the ref count is good enough.
 
 - Fix typo in comment
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIt5ZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qvriAPsEbOEgMrPF1Tdj1mHLVajYTxI8ft5J
 aX5bfM2cDDRVcgEA57JHOXp4d05dj555/hgAUuCWuFp/E0Anp45EnFTedgQ=
 =wKZW
 -----END PGP SIGNATURE-----
Merge tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
 - Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing
   When tracefs was first introduced back in 2014, the directory
   /sys/kernel/tracing was added and is the designated location to mount
   tracefs. To keep backward compatibility, tracefs was auto-mounted in
   /sys/kernel/debug/tracing as well.
   All distros now mount tracefs on /sys/kernel/tracing. Having it seen
   in two different locations has lead to various issues and
   inconsistencies.
   The VFS folks have to also maintain debugfs_create_automount() for
   this single user.
   It's been over 10 years. Tooling and scripts should start replacing
   the debugfs location with the tracefs one. The reason tracefs was
   created in the first place was to allow access to the tracing
   facilities without the need to configure debugfs into the kernel.
   Using tracefs should now be more robust.
   A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED which is
   default y, so that the kernel is still built with the automount. This
   config allows those that want to remove the automount from debugfs to
   do so.
   When tracefs is accessed from /sys/kernel/debug/tracing, the
   following printk is triggerd:
     pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n");
   This gives users another 5 years to fix their scripts.
 - Use queue_rcu_work() instead of call_rcu() for freeing event filters
   The number of filters to be free can be many depending on the number
   of events within an event system. Freeing them from softirq context
   can potentially cause undesired latency. Use the RCU workqueue to
   free them instead.
 - Remove pointless memory barriers in latency code
   Memory barriers were added to some of the latency code a long time
   ago with the idea of "making them visible", but that's not what
   memory barriers are for. They are to synchronize access between
   different variables. There was no synchronization here making them
   pointless.
 - Remove "__attribute__()" from the type field of event format
   When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y
   and PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with
   the following:
     field:const char * filename;      offset:24;      size:8; signed:0;
   Turns into:
     field:const char __attribute__((btf_type_tag("user"))) * filename;      offset:24;      size:8; signed:0;
   This confuses parsers. Add code to strip these tags from the strings.
 - Add eprobe config option CONFIG_EPROBE_EVENTS
   Eprobes were added back in 5.15 but were only enabled when another
   probe was enabled (kprobe, fprobe, uprobe, etc). The eprobes had no
   config option of their own. Add one as they should be a separate
   entity.
   It's default y to keep with the old kernels but still has
   dependencies on TRACING and HAVE_REGS_AND_STACK_ACCESS_API.
 - Add eprobe documentation
   When eprobes were added back in 5.15 no documentation was added to
   describe them. This needs to be rectified.
 - Replace open coded cpumask_next_wrap() in move_to_next_cpu()
 - Have preemptirq_delay_run() use off-stack CPU mask
 - Remove obsolete comment about pelt_cfs event
   DECLARE_TRACE() appends "_tp" to trace events now, but the comment
   above pelt_cfs still mentioned appending it manually.
 - Remove EVENT_FILE_FL_SOFT_MODE flag
   The SOFT_MODE flag was required when the soft enabling and disabling
   of trace events was first introduced. But there was a bug with this
   approach as it only worked for a single instance. When multiple users
   required soft disabling and disabling the code was changed to have a
   ref count. The SOFT_MODE flag is now set iff the ref count is non
   zero. This is redundant and just reading the ref count is good
   enough.
 - Fix typo in comment
* tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add documentation about eprobes
  tracing: Have eprobes have their own config option
  tracing: Remove "__attribute__()" from the type field of event format
  tracing: Deprecate auto-mounting tracefs in debugfs
  tracing: Fix comment in trace_module_remove_events()
  tracing: Remove EVENT_FILE_FL_SOFT_MODE flag
  tracing: Remove pointless memory barriers
  tracing/sched: Remove obsolete comment on suffixes
  kernel: trace: preemptirq_delay_test: use offstack cpu mask
  tracing: Use queue_rcu_work() to free filters
  tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu()
							
						 | 
						
							||
| 
							 | 
						
							
							
								
								
							
							
							
								
							
							
								a7c54b2b41
								 | 
						
							
							
								
								tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN
							
							
							
							
							
							
							
							Use the MODULE_NAME_LEN definition in module_exists() to obtain the maximum
size of a module name, instead of using MAX_PARAM_PREFIX_LEN. The values
are the same but MODULE_NAME_LEN is more appropriate in this context.
MAX_PARAM_PREFIX_LEN was added in commit 
							
						 | 
						
							||
| 
							 | 
						1a967e92bf | 
							
							
								
								tracing: Remove "__attribute__()" from the type field of event format
							
							
							
							
							
							
							
							With CONFIG_DEBUG_INFO_BTF=y and PAHOLE_HAS_BTF_TAG=y, `__user` is
converted to `__attribute__((btf_type_tag("user")))`. In this case,
some syscall events have it for __user data, like below;
/sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 720
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:int __syscall_nr; offset:8;       size:4; signed:1;
        field:int dfd;  offset:16;      size:8; signed:0;
        field:const char __attribute__((btf_type_tag("user"))) * filename;      offset:24;      size:8; signed:0;
        field:int flags;        offset:32;      size:8; signed:0;
        field:umode_t mode;     offset:40;      size:8; signed:0;
Then the trace event filter fails to set the string acceptable flag
(FILTER_PTR_STRING) to the field and rejects setting string filter;
 # echo 'filename.ustring ~ "*ftracetest-dir.wbx24v*"' \
    >> events/syscalls/sys_enter_openat/filter
 sh: write error: Invalid argument
 # cat error_log
 [  723.743637] event filter parse error: error: Expecting numeric field
   Command: filename.ustring ~ "*ftracetest-dir.wbx24v*"
Since this __attribute__ makes format parsing complicated and not
needed, remove the __attribute__(.*) from the type string.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/175376583493.1688759.12333973498014733551.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
							
						 | 
						
							||
| 
							 | 
						9ba817fb7c | 
							
							
								
								tracing: Deprecate auto-mounting tracefs in debugfs
							
							
							
							
							
							
							
							In January 2015, tracefs was created to allow access to the tracing infrastructure without needing to compile in debugfs. When tracefs is configured, the directory /sys/kernel/tracing will exist and tooling is expected to use that path to access the tracing infrastructure. To allow backward compatibility, when debugfs is mounted, it would automount tracefs in its "tracing" directory so that tooling that had hard coded /sys/kernel/debug/tracing would still work. It has been over 10 years since the new interface was introduced, and all tooling should now be using it. Start the process of deprecating the old path so that it doesn't need to be maintained anymore. A new config is added to allow distributions to disable automounting of tracefs on debugfs. If /sys/kernel/debug/tracing is accessed, a pr_warn() will trigger stating: "NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030" Expect to remove this feature in 5 years (2030). Cc: <linux-trace-users@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/20250722170806.40c068c6@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						c897c1e5b1 | 
							
							
								
								tracing: Remove pointless memory barriers
							
							
							
							
							
							
							
							Memory barriers are useful to ensure memory accesses from one CPU appear in the original order as seen by other CPUs. Some smp_rmb() and smp_wmb() are used, but they are not ordering multiple memory accesses. Remove them. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626151940.1756398-1-namcao@linutronix.de Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						119a5d5736 | 
							
							
								
								ring-buffer: Remove ring_buffer_read_prepare_sync()
							
							
							
							
							
							
							
							When the ring buffer was first introduced, reading the non-consuming "trace" file required disabling the writing of the ring buffer. To make sure the writing was fully disabled before iterating the buffer with a non-consuming read, it would set the disable flag of the buffer and then call an RCU synchronization to make sure all the buffers were synchronized. The function ring_buffer_read_start() originally would initialize the iterator and call an RCU synchronization, but this was for each individual per CPU buffer where this would get called many times on a machine with many CPUs before the trace file could be read. The commit  | 
						
							||
| 
							 | 
						8bf722c684 | 
							
							
								
								ring-buffer changes for v6.16:
							
							
							
							
							
							
							
							- Allow the persistent ring buffer to be memory mapped
 
   In the last merge window there was issues with the implementation of
   mapping the persistent ring buffer because it was assumed that the
   persistent memory was just physical memory without being part of the
   kernel virtual address space. But this was incorrect and the persistent
   ring buffer can be mapped the same way as the allocated ring buffer is
   mapped.
 
   The meta data for the persistent ring buffer is different than the normal
   ring buffer and the organization of mapping it to user space is a little
   different. Make the updates needed to the meta data to allow the
   persistent ring buffer to be mapped to user space.
 
 - Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock
 
   Mapping the ring buffer to user space uses the cpu_buffer->mapping_lock.
   The buffer->mutex can be taken when the mapping_lock is held, giving the
   locking order of: cpu_buffer->mapping_lock -->> buffer->mutex. But there
   also exists the ordering:
 
       buffer->mutex -->> cpus_read_lock()
       mm->mmap_lock -->> cpu_buffer->mapping_lock
       cpus_read_lock() -->> mm->mmap_lock
 
   causing a circular chain of:
 
       cpu_buffer->mapping_lock -> buffer->mutex -->> cpus_read_lock() -->>
          mm->mmap_lock -->> cpu_buffer->mapping_lock
 
   By moving the cpus_read_lock() outside the buffer->mutex where:
   cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.
 
 - Do not trigger WARN_ON() for commit overrun
 
   When the ring buffer is user space mapped and there's a "commit overrun"
   (where an interrupt preempted an event, and then added so many events it
    filled the buffer having to drop events when it hit the preempted event)
   a WARN_ON() was triggered if this was read via a memory mapped buffer.
   This is due to "missed events" being non zero when the reader page ended
   up with the commit page. The idea was, if the writer is on the reader page,
   there's only one page that has been written to and there should be no
   missed events. But if a commit overrun is done where the writer is off the
   commit page and looped around to the commit page causing missed events, it
   is possible that the reader page is the commit page with missed events.
 
   Instead of triggering a WARN_ON() when the reader page is the commit page
   with missed events, trigger it when the reader page is the tail_page with
   missed events. That's because the writer is always on the tail_page if
   an event was interrupted (which holds the commit event) and continues off
   the commit page.
 
 - Reset the persistent buffer if it is fully consumed
 
   On boot up, if the user fully consumes the last boot buffer of the
   persistent buffer, if it reboots without enabling it, there will still be
   events in the buffer which can cause confusion. Instead, reset the buffer
   when it is fully consumed, so that the data is not read again.
 
 - Clean up some goto out jumps
 
   There's a few cases that the code jumps to the "out:" label that simply
   returns a value. There used to be more work done at those labels but now
   that they simply return a value use a return instead of jumping to a
   label.
 
 - Use guard() to simplify some of the code
 
   Add guard() around some locking instead of jumping to a label to do the
   unlocking.
 
 - Use free() to simplify some of the code
 
   Use free(kfree) on variables that will get freed on error and use
   return_ptr() to return the variable when its not freed. There's one
   instance where free(kfree) simplifies the code on a temp variable that was
   allocated just for the function use.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDjJMxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkDzAP468AZOnjIxezfzYEmtcDl8ZUgf2U3I
 XtXjn7aKH/gZiwD/dCCZX2IY2gddqAb6s9Bo4/AWgtYbjacLPL+pWYbTJwQ=
 =DOfF
 -----END PGP SIGNATURE-----
Merge tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
 - Allow the persistent ring buffer to be memory mapped
   In the last merge window there was issues with the implementation of
   mapping the persistent ring buffer because it was assumed that the
   persistent memory was just physical memory without being part of the
   kernel virtual address space. But this was incorrect and the
   persistent ring buffer can be mapped the same way as the allocated
   ring buffer is mapped.
   The metadata for the persistent ring buffer is different than the
   normal ring buffer and the organization of mapping it to user space
   is a little different. Make the updates needed to the meta data to
   allow the persistent ring buffer to be mapped to user space.
 - Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock
   Mapping the ring buffer to user space uses the
   cpu_buffer->mapping_lock. The buffer->mutex can be taken when the
   mapping_lock is held, giving the locking order of:
   cpu_buffer->mapping_lock -->> buffer->mutex. But there also exists
   the ordering:
       buffer->mutex -->> cpus_read_lock()
       mm->mmap_lock -->> cpu_buffer->mapping_lock
       cpus_read_lock() -->> mm->mmap_lock
   causing a circular chain of:
       cpu_buffer->mapping_lock -> buffer->mutex -->> cpus_read_lock() -->>
          mm->mmap_lock -->> cpu_buffer->mapping_lock
   By moving the cpus_read_lock() outside the buffer->mutex where:
   cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.
 - Do not trigger WARN_ON() for commit overrun
   When the ring buffer is user space mapped and there's a "commit
   overrun" (where an interrupt preempted an event, and then added so
   many events it filled the buffer having to drop events when it hit
   the preempted event) a WARN_ON() was triggered if this was read via a
   memory mapped buffer.
   This is due to "missed events" being non zero when the reader page
   ended up with the commit page. The idea was, if the writer is on the
   reader page, there's only one page that has been written to and there
   should be no missed events.
   But if a commit overrun is done where the writer is off the commit
   page and looped around to the commit page causing missed events, it
   is possible that the reader page is the commit page with missed
   events.
   Instead of triggering a WARN_ON() when the reader page is the commit
   page with missed events, trigger it when the reader page is the
   tail_page with missed events. That's because the writer is always on
   the tail_page if an event was interrupted (which holds the commit
   event) and continues off the commit page.
 - Reset the persistent buffer if it is fully consumed
   On boot up, if the user fully consumes the last boot buffer of the
   persistent buffer, if it reboots without enabling it, there will
   still be events in the buffer which can cause confusion. Instead,
   reset the buffer when it is fully consumed, so that the data is not
   read again.
 - Clean up some goto out jumps
   There's a few cases that the code jumps to the "out:" label that
   simply returns a value. There used to be more work done at those
   labels but now that they simply return a value use a return instead
   of jumping to a label.
 - Use guard() to simplify some of the code
   Add guard() around some locking instead of jumping to a label to do
   the unlocking.
 - Use free() to simplify some of the code
   Use free(kfree) on variables that will get freed on error and use
   return_ptr() to return the variable when its not freed. There's one
   instance where free(kfree) simplifies the code on a temp variable
   that was allocated just for the function use.
* tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Simplify functions with __free(kfree) to free allocations
  ring-buffer: Make ring_buffer_{un}map() simpler with guard(mutex)
  ring-buffer: Simplify ring_buffer_read_page() with guard()
  ring-buffer: Simplify reset_disabled_cpu_buffer() with use of guard()
  ring-buffer: Remove jump to out label in ring_buffer_swap_cpu()
  ring-buffer: Removed unnecessary if() goto out where out is the next line
  tracing: Reset last-boot buffers when reading out all cpu buffers
  ring-buffer: Allow reserve_mem persistent ring buffers to be mmapped
  ring-buffer: Do not trigger WARN_ON() due to a commit_overrun
  ring-buffer: Move cpus_read_lock() outside of buffer->mutex
							
						 | 
						
							||
| 
							 | 
						0f70f5b08a | 
							
							
								
								automount wart removal
							
							
							
							
							
							
							
							Calling conventions of ->d_automount() made saner (flagday change) vfs_submount() is gone - its sole remaining user (trace_automount) had been switched to saner primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaDoRWQAKCRBZ7Krx/gZQ 6wxMAQCzuMc2GiGBMXzeK4SGA7d5rsK71unf+zczOd8NvbTImQEAs1Cu3u3bF3pq EmHQWFTKBpBf+RHsLSoDHwUA+9THowM= =GXLi -----END PGP SIGNATURE----- Merge tag 'pull-automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull automount updates from Al Viro: "Automount wart removal A bunch of odd boilerplate gone from instances - the reason for those was the need to protect the yet-to-be-attched mount from mark_mounts_for_expiry() deciding to take it out. But that's easy to detect and take care of in mark_mounts_for_expiry() itself; no need to have every instance simulate mount being busy by grabbing an extra reference to it, with finish_automount() undoing that once it attaches that mount. Should've done it that way from the very beginning... This is a flagday change, thankfully there are very few instances. vfs_submount() is gone - its sole remaining user (trace_automount) had been switched to saner primitives" * tag 'pull-automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill vfs_submount() saner calling conventions for ->d_automount()  | 
						
							||
| 
							 | 
						b78f1293f9 | 
							
							
								
								tracing updates for v6.16:
							
							
							
							
							
							
							
							- Have module addresses get updated in the persistent ring buffer The addresses of the modules from the previous boot are saved in the persistent ring buffer. If the same modules are loaded and an address is in the old buffer points to an address that was both saved in the persistent ring buffer and is loaded in memory, shift the address to point to the address that is loaded in memory in the trace event. - Print function names for irqs off and preempt off callsites When ignoring the print fmt of a trace event and just printing the fields directly, have the fields for preempt off and irqs off events still show the function name (via kallsyms) instead of just showing the raw address. - Clean ups of the histogram code The histogram functions saved over 800 bytes on the stack to process events as they come in. Instead, create per-cpu buffers that can hold this information and have a separate location for each context level (thread, softirq, IRQ and NMI). Also add some more comments to the code. - Add "common_comm" field for histograms Add "common_comm" that uses the current->comm as a field in an event histogram and acts like any of the other fields of the event. - Show "subops" in the enabled_functions file When the function graph infrastructure is used, a subsystem has a "subops" that it attaches its callback function to. Instead of the enabled_functions just showing a function calling the function that calls the subops functions, also show the subops functions that will get called for that function too. - Add "copy_trace_marker" option to instances There are cases where an instance is created for tooling to write into, but the old tooling has the top level instance hardcoded into the application. New tools want to consume the data from an instance and not the top level buffer. By adding a copy_trace_marker option, whenever the top instance trace_marker is written into, a copy of it is also written into the instance with this option set. This allows new tools to read what old tools are writing into the top buffer. If this option is cleared by the top instance, then what is written into the trace_marker is not written into the top instance. This is a way to redirect the trace_marker writes into another instance. - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp() If a tracepoint is created by DECLARE_TRACE() instead of TRACE_EVENT(), then it will not be exposed via tracefs. Currently there's no way to differentiate in the kernel the tracepoint functions between those that are exposed via tracefs or not. A calling convention has been made manually to append a "_tp" prefix for events created by DECLARE_TRACE(). Instead of doing this manually, force it so that all DECLARE_TRACE() events have this notation. - Use __string() for task->comm in some sched events Instead of hardcoding the comm to be TASK_COMM_LEN in some of the scheduler events use __string() which makes it dynamic. Note, if these events are parsed by user space it they may break, and the event may have to be converted back to the hardcoded size. - Have function graph "depth" be unsigned to the user Internally to the kernel, the "depth" field of the function graph event is signed due to -1 being used for end of boundary. What actually gets recorded in the event itself is zero or positive. Reflect this to user space by showing "depth" as unsigned int and be consistent across all events. - Allow an arbitrary long CPU string to osnoise_cpus_write() The filtering of which CPUs to write to can exceed 256 bytes. If a machine has 256 CPUs, and the filter is to filter every other CPU, the write would take a string larger than 256 bytes. Instead of using a fixed size buffer on the stack that is 256 bytes, allocate it to handle what is passed in. - Stop having ftrace check the per-cpu data "disabled" flag The "disabled" flag in the data structure passed to most ftrace functions is checked to know if tracing has been disabled or not. This flag was added back in 2008 before the ring buffer had its own way to disable tracing. The "disable" flag is now not always set when needed, and the ring buffer flag should be used in all locations where the disabled is needed. Since the "disable" flag is redundant and incorrect, stop using it. Fix up some locations that use the "disable" flag to use the ring buffer info. - Use a new tracer_tracing_disable/enable() instead of data->disable flag There's a few cases that set the data->disable flag to stop tracing, but this flag is not consistently used. It is also an on/off switch where if a function set it and calls another function that sets it, the called function may incorrectly enable it. Use a new trace_tracing_disable() and tracer_tracing_enable() that uses a counter and can be nested. These use the ring buffer flags which are always checked making the disabling more consistent. - Save the trace clock in the persistent ring buffer Save what clock was used for tracing in the persistent ring buffer and set it back to that clock after a reboot. - Remove unused reference to a per CPU data pointer in mmiotrace functions - Remove unused buffer_page field from trace_array_cpu structure - Remove more strncpy() instances - Other minor clean ups and fixes -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDhiqRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qkheAQDpyRHoXF1AIoEqyahDax8f3vpZQeCH B/mn+YJmU1wuVgEA7AFALov5SHKv4IzoARz68GXtR0jGhP5D8uebUhUqDAQ= =WmFG -----END PGP SIGNATURE----- Merge tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Have module addresses get updated in the persistent ring buffer The addresses of the modules from the previous boot are saved in the persistent ring buffer. If the same modules are loaded and an address is in the old buffer points to an address that was both saved in the persistent ring buffer and is loaded in memory, shift the address to point to the address that is loaded in memory in the trace event. - Print function names for irqs off and preempt off callsites When ignoring the print fmt of a trace event and just printing the fields directly, have the fields for preempt off and irqs off events still show the function name (via kallsyms) instead of just showing the raw address. - Clean ups of the histogram code The histogram functions saved over 800 bytes on the stack to process events as they come in. Instead, create per-cpu buffers that can hold this information and have a separate location for each context level (thread, softirq, IRQ and NMI). Also add some more comments to the code. - Add "common_comm" field for histograms Add "common_comm" that uses the current->comm as a field in an event histogram and acts like any of the other fields of the event. - Show "subops" in the enabled_functions file When the function graph infrastructure is used, a subsystem has a "subops" that it attaches its callback function to. Instead of the enabled_functions just showing a function calling the function that calls the subops functions, also show the subops functions that will get called for that function too. - Add "copy_trace_marker" option to instances There are cases where an instance is created for tooling to write into, but the old tooling has the top level instance hardcoded into the application. New tools want to consume the data from an instance and not the top level buffer. By adding a copy_trace_marker option, whenever the top instance trace_marker is written into, a copy of it is also written into the instance with this option set. This allows new tools to read what old tools are writing into the top buffer. If this option is cleared by the top instance, then what is written into the trace_marker is not written into the top instance. This is a way to redirect the trace_marker writes into another instance. - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp() If a tracepoint is created by DECLARE_TRACE() instead of TRACE_EVENT(), then it will not be exposed via tracefs. Currently there's no way to differentiate in the kernel the tracepoint functions between those that are exposed via tracefs or not. A calling convention has been made manually to append a "_tp" prefix for events created by DECLARE_TRACE(). Instead of doing this manually, force it so that all DECLARE_TRACE() events have this notation. - Use __string() for task->comm in some sched events Instead of hardcoding the comm to be TASK_COMM_LEN in some of the scheduler events use __string() which makes it dynamic. Note, if these events are parsed by user space it they may break, and the event may have to be converted back to the hardcoded size. - Have function graph "depth" be unsigned to the user Internally to the kernel, the "depth" field of the function graph event is signed due to -1 being used for end of boundary. What actually gets recorded in the event itself is zero or positive. Reflect this to user space by showing "depth" as unsigned int and be consistent across all events. - Allow an arbitrary long CPU string to osnoise_cpus_write() The filtering of which CPUs to write to can exceed 256 bytes. If a machine has 256 CPUs, and the filter is to filter every other CPU, the write would take a string larger than 256 bytes. Instead of using a fixed size buffer on the stack that is 256 bytes, allocate it to handle what is passed in. - Stop having ftrace check the per-cpu data "disabled" flag The "disabled" flag in the data structure passed to most ftrace functions is checked to know if tracing has been disabled or not. This flag was added back in 2008 before the ring buffer had its own way to disable tracing. The "disable" flag is now not always set when needed, and the ring buffer flag should be used in all locations where the disabled is needed. Since the "disable" flag is redundant and incorrect, stop using it. Fix up some locations that use the "disable" flag to use the ring buffer info. - Use a new tracer_tracing_disable/enable() instead of data->disable flag There's a few cases that set the data->disable flag to stop tracing, but this flag is not consistently used. It is also an on/off switch where if a function set it and calls another function that sets it, the called function may incorrectly enable it. Use a new trace_tracing_disable() and tracer_tracing_enable() that uses a counter and can be nested. These use the ring buffer flags which are always checked making the disabling more consistent. - Save the trace clock in the persistent ring buffer Save what clock was used for tracing in the persistent ring buffer and set it back to that clock after a reboot. - Remove unused reference to a per CPU data pointer in mmiotrace functions - Remove unused buffer_page field from trace_array_cpu structure - Remove more strncpy() instances - Other minor clean ups and fixes * tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits) tracing: Fix compilation warning on arm32 tracing: Record trace_clock and recover when reboot tracing/sched: Use __string() instead of fixed lengths for task->comm tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix tracing: Cleanup upper_empty() in pid_list tracing: Allow the top level trace_marker to write into another instances tracing: Add a helper function to handle the dereference arg in verifier tracing: Remove unnecessary "goto out" that simply returns ret is trigger code tracing: Fix error handling in event_trigger_parse() tracing: Rename event_trigger_alloc() to trigger_data_alloc() tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf tracing: Remove unused buffer_page field from trace_array_cpu structure tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer tracing: Convert the per CPU "disabled" counter to local from atomic tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field ring-buffer: Add ring_buffer_record_is_on_cpu() tracing: Do not use per CPU array_buffer.data->disabled for cpumask ftrace: Do not disabled function graph based on "disabled" field tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one() ...  | 
						
							||
| 
							 | 
						32dc004252 | 
							
							
								
								tracing: Reset last-boot buffers when reading out all cpu buffers
							
							
							
							
							
							
							
							Reset the last-boot ring buffers when read() reads out all cpu buffers through trace_pipe/trace_pipe_raw. This prevents ftrace to unwind ring buffer read pointer next boot. Note that this resets only when all per-cpu buffers are empty, and read via read(2) syscall. For example, if you read only one of the per-cpu trace_pipe, it does not reset it. Also, reading buffer by splice(2) syscall does not reset because some data in the reader (the last) page. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/174792929202.496143.8184644221859580999.stgit@mhiramat.tok.corp.google.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						c2a0831142 | 
							
							
								
								ring-buffer: Allow reserve_mem persistent ring buffers to be mmapped
							
							
							
							
							
							
							
							When the persistent ring buffer is created from the memory returned by reserve_mem there is nothing prohibiting it to be memory mapped to user space. The memory is the same as the pages allocated by alloc_page(). The way the memory is managed by the ring buffer code is slightly different though and needs to be addressed. The persistent memory uses the page->id for its own purpose where as the user mmap buffer currently uses that for the subbuf array mapped to user space. If the buffer is a persistent buffer, use the page index into that buffer as the identifier instead of the page->id. That is, the page->id for a persistent buffer, represents the order of the buffer is in the link list. ->id == 0 means it is the reader page. When a reader page is swapped, the new reader page's ->id gets zero, and the old reader page gets the ->id of the page that it swapped with. The user space mapping has the ->id is the index of where it was mapped in user space and does not change while it is mapped. Since the persistent buffer is fixed in its location, the index of where a page is in the memory range can be used as the "id" to put in the meta page array, and it can be mapped in the same order to user space as it is in the persistent memory. A new rb_page_id() helper function is used to get and set the id depending on if the page is a normal memory allocated buffer or a physical memory mapped buffer. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/20250401203332.246646011@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						2fbdb6d8e0 | 
							
							
								
								tracing: Fix compilation warning on arm32
							
							
							
							
							
							
							
							On arm32, size_t is defined to be unsigned int, while PAGE_SIZE is
unsigned long. This hence triggers a compilation warning as min()
asserts the type of two operands to be equal. Casting PAGE_SIZE to size_t
solves this issue and works on other target architectures as well.
Compilation warning details:
kernel/trace/trace.c: In function 'tracing_splice_read_pipe':
./include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast
  (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                            ^
./include/linux/minmax.h:26:4: note: in expansion of macro '__typecheck'
   (__typecheck(x, y) && __no_side_effects(x, y))
    ^~~~~~~~~~~
...
kernel/trace/trace.c:6771:8: note: in expansion of macro 'min'
        min((size_t)trace_seq_used(&iter->seq),
        ^~~
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250526013731.1198030-1-pantaixi@huaweicloud.com
Fixes: 
							
						 | 
						
							||
| 
							 | 
						f1975e4765 | 
							
							
								
								Summary
							
							
							
							
							
							
							
							* Move kern_table members out of kernel/sysctl.c Moved a subset (tracing, panic, signal, stack_tracer and sparc) out of the kern_table array. The goal is for kern_table to only have sysctl elements. All this increases modularity by placing the ctl_tables closer to where they are used while reducing the chances of merge conflicts in kernel/sysctl.c. * Fixed sysctl unit test panic by relocating it to selftests * Testing These have been in linux-next from rc2, so they have had more than a month worth of testing. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmgwLsAACgkQupfNUreW QU9ghwv/VKZW+IXEvSjc8OiwntWkL7e5ddHY6O2Vf44MzhBefLTXmfx2HfkEA0Xw RaOQ28Hf/zQL83RqHHnXqI7JdGWQJUm8bCPwk4H3DCaF8qOfPVvblVYmfNL2auSY oyRRpRzZuY5EtKcrNjiHFHL2WIC8KvPVwS748oHY1eZY7kn1fcs8DDnNO4iuWop+ uJeDxu87wkRCFXF3DIM+MAHRvxSa8GHtZvb9EjAl/EHMbAyVSz3uTb7FdQDdnE09 s7P30EC03RHtgi3sd2Ku04dJsHLz7VErvpToxSH2KFlcdpJuWuCSCTT8XaD8kII8 kYYCxNpmPOf4LzEy/J2vVZB0PSHrHvuQCH7iGy+8wOPk9GHTOMkKMMXVmeGnAsef AiosPYroxXp/nBFcuNs6/1LKpsdpFr2F6u6oMgbzLaW1Xe/oc+6oynuOgeVj9LuM FrSxSwaVvpdwHYHujYPQAAWIgKRzITiEXnCgtSyohFquKb+7E8ZspwjOqYH2xWMQ WwABNRqY =45X2 -----END PGP SIGNATURE----- Merge tag 'sysctl-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move kern_table members out of kernel/sysctl.c Moved a subset (tracing, panic, signal, stack_tracer and sparc) out of the kern_table array. The goal is for kern_table to only have sysctl elements. All this increases modularity by placing the ctl_tables closer to where they are used while reducing the chances of merge conflicts in kernel/sysctl.c. - Fixed sysctl unit test panic by relocating it to selftests * tag 'sysctl-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: Close test ctl_headers with a for loop sysctl: call sysctl tests with a for loop sysctl: Add 0012 to test the u8 range check sysctl: move u8 register test to lib/test_sysctl.c sparc: mv sparc sysctls into their own file under arch/sparc/kernel stack_tracer: move sysctl registration to kernel/trace/trace_stack.c tracing: Move trace sysctls into trace.c signal: Move signal ctl tables into signal.c panic: Move panic ctl tables into panic.c  | 
						
							||
| 
							 | 
						2632a2013f | 
							
							
								
								tracing: Record trace_clock and recover when reboot
							
							
							
							
							
							
							
							Record trace_clock information in the trace_scratch area and recover the trace_clock when boot, so that reader can docode the timestamp correctly. Note that since most trace_clocks records the timestamp in nano- seconds, this is not a bug. But some trace_clock, like counter and tsc will record the counter value. Only for those trace_clock user needs this information. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/174720625803.1925039.1815089037443798944.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						7b382efd5e | 
							
							
								
								tracing: Allow the top level trace_marker to write into another instances
							
							
							
							
							
							
							
							There are applications that have it hard coded to write into the top level trace_marker instance (/sys/kernel/tracing/trace_marker). This can be annoying if a profiler is using that instance for other work, or if it needs all writes to go into a new instance. A new option is created called "copy_trace_marker". By default, the top level has this set, as that is the default buffer that writing into the top level trace_marker file will go to. But now if an instance is created and sets this option, all writes into the top level trace_marker will also be written into that instance buffer just as if an application were to write into the instance's trace_marker file. If the top level instance disables this option, then writes to its own trace_marker and trace_marker_raw files will not go into its buffer. If no instance has this option set, then the write will return an error and errno will contain ENODEV. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250508095639.39f84eda@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						969043af15 | 
							
							
								
								tracing: Do not use per CPU array_buffer.data->disabled for cpumask
							
							
							
							
							
							
							
							The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable flag. Do not bother setting the per CPU disabled flag of the array_buffer data to use to determine what CPUs can write to the buffer and only rely on the ring buffer code itself to disabled it. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250505212235.885452497@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						6ba3e0533f | 
							
							
								
								tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one()
							
							
							
							
							
							
							
							The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable flag. The ftrace_dump_one() function iterates over all the current tracing CPUs and increments the "disabled" counter before doing the dump, and decrements it afterward. As the disabled flag can be ignored, doing this today is not reliable. Instead use the new tracer_tracing_disable() that calls into the ring buffer code to do the disabling. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250505212235.381188238@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						dbecef68ad | 
							
							
								
								tracing: Add tracer_tracing_disable/enable() functions
							
							
							
							
							
							
							
							Allow a tracer to disable writing to its buffer for a temporary amount of time and re-enable it. The tracer_tracing_disable() will disable writing to the trace array buffer, and requires a tracer_tracing_enable() to re-enable it. The difference between tracer_tracing_disable() and tracer_tracing_off() is that the disable version can nest, and requires as many enable() calls as disable() calls to re-enable the buffer. Where as the off() function can be called multiple times and only requires a singe tracer_tracing_on() to re-enable the buffer. Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Thompson <danielt@kernel.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/20250505212235.210330010@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						2dbf6e0df4 | 
							
							
								
								kill vfs_submount()
							
							
							
							
							
							
							
							The last remaining user of vfs_submount() (tracefs) is easy to convert to fs_context_for_submount(); do that and bury that thing, along with SB_SUBMOUNT Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>  | 
						
							||
| 
							 | 
						00d872dd54 | 
							
							
								
								tracing: Only return an adjusted address if it matches the kernel address
							
							
							
							
							
							
							
							The trace_adjust_address() will take a given address and examine the persistent ring buffer to see if the address matches a module that is listed there. If it does not, it will just adjust the value to the core kernel delta. But if the address was for something that was not part of the core kernel text or data it should not be adjusted. Check the result of the adjustment and only return the adjustment if it lands in the current kernel text or data. If not, return the original address. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250506102300.0ba2f9e0@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						006ff7498f | 
							
							
								
								saner calling conventions for ->d_automount()
							
							
							
							
							
							
							
							Currently the calling conventions for ->d_automount() instances have an odd wart - returned new mount to be attached is expected to have refcount 2. That kludge is intended to make sure that mark_mounts_for_expiry() called before we get around to attaching that new mount to the tree won't decide to take it out. finish_automount() drops the extra reference after it's done with attaching mount to the tree - or drops the reference twice in case of error. ->d_automount() instances have rather counterintuitive boilerplate in them. There's a much simpler approach: have mark_mounts_for_expiry() skip the mounts that are yet to be mounted. And to hell with grabbing/dropping those extra references. Makes for simpler correctness analysis, at that... Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>  | 
						
							||
| 
							 | 
						1be8e54a1e | 
							
							
								
								tracing: Fix trace_adjust_address() when there is no modules in scratch area
							
							
							
							
							
							
							
							The function trace_adjust_address() is used to map addresses of modules
stored in the persistent memory and are also loaded in the current boot to
return the current address for the module.
If there's only one module entry, it will simply use that, otherwise it
performs a bsearch of the entry array to find the modules to offset with.
The issue is if there are no modules in the array. The code does not
account for that and ends up referencing the first element in the array
which does not exist and causes a crash.
If nr_entries is zero, exit out early as if this was a core kernel
address.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250501151909.65910359@gandalf.local.home
Fixes: 
							
						 | 
						
							||
| 
							 | 
						f5178c41bb | 
							
							
								
								tracing: Fix oob write in trace_seq_to_buffer()
							
							
							
							
							
							
							
							syzbot reported this bug:
==================================================================
BUG: KASAN: slab-out-of-bounds in trace_seq_to_buffer kernel/trace/trace.c:1830 [inline]
BUG: KASAN: slab-out-of-bounds in tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822
Write of size 4507 at addr ffff888032b6b000 by task syz.2.320/7260
CPU: 1 UID: 0 PID: 7260 Comm: syz.2.320 Not tainted 6.15.0-rc1-syzkaller-00301-g3bde70a2c827 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xc3/0x670 mm/kasan/report.c:521
 kasan_report+0xe0/0x110 mm/kasan/report.c:634
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0xef/0x1a0 mm/kasan/generic.c:189
 __asan_memcpy+0x3c/0x60 mm/kasan/shadow.c:106
 trace_seq_to_buffer kernel/trace/trace.c:1830 [inline]
 tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822
 ....
==================================================================
It has been reported that trace_seq_to_buffer() tries to copy more data
than PAGE_SIZE to buf. Therefore, to prevent this, we should use the
smaller of trace_seq_used(&iter->seq) and PAGE_SIZE as an argument.
Link: https://lore.kernel.org/20250422113026.13308-1-aha310510@gmail.com
Reported-by: syzbot+c8cd2d2c412b868263fb@syzkaller.appspotmail.com
Fixes: 
							
						 | 
						
							||
| 
							 | 
						dd293df639 | 
							
							
								
								tracing: Move trace sysctls into trace.c
							
							
							
							
							
							
							
							Move trace ctl tables into their own const array in kernel/trace/trace.c. The sysctl table register is called with subsys_initcall placing if after its original place in proc_root_init. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						4808595a99 | 
							
							
								
								tracing: Hide get_vm_area() from MMUless builds
							
							
							
							
							
							
							
							The function get_vm_area() is not defined for non-MMU builds and causes a
build error if it is used. Hide the map_pages() function around a:
 #ifdef CONFIG_MMU
to keep it from being compiled when CONFIG_MMU is not set.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250407120111.2ccc9319@gandalf.local.home
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/4f8ece8b-8862-4f7c-8ede-febd28f8a9fe@roeck-us.net/
Fixes: 
							
						 | 
						
							||
| 
							 | 
						6cb0bd94c0 | 
							
							
								
								Persistent buffer cleanups and simplifications for v6.15:
							
							
							
							
							
							
							
							It was mistaken that the physical memory returned from "reserve_mem" had to
 be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this new found knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt() instead.
 
   As the memory returned by "memmap" or any other means where a physical
   address is given to the tracing infrastructure, it still needs to
   be vmap(). Since this memory can never be returned back to the buddy
   allocator nor should it ever be memmory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous phsycial memory and then passing that to
   vmap(), use vmap_page_range() instead
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE
   only the PAGE_SIZE portion would be flushed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+6oZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhq6AP481KHAgaowQCg7zrKPkMlbYBIigYoU
 7aqoAg2rSLBRSQEAl8fViHZgZ9Q+O7xdozQWiIR7/KQW8VIaTcP/V7cHkAU=
 =+5JB
 -----END PGP SIGNATURE-----
Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.
  It was mistaken that the physical memory returned from "reserve_mem"
  had to be vmap()'d to get to it from a virtual address. But
  reserve_mem already maps the memory to the virtual address of the
  kernel so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  new found knowledge, the code can be cleaned up and simplified.
   - Enforce that the persistent memory is page aligned
     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.
   - Use phys_to_virt() to get the virtual address from reserve_mem
     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt() instead.
     As the memory returned by "memmap" or any other means where a
     physical address is given to the tracing infrastructure, it still
     needs to be vmap(). Since this memory can never be returned back to
     the buddy allocator nor should it ever be memmory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.
   - Use vmap_page_range() for memmap virtual address mapping
     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous phsycial memory and then
     passing that to vmap(), use vmap_page_range() instead
   - Replace flush_dcache_folio() with flush_kernel_vmap_range()
     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where if a subbuffer was bigger than
     PAGE_SIZE only the PAGE_SIZE portion would be flushed"
* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
							
						 | 
						
							||
| 
							 | 
						41677970ad | 
							
							
								
								tracing fixes for 6.15
							
							
							
							
							
							
							
							- Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled The tracing of arguments in the function tracer depends on some functions that are only defined when PROBE_EVENTS_BTF_ARGS is enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same configs as the function argument tracing requires. Just have the function argument tracing depend on PROBE_EVENTS_BTF_ARGS. - Free module_delta for persistent ring buffer instance When an instance holds the persistent ring buffer, it allocates a helper array to hold the deltas between where modules are loaded on the last boot and the current boot. This array needs to be freed when the instance is freed. - Add cond_resched() to loop in ftrace_graph_set_hash() The hash functions in ftrace loop over every function that can be enabled by ftrace. This can be 50,000 functions or more. This loop is known to trigger soft lockup warnings and requires a cond_resched(). The loop in ftrace_graph_set_hash() was missing it. - Fix the event format verifier to include "%*p.." arguments To prevent events from dereferencing stale pointers that can happen if a trace event uses a dereferece pointer to something that was not copied into the ring buffer and can be freed by the time the trace is read, a verifier is called. At boot or module load, the verifier scans the print format string for pointers that can be dereferenced and it checks the arguments to make sure they do not contain something that can be freed. The "%*p" was not handled, which would add another argument and cause the verifier to not only not verify this pointer, but it will look at the wrong argument for every pointer after that. - Fix mcount sorttable building for different endian type target When modifying the ELF file to sort the mcount_loc table in the sorttable.c code, the endianess of the file and the host is used to determine if the bytes need to be swapped when calculations are done. A change was made to the sorting of the mcount_loc that read the values from the ELF file into an array and the swap happened on the filling of the array. But one of the calculations of the array still did the swap when it did not need to. This caused building on a little endian machine for a big endian target to not find the mcount function in the 'nm' table and it zeroed it out, causing there to be no functions available to trace. - Add goto out_unlock jump to rv_register_monitor() on error path One of the error paths in rv_register_monitor() just returned the error when it should have jumped to the out_unlock label to release the mutex. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+2tyBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qjPYAPwJDti6nHTqheFwIa1WzJ3yC2tKRYKt 1E5PYW/2Ct5NmwEAqgg3TvJppXHymVdutLghhGFnlBnyTWMI+KIhparSBw8= =NFM5 -----END PGP SIGNATURE----- Merge tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled The tracing of arguments in the function tracer depends on some functions that are only defined when PROBE_EVENTS_BTF_ARGS is enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same configs as the function argument tracing requires. Just have the function argument tracing depend on PROBE_EVENTS_BTF_ARGS. - Free module_delta for persistent ring buffer instance When an instance holds the persistent ring buffer, it allocates a helper array to hold the deltas between where modules are loaded on the last boot and the current boot. This array needs to be freed when the instance is freed. - Add cond_resched() to loop in ftrace_graph_set_hash() The hash functions in ftrace loop over every function that can be enabled by ftrace. This can be 50,000 functions or more. This loop is known to trigger soft lockup warnings and requires a cond_resched(). The loop in ftrace_graph_set_hash() was missing it. - Fix the event format verifier to include "%*p.." arguments To prevent events from dereferencing stale pointers that can happen if a trace event uses a dereferece pointer to something that was not copied into the ring buffer and can be freed by the time the trace is read, a verifier is called. At boot or module load, the verifier scans the print format string for pointers that can be dereferenced and it checks the arguments to make sure they do not contain something that can be freed. The "%*p" was not handled, which would add another argument and cause the verifier to not only not verify this pointer, but it will look at the wrong argument for every pointer after that. - Fix mcount sorttable building for different endian type target When modifying the ELF file to sort the mcount_loc table in the sorttable.c code, the endianess of the file and the host is used to determine if the bytes need to be swapped when calculations are done. A change was made to the sorting of the mcount_loc that read the values from the ELF file into an array and the swap happened on the filling of the array. But one of the calculations of the array still did the swap when it did not need to. This caused building on a little endian machine for a big endian target to not find the mcount function in the 'nm' table and it zeroed it out, causing there to be no functions available to trace. - Add goto out_unlock jump to rv_register_monitor() on error path One of the error paths in rv_register_monitor() just returned the error when it should have jumped to the out_unlock label to release the mutex. * tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Fix missing unlock on double nested monitors return path scripts/sorttable: Fix endianness handling in build-time mcount sort tracing: Verify event formats that have "%*p.." ftrace: Add cond_resched() to ftrace_graph_set_hash() tracing: Free module_delta on freeing of persistent ring buffer ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS  | 
						
							||
| 
							 | 
						af54a3a151 | 
							
							
								
								more printk changes for 6.15
							
							
							
							
							
							
							
							-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmftIx0ACgkQUqAMR0iA lPJKUxAArJaWC7EXtDtd8u8Rl/CYpIEaMdPd7V+XA5sqdUyjkSJI+jRswonpOsNX Zn9pbGMds1LNBXm1NO9039+2TrPSJFTCGK6OgeCJC17/O31wnnm0LhZiU+JElgfi iQI5fdTnc3sB37bsjkvEUr9HFizRxY2fHHMWZ8ngiLfkKfki4ET+1u/yf7CraRk1 6+LK9mM/WyytP6gYaSlL5YYVYs9fNcR/ND6IQgpfIN15/fOAOXWbMB1jE2iDRzqt MQUD4+DTYQYmeS6jQ4ToZdx3Ql9NwcP2nJnA5fxXeqPFHc/SgRS6KqOPQgQUD4tV N4q6ozLPlzDFeHVHMhPz/PzlSEn0zC1ZX87xXCUAilnkJpbEujcPxf44R/3RHu3d y7kmCRj0RwgHpLIwzLH5POrF4il9/wVlyZFRaYBPMkj09l0WBwYvfMhlnzvAtCP8 pRKqHkjJ1FOWQFJyn98ONqcCmm2pZ8XKW2enikAhISVXcptI/1lIQ6IIpRdTjte1 r60CbiJ7UFL+TrVqsWBuqWQRi5u5HykPkZiCL/YYXzZmrl3zLO+0ti9YzEU8Yrzd K1VAB/1aK/MDrTgOI+VaqlPq79uJBwtbrflgFhFBKAKsqTpBcsZUv9/1KHthnqXV Y84SsY2XpoGtjn58mU6eEc+8lLTOTDVXs+ZZL4/M3maW7ygNiYY= =Biv4 -----END PGP SIGNATURE----- Merge tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull more printk updates from Petr Mladek: - Silence warnings about candidates for ‘gnu_print’ format attribute * tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: vsnprintf: Silence false positive GCC warning for va_format() vsnprintf: Drop unused const char fmt * in va_format() vsnprintf: Mark binary printing functions with __printf() attribute tracing: Mark binary printing functions with __printf() attribute seq_file: Mark binary printing functions with __printf() attribute seq_buf: Mark binary printing functions with __printf() attribute  | 
						
							||
| 
							 | 
						394f3f02de | 
							
							
								
								tracing: Use vmap_page_range() to map memmap ring buffer
							
							
							
							
							
							
							
							The code to map the physical memory retrieved by memmap currently allocates an array of pages to cover the physical memory and then calls vmap() to map it to a virtual address. Instead of using this temporary array of struct page descriptors, simply use vmap_page_range() that can directly map the contiguous physical memory to a virtual address. Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/20250402144953.754618481@goodmis.org Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						34ea8fa084 | 
							
							
								
								tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
							
							
							
							
							
							
							
							The reserve_mem kernel command line option may pass back a physical address, but the memory is still part of the normal memory just like using memblock_alloc() would be. This means that the physical memory returned by the reserve_mem command line option can be converted directly to virtual memory by simply using phys_to_virt(). When freeing the buffer there's no need to call vunmap() anymore as the memory allocated by reserve_mem is freed by the call to reserve_mem_release_by_name(). Because the persistent ring buffer can also be allocated via the memmap option, which *is* different than normal memory as it cannot be added back to the buddy system, it must be treated differently. It still needs to be virtually mapped to have access to it. It also can not be freed nor can it ever be memory mapped to user space. Create a new trace_array flag called TRACE_ARRAY_FL_MEMMAP which gets set if the buffer is created by the memmap option, and this will prevent the buffer from being memory mapped by user space. Also increment the ref count for memmap'ed buffers so that they can never be freed. Link: https://lore.kernel.org/all/Z-wFszhJ_9o4dc8O@kernel.org/ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/20250402144953.583750106@goodmis.org Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						c44a14f216 | 
							
							
								
								tracing: Enforce the persistent ring buffer to be page aligned
							
							
							
							
							
							
							
							Enforce that the address and the size of the memory used by the persistent ring buffer is page aligned. Also update the documentation to reflect this requirement. Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						2c9ee74a6d | 
							
							
								
								tracing: Free module_delta on freeing of persistent ring buffer
							
							
							
							
							
							
							
							If a persistent ring buffer is created, a "module_delta" array is also
allocated to hold the module deltas of loaded modules that match modules
in the scratch area. If this buffer gets freed, the module_delta array is
not freed and causes a memory leak.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250401124525.1f9ac02a@gandalf.local.home
Fixes: 
							
						 | 
						
							||
| 
							 | 
						46d29f23a7 | 
							
							
								
								ring-buffer updates for v6.15
							
							
							
							
							
							
							
							- Restructure the persistent memory to have a "scratch" area Instead of hard coding the KASLR offset in the persistent memory by the ring buffer, push that work up to the callers of the persistent memory as they are the ones that need this information. The offsets and such is not important to the ring buffer logic and it should not be part of that. A scratch pad is now created when the caller allocates a ring buffer from persistent memory by stating how much memory it needs to save. - Allow where modules are loaded to be saved in the new scratch pad Save the addresses of modules when they are loaded into the persistent memory scratch pad. - A new module_for_each_mod() helper function was created With the acknowledgement of the module maintainers a new module helper function was created to iterate over all the currently loaded modules. This has a callback to be called for each module. This is needed for when tracing is started in the persistent buffer and the currently loaded modules need to be saved in the scratch area. - Expose the last boot information where the kernel and modules were loaded The last_boot_info file is updated to print out the addresses of where the kernel "_text" location was loaded from a previous boot, as well as where the modules are loaded. If the buffer is recording the current boot, it only prints "# Current" so that it does not expose the KASLR offset of the currently running kernel. - Allow the persistent ring buffer to be released (freed) To have this in production environments, where the kernel command line can not be changed easily, the ring buffer needs to be freed when it is not going to be used. The memory for the buffer will always be allocated at boot up, but if the system isn't going to enable tracing, the memory needs to be freed. Allow it to be freed and added back to the kernel memory pool. - Allow stack traces to print the function names in the persistent buffer Now that the modules are saved in the persistent ring buffer, if the same modules are loaded, the printing of the function names will examine the saved modules. If the module is found in the scratch area and is also loaded, then it will do the offset shift and use kallsyms to display the function name. If the address is not found, it simply displays the address from the previous boot in hex. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+cUERQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qrAsAQCFt2nfzxoe3wtF5EqIT1VHp/8bQVjG gBe8B6ouboreogD/dS7yK8MRy24ZAmObGwYG0RbVicd50S7P8Rf7+823ng8= =OJKk -----END PGP SIGNATURE----- Merge tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Restructure the persistent memory to have a "scratch" area Instead of hard coding the KASLR offset in the persistent memory by the ring buffer, push that work up to the callers of the persistent memory as they are the ones that need this information. The offsets and such is not important to the ring buffer logic and it should not be part of that. A scratch pad is now created when the caller allocates a ring buffer from persistent memory by stating how much memory it needs to save. - Allow where modules are loaded to be saved in the new scratch pad Save the addresses of modules when they are loaded into the persistent memory scratch pad. - A new module_for_each_mod() helper function was created With the acknowledgement of the module maintainers a new module helper function was created to iterate over all the currently loaded modules. This has a callback to be called for each module. This is needed for when tracing is started in the persistent buffer and the currently loaded modules need to be saved in the scratch area. - Expose the last boot information where the kernel and modules were loaded The last_boot_info file is updated to print out the addresses of where the kernel "_text" location was loaded from a previous boot, as well as where the modules are loaded. If the buffer is recording the current boot, it only prints "# Current" so that it does not expose the KASLR offset of the currently running kernel. - Allow the persistent ring buffer to be released (freed) To have this in production environments, where the kernel command line can not be changed easily, the ring buffer needs to be freed when it is not going to be used. The memory for the buffer will always be allocated at boot up, but if the system isn't going to enable tracing, the memory needs to be freed. Allow it to be freed and added back to the kernel memory pool. - Allow stack traces to print the function names in the persistent buffer Now that the modules are saved in the persistent ring buffer, if the same modules are loaded, the printing of the function names will examine the saved modules. If the module is found in the scratch area and is also loaded, then it will do the offset shift and use kallsyms to display the function name. If the address is not found, it simply displays the address from the previous boot in hex. * tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use _text and the kernel offset in last_boot_info tracing: Show last module text symbols in the stacktrace ring-buffer: Remove the unused variable bmeta tracing: Skip update_last_data() if cleared and remove active check for save_mod() tracing: Initialize scratch_size to zero to prevent UB tracing: Fix a compilation error without CONFIG_MODULES tracing: Freeable reserved ring buffer mm/memblock: Add reserved memory release function tracing: Update modules to persistent instances when loaded tracing: Show module names and addresses of last boot tracing: Have persistent trace instances save module addresses module: Add module_for_each_mod() function tracing: Have persistent trace instances save KASLR offset ring-buffer: Add ring_buffer_meta_scratch() ring-buffer: Add buffer meta data for persistent ring buffer ring-buffer: Use kaslr address instead of text delta ring-buffer: Fix bytes_dropped calculation issue  | 
						
							||
| 
							 | 
						028a58ec15 | 
							
							
								
								tracing: Use _text and the kernel offset in last_boot_info
							
							
							
							
							
							
							
							Instead of using kaslr_offset() just record the location of "_text". This makes it possible for user space to use either the System.map or /proc/kallsyms as what to map all addresses to functions with. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250326220304.38dbedcd@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						35a380ddbc | 
							
							
								
								tracing: Show last module text symbols in the stacktrace
							
							
							
							
							
							
							
							Since the previous boot trace buffer can include module text address in the stacktrace. As same as the kernel text address, convert the module text address using the module address information. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/174282689201.356346.17647540360450727687.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						486fbcb380 | 
							
							
								
								tracing: Skip update_last_data() if cleared and remove active check for save_mod()
							
							
							
							
							
							
							
							If the last boot data is already cleared, there is no reason to update it again. Skip if the TRACE_ARRAY_FL_LAST_BOOT is cleared. Also, for calling save_mod() when module loading, we don't need to check the trace is active or not because any module address can be on the stacktrace. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/174165660328.1173316.15529357882704817499.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						5dbeb56bb9 | 
							
							
								
								tracing: Initialize scratch_size to zero to prevent UB
							
							
							
							
							
							
							
							In allocate_trace_buffer() the following code: buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0, tr->range_addr_start, tr->range_addr_size, struct_size(tscratch, entries, 128)); tscratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size); setup_trace_scratch(tr, tscratch, scratch_size); Has undefined behavior if ring_buffer_alloc_range() fails because "scratch_size" is not initialize. If the allocation fails, then buf->buffer will be NULL. The ring_buffer_meta_scratch() will return NULL immediately if it is passed a NULL buffer and it will not update scratch_size. Then setup_trace_scratch() will return immediately if tscratch is NULL. Although there's no real issue here, but it is considered undefined behavior to pass an uninitialized variable to a function as input, and UBSan may complain about it. Just initialize scratch_size to zero to make the code defined behavior and a little more robust. Link: https://lore.kernel.org/all/44c5deaa-b094-4852-90f9-52f3fb10e67a@stanley.mountain/ Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						f00c9201f9 | 
							
							
								
								tracing: Fix a compilation error without CONFIG_MODULES
							
							
							
							
							
							
							
							There are some code which depends on CONFIG_MODULES. #ifdef to enclose it. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/174230515367.2909896.8132122175220657625.stgit@mhiramat.tok.corp.google.com Fixes: dca91c1c5468 ("tracing: Have persistent trace instances save module addresses") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  | 
						
							||
| 
							 | 
						fb6d03238e | 
							
							
								
								tracing: Freeable reserved ring buffer
							
							
							
							
							
							
							
							Make the ring buffer on reserved memory to be freeable. This allows us
to free the trace instance on the reserved memory without changing
cmdline and rebooting. Even if we can not change the kernel cmdline
for security reason, we can release the reserved memory for the ring
buffer as free (available) memory.
For example, boot kernel with reserved memory;
"reserve_mem=20M:2M:trace trace_instance=boot_mapped^traceoff@trace"
~ # free
              total        used        free      shared  buff/cache   available
Mem:        1995548       50544     1927568       14964       17436     1911480
Swap:             0           0           0
~ # rmdir /sys/kernel/tracing/instances/boot_mapped/
[   23.704023] Freeing reserve_mem:trace memory: 20476K
~ # free
              total        used        free      shared  buff/cache   available
Mem:        2016024       41844     1956740       14968       17440     1940572
Swap:             0           0           0
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/173989134814.230693.18199312930337815629.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
							
						 | 
						
							||
| 
							 | 
						5f3719f697 | 
							
							
								
								tracing: Update modules to persistent instances when loaded
							
							
							
							
							
							
							
							When a module is loaded and a persistent buffer is actively tracing, add it to the list of modules in the persistent memory. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164609.469844721@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>  |