Commit graph

94 commits

Author SHA1 Message Date
Filipe Manana
c832378622 btrfs: use bools for local variables at btrfs_clear_extent_bit_changeset()
Several variables are defined as integers but used as booleans, and the
'delete' variable can be made const since it's not changed after being
declared. So change them to proper booleans and simplify setting their
value.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:50 +02:00
Filipe Manana
5af1eae78d btrfs: add missing error return to btrfs_clear_extent_bit_changeset()
We have a couple error branches where we have an error stored in the 'err'
variable and then jump to the 'out' label, however we don't return that
error, we just return 0. Normally this is not a problem since those error
branches call extent_io_tree_panic() which triggers a BUG() call, however
it's possible to have rather exotic kernel config with CONFIG_BUG disabled
in which case the BUG() call does nothing and we fallthrough. So make sure
to return the error, not just to fix that exotic case but also to make the
code less confusing. While at it also rename the 'err' variable to 'ret'
since this is the style we prefer and use more widely.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:50 +02:00
Filipe Manana
2187540b6f btrfs: exit after state split error at btrfs_clear_extent_bit_changeset()
If split_state() returned an error we call extent_io_tree_panic() which
will trigger a BUG() call. However if CONFIG_BUG is disabled, which is an
uncommon and exotic scenario, then we fallthrough and hit a use after free
when calling clear_state_bit() since the extent state record which the
local variable 'prealloc' points to was freed by split_state().

So jump to the label 'out' after calling extent_io_tree_panic() and set
the 'prealloc' pointer to NULL since split_state() has already freed it
when it hit an error.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:50 +02:00
Filipe Manana
f389e7b982 btrfs: remove duplicate error check at btrfs_clear_extent_bit_changeset()
There's no need to check if split_state() returned an error twice, instead
unify into a single if statement after setting 'prealloc' to NULL, because
on error split_state() frees the 'prealloc' extent state record.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:50 +02:00
David Sterba
2d44a15afd btrfs: use list_first_entry() everywhere
Using the helper makes it a bit more clear that we're accessing the
first list entry.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:47 +02:00
Filipe Manana
9d072bfab5 btrfs: make btrfs_find_contiguous_extent_bit() return bool instead of int
The function needs only to return true or false, so there's no need to
return an integer. Currently it returns 0 when a range with the given
bits is set and 1 when not found, which is a bit counter intuitive too.
So change the function to return a bool instead, returning true when a
range is found and false otherwise. Update the function's documentation
to mention the return value too.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
00ba32e5be btrfs: remove double underscore prefix from __set_extent_bit()
Now that set_extent_bit() was renamed to btrfs_set_extent_bit(), there's
no need to have a __set_extent_bit() function, we can just remove the
double underscore prefix, which we try to avoid according to the coding
style conventions.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
94bd699a08 btrfs: rename remaining exported functions from extent-io-tree.h
Rename the remaning exported functions that don't have a 'btrfs_' prefix.
By convention exported functions should have such prefix to make it clear
they are btrfs specific and to avoid collisions with functions from
elsewhere in the kernel.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
b351161f4f btrfs: rename free_extent_state() to include a btrfs prefix
This is an exported function so it should have a 'btrfs_' prefix by
convention, to make it clear it's btrfs specific and to avoid collisions
with functions from elsewhere in the kernel.

Rename the function to add 'btrfs_' prefix to it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
f81c2aea71 btrfs: rename the functions to count, test and get bit ranges in io trees
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.

So add a 'btrfs_' prefix to their names to make it clear they are from
btrfs.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
e965835c98 btrfs: rename the functions to init and release an extent io tree
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.

So add a 'btrfs_' prefix to their name to make it clear they are from
btrfs.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
20612db462 btrfs: directly grab inode at __btrfs_debug_check_extent_io_range()
We've tested that we are dealing with io tree that is associated to an
inode (its owner is IO_TREE_INODE_IO), so there's no need to call
btrfs_extent_io_tree_to_inode() in a separate line and we just assign
tree->inode to the local inode variable when we declare it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
02c340c278 btrfs: rename the functions to get inode and fs_info from an extent io tree
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.

So add a 'btrfs_' prefix to their name to make it clear they are from
btrfs. Also remove the 'const' suffix from extent_io_tree_to_inode_const()
since there's no non-const variant anymore and makes the naming consistent
with extent_io_tree_to_fs_info() (no 'const' suffix and returns a const
pointer).

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
66da9c1bed btrfs: rename the functions to search for bits in extent ranges
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.

So add a 'btrfs_' prefix to their name to make it clear they are from
btrfs.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
791b3455ac btrfs: rename set_extent_bit() to include a btrfs prefix
This is an exported function so it should have a 'btrfs_' prefix by
convention, to make it clear it's btrfs specific and to avoid collisions
with functions from elsewhere in the kernel.

So rename it to btrfs_set_extent_bit().

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:44 +02:00
Filipe Manana
9d222562b4 btrfs: rename the functions to clear bits for an extent range
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel. One of them has a
double underscore prefix which is also discouraged.

So remove double underscore prefix where applicable and add a 'btrfs_'
prefix to their name to make it clear they are from btrfs.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:43 +02:00
Filipe Manana
2cb9ac3faa btrfs: rename __lock_extent() and __try_lock_extent()
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel. Their double
underscore prefix is also discouraged.

So remove their double underscore prefix, add a 'btrfs_' prefix to their
name to make it clear they are from btrfs and a '_bits' suffix to avoid
collision with btrfs_lock_extent() and btrfs_try_lock_extent().

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:43 +02:00
Filipe Manana
41708a4c23 btrfs: add btrfs prefix to trace events for extent state alloc and free
These trace events don't have the 'btrfs_' prefix in their name, unlike
the other trace events from extent-io-tree.c. So add the prefix to make
them consistent and follow coding style conventions too.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:43 +02:00
Filipe Manana
024b3bc190 btrfs: remove extent_io_tree_to_inode() and is_inode_io_tree()
These functions aren't used outside extent-io-tree.c, but yet one of them
(extent_io_tree_to_inode()) is unnecessarily exported in the header.

Furthermore their single use is in a pattern like this:

    if (is_inode_io_tree(tree))
        foo(extent_io_tree_to_inode(tree), ...);

So we're effectively unnecessarily adding more indirection, checking
twice if tree->owner == IO_TREE_INODE_IO before getting the inode and
doing a non-inline function call to get tree->inode.

Simplify this by removing these helper functions and instead doing
thing like this:

   if (tree->owner == IO_TREE_INODE_IO)
       foo(tree->inode, ...);

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:43 +02:00
Filipe Manana
c6a43322a3 btrfs: remove redundant record start offset check at test_range_bit()
It's pointless to check if the current record's start offset is greater
than the end offset, as before we just tested if it was greater than the
start offset - and if it's not it means it's less than or equal to the
start offset, so it can not be greater than the end offset, as our start
offset is always smaller than the end offset.

So remove that check and also add an assertion to verify the start offset
is smaller then the end offset.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:41 +02:00
Filipe Manana
53828c759a btrfs: simplify last record detection at test_range_bit()
The overflow detection for the start offset of the next record is not
really necessary, we can just stop iterating if the current record ends at
or after out end offset. This removes the need to test if the current
record end offset is (u64)-1 and to check if adding 1 to the current
end offset results in 0.

By testing only if the current record ends at or after the end offset, we
also don't need anymore to test the new start offset at the head of the
while loop.

This makes both the source code and assembly code simpler, more efficient
and shorter (reducing the object text size).

Also remove the pointless initialization to NULL of the state variable, as
we don't use it before the first assignment to it. This may help avoid
some warnings with clang tools such as the one reported/fixed by commit
966de47ff0 ("btrfs: remove redundant initialization of variables in
log_new_ancestors").

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:41 +02:00
Filipe Manana
c54c245f80 btrfs: remove redundant check at find_first_extent_bit_state()
The tree_search() function always returns an entry that either contains
the search offset or the first entry in the tree that starts after the
offset. So checking at find_first_extent_bit_state() if the returned
entry ends at or after the search offset is pointless. Remove the check.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:41 +02:00
Filipe Manana
56ec21a6dd btrfs: fix documentation for tree_search_for_insert()
There are several things wrong with the documentation:

1) At the top it's only mentioned that we search for an entry containing
   the given offset, but when such entry does not exists we search for
   the first entry that starts and ends after that offset;

2) It mentions that @node_ret and @parent_ret aren't changed if the
   returned entry contains the given offset - that is true only if the
   returned entry starts exactly at @offset, otherwise those arguments
   are changed;

3) It mentions that if no entry containing offset is found then we return
   the first entry ending before the offset - that is not true, we return
   the first entry that starts and ends after that offset;

4) It also mentions that NULL is never returned. This is false as in case
   there's no entry containing offset or any entry that starts and ends
   after offset, NULL is returned.

So fix the documentation.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:41 +02:00
Filipe Manana
131a4be1c0 btrfs: simplify last record detection at test_range_bit_exists()
Instead of keeping track of the minimum start offset of the next record
and detecting overflow every time we update that offset to be the sum of
current record's end offset plus one, we can simply exit when the current
record ends at or beyond our end offset and forget about updating the
start offset on every iteration and testing for it at the top of the loop.
This makes both the source code and assembly code simpler, more efficient
and shorter (reducing the object text size).

Also remove the pointless initialization to NULL of the state variable, as
we don't use it before the first assignment to it. This may help avoid
some warnings with clang tools such as the one reported/fixed by commit
966de47ff0 ("btrfs: remove redundant initialization of variables in
log_new_ancestors").

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:41 +02:00
David Sterba
6aa79c4f25 btrfs: use rb_entry_safe() where possible to simplify code
Simplify conditionally reading an rb_entry(), there's the
rb_entry_safe() helper that checks the node pointer for NULL so we don't
have to write it explicitly.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Filipe Manana
c4669e4a8b btrfs: pass a pointer to get_range_bits() to cache first search result
Allow get_range_bits() to take an extent state pointer to pointer argument
so that we can cache the first extent state record in the target range, so
that a caller can use it for subsequent operations without doing a full
tree search. Currently the only user is try_release_extent_state(), which
then does a call to __clear_extent_bit() which can use such a cached state
record.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Filipe Manana
32c523c578 btrfs: allow folios to be released while ordered extent is finishing
When the release_folio callback (from struct address_space_operations) is
invoked we don't allow the folio to be released if its range is currently
locked in the inode's io_tree, as it may indicate the folio may be needed
by the task that locked the range.

However if the range is locked because an ordered extent is finishing,
then we can safely allow the folio to be released because ordered extent
completion doesn't need to use the folio at all.

When we are under memory pressure, the kernel starts writeback of dirty
pages (folios) with the goal of releasing the pages from the page cache
after writeback completes, however this often is not possible on btrfs
because:

  * Once the writeback completes we queue the ordered extent completion;

  * Once the ordered extent completion starts, we lock the range in the
    inode's io_tree (at btrfs_finish_one_ordered());

  * If the release_folio callback is called while the folio's range is
    locked in the inode's io_tree, we don't allow the folio to be
    released, so the kernel has to try to release memory elsewhere,
    which may result in triggering more writeback or releasing other
    pages from the page cache which may be more useful to have around
    for applications.

In contrast, when the release_folio callback is invoked after writeback
finishes and before ordered extent completion starts or locks the range,
we allow the folio to be released, as well as when the release_folio
callback is invoked after ordered extent completion unlocks the range.

Improve on this by detecting if the range is locked for ordered extent
completion and if it is, allow the folio to be released. This detection
is achieved by adding a new extent flag in the io_tree that is set when
the range is locked during ordered extent completion.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
David Sterba
0128c9a7cd btrfs: add __cold attribute to extent_io_tree_panic()
This is a wrapper that leads to a panic, so add the annotation like the
other similar functions have.

Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18 20:35:42 +01:00
Josef Bacik
7e2a595084 btrfs: introduce EXTENT_DIO_LOCKED
In order to support dropping the extent lock during a read we need a way
to make sure that direct reads and direct writes for overlapping ranges
are protected from each other.  To accomplish this introduce another
lock bit specifically for direct io.  Subsequent patches will utilize
this to protect direct IO operations.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10 16:51:20 +02:00
Boris Burkov
33336c1805 btrfs: preallocate ulist memory for qgroup rsv
When qgroups are enabled, during data reservation, we allocate the
ulist_nodes that track the exact reserved extents with GFP_ATOMIC
unconditionally. This is unnecessary, and we can follow the model
already employed by the struct extent_state we preallocate in the non
qgroups case, which should reduce the risk of allocation failures with
GFP_ATOMIC.

Add a prealloc node to struct ulist which ulist_add will grab when it is
present, and try to allocate it before taking the tree lock while we can
still take advantage of a less strict gfp mask. The lifetime of that
node belongs to the new prealloc field, until it is used, at which point
it belongs to the ulist linked list.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 15:33:26 +02:00
Anand Jain
d5b634ae1f btrfs: rename err to ret in convert_extent_bit()
Unify naming of return value to the preferred way.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07 21:31:01 +02:00
Anand Jain
cbb6b5d208 btrfs: rename err to ret in __set_extent_bit()
Unify naming of return value to the preferred way.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07 21:31:01 +02:00
Chengming Zhou
ef5a05c557 btrfs: remove SLAB_MEM_SPREAD flag use
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1, so it became a dead flag since the commit
16a1d96835 ("mm/slab: remove mm/slab.c and slab_def.h"). And the
series[1] went on to mark it obsolete to avoid confusion for users.
Here we can just remove all its users, which has no functional change.

[1] https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-05 17:13:23 +01:00
David Sterba
2b712e3bb2 btrfs: remove unused included headers
With help of neovim, LSP and clangd we can identify header files that
are not actually needed to be included in the .c files. This is focused
only on removal (with minor fixups), further cleanups are possible but
will require doing the header files properly with forward declarations,
minimized includes and include-what-you-use care.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:46 +01:00
Josef Bacik
8fd2b12e6a btrfs: WARN_ON_ONCE() in our leak detection code
fstests looks for WARN_ON's in dmesg.  Add WARN_ON_ONCE() to our leak
detection code (enabled only in debug builds) so that fstests will fail
if these things trip at all.  This will allow us to easily catch
problems with our reference counting that may otherwise go unnoticed.

Reviewed-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:45 +01:00
David Sterba
637e6e0f50 btrfs: allocate btrfs_inode::file_extent_tree only without NO_HOLES
The file_extent_tree was added in 41a2ee75aa ("btrfs: introduce
per-inode file extent tree") so we have an explicit mapping of the file
extents to know where it is safe to update i_size. When the feature
NO_HOLES is enabled, and it's been a mkfs default since 5.15, the tree
is not necessary.

To save some space in the inode, allocate the tree only when necessary.
This reduces size by 16 bytes from 1096 to 1080 on a x86_64 release
config.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 22:59:01 +01:00
David Sterba
738290c056 btrfs: always set extent_io_tree::inode and drop fs_info
The extent_io_tree is embedded in several structures, notably in struct
btrfs_inode.  The fs_info is only used for reporting errors and for
reference in trace points. We can get to the pointer through the inode,
but not all io trees set it. However, we always know the owner and
can recognize if inode is valid.  For access helpers are provided, const
variant for the trace points.

This reduces size of extent_io_tree by 8 bytes and following structures
in turn:

- btrfs_inode		1104 -> 1088
- btrfs_device		 520 ->  512
- btrfs_root		1360 -> 1344
- btrfs_transaction	 456 ->  440
- btrfs_fs_info		3600 -> 3592
- reloc_control		1520 -> 1512

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:02 +01:00
David Sterba
70146f2b09 btrfs: enhance extent_io_tree error reports
Pass the type of the extent io tree operation which failed in the report
helper. The message wording and contents is updated, though locking
might be the cause of the error it's probably not the only one and we're
interested in the state.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:02 +01:00
David Sterba
ab76c43e74 btrfs: drop error message in extent_io_tree insert_state()
The helper insert_state errors are handled in all callers and reported
by extent_io_tree_panic so we don't need to do it twice.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:02 +01:00
David Sterba
516095cdf0 btrfs: move lockdep class setting out of extent_io_tree_init
The per-inode file extent tree was added in 41a2ee75aa ("btrfs:
introduce per-inode file extent tree"), it's the only tree type
that requires the lockdep class. Move it to the file where it is
actually used.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:02 +01:00
David Sterba
49542050b1 btrfs: remove unused definition of tree_entry in extent-io-tree.c
The declaration was temporarily moved in a4055213bf ("btrfs: unexport
all the temporary exports for extent-io-tree.c") and then should have
been removed in 6.0 in 071d19f513 ("btrfs: remove struct tree_entry in
extent-io-tree.c") but was not.  This was found by tool
https://github.com/jirislaby/clang-struct .

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
Filipe Manana
efba145449 btrfs: make sure we cache next state in find_first_extent_bit()
Currently, at find_first_extent_bit(), when we are given a cached extent
state that happens to have its end offset match the desired range start,
we find the next extent state using that cached state, with next_state()
calls, and then return it.

We then try to cache that next state by calling cache_state_if_flags(),
but that will not cache the state because we haven't reset *cached_state
to NULL, so we end up with the cached_state unchanged, and if the caller
is iterating over extent states in the io tree, its next call to
find_first_extent_bit() will not use the current cached state as its end
offset does not match the minimum start range offset, therefore the cached
state is reset and we have to search the rbtree to find the next suitable
extent state record.

So fix this by resetting the cached state to NULL (and dropping our ref
on it) when we have a suitable cached state and we found a next state by
using next_state() starting from the cached state. This makes use cases
of calling find_first_extent_bit() to go over all ranges in the io tree
to do a single rbtree full search, only on the first call, and the next
calls will just do next_state() (rb_next() wrapper) calls, which is more
efficient.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:15 +02:00
Filipe Manana
63ffc1f7c4 btrfs: make tree iteration in extent_io_tree_release() more efficient
Currently extent_io_tree_release() is a loop that keeps getting the first
node in the io tree, using rb_first() which is a loop that gets to the
leftmost node of the rbtree, and then for each node it calls rb_erase(),
which often requires rebalancing the rbtree.

We can make this more efficient by using
rbtree_postorder_for_each_entry_safe() to free each node without having
to delete it from the rbtree and without looping to get the first node.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:15 +02:00
Filipe Manana
df2a8e70c3 btrfs: collapse wait_on_state() to its caller wait_extent_bit()
The wait_on_state() function is very short and has a single caller, which
is wait_extent_bit(), so remove the function and put its code into the
caller.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:15 +02:00
Filipe Manana
28967c7622 btrfs: remove redundant memory barrier from extent_io_tree_release()
The memory barrier at extent_io_tree_release() is redundant. Holding
spin_lock here is not enough to drop the barrier completely.  We only
change the waitqueue of an extent state record while holding the tree
lock - see wait_on_state().

The update to waitqueue state will not become stale because there will
be an spin_unlock/spin_lock sequence between the change and waiting,
this implies a full memory barrier.

So remove the explicit smp_mb() barrier.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ reword reasoning ]
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:15 +02:00
Filipe Manana
a1c20d15ee btrfs: make wait_extent_bit() static
The function wait_extent_bit() is not used outside extent-io-tree.c so
make it static. Furthermore the function doesn't have the 'btrfs_' prefix.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:15 +02:00
Filipe Manana
bea22a58c9 btrfs: update stale comment at extent_io_tree_release()
There's this comment at extent_io_tree_release() that mentions io btrees,
but this function is no longer used only for io btrees. Originally it was
added as a static function named clear_btree_io_tree() at transaction.c,
in commit 663dfbb077 ("Btrfs: deal with convert_extent_bit errors to
avoid fs corruption"), as it was used only for cleaning one of the io
trees that track dirty extent buffers, the dirty_log_pages io tree of a
a root and the dirty_pages io tree of a transaction. Later it was renamed
and exported and now it's used to cleanup other io trees such as the
allocation state io tree of a device or the csums range io tree of a log
root.

So remove that comment and replace it with one at the top of the function
that is more complete, mentioning what the function does and that it's
expected to be called only when a task is sure no one else will need to
use the tree anymore, as well as there should be no locked ranges in the
tree and therefore no waiters on its extent state records. Also add an
assertion to check that there are no locked extent state records in the
tree.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:14 +02:00
Filipe Manana
c91ea4bfa6 btrfs: make extent state merges more efficient during insertions
When inserting a new extent state record into an io tree that happens to
be mergeable, we currently do the following:

1) Insert the extent state record in the io tree's rbtree. This requires
   going down the tree to find where to insert it, and during the
   insertion we often need to balance the rbtree;

2) We then check if the previous node is mergeable, so we call rb_prev()
   to find it, which requires some looping to find the previous node;

3) If the previous node is mergeable, we adjust our node to include the
   range of the previous node and then delete the previous node from the
   rbtree, which again may need to balance the rbtree;

4) Then we check if the next node is mergeable with the node we inserted,
   so we call rb_next(), which requires some looping too. If the next node
   is indeed mergeable, we expand the range of our node to include the
   next node's range and then delete the next node from the rbtree, which
   again may need to balance the tree.

So these are quite of lot of iterations and looping over the rbtree, and
some of the operations may need to rebalance the rb tree. This can be made
a bit more efficient by:

1) When iterating the rbtree, once we find a node that is mergeable with
   the node we want to insert, we can just adjust that node's range with
   the range of the node to insert - this avoids continuing iterating
   over the tree and deleting a node from the rbtree;

2) If we expand the range of a mergeable node, then we find the next or
   the previous node, depending on other we merged a range to the right or
   to the left of the node we are currently at during the iteration. This
   merging is as before, we find the next or previous node with rb_next()
   or rb_prev() and if that other node is mergeable with the current one,
   we adjust the range of the current node and remove the other node from
   the rbtree;

3) Whenever we need to insert the new extent state record it's because
   we don't have any extent state record in the rbtree which can be
   merged, so we can remove the call to merge_state() after the insertion,
   saving rb_next() and rb_prev() calls, which require some looping.

So update the insertion function insert_state() to have this behaviour.

Running dbench for 120 seconds and capturing the execution times of
set_extent_bit() at pin_down_extent(), resulted in the following data
(time values are in nanoseconds):

Before this change:

  Count: 2278299
  Range:  0.000 - 4003728.000; Mean: 713.436; Median: 612.000; Stddev: 3606.952
  Percentiles:  90th: 1187.000; 95th: 1350.000; 99th: 1724.000
       0.000 -       7.534:       5 |
       7.534 -      35.418:      36 |
      35.418 -     154.403:     273 |
     154.403 -     662.138: 1244016 #####################################################
     662.138 -    2828.745: 1031335 ############################################
    2828.745 -   12074.102:    1395 |
   12074.102 -   51525.930:     806 |
   51525.930 -  219874.955:     162 |
  219874.955 -  938254.688:      22 |
  938254.688 - 4003728.000:       3 |

After this change:

  Count: 2275862
  Range:  0.000 - 1605175.000; Mean: 678.903; Median: 590.000; Stddev: 2149.785
  Percentiles:  90th: 1105.000; 95th: 1245.000; 99th: 1590.000
       0.000 -      10.219:      10 |
      10.219 -      40.957:      36 |
      40.957 -     155.907:     262 |
     155.907 -     585.789: 1127214 ####################################################
     585.789 -    2193.431: 1145134 #####################################################
    2193.431 -    8205.578:    1648 |
    8205.578 -   30689.378:    1039 |
   30689.378 -  114772.699:     362 |
  114772.699 -  429221.537:      52 |
  429221.537 - 1605175.000:      10 |

Maximum duration (range), average duration, percentiles and standard
deviation are all better.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:14 +02:00
David Sterba
893fe24399 btrfs: change test_range_bit to scan the whole range
The semantics of test_range_bit() with filled == 0 is now in it's own
helper so test_range_bit will check the whole range unconditionally.
The detection logic is flipped and assumes success by default and
catches exceptions.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:14 +02:00
David Sterba
99be1a66e1 btrfs: add specific helper for range bit test exists
The existing helper test_range_bit works in two ways, checks if the whole
range contains all the bits, or stop on the first occurrence.  By adding
a specific helper for the latter case, the inner loop can be simplified
and contains fewer conditionals, making it a bit faster.

There's no caller that uses the cached state pointer so this reduces the
argument count further.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:14 +02:00