linux/fs/btrfs
Qu Wenruo efa11fd269 btrfs: fix data overwriting bug during buffered write when block size < page size
[BUG]
generic/418 always fails on a btrfs whose block size < page size
(subpage cases).

The following minimal reproducer is enough to trigger it reliably:

workload()
{
        mkfs.btrfs -s 4k -f $dev > /dev/null
        dmesg -C
        mount $dev $mnt
        $fsstree_dir/src/dio-invalidate-cache -r -b 4096 -n 3 -i 1 -f $mnt/diotest
        ret=$?
        umount $mnt
        stop_trace
        if [ $ret -ne 0 ]; then
                fail
        fi
}

for (( i = 0; i < 1024; i++)); do
        echo "=== $i/$runtime ==="
        workload
done

[CAUSE]
Extra trace printk calls were added to the following functions:
- btrfs_buffered_write()
  * Which folio is touched
  * The file offset (start) of the buffered write
  * How many bytes are copied
  * The content of the write (the first 2 bytes)

- submit_one_sector()
  * Which folio is touched
  * The position inside the folio
  * The content of the page cache (the first 2 bytes)

- pagecache_isize_extended()
  * The parameters of the function itself
  * The parameters of the folio_zero_range()

The resulting trace is enough to show the problem:

  22.158114: btrfs_buffered_write: folio pos=0 start=0 copied=4096 content=0x0101
  22.158161: submit_one_sector: r/i=5/257 folio=0 pos=0 content=0x0101
  22.158609: btrfs_buffered_write: folio pos=0 start=4096 copied=4096 content=0x0101
  22.158634: btrfs_buffered_write: folio pos=0 start=8192 copied=4096 content=0x0101
  22.158650: pagecache_isize_extended: folio=0 from=4096 to=8192 bsize=4096 zero off=4096 len=8192
  22.158682: submit_one_sector: r/i=5/257 folio=0 pos=4096 content=0x0000
  22.158686: submit_one_sector: r/i=5/257 folio=0 pos=8192 content=0x0101

The tool dio-invalidate-cache starts 3 threads. Each thread does a buffered
write of 0x01 at its own offset (0, 4096 and 8192), does an fsync, then does
a direct read and compares the read buffer with the write buffer.

Note that all 3 btrfs_buffered_write() calls write the correct 0x01 into
the page cache.

But by the time submit_one_sector() runs, the content at file offset 4096
has been zeroed out by pagecache_isize_extended().

The race happens like this:
 Thread A is writing into range [4K, 8K).
 Thread B is writing into range [8K, 12K).

               Thread A              |         Thread B
-------------------------------------+------------------------------------
btrfs_buffered_write()               | btrfs_buffered_write()
|- old_isize = 4K;                   | |- old_isize = 4K;
|- btrfs_inode_lock()                | |
|- write into folio range [4K, 8K)   | |
|- pagecache_isize_extended()        | |
|  extend isize from 4096 to 8192    | |
|  no folio_zero_range() called      | |
|- btrfs_inode_unlock()              | |
                                     | |- btrfs_inode_lock()
                                     | |- write into folio range [8K, 12K)
                                     | |- pagecache_isize_extended()
                                     | |  calling folio_zero_range(4K, 8K)
                                     | |  This happens because old_isize
                                     | |  is grabbed too early, without
                                     | |  holding the inode lock.
                                     | |- btrfs_inode_unlock()

@old_isize is grabbed without holding the inode lock. This lets two
buffered write threads race, and makes pagecache_isize_extended() zero a
range that still contains cached data.

This only affects subpage btrfs. In the regular case, where block size
== page size, pagecache_isize_extended() does nothing, because it returns
early when block size >= page size.

[FIX]
Grab the old i_size while holding the inode lock.
This gives each buffered write thread a stable view of the old inode
size, thus avoiding the above race.

CC: stable@vger.kernel.org # 5.15+
Fixes: 5e8b9ef303 ("btrfs: move pos increment and pagecache extension to btrfs_buffered_write")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-02-21 09:32:24 +01:00
tests btrfs: selftests: fix btrfs_test_delayed_refs() leak of transaction 2025-02-17 17:24:14 +01:00
accessors.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
accessors.h move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
acl.c btrfs: remove unused included headers 2024-03-04 16:24:46 +01:00
acl.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
async-thread.c btrfs: async-thread: rename DFT_THRESHOLD to DEFAULT_THRESHOLD 2025-01-13 14:53:23 +01:00
async-thread.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
backref.c btrfs: update prelim_ref_insert() to use rb helpers 2025-01-13 14:53:18 +01:00
backref.h btrfs: remove detached list from struct btrfs_backref_cache 2025-01-13 14:53:15 +01:00
bio.c btrfs: add tracking of read blocks for read policy 2025-01-13 14:53:21 +01:00
bio.h btrfs: fix error propagation of split bios 2024-10-23 18:17:43 +02:00
block-group.c btrfs: block-group: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:19 +01:00
block-group.h btrfs: constify more pointer parameters 2024-09-10 16:51:22 +02:00
block-rsv.c btrfs: drop fs_info argument from btrfs_update_space_info_*() 2025-01-13 14:53:14 +01:00
block-rsv.h btrfs: constify more pointer parameters 2024-09-10 16:51:22 +02:00
btrfs_inode.h btrfs: remove no longer needed strict argument from can_nocow_extent() 2025-01-13 14:53:16 +01:00
compression.c btrfs: use filemap_get_folio() helper 2024-11-11 14:34:19 +01:00
compression.h btrfs: lzo: drop unused paramter level from lzo_alloc_workspace() 2024-11-11 14:34:16 +01:00
ctree.c btrfs: fix lockdep splat while merging a relocation root 2025-01-23 22:34:05 +01:00
ctree.h btrfs: remove pointless comment from ctree.h 2025-01-13 14:53:17 +01:00
defrag.c btrfs: fix defrag not merging contiguous extents due to merged extent maps 2024-10-31 16:46:41 +01:00
defrag.h btrfs: drop transaction parameter from btrfs_add_inode_defrag() 2024-09-10 16:51:19 +02:00
delalloc-space.c btrfs: drop fs_info argument from btrfs_update_space_info_*() 2025-01-13 14:53:14 +01:00
delalloc-space.h btrfs: constify pointer parameters where applicable 2024-07-11 15:33:22 +02:00
delayed-inode.c btrfs: drop unused parameter fs_info to btrfs_delete_delayed_insertion_item() 2025-01-13 14:53:21 +01:00
delayed-inode.h btrfs: remove hole from struct btrfs_delayed_node 2024-11-11 14:34:22 +01:00
delayed-ref.c btrfs: update tree_insert() to use rb helpers 2025-01-13 14:53:18 +01:00
delayed-ref.h btrfs: move select_delayed_ref() and export it 2025-01-13 14:53:13 +01:00
dev-replace.c btrfs: dev-replace: remove unnecessary call to btrfs_mark_buffer_dirty() 2025-01-13 14:53:19 +01:00
dev-replace.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
dir-item.c btrfs: dir-item: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:19 +01:00
dir-item.h btrfs: drop unused parameter fs_info from btrfs_match_dir_item_name() 2024-11-11 14:34:17 +01:00
direct-io.c btrfs: remove no longer needed strict argument from can_nocow_extent() 2025-01-13 14:53:16 +01:00
direct-io.h btrfs: move the direct IO code into its own file 2024-07-11 15:33:29 +02:00
discard.c btrfs: constify more pointer parameters 2024-09-10 16:51:22 +02:00
discard.h
disk-io.c btrfs: split waiting from read_extent_buffer_pages(), drop parameter wait 2025-01-13 14:53:23 +01:00
disk-io.h btrfs: remove stray comment about SRCU 2025-01-13 14:53:21 +01:00
export.c btrfs: remove super block argument from btrfs_iget() 2024-07-11 15:33:25 +02:00
export.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
extent-io-tree.c btrfs: introduce EXTENT_DIO_LOCKED 2024-09-10 16:51:20 +02:00
extent-io-tree.h btrfs: introduce EXTENT_DIO_LOCKED 2024-09-10 16:51:20 +02:00
extent-tree.c btrfs: use SECTOR_SIZE defines in btrfs_issue_discard() 2025-01-13 14:53:22 +01:00
extent-tree.h btrfs: move extent-tree function declarations out of ctree.h 2025-01-13 14:53:17 +01:00
extent_io.c btrfs: fix stale page cache after race between readahead and direct IO write 2025-02-11 23:09:03 +01:00
extent_io.h btrfs: split waiting from read_extent_buffer_pages(), drop parameter wait 2025-01-13 14:53:23 +01:00
extent_map.c btrfs: do regular iput instead of delayed iput during extent map shrinking 2025-02-21 09:32:11 +01:00
extent_map.h btrfs: make the extent map shrinker run asynchronously as a work queue job 2024-11-11 14:34:17 +01:00
fiemap.c btrfs: correct typos in multiple comments across various files 2024-11-11 14:34:14 +01:00
fiemap.h btrfs: move fiemap code into its own file 2024-07-11 15:33:20 +02:00
file-item.c btrfs: file-item: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:19 +01:00
file-item.h btrfs: constify more pointer parameters 2024-09-10 16:51:22 +02:00
file.c btrfs: fix data overwriting bug during buffered write when block size < page size 2025-02-21 09:32:24 +01:00
file.h btrfs: convert btrfs_buffered_write() to use folios 2024-11-11 14:34:19 +01:00
free-space-cache.c btrfs: open code set_page_extent_mapped() 2025-01-13 14:53:22 +01:00
free-space-cache.h btrfs: add cancellation points to trim loops 2024-10-07 23:21:56 +02:00
free-space-tree.c btrfs: free-space-tree: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:19 +01:00
free-space-tree.h btrfs: add forward declarations and headers, part 2 2024-03-04 16:24:49 +01:00
fs.c btrfs: use uuid_is_null() to verify if an uuid is empty 2025-01-13 14:53:17 +01:00
fs.h btrfs: add tracking of read blocks for read policy 2025-01-13 14:53:21 +01:00
inode-item.c btrfs: inode-item: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:19 +01:00
inode-item.h btrfs: constify more pointer parameters 2024-09-10 16:51:22 +02:00
inode.c btrfs: remove the unused locked_folio parameter from btrfs_cleanup_ordered_extents() 2025-01-13 16:00:50 +01:00
ioctl.c btrfs: add io_uring interface for encoded writes 2025-01-13 21:06:31 +01:00
ioctl.h btrfs: move btrfs_is_empty_uuid() from ioctl.c into fs.c 2025-01-13 14:53:17 +01:00
Kconfig btrfs: split out CONFIG_BTRFS_EXPERIMENTAL from CONFIG_BTRFS_DEBUG 2024-11-11 14:34:12 +01:00
locking.c btrfs: remove unused btrfs_try_tree_write_lock() 2024-11-11 14:34:14 +01:00
locking.h btrfs: add assertions and comment about path expectations to btrfs_cross_ref_exist() 2025-01-13 14:53:16 +01:00
lru_cache.c
lru_cache.h btrfs: cleanup recursive include of the same header 2024-07-11 15:33:22 +02:00
lzo.c btrfs: lzo: drop unused paramter level from lzo_alloc_workspace() 2024-11-11 14:34:16 +01:00
Makefile btrfs: selftests: add delayed ref self test cases 2025-01-13 14:53:13 +01:00
messages.c btrfs: disable rate limiting when debug enabled 2024-10-01 19:29:41 +02:00
messages.h
misc.h btrfs: constify pointer parameters where applicable 2024-07-11 15:33:22 +02:00
ordered-data.c btrfs: fix assertion failure when splitting ordered extent after transaction abort 2025-01-23 22:34:09 +01:00
ordered-data.h btrfs: convert btrfs_mark_ordered_io_finished() to take a folio 2024-09-10 16:51:14 +02:00
orphan.c btrfs: BTRFS_PATH_AUTO_FREE in orphan.c 2024-09-10 16:51:22 +02:00
orphan.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
print-tree.c btrfs: avoid using fixed char array size for tree names 2024-08-02 22:44:27 +02:00
print-tree.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
props.c btrfs: pass a btrfs_inode to btrfs_set_prop() 2024-07-11 15:33:29 +02:00
props.h btrfs: pass a btrfs_inode to btrfs_set_prop() 2024-07-11 15:33:29 +02:00
qgroup.c btrfs: avoid starting new transaction when cleaning qgroup during subvolume drop 2025-01-23 22:34:17 +01:00
qgroup.h btrfs: drop unused transaction parameter from btrfs_qgroup_add_swapped_blocks() 2024-11-11 14:34:16 +01:00
raid-stripe-tree.c btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents 2025-01-14 15:52:22 +01:00
raid-stripe-tree.h btrfs: tests: add selftests for raid-stripe-tree 2024-11-11 14:34:14 +01:00
raid56.c btrfs: make assert_rbio() to only check CONFIG_BTRFS_ASSERT 2024-11-11 14:34:12 +01:00
raid56.h btrfs: add forward declarations and headers, part 2 2024-03-04 16:24:49 +01:00
rcu-string.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
ref-verify.c btrfs: ref-verify: fix use-after-free after invalid ref action 2024-11-29 16:52:29 +01:00
ref-verify.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
reflink.c btrfs: convert copy_inline_to_page() to use folio 2024-09-10 16:51:21 +02:00
reflink.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
relocation.c btrfs: open code set_page_extent_mapped() 2025-01-13 14:53:22 +01:00
relocation.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
root-tree.c btrfs: root-tree: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:20 +01:00
root-tree.h btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations 2024-04-02 19:18:23 +02:00
scrub.c btrfs: avoid NULL pointer dereference if no valid extent tree 2025-01-06 16:32:31 +01:00
scrub.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
send.c btrfs: send: remove redundant assignments to variable ret 2025-01-13 14:53:14 +01:00
send.h btrfs: split out CONFIG_BTRFS_EXPERIMENTAL from CONFIG_BTRFS_DEBUG 2024-11-11 14:34:12 +01:00
space-info.c btrfs: zoned: reclaim unused zone by zone resetting 2025-01-13 14:53:14 +01:00
space-info.h btrfs: zoned: reclaim unused zone by zone resetting 2025-01-13 14:53:14 +01:00
subpage.c btrfs: subpage: dump the involved bitmap when ASSERT() failed 2025-01-13 15:57:51 +01:00
subpage.h btrfs: do proper folio cleanup when run_delalloc_nocow() failed 2025-01-13 15:52:17 +01:00
super.c btrfs: print read policy on module load 2025-01-13 14:53:21 +01:00
super.h btrfs: change BTRFS_MOUNT_* flags to 64bit type 2024-07-19 17:20:23 +02:00
sysfs.c btrfs: configure read policy via module parameter 2025-01-13 14:53:21 +01:00
sysfs.h btrfs: configure read policy via module parameter 2025-01-13 14:53:21 +01:00
transaction.c btrfs: fix use-after-free when attempting to join an aborted transaction 2025-01-23 22:34:14 +01:00
transaction.h btrfs: move abort_should_print_stack() to transaction.h 2025-01-13 14:53:17 +01:00
tree-checker.c btrfs: validate system chunk array at btrfs_validate_super() 2025-01-13 14:53:18 +01:00
tree-checker.h btrfs: validate system chunk array at btrfs_validate_super() 2025-01-13 14:53:18 +01:00
tree-log.c btrfs: tree-log: remove unnecessary calls to btrfs_mark_buffer_dirty() 2025-01-13 14:53:18 +01:00
tree-log.h btrfs: avoid transaction commit on any fsync after subvolume creation 2024-07-11 15:33:24 +02:00
tree-mod-log.c btrfs: drop unused parameter path from btrfs_tree_mod_log_rewind() 2024-11-11 14:34:15 +01:00
tree-mod-log.h btrfs: drop unused parameter path from btrfs_tree_mod_log_rewind() 2024-11-11 14:34:15 +01:00
ulist.c btrfs: preallocate ulist memory for qgroup rsv 2024-07-11 15:33:26 +02:00
ulist.h btrfs: preallocate ulist memory for qgroup rsv 2024-07-11 15:33:26 +02:00
uuid-tree.c btrfs: uuid-tree: remove unnecessary call to btrfs_mark_buffer_dirty() 2025-01-13 14:53:20 +01:00
uuid-tree.h btrfs: move uuid tree related code to uuid-tree.[ch] 2024-09-10 16:51:12 +02:00
verity.c btrfs: add and use helper to verify the calling task has locked the inode 2024-09-10 16:51:22 +02:00
verity.h btrfs: add forward declarations and headers, part 1 2024-03-04 16:24:49 +01:00
volumes.c btrfs: output an error message if btrfs failed to find the seed fsid 2025-02-21 09:32:16 +01:00
volumes.h btrfs: add read policy to set a preferred device 2025-01-13 14:53:21 +01:00
xattr.c btrfs: xattr: remove unnecessary call to btrfs_mark_buffer_dirty() 2025-01-13 14:53:20 +01:00
xattr.h btrfs: constify pointer parameters where applicable 2024-07-11 15:33:22 +02:00
zlib.c btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path 2025-01-06 16:32:43 +01:00
zoned.c btrfs: zoned: reclaim unused zone by zone resetting 2025-01-13 14:53:14 +01:00
zoned.h btrfs: zoned: reclaim unused zone by zone resetting 2025-01-13 14:53:14 +01:00
zstd.c btrfs: zstd: assert the timer pointer in callback 2024-11-11 14:34:15 +01:00