New Features:
* Implement the Sunrpc rfc2203 rpcsec_gss sequence number cache
* Add support for FALLOC_FL_ZERO_RANGE on NFS v4.2
* Add a localio sysfs attribute
Stable Fixes:
* Fix double-unlock bug in nfs_return_empty_folio()
* Don't check for OPEN feature support in v4.1
* Always probe for LOCALIO support asynchronously
* Prevent hang on NFS mounts with xprtsec=[m]tls
Other Bugfixes:
* xattr handlers should check for absent nfs filehandles
* Fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are delegated
* Fix listxattr to return selinux security labels
* Connect to NFSv3 DS using TLS if MDS connection uses TLS
* Clear SB_RDONLY before getting a superblock, and ignore when remounting
* Fix incorrect handling of NFS error codes in nfs4_do_mkdir()
* Various nfs_localio fixes from Neil Brown that include fixing an
rcu compilation error found by older gcc versions.
* Update stats on flexfiles pNFS DSes when receiving NFS4ERR_DELAY
Cleanups:
* Add a refcount tracker for struct net in the nfs_client
* Allow FREE_STATEID to clean up delegations
* Always set NLINK even if the server doesn't support it
* Cleanups to the NFS folio writeback code
* Remove dead code from xs_tcp_tls_setup_socket()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmg/YkAACgkQ18tUv7Cl
QOuGpQ/+OuG/xkVX6j7FerUcdbVhcZ+5jDUKC0cNe6EeFeFRjgqsdFB0uqH+AgJh
DlxEJuXTMq+9mcptl0rjrOn0tj7dlTpgZowp3kWdK3bX1zSI2jBEJjnz3xVzjBQx
3lbmF/UAIaHv5bPVc9aF8mioaj5DSRKWTBLTg7iOM1ol1DqgHK/M0q2D7d2n1yB4
WYGI7LlAWSBGV4PvEkhHW6PwVPDSqECPBvIxd1obq8TSNl+YZlmVxCoJ99+zVqWf
dvaDOwfs5x+YEQH/+N/XWdc38QiCGfu7H79qGHShWB8t/KT4axxmjVs2fT7xtUsv
yN3fb77rlFOCJaPLRF549/4EJqHYMWmFDKIMUZ7YC1vEBCG4B1kQUqarA5eCbsAi
s/rxBs1VNKeev/RecDaViAeH3XZoVU1rNyIBJjOuWgNlC5wnbF+An3zE0m8MAXxO
Vh7wQSH3GZEY+VCR6ljwLhIv6+tvSVQxEZKUUjfVQXp5UuNwN3wKa+sW6li+FBl6
uV6lJcmdUffrurNhvSghIiSQGDkerHUVhSltgtj5FnmRp/AM95Z850t5a7qqc7Cv
duks9siLLaeC4K5W+AOcKLWXho1dJMIPWUej3ErCiHWnA20QiNXsQN4QoimkDKqf
9SYdcl6UECqV5MzIa/L7cW96S3K0acrq+8ofJCjN3A8M0pcTGgU=
=5DFQ
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS clent updates from Anna Schumaker:
"New Features:
- Implement the Sunrpc rfc2203 rpcsec_gss sequence number cache
- Add support for FALLOC_FL_ZERO_RANGE on NFS v4.2
- Add a localio sysfs attribute
Stable Fixes:
- Fix double-unlock bug in nfs_return_empty_folio()
- Don't check for OPEN feature support in v4.1
- Always probe for LOCALIO support asynchronously
- Prevent hang on NFS mounts with xprtsec=[m]tls
Other Bugfixes:
- xattr handlers should check for absent nfs filehandles
- Fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are
delegated
- Fix listxattr to return selinux security labels
- Connect to NFSv3 DS using TLS if MDS connection uses TLS
- Clear SB_RDONLY before getting a superblock, and ignore when
remounting
- Fix incorrect handling of NFS error codes in nfs4_do_mkdir()
- Various nfs_localio fixes from Neil Brown that include fixing an
rcu compilation error found by older gcc versions.
- Update stats on flexfiles pNFS DSes when receiving NFS4ERR_DELAY
Cleanups:
- Add a refcount tracker for struct net in the nfs_client
- Allow FREE_STATEID to clean up delegations
- Always set NLINK even if the server doesn't support it
- Cleanups to the NFS folio writeback code
- Remove dead code from xs_tcp_tls_setup_socket()"
* tag 'nfs-for-6.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (30 commits)
flexfiles/pNFS: update stats on NFS4ERR_DELAY for v4.1 DSes
nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer
nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()
nfs_localio: duplicate nfs_close_local_fh()
nfs_localio: simplify interface to nfsd for getting nfsd_file
nfs_localio: always hold nfsd net ref with nfsd_file ref
nfs_localio: use cmpxchg() to install new nfs_file_localio
SUNRPC: Remove dead code from xs_tcp_tls_setup_socket()
SUNRPC: Prevent hang on NFS mount with xprtsec=[m]tls
nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()
nfs: ignore SB_RDONLY when remounting nfs
nfs: clear SB_RDONLY before getting superblock
NFS: always probe for LOCALIO support asynchronously
pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS
NFS: add localio to sysfs
nfs: use writeback_iter directly
nfs: refactor nfs_do_writepage
nfs: don't return AOP_WRITEPAGE_ACTIVATE from nfs_do_writepage
nfs: fold nfs_page_async_flush into nfs_do_writepage
NFSv4: Always set NLINK even if the server doesn't support it
...
Instead of calling xchg() and unrcu_pointer() before
nfsd_file_put_local(), we now pass pointer to the __rcu pointer and call
xchg() and unrcu_pointer() inside that function.
Where unrcu_pointer() is currently called the internals of "struct
nfsd_file" are not known and that causes older compilers such as gcc-8
to complain.
In some cases we have a __kernel (aka normal) pointer not an __rcu
pointer so we need to cast it to __rcu first. This is strictly a
weakening so no information is lost. Somewhat surprisingly, this cast
is accepted by gcc-8.
This has the pleasing result that the cmpxchg() which sets ro_file and
rw_file, and also the xchg() which clears them, are both now in the nfsd
code.
Reported-by: Pali Rohár <pali@kernel.org>
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Fixes: 86e0041225 ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The nfsd_localio_operations structure contains nfsd_file_get() to get a
reference to an nfsd_file. This is only used in one place, where
nfsd_open_local_fh() is also used.
This patch combines the two, calling nfsd_open_local_fh() passing a
pointer to where the nfsd_file pointer might be stored. If there is a
pointer there an nfsd_file_get() can get a reference, that reference is
returned. If not a new nfsd_file is acquired, stored at the pointer,
and returned. When we store a reference we also increase the refcount
on the net, as that refcount is decrements when we clear the stored
pointer.
We now get an extra reference *before* storing the new nfsd_file at the
given location. This avoids possible races with the nfsd_file being
freed before the final reference can be taken.
This patch moves the rcu_dereference() needed after fetching from
ro_file or rw_file into the nfsd code where the 'struct nfs_file' is
fully defined. This avoids an error reported by older versions of gcc
such as gcc-8 which complain about rcu_dereference() use in contexts
where the structure (which will supposedly be accessed) is not fully
defined.
Reported-by: Pali Rohár <pali@kernel.org>
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Fixes: 86e0041225 ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Having separate nfsd_file_put and nfsd_file_put_local in struct
nfsd_localio_operations doesn't make much sense. The difference is that
nfsd_file_put doesn't drop a reference to the nfs_net which is what
keeps nfsd from shutting down.
Currently, if nfsd tries to shutdown it will invalidate the files stored
in the list from the nfs_uuid and this will drop all references to the
nfsd net that the client holds. But the client could still hold some
references to nfsd_files for active IO. So nfsd might think is has
completely shut down local IO, but hasn't and has no way to wait for
those active IO requests to complete.
So this patch changes nfsd_file_get to nfsd_file_get_local and has it
increase the ref count on the nfsd net and it replaces all calls to
->nfsd_put_file to ->nfsd_put_file_local.
It also changes ->nfsd_open_local_fh to return with the refcount on the
net elevated precisely when a valid nfsd_file is returned.
This means that whenever the client holds a valid nfsd_file, there will
be an associated count on the nfsd net, and so the count can only reach
zero when all nfsd_files have been returned.
nfs_local_file_put() is changed to call nfs_to_nfsd_file_put_local()
instead of replacing calls to one with calls to the other because this
will help a later patch which changes nfs_to_nfsd_file_put_local() to
take an __rcu pointer while nfs_local_file_put() doesn't.
Fixes: 86e0041225 ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The marquee feature for this release is that the limit on the
maximum rsize and wsize has been raised to 4MB. The default remains
at 1MB, but risk-seeking administrators now have the ability to try
larger I/O sizes with NFS clients that support them. Eventually the
default setting will be increased when we have confidence that this
change will not have negative impact.
With v6.16, NFSD now has its own debugfs file system where we can
add experimental features and make them available outside of our
development community without impacting production deployments. The
first experimental setting added is one that makes all NFS READ
operations use vfs_iter_read() instead of the NFSD splice actor. The
plan is to eventually retire the splice actor, as that will enable a
number of new capabilities such as the use of struct bio_vec from the
top to the bottom of the NFSD stack.
Jeff Layton contributed a number of observability improvements. The
use of dprintk() in a number of high-traffic code paths has been
replaced with static trace points.
This release sees the continuation of efforts to harden the NFSv4.2
COPY operation. Soon, the restriction on async COPY operations can
be lifted.
Many thanks to the contributors, reviewers, testers, and bug
reporters who participated during the v6.16 development cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmg1yGUACgkQM2qzM29m
f5frMA//TJTbSWiM7qBX1GhVMNr1lxQcjU4BPKo0qZfEtwV06F2BB9mWgDU+BIQh
AcGfMZUmNWAnhOTOYvwqyW6dnX+1yt8sBsCZ/1ctY30A4JH4AgG5sdZS7BUrlEEr
bGDMUCaPnvQ3maeDjMlefe7Xv/rUhj9TVhXmDkt4vf/jCde2JODTB/z8n7WeAxYJ
eOvmr/n5z6VI5Q67M7b5/xqofBEaEoq9P5UEgn61ThfeR0bMlrklm/avDCbbNIH8
6n7Z3tjzllK1CAjEmwHalq4LRbMX5FHWzNkyJw+wtviXS18J5vCAvRe+JDoykusu
L2bgXT8bBUqy46eO4WKEOJtEqVQhIsRFx/8ku1iTLrpDWlwrR4mHVyObEDkkdlMX
EyBQ4svg2OxCXSyy5O8oggzU0TWVJStIjbIEHbJYusWLU7HxxFveBwqwzYHXLtip
WKm6N2ANqQi1du+Pc6xmgXo9svA5Vk+DQjljm1Y5up9dhi2K9cvCIHjwFsZ+E0VL
XqXJ2YgIQb3oXK7FttzLOiDrpX1OX82sTIbgdcPcfT7lP+ej7uiHMBPmdPwgaZIU
EbIp0ThoTkh8/VRMDcWIt+B6SEhmb5vY3Zgz9Lcf2J0PM1fuYJ67L7xGTviFX7Ci
DpohiCgceb6PHYeIuarayF86tPJGF8Vb7XvQZej2Ybv8QdxLFg8=
=FbeG
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"The marquee feature for this release is that the limit on the maximum
rsize and wsize has been raised to 4MB. The default remains at 1MB,
but risk-seeking administrators now have the ability to try larger I/O
sizes with NFS clients that support them. Eventually the default
setting will be increased when we have confidence that this change
will not have negative impact.
With v6.16, NFSD now has its own debugfs file system where we can add
experimental features and make them available outside of our
development community without impacting production deployments. The
first experimental setting added is one that makes all NFS READ
operations use vfs_iter_read() instead of the NFSD splice actor. The
plan is to eventually retire the splice actor, as that will enable a
number of new capabilities such as the use of struct bio_vec from the
top to the bottom of the NFSD stack.
Jeff Layton contributed a number of observability improvements. The
use of dprintk() in a number of high-traffic code paths has been
replaced with static trace points.
This release sees the continuation of efforts to harden the NFSv4.2
COPY operation. Soon, the restriction on async COPY operations can be
lifted.
Many thanks to the contributors, reviewers, testers, and bug reporters
who participated during the v6.16 development cycle"
* tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits)
xdrgen: Fix code generated for counted arrays
SUNRPC: Bump the maximum payload size for the server
NFSD: Add a "default" block size
NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro
NFSD: Remove NFSD_BUFSIZE
sunrpc: Remove the RPCSVC_MAXPAGES macro
svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
sunrpc: Adjust size of socket's receive page array dynamically
SUNRPC: Remove svc_rqst :: rq_vec
SUNRPC: Remove svc_fill_write_vector()
NFSD: Use rqstp->rq_bvec in nfsd_iter_write()
SUNRPC: Export xdr_buf_to_bvec()
NFSD: De-duplicate the svc_fill_write_vector() call sites
NFSD: Use rqstp->rq_bvec in nfsd_iter_read()
sunrpc: Replace the rq_bvec array with dynamically-allocated memory
sunrpc: Replace the rq_pages array with dynamically-allocated memory
sunrpc: Remove backchannel check in svc_init_buffer()
sunrpc: Add a helper to derive maxpages from sv_max_mesg
svcrdma: Reduce the number of rdma_rw contexts per-QP
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBN6wAKCRCRxhvAZXjc
ok32AQD9DTiSCAoVg+7s+gSBuLTi8drPTN++mCaxdTqRh5WpRAD9GVyrGQT0s6LH
eo9bm8d1TAYjilEWM0c0K0TxyQ7KcAA=
=IW7H
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs directory lookup updates from Christian Brauner:
"This contains cleanups for the lookup_one*() family of helpers.
We expose a set of functions with names containing "lookup_one_len"
and others without the "_len". This difference has nothing to do with
"len". It's rater a historical accident that can be confusing.
The functions without "_len" take a "mnt_idmap" pointer. This is found
in the "vfsmount" and that is an important question when choosing
which to use: do you have a vfsmount, or are you "inside" the
filesystem. A related question is "is permission checking relevant
here?".
nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
functions. They pass nop_mnt_idmap and refuse to work on filesystems
which have any other idmap.
This work changes nfsd and cachefile to use the lookup_one family of
functions and to explictily pass &nop_mnt_idmap which is consistent
with all other vfs interfaces used where &nop_mnt_idmap is explicitly
passed.
The remaining uses of the "_one" functions do not require permission
checks so these are renamed to be "_noperm" and the permission
checking is removed.
This series also changes these lookup function to take a qstr instead
of separate name and len. In many cases this simplifies the call"
* tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
VFS: change lookup_one_common and lookup_noperm_common to take a qstr
Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
VFS: rename lookup_one_len family to lookup_noperm and remove permission check
cachefiles: Use lookup_one() rather than lookup_one_len()
nfsd: Use lookup_one() rather than lookup_one_len()
VFS: improve interface for lookup_one functions
We'd like to increase the maximum r/wsize that NFSD can support,
but without introducing possible regressions. So let's add a
default setting of 1MB. A subsequent patch will raise the
maximum value but leave the default alone.
No behavior change is expected.
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The 8192-byte maximum is a protocol-defined limit, and we already
have a symbolic constant defined whose name matches the name of
the limit defined in the protocol. Replace the duplicate.
No change in behavior is expected.
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: The documenting comment for NFSD_BUFSIZE is quite stale.
NFSD_BUFSIZE is used only for NFSv4 Reply these days; never for
NFSv2 or v3, and never for RPC Calls. Even so, the byte count
estimate does not include the size of the NFSv4 COMPOUND Reply
HEADER or the RPC auth flavor.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If we can get rid of all uses of rq_vec, then it can be removed.
Replace one use of rqstp::rq_vec with rqstp::rq_bvec.
The feeling of layering violation grows stronger now that
<linux/sunrpc/xdr.h> is included in fs/nfsd/vfs.c.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
All three call sites do the same thing.
I'm struggling with this a bit, however. struct xdr_buf is an XDR
layer object and unmarshaling a WRITE payload is clearly a task
intended to be done by the proc and xdr functions, not by VFS. This
feels vaguely like a layering violation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If we can get rid of all uses of rq_vec, then it can be removed.
Replace one use of rqstp::rq_vec with rqstp::rq_bvec.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observability here is now covered by static tracepoints.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There isn't a common helper for getattrs, so add these into the
protocol-specific helpers.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observe the start of NFS READDIR operations.
The NFS READDIR's count argument can be interesting when tuning a
client's readdir behavior.
However, the count argument is not passed to nfsd_readdir(). To
properly capture the count argument, this tracepoint must appear in
each proc function before the nfsd_readdir() call.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observe the start of RENAME operations for all NFS versions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observe the start of UNLINK, REMOVE, and RMDIR operations for all
NFS versions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observe the start of SYMLINK operations for all NFS versions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Observe the start of file and directory creation for all NFS
versions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace the dprintk in nfsd_lookup_dentry() with a trace point.
nfsd_lookup_dentry() is called frequently enough that enabling this
dprintk call site would result in log floods and performance issues.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Turn Sargun's internal kprobe based implementation of this into a normal
static tracepoint. Also, remove the dprintk's that got added recently
with the fix for zero-length ACLs.
Cc: Sargun Dillon <sargun@sargun.me>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Introduce tracing helpers that can be used before the procedure
status code is known. These macros are similar to the
SVC_RQST_ENDPOINT helpers, but they can be modified to include
NFS-specific fields if that is needed later.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Record and emit presentation addresses using tracing helpers
designed for the task.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 7862 states that if an NFS server implements a CLONE operation,
it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE,
but does not implement FATTR4_CLONE_BLKSIZE.
Note that in Section 12.2, RFC 7862 claims that
FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is
because a minor version is not permitted to add a REQUIRED
attribute. Confusing.
We assume this attribute reports a block size as a count of bytes,
as RFC 7862 does not specify a unit.
Reported-by: Roland Mainz <roland.mainz@nrubsig.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org # v6.7+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256
library API instead, which is much simpler and easier to use.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In nfs4_state_start_net(), laundromat_work may access nfsd_ssc through
nfs4_laundromat -> nfsd4_ssc_expire_umount. If nfsd_ssc isn't initialized,
this can cause NULL pointer dereference.
Normally the delayed start of laundromat_work allows sufficient time for
nfsd_ssc initialization to complete. However, when the kernel waits too
long for userspace responses (e.g. in nfs4_state_start_net ->
nfsd4_end_grace -> nfsd4_record_grace_done -> nfsd4_cld_grace_done ->
cld_pipe_upcall -> __cld_pipe_upcall -> wait_for_completion path), the
delayed work may start before nfsd_ssc initialization finishes.
Fix this by moving nfsd_ssc initialization before starting laundromat_work.
Fixes: f4e44b3933 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Very useful for gauging how long the vfs_fsync_range() takes.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If the request being processed is not a v4 compound request, then
examining the cstate can have undefined results.
This patch adds a check that the rpc procedure being executed
(rq_procinfo) is the NFSPROC4_COMPOUND procedure.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
When an export policy with xprtsec policy is set with "tls"
and/or "mtls", but an NFS client is doing a v3 xprtsec=tls
mount, then NLM locking calls fail with an error because
there is currently no support for NLM with TLS.
Until such support is added, allow NLM calls under TLS-secured
policy.
Fixes: 4cc9b9f2bf ("nfsd: refine and rename NFSD_MAY_LOCK")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It can be removed since svc_fill_write_vector already has the
same WARN_ON_ONCE.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSD currently has two separate code paths for handling read
requests. One uses page splicing; the other is a traditional read
based on an iov iterator.
Because most Linux file systems support splice read, the latter
does not get nearly the same test experience as splice reads.
To force the use of vectored reads for testing and benchmarking,
introduce the ability to disable splice reads for all NFS READ
operations.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Create a small sandbox under /sys/kernel/debug for experimental NFS
server feature settings. There is no API/ABI compatibility guarantee
for these settings.
The only documentation for such settings, if any documentation exists,
is in the kernel source code.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
As of now nfsd calls create_proc_exports_entry() at start of init_nfsd
and cleanup by remove_proc_entry() at last of exit_nfsd.
Which causes kernel OOPs if there is race between below 2 operations:
(i) exportfs -r
(ii) mount -t nfsd none /proc/fs/nfsd
for 5.4 kernel ARM64:
CPU 1:
el1_irq+0xbc/0x180
arch_counter_get_cntvct+0x14/0x18
running_clock+0xc/0x18
preempt_count_add+0x88/0x110
prep_new_page+0xb0/0x220
get_page_from_freelist+0x2d8/0x1778
__alloc_pages_nodemask+0x15c/0xef0
__vmalloc_node_range+0x28c/0x478
__vmalloc_node_flags_caller+0x8c/0xb0
kvmalloc_node+0x88/0xe0
nfsd_init_net+0x6c/0x108 [nfsd]
ops_init+0x44/0x170
register_pernet_operations+0x114/0x270
register_pernet_subsys+0x34/0x50
init_nfsd+0xa8/0x718 [nfsd]
do_one_initcall+0x54/0x2e0
CPU 2 :
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
PC is at : exports_net_open+0x50/0x68 [nfsd]
Call trace:
exports_net_open+0x50/0x68 [nfsd]
exports_proc_open+0x2c/0x38 [nfsd]
proc_reg_open+0xb8/0x198
do_dentry_open+0x1c4/0x418
vfs_open+0x38/0x48
path_openat+0x28c/0xf18
do_filp_open+0x70/0xe8
do_sys_open+0x154/0x248
Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu().
and same is happening on latest 6.14 kernel as well:
[ 0.000000] Linux version 6.14.0-rc5-next-20250304-dirty
...
[ 285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48
...
[ 285.464902] pc : cache_seq_next_rcu+0x78/0xa4
...
[ 285.469695] Call trace:
[ 285.470083] cache_seq_next_rcu+0x78/0xa4 (P)
[ 285.470488] seq_read+0xe0/0x11c
[ 285.470675] proc_reg_read+0x9c/0xf0
[ 285.470874] vfs_read+0xc4/0x2fc
[ 285.471057] ksys_read+0x6c/0xf4
[ 285.471231] __arm64_sys_read+0x1c/0x28
[ 285.471428] invoke_syscall+0x44/0x100
[ 285.471633] el0_svc_common.constprop.0+0x40/0xe0
[ 285.471870] do_el0_svc_compat+0x1c/0x34
[ 285.472073] el0_svc_compat+0x2c/0x80
[ 285.472265] el0t_32_sync_handler+0x90/0x140
[ 285.472473] el0t_32_sync+0x19c/0x1a0
[ 285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3)
[ 285.473422] ---[ end trace 0000000000000000 ]---
It reproduced simply with below script:
while [ 1 ]
do
/exportfs -r
done &
while [ 1 ]
do
insmod /nfsd.ko
mount -t nfsd none /proc/fs/nfsd
umount /proc/fs/nfsd
rmmod nfsd
done &
So exporting interfaces to user space shall be done at last and
cleanup at first place.
With change there is no Kernel OOPs.
Co-developed-by: Shubham Rana <s9.rana@samsung.com>
Signed-off-by: Shubham Rana <s9.rana@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
With rpc_status netlink support, unregister of register_filesystem()
was missed in case of genl_register_family() fails.
Correcting it by making new label.
Fixes: bd9d6a3efa ("NFSD: add rpc_status netlink support")
Cc: stable@vger.kernel.org
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Help the client resolve the race between the reply to an
asynchronous COPY reply and the associated CB_OFFLOAD callback by
planting the session, slot, and sequence number of the COPY in the
CB_SEQUENCE contained in the CB_OFFLOAD COMPOUND.
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The slot index number of the current COMPOUND has, until now, not
been needed outside of nfsd4_sequence(). But to record the tuple
that represents a referring call, the slot number will be needed
when processing subsequent operations in the COMPOUND.
Refactor the code that allocates a new struct nfsd4_slot to ensure
that the new sl_index field is always correctly initialized.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We have yet to implement a mechanism in NFSD for resolving races
between a server's reply and a related callback operation. For
example, a CB_OFFLOAD callback can race with the matching COPY
response. The client will not recognize the copy state ID in the
CB_OFFLOAD callback until the COPY response arrives.
Trond adds:
> It is also needed for the same kind of race with delegation
> recalls, layout recalls, CB_NOTIFY_DEVICEID and would also be
> helpful (although not as strongly required) for CB_NOTIFY_LOCK.
RFC 8881 Section 20.9.3 describes referring call lists this way:
> The csa_referring_call_lists array is the list of COMPOUND
> requests, identified by session ID, slot ID, and sequence ID.
> These are requests that the client previously sent to the server.
> These previous requests created state that some operation(s) in
> the same CB_COMPOUND as the csa_referring_call_lists are
> identifying. A session ID is included because leased state is tied
> to a client ID, and a client ID can have multiple sessions. See
> Section 2.10.6.3.
Introduce the XDR infrastructure for populating the
csa_referring_call_lists argument of CB_SEQUENCE. Subsequent patches
will put the referring call list to use.
Note that cb_sequence_enc_sz estimates that only zero or one rcl is
included in each CB_SEQUENCE, but the new infrastructure can
manage any number of referring calls.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Try not to prolong the wait for completion of a COPY or COPY_NOTIFY
operation.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Update the status of an async COPY operation when it has been
stopped. OFFLOAD_STATUS needs to indicate that the COPY is no longer
running.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- v6.15 libcrc clean-up makes invalid configurations possible
- Fix a potential deadlock introduced during the v6.15 merge window
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmgD2mYACgkQM2qzM29m
f5fn3xAAlPZP7MInbeszGMeRw9zdWaqbg4C90Gehdvi7sMBIYRo9E36NZuT4vl4D
dfpHhoMIPMO5gsGrOeJZAT27La1F8M+O4yZOiuheTfIikaBrbZcZZtpwdRQHayGw
qhHz8XoVNPsqvnoG+95u15XWG4IbI4kme2gNbYxEgF8zGERLv1aKPQTh3FqYG7Bk
+9ANjoPNCAlnUAYdVHJZ5kwfTR10pDwx+jC45pZA4+hAusLx1Ek43pkcBgLMvoh2
ylbWa6SKceVdh/kb3rWaKXtVptNjcaMWLUZNSpd6Bu9e3zDCfjJN0hKUamHD+hSn
HjYKbNU1W/l0wJsq9KKDwrOgE7LDZPP/mQ3MkFtMR/ccyTCAGe9Wy6nv7PnwEAu2
GqRh34VbNDGMPp4bpuTlp2FAL91BtGyYCkuwDZ1lqUDqnCmg4gRjyzFpkFF4lXPJ
hq4J3NgA9BjO9wlQ4P3KNKFgE34s/8R0ey7euyrI5hxFoVFKxr5DNgKXVeW0FsY/
zMaDLzVN2Uv37IxjKhc1XvE2Ml6+J3tRv9awv+Yp05tG8/3vAhogs7So3TtwfWdU
iWA3G0+zB30ATqdeXwpmJ8XxIqBnixDKhJcnLNOwrT+vAeqqqa0C39M0sOTQTMg3
VyFYkgt7v8jljp34DRfc4DAhTCsHzBzgAh9H/jrTP1xDlp7i7AU=
=vCFi
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- v6.15 libcrc clean-up makes invalid configurations possible
- Fix a potential deadlock introduced during the v6.15 merge window
* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: decrease sc_count directly if fail to queue dl_recall
nfs: add missing selections of CONFIG_CRC32
A deadlock warning occurred when invoking nfs4_put_stid following a failed
dl_recall queue operation:
T1 T2
nfs4_laundromat
nfs4_get_client_reaplist
nfs4_anylock_blockers
__break_lease
spin_lock // ctx->flc_lock
spin_lock // clp->cl_lock
nfs4_lockowner_has_blockers
locks_owner_has_blockers
spin_lock // flctx->flc_lock
nfsd_break_deleg_cb
nfsd_break_one_deleg
nfs4_put_stid
refcount_dec_and_lock
spin_lock // clp->cl_lock
When a file is opened, an nfs4_delegation is allocated with sc_count
initialized to 1, and the file_lease holds a reference to the delegation.
The file_lease is then associated with the file through kernel_setlease.
The disassociation is performed in nfsd4_delegreturn via the following
call chain:
nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg -->
nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease
The corresponding sc_count reference will be released after this
disassociation.
Since nfsd_break_one_deleg executes while holding the flc_lock, the
disassociation process becomes blocked when attempting to acquire flc_lock
in generic_delete_lease. This means:
1) sc_count in nfsd_break_one_deleg will not be decremented to 0;
2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to
acquire cl_lock;
3) Consequently, no deadlock condition is created.
Given that sc_count in nfsd_break_one_deleg remains non-zero, we can
safely perform refcount_dec on sc_count directly. This approach
effectively avoids triggering deadlock warnings.
Fixes: 230ca75845 ("nfsd: put dl_stid if fail to queue dl_recall")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available
only when CONFIG_CRC32 is enabled. But the only NFS kconfig option that
selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and
did not actually guard the use of crc32_le() even on the client.
The code worked around this bug by only actually calling crc32_le() when
CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases. This
avoided randconfig build errors, and in real kernels the fallback code
was unlikely to be reached since CONFIG_CRC32 is 'default y'. But, this
really needs to just be done properly, especially now that I'm planning
to update CONFIG_CRC32 to not be 'default y'.
Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select
CONFIG_CRC32. Then remove the fallback code that becomes unnecessary,
as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG.
Fixes: 1264a2f053 ("NFS: refactor code for calculating the crc32 hash of a filehandle")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit
mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support
idmapped mounts.
It also uses the lookup_one_len() family of functions which implicitly
use &nop_mnt_idmap. This mixture of implicit and explicit could be
confusing. When we eventually update nfsd to support idmap mounts it
would be best if all places which need an idmap determined from the
mount point were similar and easily found.
So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(),
and lookup_one_positive_unlocked(), passing &nop_mnt_idmap.
This has the benefit of removing some uses of the lookup_one_len
functions where permission checking is actually needed. Many callers
don't care about permission checking and using these function only where
permission checking is needed is a valuable simplification.
This change requires passing the name in a qstr. Currently this is a
little clumsy, but if nfsd is changed to use qstr more broadly it will
result in a net improvement.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>