10725 Commits

Author SHA1 Message Date
Rob Norris 8ff64005a2 zap: split implementation out into more files
The ZAP code is mixed up across a few files without clear separation of
concerns. This splits it out from three source files to five:

- zap.c: the bulk of the "public" interface
- zap_impl.c: internals shared across all backends
- zap_micro.c: microzap backend
- zap_fat.c: fatzap backend: core logic
- zap_leaf.c: fatzap backend: leaf blocks

Note that this does not change any code, just moves functions around.
Also note that right now the microzap and fatzap backends know more
about each other than is healthy. This change is simply marking out
where different things should live in the end, to make it easier for
that refactoring work to begin.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18516
2026-05-12 07:46:41 -07:00
Gality d50f5b6d0b dsl_dir: avoid dd_lock during snapshots_changed updates
Avoid holding dd_lock while updating the on-disk
snapshots_changed timestamp.

Both dsl_dir_zapify() and zap_update() may dirty buffers
and recurse into space accounting, which can take dd_lock.
Holding dd_lock across either operation can therefore
preserve the lock-order inversion reported by lockdep.

Only protect the in-memory dd_snap_cmtime update
with dd_lock. Perform the zapify and ZAP update without
dd_lock held, and retry the on-disk write if another updater
advanced dd_snap_cmtime while the write was in progress.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Co-authored-by: gality369 <gality369@example.com>
Closes #18472
2026-05-11 13:13:28 -07:00
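
The locking pattern described in d50f5b6d0b (update the in-memory value under the lock, do the slow write with the lock dropped, retry if someone else advanced the value) can be sketched as follows. This is only an illustration under stated assumptions: demo_dir_t, demo_write_disk() and the other demo_* names are hypothetical stand-ins, a pthread mutex plays the role of dd_lock, and the stand-in "disk write" has none of the synchronization the real ZAP layer provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the directory state; not the real dsl_dir_t. */
typedef struct demo_dir {
	pthread_mutex_t lock;	/* plays the role of dd_lock */
	uint64_t mem_time;	/* in-memory timestamp */
	uint64_t disk_time;	/* stand-in for the on-disk value */
	int disk_writes;	/* counts slow writes, for illustration */
} demo_dir_t;

void
demo_dir_init(demo_dir_t *dd)
{
	assert(dd != NULL);
	pthread_mutex_init(&dd->lock, NULL);
	dd->mem_time = 0;
	dd->disk_time = 0;
	dd->disk_writes = 0;
}

/* Stand-in for the zapify + ZAP update; called without the lock held. */
static void
demo_write_disk(demo_dir_t *dd, uint64_t t)
{
	dd->disk_time = t;
	dd->disk_writes++;
}

void
demo_update_time(demo_dir_t *dd, uint64_t now)
{
	uint64_t snap;

	assert(dd != NULL);

	/* Only the in-memory update happens under the lock. */
	pthread_mutex_lock(&dd->lock);
	if (now > dd->mem_time)
		dd->mem_time = now;
	snap = dd->mem_time;
	pthread_mutex_unlock(&dd->lock);

	for (;;) {
		/* Slow write with the lock dropped. */
		demo_write_disk(dd, snap);

		/* Retry if another updater advanced the value meanwhile. */
		pthread_mutex_lock(&dd->lock);
		if (dd->mem_time == snap) {
			pthread_mutex_unlock(&dd->lock);
			break;
		}
		snap = dd->mem_time;
		pthread_mutex_unlock(&dd->lock);
	}
}
```

The point of the shape is that nothing called while the lock is held can recurse back into code that takes the same lock; the retry loop covers the window opened by dropping it.
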
Christos Longros 968f4db039 zpool-attach.8: add EXAMPLES section
Mirror-attach (shared with zpool.8 example 5) and raidz expansion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18508
2026-05-11 12:19:28 -07:00
Christos Longros 35853ac849 CI: skip qemu matrix for documentation-only pull requests
Add a new "docs" CI type, selected when every file modified by a
pull request matches a documentation pattern (man pages, .md,
AUTHORS, COPYRIGHT, LICENSE, NOTICE, .gitignore). For this type the
os_selection is empty and the qemu matrix runs no jobs.

This affects only pull requests whose entire diff is documentation.
Any change touching a non-documentation file continues to be
classified as full, quick, linux, or freebsd by the existing
file-path rules, and a manual ZFS-CI-Type commit tag still overrides
that classification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18518
2026-05-11 12:16:48 -07:00
Mateusz Piotrowski 45dddc4523 zfs.4: Fix documentation of zfs_arc_dnode_reduce_percent
Fixes: 25458cbef Limit the amount of dnode metadata in the ARC
Fixes: 5b9f3b766 Soften pruning threshold on not evictable metadata

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #18513
2026-05-11 12:04:58 -07:00
Gality 9ae9f2e983 Linux: annotate nested xattr setattr znode locks
zfs_setattr() updates both the target znode and its hidden xattr
directory when ownership, mode, or project ID changes. The xattr
directory uses the same z_acl_lock and z_lock classes as the
parent znode, so lockdep reports recursive locking when the
second znode's mutexes are acquired.

This is a lockdep false positive rather than a real deadlock.
attrzp is the target file's hidden xattr directory, and the code
does not acquire these znode mutexes in the reverse order.
Acquire the attrzp mutexes with mutex_enter_nested() so lockdep
treats them as nested.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Co-authored-by: gality369 <gality369@example.com>
Closes #18506
2026-05-08 15:08:21 -07:00
Ameer Hamza c7cfe0805c zarcstat: detect attached L2ARC device with no data
zarcstat and zarcsummary detected L2ARC presence using the l2_size
kstat, which is data held in L2ARC, not whether a cache device is
attached. When a cache device was attached but empty (freshly added,
or fully evicted):

  - zarcstat rejected "-f l2*" with "Incompatible field specified!"
  - zarcsummary printed "L2ARC not detected, skipping section",
    hiding cumulative I/O history and health counters

Expose the existing l2arc_ndev counter as a new kstat l2_dev_count.
It is maintained by l2arc_add_vdev() and l2arc_remove_vdev(), so it
tracks attachment in real time. Use it in both tools, falling back to
l2_size for compatibility with older kernel modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18499
2026-05-08 15:01:47 -07:00
Alexander Motin 956deba27b zdb: detect BRT and DDT leaks during block traversal
During -b traversal, track BRT and DDT reference counts and report
blocks claimed more times than their reference tables account for
if it causes claim errors, instead of just asserting it.  Also
report entries with references not fully consumed by the traversal.

Add zdb leaks checks to cloning and dedup tests. This should make
sure the pools are in a sane state after completing the functional
tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18494
2026-05-08 11:34:59 -07:00
Brian Behlendorf 6a25950e72 ZTS: redundancy_draid_spare1
Preserve the 'zpool status' output used to calculate the number of
checksum errors so it can be logged on failure.  Several instances have
been observed in the CI where cksum was set to a non-zero value, yet a
subsequent run of 'zpool status' on failure showed no checksum errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18500
2026-05-07 15:57:07 -07:00
Sean Eric Fagan a2d053329c Add some more file layout output, triggered by -v
With one -v, the block type (parity or data) is printed (matching
the ASCII-art version); with two -v, the offset into the file is
also printed.

This also updates the man page, and adds some simple
test scripts.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Sean Fagan <sean.fagan@klarasystems.com>
Signed-off-by: Sean Fagan <sean.fagan@klarasystems.com>
Closes #18470
2026-05-07 13:22:38 -07:00
Gality 439b802e77 sa: fix sa_add_projid lock ordering
sa_add_projid() currently acquires hdl->sa_lock before zp->z_lock.
Several same-znode update paths take zp->z_lock and then call
sa_update() or sa_bulk_update() on the same SA handle.

On Linux, FS_IOC_FSSETXATTR reaches zfs_setattr() through
zpl_ioctl_setxattr() without outer inode serialization. This makes
the reversed lock order a real ABBA deadlock rather than a lockdep
false positive when projid is added to an old-format inode while
another thread updates the same znode.

Acquire zp->z_lock before hdl->sa_lock in sa_add_projid() to match
the existing znode update ordering.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Co-authored-by: gality369 <gality369@example.com>
Closes #18503
2026-05-07 13:20:44 -07:00
Brian Behlendorf 500b44eef2 ZTS: zpool_iostat_002_pos remove sleep
In the CI environment commands may occasionally take longer than
expected.  For zpool_iostat_002_pos this can cause a failure if fewer
than the expected number of lines are logged in time.  To prevent
this issue relax the time constraint and simply verify the command
ran to completion and generated the correct number of lines.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18501
2026-05-07 09:54:45 -07:00
Alexander Motin d65015938e Vdev allocation bias/class change
Normal, special and dedup vdevs differ only by space allocation
bias.  Normal and special vdevs might even legally store blocks
targeted to other classes.  Dedup vdevs don't normally do it, but
there is no real reason why they can't.  Considering this, it is
not impossible to change the allocation bias for those vdevs.

This change introduces a new top-level vdev property -- alloc_bias,
which reports the current bias for the vdev and allows it to be changed.
This makes it easy to change a vdev's role in a pool, especially if
vdev removal is impossible.  To not complicate the code, changes
take effect only on next pool import.

Changes to/from log vdev could also be theoretically possible, but
they are artificially blocked for now, partially due to additional
complications, and partially due to potential danger of placing
other blocks on log vdevs, that would otherwise be non-fatal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18493
2026-05-07 09:16:39 -07:00
Brian Behlendorf bdb8e8a2c5 ZTS: removal_with_export.ksh busy export
If the pool is active 'zpool export' will fail resulting in
a test failure.  Swap log_must with log_must_busy so the export
is retried when reported as busy before failing the test.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18498
2026-05-07 09:15:16 -07:00
Gality 8fdc866757 zfs: annotate nested dd_lock in reservation sync accounting
When reservation sync updates a child's reserved space, it rolls the
delta into ancestor space accounting while still holding the child's
dd_lock.  That locking order is intentional, but Linux lockdep sees
the ancestor acquisition as recursive because it lacks a nested lock
subclass annotation.

Teach the reservation-sync space-accounting path to acquire ancestor
dd_lock instances with a nested subclass.  Keep the existing public
interfaces and accounting behavior unchanged by routing only the
ancestor rollup through local helpers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: gality369 <gality369@example.com>
Closes #18497
2026-05-07 09:14:20 -07:00
Brian Behlendorf c4545ba037 ZTS: use 'zpool trim -w' in zpool_trim_partial.ksh
Don't use trim_progress() which is racy to wait for the pool trim
to complete.  Instead use the wait (-w) option which is intended
for this.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18496
2026-05-07 09:12:33 -07:00
Brian Behlendorf a12c6ed62f ZTS: Remove threadsappend_001_pos exception
Commit f828a80c may have resolved the underlying cause for
the occasional CI failures observed for this test.  Remove
the exception to ensure any new occurrences are noticed.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6136
Closes #18495
2026-05-06 09:44:33 -07:00
Ryan Libby 872f010193 Zstd: rework ZSTD_isError symbol renaming
The import of Zstd v1.5.7 in a2ac9cd606
added an unconditional renaming of ZSTD_isError to zfs_ZSTD_isError
with an asm directive.  Instead, do it with a define that is conditioned
on whether zstd_compat_wrapper.h is actually in use.  Also add a define
to that header so that it can be detected.  This allows the build to
work without using the compat wrapper.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #18483
2026-05-06 09:42:54 -07:00
Gality ae37f05d87 linux: verify stale znodes in legacy fallocate
The mode=0 and FALLOC_FL_KEEP_SIZE preallocation path can reach
zfs_freesp() directly and call zfs_statvfs() before going through the
normal zpl_enter_verify_zp() boundary.

When zfs_rezget() tears down a failed SA reload, a stale inode may
remain alive in the VFS with z_sa_hdl cleared. The unchecked
fallocate path can then reach sa_lookup(zp->z_sa_hdl, ...) through
zfs_statvfs() or zfs_freesp() and crash on a NULL SA handle.

Use zfs_enter_verify_zp() in zfs_statvfs() so stale znodes are
rejected under the teardown lock for both fallocate and statfs.
Also wrap the direct zfs_freesp() call in
zpl_enter_verify_zp()/zfs_exit() so this path follows the same
validation rules as the other Linux ZPL file operations.

Fixes: f734301d22
("linux: add basic fallocate(mode=0/2) compatibility")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Co-authored-by: gality369 <gality369@example.com>
Closes #18458
2026-05-06 09:40:14 -07:00
Christos Longros 5dd912192d Update description of spl_schedule_hrtimeout_slack_us
Clarify the effect of the non-zero value on wakeup coalescing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18467
2026-05-04 15:09:33 -07:00
Christos Longros b68c782d82 man: document three missing properties and tunables
Add manpage entries for parameters and properties that exist in
source but were not previously described:

- spl.4: spl_schedule_hrtimeout_slack_us
- zfsprops.7: longname
- vdevprops.7: raidz_expanding

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18467
2026-05-04 15:09:00 -07:00
Brian Behlendorf 2de4f4c742 CI: FreeBSD 15.1 PRERELEASE (#18490)
Update freebsd15-0s builder to freebsd15-1s and point it at the
15.1-PRERELEASE tag.  The previous freebsd-15.0-STABLE images are
no longer available.

Additionally, add a freebsd15-0r stanza for the RELEASE.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
2026-05-04 10:34:00 -07:00
Alexander Motin 366b1f9a3e Fix long POSIX_FADV_DONTNEED for single block files
dbuf_whichblock() is not made to handle offsets beyond the block
end for single-block objects.  Handle it in dmu_evict_range(),
similar to dmu_prefetch_by_dnode().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18399
Closes #18489
2026-05-04 10:22:47 -07:00
Tony Hutter f828a80cb6 CI/GCC: Add Fedora 44, fix build errors and threadsappend
- Add Fedora 44 to CI tests
- Fix build issues from the newer compiler. These are mostly 'char *'
  to 'const char *' conversions.
- Fix threadsappend.c test waiting for the same thread TID twice.
  This caused the test to hang on F44 (but strangely not other OSs?)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18478
2026-05-02 09:57:15 -07:00
Brian Behlendorf d5099c330b Initialize vr_last_txg for rebuild
Only call txg_wait_synced() when rebuild IOs were issued for this
metaslab.  This is a small optimization since in practice the first
metaslab is very likely to have allocations and cause vr_last_txg
to be initialized.  After this point when processing empty metaslabs
txg_wait_synced() is called but with an already committed txg so it
will not wait.  Still it's better not to call txg_wait_synced() at
all when it's not needed.

Reviewed-by: Andriy Tkachuk <atkachuk@wasabi.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18482
2026-05-02 09:55:39 -07:00
Ameer Hamza 0a59f7845c Avoid flushing unrelated NFS exports on snapshot unmount
zfsctl_snapshot_unmount() called exportfs_flush() before every umount
attempt to drop NFS export cache references that pin the snapshot
mountpoint.  The flush has global effect on the host's NFS exports and
clients, so paying it on every snapshot unmount (including auto-expire
rounds for snapshots that were never NFS-accessed) impacts unrelated
snapshots and clients.

ZFS cannot invalidate individual export cache entries because the
relevant sunrpc cache APIs are exported GPL-only.  Defer the global
flush so it runs only when the umount has actually failed, then retry
once.  Snapshots that are not NFS-pinned succeed on the first attempt
and never trigger the flush.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18476
2026-05-01 12:19:53 -07:00
Andriy Tkachuk b8d9596403 Fix rare cksum errors after rebuild
Currently, after a rebuild (aka sequential resilver), checksum
errors are sometimes seen on the spare vdev or draid spare.
On my laptop, it happens 2 to 4 times when running the
redundancy_draid_spare1 test in a loop 100 times.

It looks like there's a race in vdev_rebuild_thread() when the
rebuild of space map ranges is finished and we re-enable
allocations from the metaslab too soon: new allocations may
happen from that metaslab before the txg with the rebuilt ranges is
synced, causing undesirable interference.

Solution: wait for the txg to be synced before enabling the metaslab.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Andriy Tkachuk <atkachuk@wasabi.com>
Closes #18307
Closes #18319
Closes #18473
2026-05-01 12:15:27 -07:00
Manoj Joseph e78a51dd6f Fix off-by-one in PREVIOUSLY_REDACTED handler that drops last block
In send_reader_thread(), the PREVIOUSLY_REDACTED handler computed
file_max as MIN(dn->dn_maxblkid, range->end_blkid).  dn_maxblkid is
an inclusive maximum block ID while range->end_blkid is exclusive (one
past the last block).  The resulting file_max was then used as an
exclusive loop bound, causing the last block of any file (at index
dn_maxblkid) to be silently skipped when a PREVIOUSLY_REDACTED range
covered the end of the file.

The block was never written to the send stream so the receiver kept
zeros there.  ZFS reported no error because the stream itself was
valid; the data was simply absent.

Fix: use dn_maxblkid + 1 so file_max is consistently exclusive.

Add a regression test (redacted_max_blkid.ksh) that modifies only the
last block of a file in one clone, creates a redaction bookmark from
it, then sends an unmodified clone incrementally from that bookmark.
The PREVIOUSLY_REDACTED path must fill in the last block; the test
verifies it is not zeros and matches the original.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #18477
2026-05-01 12:03:29 -07:00
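
The off-by-one in e78a51dd6f comes from mixing an inclusive maximum with an exclusive end. A minimal sketch of the buggy and fixed bounds follows; the demo_* helpers and DEMO_MIN are hypothetical and only mirror the arithmetic described in the message, not the actual send_reader_thread() code.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))

/* Buggy bound: mixes an inclusive maximum with an exclusive end. */
uint64_t
demo_file_max_buggy(uint64_t maxblkid, uint64_t end_blkid)
{
	return (DEMO_MIN(maxblkid, end_blkid));
}

/* Fixed bound: convert the inclusive maximum to exclusive first. */
uint64_t
demo_file_max_fixed(uint64_t maxblkid, uint64_t end_blkid)
{
	return (DEMO_MIN(maxblkid + 1, end_blkid));
}

/* Count the blocks visited by a loop over [start, file_max). */
uint64_t
demo_blocks_visited(uint64_t start, uint64_t file_max)
{
	uint64_t n = 0;

	assert(start <= file_max);
	for (uint64_t blkid = start; blkid < file_max; blkid++)
		n++;
	return (n);
}
```

For a ten-block file (maximum block ID 9) and a range whose exclusive end is 10, the buggy bound is 9 and the loop visits nine blocks; the fixed bound is 10 and the loop visits all ten.
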
Rob Norris c18e8ba874 Linux 7.1: access dentry d_alias directly
The d_u union introduced in 3.18 is now anonymous, so we need to detect
it and decide the right way to name d_alias.

Note that we used to have support for both names to support kernels
before 3.18, so this commit is effectively reverting the commit that
removed that support, efc293e371.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18471
2026-05-01 11:52:57 -07:00
Rob Norris 6748e7e65e ZTS: add libzfs_mnttab_cache test
This is the repro test from #18464, and confirms that when disabled, the
libzfs_mnttab_cache is discarded and reloaded on every lookup.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Prakash Surya <prakash.surya@perforce.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18466
Closes #18464
2026-05-01 11:47:56 -07:00
Rob Norris a4a7df886f libzfs/mnttab: restore ability to enable/disable cache
In #18296 we made the cache "always on", with the justification that our
internal tools always enable the cache anyway. This allowed removing the
entire alternate implementation of libzfs_mnttab_find().

Unfortunately, it appears that there are still libzfs consumers out
there that were expecting to be able to disable the cache entirely, and
this broke some behaviour for them.

This commit restores the ability to enable or disable the cache (and
returns to "disabled" as the default, to preserve existing behaviour).
Fortunately there is no need for a whole second codepath; just a small
reorganisation to drop all cached entries each time.

Sponsored-by: TrueNAS
Reviewed-by: Prakash Surya <prakash.surya@perforce.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18466
Closes #18464
2026-05-01 11:46:14 -07:00
Rob Norris 84ffe564df AUTHORS: add names of recent new contributors
"Speak, friend, and enter."

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18475
2026-04-29 10:12:42 -07:00
Prakash Surya 4acb62930b libspl/mnttab: follow symlinks when resolving path via statx (#18469)
When the path argument to "zfs list -Ho name <path>" (or any caller of
zfs_path_to_zhandle()) is a symlink that crosses a mount boundary, the
wrong dataset is returned. Instead of returning the dataset that owns
the symlink's target, getextmntent() matches the dataset containing the
symlink itself.

For example, given two ZFS datasets "tank/ds1" and "tank/ds2", and a
symlink "/tank/ds1/link" pointing into "/tank/ds2":

    $ sudo zfs list -Ho name /tank/ds1/link
    tank/ds1

The expected (and previous) behavior is to return "tank/ds2", since the
symlink's target resides in that dataset.

The problem is in getextmntent(), in lib/libspl/os/linux/mnttab.c. That
function calls statx() on the caller-supplied path to obtain its mnt_id
(used to match against the mnt_id of each entry in /proc/self/mounts),
and it passes AT_SYMLINK_NOFOLLOW to that statx() call. As a result,
the mnt_id returned reflects the symlink's location rather than the
symlink target's mount, and the wrong /proc/self/mounts entry is
matched.

The same function also calls stat64() on the caller-supplied path
(used as a fallback when STATX_MNT_ID is not available, and to populate
the statbuf out-parameter). stat64() always follows symlinks, so the
statx() and stat64() calls were inconsistent: one resolved the symlink,
the other didn't. The AT_SYMLINK_NOFOLLOW behavior may be appropriate
when statx() is called on a mount entry from /proc/self/mounts (which
is always a real directory), but it is wrong for caller-supplied paths,
which may be symlinks.

This bug was introduced by 523d9d6007 ("Validate mountpoint on
path-based unmount using statx"), which added the STATX_MNT_ID code
path. However, the bug was latent: config/user-statx.m4 omitted
"#define _GNU_SOURCE" when checking for STATX_MNT_ID in <sys/stat.h>,
so HAVE_STATX_MNT_ID was never defined, and the buggy statx() path was
never compiled in. getextmntent() always fell back to the dev_t
comparison via stat64(), which correctly follows symlinks.

The fix to that autoconf check, in 2b930f63f8 ("config: fix
STATX_MNT_ID detection"), caused HAVE_STATX_MNT_ID to be properly
defined on kernels that support it, activating the broken
AT_SYMLINK_NOFOLLOW path for the first time and exposing the
regression.

The fix is to drop AT_SYMLINK_NOFOLLOW from the statx() call so that
symlinks are followed, matching the behavior of stat64() on the same
path.

Verified with a minimal reproducer: created two ZFS datasets, placed a
symlink inside the first pointing into the second, and confirmed that
"zfs list -Ho name <symlink>" returns the dataset containing the
symlink's target rather than the dataset containing the symlink.

Signed-off-by: Prakash Surya <prakash.surya@perforce.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
2026-04-28 09:24:24 -07:00
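
The effect of AT_SYMLINK_NOFOLLOW described in 4acb62930b can be observed with a small program. This sketch deliberately uses POSIX fstatat() rather than statx(), as an analogue that accepts the same flag; it does not touch mount IDs. The demo_* functions and the /tmp scratch path are hypothetical choices for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define	_POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Report whether "path", examined with the given fstatat() flags, is
 * seen as a symlink.  With AT_SYMLINK_NOFOLLOW the link itself is
 * examined; with 0 the link is followed and its target is examined.
 */
bool
demo_sees_symlink(const char *path, int flags)
{
	struct stat st;

	assert(path != NULL);
	if (fstatat(AT_FDCWD, path, &st, flags) != 0)
		return (false);
	return (S_ISLNK(st.st_mode));
}

/*
 * Build a scratch directory holding a regular file and a symlink to
 * it, then record what each flag setting observes.  Returns 0 on
 * success, -1 if the scratch files could not be created.
 */
int
demo_compare_follow(bool *nofollow_is_link, bool *follow_is_link)
{
	char dir[] = "/tmp/demo_symXXXXXX";
	char target[64], lnk[64];
	int fd, ret = -1;

	assert(nofollow_is_link != NULL && follow_is_link != NULL);
	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(target, sizeof (target), "%s/target", dir);
	snprintf(lnk, sizeof (lnk), "%s/link", dir);

	fd = open(target, O_CREAT | O_WRONLY, 0600);
	if (fd >= 0) {
		close(fd);
		if (symlink(target, lnk) == 0) {
			*nofollow_is_link =
			    demo_sees_symlink(lnk, AT_SYMLINK_NOFOLLOW);
			*follow_is_link = demo_sees_symlink(lnk, 0);
			ret = 0;
			unlink(lnk);
		}
		unlink(target);
	}
	rmdir(dir);
	return (ret);
}
```

With the flag set, the metadata describes the link; without it, the metadata describes the target. The commit's fix relies on the second behaviour so that the statx() result agrees with stat64() on the same path.
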
Christos Longros cd06f79e29 build: use pax tar format for make dist
Automake's default tar formats (v7 pre-1.18, ustar since) impose path
length limits that drop several long test filenames from the release
tarball when `make dist` runs. Pax format has no such limit and is
read by GNU tar 1.14+ and libarchive/bsdtar.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes: #17276
Closes: #18465
2026-04-25 15:24:38 -07:00
Ryan Moeller 2a9a70a2af include: Remove duplicate lzc_send_space prototype
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18463
2026-04-25 15:07:16 -07:00
Tony Hutter 2d7ed99145 CI: curl fallback, print killed tests, FreeBSD URL
- We've seen occasional 'ERROR 502: Bad Gateway' from the runner trying
to download an image with axel.  Axel can open multiple connections for
a faster download, so maybe that's causing problems.  This commit adds
in a fallback to curl if the axel download doesn't work.

- Update merge_summary.awk to print out killed tests in the summary.
We've seen cases where the summary page was red but there were no test
failures printed.  This is because one of the VMs had too many
killed tests, which caused the total test time to run too long and
caused the runner to timeout qemu-6-tests.sh. When the runner kills off
qemu-6-tests.sh, it means we never generate the nice summary page
for that VM listing the killed off tests.  This commit parses the
partial test logs for killed off tests and includes them in the
merge_summary.awk output.

- Print an error message in the summary page if one of the VMs
didn't complete ZTS.  This helps draw attention to a VM crash.

- FreeBSD sometimes has broken links to their CI image. When that
happens, select the newest nightly snapshot image as an alternative.
This is needed right now, since the current images in the FreeBSD 16
"current/" directory are returning 404 errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18460
2026-04-25 14:44:58 -07:00
Christos Longros 4a58ab8ce2 zfs.4: document five missing module parameters
Add entries for module parameters that are exposed via
ZFS_MODULE_PARAM but not covered in zfs.4:

  zfs_active_allocator          (charp,  module/zfs/metaslab.c)
  zfs_compressed_arc_enabled    (int,    module/zfs/arc.c)
  zfs_arc_no_grow_shift         (uint,   module/os/freebsd/zfs/arc_os.c)
  zfs_scan_blkstats             (int,    module/zfs/dsl_scan.c)
  zfs_snapshot_history_enabled  (int,    module/zfs/dsl_dataset.c)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18456
2026-04-25 14:39:43 -07:00
Alek P 8da4729732 key lookup failure should always return EACCES
spa_do_crypt_abd() already maps a missing key to EACCES. However
spa_do_crypt_mac_abd(), spa_do_crypt_objset_mac_abd(), and
spa_crypt_get_salt() still return the raw
spa_keystore_lookup_key() error (ENOENT). This is inconsistent,
as we want to treat all "no key" failures as permission
failures. Standardize on EACCES for the unloaded-key case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Closes #18448
2026-04-23 13:55:28 -07:00
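
The error normalization in 8da4729732 amounts to translating the lookup's "not found" into a permission error at the call site. A minimal sketch follows; demo_lookup_key() and demo_crypt_op() are hypothetical, and passing other errors through unchanged is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lookup stand-in: 0 when the key is loaded, else ENOENT. */
static int
demo_lookup_key(int key_loaded)
{
	return (key_loaded ? 0 : ENOENT);
}

int
demo_crypt_op(int key_loaded)
{
	int err;

	assert(key_loaded == 0 || key_loaded == 1);

	err = demo_lookup_key(key_loaded);
	if (err != 0) {
		/* Report a missing key as a permission failure. */
		return (err == ENOENT ? EACCES : err);
	}

	/* ... the actual cryptographic work would happen here ... */
	return (0);
}
```

Callers then only need to handle one error code for the unloaded-key case, regardless of which entry point they used.
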
Brian Behlendorf 9dd3c653c2 ZTS: zpool_iostat_002_pos increase sleep time
Allow an additional second for the test to complete before checking
the results.  This may explain occasional test failures in the CI.
Additionally, when the test fails dump the tmpfile for inspection.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18455
2026-04-23 13:54:22 -07:00
Brian Behlendorf 91f9b11331 ZTS: add targeted redundancy_draid_spare exception
When sequentially resilvering a dRAID pool it's possible that a few
correctable checksum errors will be reported.  This is a known issue
which is occasionally observed in the CI.  Until it's resolved we
want the test case to tolerate a few checksum errors in this scenario
to prevent false positives in the CI.

This change also has the additional side effect of standardizing in
one location how the dRAID pool integrity is verified.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #18307
Issue #18319
Closes #18436
2026-04-23 13:45:48 -07:00
Tony Hutter fc6aa4369e Fix 'kernel BUG at mm/usercopy.c'
Fix a bug where a cgroup-OOM-killed process can cause a panic:

usercopy: Kernel memory exposure attempt detected from vmalloc (offset
1007584, size 217120)!
kernel BUG at mm/usercopy.c:102!

This was caused by zfs_uiomove() not correctly returning EFAULT
for short copies.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15918
Closes #18408
2026-04-23 10:52:19 -07:00
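
The short-copy handling that fc6aa4369e refers to can be illustrated as follows. The sketch models the Linux convention in which the user-copy primitives return the number of bytes that could not be copied; demo_copy(), demo_uiomove() and the limit parameter (which simulates a fault partway through) are hypothetical and are not the zfs_uiomove() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copy primitive that returns the number of bytes NOT
 * copied.  "limit" simulates a fault after that many bytes.
 */
static size_t
demo_copy(void *dst, const void *src, size_t len, size_t limit)
{
	size_t done = len < limit ? len : limit;

	memcpy(dst, src, done);
	return (len - done);
}

int
demo_uiomove(void *dst, const void *src, size_t len, size_t limit)
{
	assert(dst != NULL && src != NULL);

	/* Any uncopied remainder is a short copy: report EFAULT. */
	if (demo_copy(dst, src, len, limit) != 0)
		return (EFAULT);
	return (0);
}
```

Treating a nonzero remainder as success is the kind of mistake that lets a caller continue with a length that no longer matches what was transferred.
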
Brian Behlendorf 67589348e3 ZTS: snapshot_018_pos.ksh add extra margin
The date(1) command and snapshot timestamps use different clock
sources which can result in a small discrepancy.  This can cause
the test to incorrectly fail.  To avoid this, add a brief delay
to the test case to allow for minor skew.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18450
2026-04-23 10:23:22 -07:00
Brian Behlendorf 60a7f64d1c ZTS: mmp_on_uberblocks.ksh simplify
The last portion of mmp_on_uberblocks.ksh was intended to verify that
the sequence number was incremented.  However, it failed to account for
the case where a txg sync would occur resulting in the sequence number
being correctly reset.

Rather than add additional code to detect this, that check has been
removed.  The mmp update frequency is still verified via the kstat
which is a more reliable mechanism to detect the writes.  There are
several other mmp tests which verify the uberblock changes are reflected
on disk so there's no significant loss of test coverage.

Finally, the test case has been simplified to use the within_percent
function for readability.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18452
2026-04-23 10:21:57 -07:00
Christos Longros 207202cde3 ZTS: fix trim test portability for FreeBSD
Replace GNU-specific du flags (--block-size, -B1) and dd conv=nocreat
with POSIX compatible commands. Move -O flag before pool name in
zpool create to align with FreeBSD's strict POSIX getopt(). Relax vdev
size thresholds in trim_config to account for ZFS-on-ZFS overhead.
Add sync_pool before zpool trim -w to ensure freed blocks are committed
before trimming.

Skip zpool_trim_partial, zpool_trim_verify_trimmed, trim_config, and
autotrim_config on FreeBSD where trim does not reclaim space on file
vdevs stored on a ZFS filesystem within the test framework.

Tested on FreeBSD 16.0-CURRENT: 26 PASS, 4 SKIP, 0 FAIL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18398
2026-04-22 15:46:41 -07:00
Christos Longros 7f9a480698 ZTS: remove outdated FreeBSD skip from trim tests
FreeBSD has supported hole punching via fspacectl(2) since
FreeBSD 14.0 and the test library already handles this using
truncate -d. Remove the skip that prevented trim tests from
running on FreeBSD.

Tests will still skip if the hardware does not support
TRIM/UNMAP, which is checked separately via diskinfo.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18398
2026-04-22 15:46:12 -07:00
Brian Behlendorf 3162c631ee ZTS: zpool_export_parallel_admin.sh busy export
If the pool is active 'zpool export' will fail resulting in
a test failure.  Swap log_must with log_must_busy so the export
is retried when reported as busy before failing the test.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18447
2026-04-22 13:08:54 -07:00
Pranav P 2eee4ac1ea Fix: draid autopkgtests fail on s390x architecture (Endianness Issue)
The ioctl call to create the pool was returning -1 with errno EINVAL.
In the module code, in vdev_draid.c, verify_perms calls
fletcher_4_native_varsize, which in turn calls fletcher_4_scalar_native.
To fix this, implement fletcher_4_byteswap_varsize, which uses
fletcher_4_scalar_byteswap on big-endian machines.

Reviewed-by: Andriy Tkachuk <andriy.tkachuk@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pranav P <pranavsdreams@gmail.com>
Closes #16261
Closes #18445
2026-04-22 09:53:48 -07:00
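
The relationship between the native and byteswap Fletcher-4 variants mentioned in 2eee4ac1ea can be sketched as below. This assumes the usual Fletcher-4 formulation (four 64-bit running sums over 32-bit words); the demo_* names are hypothetical and this is not the module's fletcher_4_* code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the four running sums. */
typedef struct demo_cksum {
	uint64_t w[4];
} demo_cksum_t;

/* Reverse the byte order of one 32-bit word. */
uint32_t
demo_bswap32(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) | (x << 24));
}

/* Shared loop; "swap" selects whether each word is byte-swapped first. */
static void
demo_fletcher4(const uint32_t *buf, size_t nwords, int swap,
    demo_cksum_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	assert(buf != NULL && out != NULL);
	for (size_t i = 0; i < nwords; i++) {
		a += swap ? demo_bswap32(buf[i]) : buf[i];
		b += a;
		c += b;
		d += c;
	}
	out->w[0] = a;
	out->w[1] = b;
	out->w[2] = c;
	out->w[3] = d;
}

/* Words are interpreted in host byte order. */
void
demo_fletcher4_native(const uint32_t *buf, size_t nwords, demo_cksum_t *out)
{
	demo_fletcher4(buf, nwords, 0, out);
}

/* Words are interpreted in the opposite of host byte order. */
void
demo_fletcher4_byteswap(const uint32_t *buf, size_t nwords,
    demo_cksum_t *out)
{
	demo_fletcher4(buf, nwords, 1, out);
}
```

Running the byteswap variant over byte-swapped data yields the same sums as the native variant over the original data, which is why a checksum computed on one endianness can still be verified on the other.
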
Jan Martin Mikkelsen 513710ed21 Fix "panic: cache_vop_rename: lingering negative entry"
A FreeBSD ZFS filesystem with properties "utf8only=on" and
"normalization=formD" consistently produces this panic when
building the lang/perl-5.42.0 port.

A ZFS file system with "utf8only=off" and "normalization=none"
works fine.

The cause of the panic seems to be incorrectly using the FreeBSD
namecache when normalisation is present. This commit adds a
predicate to prevent that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jan Martin Mikkelsen <janm-github@transactionware.com>
Closes #18430
2026-04-21 14:19:10 -07:00
Paul Dagnelie 6562851406 Handle raidz errors <= nparity rather than ignoring
This PR adds a check in the mirror and raidz code for the case where
there are errors <= nparity. In that case, ZFS sets a new flag on
the zio that will be checked in zio_done. If that flag is set, when
the write IO completes, we issue a read IO for the same blkptr.
That will allow ZFS's auto-healing mechanisms and other error
recovery tools to detect the effectively-corrupt data, and handle
it accordingly. Note that because draid uses raidz's IO done function,
it also benefits from this functionality.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #18387
2026-04-21 14:17:37 -07:00
Tony Hutter f798b40000 CI: Add more debugging to qemu-1-setup.sh
- Remove line where we disable stdout at the end of qemu-1-setup.sh
- Fix comment switching the 2x75GB -> 1x150GB cases
- Add some more debug to the end of the script

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18441
2026-04-20 10:50:47 -07:00