Commit Graph

10801 Commits

Author SHA1 Message Date
Tony Hutter 59dc88602e nvpair: Check for un-terminated strings in packed nvlist
Add checks to verify that a packed string or string array nvpair
is terminated.  More specifically, verify that calling strlen() on the
prospective string does not overrun the packed nvlist buffer.
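A minimal sketch of the bounds check described above, assuming a flat packed buffer and a known offset for the string. The helper names are hypothetical and this is not the actual nvpair code. It uses memchr() to look for a NUL within the remaining bytes, so a later strlen() at that offset cannot read past the buffer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the real nvpair code): true if a NUL exists
 * between buf + offset and the end of the packed buffer, so strlen()
 * at that offset cannot read past buflen.
 */
static bool
packed_str_is_terminated(const char *buf, size_t buflen, size_t offset)
{
	assert(buf != NULL);
	if (offset >= buflen)
		return (false);
	return (memchr(buf + offset, '\0', buflen - offset) != NULL);
}

/*
 * For a string array, each of the nelem strings must be terminated,
 * one after another, inside the buffer.
 */
static bool
packed_strarray_is_terminated(const char *buf, size_t buflen,
    size_t offset, unsigned int nelem)
{
	for (unsigned int i = 0; i < nelem; i++) {
		const char *nul;

		if (offset >= buflen)
			return (false);
		nul = memchr(buf + offset, '\0', buflen - offset);
		if (nul == NULL)
			return (false);
		offset = (size_t)(nul - buf) + 1;	/* next string */
	}
	return (true);
}
```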

Also add checks in the libzfs_input_checks test case to verify
un-terminated strings, and add an nvlist ioctl payload fuzz test for
good measure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18604
2026-06-01 14:55:20 -07:00
Alexander Motin 4bc8c39b62 zed: Prefer dRAID distributed spares to regular ones
One of the main dRAID features is avoiding single drive bottlenecks
by using distributed spares.  Activating a regular spare takes more
time, during which dRAID redundancy is even lower than in the RAIDZ
case.  But regular spares might still be added to the pool as a second
line of defence, possibly shared by several vdevs.
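The preference can be sketched as a stable partition: distributed spares move to the front while both groups keep their original relative order. The spare_t type and prefer_distributed_spares() are hypothetical simplifications, not zed's real data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified spare descriptor; not zed's real types. */
typedef struct spare {
	const char	*name;
	bool		is_draid_dspare;	/* dRAID distributed spare? */
} spare_t;

/*
 * Stable partition: distributed spares first, then regular spares,
 * each group in its original order.  scratch must hold n entries.
 */
static void
prefer_distributed_spares(spare_t *spares, spare_t *scratch, size_t n)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)
		if (spares[i].is_draid_dspare)
			scratch[k++] = spares[i];
	for (size_t i = 0; i < n; i++)
		if (!spares[i].is_draid_dspare)
			scratch[k++] = spares[i];
	assert(k == n);
	for (size_t i = 0; i < n; i++)
		spares[i] = scratch[i];
}
```

Regular spares stay in the list, so they are still tried once the distributed spares are exhausted.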

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18578
2026-06-01 14:49:38 -07:00
Christos Longros 20d56830f9 CI: add concurrency support to zfs-arm
The zfs-arm workflow was the only build/test workflow without a
concurrency block, so superseded runs were not cancelled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18608
2026-05-31 18:27:40 -07:00
Christos Longros bfb914ca58 CI: apt-get update before purging host packages
The package removal ran against a stale package index and failed to
fetch a package that had been removed from the repository. Refresh
the index first.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18607
Closes #18609
2026-05-31 18:25:26 -07:00
Alek P c90dc28089 enforce exact decompressed length for lz4, gzip, and zstd
Decompressors must expand a ZFS block to exactly the expected number
of bytes. Treat decompression to an unexpected length as failure, so
truncated or short output is not accepted as valid decompression. This
makes our handling of decompress return values consistent with the
decompression functions' APIs.
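The rule can be sketched as a wrapper that accepts a result only when exactly d_len bytes were produced. The decompress_fn_t signature, decompress_exact() and the copy-based toy decompressor are hypothetical and do not match the real lz4, gzip or zstd wrappers:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical decompressor signature (an assumption): returns the
 * number of bytes written to dst, or (size_t)-1 on a hard error.
 */
typedef size_t (*decompress_fn_t)(const void *src, size_t s_len,
    void *dst, size_t d_len);

/* Succeed only if exactly d_len bytes were produced; short output fails. */
static int
decompress_exact(decompress_fn_t fn, const void *src, size_t s_len,
    void *dst, size_t d_len)
{
	size_t produced;

	assert(fn != NULL);
	produced = fn(src, s_len, dst, d_len);
	if (produced == (size_t)-1 || produced != d_len)
		return (EIO);
	return (0);
}

/* Toy "decompressor" for illustration: copies input verbatim. */
static size_t
copy_decompress(const void *src, size_t s_len, void *dst, size_t d_len)
{
	size_t n = s_len < d_len ? s_len : d_len;

	memcpy(dst, src, n);
	return (n);
}
```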

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alek Pinchuk <Alek.Pinchuk@connectwise.com>
Closes #18599
2026-05-29 18:13:39 -07:00
Timothy Day eafa39fbc3 build: add ZFS_DEBUG Kconfig for copy-builtin
... so we can toggle ZFS debug assertions from the
Linux kernel build without having to regenerate the
ZFS patch.

Update the qemu test script to also set this kernel
config.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Timothy Day <timday@thelustrecollective.com>
Co-authored-by: Timothy Day <timday@thelustrecollective.com>
Closes #18595
2026-05-29 09:40:14 -07:00
Christos Longros ec65e4b6bb CI: skip smatch, zloop, and zfs-arm for documentation-only changes
Follow-up to #18518, which skipped the qemu matrix on doc-only PRs.
zloop, zfs-arm, and smatch are irrelevant to doc-only changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18601
2026-05-28 17:32:07 -07:00
Brian Behlendorf d13663b17c CI: Lustre 6.16 kernel compatibility fix (#18602)
AlmaLinux 9 and 10 kernels now include a backport of Linux commit
v6.15-13744-g41cb08555c41 which renames the from_timer() function
to timer_container_of().  Apply the upstream Lustre compatibility
patch to our builds.  This patch should be included in the next
Lustre release and can be dropped then.

ZFS-CI-Type: quick

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2026-05-28 15:45:43 -07:00
Alexander Motin 472ddca116 zed: Prefer spares with matching rotational and size
Before this change, zed tried to activate spares in the order they
are stored in the configuration, which is quite arbitrary.  To make
a better choice, sort the spares by their rotational status and size,
so that the most fitting ones have better chances.
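One way to express "most fitting" is sketched below. The ranking is an assumption for illustration, not zed's actual policy: spares large enough and with the same rotational status as the failed device come first, then large-enough spares of the other kind, then spares too small to replace it; ties go to the smaller spare. devinfo_t and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; not zed's real data structures. */
typedef struct devinfo {
	bool		rotational;
	uint64_t	asize;		/* allocatable size in bytes */
} devinfo_t;

/* Assumed ranking, lower is better: 0 match, 1 other kind, 2 too small. */
static int
spare_rank(const devinfo_t *spare, const devinfo_t *failed)
{
	if (spare->asize < failed->asize)
		return (2);
	return (spare->rotational == failed->rotational ? 0 : 1);
}

/* Negative if a fits better than b; ties broken by smaller asize. */
static int
spare_cmp(const devinfo_t *a, const devinfo_t *b, const devinfo_t *failed)
{
	int ra = spare_rank(a, failed), rb = spare_rank(b, failed);

	if (ra != rb)
		return (ra < rb ? -1 : 1);
	if (a->asize != b->asize)
		return (a->asize < b->asize ? -1 : 1);
	return (0);
}

/* Stable insertion sort; spare lists are short. */
static void
sort_spares(devinfo_t *spares, size_t n, const devinfo_t *failed)
{
	assert(failed != NULL);
	for (size_t i = 1; i < n; i++) {
		devinfo_t key = spares[i];
		size_t j = i;

		while (j > 0 && spare_cmp(&spares[j - 1], &key, failed) > 0) {
			spares[j] = spares[j - 1];
			j--;
		}
		spares[j] = key;
	}
}
```

An insertion sort is used instead of qsort() because the comparison needs the failed device as context, which standard qsort() cannot pass.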

To make it more visible, export the rotational status as a vdev
property.  While at it, minimally fix reading vdev properties for
spare and L2ARC vdevs, which have no ZAPs.

To keep the rotational status for spare activation purposes when the
failed device is already gone, save it into the vdev config.  Do the
same for the spare vdevs' asize.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18597
2026-05-28 15:14:26 -07:00
Christos Longros 3250b4393e CI: Update checkstyle checkout action to v6
The checkstyle workflow was the only one still pinned to
actions/checkout@v4; the other workflows already use v6.
Bump it to match.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18600
2026-05-28 15:06:46 -07:00
Mark Johnston e30ab5fa4f FreeBSD: Make it possible to build openzfs.ko with sanitizers
Add make options which compile the kernel modules with the address
sanitizer, memory sanitizer, or undefined behaviour sanitizer enabled,
respectively.  This makes it much easier to run the ZTS with those
sanitizers enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Longros <chris.longros@gmail.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18596
2026-05-28 09:02:48 -07:00
tiehexue dc585960e0 Linux 5.6 compat: fix fs_parse API mismatch
Add an m4 macro to check the fs_parse() API signature, plus wrappers.
Before 5.6, fs_parse() took a struct fs_parameter_description, which
wrapped the parameter specs with name and enum pointers.  From 5.6, the
description struct was removed and fs_parse() accepts the
fs_parameter_spec directly.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: tiehexue <tiehexue@hotmail.com>
Closes #18585
2026-05-27 10:07:55 -07:00
Christos Longros 6303a58242 spa: expose max_missing_tvds_cachefile and _scan on Linux
Register the two siblings of zfs_max_missing_tvds via
ZFS_MODULE_PARAM in spa.c

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18589
2026-05-26 17:15:42 -07:00
Christos Longros 8bfac28f15 .github: update workflows README
Describe the current zfs-qemu pipeline, ci_type selection, supported
guests, and the code-checking and other auxiliary workflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18590
2026-05-26 15:45:38 -07:00
Rob Norris 6c08f5db51 config: detect the right way to get pthreads
To get at userspace threads, we pass a mix of -pthread and -lpthread to
the compiler and/or linker. That's fine enough for the platforms we
target but it's not exactly right (eg on Linux -pthread defines
_REENTRANT, which -lpthread does not), and won't work properly on some
other platforms that we might end up on someday (eg illumos).

There's also a danger if we link together two compilation units, one
compiled with -pthread, one not, as calls between them may not properly
manage thread state.

Here we switch to use the AX_PTHREAD macro to detect the correct set of
flags for CFLAGS and LIBS, and add them to the default compilation
flags for all units.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18588
2026-05-26 12:49:53 -07:00
Rob Norris 1294d44203 test_zap: cover all core ZAP operations
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18586
2026-05-26 12:12:50 -07:00
Rob Norris 6ecaa194b6 zap: expose _by_dnode() variants of remaining core functions
Exposes the remaining internal implementation functions:
- zap_update_by_dnode()
- zap_length_by_dnode()
- zap_get_stats_by_dnode()

And creates zap_contains_by_dnode(), following the same structure as the
other functions.

Together, these complete the "core" ZAP _by_dnode() API for the test
suite to use.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18586
2026-05-26 12:12:42 -07:00
Rob Norris 605ae84102 unit: TOPT make arg to pass test options through to the test binary
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18586
2026-05-26 12:12:38 -07:00
Rob Norris 2e5b9bd116 unit: zero coverage counters before coverage run
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18586
2026-05-26 12:12:15 -07:00
Christos Longros efdc755761 ZTS/zinject: cover label, object, delay, panic and verify effect (#18579)
* ZTS/zinject: cover label, object, delay, panic and verify effect

Cover the device, label, object, delay and panic injection modes:
every valid value is accepted and unknown values are rejected. A
final pass confirms that registered injections execute by watching
the inject counter advance after triggering the desired injected
error.

Signed-off-by: Christos Longros <chris.longros@gmail.com>

* ZTS/zinject: add author copyright

Signed-off-by: Christos Longros <chris.longros@gmail.com>

---------

Signed-off-by: Christos Longros <chris.longros@gmail.com>
2026-05-26 12:10:44 -07:00
Christos Longros 88656cc95b ZTS/alloc_class: move file_in_special_vdev to alloc_class.kshlib
Move the function into the shared library.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18584
2026-05-25 16:07:02 -07:00
Christos Longros af0228bb54 ZTS: zpool_expand_005_pos: correct variable name in expandsize check
The check referenced $zpool_expandsize, which is not defined in this
test; the variable assigned two lines above is $expandsize. A "-"
value returned by zpool reopen therefore did not trigger the
intended log_fail, and the failure surfaced only at the later size
check after 'zpool online -e', with a less specific message.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18580
2026-05-25 16:05:07 -07:00
Andrew Walker 112b0131b9 zpl_xattr: stop heap-allocating prefixed xattr names
The six __zpl_xattr_{user,trusted,security}_{get,set} entry points
built their prefixed name via kmem_asprintf("%s%s", prefix, name)
and freed it with kmem_strfree on the way out.

The Linux xattr API caps the full prefix+name length at
XATTR_NAME_MAX (255).  fs/xattr.c's syscall handlers rely on that same
bound with their stack-resident struct xattr_name, so do the same in
our xattr handlers.
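A minimal userspace sketch of building the prefixed name in a fixed stack buffer instead of heap-allocating it. xattr_build_name() is hypothetical, and returning -ERANGE for an over-long name is an assumption modelled on how fs/xattr.c rejects such names:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#ifndef XATTR_NAME_MAX
#define	XATTR_NAME_MAX	255	/* prefix+name limit, excluding the NUL */
#endif

/*
 * Build prefix + name into a caller-provided buffer of
 * XATTR_NAME_MAX + 1 bytes.  Returns 0, or -ERANGE if too long.
 */
static int
xattr_build_name(char buf[XATTR_NAME_MAX + 1], const char *prefix,
    const char *name)
{
	int len;

	assert(prefix != NULL && name != NULL);
	len = snprintf(buf, XATTR_NAME_MAX + 1, "%s%s", prefix, name);
	if (len < 0 || len > XATTR_NAME_MAX)
		return (-ERANGE);
	return (0);
}
```

The caller simply declares `char buf[XATTR_NAME_MAX + 1];` on its stack, so there is nothing to free on the way out.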

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@truenas.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18570
2026-05-25 16:02:08 -07:00
Brian Behlendorf 8f6f4bcb54 ZTS: update sanity.run file
Several of the tests included in the sanity.run file are no
longer quick.  In fact, the pyzfs tests can take over 5 minutes
to run, which exceeds the default timeout and results in the
test being killed.

Perform a little housekeeping and drop any test which takes more
than 10 seconds to run.  This brings things back a little closer
to the original intent of having a battery of useful test cases
which can be run in ~10 minutes.

ZFS-CI-Type: quick
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18576
2026-05-22 13:58:36 -07:00
Rob Norris 1d601eb83b unit/test_zap: a trivial ZAP unit test suite
This commit adds the bones of a unit test suite for the ZAP subsystem.
The actual tests themselves don't do much, just ZAP creation and
destruction and basic KV ops. At this point it's intended to be enough to
demonstrate what tests under this framework would look like.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18564
2026-05-22 13:29:50 -07:00
Rob Norris a20ef9c4e7 unit: dnode/dbuf/dmu_tx mocks
Some simple initial mocks for key DMU structures. It's hard to say this
early how generalisable these are, however they are enough for the ZAP
unit tests (next commit).

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18564
2026-05-22 13:29:46 -07:00
Rob Norris 82b33c0034 unit: a unit testing framework
This commit establishes a unit test framework for OpenZFS, and
integrates it into the build.

It includes:
- the "munit" unit test framework (munit.c, munit.h)
- some light extensions to munit and glue for OpenZFS (unit.c, unit.h)
- make targets for running tests and generating coverage reports
- a document explaining the what, how and why

This is a first step; I expect we will extend all of this as we use it
in more places and gain experience with it.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18564
2026-05-22 13:29:30 -07:00
Christos Longros accb2b418e CI: run full CI when a workflow YAML changes
FULL_RUN_REGEX in generate-ci-type.py covered .github/workflows/scripts/
but not the workflow YAML files, so a PR that only edited zfs-qemu.yml
got "quick" CI and never tested its own matrix change. Add the YAML
files to the list.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18577
2026-05-22 13:15:31 -07:00
Brian Behlendorf 1916c2c552 CI: skip full CI runs on push events
Full CI runs for proposed changes always occur in the PR where the
review is done and patch approved.  Once merged the full CI is run
again using the merged commit.  This is somewhat overkill.  In the
interest of reducing the CI load only run the zloop and checkstyle
workflows which are enough to verify the build on the master branch.
Push events to forks will continue to trigger a full CI run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18571
2026-05-22 12:01:22 -07:00
Christos Longros 971791762a CI: enable FreeBSD 15.0-RELEASE in matrix
Add freebsd15-0r to the FreeBSD presets

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18561
2026-05-22 09:18:39 -07:00
Tony Hutter 15761954d7 CI: Build custom branch from zfs-qemu-packages
The zfs-qemu-packages workflow allows us to easily build RPMs for the
current branch.  However, there can be cases where we want to use the
current CI environment to build older releases.  This can happen when
the VM or runner environment changes, and the older CI doesn't have
the updates needed to run with it anymore.

This commit adds a text box to specify a branch or tag to build
using the current CI environment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18569
2026-05-21 09:23:32 -07:00
Rob Norris 58d7194426 linux/super: properly apply ro/rw mount option to superblock
f5a9e3a622 changed how SB_RDONLY was applied to the new mount in a way
that was too simplistic - it only sets readonly on the filesystem if the
mount was 'ro', but it never clears it if the mount was 'rw'. This
causes the 'rw' option to effectively be ignored, and so the readonly=
property wins out.

This fixes it by doing it the right way: checking the flags mask to see
if it was actually provided as an option at all, and then setting or
clearing it as appropriate.
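The mask-then-set-or-clear logic can be sketched as follows. MY_SB_RDONLY and apply_ro_option() are hypothetical stand-ins for illustration; the kernel's real flag is SB_RDONLY:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value for illustration; the kernel uses SB_RDONLY. */
#define	MY_SB_RDONLY	0x1u

/*
 * Touch the read-only bit only if ro/rw was actually given (its bit is
 * set in mask); then set it for 'ro' and clear it for 'rw'.  Otherwise
 * leave the existing state (eg from readonly=) unchanged.
 */
static uint32_t
apply_ro_option(uint32_t sb_flags, uint32_t flags, uint32_t mask)
{
	if (mask & MY_SB_RDONLY) {
		if (flags & MY_SB_RDONLY)
			sb_flags |= MY_SB_RDONLY;
		else
			sb_flags &= ~MY_SB_RDONLY;
	}
	return (sb_flags);
}
```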

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18557
Closes #18563
2026-05-20 15:57:20 -07:00
Rob Norris 20437d856c ZTS/zfs_mount: test that ro/rw mount methods remain consistent
Whether a mount ends up as read-only or read-write depends on a
combination of platform, readonly= filesystem property, mount method
(system mount(8) or zfs-mount(8)) and mount option provided (ro, rw or
none).

This tests all combinations, and ensures they match what has
traditionally been expected on this platform, so we'll know if we
accidentally changed it.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18557
Closes #18563
2026-05-20 15:57:13 -07:00
Rob Norris b394b8742e ZTS/zfs_mount: lift & update helpers from zfs_mount_remount
zfs_mount_remount has some nice helpers for checking the claimed and
actual read-only/read-write state of a mount. I wanted to use them for
another test but they weren't exactly what I wanted.

This adds separate functions for the different kinds of mounts the
zfs_mount_remount test wants to use, mostly to avoid the asymmetry of
sometimes calling a helper function and sometimes doing it direct. It
also separates the code to get the current ro/rw mount option from
actually asserting it.

The test has been updated to use the new functions, but the logic and
structure have not changed.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18557
Closes #18563
2026-05-20 15:56:29 -07:00
Brian Behlendorf f9bf31ff7a ZTS: zfs_unshare_006_pos.ksh enable usershares
Ensure samba usershares are enabled in the CI test environment for
the zfs_unshare_006_pos test case.  By default they are disabled
in Ubuntu 26.04 LTS and must be enabled.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18547
2026-05-20 10:25:40 -07:00
Brian Behlendorf d64dcd2575 ZTS: statx_dioalign.ksh update to stride_dd
The uutils 0.8.0 version of dd appears to diverge from GNU behavior
and does not fail when an unaligned O_DIRECT write is issued.
Update the test case to use stride_dd which is provided by the ZTS
so the expected syscall behavior can be verified.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18547
2026-05-20 10:25:34 -07:00
Brian Behlendorf c59d690e56 ZTS: Pass dec instead of hex to mknod
On Ubuntu 26.04 the default mknod command returns an error when
provided the major and minor numbers in hex.  Switch to passing
decimal values.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18547
2026-05-20 10:25:25 -07:00
Brian Behlendorf bd2f0aa057 CI: Fix qemu-guest-agent systemd enable
The qemu-guest-agent.service for Debian and Ubuntu does
not contain an install section which prevents it from
being enabled.  Add a drop-in override file so it can
be enabled and the service started on boot.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18547
2026-05-20 10:25:19 -07:00
Brian Behlendorf 5fde52c3f9 CI: Add Ubuntu 26.04 builder
The Ubuntu 26.04 LTS, named "Resolute Raccoon", was released on
April 23, 2026.  Add to the supported releases in README.md and
add a CI builder for it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18547
2026-05-20 10:24:43 -07:00
Rob Norris e5473afe18 spl_kvmalloc: remove __GFP_COMP before calling vmalloc()
In cb1833023 we stopped using it for KM_VMEM allocations, since it's not
a valid flag for vmalloc(). However, there's a fallback path for
non-KM_VMEM allocations to use vmalloc(), and we need to remove
__GFP_COMP there too to avoid a warning.
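The fallback path can be sketched with allocator hooks. The flag values, hook type and kvmalloc_sketch() are hypothetical and do not match the real gfp_t bits or the spl_kvmalloc() code; the point is only that the compound-page flag is stripped before the vmalloc()-style fallback:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bits and hooks; real gfp_t values differ. */
#define	MY_GFP_KERNEL	0x01u
#define	MY_GFP_COMP	0x40u

typedef uint32_t my_gfp_t;
typedef void *(*alloc_fn_t)(size_t size, my_gfp_t flags);

/*
 * Try the physically-contiguous allocator first; on failure fall back
 * to the virtually-contiguous one without the compound-page flag,
 * which is not valid for vmalloc().
 */
static void *
kvmalloc_sketch(size_t size, my_gfp_t lflags, alloc_fn_t kmalloc_fn,
    alloc_fn_t vmalloc_fn)
{
	void *ptr = kmalloc_fn(size, lflags);

	if (ptr != NULL)
		return (ptr);
	return (vmalloc_fn(size, lflags & ~MY_GFP_COMP));
}

/* Toy hooks: kmalloc always fails, vmalloc records the flags it got. */
static my_gfp_t last_vmalloc_flags;

static void *
failing_kmalloc(size_t size, my_gfp_t flags)
{
	(void) size;
	(void) flags;
	return (NULL);
}

static void *
recording_vmalloc(size_t size, my_gfp_t flags)
{
	last_vmalloc_flags = flags;
	return (malloc(size));
}
```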

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18558
2026-05-19 09:11:31 -07:00
Christos Longros ea7fd8a7bc libzfs_pool: add docstrings to several public functions
Cover a number of frequently-used functions that previously had no
documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18538
2026-05-19 09:08:48 -07:00
Rob Norris 536c06be82 config: show progress output for kernel API checks
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18554
2026-05-18 16:05:29 -07:00
Christos Longros 3f44da701b CI: remove FreeBSD 13.5 (EOL April 30, 2026)
FreeBSD 13.5 and stable/13 reached End-of-Life on April 30, 2026 and no
longer receive security support, so they fall outside README.md's stated
support policy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18553
2026-05-18 11:27:45 -07:00
Rob Norris eed67e4043 zap: split objset+object implementations to use a dnode
For the functions that don't (yet) have _by_dnode() variants, give them
the same treatment as the previous commit - pull their implementation
into a _by_dnode() function, with the original as a simple wrapper.

This lets them all follow the same uniform pattern, and lays the
groundwork for further cleanup in other non-dnode parts of the ZAP
subsystem.

Note that it would be trivial to expose these new _by_dnode() functions,
but there's no need to do that until there's an external need for them.

Also note that there's no change yet to the following, which are not
simple zap_t operations in the same way:

 - zap_contains: wrapper around other ops
 - zap_increment: wrapper around other ops
 - zap_*_int(): wrappers around other ops
 - zap_cursor_*: different lifetime constraints
 - zap_value_search: cursor-based
 - zap_join_*: cursor-based

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18551
2026-05-18 09:19:19 -07:00
Rob Norris bd02c10b00 zap: make the _by_dnode() op variants be the primary implementation
The existing pattern for each operation is to have a "frontend" function
that takes an object referenced by either an objset+object pair (eg
zap_add()) or an existing dnode (eg zap_add_by_dnode()). Those functions
obtain a locked zap_t for the given object from either zap_lockdir() or
zap_lockdir_by_dnode(). That zap_t, the operation args, and the refcount
tag for lockdir() are then passed through to the "backend" function
(eg zap_add_impl()), which does the work and then calls zap_unlockdir()
to release the zap_t.

This pattern is overcomplicated, in at least three ways:

- Both frontends for each operation have to make the call to
  zap_lockdir(), which has multiple args that must be the same for both.

- Frontends need to pass the refcount tag to the backend so it can
  call zap_unlockdir() correctly, which makes the signature more
  complicated.

- The only difference between the frontend functions is whether they
  call either zap_lockdir() or zap_lockdir_by_dnode(), and the only real
  difference between those is that the objset+object version takes a
  dnode hold first.

All of this makes the code very repetitive and difficult to read (and
thus to modify).

This commit addresses all of the above by having the _impl() function
take a dnode_t, rather than a zap_t. This allows zap_lockdir_by_dnode() to
be called in all cases from inside the _impl() function, so it only
needs to be specified in one place.

Then, because the lock and unlock are now done inside the same function,
there's no need for a separate tag arg - we can just use FTAG.

This results in the _by_dnode() functions being just direct calls and
returns to the _impl() functions, and so allows them to be removed
entirely, and _impl() to be renamed as _by_dnode().

Finally, the objset+object functions are simple mechanical wrappers
around dnode_hold(), _by_dnode(), dnode_rele().
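The resulting wrapper shape can be sketched with mock types. Everything here (mock_dnode_t, mock_dnode_hold(), zap_example() and friends) is hypothetical and much simpler than the real dnode_hold()/dnode_rele() API; it only shows the hold, call, release pattern:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical minimal stand-ins; not the real ZFS types or API. */
typedef struct mock_dnode {
	uint64_t	dn_object;
	int		dn_holds;
} mock_dnode_t;

typedef struct mock_objset {
	mock_dnode_t	os_dnodes[4];	/* toy object table */
} mock_objset_t;

static int
mock_dnode_hold(mock_objset_t *os, uint64_t obj, mock_dnode_t **dnp)
{
	if (obj >= 4)
		return (ENOENT);
	*dnp = &os->os_dnodes[obj];
	(*dnp)->dn_holds++;
	return (0);
}

static void
mock_dnode_rele(mock_dnode_t *dn)
{
	assert(dn->dn_holds > 0);
	dn->dn_holds--;
}

/* The real work lives in the _by_dnode() variant (toy body here). */
static int
zap_example_by_dnode(mock_dnode_t *dn, uint64_t *out)
{
	*out = dn->dn_object;
	return (0);
}

/* objset+object entry point: a mechanical hold / call / release wrapper. */
static int
zap_example(mock_objset_t *os, uint64_t obj, uint64_t *out)
{
	mock_dnode_t *dn;
	int err = mock_dnode_hold(os, obj, &dn);

	if (err != 0)
		return (err);
	err = zap_example_by_dnode(dn, out);
	mock_dnode_rele(dn);
	return (err);
}
```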

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18551
2026-05-18 09:19:02 -07:00
Paul Dagnelie 891e379d0f Fix failfast default and usage
The feature that added a failfast property to vdevs unfortunately did
not correctly set the default at creation time, so many vdevs do not
actually have the property set. In addition, when the property is
used, the failfast flag is not checked correctly, resulting in the
feature mostly not working as intended.

Set the failfast property to the default value at vdev allocation time.
The value will be read in from the ZAP as normal when the vdev metadata
is loaded.  Allow the property to be set on any vdev and have it be
inherited from the root or top-level vdev.
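Inheritance can be sketched as a walk from a vdev up through its top-level vdev to the root, taking the nearest explicitly set value. The vd_sketch_t type, the tri-state value and the default of "on" are assumptions for illustration, not the real vdev_t layout or documented default:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state property value; not the real vdev_t layout. */
typedef enum { PROP_UNSET, PROP_OFF, PROP_ON } prop_state_t;

typedef struct vd_sketch {
	struct vd_sketch	*parent;	/* NULL for the root vdev */
	prop_state_t		failfast;	/* locally set value, if any */
} vd_sketch_t;

#define	FAILFAST_DEFAULT	true	/* assumed default for illustration */

/* Nearest explicitly set value on the path to the root, else default. */
static bool
vdev_failfast_effective(const vd_sketch_t *vd)
{
	for (; vd != NULL; vd = vd->parent) {
		if (vd->failfast != PROP_UNSET)
			return (vd->failfast == PROP_ON);
	}
	return (FAILFAST_DEFAULT);
}
```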

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #18410
2026-05-18 09:12:09 -07:00
Rob Norris 40a87651d4 zap_impl: use flex array field for mzap_phys_t.mz_chunks
mzap_phys_t is always a full-block allocation, with mz_chunks[] as an
array over the rest of the block past the header.

Recent Linux compiled with CONFIG_UBSAN will complain about this:

    UBSAN: array-index-out-of-bounds in module/zfs/zap.c:1236:28
    index 2 is out of range for type 'mzap_ent_phys_t [1]'

The fix is straightforward; simply convert this field to a flex member.
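The layout change can be sketched with a simplified struct. The field names and sizes below are assumptions, not the real mzap_phys_t; the point is that a C99 flexible array member (`chunks[]`) lets indexing run over the rest of the block without UBSAN flagging an out-of-bounds access on a declared `[1]` array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified layouts for illustration; field sizes are assumptions. */
typedef struct ent_phys {
	uint64_t	value;
	char		name[56];
} ent_phys_t;

typedef struct block_phys {
	uint64_t	block_type;
	uint64_t	salt;
	ent_phys_t	chunks[];	/* flexible array member: rest of block */
} block_phys_t;

/* Number of entries that fit after the header in a blksz-byte block. */
static size_t
block_nchunks(size_t blksz)
{
	assert(blksz >= sizeof (block_phys_t));
	return ((blksz - sizeof (block_phys_t)) / sizeof (ent_phys_t));
}
```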

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18550
2026-05-17 12:13:59 -07:00
Saju Palayur 6fb72fda0f zio_ddt_write: compute have_dvas after taking dde_io_lock
In zio_ddt_write(), have_dvas and is_ganged were computed before
dde_io_lock was taken. A concurrent zio_ddt_child_write_done() error
path calls ddt_phys_unextend() under dde_io_lock, which can zero
DVA[0] while another thread is between computing have_dvas and taking
dde_io_lock. That thread then uses the stale have_dvas=1 to call
ddt_bp_fill(), copying the zeroed DVA into the BP. A zero DVA resolves
as a hole, producing blocks that read back as zeros with no checksum
error (silent data corruption).

Fix by moving have_dvas and is_ganged computation to after dde_io_lock
is taken, so they always reflect the current state of dde->dde_phys.
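The fix pattern, deriving the flag only after the lock is held, can be sketched with a pthread mutex. entry_t and both helpers are hypothetical simplifications of the DDT entry and its io lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified entry; not the real ddt_entry_t. */
typedef struct entry {
	pthread_mutex_t	io_lock;
	uint64_t	dva0;		/* 0 means "no DVA" (a hole) */
} entry_t;

/*
 * Compute have_dvas only after taking io_lock, and copy the DVA out
 * while still holding it, so a concurrent clear cannot be missed.
 */
static bool
entry_fill_if_have_dvas(entry_t *e, uint64_t *dva_out)
{
	bool have_dvas;

	pthread_mutex_lock(&e->io_lock);
	have_dvas = (e->dva0 != 0);
	if (have_dvas)
		*dva_out = e->dva0;
	pthread_mutex_unlock(&e->io_lock);
	return (have_dvas);
}

/* The error path clears the DVA under the same lock. */
static void
entry_unextend(entry_t *e)
{
	pthread_mutex_lock(&e->io_lock);
	e->dva0 = 0;
	pthread_mutex_unlock(&e->io_lock);
}
```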

Regression introduced by a41ef36858 ("DDT: Reduce global DDT lock
scope during writes").

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Saju Palayur <spalayur@maxlinear.com>
Signed-off-by: Saju Palayur <spalayur@maxlinear.com>
Closes #18366
Closes #18544
2026-05-15 14:15:05 -07:00
Rob Norris 2f283c99cc zap: remove refcount tags from backend functions
Since we now never need to unlock/lock an existing zap_t, we don't need
to thread through the refcount tag everywhere, which lets us simplify a
lot of calls.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18546
2026-05-15 12:11:19 -07:00
Rob Norris c8f9b4c4da zap: lift and simplify zap_t lock upgrade
Most fatzap write ops only take the READER zap_t lock, because the
header block only needs to be updated when a change would add or remove
a leaf block or spill the ptrtbl. When this happens, the lock is
upgraded to WRITER so those changes can be made.

If the lock can't be upgraded directly (not least because
rw_tryupgrade() is a no-op on Linux and userspace), then it has to be
dropped and re-acquired, that is, zap_unlock() and then zap_lock().

However, this method is far heavier than it needs to be, and adds
complication because it fully releases the zap_t, the header dbuf and
the dnode. This gives a window where the dbuf can be evicted and so the
zap_t destroyed. In addition to the IO overhead if this happens, this
means the zap_t returned by zap_lock() may be different to the original,
which means all callers need to be prepared for it to change.

zap_shrink() used an alternate method of simply dropping and reacquiring
zap_rwlock rather than fully destroying everything. The comment there
says it was only done because of lack of a refcount tag for unlock/lock,
but this is actually a better general technique, as the zap_t is
guaranteed to remain alive because its owning dbuf is never released and
so it will not be evicted.

So, this commit lifts the old zap_tryupgradedir() to
zap_lock_try_upgrade(), and adds a potentially-blocking variant
zap_lock_upgrade() that drops and retakes the rwlock. Everything is
switched to use them, which vastly simplifies the surrounding code.
Because the zap_t, dbuf and dnode are never dropped, there's no way for
the upgrade operation to fail, and so the callers never have to deal
with the zap_t changing under them.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18546
2026-05-15 12:11:13 -07:00