Some of our random write benchmarks on a fragmented pool show that
single-threaded portion of sync process (txg_sync_thread) can use
up to 45% of CPU time. Most of it is consumed by metaslab_sync()
and metaslab_sync_done(), during which time the pool is not doing
anything else.
While metaslab_sync() is not trivial to parallelize due to having
single spacemap log, metaslab_sync_done() is doing only per-metaslab
accounting and they can run in parallel. Even better, we can run
them while waiting for vdev label update and cache flush I/Os.
With this patch on my test system similar test randomly writing 12
100GB files with 4KB blocks shows IOPS increase from 176K to 220K.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes#18622
The main change is switching `unit-coverage` to run
scripts/coverage_report.pl, to get nice coverage summary output on the
commandline. The previous behaviour moves to `unit-coverage-html`.
Calls to lcov and genhtml are now silencing more warnings, and the
output file now gets branch coverage as well.
This should be compatible with both lcov 1.x and 2.x. It takes advantage
of the fact that 1.x is far more forgiving of both options it doesn't
understand, and of various kinds of "inconsistency" in the input data.
The rest is both simplifying and improving the rules. We keep the
coverage output around now, but still rebuild it if the binary changes.
The `clean` target now removes the coverage output too. And we use the
target name more often for building path names, as its far less noisy.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18619
Instead of performing multiple operations on the path name in
zfs_key_config_modify_session_counter() open the file once and
perform the fchown, fchmod, and openat on the open file handle.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18618
When zc->zc_cookie is set this indicates to zfs_ioc_set_prop() that
these are received properties and ZPROP_HAS_RECVD will be set on the
dataset. This is only done as part of a `zfs receive` so additionally
apply the zfs_secpolicy_recv() policy. Individual property checks
continue to be handled by zfs_check_settable().
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18617
Update vdev_prop_get_objid() to set objid on error as the comment
in vdev_prop_get() describes.
"objid is set to 0 when absent and the few cases that call
zap_lookup directly guard against this below."
This resolves the following possible uninitialized variable warning.
module/zfs/vdev.c: In function ‘vdev_prop_get’:
module/zfs/vdev.c:6913:12: error: ‘objid’ may be used uninitialized
in this function [-Werror=maybe-uninitialized]
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18616
Check for invalid characters in sharenfs/sharesmb dataset props.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18613
Fix the mismatched type in zfs_ioc_userspace_many() and limit the
number of entries returned to 1000. When a size larger than this
is requested the response is truncated, zfs_userspace() already
correctly handles short responses. Historically, zfs_userspace()
has requested 100 entries at a time, this cap allows for 10x larger
batch sizes if needed in the future.
Reported-by: Yuxiang Yang, Yizhou Zhao, Ao Wang, Xuewei Feng, Qi Li,
Reported-by: and Ke Xu from Tsinghua University using GLM-5.1 from Z.ai
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18615
There's no need to call libzfs_core_init() when `zdb -l` is used to
read a vdev label.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: tiehexue <tiehexue@hotmail.com>
Closes#18606
Cursors defer taking holds until they're needed, so if a cursor is
created but not used, it may still hold resources that it would have
cleaned up along the way, but never got chance to.
(this really happened in the first version of
zap_cursor_init_by_dnode(), so not a contrived case!)
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18603
These add a bunch of entries to the ZAP, and then ensure that a cursor
walk over the ZAP sees them all once and once only, and no others.
The serialization test takes it a bit further, by serializing and
recreating the cursor half way through and confirming it correctly picks
up from the same spot, and then recreating the cursor from serialized
again and confirming that it also see only the second set of entries.
This ensures that the serialized cursor state is fully self contained
and not reliant on anything left over in the ZAP itself at serialization
time.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18603
The thing under test will be taking and releasing dnode refs/holds. By
counting them and exposing the current count, we can assert in test
cleanup that we haven't missed releasing any, especially in cases where
the hold is held across multiple test steps.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18603
This commit adds zap_cursor_init_by_dnode() (and
zap_cursor_init_serialized_by_dnode()), which allow the target ZAP to
provided via an existing dnode rather than the traditional objset+object
pair.
This requires some reorganisation of the way that zap_cursor_t is
initialised. Up until now, zap_cursor_init() has merely stored the
objset, object, serialized form and prefetch flag, and left it until
zap_cursor_retrieve() to actually call zap_lock(). This makes a
_by_dnode() form complicated, because it is a held resource that needs
to be released, but might not be used if zap_cursor_retrieve() is not
called. So there's a bunch of state tracking required.
However, all cursor users immediately follow zap_cursor_init() with
zap_cursor_retrieve(), so there's nothing gained by delaying holds. This
allows us to simplify things, by calling zap_lock() directly in
zap_cursor_init() and retaining it until zap_cursor_fini().
This does however means the _init() functions are now fallible, and can
return an error. This adds complexity to most of the call sites, which
are typically in a for loop of the form:
for (zap_cursor_init(...);
zap_cursor_retrieve(...) == 0;
zap_cursor_advance(...))
To avoid needing to make significant changes at every call site, a
failed _init() call will also zero the cursor struct. If the caller
doesn't check the return and continues to zap_cursor_retrieve(), they
will get an EIO return, and zap_cursor_fini() will just return.
The existing zc_objset and zc_zapobj fields are retained to support
source backcompat for Lustre, which inspects them directly.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18603
If the cursor were ever to actively hold resources, not finalising it
would mean leaking those resources whenever the scrub is paused.
The cursor is already reinitialized from the stored serialized form
if/when it is resumed, so there's nothing we need from the old one, just
to release it.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18603
Expose zfs_metaslab_condense_pct and zfs_metaslab_sm_blksz_* as
module parameters on Linux, matching their existing FreeBSD sysctls.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18594
Add additional checks to verify a packed string or string array nvpair
is terminated. Or more specifically, verify doing a strlen() on the
prospective string does not overrun the packed nvlist buffer.
Also add additional checks in the libzfs_input_checks test case to
verify un-terminated strings, and add in a nvlist ioctl payload
fuzz test for good measure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18604
One of the main dRAID features is avoiding single drive bottlenecks
by using distributed spares. Activation of regular spare will take
more time, during which the dRAID redundancy is even lower than in
case of RAIDZ. But regular spares might still be added to the pool
as a second line of defence, possibly shared by several vdevs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes#18578
The zfs-arm workflow was the only build/test workflow without a
concurrency block, so superseded runs were not cancelled.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18608
The package removal ran against a stale package index and failed to
fetch a package that had been removed from the repository. Refresh
the index first.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18607Closes#18609
Decompressors must expand a ZFS block to exactly the expected number
of bytes. Treat decompression to an unexpected length as failure, so
truncated or short output is not accepted as valid decompression. This
makes our handling of decompress return values consistent with the
decompression functions' APIs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alek Pinchuk <Alek.Pinchuk@connectwise.com>
Closes#18599
Follow-up to #18518, which skipped the qemu matrix on doc-only PRs.
zloop, zfs-arm, and smatch are irrelevant to doc-only changes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18601
Almalinux 9,10 kernels now include a backport of Linux commit
v6.15-13744-g41cb08555c41 which renames the from_timer() function
to timer_container_of(). Apply the upstream Lustre compatibility
patch to our builds. This patch should be included in the next
Lustre release and can be dropped then.
ZFS-CI-Type: quick
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Before this change zed tried to activate spares just in order they
are stored in configuration, which is quite arbitrary. To make
the result more optimal, sort the spares by their rotational status
and size, so that the most fitting ones have better chances.
To make it more visible, export the rotational status as a vdev
property. While at it, minimally fix vdev properties reading for
spare and L2ARC vdevs, having no ZAPs.
To keep the rotational status for spare activation purposes when
failed device is already gone, save it into the vdev config. The
same is for spare vdevs asize.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes#18597
The checkstyle workflow was the only one still pinned to
actions/checkout@v4; the other workflows already use v6.
Bump it to match.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18600
Add make options which let one respectively compile the kernel modules
with the address sanitizer, memory sanitizer, and undefined behaviour
sanitizer enabled. This makes it much easier to run the ZTS with those
sanitizers enabled.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Longros <chris.longros@gmail.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#18596
Added m4 macro to check fs_parse API signature and wrappers. Before
5.6, fs_parse() took a struct fs_parameter_description which wraps
the parameter specs with name and enum pointers. From 5.6, the
description struct was removed and fs_parse() accepts the
fs_parameter_spec directly.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: tiehexue <tiehexue@hotmail.com>
Closes#18585
Describe the current zfs-qemu pipeline, ci_type selection, supported
guests, and the code-checking and other auxiliary workflows.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18590
To get at userspace threads, we use a mix of -pthread and -lpthread to
compiler and/or linker. That's fine enough for the platforms we target
but its not exactly right (eg on Linux -pthread defines _REENTRANT, when
-lpthread does not), and won't work properly some other platforms that
we might end up on someday (eg illumos).
There's also a danger if we link together two compilations units, one
compiled with -pthread, one not, as calls between them may not properly
manage thread state.
Here we switch to use the AX_PTHREAD macro to detect the correct set of
flags for CFLAGS and LIBS, and add them to the default compilation
flags for all units.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18588
Exposes the remaining internal implementation functions:
- zap_update_by_dnode()
- zap_length_by_dnode()
- zap_get_stats_by_dnode()
And creates zap_contains_by_dnode(), followng the same structure as the
other functions.
Together, these complete the "core" ZAP _by_dnode() API for the test
suite to use.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18586
* ZTS/zinject: cover label, object, delay, panic and verify effect
Cover the device, label, object, delay and panic injection modes:
every valid value is accepted and unknown values are rejected. A
final pass confirms that registered injections execute by watching
the inject counter advance after triggering the desired injected
error.
Signed-off-by: Christos Longros <chris.longros@gmail.com>
* ZTS/zinject: add author copyright
Signed-off-by: Christos Longros <chris.longros@gmail.com>
---------
Signed-off-by: Christos Longros <chris.longros@gmail.com>
The check referenced $zpool_expandsize, which is not defined in this
test; the variable assigned two lines above is $expandsize. A "-"
value returned by zpool reopen therefore did not trigger the
intended log_fail, and the failure surfaced only at the later
post-online-e size check with a less specific message.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18580
The six __zpl_xattr_{user,trusted,security}_{get,set} entry points
built their prefixed name via kmem_asprintf("%s%s", prefix, name)
and freed it with kmem_strfree on the way out.
The Linux xattr API caps the full prefix+name length at
XATTR_NAME_MAX (255), the same bound fs/xattr.c's syscall handlers
rely on with their stack-resident struct xattr_name, and so do
the same in our xattr handlers.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@truenas.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes#18570
Several of the tests included in the sanity.run file are no
longer quick. In fact, the pyzfs tests can take over 5 minutes
to run which exceeds the allowed default timeout resulting the
the testing being killed.
Perform a little housekeeping and drop any test which takes more
than 10 seconds to run. This brings things back a little closer
to the original intent of having a battery of useful test cases
which can be run in ~10 minutes.
ZFS-CI-Type: quick
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18576
This commit adds the bones of a unit test suite for the ZAP subsystem.
The actual tests themselves don't do much, just ZAP creation and
destruction and basic KV ops. At this point its intended to be enough to
demonstrate what tests under this framework would look like.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18564
Some simple initial mock for key DMU structures. It's hard to say this
early how generalisable these are, however they are enough for the ZAP
unit tests (next commit).
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18564
This commit establishes a unit test framework for OpenZFS, and
integrates it into the build.
It includes:
- the "munit" unit test framework (munit.c, munit.h)
- some light extensions to munit and glue for OpenZFS (unit.c, unit.h)
- make targets for running tests and generating coverage reports
- a document explaining the what, how and why
This is a first step; I expect we will extend all of this as we use it
more places and gain experience with it.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18564
FULL_RUN_REGEX in generate-ci-type.py covered .github/workflows/scripts/
but not the workflow YAML files, so a PR that only edited zfs-qemu.yml
got "quick" CI and never tested its own matrix change. Add the YAML
files to the list.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18577
Full CI runs for proposed changes always occur in the PR where the
review is done and patch approved. Once merged the full CI is run
again using the merged commit. This is somewhat overkill. In the
interest of reducing the CI load only run the zloop and checkstyle
workflows which are enough to verify the build on the master branch.
Push events to forks will continue to trigger a full CI run.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18571
The zfs-qemu-packages workflow allows us to easily build RPMs for the
current branch. However, there can be cases where we want to use the
current CI environment to build older releases. This can happen when
the VM or runner environment changes, and the older CI doesn't have
the updates needed to run with it anymore.
This commit adds in a text box to specify a specific branch/tag to build
using the current CI environment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18569
f5a9e3a622 changed how SB_RDONLY was applied to the new mount in a way
that was too simplistic - it only sets readonly on the filesystem if the
mount was 'ro', but it never clears it if the mount was 'rw'. This
causes the 'rw' option to effectively be ignored, and so the readonly=
property wins out.
This fixes it by doing it the right way: checking the flags mask to see
if it was actually provided as an option at all, and then setting or
clearing it as appropriate.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18557Closes#18563