Normal, special and dedup vdevs differ only by space allocation
bias. Normal and special vdevs might even legally store blocks
targeted to other classes. Dedup vdevs don't normally do it, but
there is no real reason why they can't. Considering this, it is
not impossible to change the allocation bias for those vdevs.
This change introduces a new top-level vdev property -- alloc_bias,
reporting current bias for the vdev, and allowing to change it.
This allows to easily change vdev role in a pool, especially if
vdev removal is impossible. To not complicate the code, changes
take effect only on next pool import.
Changes to/from log vdev could also be theoretically possible, but
they are artificially blocked for now, partially due to additional
complications, and partially due to potential danger of placing
other blocks on log vdevs, that would otherwise be non-fatal.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes#18493
If the pool is active 'zpool export' will fail resulting in
a test failure. Swap log_must with log_must_busy so the export
is retried when reported as busy before failing the test.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18498
When reservation sync updates a child's reserved space, it rolls the
delta into ancestor space accounting while still holding the child's
dd_lock. That locking order is intentional, but Linux lockdep sees
the ancestor acquisition as recursive because it lacks a nested lock
subclass annotation.
Teach the reservation-sync space-accounting path to acquire ancestor
dd_lock instances with a nested subclass. Keep the existing public
interfaces and accounting behavior unchanged by routing only the
ancestor rollup through local helpers.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: gality369 <gality369@example.com>
Closes#18497
Don't use trim_progress() which is racy to wait for the pool trim
to complete. Instead use the wait (-w) option which is intended
for this.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18496
Commit f828a80c may have resolved the underlying cause for
the occasional CI failures observed for this test. Remove
the exception to ensure any new occurrences are noticed.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6136Closes#18495
The import of Zstd v1.5.7 in a2ac9cd606
added an unconditional renaming of ZSTD_isError to zfs_ZSTD_isError
with an asm directive. Instead, do it with a define that is conditioned
on whether zstd_compat_wrapper.h is actually in use. Also add a define
to that header so that it can be detected. This allows the build to
work without using the compat wrapper.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes#18483
The mode=0 and FALLOC_FL_KEEP_SIZE preallocation path can reach
zfs_freesp() directly and call zfs_statvfs() before going through the
normal zpl_enter_verify_zp() boundary.
When zfs_rezget() tears down a failed SA reload, a stale inode may
remain alive in the VFS with z_sa_hdl cleared. The unchecked
fallocate path can then reach sa_lookup(zp->z_sa_hdl, ...) through
zfs_statvfs() or zfs_freesp() and crash on a NULL SA handle.
Use zfs_enter_verify_zp() in zfs_statvfs() so stale znodes are
rejected under the teardown lock for both fallocate and statfs.
Also wrap the direct zfs_freesp() call in
zpl_enter_verify_zp()/zfs_exit() so this path follows the same
validation rules as the other Linux ZPL file operations.
Fixes: f734301d22
("linux: add basic fallocate(mode=0/2) compatibility")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Co-authored-by: gality369 <gality369@example.com>
Closes#18458
Add manpage entries for parameters and properties that exist in
source but were not previously described:
- spl.4: spl_schedule_hrtimeout_slack_us
- zfsprops.7: longname
- vdevprops.7: raidz_expanding
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18467
Update freebsd15-0s builder to freebsd15-1s and point it at the
15.1-PRERELEASE tag. The previous freebsd-15.0-STABLE images are
no longer available.
Additionally, add a freebsd15-0r stanza for the RELEASE.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
dbuf_whichblock() is not made to handle offsets beyond the block
end for single-block objects. Handle it in dmu_evict_range(),
similar to dmu_prefetch_by_dnode().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes#18399Closes#18489
- Add Fedora 44 to CI tests
- Fix build issues from the newer compiler. These are mostly 'char *'
to 'const char *' conversions.
- Fix threadsappend.c test waiting for the same thread TID twice.
This caused the test to hang on F44 (but strangely not other OSs?)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18478
Only call txg_wait_synced() when rebuild IOs were issued for this
metaslab. This is a small optimization since in practice the first
metaslab is very likely to have allocations and cause vr_last_txg
to be initialized. After this point when processing empty metaslabs
txg_wait_synced() is called but with an already committed txg so it
will not wait. Still it's better not to call txg_wait_synced() at
all when it's not needed.
Reviewed-by: Andriy Tkachuk <atkachuk@wasabi.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18482
zfsctl_snapshot_unmount() called exportfs_flush() before every umount
attempt to drop NFS export cache references that pin the snapshot
mountpoint. The flush has global effect on the host's NFS exports and
clients, so paying it on every snapshot unmount (including auto-expire
rounds for snapshots that were never NFS-accessed) impacts unrelated
snapshots and clients.
ZFS cannot invalidate individual export cache entries because the
relevant sunrpc cache APIs are exported GPL-only. Defer the global
flush so it runs only when the umount has actually failed, then retry
once. Snapshots that are not NFS-pinned succeed on the first attempt
and never trigger the flush.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#18476
Currently, after rebuild (aka sequential resilver), checksum
errors can be seen sometimes on the spare vdev or draid spare.
On my laptop, it happens from 2 to 4 times of running
redundancy_draid_spare1 test in a loop for 100 times.
It looks like there's a race in vdev_rebuild_thread() when the
rebuild of space map ranges is finished and we re-enable
allocations from the metaslab too soon: a new allocations may
happen from that metaslab before txg with the rebuilt ranges is
sync-ed, causing undesirable interference.
Solution: wait for the txg to be sync-ed before enabling metaslab.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Andriy Tkachuk <atkachuk@wasabi.com>
Closes#18307Closes#18319Closes#18473
In send_reader_thread(), the PREVIOUSLY_REDACTED handler computed
file_max as MIN(dn->dn_maxblkid, range->end_blkid). dn_maxblkid is
an inclusive maximum block ID while range->end_blkid is exclusive (one
past the last block). The resulting file_max was then used as an
exclusive loop bound, causing the last block of any file (at index
dn_maxblkid) to be silently skipped when a PREVIOUSLY_REDACTED range
covered the end of the file.
The block was never written to the send stream so the receiver kept
zeros there. ZFS reported no error because the stream itself was
valid; the data was simply absent.
Fix: use dn_maxblkid + 1 so file_max is consistently exclusive.
Add a regression test (redacted_max_blkid.ksh) that modifies only the
last block of a file in one clone, creates a redaction bookmark from
it, then sends an unmodified clone incrementally from that bookmark.
The PREVIOUSLY_REDACTED path must fill in the last block; the test
verifies it is not zeros and matches the original.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes#18477
The d_u union introduced in 3.18 is now anonymous, so we need to detect
it and decide the right way to name d_alias.
Note that we used to have support for both names to support kernels
before 3.18, so this commit is effectively reverting the commit that
removed that support, efc293e371.
Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18471
This is the repro test from #18464, and confirms that when disabled, the
libzfs_mnttab_cache is discarded and reloaded on every lookup.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Prakash Surya <prakash.surya@perforce.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18466Closes#18464
In #18296 we made the cache "always on", with the justification that our
internal tools always enable the cache anyway. This allowed removing the
entire alternate implementation of libzfs_mnttab_find().
Unfortunately, it appears that there are still libzfs consumers out
there that were expecting to be able to disable the cache entirely, and
this broke some behaviour for them.
This commit restores the ability to enable or disable the cache (and
returns to "disabled" as the default, to preserve existing behaviour).
Fortunately there is no need for a whole second codepath; just a small
reorganisation to drop all cached entries each time.
Sponsored-by: TrueNAS
Reviewed-by: Prakash Surya <prakash.surya@perforce.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes#18466Closes#18464
When the path argument to "zfs list -Ho name <path>" (or any caller of
zfs_path_to_zhandle()) is a symlink that crosses a mount boundary, the
wrong dataset is returned. Instead of returning the dataset that owns
the symlink's target, getextmntent() matches the dataset containing the
symlink itself.
For example, given two ZFS datasets "tank/ds1" and "tank/ds2", and a
symlink "/tank/ds1/link" pointing into "/tank/ds2":
$ sudo zfs list -Ho name /tank/ds1/link
tank/ds1
The expected (and previous) behavior is to return "tank/ds2", since the
symlink's target resides in that dataset.
The problem is in getextmntent(), in lib/libspl/os/linux/mnttab.c. That
function calls statx() on the caller-supplied path to obtain its mnt_id
(used to match against the mnt_id of each entry in /proc/self/mounts),
and it passes AT_SYMLINK_NOFOLLOW to that statx() call. As a result,
the mnt_id returned reflects the symlink's location rather than the
symlink target's mount, and the wrong /proc/self/mounts entry is
matched.
The same function also calls stat64() on the caller-supplied path
(used as a fallback when STATX_MNT_ID is not available, and to populate
the statbuf out-parameter). stat64() always follows symlinks, so the
statx() and stat64() calls were inconsistent: one resolved the symlink,
the other didn't. The AT_SYMLINK_NOFOLLOW behavior may be appropriate
when statx() is called on a mount entry from /proc/self/mounts (which
is always a real directory), but it is wrong for caller-supplied paths,
which may be symlinks.
This bug was introduced by 523d9d6007 ("Validate mountpoint on
path-based unmount using statx"), which added the STATX_MNT_ID code
path. However, the bug was latent: config/user-statx.m4 omitted
"#define _GNU_SOURCE" when checking for STATX_MNT_ID in <sys/stat.h>,
so HAVE_STATX_MNT_ID was never defined, and the buggy statx() path was
never compiled in. getextmntent() always fell back to the dev_t
comparison via stat64(), which correctly follows symlinks.
The fix to that autoconf check, in 2b930f63f8 ("config: fix
STATX_MNT_ID detection"), caused HAVE_STATX_MNT_ID to be properly
defined on kernels that support it, activating the broken
AT_SYMLINK_NOFOLLOW path for the first time and exposing the
regression.
The fix is to drop AT_SYMLINK_NOFOLLOW from the statx() call so that
symlinks are followed, matching the behavior of stat64() on the same
path.
Verified with a minimal reproducer: created two ZFS datasets, placed a
symlink inside the first pointing into the second, and confirmed that
"zfs list -Ho name <symlink>" returns the dataset containing the
symlink's target rather than the dataset containing the symlink.
Signed-off-by: Prakash Surya <prakash.surya@perforce.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Automake's default tar formats (v7 pre-1.18, ustar since) impose path
length limits that drop several long test filenames from the release
tarball when `make dist` runs. Pax format has no such limit and is
read by GNU tar 1.14+ and libarchive/bsdtar.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes: #17276Closes: #18465
- We've seen occasional 'ERROR 502: Bad Gateway' from the runner trying
to download an image with axel. Axel can open multiple connections for
a faster download, so maybe that's causing problems. This commit adds
in a fallback to curl if the axel download doesn't work.
- Update merge_summary.awk to print out killed tests in the summary.
We've seen cases where the summary page was red but there were no test
failures printed. This is because one of the VMs had too may
killed tests, which caused the total test time to run too long and
caused the runner to timeout qemu-6-test.sh. When the runner kills off
qemu-6-tests.sh, it means we never generate the nice summary page
for that VM listing the killed off tests. This commit parses the
partial test logs for killed off tests and includes them in the
merge_summary.awk output.
- Print an error message in the summary page if one of the VMs
didn't complete ZTS. This helps draw attention to a VM crash.
- FreeBSD sometimes has broken links to their CI image. When that
happens, select the newest nightly snapshot image as an alternative.
This is needed right now, since the current images in the FreeBSD 16
"current/" directory are returning 404 errors.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18460
Add entries for module parameters that are exposed via
ZFS_MODULE_PARAM but not covered in zfs.4:
zfs_active_allocator (charp, module/zfs/metaslab.c)
zfs_compressed_arc_enabled (int, module/zfs/arc.c)
zfs_arc_no_grow_shift (uint, module/os/freebsd/zfs/arc_os.c)
zfs_scan_blkstats (int, module/zfs/dsl_scan.c)
zfs_snapshot_history_enabled (int, module/zfs/dsl_dataset.c)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18456
spa_do_crypt_abd() already maps a missing key to EACCES. However
spa_do_crypt_mac_abd(), spa_do_crypt_objset_mac_abd(), and
spa_crypt_get_salt() still return the raw
spa_keystore_lookup_key() error (ENOENT). This is inconsistent
As we want to treat all “no key” failures as a permission
failure. Standardize on EACCES for the unloaded-key case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Closes#18448
Allow an additional second for the test to complete before checking
the results. This may explain occasional test failures in the CI.
Additionally, when the test fails dump the tmpfile for inspection.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18455
When sequentially resilvering a dRAID pool it's possible that a few
correctable checksum errors will be reported. This is a known issue
which is occasionally observed in the CI. Until it's resolved we
want the test case to tolerate a few checksum errors in this scenario
to prevent false positives in the CI.
This change also has the additional side effect of standardizing in
one location how the dRAID pool integrity is verified.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #18307
Issue #18319Closes#18436
Fix a bug where an cgroup-OOM-killed process can cause a panic:
usercopy: Kernel memory exposure attempt detected from vmalloc (offset
1007584, size 217120)!
kernel BUG at mm/usercopy.c:102!
This was caused by zfs_uiomove() not correctly returning EFAULT
for short copies.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#15918Closes#18408
The date(1) command and snapshot timestamps use different clock
sources which can result in a small discrepancy. This can cause
the test the incorrectly fail. To avoid this, add a brief delay
to the test case to allow for minor skew.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18450
The last portion of mmp_on_uberblocks.ksh was intended to verify that
the sequence number was incremented. However, it failed to account for
the case where a txg sync would occur resulting in the sequence number
being correctly reset.
Rather than add additional code to detect this that check has been
removed. The mmp update frequency is still verified via the kstat
which is a more reliably mechanism to detect the writes. There are
several other mmp tests which verify the uberblock changes are reflected
on disk so there's no significant loss of test coverage.
Finally, the test case has been simplified to use the within_percent
function for readability.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18452
Replace GNU-specific du flags (--block-size, -B1) and dd conv=nocreat
with POSIX compatible commands. Move -O flag before pool name in
zpool create to align with FreeBSD's strict POSIX getopt(). Relax vdev
size thresholds in trim_config to account for ZFS-on-ZFS overhead.
Add sync_pool before zpool trim -w to ensure freed blocks are committed
before trimming.
Skip zpool_trim_partial, zpool_trim_verify_trimmed, trim_config, and
autotrim_config on FreeBSD where trim does not reclaim space on file
vdevs stored on a ZFS filesystem within the test framework.
Tested on FreeBSD 16.0-CURRENT: 26 PASS, 4 SKIP, 0 FAIL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18398
FreeBSD has supported hole punching via fspacectl(2) since
FreeBSD 14.0 and the test library already handles this using
truncate -d. Remove the skip that prevented trim tests from
running on FreeBSD.
Tests will still skip if the hardware does not support
TRIM/UNMAP, which is checked separately via diskinfo.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18398
If the pool is active 'zpool export' will fail resulting in
a test failure. Swap log_must with log_must_busy so the export
is retried when reported as busy before failing the test.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18447
The ioctl call to create the pool was returning -1 with errno EINVAL.
Inside the module code, inside vdev_draid.c, verify_perms is calling
fletcher_4_native_varsize. This in turn calls fletcher_4_scalar_native.
So, implemented a fletcher_4_byteswap_varsize which makes use of the
fletcher_4_scalar_byteswap in Big endian machines.
Reviewed-by: Andriy Tkachuk <andriy.tkachuk@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pranav P <pranavsdreams@gmail.com>
Closes#16261Closes#18445
A FreeBSD ZFS filesystem with properties "utf8only=on" and
"normalization=formD" consistently produces this panic when
building the lang/perl-5.42.0 port.
A ZFS file system with "utf8only=off" and "normalization=none"
works fine.
The cause of the panic seems to be incorrectly using the FreeBSD
namecache when normalisation is present. This commit adds a
predicate to prevent that.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jan Martin Mikkelsen <janm-github@transactionware.com>
Closes#18430
This PR adds a check in the mirror and raidz code for the case where
there are errors <= nparity. In that case, ZFS sets a new flag on
the zio that will be checked in zio_done. If that flag is set, when
the write IO completes, we issue a read IO for the same blkptr.
That will allow ZFS's auto-healing mechanisms and other errors
recovery tools to detect the effectively-corrupt data, and handle
it accordingly. Note that because draid raidz's IO done function,
it also benefits from this functionality.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes#18387
- Remove line where we disable stdout at the end of qemu-1-setup.sh
- Fix comment switching the 2x75GB -> 1x150GB cases
- Add some more debug to the end of the script
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18441
dmu_write_direct_done() passes dmu_sync_arg_t to
dmu_sync_done(), which updates the override state and
frees the completion context. The Direct I/O error path
then still dereferences dsa->dsa_tx while rolling the
dirty record back with dbuf_undirty(), resulting in a
use-after-free.
Save dsa->dsa_tx in a local variable before calling
dmu_sync_done() and use that saved tx for the error
rollback. This preserves the existing ownership model
for dsa and does not change the Direct I/O write
semantics.
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: gality369 <gality369@example.com>
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Closes#18440
zfs allow with a typo (e.g. "snapshop") produced the misleading
error "operation not applicable to datasets of this type". Report
"invalid permission" instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18401Closes#11903
If I could go back in time, I would beg Sun engineers to pick a
different name. For those of us who have not read the ZFS On-Disk
Specification pdf, it is not at all obvious that clearing a "label" is
such a bad thing.
But changing the name would be a breaking change, so at least for now
we can update the documentation.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shelvacu <git@shelvacu.com>
Closes#18347
Non-root callers got "unmount failed" when ZFS_MOUNT_HELPER was set
because /bin/umount's exit status doesn't preserve errno. Map a
non-zero helper exit to EPERM when geteuid() != 0 so the user sees
"permission denied".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#11740Closes#18443
When a VM fails to launch or is unreachable the qemu-7-prepare.sh
script will fail to collect the artifacts due to the missing vm*
directories. We want to collect as much diagnostic information as
possible, when missing create the directory to allow the subsequent
steps to proceed normally. Additionally, we don't want to fail
if the /tmp/summary.txt file is missing.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18438
We've seen some qemu-1-setup failures while trying to change the
runner's block device scheduler value to 'none':
We have a single 150GB block device
Setting up swapspace version 1, size = 16 GiB (17179865088 bytes)
no label, UUID=7a790bfe-79e5-4e38-b208-9c63fe523294
tee: '/sys/block/s*/queue/scheduler': No such file or directory
Luckily, we don't need to set the scheduler anymore on modern kernels:
https://github.com/openzfs/zfs/issues/9778#issuecomment-569347505
This commit just removes the code that sets the scheduler.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18437
Update the META file to reflect compatibility with the 7.0
kernel.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18435
The resilver_restart_001 test case has not been entirely reliable
when run under the CI. Address several small issues which may be
responsible.
- Configure the pool as raidz2 instead of raidz1 since the test
offlines two devices. This ensures the second device is marked
as OFFLINE instead of DEGRADED.
- Start the zpool replace after setting SCAN_SUSPEND_PROGRESS to
close any potential race where the replace finishs to quickly.
- Wait for the offlines/onlined vdevs to fully transition to the
expected state during the test.
- Add the true flag to sync_pool to force a TXG sync to happen
even if it might not otherwise be required.
- During cleanup dump the zpool events history to aid debugging
if the updated test case is still unreliable in the CI.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#18434
When copy_file_range overwrites a recent truncation, subsequent reads
can incorrectly determine that it is read hole instead of reading the
cloned blocks.
This can happen when the following conditions are met:
- Truncate adds blkid to dn_free_ranges
- A new TXG is created
- copy_file_range calls dmu_brt_clone which override the block pointer
and set DB_NOFILL
- Subsequent read, given DB_NOFILL, hits dbuf_read_impl and
dbuf_read_hole
- dbuf_read_hole calls dnode_block_freed, which returns TRUE because the
truncated blkids are still in dn_free_ranges
This will not happen if the clone and truncate are in the same TXG,
because the block clone would update the current TXG's dn_free_ranges,
which is why this bug only triggers under high IO load (such as
compilation).
Fix this by skipping the dnode_block_freed call if the block is
overridden. The fix shouldn't cause an issue when the cloned block is
subsequently freed in later TXGs, as dbuf_undirty would remove the
override.
This requires a dedicated test program as it is much harder to trigger
with scripts (this needs to generate a lot of I/O in short period of
time for the bug to trigger reliably).
Assisted-by: Gemini:gemini-3.1-pro
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gary Guo <gary@kernel.org>
Closes#18412Closes#18421
Replace semicolons with && so build failures are not masked by the
subsequent lockfile cleanup. Use trap to ensure the lockfile is
removed on both success and failure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes#18206Closes#18424