KekenoBSD/src

Author	SHA1	Message	Date
Alexander Motin	d65015938e	Vdev allocation bias/class change Normal, special and dedup vdevs differ only by space allocation bias. Normal and special vdevs might even legally store blocks targeted to other classes. Dedup vdevs don't normally do it, but there is no real reason why they can't. Considering this, it is not impossible to change the allocation bias for those vdevs. This change introduces a new top-level vdev property -- alloc_bias, reporting current bias for the vdev, and allowing to change it. This allows to easily change vdev role in a pool, especially if vdev removal is impossible. To not complicate the code, changes take effect only on next pool import. Changes to/from log vdev could also be theoretically possible, but they are artificially blocked for now, partially due to additional complications, and partially due to potential danger of placing other blocks on log vdevs, that would otherwise be non-fatal. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18493	2026-05-07 09:16:39 -07:00
Christos Longros	b68c782d82	man: document three missing properties and tunables Add manpage entries for parameters and properties that exist in source but were not previously described: - spl.4: spl_schedule_hrtimeout_slack_us - zfsprops.7: longname - vdevprops.7: raidz_expanding Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes #18467	2026-05-04 15:09:00 -07:00
Andriy Tkachuk	d1b0a69825	draid: add failure domains support Currently, the only way to tolerate the failure of the whole enclosure is to configure several draid vdevs in the pool, each vdev having disks from different enclosures. But this essentially degrades draid to raidz and defeats the purpose of having fast sequential resilvering on wide pools with draid. This patch allows to configure several children groups in the same row in one draid vdev. In each such group, let's call it failure group, the user can configure disks belonging to different enclosures - failure domains. For example, in case of 10 such enclosures with 10 disks each, the user can put 1st disk from each enclosure into 1st group, 2nd disk from each enclosure into 2nd group, and so on. If one enclosure fails, only one disk from each group would fail, which won't affect draid operation, and each group would have enough redundancy to recover the stored data. Of course, in case of draid2 - two enclosures can fail at a time, in case of draid3 - three enclosures (provided there are no other disk failures in each group). In order to preserve fast sequential resilvering in case of a disk failure, the groups must share all disks between themselves, and this is achieved by shuffling the disks between the groups. But only i-th disks in each group are shuffled between themselves, i.e. the disks from the same enclosures, after that they are shuffled within each group, like it is done today in an ordinary draid. Thus, no more than one disk from any enclosure can appear in any failure group as a result of this shuffling.
For example, here's how the pool status output looks like in case of two `draid1:2d:4c` failure groups:

    NAME                      STATE     READ WRITE CKSUM
    pool1                     ONLINE       0     0     0
      draid1:2d:4c:8w:1s-0    ONLINE       0     0     0
        enc0d0                ONLINE       0     0     0
        enc1d0                ONLINE       0     0     0
        enc2d0                ONLINE       0     0     0
        enc3d0                ONLINE       0     0     0
        enc0d1                ONLINE       0     0     0
        enc1d1                ONLINE       0     0     0
        enc2d1                ONLINE       0     0     0
        enc3d1                ONLINE       0     0     0
    spares
      draid1-0-0              AVAIL

The number of failure groups is specified indirectly via the new width parameter in draid vdev configuration descriptor, which is the total number of disks and which is multiple of children in each group. This multiple is the number of groups (width / children). Doing it this way allows the user conveniently see how many disks draid has in an instant.

Spare disks are evenly distributed among failure groups, and they are shared by all groups. However, to support domain failure, we cannot have more than nparity - 1 failed disks in any group, even if they are rebuilt to draid spares (the blocks of those spares can be mapped to the disks from the failed domain, and we cannot tolerate more than nparity failures in any failure group).

The retire agent in zed is updated to not start resilvering when the domain failure happens. Otherwise, it might take a lot of computing and I/O bandwidth resources, only to be wasted when the failed domain component is replaced.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Closes #11969 Closes #18148	2026-04-08 10:09:47 -07:00
Christos Longros	1ff1f13a87	vdevprops: remove unused slow_io defaults, fix documentation Remove the unused DEFAULT_SLOW_IO_N and DEFAULT_SLOW_IO_T defines from zfs_diagnosis.c. Unlike the checksum and I/O thresholds, the slow_io_n and slow_io_t properties must be manually opted in and have no built-in defaults. The defines were misleading. Update the vdevprops man page to clarify that slow_io_n and slow_io_t must be manually set, and that the documented defaults (10 errors in 600 seconds) apply only to checksum and I/O events. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes #18359	2026-04-06 09:30:46 -07:00
MigeljanImeri	4975430cf5	Add vdev property to disable vdev scheduler Added vdev property to disable the vdev scheduler. The intention behind this property is to improve IOPS performance when using o_direct. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com> Closes #17358	2026-02-23 09:34:33 -08:00
Mariusz Zaborski	02fdd26e51	Add knob to disable slow io notifications Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that allows users to disable notifications for slow devices. This prevents ZED and/or ZFSD from degrading the pool due to slow I/O. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org> Closes 17477	2025-11-11 10:42:17 -08:00
Paul Dagnelie	d64711c202	Detect a slow raidz child during reads A single slow responding disk can affect the overall read performance of a raidz group. When a raidz child disk is determined to be a persistent slow outlier, then have it sit out during reads for a period of time. The raidz group can use parity to reconstruct the data that was skipped. Each time a slow disk is placed into a sit out period, its `vdev_stat.vs_slow_ios count` is incremented and a zevent class `ereport.fs.zfs.delay` is posted. The length of the sit out period can be changed using the `raid_read_sit_out_secs` module parameter. Setting it to zero disables slow outlier detection. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Contributions-by: Don Brady <don.brady@klarasystems.com> Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17227	2025-09-10 15:25:03 -07:00
Alexander Ziaee	5a8ba4520b	manuals: Audit/bump dates for last content change Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org> Closes #17676	2025-08-28 16:26:16 -07:00
Simon Howard	ef81812726	Fix spelling errors Unlike some of my other fixes which are more subtle, these are unambiguously spelling errors. Signed-off-by: Simon Howard <fraggle@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org>	2025-03-24 14:37:40 -07:00
Rob Norris	eb9098ed47	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:27 -07:00
Tim Smith	b8c0c154ad	Fix several typos in the man pages Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tim Smith <tsmith84@gmail.com> Closes #16965	2025-01-21 10:30:17 -05:00
Don Brady	fb6d8cf229	Add some missing vdev properties (#16346) Sponsored-by: Klara, Inc. Sponsored-By: Wasabi Technology, Inc. Signed-off-by: Don Brady <don.brady@klarasystems.com> Co-authored-by: Don Brady <don.brady@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2024-07-23 16:34:09 -07:00
Alek P	ea2862cdda	vdev props comment and manpage should include zfsd and FreeBSD mentions Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Alek Pinchuk <apinchuk@axcient.com> Closes #15968	2024-04-03 17:56:34 -07:00
Don Brady	cbe882298e	Add slow disk diagnosis to ZED Slow disk response times can be indicative of a failing drive. ZFS currently tracks slow I/Os (slower than zio_slow_io_ms) and generates events (ereport.fs.zfs.delay). However, no action is taken by ZED, like is done for checksum or I/O errors. This change adds slow disk diagnosis to ZED which is opt-in using new VDEV properties: VDEV_PROP_SLOW_IO_N VDEV_PROP_SLOW_IO_T If multiple VDEVs in a pool are undergoing slow I/Os, then it skips the zpool_vdev_degrade(). Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #15469	2024-02-08 09:19:52 -08:00
rob-wing	69f024a56e	Configure zed's diagnosis engine with vdev properties Introduce four new vdev properties: checksum_n checksum_t io_n io_t These properties can be used for configuring the thresholds of zed's diagnosis engine and are interpreted as <N> events in T <seconds>. When this property is set to a non-default value on a top-level vdev, those thresholds will also apply to its leaf vdevs. This behavior can be overridden by explicitly setting the property on the leaf vdev. Note that, these properties do not persist across vdev replacement. For this reason, it is advisable to set the property on the top-level vdev instead of the leaf vdev. The default values for zed's diagnosis engine (10 events, 600 seconds) remain unchanged. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology LLC Closes #13805	2023-01-23 13:14:25 -08:00
Mariusz Zaborski	16f0fdaddd	Allow to control failfast Linux defaults to setting "failfast" on BIOs, so that the OS will not retry IOs that fail, and instead report the error to ZFS. In some cases, such as errors reported by the HBA driver, not the device itself, we would wish to retry rather than generating vdev errors in ZFS. This new property allows that. This introduces a per vdev option to disable the failfast option. This also introduces a global module parameter to define the failfast mask value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Seagate Technology LLC Submitted-by: Klara, Inc. Closes #14056	2022-11-10 13:37:12 -08:00
Tino Reichardt	1d3ba0bf01	Replace dead opensolaris.org license link The commit replaces all findings of the link: http://www.opensolaris.org/os/licensing with this one: https://opensource.org/licenses/CDDL-1.0 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13619	2022-07-11 14:16:13 -07:00
chrisrd	0175272f64	man: speling Fix spelling. Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes #12911	2022-01-06 11:00:01 -08:00
Allan Jude	2a673e76a9	Vdev Properties Feature Add properties, similar to pool properties, to each vdev. This makes use of the existing per-vdev ZAP that was added as part of device evacuation/removal. A large number of read-only properties are exposed, many of the members of struct vdev_t, that provide useful statistics. Adds support for read-only "removing" vdev property. Adds the "allocating" property that defaults to "on" and can be set to "off" to prevent future allocations from that top-level vdev. Supports user-defined vdev properties. Includes support for properties.vdev in SYSFS. Co-authored-by: Allan Jude <allan@klarasystems.com> Co-authored-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #11711	2021-11-30 07:46:25 -07:00

19 Commits
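
The failure-group arithmetic described in commit d1b0a69825 (groups = width / children, with the i-th disk of every enclosure placed in the i-th group) can be illustrated with a small sketch. This is not code from the patch: the function names and the `encXdY` naming scheme are assumptions made for illustration, and the sketch models only the initial group assignment, not the disk shuffling the commit describes.

```python
# Illustrative sketch only; names are hypothetical, not taken from the patch.

def failure_group_count(width: int, children: int) -> int:
    """Number of failure groups implied by a draid descriptor (width / children)."""
    if children <= 0 or width % children != 0:
        raise ValueError("width must be a positive multiple of children")
    return width // children


def assign_failure_groups(enclosures: int, disks_per_enclosure: int) -> list[list[str]]:
    """Put the i-th disk of every enclosure into the i-th failure group."""
    return [
        [f"enc{e}d{i}" for e in range(enclosures)]
        for i in range(disks_per_enclosure)
    ]


def disks_lost_per_group(groups: list[list[str]], failed_enclosure: int) -> list[int]:
    """Count how many disks each group loses when one whole enclosure fails."""
    prefix = f"enc{failed_enclosure}d"
    return [sum(disk.startswith(prefix) for disk in group) for group in groups]


if __name__ == "__main__":
    # The `draid1:2d:4c:8w` layout from the commit message: 8 disks, 4 per group.
    print(failure_group_count(width=8, children=4))
    groups = assign_failure_groups(enclosures=4, disks_per_enclosure=2)
    for group in groups:
        print(group)
    # Losing enclosure 2 removes exactly one disk from each group.
    print(disks_lost_per_group(groups, failed_enclosure=2))
```

With this layout every group loses at most one disk per failed enclosure, which is why a draid1 configuration can survive the loss of a single enclosure, provided no other disk in a group has failed.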