Commit Graph

293406 Commits

Author SHA1 Message Date
Mark Johnston d06fe346ec libgeom: Avoid fixed remappings of the devstat device
libgeom maintains a quasi-private mapping of /dev/devstat, which might
grow over time if new devices appear.  When the mapping needs to be
expanded, the old mapping is passed as a hint, but this appears to be
unnecessary.

Simplify and improve things a bit:
- stop passing a hint when remapping,
- don't create a mapping in geom_stats_open(), as geom_stats_resync() will
  create it for us,
- check for errors from munmap().

Reviewed by:	imp, asomers
Tested by:	asomers
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D46294
2024-08-19 16:02:59 +00:00
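
A minimal C sketch of a remap step that follows the three points above: no address hint to mmap(), and a checked munmap(). It assumes the statistics are exposed through a file-backed shared mapping. The helper name stats_remap() is hypothetical, and the real libgeom code may be structured differently.

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not the libgeom code: replace a read-only shared
 * mapping of fd with one of newlen bytes. mmap() gets no address hint, and
 * a munmap() failure is reported rather than ignored.
 * Returns 0 on success or -1 on failure.
 */
static int
stats_remap(int fd, void **mapp, size_t *lenp, size_t newlen)
{
	void *p;

	/* Map the new size first so the old mapping survives a failure. */
	p = mmap(NULL, newlen, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return (-1);
	if (*mapp != NULL && munmap(*mapp, *lenp) != 0) {
		(void)munmap(p, newlen);
		return (-1);
	}
	*mapp = p;
	*lenp = newlen;
	return (0);
}
```

Mapping the new region before unmapping the old one is a design choice made for this sketch. It keeps the previous mapping usable if the new mmap() fails.
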
Igor Ostapenko 22a632c366 pf: Make pf_test6 handle m_len < sizeof(struct ip6_hdr) case
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D46312
2024-08-19 18:02:20 +02:00
Kristof Provost 6a88e22728 pfctl: pfik_ifp is always NULL
The pfik_ifp field is not provided by the kernel, it is always NULL. Do not
check for it. This caused us to not clear the skip flag on interfaces, leading
to unexpected behaviour when a 'set skip' was removed.

PR:		280834
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D46311
2024-08-19 18:02:15 +02:00
Mark Johnston d02dcf21ee pkgbase: Make src package creation recipes more precise
Just remove the plist created by the respective rule.  Otherwise the two
recipes can race with each other.

Fixes:	d7d5c9efef ("pkgbase: Let source packages be built in parallel")
Reviewed by:	bapt, emaste
Reported by:	Mark Millard <marklmi@yahoo.com>
Differential Revision:	https://reviews.freebsd.org/D46320
2024-08-19 15:48:12 +00:00
Mark Johnston 6982be38cb socket: Microoptimize soreceive_stream_locked()
There is no need to hold the sockbuf lock while checking uio_resid.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
2024-08-19 14:52:39 +00:00
Mark Johnston fb901935f2 socket: Split up sosend_generic()
Factor out the bits that run with the sock I/O lock held into a separate
function.  In this implementation, we are doing a bit more work under
the I/O lock than before.  However, lock contention is only a problem
when multiple threads are transmitting on the same socket, which is an
unusual workload that is not expected to perform well anyway.

No functional change intended.

Reviewed by:	gallatin, glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D46305
2024-08-19 14:37:27 +00:00
Mark Johnston 0a68f644dc socket: Split up soreceive_generic()
Factor out the bits that run with the sock I/O lock held into a separate
function.  No functional change intended.

Reviewed by:	gallatin, glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D46304
2024-08-19 14:37:27 +00:00
Mark Johnston aa141adc03 socket: Split up soreceive_stream()
Factor out the bits that run with the sock I/O lock held into a separate
function.  No functional change intended.

Reviewed by:	gallatin, glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D46303
2024-08-19 14:37:27 +00:00
Mark Johnston 7e65cfc9bb pf: Make pf_get_translation() more expressive
Currently pf_get_translation() returns a pointer to a matching
nat/rdr/binat rule, or NULL if no rule was matched or an error occurred
while applying the translation.  That is, we don't distinguish between
errors and the lack of a matching rule.  Thus, if an error (e.g., a
memory allocation failure or a state conflict) occurs, we simply handle
the packet as if no translation rule was present.  This is not
desirable.

Make pf_get_translation() return the matching rule as an out-param and
instead return a reason code which indicates whether there was no
translation rule, or there was a translation rule and we failed to apply
it, or there was a translation rule and we applied it successfully.

Reviewed by:	kp, allanjude
MFC after:	3 months
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum
Differential Revision:	https://reviews.freebsd.org/D45672
2024-08-19 14:37:27 +00:00
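
The calling convention described above (rule as an out-parameter, reason code as the return value) can be sketched in C as follows. The enum constants, struct rule, and get_translation() are hypothetical stand-ins. The parameters 'match' and 'apply_ok' replace the real rule lookup and translation steps, and pf's actual names and reason codes may differ.

```c
#include <stddef.h>

/* Hypothetical outcome codes; pf's real names may differ. */
enum xlate_result {
	XLATE_NO_RULE,		/* no nat/rdr/binat rule matched */
	XLATE_FAILED,		/* a rule matched, but applying it failed */
	XLATE_APPLIED,		/* a rule matched and was applied */
};

struct rule {
	int id;
};

/*
 * The matching rule comes back through *rulep; the return value says what
 * happened, so a caller can drop the packet on XLATE_FAILED instead of
 * treating it as if no rule had matched.
 */
static enum xlate_result
get_translation(struct rule *match, int apply_ok, struct rule **rulep)
{
	*rulep = match;
	if (match == NULL)
		return (XLATE_NO_RULE);
	if (!apply_ok)
		return (XLATE_FAILED);
	return (XLATE_APPLIED);
}
```
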
Mark Johnston 9897a66923 pf: Let rdr rules modify the src port if doing so would avoid a conflict
If NAT rules cause inbound connections to different external IPs to be
mapped to the same internal IP, and some application uses the same
source port for multiple such connections, rdr translation may result in
conflicts that cause some of the connections to be dropped.

Address this by letting rdr rules detect state conflicts and modulate
the source port to avoid them.

Reviewed by:	kp, allanjude
MFC after:	3 months
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum
Differential Revision:	https://reviews.freebsd.org/D44488
2024-08-19 14:37:27 +00:00
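
A rough C illustration of the idea: if the translated 4-tuple collides with an existing state, try other source ports until one is free. Everything here is hypothetical. The state table is a plain array, and a linear scan over [low, high] is used for simplicity. pf's actual conflict detection and port selection are not shown in the commit message and may work differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple after rdr translation. */
struct tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Stand-in for a state-table lookup: linear search of an array. */
static bool
tuple_in_use(const struct tuple *t, const struct tuple *states, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (states[i].src_ip == t->src_ip &&
		    states[i].dst_ip == t->dst_ip &&
		    states[i].src_port == t->src_port &&
		    states[i].dst_port == t->dst_port)
			return (true);
	}
	return (false);
}

/*
 * Keep the original source port if it is free; otherwise pick the first
 * free port in [low, high]. On failure the original port is restored.
 */
static bool
rdr_pick_src_port(struct tuple *t, const struct tuple *states, size_t n,
    uint16_t low, uint16_t high)
{
	uint16_t orig = t->src_port;

	if (!tuple_in_use(t, states, n))
		return (true);
	for (uint32_t p = low; p <= high; p++) {
		t->src_port = (uint16_t)p;
		if (!tuple_in_use(t, states, n))
			return (true);
	}
	t->src_port = orig;
	return (false);
}
```
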
Mark Johnston d7d5c9efef pkgbase: Let source packages be built in parallel
To build the packages target, we build src and src-sys packages
containing the source code from which the repo was built.  These
packages take significantly longer to build than the others, presumably because
they contain many more files.  Because both source packages are built
to satisfy the same target, they end up being built serially.  Split
them into separate subtargets so that they can run in parallel.  This
saves a couple of minutes on my build machine.

Reviewed by:	manu, emaste
MFC after:	1 month
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D46288
2024-08-19 14:37:27 +00:00
Mark Johnston b118b6eb4c pkgbase: Unify pkg ABI handling for pkgbase targets
Right now, to get the pkg ABI we either use PKG_ABI, derived from
newvers.sh, or use an ABI file from the staged world.  This
inconsistency is confusing and can cause problems.

Switch to a single source of truth: use an ABI file from the worldstage
dir to get the ABI of pkgbase packages.  In particular, we do not need
to know the ABI until staging is done.  More specifically:
- use a shell command to define PKG_ABI,
- replace inline uses of ABI_FILE,
- run sign-packages in a subshell (this was already done for the
  update-packages target) so that the staging targets are done before we
  try to evaluate the ABI.

Reviewed by:	manu
MFC after:	1 month
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D46287
2024-08-19 14:37:27 +00:00
Mark Johnston 1d26746cfd build.7: Document the packages target
Reviewed by:	manu, emaste
MFC after:	1 week
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D46286
2024-08-19 14:37:27 +00:00
Ed Maste d1daec3d35 linux.4: improve the path translation clarificiation
As suggested by martin@lispworks.com, refer to the compat path
explicitly, and correct an existing grammaro.

PR:		277804
Fixes: f66e71fa78 ("linux.4: clarify path translation")
Sponsored by:	The FreeBSD Foundation
2024-08-19 10:29:19 -04:00
Ed Maste f66e71fa78 linux.4: clarify path translation
Try to be a little more explicit about the path translation mechanism
accessing /compat/linux/<path> then falling back to /<path>.

PR:		277804
Reviewed by:	fernape
Sponsored by:	The FreeBSD Foundation
2024-08-19 10:14:28 -04:00
Mark Johnston e962b37bf0 bhyve: Do not enable PCI BAR decoding if a boot ROM is present
Let the boot ROM handle BAR initialization.  This fixes a problem where
u-boot's BAR remapping conflicts with some limitations in bhyve.  See
https://lists.freebsd.org/archives/freebsd-virtualization/2024-April/002103.html
for a description of what goes wrong.

The old behaviour can be restored by setting the pci.enable_bars
configuration variable.

Reviewed by:	corvink, jhb
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D45049
2024-08-19 13:56:06 +00:00
Mark Johnston 43caa2e805 bhyve: Make boot ROM handling more consistent
- On amd64, deprecate lpc.bootrom and lpc.bootvars.  Use top-level
  config variables instead.
- Introduce a generic predicate which can be used to determine whether
  the guest has a boot ROM.

Reviewed by:	corvink, jhb
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D46282
2024-08-19 13:55:47 +00:00
Andrew Turner 7a345763f9 arm64: Expand the use of Armv8.1-A atomics
When targeting Armv8.1 we can assume FEAT_LSE is available and can use
the atomic instructions this provides without needing to check for
support first.

Reviewed by:	imp, markj, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46159
2024-08-19 10:53:12 +00:00
Andrew Turner 87940d2b33 buf_ring: Add an Arm copyright
I've changed enough of this file to add Arm as a copyright holder.
Add it after the "All rights reserved" line, as that's not needed.

Reviewed by:	imp
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46157
2024-08-19 10:53:12 +00:00
Andrew Turner fe2445f47d buf_ring: Ensure correct ordering of loads
When enqueueing on an architecture with a weak memory model, ensure that
the loads of br->br_prod_head and br->br_cons_tail are ordered correctly.

If br_cons_tail is loaded first, then other threads may perform a
dequeue and enqueue before br_prod_head is loaded. This will mean the
tail is one less than it should be and the code under the
prod_next == cons_tail check could incorrectly be skipped.

buf_ring_dequeue_mc has the same issue with br->br_prod_tail and
br->br_cons_head, so it needs the same fix.

Reported by:	Ali Saidi <alisaidi@amazon.com>
Co-developed by: Ali Saidi <alisaidi@amazon.com>
Reviewed by:	imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46155
2024-08-19 10:53:11 +00:00
Andrew Turner 947754af55 buf_ring: Use atomic operations with br_prod_tail
As with br_cons_tail use an atomic load acquire to read br_prod_tail
in buf_ring_dequeue_mc and buf_ring_peek*.

On dequeue we need to ensure we don't read the entry from the buf_ring
until it is available and prod_tail has been updated. There is already an
appropriate store in the enqueue path and an appropriate load in the
single-consumer dequeue; we just need one in the other functions that
read from the buf_ring.

Reviewed by:	imp, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46154
2024-08-19 10:53:11 +00:00
Andrew Turner 7eb0fffc77 buf_ring: Remove old arm-only dequeue code
In the single-consumer dequeue, the consumer thread controls
br_cons_head. As such, no ordering between this and other data is
required.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46153
2024-08-19 10:53:11 +00:00
Andrew Turner 44e1cfca41 buf_ring: Use atomic operations with br_cons_tail
Use an atomic operation with a memory barrier loading br_cons_tail
from the producer thread and storing to it in the consumer thread.

On dequeue we need to read the pointer value from the buf_ring before
moving the consumer tail as that indicates the entry is available to be
used. The store release atomic operation guarantees this.

In the enqueueing thread we then need to use a load acquire atomic
operation to ensure writing to this entry can only happen after the
tail has been read and checked.

Reported by:	Ali Saidi <alisaidi@amazon.com>
Co-developed by: Ali Saidi <alisaidi@amazon.com>
Reviewed by:	markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46152
2024-08-19 10:53:11 +00:00
Andrew Turner 3cc603909e buf_ring: Keep the full head and tail values
If a thread reads the head but then sleeps for long enough that
another thread fills the ring and leaves the new head with the
expected value, then the cmpset can pass when it should have failed.

To work around this, keep the full head and tail values and use the
upper bits as a generation count.

Reviewed by:	kib
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46151
2024-08-19 10:53:11 +00:00
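
The failure described above is an instance of the ABA problem. This C11 sketch (hypothetical names, a 4-slot ring for brevity) contrasts a head stored as a masked slot index with a head stored as the full counter, where the slot is derived as head & RING_MASK and the upper bits act as the generation count.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4u
#define RING_MASK (RING_SIZE - 1)

/* Masked storage: head only ever holds a slot index. */
static void
advance_masked(_Atomic uint32_t *head)
{
	atomic_store(head, (atomic_load(head) + 1) & RING_MASK);
}

static bool
claim_masked(_Atomic uint32_t *head, uint32_t seen)
{
	return (atomic_compare_exchange_strong(head, &seen,
	    (seen + 1) & RING_MASK));
}

/* Full storage: slot is head & RING_MASK, upper bits are a generation. */
static void
advance_full(_Atomic uint32_t *head)
{
	atomic_fetch_add(head, 1);
}

static bool
claim_full(_Atomic uint32_t *head, uint32_t seen)
{
	return (atomic_compare_exchange_strong(head, &seen, seen + 1));
}
```

In the test, a stalled thread reads head == 1, and other threads then advance it by one full lap. The masked CAS wrongly succeeds, while the full-counter CAS correctly fails. With 32-bit counters, the stale value could only match again after the counter wraps all the way around.
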
Andrew Turner 17a597bc13 buf_ring: Consistently use atomic_*_32
We are operating on uint32_t values, use uint32_t atomic functions.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46150
2024-08-19 10:04:25 +01:00
Andrew Turner d3d34d56be buf_ring: Support DEBUG_BUFRING in userspace
The only part of DEBUG_BUFRING we don't support in userspace is the
mutex checks. Add _KERNEL checks around these so we can enable the
extra debugging.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46149
2024-08-19 10:04:25 +01:00
Andrew Turner 5048308bdb buf_ring: Remove PREFETCH_DEFINED
I'm not able to find anything in the tree that ever defined it. Remove
it, as it's unused and therefore untested.

Reviewed by:	alc, imp, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D46148
2024-08-19 10:04:24 +01:00
Dag-Erling Smørgrav 9ff2ebd928 adduser: Better document ZFS dataset creation.
MFC after:	3 days
PR:		280873
Reviewed by:	bcr
Differential Revision:	https://reviews.freebsd.org/D46316
2024-08-19 10:30:11 +02:00
Andre Albsmeier 308399a179 tail -F: fix crash
When show() detects an error and closes the file and follow() wants to
close it again, a NULL dereference occurs.

PR:	280910
MFC after:	1 week
2024-08-19 10:54:24 +03:00
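
One common way to make a close path idempotent is to clear the handle after closing it and check it before closing again. The following C sketch shows that pattern. The struct and function names are hypothetical, and the actual fix in tail(1) may be written differently.

```c
#include <stdio.h>

/* Hypothetical per-file record, loosely modeled on what tail -F tracks. */
struct followed {
	const char *name;
	FILE *fp;		/* NULL once the file has been closed */
};

/*
 * Close a followed file at most once. Clearing fp after fclose() lets a
 * later caller (e.g. the follow loop after show() already closed it) see
 * that there is nothing to close, instead of passing NULL to fclose().
 */
static void
close_followed(struct followed *f)
{
	if (f->fp == NULL)
		return;
	(void)fclose(f->fp);
	f->fp = NULL;
}
```
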
Eugene Grosbein 8132e95909 libalias: fix subtle racy problem in outside-inside forwarding
sys/netinet/libalias/alias_db.c has internal static function UseLink()
that passes a link to CleanupLink() to verify if the link has expired.
If so, UseLink() may return NULL.

_FindLinkIn()'s usage of UseLink() is not quite correct.

Assume there is "redirect_port udp" configured to forward incoming
traffic for specific port to some internal address.
Such a rule creates partially specified permanent link.

After first such packet libalias creates new fully specifiled
temporary LINK_UDP with default timeout 60 seconds.
Also, in case of low traffic libalias may assign "timestamp"
for this new temporary link way in the past because
LibAliasTime is updated seldom and can keep old value
for tens of seconds, and it will be used for the temporary link.

It may happen that next incoming packet for redirected port
passed to _FindLinkIn() results in a call to UseLink()
that returns NULL due to detected expiration.
Immediate return of NULL results in broken translation:
either a packet is dropped (deny_incoming mode) or delivered to
original destination address instead of internal one.

Fix it with an additional NULL check so that the search proceeds
to the original partially specified link. In the case of UDP,
it also recreates the temporary fully specified link
with a call to ReLink().

Practical examples are "redirect_port udp" rules for unidirectional
SYSLOG protocol (port 514) or some low volume VPN encapsulated in UDP.

Thanks to Peter Much for initial analysis and first version of a patch.

Reported by:	Peter Much <pmc@citylink.dinoex.sub.org>
PR:		269770
MFC after:	1 week
2024-08-19 10:34:37 +07:00
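
The control flow of the fix can be sketched in a heavily simplified form. Here use_link() stands in for UseLink() and returns NULL for an expired link, and find_link_in() falls back to the partially specified link instead of returning NULL straight away. All types and names are hypothetical simplifications of alias_db.c, and the ReLink() step is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified link record. */
struct link {
	bool fully_specified;
	bool expired;
};

/* Stand-in for UseLink(): NULL if the link has expired. */
static struct link *
use_link(struct link *lnk)
{
	return (lnk->expired ? NULL : lnk);
}

/*
 * Prefer the fully specified link, but if it has just expired, continue
 * with the partially specified (redirect_port) link.
 */
static struct link *
find_link_in(struct link *full, struct link *partial)
{
	struct link *lnk = NULL;

	if (full != NULL)
		lnk = use_link(full);
	if (lnk == NULL && partial != NULL)
		lnk = use_link(partial);
	return (lnk);
}
```
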
Navdeep Parhar 0a9d1da6e6 cxgbe(4): Stop work request queues in a reliable manner.
Clear the EQ_HW_ALLOCATED flag with the wrq lock held and discard all
work requests, pending or new, when it's not set.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2024-08-17 11:23:32 -07:00
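
The locking rule above (clear the flag under the same lock that submitters take, then discard anything pending or new) can be sketched with a pthread mutex standing in for the wrq lock. The struct, counters, and function names are hypothetical, not cxgbe's.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical work request queue; not the cxgbe data structure. */
struct wrq {
	pthread_mutex_t lock;
	bool hw_allocated;	/* plays the role of EQ_HW_ALLOCATED */
	int pending;		/* queued but not yet sent */
	int discarded;
};

/* Submit one work request; discard it if the queue has been stopped. */
static void
wrq_submit(struct wrq *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->hw_allocated)
		q->pending++;
	else
		q->discarded++;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Clearing the flag with the lock held means no submitter can still see it
 * set once this returns; anything already pending is discarded.
 */
static void
wrq_stop(struct wrq *q)
{
	pthread_mutex_lock(&q->lock);
	q->hw_allocated = false;
	q->discarded += q->pending;
	q->pending = 0;
	pthread_mutex_unlock(&q->lock);
}
```
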
Navdeep Parhar b5332809c6 cxgbe/iw_cxgbe: Fix typo in assertion.
eanbled -> enabled

MFC after:	3 days
2024-08-17 10:38:36 -07:00
Peter Holm c7bc30c24f stress2: Some tests use hw.ncpu to scale the load. Tests on a box with
a large number of CPUs show that this number needs to be capped.
2024-08-17 08:37:34 +02:00
Rick Macklem 10d5b43424 nfsproto.h: Define the new mode_umask attribute
RFC8275 defines a new attribute as an extension to NFSv4.2
called MODE_UMASK.  This patch adds the attribute number
to nfsproto.h.

Future patches will add optional support for the attribute.
This patch does not cause any semantics change.

MFC after:	2 weeks
2024-08-16 17:40:52 -07:00
Simon J. Gerraty 35399f68c8 safe_dot check file is a file
Since we are being paranoid, check that each arg to safe_dot is
actually a file as well as non-empty.

Check for whitespace in filenames; these require special handling.
2024-08-16 13:15:20 -07:00
Cy Schubert 5685098846 unbound: Vendor import 1.21.0
Release notes at
	https://nlnetlabs.nl/news/2024/Aug/15/unbound-1.21.0-released/

MFC after:	1 week

Merge commit '96ef46e5cff01648c80c09c4364d10bc6f58119d'
2024-08-16 10:03:34 -07:00
Cy Schubert 96ef46e5cf unbound: Vendor import 1.21.0
Release notes at
	https://nlnetlabs.nl/news/2024/Aug/15/unbound-1.21.0-released/
2024-08-16 09:41:16 -07:00
Kajetan Staszkiewicz 788f194f60 pf: 'sticky-address' requires 'keep state'
When route_to() processes a packet without state, pf_map_addr() is called for
each packet. Each call searches for a source node and finds none, since
source nodes are created only in pf_create_state(). Thus sticky address,
even though requested in the rule definition, will never work.

Raise an error when a stateless filter rule uses sticky address to avoid
confusion and to keep ruleset limitations in sync with what the pf code
really does.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D46310
2024-08-16 11:43:00 +02:00
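
The new restriction amounts to a load-time check on the parsed rule. This C sketch uses a hypothetical two-field rule struct and a made-up error string; it is not the pfctl parser, and pfctl's actual message may differ.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of a parsed filter rule. */
struct filter_rule {
	bool keep_state;
	bool sticky_address;
};

/*
 * Reject sticky-address on a stateless rule: without state there is no
 * source node to remember the chosen address, so the option could never
 * take effect. Returns 0 if the rule is acceptable, -1 otherwise.
 */
static int
check_sticky(const struct filter_rule *r)
{
	if (r->sticky_address && !r->keep_state) {
		fprintf(stderr, "sticky-address requires keep state\n");
		return (-1);
	}
	return (0);
}
```
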
Peter Holm 41e03b46da stress2: Fix warning about unused variable. Remove debug "date"
2024-08-16 09:19:51 +02:00
Warner Losh 3d89acf590 nvme: Separate total failures from I/O failures
When it's an I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 21:31:20 -06:00
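
The distinction above can be modeled as a three-state failure level consulted before submitting a command. The enum and may_submit() are hypothetical illustrations, not the nvme(4) driver's actual state handling.

```c
#include <stdbool.h>

/* Hypothetical controller failure states. */
enum ctrlr_fail {
	FAIL_NONE,	/* fully operational */
	FAIL_IO,	/* I/O queues failed; admin queue still usable */
	FAIL_ALL,	/* controller completely failed */
};

/* May a command of this kind still be sent in the given state? */
static bool
may_submit(enum ctrlr_fail state, bool is_admin)
{
	switch (state) {
	case FAIL_NONE:
		return (true);
	case FAIL_IO:
		return (is_admin);
	case FAIL_ALL:
	default:
		return (false);
	}
}
```
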
Warner Losh ce7fac64ba Revert "nvme: Separate total failures from I/O failures"
All kinds of crazy stuff was mixed into this commit. Revert
it and do it again.

This reverts commit d5507f9e43.

Sponsored by:		Netflix
2024-08-15 21:29:53 -06:00
Warner Losh a233cb6914 nvmecontrol: Accept -a {1,2,3,4} for sanitize command for nvme-cli compat
Linux's `nvme sanitize -a` takes a number, not a string. Accept 1-4 for
compatibility so vendors' recipes are easier to implement.

Sponsored by: Netflix
2024-08-15 20:22:31 -06:00
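
Here is a sketch of accepting either form for -a. It assumes the action names "exitfailure", "block", "overwrite" and "crypto" (as documented for nvmecontrol(8) sanitize) and the NVMe Sanitize Action values 1 to 4 in that order. parse_sanact() is hypothetical and is not the nvmecontrol implementation.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Map a sanitize action given by name or by number to the action value.
 * Assumed mapping: 1 = exit failure mode, 2 = block erase,
 * 3 = overwrite, 4 = crypto erase. Returns -1 if unrecognized.
 */
static int
parse_sanact(const char *arg)
{
	static const char *const names[] = {
		NULL, "exitfailure", "block", "overwrite", "crypto",
	};
	char *end;
	long v;

	for (int i = 1; i <= 4; i++)
		if (strcmp(arg, names[i]) == 0)
			return (i);
	/* nvme-cli style: a bare number 1-4 with nothing trailing. */
	v = strtol(arg, &end, 10);
	if (end != arg && *end == '\0' && v >= 1 && v <= 4)
		return ((int)v);
	return (-1);
}
```
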
Warner Losh d5507f9e43 nvme: Separate total failures from I/O failures
When it's an I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 20:22:18 -06:00
Kevin Lo 8b21c469db ng_ubt: Add blacklist entries for MediaTek MT7925
This controller requires a firmware patch download to operate, so
block ng_ubt attachment unless operational firmware is loaded.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D46302
2024-08-16 10:03:19 +08:00
Simon J. Gerraty 82cb2a4158 Update safe_eval.sh to support --export
This update allows

safe_dot --export file ...

to export any variables that get set.

Reviewed by: obrien
2024-08-15 15:42:39 -07:00
Jessica Clarke 3cded05922 tmpfs: Fix OOB write when setting vfs.tmpfs.memory_percent
tmpfs_mem_percent is an int not a long, so on a 64-bit system this
writes 4 bytes past the end of the variable. The read above is correct,
so this was likely a copy-and-paste error from sysctl_mem_reserved.

Found by:	CHERI
Fixes:		636592343c ("tmpfs: increase memory reserve to a percent of available memory + swap")
2024-08-15 20:33:22 +01:00
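
The bug class is storing through a pointer of the wrong width. A write of the form *(long *)&int_var = value touches 8 bytes on LP64, while the object is only 4 bytes. This userspace sketch shows the correctly typed store. The variable, the setter, and its 0 to 100 range check are hypothetical, not the tmpfs sysctl handler.

```c
/* Hypothetical stand-in for the int-typed sysctl target. */
static int mem_percent = 95;

/*
 * Validate the new value as a long, then store it through an int so that
 * exactly sizeof(int) bytes are written. Returns 0 on success, -1 if the
 * (assumed) range check fails.
 */
static int
set_mem_percent(long value)
{
	if (value < 0 || value > 100)
		return (-1);
	mem_percent = (int)value;
	return (0);
}
```
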
Pierre Pronchery ef9fc9609a sys: Mark ACL conversion routines as __result_use_check
Both acl_copy_oldacl_into_acl() and acl_copy_acl_into_oldacl() may fail
in some circumstances (e.g., acl.acl_cnt exceeding the capacity of
OLDACL_MAX_ENTRIES).  This change marks both routines with
__result_use_check, so that the compiler warns about callers that
ignore the returned error.

Suggested by:	markj
Reviewed by:	markj, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46254
2024-08-15 15:04:29 -04:00
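
On FreeBSD, <sys/cdefs.h> maps __result_use_check to the compiler's warn_unused_result attribute. The following sketch defines a local fallback so it also builds elsewhere. copy_entries() and OLD_MAX_ENTRIES are hypothetical placeholders for the ACL conversion routines and OLDACL_MAX_ENTRIES.

```c
#ifndef __result_use_check
#define __result_use_check __attribute__((__warn_unused_result__))
#endif

#define OLD_MAX_ENTRIES 32	/* hypothetical capacity */

/* A caller that discards this result gets a compiler warning. */
static int copy_entries(int count) __result_use_check;

/* Hypothetical conversion that fails when the input does not fit. */
static int
copy_entries(int count)
{
	return (count > OLD_MAX_ENTRIES ? -1 : 0);
}
```

A bare call such as copy_entries(n); would be flagged, while if (copy_entries(n) != 0) return (EINVAL); is accepted. The attribute only checks that the result is used, not that it is tested correctly.
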
Mark Johnston bef079254f arm64: Clamp segment sizes properly in bounce_bus_dmamap_load_buffer()
Commit 099b595154 ("Improve loading of multipage aligned buffers.")
modified bounce_bus_dmamap_load_buffer() with the assumption that busdma
memory allocations are physically contiguous, which is not always true:
bounce_bus_dmamem_alloc() will allocate memory with
kmem_alloc_attr_domainset() in some cases, and this function is not
guaranteed to return contiguous memory.

The damage seems to have been mitigated for most consumers by clamping
the segment size to maxsegsz, but this was removed in commit
a77e1f0f81 ("busdma: better handling of small segment bouncing"); in
practice, it seems busdma memory is often allocated with maxsegsz ==
PAGE_SIZE.  In particular, after commit a77e1f0f81 I see occasional
random kernel memory corruption when benchmarking TCP through mlx5
interfaces.

Fix the problem by using separate flags for contiguous and
non-contiguous busdma memory allocations, and using that to decide
whether to clamp.

Fixes:	099b595154 ("Improve loading of multipage aligned buffers.")
Fixes:	a77e1f0f81 ("busdma: better handling of small segment bouncing")
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D46238
2024-08-15 14:19:22 +00:00
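
The clamping decision can be sketched as follows. If the buffer is known to be physically contiguous, a segment may span pages (still limited by maxsegsz). Otherwise it must stop at the next page boundary, because the next virtual page may map to an unrelated physical page. seg_size() and SKETCH_PAGE_SIZE are hypothetical, and a 4 KiB page is assumed. This is not the arm64 busdma code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Largest segment that may start at va with len bytes remaining. */
static size_t
seg_size(uintptr_t va, size_t len, size_t maxsegsz, bool contig)
{
	size_t sz = len;

	if (!contig) {
		/* Non-contiguous memory: never cross a page boundary. */
		size_t to_page_end = SKETCH_PAGE_SIZE -
		    (va & (SKETCH_PAGE_SIZE - 1));
		if (sz > to_page_end)
			sz = to_page_end;
	}
	if (sz > maxsegsz)
		sz = maxsegsz;
	return (sz);
}
```
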
Cheng Cui 8cc528c682 tcp cc: clean up some un-used cc_var flags
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D46299
2024-08-15 09:33:04 -04:00
Martin Matuska 29dc934914 zfs: merge openzfs/zfs@d2ccc2155
Notable upstream pull request merges:
 #16431 244ea5c48 Add missing kstats to dataset kstats

Obtained from:	OpenZFS
OpenZFS commit: d2ccc21552
2024-08-15 13:30:31 +02:00