KekenoBSD/src

Author	SHA1	Message	Date
Kyle Evans	ee9895e10d	kern: send parent a SIGCHLD when the debugger has detached The practical scenario that leads to this is porch(1) spawning some utility and sending it a SIGSTOP as a debugging aid. The user then attaches a debugger and walks through how some specific input is processed, then detaches to allow the script to continue. When ptrace is detached, the process resumes execution but the parent is never notified and may be stuck in wait(2) for it to continue or terminate. Other platforms seem to re-suspend the process after the debugger is detached, but neither behavior seems unreasonable. Just notifying the parent that the child has resumed is a relatively low-risk departure from our current behavior and had apparently been considered in the past, based on pre-existing comments. Move p_flag and p_xsig handling into childproc_continued(), as just sending the SIGCHLD here isn't really useful without P_CONTINUED set and the other caller already sets these up as well. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D50917	2025-06-19 10:32:04 -05:00
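From the parent's side, a resumed child is observed with standard POSIX calls: waiting with `WCONTINUED` reports the resumption, and `WIFCONTINUED()` recognizes it. The sketch below is a userland illustration, not porch(1)'s code. It resumes the child with a plain `SIGCONT` rather than a ptrace detach, and the helper name `observe_continue()` is hypothetical.

```c
#define _XOPEN_SOURCE 700	/* expose WCONTINUED/WIFCONTINUED, kill() */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that stops itself, resume it, and
 * check that waitpid(2) reports the stop, the continuation, and the exit.
 * Returns 0 on success, -1 on any unexpected result.
 */
int
observe_continue(void)
{
	int fds[2], status;
	char c;
	pid_t pid;

	if (pipe(fds) != 0)
		return (-1);
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		close(fds[1]);
		raise(SIGSTOP);		/* stop, as a debugging aid would */
		/* Block until the parent closes the pipe, so our exit cannot
		 * race with the parent observing the continuation. */
		(void)read(fds[0], &c, 1);
		_exit(0);
	}
	close(fds[0]);

	/* 1. The child stops. */
	if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
		return (-1);
	/* 2. Resume it; a WCONTINUED waiter is told about the resumption. */
	kill(pid, SIGCONT);
	if (waitpid(pid, &status, WCONTINUED) != pid || !WIFCONTINUED(status))
		return (-1);
	/* 3. Let it exit, then reap it. */
	close(fds[1]);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return (-1);
	return (0);
}
```

Before this change, a parent parked in step 2 would not hear about a ptrace detach. The kernel side of the fix makes such a detach look like the `SIGCONT` case above.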
Ed Maste	5110a74afe	sys: Correct osreldate descriptions The kern.osreldate sysctl reports the kernel version, not a release date. Also correct a comment about /usr/include/osreldate.h. Reviewed by: kp, olce Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D50938	2025-06-19 10:52:52 -04:00
Maxim Konovalov	b78b7fa01f	nuageinit.7: language and grammar improvements Reviewed by: bapt	2025-06-19 13:14:33 +00:00
Kevin Lo	19d0dd8718	mtw: fix display of the MAC revision Reviewed by: adrian Differential Revision: https://reviews.freebsd.org/D50542	2025-06-19 13:42:39 +08:00
Bjoern A. Zeeb	f51c794cbc	net80211: in ieee80211_sta_join() only do_ht if HT is avail In ieee80211_sta_join() there are currently two ways to set "do_ht": (1) after checking HT IEs are avail, and (2) after checking VHT IEs are avail and we are not on 2GHz. In the latter case no one checks that HT IEs are available and when we hit ieee80211_ht_updateparams_final() htinfo may be NULL and we panic. Avoid this by only checking for VHT if do_ht was set. No VHT without HT IEs. While here switch do_ht to be a bool. Sponsored by: The FreeBSD Foundation MFC after: 3 days PR: 287625 Fixes: `51172f62a7` Reviewed by: adrian Differential Revision: https://reviews.freebsd.org/D50923	2025-06-19 01:23:12 +00:00
Mark Johnston	4c6c1dd8f7	vm_page: Fix nofree page accounting In commit `ae10431c98` ("vm_page: Allow PG_NOFREE pages to be freed"), I changed the v_nofree_count counter to instead count the size of the nofree queue, on the basis that with the ability to free nofree pages, the size of the queue is unbounded. The use of a counter(9) for this purpose is not really correct, as early initialization of per-CPU counters interferes with precise accounting that we want here. Instead, add a global tracker for this purpose, expose it elsewhere in the sysctl tree, and restore v_free_nofree's original use as a counter of allocated nofree pages. Reviewed by: bnovkov, alc, kib Reported by: alc Fixes: `ae10431c98` ("vm_page: Allow PG_NOFREE pages to be freed") Differential Revision: https://reviews.freebsd.org/D50877	2025-06-18 23:48:07 +00:00
Bjoern A. Zeeb	f1f71cc717	fwget: pci_intel_video: do not log on no match We should never "log" a statement on no match for a given device we do not know about. We do not control the PCI ID assignments and thus cannot predict if we would even support such a device. This also triggers invalid output in the installer. Leave it as log_verbose for now. Sponsored by: The FreeBSD Foundation MFC after: 3 days PR: 287639 Reviewed by: manu, emaste Differential Revision: https://reviews.freebsd.org/D50916	2025-06-18 23:31:13 +00:00
Alan Cox	deddede58e	arm64 pmap: use the counter(9) KPI for L2 superpages Use the counter(9) KPI instead of atomics to maintain the L2 superpage mapping counts. (A similar change was made to the amd64 pmap in 2021.) While here, update the SYSCTL descriptions to reflect the possibility that the base page size is 16KB.	2025-06-18 18:05:58 -05:00
Mateusz Piotrowski	c29459f901	tracing.7: Add a single reference point for tracing facilities in FreeBSD FreeBSD has a fair number of tracing facilities. The new tracing(7) manual page aims to provide a starting point for users to learn about what is available. Reviewed by: christos, bnovkov, markj, ziaee Approved by: christos (mentor), bnovkov (mentor), markj (mentor) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D50854	2025-06-19 00:15:26 +02:00
Sergey A. Osokin	22c7815118	exec(3): add missing execvpe(3) to MLINKS Reviewed by: glebius	2025-06-18 17:40:22 -04:00
Warner Losh	c329931c02	pass: Make the name of the driver a #define "pass" is in several places, but should be a #define. Make it one. This also lets folks with particular needs who copy this driver reduce diffs. Sponsored by: Netflix	2025-06-18 14:30:34 -06:00
Kyle Evans	eca5637760	stand: userboot: allow building on !x86 We can still get plenty of use out of a userboot that doesn't know anything about how to load or boot a kernel; notably, the test harness in tools/boot can still be used to test lua changes. Hack out the necessary bits to simply build on other platforms, and add a small warning with ample time to view the warning on other platforms. We still won't build userboot by default on these platforms, since the build product isn't useful for most people. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D41529	2025-06-18 13:42:29 -05:00
Konstantin Belousov	0452f5f7b3	audit: move the wait for the queue length from commit to alloc AUDIT_SYSCALL_EXIT(), and indirectly audit_commit(), are intended to be called from an arbitrary top-level context. This means that the caller may own any sleepable locks, which makes sleeping in audit_commit() forbidden. Since we need to sleep for the record in audit_alloc() anyway, move the sleep for the queue limit there. At worst, if auditing is suspended or disabled by the time we actually reach the commit location, we have waited needlessly. PR: 287566 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D50879	2025-06-18 20:57:49 +03:00
Mateusz Piotrowski	fa9ac741d0	truss.1: Reference sysdecode(3) MFC after: 1 week	2025-06-18 19:40:27 +02:00
Olivier Certner	013c58ced6	sched_ule: 32-bit platforms: Fix runq_print() after runq changes The compiler would report a mismatch between the format and the actual type of the runqueue status word because the latter is now unconditionally defined as an 'unsigned long' (which has the "natural" platform size) and the format expects a 'size_t', which expands to an 'unsigned int' on 32-bit platforms (although they are both of the same actual size). This worked before as the C type used depended on the architecture and was set to 'uint32_t' aka 'unsigned int' on these 32-bit platforms. Just fix the format (use 'l'). While here, remove outputting '0x' by hand, instead relying on '#' (only difference is for 0, and is fine). runq_print() should be moved out of 'sched_ule.c' in a subsequent commit. Reported by: Jenkins Fixes: 79d8a99ee583 ("runq: Deduce most parameters, remove machine headers") MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation	2025-06-18 12:00:13 -04:00
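The format fix can be checked in userland. The `l` length modifier matches `unsigned long` on both 32-bit and 64-bit platforms. The `#` flag adds the `0x` prefix for every non-zero value, so it differs from a hand-written `"0x%lx"` only for 0. The helpers below are hypothetical stand-ins, not runq_print() itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a status word using the '#' flag for the "0x" prefix. */
void
fmt_sw(char *buf, size_t len, unsigned long sw)
{
	snprintf(buf, len, "%#lx", sw);
}

/* Format it with the prefix written by hand, as before the change. */
void
fmt_sw_manual(char *buf, size_t len, unsigned long sw)
{
	snprintf(buf, len, "0x%lx", sw);
}
```

For 0x1f both helpers produce `0x1f`. For 0, `fmt_sw()` yields `0` and `fmt_sw_manual()` yields `0x0`.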
Olivier Certner	63c9b01806	arm64: lib32: Don't try to install removed <machine/runq.h> Reported by: Herbert J. Skuhra (herbert gojira.at) Fixes: 79d8a99ee583 ("runq: Deduce most parameters, remove machine headers") MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation	2025-06-18 12:00:08 -04:00
Kyle Evans	abdbd85d1b	lualoader: adapt builtin brand/logo definitions as well While these should be moved to the new format, it wasn't my intention to force them over immediately. Downstreams may embed their own brands in drawer.lua, and we shouldn't break them for something like this. Move adapt_fb_shim() up and use it for preloaded definitions to avoid forcing the matter for now. Perhaps in the future we'll start writing out warnings for those that do need adapting. Reported by: 0x1eef on IRC	2025-06-18 10:21:37 -05:00
Mark Johnston	9d0d55e398	ufshci: Remove an unneeded variable definition Reported by: gcc Fixes: `1349a733cf` ("ufshci: Introduce the ufshci(4) driver")	2025-06-18 13:13:08 +00:00
Randall Stewart	359f590b29	Fix a warning in the rack stack. There is an initialization warning where error may not be set when logging extended BBlogs. Let's fix this by initializing error to zero so we won't have a warning.	2025-06-18 08:14:51 -04:00
Robert Wing	690f642fab	growfs(8): use gpart(8) instead of bsdlabel(8) in test bsdlabel(8) is deprecated Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D50865	2025-06-17 23:21:20 -08:00
Gleb Smirnoff	46023d54c7	tcp: fixup wording in comment Submitted by: Steffen Nurpmeso <steffen sdaoden.eu> Fixes: `b59753f1d5`	2025-06-17 20:47:31 -07:00
Olivier Certner	1d8f8f3e36	ps(1), top(1): Priority: Let 0 be the first timesharing level Change the origin from PZERO to PUSER. Doing so allows users to immediately detect if some thread is running under a high priority (kernel or realtime) or under a low one (timesharing or idle). MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation	2025-06-17 22:09:39 -04:00
Olivier Certner	eebc148f25	sched_4bsd: ESTCPULIM(): Allow any value in the timeshare range The current formula wastes queues and degrades usage estimation precision, since any increase of ticks that goes over 40 priorities (so, 8 * 40) is clamped to the last of these 40 levels (the nice value is subsequently added to that number to get the final priority level). Allow 'ts_estcpu' to grow up to a value corresponding to the greatest (i.e., lowest) priority of the timeshare range. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45392	2025-06-17 22:09:39 -04:00
Olivier Certner	51a4ae05ab	sched_4bsd: Remove RQ_PPQ from ESTCPULIM()'s formula Subtracting RQ_PPQ from the maximum number of allowed priority values (the factor to INVERSE_ESTCPU_WEIGHT) has the effect of pessimizing the number of processes assigned to the last priority bucket. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45392	2025-06-17 22:09:38 -04:00
Olivier Certner	a454ff6b04	sched_4bsd: Move ESTCPULIM() after its macro dependencies No functional change (intended). Also makes the comment about INVERSE_ESTCPU_WEIGHT() adjacent to its definition. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45392	2025-06-17 22:09:38 -04:00
Olivier Certner	a33225efb4	sched_ule: Sanitize CPU's use and priority computations, and ticks storage Computation of %CPU in sched_pctcpu() was overly complicated, wrong in the case of a non-maximal window (10 seconds span; this is always the case in practice as the window would oscillate between 10 and 11 seconds for continuously running processes) and performed unshifted for the first part, essentially losing precision (up to 9% for SCHED_TICK_SECS being 10), and with some ineffective shift for the second part. Conserve maximum precision by only shifting by the required amount to attain FSHIFT before dividing. Apply classical rounding to nearest instead of rounding down. To generally avoid wraparound problems with tick fields in 'struct td_sched' (as already happened once in sched_pctcpu_update()), make them all unsigned, and ensure 'ticks' is always converted to some 'u_int'. While here, fix SCHED_AFFINITY(). Rewrite sched_pctcpu_update() while keeping the existing formulas: - Fix the hole in the cliff case that in theory 'ts_ticks' can become greater than the window size if a running thread has not been accounted for too long (which today cannot happen because of sched_clock()). - Make the decay ratio explicit and configurable (SCHED_CPU_DECAY_NUMER, SCHED_CPU_DECAY_DENOM). Set it to the current value (10/11), currently producing a 95% attenuation after about 32s. This eases experimenting with changing it. Apply the ratio on shifted ticks for better precision, independently of the chosen value for SCHED_TICK_MAX/SCHED_TICK_SECS. - Remove redundant SCHED_TICK_TARG. Compute SCHED_TICK_MAX from SCHED_TICK_SECS, the latter now really specifying the maximum size of the %CPU estimation window. - Ensure it is immune to varying 'hz' (which today can't happen), so that after computation SCHED_TICK_RUN(ts) is mathematically guaranteed to be lower than SCHED_TICK_LENGTH(ts). 
- Thoroughly explain the current formula, and mention its main drawback (it is completely dependent on the frequency of calls to sched_pctcpu_update(), which currently manifests itself for sleeping threads). Rework sched_priority(): - Ensure 'p_nice' is read only once, to be immune to a concurrent change. - Clearly show that the computed priority is the sum of 3 components. Make them all positive by shifting the starting priority and shifting the nice value in SCHED_PRI_NICE(). - Compute the priority offset deriving from the %CPU with rounding to nearest. - Much more informative KASSERT() output with details regarding the priority computation. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46567	2025-06-17 22:09:38 -04:00
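Three of the arithmetic points above can be checked in isolation:

- dividing after shifting to FSHIFT, rounding to nearest instead of truncating;
- measuring elapsed ticks with unsigned subtraction, so that a wrapped 'ticks' counter still yields the right difference;
- counting how many applications of the 10/11 decay ratio bring a weight to 5% or below.

This is a minimal sketch, not sched_pctcpu() or sched_pctcpu_update() themselves. FSHIFT = 11 matches FreeBSD's <sys/param.h>, and the helper names are hypothetical. The decay count matches "about 32s" only under the assumption that the ratio is applied once per second.

```c
#include <assert.h>
#include <limits.h>

#define FSHIFT	11			/* as in FreeBSD's <sys/param.h> */

/* run/window in FSHIFT fixed point, rounded to nearest (window > 0). */
unsigned int
pct_fixed_nearest(unsigned int run, unsigned int window)
{
	unsigned long long num = (unsigned long long)run << FSHIFT;

	return ((unsigned int)((num + window / 2) / window));
}

/* Same fraction, rounded down. */
unsigned int
pct_fixed_trunc(unsigned int run, unsigned int window)
{
	return ((unsigned int)(((unsigned long long)run << FSHIFT) / window));
}

/*
 * Elapsed ticks between two readings of a free-running int counter.
 * Unsigned subtraction is defined modulo UINT_MAX + 1, so the result is
 * right even if the counter wrapped in between.
 */
unsigned int
ticks_elapsed(int now, int then)
{
	return ((unsigned int)now - (unsigned int)then);
}

/* Number of numer/denom decay steps until a weight of 1 is <= 5%. */
int
decay_steps_to_5pct(double numer, double denom)
{
	double w = 1.0;
	int n = 0;

	while (w > 0.05) {
		w *= numer / denom;
		n++;
	}
	return (n);
}
```

For example, 1/3 is 682 in truncated fixed point but 683 when rounded. `ticks_elapsed(INT_MIN + 4, INT_MAX - 5)` is 10 despite the wrap. `decay_steps_to_5pct(10, 11)` is 32.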
Olivier Certner	6792f3411f	sched_ule: Recover previous nice and anti-starvation behaviors Justification for this change is to avoid disturbing ULE's behavior too much at this time. We however acknowledge that the effect of "nice" values is extremely weak and will most probably change it going forward. Tuning allows us to mostly recover ULE's behavior prior to the switch to a single 256-queue runqueue and the increase of the timesharing priority levels' range. After this change, in a series of tests involving two long-running processes with varying nice values competing for the same CPU, we observe that the CPU time ratios of the highest-priority process change by at most 1.15% and on average by 0.46% (absolute differences). In relative differences, they change by at most 2% and on average by 0.78%. In order to preserve these ratios, as the number of priority levels allotted to timesharing has been raised from 136 to 168 (and the subsets of them dedicated to either interactive or batch threads scaled accordingly), we keep the ratio of levels reserved to handle nice values to those reserved for CPU usage by applying a factor of 5/4 (which is close to 168/136). Time-based advance of the timesharing circular queue's head is ULE's main fairness and anti-starvation mechanism. The higher number of queues subject to the timesharing scheduling policy is now compensated by allowing a greater increment of the head offset per tick. Because there are now 109 queue levels dedicated to the timesharing scheduling policy (in contrast with the 168 levels allotted to timesharing levels, which include the former but also those dedicated to threads considered interactive) whereas there previously were 64 ones (priorities spread into a single, separate runqueue), we advance the circular queue's head 7/4 faster (a ratio close to 109/64). While here, take into account 'cnt' as the number of ticks when advancing the circular queue's head. 
This fix depends on the other code changes enabling incrementation by more than one. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46566	2025-06-17 22:09:37 -04:00
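One way to picture a head that advances by 7/4 queue per tick over 'cnt' ticks is a circular index stepped at a rational rate, with the remainder carried so no fractional progress is lost. This is a hypothetical scheme. The commit does not describe how the 7/4 factor is applied, and the struct and function names are illustrative. Only the 109-queue count and the 7/4 ratio come from the text above.

```c
#include <assert.h>

#define TS_NQUEUES	109	/* timesharing queue levels, per the text */
#define HEAD_NUMER	7	/* advance rate 7/4 queue per tick */
#define HEAD_DENOM	4

/* Hypothetical circular-head state. */
struct ts_head {
	int idx;	/* current head queue, 0 .. TS_NQUEUES - 1 */
	int rem;	/* carried fractional progress, 0 .. HEAD_DENOM - 1 */
};

/* Advance the head for 'cnt' elapsed ticks, wrapping around. */
void
ts_head_advance(struct ts_head *h, int cnt)
{
	int total = cnt * HEAD_NUMER + h->rem;

	h->idx = (h->idx + total / HEAD_DENOM) % TS_NQUEUES;
	h->rem = total % HEAD_DENOM;
}
```

Because the remainder is carried, four single-tick calls land on the same queue (index 7) as one call with cnt = 4. Honoring 'cnt' therefore does not change the long-run rate.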
Olivier Certner	dee257c28d	sched: Internal priority ranges: Reduce kernel, increase timeshare Now that a difference of 1 in priority level is significant, we can shrink the priority range reserved for kernel threads. Only four distinct levels are necessary for the bottom half (3 base levels and arguably an additional one for demoted interrupt threads that run for full time slices so that they finally don't compete with other ones). To leave room for other possible uses, we settle on 8 levels. Given the symbolic constants for the top half, 10 levels are currently necessary. We settle on 16 levels. This allows enlarging the timesharing range, which covers both ULE's interactive and batch ranges, to 168 distinct levels from fewer than 64 for ULE (as of before the changes to make it use a single runqueue and have 256 distinct levels per runqueue) and 34 for 4BSD. While here, note that the realtime range is required to have at least 32 priority levels since: - POSIX mandates at least 32 distinct levels for the SCHED_RR/SCHED_FIFO scheduling policies. - We directly map contiguous priority levels ('sched_priority') of these scheduling policies to distinct, contiguous internal priority levels. Conversely, having at least 32 priority levels is enough to guarantee compliance with the POSIX requirement mentioned above because different internal priority levels are treated differently since commit "runq: Switch to 256 levels". While here, list explicit change restrictions for the realtime and idle ranges. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45391	2025-06-17 22:09:37 -04:00
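The level counts above fit within the 256 internal priorities. The sketch below lays out the ranges with compile-time checks, under these assumptions:

- realtime and idle get 32 levels each, which is what remains once 8 + 16 + 168 levels are taken;
- the order is interrupt threads, realtime, kernel top half, timesharing, idle;
- the macro names are illustrative and not necessarily those in <sys/priority.h>.

```c
#include <assert.h>

/* Illustrative layout of the 256 internal priority levels (0 = best). */
#define N_ITHD		8	/* bottom half: interrupt threads */
#define N_REALTIME	32	/* POSIX needs >= 32 for SCHED_FIFO/SCHED_RR */
#define N_KERN		16	/* top half: kernel threads */
#define N_TIMESHARE	168	/* ULE interactive + batch */
#define N_IDLE		32	/* assumed: the remaining levels */

#define MIN_ITHD	0
#define MIN_REALTIME	(MIN_ITHD + N_ITHD)
#define MIN_KERN	(MIN_REALTIME + N_REALTIME)
#define MIN_TIMESHARE	(MIN_KERN + N_KERN)
#define MIN_IDLE	(MIN_TIMESHARE + N_TIMESHARE)
#define MAX_IDLE	(MIN_IDLE + N_IDLE - 1)

/* The ranges must exactly cover the values of a 'u_char' priority. */
_Static_assert(MAX_IDLE == 255, "ranges must cover 0..255 exactly");
_Static_assert(N_REALTIME >= 32, "POSIX minimum for realtime policies");
```

Under these assumptions, timesharing spans levels 56 through 223 and idle spans 224 through 255.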
Olivier Certner	d710acecc0	runq: Add copyright MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation	2025-06-17 22:09:37 -04:00
Olivier Certner	055b5b5f85	runq: Restrict <sys/runq.h> to kernel only MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45390	2025-06-17 22:09:36 -04:00
Olivier Certner	a2d1c3bc2b	epoch_test: Assign different priorities using offset 1 Replace the hardcoded 4 (old RQ_PPQ) by 1 (new RQ_PPQ), as all priority levels are now treated differently. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation	2025-06-17 22:09:36 -04:00
Olivier Certner	b2a9ee2a72	runq: Remove userland references to RQ_PPQ in rtprio contexts Concerns only a single test (ptrace_test.c). MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45390	2025-06-17 22:09:36 -04:00
Olivier Certner	e3a4b989d7	runq: Bump __FreeBSD_version after switching to 256 levels Corresponding to changing RQ_PPQ to 1. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45390	2025-06-17 22:09:29 -04:00
Olivier Certner	af8de65ef2	runq: Switch to 256 levels This increases the number of levels from 64 to 256, which coincides with the distinct internal priority values (priority is currently encoded in a 'u_char', whose range is entirely used). With this change, we become POSIX-compliant for SCHED_FIFO/SCHED_RR in that we really provide 32 distinct priority levels for these policies. Previously, threads in the same "priority group", with priority groups defined as the threads in consecutive spans of 4 priority levels starting with level 0 up to 31 (so there are 8 groups), could not preempt or be preempted by each other even if they were assigned different priority levels. See also commit "sched_ule: Use a single runqueue per CPU" for all the drawbacks that this change also removes. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45390	2025-06-17 22:08:03 -04:00
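The aliasing removed here comes from dividing a priority by the number of priorities per queue to obtain a queue index. This is a minimal sketch with hypothetical helpers that take the divisor as a parameter, rather than the real RQ_PRI_TO_IDX() and RQ_PPQ definitions.

```c
#include <assert.h>

/* Queue index of priority 'pri' with 'ppq' priorities per queue. */
static inline int
pri_to_idx(int pri, int ppq)
{
	return (pri / ppq);
}

/* Nonzero if two priorities share a queue, i.e. the runqueue cannot
 * order them relative to each other. */
int
same_queue(int pri_a, int pri_b, int ppq)
{
	return (pri_to_idx(pri_a, ppq) == pri_to_idx(pri_b, ppq));
}
```

With 4 priorities per queue (the old layout), levels 8 and 11 both map to queue 2 and cannot preempt each other. With 1 per queue, they land in distinct queues.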
Olivier Certner	fd141584cf	zfs: spa: ZIO_TASKQ_ISSUE: Use symbolic priority This allows changing the meaning of priority differences in FreeBSD without requiring code changes in ZFS. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45390	2025-06-17 22:08:02 -04:00
Olivier Certner	8ecc419180	Internal scheduling priorities: Always use symbolic ones Replace priorities specified by a base priority and some hardcoded offset value by symbolic constants. Hardcoded offsets prevent changing the difference between priorities without changing their relative ordering, and using them is generally a dangerous practice since the resulting priority may inadvertently belong to a different selection policy's range. Since RQ_PPQ is 4, differences of less than 4 are insignificant, so just remove them. These small differences have not been changed for years, so it is likely they have no real meaning (besides having no practical effect). One can still consult the change history to recover them if ever needed. No functional change (intended). MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45390	2025-06-17 22:08:02 -04:00
Olivier Certner	baecdea10e	sched_ule: Use a single runqueue per CPU Previously, ULE would use 3 separate runqueues per CPU to store threads, one for each of its selection policies, which are realtime, timesharing and idle. They would be examined in this order, and the first thread found would be the one selected. This choice indeed appears as the easiest evolution from the single runqueue used by sched_4bsd (4BSD): It allows sharing most of the same runqueue code, which currently defines 64 levels per runqueue, while multiplying the number of levels (by 3). However, it has several important drawbacks: 1. The number of levels is the same for each selection policy. 64 is unnecessarily large for the idle policy (only 32 distinct levels would be necessary, given the 32 levels of our RTP_PRIO_IDLE and their future aliases in the to-be-introduced SCHED_IDLE POSIX scheduling policy) and unnecessarily restrictive both for the realtime policy (which should include 32 distinct levels for PRI_REALTIME, given our implementation of SCHED_RR/SCHED_FIFO, leaving at most 32 levels for ULE's interactive processes where the current implementation provisions 48 (perhaps taking into account the spreading problem, see next point)) and the timesharing one (88 distinct levels currently provisioned). 2. A runqueue has only 64 distinct levels, and maps priorities in the range [0;255] to a queue index by just performing a division by 4. Priorities mapped to the same level are treated exactly the same from a scheduling perspective, which is generally both unexpected and incorrect. ULE's code tries to compensate for this aliasing in the timesharing selection policy by spreading the 88 levels into 256, knowing that the latter amount in the end to only 64 distinct ones. This scaling is unfortunately not performed for the other policies, breaking the expectations mentioned in the previous point about distinct priority levels. 
With this change, only a single runqueue is now used to store all threads, regardless of the scheduling policy ULE applies to them (going back to what 4BSD has always been doing). ULE's 3 selection policies are assigned non-overlapping ranges of levels, and helper functions have been created to select or steal a thread in these distinct ranges, preserving the "circular" queue mechanism for the timesharing selection policy that (tries to) prevent starvation in the face of permanent dynamic priority adjustments. This change allows choosing any arbitrary distribution of runqueue levels between selection policies. It is a prerequisite for the increase to 256 levels per runqueue, which will allow dispensing with all the drawbacks listed above. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45389	2025-06-17 22:08:01 -04:00
Olivier Certner	fdf31d2747	sched_ule: runq_steal_from(): Suppress first thread special case This special case was introduced as soon as commit "ULE 3.0" (`ae7a6b38d5`, r171482, from July 2007). It caused runq_steal_from() to ignore the highest-priority thread while stealing. Its functionality was changed in commit "Rework CPU load balancing in SCHED_ULE" (`36acfc6507`, r232207, from February 2012), where the intent was to keep track of that first thread and return it if no other one was stealable, instead of returning NULL (no steal). Some bug prevented it from working in loaded cases (more than one thread, and all threads but the first one not stealable), which was subsequently fixed in commit "sched_ule(4): Fix interactive threads stealing." (`bd84094a51`, from September 2021). All the reasons for this mechanism we could second-guess were dubious at best. Jeff Roberson, ULE's main author, says in the differential revision that "The point was to move threads that are least likely to benefit from affinity because they are unlikely to run soon enough to take advantage of it.", to which we responded: "(snip) This may improve affinity in some cases, but at the same time we don't really know when the next thread on the queue is to run. Not stealing in this case also amounts to slightly violating the expected execution ordering and fairness.". As this twist doesn't seem to bring any performance improvement in general, let's just remove it. MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45388	2025-06-17 22:08:01 -04:00
Olivier Certner	f4be333bc5	sched_ule: Re-implement stealing on top of runq common-code Stop using internal knowledge of runqueues. Remove duplicate boilerplate parts. Concretely, runq_steal() and runq_steal_from() are now implemented on top of runq_findq(). Besides considerably simplifying the code, this change also brings an algorithmic improvement since, previously, set bits in the runqueue's status words were found by testing each bit individually in a loop instead of using ffsl()/bsfl() (except for the first set bit per status word). This change also makes it more apparent that runq_steal_from() treats the first thread with highest priority specifically (which runq_steal() does not). MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45388	2025-06-17 22:08:01 -04:00
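The algorithmic gain comes from jumping straight to each set bit instead of testing the bits one by one. This is a minimal sketch that assumes a GCC/Clang compiler for `__builtin_ctzl()` and uses a hypothetical array-of-words layout. It lists the indices of the non-empty queues in increasing order.

```c
#include <assert.h>
#include <limits.h>

#define SW_BITS	((int)(sizeof(unsigned long) * CHAR_BIT))

/* Store the indices of set bits (non-empty queues) into 'out' in
 * increasing order, at most 'max' of them; return how many were found. */
int
list_nonempty(const unsigned long *sw, int nwords, int *out, int max)
{
	int n = 0;

	for (int w = 0; w < nwords; w++) {
		unsigned long bits = sw[w];

		while (bits != 0 && n < max) {
			/* 0-based index of the lowest set bit. */
			int bit = __builtin_ctzl(bits);

			out[n++] = w * SW_BITS + bit;
			bits &= bits - 1;	/* clear that bit */
		}
	}
	return (n);
}
```

The work per word is proportional to the number of set bits, not to the word width.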
Olivier Certner	9c3f4682bb	runq: New runq_findq(), common low-level search implementation That new runq_findq(), based on the implementation of the former runq_findq_range(), is intended to become the foundation and unique low-level implementation for all searches in a runqueue. In addition to a range of queues' indices, it takes a predicate function, allowing it to: - Possibly skip a non-empty queue with higher priority (numerically lower index) based on some criterion. This is not yet used but will be in a subsequent commit revising ULE's stealing machinery. - Choose a specific thread in the queue, not necessarily the first. - Return whatever information is deemed necessary. It helps to remove duplicated boilerplate code, including redundant assertions, and generally makes things much clearer. These effects will be even greater in a subsequent commit modifying ULE to use it. runq_first_thread_range() replaces the old runq_findq_range() (returns the first thread of the highest priority queue in the requested range), and runq_first_thread() the old runq_findq() (same, but considering all queues). Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:08:00 -04:00
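A search parameterized by an index range and a predicate can be sketched as follows. The function names, the predicate signature and the `bool[]` stand-in for the queues are hypothetical. The real runq_findq() works on status words and thread queues, and its exact signature is not given here. If the predicate rejects a non-empty queue, the scan continues with the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: return true to accept queue 'idx'. */
typedef bool queue_pred_t(int idx, void *arg);

/* Lowest index in [lo, hi] (bounds included) whose queue is non-empty
 * and accepted by 'pred' (NULL accepts all), or -1 if none. */
int
findq_pred(const bool *nonempty, int lo, int hi, queue_pred_t *pred,
    void *arg)
{
	for (int idx = lo; idx <= hi; idx++) {
		if (!nonempty[idx])
			continue;
		if (pred == NULL || pred(idx, arg))
			return (idx);
	}
	return (-1);
}

/* Example predicate: skip one given queue (e.g. nothing stealable). */
bool
skip_queue(int idx, void *arg)
{
	return (idx != *(int *)arg);
}
```

With queues 2 and 4 non-empty, a plain search returns 2. With `skip_queue` rejecting queue 2, it returns 4.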
Olivier Certner	a31193172c	runq: New function runq_is_queue_empty(); Use it in ULE Indicates if some particular queue of the runqueue is empty. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:08:00 -04:00
Olivier Certner	757bab06fb	runq: Tidy up and rename runq_setbit() and runq_clrbit() Factorize common sub-expressions in a separate helper (runq_sw_apply()) for better readability. Rename these functions so that the names refer to the use cases rather than the implementations. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:08:00 -04:00
Olivier Certner	de78657a3a	runq: runq_check(): Re-implement on top of runq_findq() Remove one more loop and duplicated code, with the benefit of less instruction cache pollution at the expense of a few cycles more for the function calls and computing 'idx' (however, this gives a better diagnostic message). Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:59 -04:00
Olivier Certner	439dc920f2	runq: Revamp runq_find*(), new runq_find_range() Rename existing functions to use the simpler prefix 'runq_findq' instead of 'runq_findbit' (that they work on top of bit runs is an implementation detail). Add runq_findq_range(), which takes a range of indices to operate on (bounds included). This is in preparation for changing ULE to use a single runqueue, since it needs to treat the timesharing range differently. Rename runq_findbit_from() to runq_findq_circular(), which is more descriptive. To reduce code duplication, have runq_findq() and runq_findq_circular() leverage runq_findq_range() internally. For the latter, this also brings a small algorithmic improvement, since previously the second pass (from queue 0) would cover the whole runqueue if it was completely empty, scanning again empty queues after the start index. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:59 -04:00
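A circular search built on a range search scans from the start index to the end. It then wraps and scans only the queues before the start index, so it never rescans empty queues. This is a minimal sketch with hypothetical names and a `bool[]` stand-in for the status bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Lowest non-empty index in [lo, hi] (bounds included), or -1. */
int
findq_range(const bool *nonempty, int lo, int hi)
{
	for (int idx = lo; idx <= hi; idx++)
		if (nonempty[idx])
			return (idx);
	return (-1);
}

/* First non-empty queue at or after 'start', wrapping around to 0.
 * The second pass stops at start - 1, so no queue is scanned twice. */
int
findq_circular(const bool *nonempty, int nqs, int start)
{
	int idx = findq_range(nonempty, start, nqs - 1);

	if (idx == -1 && start > 0)
		idx = findq_range(nonempty, 0, start - 1);
	return (idx);
}
```

On a completely empty runqueue, this version looks at each queue exactly once.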
Olivier Certner	200fc93dac	runq: Re-order functions more logically No code change in moved functions. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:59 -04:00
Olivier Certner	7e2502e3de	runq: More macros; Better and more consistent naming Most existing macros have ambiguous names regarding which index they operate on (queue, word, bit?), so they have been renamed to improve clarity. Use the 'RQSW_' prefix for all macros related to status words, and change the status word type name accordingly. Rename RQB_FFS() to RQSW_BSF() to remove confusion about the return value (ffs*() return bit indices starting at 1, or 0 if the input is 0, whereas BSF on x86 returns 0-based indices, which is what the current code assumes). While here, add a check (under INVARIANTS) that RQSW_BSF() isn't called with 0 as an argument. Also, rename 'rqb_bits_t' to the more concise 'rqsw_t', 'struct rqbits' to 'struct rq_status', its 'rqb_bits' field to 'rq_sw' (it designates an array of words, not bits), and the type 'rqhead' to 'rq_queue'. Add macros computing a queue index from a status word index and a bit in order to factorize code. If the precise index of the bit is known, callers can use RQSW_TO_QUEUE_IDX() to get the corresponding queue index, whereas if they want the one corresponding to the first (least-significant) set bit in a given status word (corresponding to the non-empty queue with the lowest index in the status word), they can use RQSW_FIRST_QUEUE_IDX() instead. Add RQSW_BIT_IDX(), which computes the corresponding bit's index in the corresponding status word. This allows more code factorization (even if most uses will be eliminated in a later commit) and makes what is computed clearer. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:58 -04:00
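The off-by-one hazard behind the RQB_FFS() rename, and the index arithmetic the new macros factor out, fit in a few lines. The macro names mirror those in the message, but these definitions are an illustrative sketch, not the ones in <sys/runq.h>. The sketch assumes a GCC/Clang compiler for `__builtin_ffsl()` and `__builtin_ctzl()`.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned long rqsw_t;			/* one status word */
#define RQSW_BPW	((int)(sizeof(rqsw_t) * CHAR_BIT))

/* ffs-style: 1-based bit index, 0 when no bit is set. */
#define FFS_1BASED(w)	__builtin_ffsl((long)(w))
/* BSF-style: 0-based bit index; undefined for 0, so callers must check. */
#define RQSW_BSF(w)	__builtin_ctzl(w)

/* Queue index from a status-word index and a bit index within it. */
#define RQSW_TO_QUEUE_IDX(sw_idx, bit_idx) \
	((sw_idx) * RQSW_BPW + (bit_idx))
/* Queue index of the lowest set bit of word 'w' (w must be non-zero). */
#define RQSW_FIRST_QUEUE_IDX(sw_idx, w) \
	RQSW_TO_QUEUE_IDX(sw_idx, RQSW_BSF(w))
/* Bit index of queue 'idx' within its status word. */
#define RQSW_BIT_IDX(idx)	((idx) % RQSW_BPW)
```

For a word with only bit 5 set, the ffs-style result is 6 while the BSF-style result is 5. Mixing the two conventions would shift every queue index by one.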
Olivier Certner	57540a0666	runq: Clarity and style pass In runq_choose() and runq_choose_fuzz(), replace an unnecessary 'while' with an 'if', and separate assignment and test of 'idx' into two lines. Add missing parentheses to one 'sizeof' operator. Remove superfluous brackets for one-line "then" and "else" branches (to match style elsewhere in the file). Declare loop indices in their 'for'. Test for non-empty bit sets with an explicit '!= 0'. Move TABs in some prototypes of <sys/runq.h> (they should not split the return type specifier, but instead separate the type specifier from the function declarator). No functional change intended. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:58 -04:00
Olivier Certner	a11926f2a5	runq: API tidy up: 'pri' => 'idx', 'idx' as int, remove runq_remove_idx() Make sure that external and internal users are aware that the runqueue API always expects queue indices, and not priority levels. Name arithmetic arguments in 'runq.h' for better immediate reference. Use plain integers to pass indices instead of 'u_char' (using the latter probably doesn't bring any gain, and an 'int' makes the API agnostic to a number of queues greater than 256). Add a static assertion that RQ_NQS can't be strictly greater than 256 as long as the 'td_rqindex' thread field is of type 'u_char'. Add a new macro CHECK_IDX() that checks that an index is non-negative and below RQ_NQS, and use it in all low-level functions (and "public" ones when they don't need to call the former). While here, remove runq_remove_idx(), as it knows a bit too much of ULE's internals, in particular by treating the whole runqueue as round-robin, which we are going to change. Instead, have runq_remove() return whether the queue from which the thread was removed is now empty, and leverage this information in tdq_runq_rem() (sched_ule(4)). While here, re-implement runq_add() on top of runq_add_idx() to remove its duplicated code (all lines except one). Introduce the new RQ_PRI_TO_IDX() macro to convert a priority to a queue index, and use it in runq_add() (many more uses will be introduced in later commits). While here, rename runq_check() to runq_not_empty() and have it return a boolean instead of an 'int', and same for sched_runnable() as an impact (and while here, fix a small style violation in sched_4bsd(4)'s version). While here, simplify sched_runnable(). While here, make <sys/sched.h> standalone include-wise. No functional change intended. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:57 -04:00
Olivier Certner	28b54827f5	runq: Hide function prototypes under _KERNEL And some structure definitions as well. This header really is not supposed to be included by userland, so should just error in this case. However, there is one remaining use for it in a test: Getting the value of RQ_PPQ to ensure a big enough priority level difference in order to guarantee that a realtime thread preempts another. This use will soon be obsoleted by guaranteeing that a realtime thread always preempts another one with lower priority, even if the priority level is very close. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:57 -04:00
Olivier Certner	c21c24adde	runq: More selective includes of <sys/runq.h> to reduce pollution <sys/proc.h> doesn't need <sys/runq.h>. Remove this include and add it back for kernel files that relied on the pollution. Reviewed by: kib MFC after: 1 month Event: Kitchener-Waterloo Hackathon 202506 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45387	2025-06-17 22:07:57 -04:00

300534 Commits