KekenoBSD/src

Author	SHA1	Message	Date
Gleb Smirnoff	40dbb06fa7	inpcb: retire INP_DROPPED and in_pcbdrop() The inpcb flag INP_DROPPED served two purposes. It was used by TCP and subsystems running on top of TCP as a flag that marks a connection that is now in TCPS_CLOSED, but was in some other state before (not a new-born connection). Create a new TCP flag TF_DISCONNECTED for this purpose. The in_pcbdrop() was a TCP's version of in_pcbdisconnect() that also sets INP_DROPPED. Use in_pcbdisconnect() instead. Second purpose of INP_DROPPED was a negative lookup mask in inp_smr_lock(), as SMR-protected lookup may see inpcbs that had been removed from the hash. We already have had INP_INHASHLIST that marks inpcb that is in hash. Convert it into INP_UNCONNECTED with the opposite meaning. This allows to combine it with INP_FREED for the negative lookup mask. The Chelsio/ToE and kTLS changes are done with some style refactoring, like moving inp/tp assignments up and using macros for that. However, no deep thinking was taken to check if those checks are really needed, it could be that some are not. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D56186	2026-04-12 11:33:07 -07:00
Gleb Smirnoff	ac5b962800	inpcb: retire the inpcb global list The iteration over all pcbs is possible without the global list. The newborn inpcbs are put on a global list of unconnected inpcbs, then after connect(2) or bind(2) they move to respective hash slot list. This adds a bit of complexity to inp_next(), but the storage scheme is actually simplified. One potential problem before this change was that a couple of pcbs fall into the same hash slot and are linked A->B there, but they also sit next to each other in the global list, linked as B->A. This can deadlock of course. The problem was never observed in the wild, but I was able to instrument it with lots of effort: just few pcbs in the system, hash size reduced down to 2 and a lot of repetitive calls into two kinds of iterators. However the main motivation is not the above problem, but make a step towards splitting the big hash lock into per-slot locks. Differential Revision: https://reviews.freebsd.org/D55967	2026-04-12 11:31:09 -07:00
Gleb Smirnoff	2cfe62664a	inpcb: retire the inpcbinfo list lock With the SMR locking of inpcbs the use of this lock reduced down to the global list and generation number. It was used only on an inpcb creation and destruction. Use the inpcbinfo hash lock for this purpose. Reviewed by: pouria, rrs, markj Differential Revision: https://reviews.freebsd.org/D55966	2026-04-12 11:30:59 -07:00
Gleb Smirnoff	8e1513dc67	inpcb: use hashalloc(9) While here remove ipi_lbgrouphashmask, as it always has the same value as ipi_porthashmask. Differential Revision: https://reviews.freebsd.org/D56174	2026-04-12 10:25:57 -07:00
Gleb Smirnoff	a47c870930	inpcb: fix up !VIMAGE builds There are some files that don't include mutex.h and rwlock.h, but use inpcb locking macros. With VIMAGE the net/vnet.h pulls half of the possible kernel includes, masking the problem. The in_pcb.h also used to mask the problem, so restore that. Fixes: `041e9eb1ae`	2026-03-13 20:59:51 -07:00
Gleb Smirnoff	041e9eb1ae	inpcb: overhaul in_pcb.h Pull up all user-visible stuff to the top of the file and isolate the rest under _KERNEL. The user visible parts are: - struct in_conninfo - struct xinpcb - defines for inp_flags bits, that are shared between xinpcb and inpcb PR: 293493	2026-03-12 11:32:30 -07:00
Gleb Smirnoff	815ef05284	netinet: remove _WANT_INPCB and _WANT_TCPCB These were hacks since FreeBSD 12 that provided some transition period for utilities to migrate from reading kernel memory via kvm(3) to sysctl(3) based APIs. The transition period is over.	2026-03-12 09:37:53 -07:00
Michael Tuexen	5f43b0cb7c	ddb: provide inp_flags2 when printing inpcbs Reviewed by: markj, Peter Lei MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D53542	2025-11-03 06:17:29 -05:00
Michael Tuexen	e8c50058e8	ddb: use %b when showing flags for an inp This is much more compact. Thanks to markj@ for suggesting the change. Reviewed by: markj MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D53507	2025-11-02 11:14:57 -05:00
Michael Tuexen	9aa5a79e2a	ddb: optionally print inp when printing tcpcb Add /i option to the ddb commands show tcpcb and show all tcpcbs, which enables the printing of the t_inpcb. Reviewed by: markj MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D53497	2025-10-31 14:05:02 -04:00
Konstantin Belousov	1b7d0c2ee9	in_pcb: add in_pcbrele_rlock() The helper that derefs and rlocks the provided inp. Returns false if inp is still usable. Reviewed by: glebius, markj Sponsored by: Nvidia networking Differential revision: https://reviews.freebsd.org/D51143	2025-07-10 17:42:27 +03:00
Konstantin Belousov	c9e9a0fe5b	ktls: define struct xktls_session and converter from ktls_session into external representation Reviewed by: jhb (previous version), markj Sponsored by: NVidia networking Differential revision: https://reviews.freebsd.org/D50653	2025-06-10 02:47:12 +03:00
Gleb Smirnoff	5f53917078	inpcb: retire two-level port hash database This structure originates from the pre-FreeBSD times when system RAM was measured in single digits of MB and Internet speeds were measured in Kb. At first level the database hashes the port value only to calculate index into array of pointers to lazily allocated headers that hold lists of inpcbs with the same local port. This design apparently was made to preserve kernel memory. In the modern kernel size of the first level of the hash is derived from maxsockets, which is derived from maxfiles, which in its turn is derived from amount of physical memory. Then the size of the hash is capped by IPPORT_MAX, cause it doesn't make any sense to have hash table larger than the set of possible values. In practice this cap works even on my laptop. I haven't done precise calculation or experiments, but my guess is that any system with > 8 Gb of RAM will be autotuned to IPPORT_MAX sized hash. Apparently, this hash is a degenerate one: it never has more than one entry in any slot. You can check this with kgdb: set $i = 0 while ($i <= tcbinfo->ipi_porthashmask) set $p = tcbinfo->ipi_porthashbase[$i].clh_first set $c = 0 while ($p != 0) set $c = $c + 1 set $p = $p->phd_hash.cle_next end if ($c > 1) printf "Slot %u count %u", $i, $c end set $i = $i + 1 end Retiring the two level hash we remove a lot of complexity at the cost of only one comparison 'inp->inp_lport != lport' in the lookup cycle, which is going to be always false on most machines anyway. This comparison definitely shall be cheaper than extra pointer traversal. Another positive change to be singled out is that now we no longer need to allocate memory in non-sleepable context in in_pcbinshash(), so a potential ENOMEM on connect(2) is removed. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D49151	2025-03-06 22:58:35 -08:00
Gleb Smirnoff	79fb0d2474	inpcb: make inpcb hash insertion/removal functions private	2025-03-06 22:58:29 -08:00
Gleb Smirnoff	2af953b132	inpcb: inline in_pcbconnect_setup() into in_pcbconnect() The separation had been done back in `5200e00e72` for the purposes of removing a true temporary connect of an unconnected UDP socket that does sendto(2) in `90162a4e87`. Now, with `69c05f4287` in place, the separation is no longer needed. There should be no functional change. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D49142	2025-03-06 22:57:29 -08:00
Gleb Smirnoff	bafe022b1f	inpcb: add const qualifiers on functions that select address/port There are several functions that keep database locked and do address and port selection before a caller commits the changes to the inpcb. Mark the inpcb argument with a good documenting const.	2025-02-17 15:28:52 -08:00
Mark Johnston	ca94f92c23	inpcb: Move the definition of struct inpcblbgroup to in_pcb_var.h It's only needed for in_pcb.c and in6_pcb.c, so can go to the private header. No functional change intended. Reported by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield	2025-02-06 16:25:24 +00:00
Mark Johnston	da806e8db6	inpcb: Add FIB-aware inpcb lookup Allow protocol layers to look up an inpcb belonging to a particular FIB. This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be used is obtained from the specified mbuf or ifnet. No functional change intended. Reviewed by: glebius, melifaro MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48662	2025-02-06 14:14:39 +00:00
Mark Johnston	bbd0084baf	inpcb: Add a flags parameter to in_pcbbind() Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB number. When this flag is specified, duplicate bindings are permitted, so long as each FIB contains at most one inpcb bound to the same address/port. If an inpcb is bound with this flag, it'll have the INP_BOUNDFIB flag set. No functional change intended. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48661	2025-02-06 14:14:23 +00:00
Mark Johnston	7cbb6b6e28	inpcb: Close some SO_REUSEPORT_LB races, part 2 Suppose a thread adds a socket to an existing TCP lbgroup that is actively accepting connections. It has to do the following operations: 1. set SO_REUSEPORT_LB on the socket 2. bind() the socket to the shared address/port 3. call listen() Step 2 makes the inpcb visible to incoming connection requests. However, at this point the inpcb cannot accept new connections. If in_pcblookup() matches it, the remote end will see ECONNREFUSED even when other listening sockets are present in the lbgroup. This means that dynamically adding inpcbs to an lbgroup (e.g., by starting up new workers) can trigger spurious connection failures for no good reason. (A similar problem exists when removing inpcbs from an lbgroup, but that is harder to fix and is not addressed by this patch; see the review for a bit more commentary.) Fix this by augmenting each lbgroup with a linked list of inpcbs that are pending a listen() call. When adding an inpcb to an lbgroup, keep the inpcb on this list if listen() hasn't been called, so it is not yet visible to the lookup path. Then, add a new in_pcblisten() routine which makes the inpcb visible within the lbgroup now that it's safe to let it handle new connections. Add a regression test which verifies that we don't get spurious connection errors while adding sockets to an LB group. Reviewed by: glebius MFC after: 1 month Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48544	2025-01-23 17:12:10 +00:00
Gleb Smirnoff	0b4539ee54	inpcb: gc unused argument of in_pcbconnect()	2024-11-14 11:39:13 -08:00
Gleb Smirnoff	1a8d176432	inpcb: fully retire inp_ppcb pointer Before a protocol specific control block started to embed inpcb in self (see `0aa120d52f`, `e68b379244`, `483fe96511`) this pointer used to point at it. Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The exp-run detected a minimal use of the field in ports: * sysutils/lsof - patched upstream * net-mgmt/netdata - patch accepted upstream * emulators/qemu-user-static - upstream master branch seems not using the field anymore We can keep the field around for some time, but eventually it may be reused for something else. PR: 277659 (exp-run) Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D44491	2024-03-29 12:18:32 -07:00
Gleb Smirnoff	027fda80fe	inpcb: remove unused KPIs to manipulate inpcbs These KPIs were added in `9d29c635da` and through 15 years had zero use. They slightly remind what IfAPI does for struct ifnet. But IfAPI does that for the sake of large collection of NIC drivers not being aware of struct ifnet. For the inpcb it is unclear what could be a large collection of externally written kernel modules that need extensively use inpcb and not be aware of its internals at the same time. This isolation of a structure knowledge requires a lot of work, and just throwing in a few KPIs isn't helpful. Reviewed by: kib, bz, markj Differential Revision: https://reviews.freebsd.org/D44310	2024-03-18 08:49:39 -07:00
Gleb Smirnoff	a13039e270	inpcb: reorder inpcb destruction First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are a couple of functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions. PR: 273890 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D43122	2023-12-27 08:34:37 -08:00
Gleb Smirnoff	0fac350c54	sockets: don't malloc/free sockaddr memory on getpeername/getsockname Just like it was done for accept(2) in `cfb1e92912`, use same approach for two simpler syscalls that return socket addresses. Although, these two syscalls aren't performance critical, this change generalizes some code between 3 syscalls trimming code size. Following example of accept(2), provide VNET-aware and INVARIANT-checking wrappers sopeeraddr() and sosockaddr() around protosw methods. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D42694	2023-11-30 08:31:10 -08:00
Warner Losh	29363fb446	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix	2023-11-26 22:23:30 -07:00
Gleb Smirnoff	bbbd7aab1b	inpcb: garbage collect in_pcbnotifyall()	2023-11-20 14:38:31 -08:00
Warner Losh	2ff63af9b8	sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s\+\s\$FreeBSD\$.$\n/	2023-08-16 11:54:18 -06:00
Gleb Smirnoff	e3ba0d6add	inpcb: do not copy so_options into inp_flags2 Since `f71cb9f748` socket stays connected with inpcb through latter's lifetime and there is no reason to complicate things and copy these flags. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D41198	2023-07-26 20:35:42 -07:00
Gleb Smirnoff	a43e7a96b6	inpcb: use internal flag to mark pcbs that are inserted into lbgroup Using INP_REUSEPORT_LB is unsafe, as it is basically a copy of socket's SO_REUSEPORT_LB flag, which can be cleared by userland after bind(). Reviewed by: markj Reported by: syzbot+e7d2e451f89fb444319b@syzkaller.appspotmail.com Differential Revision: https://reviews.freebsd.org/D41197	2023-07-26 20:35:30 -07:00
Gleb Smirnoff	c3c20de3b2	tcp: move HPTS/LRO flags out of inpcb to tcpcb These flags are TCP specific. While here, make also several LRO internal functions to pass tcpcb pointer instead of inpcb one. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D39698	2023-04-25 12:19:48 -07:00
Gleb Smirnoff	c2a69e846f	tcp_hpts: move HPTS related fields from inpcb to tcpcb This makes inpcb lighter and allows future cache line optimizations of tcpcb. The reason why HPTS originally used inpcb is the compressed TIME-WAIT state (see `0d7445193a`), that used to free a tcpcb, while the associated connection is still on the HPTS ring. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D39697	2023-04-25 12:18:33 -07:00
Mark Johnston	7b92493ab1	inpcb: Avoid inp_cred dereferences in SMR-protected lookup The SMR-protected inpcb lookup algorithm currently has to check whether a matching inpcb belongs to a jail, in order to prioritize jailed bound sockets. To do this it has to maintain a ucred reference, and for this to be safe, the reference can't be released until the UMA destructor is called, and this will not happen within any bounded time period. Changing SMR to periodically recycle garbage is not trivial. Instead, let's implement SMR-synchronized lookup without needing to dereference inp_cred. This will allow the inpcb code to free the inp_cred reference immediately when a PCB is freed, ensuring that ucred (and thus jail) references are released promptly. Commit `220d892129` ("inpcb: immediately return matching pcb on lookup") gets us part of the way there. This patch goes further to handle lookups of unconnected sockets. Here, the strategy is to maintain a well-defined order of items within a hash chain so that a wild lookup can simply return the first match and preserve existing semantics. This makes insertion of listening sockets more complicated in order to make lookup simpler, which seems like the right tradeoff anyway given that bind() is already a fairly expensive operation and lookups are more common. In particular, when inserting an unconnected socket, in_pcbinhash() now keeps the following ordering: - jailed sockets before non-jailed sockets, - specified local addresses before unspecified local addresses. Most of the change adds a separate SMR-based lookup path for inpcb hash lookups. When a match is found, we try to lock the inpcb and re-validate its connection info. In the common case, this works well and we can simply return the inpcb. If this fails, typically because something is concurrently modifying the inpcb, we go to the slow path, which performs a serialized lookup. 
Note, I did not touch lbgroup lookup, since there the credential reference is formally synchronized by net_epoch, not SMR. In particular, lbgroups are rarely allocated or freed. I think it is possible to simplify in_pcblookup_hash_wild_locked() now, but I didn't do it in this patch. Discussed with: glebius Tested by: glebius Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38572	2023-04-20 12:13:06 -04:00
Mark Johnston	fdb987bebd	inpcb: Split PCB hash tables Currently we use a single hash table per PCB database for connected and bound PCBs. Since we started using net_epoch to synchronize hash table lookups, there's been a bug, noted in a comment above in_pcbrehash(): connecting a socket can cause an inpcb to move between hash chains, and this can cause a concurrent lookup to follow the wrong linkage pointers. I believe this could cause rare, spurious ECONNREFUSED errors in the worst case. Address the problem by introducing a second hash table and adding more linkage pointers to struct inpcb. Now the database has one table each for connected and unconnected sockets. When inserting an inpcb into the hash table, in_pcbinhash() now looks at the foreign address of the inpcb to figure out which table to use. This ensures that queue linkage pointers are stable until the socket is disconnected, so the problem described above goes away. There is also a small benefit in that in_pcblookup_*() can now search just one of the two possible hash buckets. I also made the "rehash" parameter of in(6)_pcbconnect() unused. This parameter seems confusing and it is simpler to let the inpcb code figure out what to do using the existing INP_INHASHLIST flag. UDP sockets pose a special problem since they can be connected and disconnected multiple times during their lifecycle. To handle this, the patch plugs a hole in the inpcb structure and uses it to store an SMR sequence number. When an inpcb is disconnected - an operation which requires the global PCB database hash lock - the write sequence number is advanced, and in order to reconnect, the connecting thread must wait for readers to drain before reusing the inpcb's hash chain linkage pointers. raw_ip (ab)uses the hash table without using the corresponding accessors. Since there are now two hash tables, it arbitrarily uses the "connected" table for all of its PCBs. This will be addressed in some way in the future.
inp iterators which specify a hash bucket will only visit connected PCBs. This is not really correct, but nothing in the tree uses that functionality except raw_ip, which as mentioned above places all of its PCBs in the "connected" table and so is unaffected. Discussed with: glebius Tested by: glebius Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38569	2023-04-20 12:13:06 -04:00
Mark Johnston	317fa5169d	netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option It has no effect, and an exp-run revealed that it is not in use. PR: 261398 (exp-run) Reviewed by: mjg, glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38822	2023-02-28 15:57:21 -05:00
Mark Johnston	3aff4ccdd7	netinet: Remove IP(V6)_BINDMULTI This option was added in commit `0a100a6f1e` but was never completed. In particular, there is no logic to map flowids to different listening sockets, so it accomplishes basically the same thing as SO_REUSEPORT. Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to balance among listening sockets using a hash of the 4-tuple and some optional NUMA policy. The option was never documented or completed, and an exp-run revealed nothing using it in the ports tree. Moreover, it complicates the already very complicated in_pcbbind_setup(), and the checking in in_pcbbind_check_bindmulti() is insufficient. So, let's remove it. PR: 261398 (exp-run) Reviewed by: glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38574	2023-02-27 10:03:11 -05:00
Gleb Smirnoff	96871af013	inpcb: use family specific sockaddr argument for bind functions Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the protocol's pr_bind method and from there on go down the call stack with family specific argument. Reviewed by: zlei, melifaro, markj Differential Revision: https://reviews.freebsd.org/D38601	2023-02-15 10:30:16 -08:00
Gleb Smirnoff	09d3671b0e	inpcb: better document INP_ANONPORT flag The name is pretty self explaining, but it is unclear why we need this flag, as kernel only sets it and never reads.	2023-02-03 11:33:36 -08:00
Gleb Smirnoff	9e46ff4d4c	netinet: don't return conflicting inpcb in in_pcbconnect_setup() Last time this inpcb was actually used was in tcp_connect() before `c94c54e4df`.	2023-02-03 11:33:36 -08:00
Gleb Smirnoff	a9d22cce10	inpcb: use family specific sockaddr argument for connect functions Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the protocol's pr_connect method and from there on go down the call stack with family specific argument. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D38356	2023-02-03 11:33:36 -08:00
Gleb Smirnoff	0aa120d52f	inpcb: allow to provide protocol specific pcb size The protocol specific structure shall start with inpcb. Differential revision: https://reviews.freebsd.org/D37126	2022-12-02 14:10:55 -08:00
Mark Johnston	d93ec8cb13	inpcb: Allow SO_REUSEPORT_LB to be used in jails Currently SO_REUSEPORT_LB silently does nothing when set by a jailed process. It is trivial to support this option in VNET jails, but it's also useful in traditional jails. This patch enables LB groups in jails with the following semantics: - all PCBs in a group must belong to the same jail, - PCB lookup prefers jailed groups to non-jailed groups This is a straightforward extension of the semantics used for individual listening sockets. One pre-existing quirk of the lbgroup implementation is that non-jailed lbgroups are searched before jailed listening sockets; that is preserved with this change. Discussed with: glebius MFC after: 1 month Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D37029	2022-11-02 13:46:24 -04:00
Gleb Smirnoff	19acc50667	inpcb: retire suppression of randomization of ephemeral ports The suppression was added in `5f311da2cc` with no explanation in the commit message of the exact problem that was fixed. In the BSDCan 2006 talk [1], slides 12 to 14, we can find that it seems that there was some problem with the TIME_WAIT state not properly being handled on the remote side (also FreeBSD!), and this switching off the suppression had hidden the problem. The rationale of the change was that other stacks may also be buggy wrt the TIME_WAIT. I did not find the actual problem in TIME_WAIT that the suppression has hidden, neither a commit that would fix it. However, since that time we started to handle SYNs with RFC5961 instead of RFC793, see `3220a2121c`. We also now have the tcp-testsuite [2], that has full coverage of all possible scenarios of receiving SYN in TIME_WAIT. This effectively reverts `5f311da2cc` and `6ee79c59d2`. [1] https://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf [2] https://github.com/freebsd-net/tcp-testsuite Reviewed by: rscheff Discussed with: rscheff, rrs, tuexen Differential revision: https://reviews.freebsd.org/D37042	2022-10-31 08:57:11 -07:00
Gleb Smirnoff	24cf7a8d62	inpcb: provide pcbinfo pointer argument to inp_apply_all() Allows to clear inpcb layer of TCP knowledge.	2022-10-19 15:15:53 -07:00
Gleb Smirnoff	53af690381	tcp: remove INP_TIMEWAIT flag Mechanically cleanup INP_TIMEWAIT from the kernel sources. After `0d7445193a`, this commit shall not cause any functional changes. Note: this flag was very often checked together with INP_DROPPED. If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we will be able to remove most of this checks and turn them to assertions. Some of them can be turned into assertions right now, but that should be carefully done on a case by case basis. Differential revision: https://reviews.freebsd.org/D36400	2022-10-06 19:24:37 -07:00
Gordon Bergling	893f36b7f1	netinet: Correct a typo in source code comment - s/occured/occurred/ MFC after: 3 days	2022-09-04 12:57:12 +02:00
Michael Tuexen	a35bdd4489	tcp: add sysctl interface for setting socket options This interface allows to set a socket option on a TCP endpoint, which is specified by its inp_gencnt. This interface will be used in an upcoming command line tool tcpsso. Reviewed by: glebius, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D34138	2022-02-09 12:24:41 +01:00
Gleb Smirnoff	afad340a14	inpcb: garbage collect INP_LOCK_INIT(), used only once in sctp Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D33543	2022-01-03 10:20:30 -08:00
Gleb Smirnoff	fec8a8c7cb	inpcb: use global UMA zones for protocols Provide structure inpcbstorage, that holds zones and lock names for a protocol. Initialize it with global protocol init using macro INPCBSTORAGE_DEFINE(). Then, at VNET protocol init supply it as the main argument to the in_pcbinfo_init(). Each VNET pcbinfo uses its private hash, but they all use same zone to allocate and SMR section to synchronize. Note: there is kern.ipc.maxsockets sysctl, which controls UMA limit on the socket zone, which was always global. Historically same maxsockets value is applied also to every PCB zone. Important fact: you can't create a pcb without a socket! A pcb may outlive its socket, however. Given that there are multiple protocols, and only one socket zone, the per pcb zone limits seem to have little value. Under very special conditions it may trigger a little bit earlier than socket zone limit, but in most setups the socket zone limit will be triggered earlier. When VIMAGE was added to the kernel PCB zones became per-VNET. This magnified existing disbalance further: now we have multiple pcb zones in multiple vnets limited to maxsockets, but every pcb requires a socket allocated from the global zone also limited by maxsockets. IMHO, this per pcb zone limit doesn't bring any value, so this patch drops it. If anybody explains value of this limit, it can be restored very easy - just 2 lines change to in_pcbstorage_init(). Differential revision: https://reviews.freebsd.org/D33542	2022-01-03 10:17:46 -08:00
Gleb Smirnoff	a057769205	in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address The intent is to provide more entropy than can be provided by just the 32-bits of the IPv6 address which overlaps with 6to4 tunnels. This is needed to mitigate potential algorithmic complexity attacks from attackers who can control large numbers of IPv6 addresses. Together with: gallatin Reviewed by: dwmalone, rscheff Differential revision: https://reviews.freebsd.org/D33254	2021-12-26 10:47:28 -08:00
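
Commit `7b92493ab1` above describes keeping each hash chain ordered (jailed sockets before non-jailed, specified local addresses before unspecified) so that a wild lookup can return the first match. The following is a rough userland sketch of that idea only, not the kernel code: `struct pcb`, `pcb_rank()`, `chain_insert()` and `chain_lookup()` are hypothetical names, the two rules are assumed to combine lexicographically, and jail address checks, SMR and locking are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an unconnected inpcb. */
struct pcb {
	uint32_t	laddr;	/* local IPv4 address, 0 means unspecified */
	uint16_t	lport;	/* local port */
	bool		jailed;	/* owner is jailed */
	struct pcb	*next;	/* hash chain linkage */
};

/*
 * Lower rank sorts earlier in the chain.  Assumption: jailed wins over
 * non-jailed first, then specified address wins over unspecified.
 */
static int
pcb_rank(const struct pcb *p)
{
	return ((p->jailed ? 0 : 2) + (p->laddr != 0 ? 0 : 1));
}

/* Insert while preserving the rank order; equal ranks keep FIFO order. */
static void
chain_insert(struct pcb **head, struct pcb *n)
{
	struct pcb **pp = head;

	assert(n != NULL);
	while (*pp != NULL && pcb_rank(*pp) <= pcb_rank(n))
		pp = &(*pp)->next;
	n->next = *pp;
	*pp = n;
}

/*
 * Wild lookup: because the chain is ordered, the first entry that matches
 * the port and either the exact address or the wildcard is the preferred
 * one, so no "best candidate" bookkeeping is needed.
 */
static struct pcb *
chain_lookup(struct pcb *head, uint32_t laddr, uint16_t lport)
{
	for (struct pcb *p = head; p != NULL; p = p->next) {
		if (p->lport != lport)
			continue;
		if (p->laddr == laddr || p->laddr == 0)
			return (p);
	}
	return (NULL);
}
```

The point of the sketch is the tradeoff named in the commit message: insertion walks the chain to find the right position, while lookup stays a plain first-match scan.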


290 Commits