buf.9: Sprinkle with mdoc macros
I did not bump the date here as the manual page looks more like a draft and I'm not sure if it is actually up-to-date, considering that its current Dd dates back to 1998.

MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D52770
+72
-25
@@ -36,44 +36,70 @@ The kernel implements a KVM abstraction of the buffer cache which allows it
 to map potentially disparate vm_page's into contiguous KVM for use by
 (mainly file system) devices and device I/O.
 This abstraction supports
-block sizes from DEV_BSIZE (usually 512) to upwards of several pages or more.
+block sizes from
+.Dv DEV_BSIZE
+(usually 512) to upwards of several pages or more.
 It also supports a relatively primitive byte-granular valid range and dirty
 range currently hardcoded for use by NFS.
 The code implementing the
 VM Buffer abstraction is mostly concentrated in
-.Pa /usr/src/sys/kern/vfs_bio.c .
+.Pa sys/kern/vfs_bio.c
+in the
+.Fx
+source tree.
 .Pp
 One of the most important things to remember when dealing with buffer pointers
-(struct buf) is that the underlying pages are mapped directly from the buffer
+.Pq Vt struct buf
+is that the underlying pages are mapped directly from the buffer
 cache.
 No data copying occurs in the scheme proper, though some file systems
 such as UFS do have to copy a little when dealing with file fragments.
 The second most important thing to remember is that due to the underlying page
-mapping, the b_data base pointer in a buf is always *page* aligned, not
-*block* aligned.
-When you have a VM buffer representing some b_offset and
-b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
-and not just b_data.
+mapping, the
+.Va b_data
+base pointer in a buf is always
+.Em page Ns -aligned ,
+not
+.Em block Ns -aligned .
+When you have a VM buffer representing some
+.Va b_offset
+and
+.Va b_size ,
+the actual start of the buffer is
+.Ql b_data + (b_offset & PAGE_MASK)
+and not just
+.Ql b_data .
 Finally, the VM system's core buffer cache supports
-valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.
+valid and dirty bits
+.Pq Va m->valid , m->dirty
+for pages in
+.Dv DEV_BSIZE
+chunks.
 Thus
 a platform with a hardware page size of 4096 bytes has 8 valid and 8 dirty
 bits.
 These bits are generally set and cleared in groups based on the device
 block size of the device backing the page.
 Complete page's worth are often
-referred to using the VM_PAGE_BITS_ALL bitmask (i.e., 0xFF if the hardware page
+referred to using the
+.Dv VM_PAGE_BITS_ALL
+bitmask (i.e., 0xFF if the hardware page
 size is 4096).
 .Pp
 VM buffers also keep track of a byte-granular dirty range and valid range.
 This feature is normally only used by the NFS subsystem.
 I am not sure why it
-is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
+is used at all, actually, since we have
+.Dv DEV_BSIZE
+valid/dirty granularity
 within the VM buffer.
-If a buffer dirty operation creates a 'hole',
+If a buffer dirty operation creates a
+.Dq hole ,
 the dirty range will extend to cover the hole.
 If a buffer validation
-operation creates a 'hole' the byte-granular valid range is left alone and
+operation creates a
+.Dq hole
+the byte-granular valid range is left alone and
 will not take into account the new extension.
 Thus the whole byte-granular
 abstraction is considered a bad hack and it would be nice if we could get rid
@@ -81,16 +107,24 @@ of it completely.
 .Pp
 A VM buffer is capable of mapping the underlying VM cache pages into KVM in
 order to allow the kernel to directly manipulate the data associated with
-the (vnode,b_offset,b_size).
+the
+.Pq Va vnode , b_offset , b_size .
 The kernel typically unmaps VM buffers the moment
-they are no longer needed but often keeps the 'struct buf' structure
-instantiated and even bp->b_pages array instantiated despite having unmapped
+they are no longer needed but often keeps the
+.Vt struct buf
+structure
+instantiated and even
+.Va bp->b_pages
+array instantiated despite having unmapped
 them from KVM.
 If a page making up a VM buffer is about to undergo I/O, the
-system typically unmaps it from KVM and replaces the page in the b_pages[]
+system typically unmaps it from KVM and replaces the page in the
+.Va b_pages[]
 array with a place-marker called bogus_page.
 The place-marker forces any kernel
-subsystems referencing the associated struct buf to re-lookup the associated
+subsystems referencing the associated
+.Vt struct buf
+to re-lookup the associated
 page.
 I believe the place-marker hack is used to allow sophisticated devices
 such as file system devices to remap underlying pages in order to deal with,
@@ -107,18 +141,29 @@ you wind up with pages marked clean that are actually still dirty.
 If not
 treated carefully, these pages could be thrown away!
 Indeed, a number of
-serious bugs related to this hack were not fixed until the 2.2.8/3.0 release.
-The kernel uses an instantiated VM buffer (i.e., struct buf) to place-mark pages
+serious bugs related to this hack were not fixed until the
+.Fx 2.2.8 /
+.Fx 3.0
+release.
+The kernel uses an instantiated VM buffer (i.e.,
+.Vt struct buf )
+to place-mark pages
 in this special state.
-The buffer is typically flagged B_DELWRI.
+The buffer is typically flagged
+.Dv B_DELWRI .
 When a
-device no longer needs a buffer it typically flags it as B_RELBUF.
+device no longer needs a buffer it typically flags it as
+.Dv B_RELBUF .
 Due to
-the underlying pages being marked clean, the B_DELWRI|B_RELBUF combination must
+the underlying pages being marked clean, the
+.Ql B_DELWRI|B_RELBUF
+combination must
 be interpreted to mean that the buffer is still actually dirty and must be
 written to its backing store before it can actually be released.
 In the case
-where B_DELWRI is not set, the underlying dirty pages are still properly
+where
+.Dv B_DELWRI
+is not set, the underlying dirty pages are still properly
 marked as dirty and the buffer can be completely freed without losing that
 clean/dirty state information.
 (XXX do we have to check other flags in
@@ -128,7 +173,9 @@ The kernel reserves a portion of its KVM space to hold VM Buffer's data
 maps.
 Even though this is virtual space (since the buffers are mapped
 from the buffer cache), we cannot make it arbitrarily large because
-instantiated VM Buffers (struct buf's) prevent their underlying pages in the
+instantiated VM Buffers
+.Pq Vt struct buf Ap s
+prevent their underlying pages in the
 buffer cache from being freed.
 This can complicate the life of the paging
 system.
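
Illustrative note (not part of the commit): the manual text touched by the first hunk gives two pieces of arithmetic, the real start of a buffer as `b_data + (b_offset & PAGE_MASK)` and one valid/dirty bit per DEV_BSIZE chunk of a page. The C fragment below is a minimal sketch of both, assuming a 4096-byte page and a 512-byte DEV_BSIZE as in the text. Every `sketch_`/`SKETCH_` name is hypothetical, `struct sketch_buf` is only a stand-in for the few `struct buf` fields involved, and the range helper is modeled on the description in the page rather than copied from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel takes these from machine headers. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)
#define SKETCH_DEV_BSIZE 512u

/* Hypothetical stand-in for the struct buf fields used in this sketch. */
struct sketch_buf {
	char	*b_data;	/* page-aligned base of the KVM mapping */
	int64_t	 b_offset;	/* byte offset within the backing object */
	long	 b_size;	/* size of the buffer in bytes */
};

/* Actual start of the buffer: b_data + (b_offset & PAGE_MASK). */
static char *
sketch_buf_start(const struct sketch_buf *bp)
{
	/* b_data is page-aligned, not block-aligned. */
	assert(((uintptr_t)bp->b_data & SKETCH_PAGE_MASK) == 0);
	return (bp->b_data + (bp->b_offset & SKETCH_PAGE_MASK));
}

/* Mask with one bit per DEV_BSIZE chunk of a whole page. */
static unsigned
sketch_page_bits_all(void)
{
	unsigned nchunks = SKETCH_PAGE_SIZE / SKETCH_DEV_BSIZE;

	return ((1u << nchunks) - 1);
}

/* Bits for the DEV_BSIZE chunks touched by [off, off + len) within one page. */
static unsigned
sketch_page_bits(unsigned off, unsigned len)
{
	unsigned first, last;

	if (len == 0)
		return (0);
	assert(off + len <= SKETCH_PAGE_SIZE);
	first = off / SKETCH_DEV_BSIZE;
	last = (off + len - 1) / SKETCH_DEV_BSIZE;
	/* Set every bit from first through last, inclusive. */
	return ((2u << last) - (1u << first));
}
```

For a buffer with `b_offset` 5632 (4096 + 1536) and `b_size` 1024, `sketch_buf_start()` returns `b_data + 1536`, and `sketch_page_bits(1536, 1024)` yields 0x18, that is, chunks 3 and 4 of the page. `sketch_page_bits_all()` gives 0xFF, matching the value the manual page quotes for a 4096-byte page.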