buf.9: Sprinkle with mdoc macros
I did not bump the date here, as the manual page looks more like a draft and I am not sure it is actually up to date, considering that its current Dd dates back to 1998.

MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D52770
+72
-25
@@ -36,44 +36,70 @@ The kernel implements a KVM abstraction of the buffer cache which allows it
 to map potentially disparate vm_page's into contiguous KVM for use by
 (mainly file system) devices and device I/O.
 This abstraction supports
-block sizes from DEV_BSIZE (usually 512) to upwards of several pages or more.
+block sizes from
+.Dv DEV_BSIZE
+(usually 512) to upwards of several pages or more.
 It also supports a relatively primitive byte-granular valid range and dirty
 range currently hardcoded for use by NFS.
 The code implementing the
 VM Buffer abstraction is mostly concentrated in
-.Pa /usr/src/sys/kern/vfs_bio.c .
+.Pa sys/kern/vfs_bio.c
+in the
+.Fx
+source tree.
 .Pp
 One of the most important things to remember when dealing with buffer pointers
-(struct buf) is that the underlying pages are mapped directly from the buffer
+.Pq Vt struct buf
+is that the underlying pages are mapped directly from the buffer
 cache.
 No data copying occurs in the scheme proper, though some file systems
 such as UFS do have to copy a little when dealing with file fragments.
 The second most important thing to remember is that due to the underlying page
-mapping, the b_data base pointer in a buf is always *page* aligned, not
-*block* aligned.
-When you have a VM buffer representing some b_offset and
-b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
-and not just b_data.
+mapping, the
+.Va b_data
+base pointer in a buf is always
+.Em page Ns -aligned ,
+not
+.Em block Ns -aligned .
+When you have a VM buffer representing some
+.Va b_offset
+and
+.Va b_size ,
+the actual start of the buffer is
+.Ql b_data + (b_offset & PAGE_MASK)
+and not just
+.Ql b_data .
 Finally, the VM system's core buffer cache supports
-valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.
+valid and dirty bits
+.Pq Va m->valid , m->dirty
+for pages in
+.Dv DEV_BSIZE
+chunks.
 Thus
 a platform with a hardware page size of 4096 bytes has 8 valid and 8 dirty
 bits.
 These bits are generally set and cleared in groups based on the device
 block size of the device backing the page.
 Complete page's worth are often
-referred to using the VM_PAGE_BITS_ALL bitmask (i.e., 0xFF if the hardware page
+referred to using the
+.Dv VM_PAGE_BITS_ALL
+bitmask (i.e., 0xFF if the hardware page
 size is 4096).
 .Pp
 VM buffers also keep track of a byte-granular dirty range and valid range.
 This feature is normally only used by the NFS subsystem.
 I am not sure why it
-is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
+is used at all, actually, since we have
+.Dv DEV_BSIZE
+valid/dirty granularity
 within the VM buffer.
-If a buffer dirty operation creates a 'hole',
+If a buffer dirty operation creates a
+.Dq hole ,
 the dirty range will extend to cover the hole.
 If a buffer validation
-operation creates a 'hole' the byte-granular valid range is left alone and
+operation creates a
+.Dq hole
+the byte-granular valid range is left alone and
 will not take into account the new extension.
 Thus the whole byte-granular
 abstraction is considered a bad hack and it would be nice if we could get rid
@@ -81,16 +107,24 @@ of it completely.
 .Pp
 A VM buffer is capable of mapping the underlying VM cache pages into KVM in
 order to allow the kernel to directly manipulate the data associated with
-the (vnode,b_offset,b_size).
+the
+.Pq Va vnode , b_offset , b_size .
 The kernel typically unmaps VM buffers the moment
-they are no longer needed but often keeps the 'struct buf' structure
-instantiated and even bp->b_pages array instantiated despite having unmapped
+they are no longer needed but often keeps the
+.Vt struct buf
+structure
+instantiated and even
+.Va bp->b_pages
+array instantiated despite having unmapped
 them from KVM.
 If a page making up a VM buffer is about to undergo I/O, the
-system typically unmaps it from KVM and replaces the page in the b_pages[]
+system typically unmaps it from KVM and replaces the page in the
+.Va b_pages[]
 array with a place-marker called bogus_page.
 The place-marker forces any kernel
-subsystems referencing the associated struct buf to re-lookup the associated
+subsystems referencing the associated
+.Vt struct buf
+to re-lookup the associated
 page.
 I believe the place-marker hack is used to allow sophisticated devices
 such as file system devices to remap underlying pages in order to deal with,
@@ -107,18 +141,29 @@ you wind up with pages marked clean that are actually still dirty.
 If not
 treated carefully, these pages could be thrown away!
 Indeed, a number of
-serious bugs related to this hack were not fixed until the 2.2.8/3.0 release.
-The kernel uses an instantiated VM buffer (i.e., struct buf) to place-mark pages
+serious bugs related to this hack were not fixed until the
+.Fx 2.2.8 /
+.Fx 3.0
+release.
+The kernel uses an instantiated VM buffer (i.e.,
+.Vt struct buf )
+to place-mark pages
 in this special state.
-The buffer is typically flagged B_DELWRI.
+The buffer is typically flagged
+.Dv B_DELWRI .
 When a
-device no longer needs a buffer it typically flags it as B_RELBUF.
+device no longer needs a buffer it typically flags it as
+.Dv B_RELBUF .
 Due to
-the underlying pages being marked clean, the B_DELWRI|B_RELBUF combination must
+the underlying pages being marked clean, the
+.Ql B_DELWRI|B_RELBUF
+combination must
 be interpreted to mean that the buffer is still actually dirty and must be
 written to its backing store before it can actually be released.
 In the case
-where B_DELWRI is not set, the underlying dirty pages are still properly
+where
+.Dv B_DELWRI
+is not set, the underlying dirty pages are still properly
 marked as dirty and the buffer can be completely freed without losing that
 clean/dirty state information.
 (XXX do we have to check other flags in
@@ -128,7 +173,9 @@ The kernel reserves a portion of its KVM space to hold VM Buffer's data
 maps.
 Even though this is virtual space (since the buffers are mapped
 from the buffer cache), we cannot make it arbitrarily large because
-instantiated VM Buffers (struct buf's) prevent their underlying pages in the
+instantiated VM Buffers
+.Pq Vt struct buf Ap s
+prevent their underlying pages in the
 buffer cache from being freed.
 This can complicate the life of the paging
 system.
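
The two pieces of arithmetic the manual page text describes, the real start of a buffer at b_data + (b_offset & PAGE_MASK) and the per-page valid/dirty bits kept in DEV_BSIZE chunks, can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the EX_ constants, struct ex_buf, and both helper functions are hypothetical stand-ins invented for this sketch, using the 4096-byte page and 512-byte DEV_BSIZE figures quoted in the text. They are not the kernel's definitions, and the real struct buf has many more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins (assumed values); the kernel headers define the real ones. */
#define EX_PAGE_SIZE 4096u
#define EX_PAGE_MASK (EX_PAGE_SIZE - 1)
#define EX_DEV_BSIZE 512u

/* Hypothetical, cut-down view of a buffer: only the two fields discussed. */
struct ex_buf {
	char	*b_data;	/* page-aligned base of the KVM mapping */
	int64_t	 b_offset;	/* byte offset represented by the buffer */
};

/* Actual start of the buffer data: b_data + (b_offset & PAGE_MASK). */
static char *
ex_buf_start(const struct ex_buf *bp)
{
	return (bp->b_data + (bp->b_offset & EX_PAGE_MASK));
}

/*
 * Mask of valid/dirty bits covering the byte range [off, off + len)
 * inside a single page, one bit per EX_DEV_BSIZE chunk.
 */
static unsigned
ex_page_bits(unsigned off, unsigned len)
{
	unsigned first, last;

	assert(len > 0 && off + len <= EX_PAGE_SIZE);
	first = off / EX_DEV_BSIZE;		/* first chunk touched */
	last = (off + len - 1) / EX_DEV_BSIZE;	/* last chunk touched */
	return (((2u << last) - 1) & ~((1u << first) - 1));
}
```

With these assumed sizes, a range covering a whole page yields all eight bits (0xFF), which matches the value the text gives for VM_PAGE_BITS_ALL on a 4096-byte page.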