Data Structures on Secondary Storage

What is one reason why we store databases on disk?
One reason why we store databases on disk is so that the data can persist after the program stops running.

How does the database store data structures on the disk?
The database stores data structures on the disk as a file or multiple files.

What is another situation when we have to store data structures on disk?
Another situation when we have to store data structures on disk is when they're too big to keep in memory and must be partially or fully saved on the disk.

What is an important factor to consider if you're storing data structures on disk?
An important factor to consider if you're storing data structures on disk is the time it takes to read from the disk.

Is reading from disk slower or faster than reading from memory?
Reading from a disk is slower than reading from memory.

What do you do to the cost of accessing the disk for reading and writing?
For the cost of accessing the disk for reading and writing, you amortize it.

What does it mean to amortize the cost of accessing the disk?
To amortize the cost of accessing the disk is to spread the cost over multiple operations by reading or writing a block's (disk page) worth of elements in one operation.

How does the speed of reading or writing a single element on disk compare to a block or page's worth of elements?
The speed of reading or writing a single element on disk and the speed of reading or writing a block or page's worth of elements is equivalent.

What are the two parameters that should be minimized when analyzing algorithms that save to disk?
The two parameters that should be minimized when analyzing algorithms that save to disk are:

  1. The number of disk operations.
  2. The CPU time.

What are the three solutions for storing a data structure on disk?
The three solutions for storing a data structure on disk are by using:

  1. A standardized, human-readable file format - JSON, XML, etc.
  2. Built-in serialization in the programming language.
  3. A custom binary format for serializing objects.

What is the benefit of using a standardized, human-readable file format to store data structures?
The benefit of using a standardized, human-readable file format to store data structures is that it's easy to use and pass to other languages.

What are two downsides when using a standardized, human-readable file format to store data structures?
The two downsides when using a standardized, human-readable file format to store data are that:

  1. It takes more storage space.
  2. It takes more time to parse.

What is serialization?
Serialization is the act of saving a data structure as an array of bytes to a file.

How do you mark a Java class as serializable?
You mark a Java class as serializable by adding implements Serializable.

What is the benefit of using serialization to store data structures?
The benefit of using serialization to store data structures is that it's easy to use and compact.

What are two downsides when using serialization to store data structures?
Two downsides when using serialization to store data structures are that:

  1. It doesn't support random access and must be loaded in its entirety.
  2. It's not easy to pass to programs in other languages.

What is the benefit of using a custom binary format to store data structures?
The benefit of using a custom binary format to store data structures is that it's much faster and compact for random access.

What are two downsides when using a custom binary format to store data structures?
Two downsides when using a custom binary format to store data structures is that:

  1. It's harder to deal with in our code.
  2. It's not easy to pass to other programs or languages.

Notes on B-Tree and disk access

Of the three types of solutions for storing data structures on disk, what type of solution is a B-tree?
Of the three types of solutions for storing data structures on disk, a B-tree is a custom binary format.

Why do B-trees have a high branching factor?
B-trees have a high branching factor so that each node fills up one block or disk page.

What is the typical branching factor for a B-tree?
The typical branching factor for a B-tree is 50 to 2000 depending on the disk's block size.

How many elements can a B-tree branching factor of 1000 and a height of 2 store?
With a branching factor of 1000 and a height of 2, a B-tree can store a billion elements.

What is a diagram of an example of a B-tree?
A diagram of an example of a B-tree is:
B-Tree Diagram.png

What do B-tree algorithms do with selected pages from disk?
With selected pages from disk, B-tree algorithms copy them into the main memory as needed and write back onto the disk the pages that have changed.

B-tree algorithms keep a ... number of pages in memory at a time.
B-tree algorithms keep a constant number of pages in memory at a time.

...

Definition of a B-Tree

Does a B-tree have a root node?
Yes, a B-tree has a root node.

What is the root node of a B-tree referred to as?
The root node of a B-tree is referred to as .

What are the three things that every node in a B-tree has?
The three things that every node in a B-tree has is:

  1. - The number of keys stored in .
  2. keys - .
  3. - A Boolean indicating if the node is a leaf.

What does each internal node also contain?
Each internal node also contains points () to its children.

What are the values of a leaf node?
The values of a leaf node are null or undefined.

What do the keys do?
The keys separate the ranges of keys stored in each subtree.

...

What is the depth of all leaves in a B-tree?
The depth of all leaves in a B-tree is the height of the tree.

What can the value of the minimum degree of a B-tree be?
The value of the minimum degree of a B-tree can be .

At least how many keys must every node other than the root node have?
Every node other than the root node must have at least keys.

At most how many keys can every node have?
Every node can have at most keys.

At least how many children must every internal node other than the root node have?
Every internal node other than the root node must have at least children.

At most how many children can every internal node have?
Every internal node can have at most children.

At least how many keys must the root node have if the B-tree is non-empty.
If the B-tree is non-empty, the root node must have at least 1 key.

From Gemini

An internal node is any node that isn't a leaf or the root node. These nodes contain keys and pointers to child nodes. The keys act as separators for the ranges of keys stored in the subtrees root at its child nodes. In essence, internal nodes guide the search process by directing it to the appropriate subtree based on the target key.

When is a node considered full?
A node is considered full when it contains exactly keys.

Summary

The minimum degree is a value which determines the minimum and maximum number of children and keys that the nodes of a B-tree can have.

Number of keys:

  • All nodes other than the root node - From to keys.
  • The root node - At least 1 key.

Number of children:

  • All internal nodes - From to children.

What is the minimum degree of the simplest B-tree?
The minimum degree of the simplest B-tree is .

How many children can each node of a B-tree have with a minimum degree of 2?
With a minimum degree of 2, each node of a B-tree can have 2, 3, or 4 children.

What is the name of a B-tree that has a minimum degree of 2?
The name of a B-tree that has a minimum degree of 2 is a 2-3-4 tree.

What is the height of an -key B-tree of height and minimum degree if ?
The height of an -key B-tree of height and minimum degree if is .

What is an example of a B-tree?
An example of a B-tree is:
B-Tree Example.png

...