B-trees were invented by Rudolf Bayer and Edward McCreight at the Boeing Research Labs. Their paper – “Organization and Maintenance or Large Order Indices,” containing details of these structures, was circulated in 1970.
It was used to manage the index pages, especially for large random-access files.
The basic assumptions regarding the b-tree in data structure were that the indexes were voluminous and that only a tiny chunk of the tree fits in the main memory.
Importance of choosing b-tree in data structure
This is a tree data structure that keeps data sorted, and this allows for the following operations in a logarithmic amortized time –
- Deletions, etc.
The main idea of using these is to reduce the number of disk accesses. It is also optimized for systems that read and write huge data blocks.
These trees are also known to be –
- These are a self-balancing structure that means that the performance can be guaranteed when these trees are
- These trees are broad and expand horizontally instead of vertically
- The height of these trees is kept low by putting the maximum possible keys in the node.
- Since the same is low, it reduces the disk access for maximum operations
- It is dependent on a positive constant integer (called minimum)
- The running time of the B-tree create O(1).
Inserting in a b-tree
Inserting in a b-tree means that you need to have a place to insert the new key, and the algorithm is used to split the allocated node root with the new root node leaf.
Advantages of choosing b-tree for your database
The b-tree uses all the ideas described above and further helps to –
- Keeps the key sorted for sequential traversing
- Using a partially full block to speed up insertion and deletion
- Using a hierarchical index to minimize the disk reads
- Keeps your index balance with a recursive algorithm
- The same also reduces waste by ensuring that each interior node is half full.
- Such a b-tree can handle any arbitrary number of insertion/deletions.
The rules for b-tree
- A b-tree can have more than one single element.
Each b-tree depends on the positive constant integer called Minimum. This is used to determine the appropriate number of elements in a single node.
- The root can have as low as one element.
In some instances, the root can have no elements, especially if it has no children. On the other hand, a node has at least minimum elements.
- The maximum number of elements is twice the value of a minimum
- The elements of each b-tree node are stored in a partially filled array which is sorted from the smallest element to the largest element (i.e., from index 0 to the final used position for the array)
- The number of subtrees below a non-leaf node is often one more than the total amount of elements in the node
- An element at the index i is greater than all the other elements of the subtree number I of the node
- An element at the index i is less than all the elements of its sub-tree number i+1 of the node
- Each leaf of the b-tree has a great depth. Thus it ensures that the same avoids issues associated with an unbalanced .tree
What makes b-tree so special?
The b-tree is a self-balancing search tree that goes beyond the limitations of the traditional self-balancing trees like AVL and Red-black trees. When the number of keys is high, the data is read from the disk in the form of blocks. Hence the disk access time is high compared to the main memory access time.
Therefore, the idea behind using such b-trees is to reduce the number of disk accesses, and the tree operations like the following require O(h) disk access ( where h is the height of the tree –
- Min, etc.
The same height is kept low by putting the maximum possible keys in the b-tree mode. The b-tree mode is kept equal to the disk block size. Since the height of the b-tree is low, it reduces the disk access for most of the operations significantly.
Significant properties of the b-tree
- All leaves are at the same height.
- Every node except the root must contain t-1 keys on the minimum
- The root also must have a minimum 1 key
- A b-tree is defined by the term minimum degree – t. The value of t depends on the disk block size
- All nodes (including the root) must have 2*t-1 keys at the max
- The number of children of a node is equal to the number of keys in it (plus 1)
Apart from this –
- All keys of a node are sorted in an increasing order
- The tree grows and shrinks from the root, which is unlike the Binary Search Tree
- Such a tree grows downwards and also shrinks downwards
- The child between two keys, k-1 and k-2, has all the keys in this range
- Similar to other balanced Binary search trees, the time complexity is O (log n) for completing a search, insert and delete operations
Reasons for using b-tree
The major reasons for choosing b-tree over other algorithms include –
When searching tables on disc, the cost of accessing the same is quite high. However, the same is unaffected by the amount of data transfer.
It is often tough to improve the height of a tree, so the logic is to maximize the available space and make the tree’s height as small as possible.
The solution is to use a b-tree since it has more branches and lesser height. Moreover, in a b-tree, one node can hold many items or elements. This causes a significant impact on the access time.
If you want to access a node of b-tree, you need one disc read operation.
A b-tree of order 101 and height 3 can hold up to 100 million items, and merely 3X3 disc reads can access these. These trees are called balanced stored trees or multiway search trees, as these allow for multilevel indexing.
Non-leaf nodes of such trees are called internal nodes with a child node, but leaf nodes with no child are called external nodes.
Any b-tree of order 3 can satisfy the following properties –
- The root has a maximum of one key to a maximum of 2 keys. Therefore, the root can have a minimum of 2 children to a maximum of 3 children.
- Any node (non-leaf) except the root has a minimum of 1-2 keys. As a result, these can have a minimum of 2 and a maximum of 3 children.
- Duplicate key values are not allowed.
B tree organizes and stores data in a specific manner. It’s able to do this because it balances its leaf nodes as it grows, and can store at least the same amount of data in its root node (single node on the right) as any of its leaf nodes. B trees are often used in databases because they provide high performance and reasonable capacity utilization, which is important when storage space is limited.