rekey

Create a new key file from a data file.

template<
    class Hasher,
    class File,
    class Progress,
    class... Args>
void
rekey(
    path_type const& dat_path,
    path_type const& key_path,
    path_type const& log_path,
    std::size_t blockSize,
    float loadFactor,
    std::uint64_t itemCount,
    std::size_t bufferSize,
    error_code& ec,
    Progress&& progress,
    Args&&... args);

This algorithm rebuilds a key file for the given data file. It works efficiently by iterating the data file multiple times. During the iteration, a contiguous block of the key file is rendered in memory, then flushed to disk when the iteration is complete. The size of this memory buffer is controlled by the bufferSize parameter, larger is better. The algorithm works the fastest when bufferSize is large enough to hold the entire key file in memory; only a single iteration of the data file is needed in this case.

During the rekey, spill records may be appended to the data file. If the rekey operation is abnormally terminated, this would normally result in a corrupted data file. To prevent this, the function creates a log file using the specified path so that the database can be fixed in a subsequent call to recover.

Remarks

If a log file is already present, this function will fail with error::log_file_exists.

Template Parameters

Hasher: The hash function to use. This type must meet the requirements of Hasher. The hash function must be the same as that used to create the database, or else an error is returned.
File: The type of file to use. This type must meet the requirements of File.

Parameters

dat_path

The path to the data file.

key_path

The path to the key file.

log_path

The path to the log file.

blockSize

The size of a key file block. Larger blocks hold more keys but require more I/O cycles per operation. The ideal block size the largest size that may be read in a single I/O cycle, and device dependent. The return value of block_size returns a suitable value for the volume of a given path.

loadFactor

A number between zero and one representing the average bucket occupancy (number of items). A value of 0.5 is perfect. Lower numbers waste space, and higher numbers produce negligible savings at the cost of increased I/O cycles.

itemCount

The number of items in the data file.

bufferSize

The number of bytes to allocate for the buffer.

ec

Set to the error if any occurred.

progress

A function which will be called periodically as the algorithm proceeds. The equivalent signature of the progress function must be:

void progress(
    std::uint64_t amount,   // Amount of work done so far
    std::uint64_t total     // Total amount of work to do
);

args

Optional arguments passed to File constructors.

Requirements

Header: nudb/rekey.hpp