
Utility Functions

The following are the main utility functions for creating archives.


dump

Writes an object to a file.


Parameters:

file (str | BytesIO): the file path as a string, or a BytesIO buffer to write to. Required.

obj: the object to be written to the file. Required.

**kwargs: additional keyword arguments to be passed to the LazyWriter.

Returns: None


Source code in src/msglc/
def dump(file: str | BytesIO, obj, **kwargs):
    """
    This function is used to write the object to the file.

    :param file: a string representing the file path
    :param obj: the object to be written to the file
    :param kwargs: additional keyword arguments to be passed to the `LazyWriter`
    :return: None
    """
    with LazyWriter(file, **kwargs) as msglc_writer:
        msglc_writer.write(obj)


combine

Combines multiple serialized files into a single archive.
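Combining does not need to hold a whole input in memory: the source listing for this function reads each input piece by piece and passes the pieces on. A minimal sketch of that streaming pattern, using only the standard library, is given here. The name `iter_chunks` and the 4-byte chunk size are illustrative assumptions, not part of msglc.

```python
from __future__ import annotations

from io import BytesIO
from typing import BinaryIO, Iterator


def iter_chunks(source: str | BinaryIO, chunk_size: int = 4) -> Iterator[bytes]:
    """Yield the content of a path or an open binary stream in chunks."""
    if isinstance(source, str):
        # A path: open the file here and close it once it is exhausted.
        with open(source, "rb") as handle:
            while data := handle.read(chunk_size):
                yield data
    else:
        # An already open stream: read from its current position.
        while data := source.read(chunk_size):
            yield data


chunks = list(iter_chunks(BytesIO(b"0123456789")))
print(chunks)
```

Because the generator yields as it reads, the consumer decides how much data is in flight at any time; only one chunk is held per step.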


Parameters:

archive (str | BytesIO): the file path of the archive as a string, or a BytesIO buffer. Required.

files (FileInfo | list[FileInfo]): a single FileInfo object or a list of FileInfo objects. Required.

mode (Literal['a', 'w']): the combination mode, 'w' for write and 'a' for append. Default: 'w'.

validate (bool): whether to validate the files before combining. Default: True.

Returns: None


Source code in src/msglc/
def combine(
    archive: str | BytesIO,
    files: FileInfo | list[FileInfo],
    mode: Literal["a", "w"] = "w",
    validate: bool = True,
):
    """
    This function is used to combine the multiple serialized files into a single archive.

    :param archive: a string representing the file path of the archive
    :param files: a list of FileInfo objects
    :param mode: a string representing the combination mode, 'w' for write and 'a' for append
    :param validate: switch on to validate the files before combining
    :return: None
    """
    if isinstance(files, FileInfo):
        files = [files]

    if 0 < sum(1 for file in files if file.name is not None) < len(files):
        raise ValueError("Files must either all have names or all not have names.")

    if len(all_names := {file.name for file in files}) != len(files) and (
        len(all_names) != 1 or all_names.pop() is not None
    ):
        raise ValueError("Files must have unique names.")

    def _validate(_fp):
        if isinstance(_fp, str):
            if not os.path.exists(_fp):
                raise ValueError(f"File {_fp} does not exist.")
            with open(_fp, "rb") as _file:
                if _file.read(len(LazyWriter.magic)) != LazyWriter.magic:
                    raise ValueError(f"Invalid file format: {_fp}.")
        else:
            ini_pos = _fp.tell()
            magic = _fp.read(len(LazyWriter.magic))
            _fp.seek(ini_pos)
            if magic != LazyWriter.magic:
                raise ValueError("Invalid file format.")

    if validate:
        for file in files:
            _validate(file.path)

    def _iter(path: str | BinaryIO):
        if isinstance(path, str):
            with open(path, "rb") as _file:
                while _data := _file.read(config.copy_chunk_size):
                    yield _data
        else:
            while _data := path.read(config.copy_chunk_size):
                yield _data

    with LazyCombiner(archive, mode=mode) as combiner:
        for file in files:
            combiner.write(_iter(file.path), file.name)
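The two naming checks in the listing above amount to a simple rule: either every file carries a name or none does, and names that are given must be distinct. A minimal sketch of that rule in isolation is given here. `NamedFile` is a hypothetical stand-in for `FileInfo` (assumed to carry a path and an optional name), and `check_names` and `is_rejected` are illustrative helpers, not library functions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NamedFile:
    """Hypothetical stand-in for FileInfo: a path plus an optional name."""

    path: str
    name: str | None = None


def check_names(files: list[NamedFile]) -> None:
    """Raise ValueError if the naming rule is violated."""
    named = sum(1 for f in files if f.name is not None)
    # Rule 1: either every file is named or none is.
    if 0 < named < len(files):
        raise ValueError("Files must either all have names or all not have names.")
    # Rule 2: names that are given must be distinct.
    names = [f.name for f in files if f.name is not None]
    if len(set(names)) != len(names):
        raise ValueError("Files must have unique names.")


def is_rejected(files: list[NamedFile]) -> bool:
    """Return True if check_names raises for the given files."""
    try:
        check_names(files)
    except ValueError:
        return True
    return False


all_named = [NamedFile("a.msg", "a"), NamedFile("b.msg", "b")]
all_unnamed = [NamedFile("a.msg"), NamedFile("b.msg")]
mixed = [NamedFile("a.msg", "a"), NamedFile("b.msg")]
duplicated = [NamedFile("a.msg", "a"), NamedFile("b.msg", "a")]
```

Checking the rule before any bytes are copied means a bad input list fails fast, before the archive is touched.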


append

Appends multiple serialized files to an existing archive.


Parameters:

archive (str | BytesIO): the file path of the archive as a string, or a BytesIO buffer. Required.

files (FileInfo | list[FileInfo]): a single FileInfo object or a list of FileInfo objects. Required.

validate (bool): whether to validate the files before combining. Default: True.

Returns: None


Source code in src/msglc/
def append(
    archive: str | BytesIO, files: FileInfo | list[FileInfo], *, validate: bool = True
):
    """
    This function is used to append the multiple serialized files to an existing single archive.

    :param archive: a string representing the file path of the archive
    :param files: a list of FileInfo objects
    :param validate: switch on to validate the files before combining
    :return: None
    """
    combine(archive, files, mode="a", validate=validate)
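Taken together, the three functions above suggest a write, combine, then append workflow. The sketch below is a hedged illustration of that workflow rather than documented usage. It assumes that `dump`, `combine`, `append` and `FileInfo` can be imported from the top-level `msglc` package and that `FileInfo` takes a path followed by an optional name; this page confirms neither. `build_archive` and the file names are hypothetical.

```python
import os


def build_archive(workdir: str) -> str:
    """Write three files, combine two of them, then append the third."""
    # Assumption: these names are importable from the top-level package.
    # The import is deferred so the sketch can be loaded without msglc.
    from msglc import FileInfo, append, combine, dump

    payloads = {"a": [1, 2, 3], "b": {"x": 1}, "c": [4, 5]}
    paths = {}
    for name, payload in payloads.items():
        paths[name] = os.path.join(workdir, f"{name}.msg")
        dump(paths[name], payload)

    archive = os.path.join(workdir, "archive.msg")
    # Assumption: FileInfo(path, name). All files are named, with unique names.
    combine(archive, [FileInfo(paths["a"], "a"), FileInfo(paths["b"], "b")])
    # append is combine with mode="a": it extends the existing archive.
    append(archive, FileInfo(paths["c"], "c"))
    return archive
```

Calling `build_archive` on a temporary directory would leave a single `archive.msg` next to the three inputs, provided the assumptions above hold.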


configure

Configures the library settings. All parameters are optional keyword arguments. A setting is updated only if its argument is provided and valid, so arguments that are left as None, have the wrong type, or fall out of range are ignored.
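The update rule can be sketched with a stand-in settings object. The example below mirrors two things visible in the source listing further down: invalid values are skipped silently, and `trivial_size` is kept no larger than `small_obj_optimization_threshold`. `Settings`, `configure_sketch` and the starting values 8192 and 20 are illustrative assumptions, not the library's actual object or defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical stand-in for the library's configuration object."""

    small_obj_optimization_threshold: int = 8192  # illustrative value
    trivial_size: int = 20  # illustrative value


settings = Settings()


def configure_sketch(
    small_obj_optimization_threshold: int | None = None,
    trivial_size: int | None = None,
) -> None:
    """Update only the settings that are provided and valid."""
    threshold = small_obj_optimization_threshold
    if isinstance(threshold, int) and threshold > 0:
        settings.small_obj_optimization_threshold = threshold
        # Keep trivial_size <= threshold by lowering trivial_size.
        settings.trivial_size = min(settings.trivial_size, threshold)

    if isinstance(trivial_size, int) and trivial_size > 0:
        settings.trivial_size = trivial_size
        # Keep trivial_size <= threshold by raising the threshold.
        settings.small_obj_optimization_threshold = max(
            settings.small_obj_optimization_threshold, trivial_size
        )


configure_sketch(trivial_size=-5)                      # ignored: not positive
configure_sketch(small_obj_optimization_threshold=10)  # trivial_size drops to 10
configure_sketch(trivial_size=64)                      # threshold rises to 64
print(settings)
```

Silently skipping invalid values keeps a call from raising, at the cost that a typo in a value goes unnoticed; reading the setting back after the call is a cheap way to confirm it took effect.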


Parameters:

small_obj_optimization_threshold (int | None): The threshold (in bytes) for small object optimization. Objects smaller than this threshold are not indexed. Default: None.

write_buffer_size (int | None): The size (in bytes) for the write buffer. Default: None.

read_buffer_size (int | None): The size (in bytes) for the read buffer. Default: None.

fast_loading (bool | None): Flag to enable or disable fast loading. If enabled, the container will be read in one go, instead of reading each child separately. Default: None.

fast_loading_threshold (int | float | None): The threshold (0 to 1) for fast loading. With the fast loading flag turned on, fast loading will be performed if the number of already read children over the total number of children is smaller than this threshold. Default: None.

trivial_size (int | None): The size (in bytes) considered trivial, around a dozen bytes. Objects smaller than this size are considered trivial. For a list of trivial objects, the container will be indexed in a blocked fashion. Default: None.

disable_gc (bool | None): Flag to enable or disable garbage collection. Default: None.

simple_repr (bool | None): Flag to enable or disable simple representation used in the __repr__ method. If turned on, __repr__ will not incur any disk I/O. Default: None.

copy_chunk_size (int | None): The size (in bytes) for the copy chunk. Default: None.

numpy_encoder (bool | None): Flag to enable or disable the numpy support. If enabled, the numpy arrays will be encoded using the dumps method provided by numpy. The arrays are stored as binary data directly. If disabled, the numpy arrays will be converted to lists before encoding. Default: None.

magic (bytes | None): Magic bytes (max length: 30) to set, used to identify the file format version. Default: None.
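Magic bytes act as a file signature: the validation step of combine reads the leading bytes of each input and compares them with the configured magic. A minimal sketch of such a check on a seekable stream is given here. `MAGIC` and `starts_with_magic` are hypothetical, and the value shown is not the library's magic. Restoring the stream position afterwards follows the `ini_pos` bookkeeping visible in the combine listing.

```python
from io import BytesIO

MAGIC = b"demo-magic"  # illustrative value, not the library's magic


def starts_with_magic(stream, magic: bytes = MAGIC) -> bool:
    """Check the leading bytes of a seekable stream without consuming them."""
    start = stream.tell()
    try:
        return stream.read(len(magic)) == magic
    finally:
        # Restore the position so the next reader sees the full content.
        stream.seek(start)


good = BytesIO(MAGIC + b"payload")
bad = BytesIO(b"something else")
print(starts_with_magic(good), starts_with_magic(bad))
```

Changing the magic therefore changes which files pass validation, so a file written under one magic would be rejected by a check that expects another.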

Source code in src/msglc/
def configure(
    small_obj_optimization_threshold: int | None = None,
    write_buffer_size: int | None = None,
    read_buffer_size: int | None = None,
    fast_loading: bool | None = None,
    fast_loading_threshold: int | float | None = None,
    trivial_size: int | None = None,
    disable_gc: bool | None = None,
    simple_repr: bool | None = None,
    copy_chunk_size: int | None = None,
    numpy_encoder: bool | None = None,
    magic: bytes | None = None,
):
    """
    This function is used to configure the settings. It accepts any number of keyword arguments.
    The function updates the values of the configuration parameters if they are provided in the arguments.

    :param small_obj_optimization_threshold:
            The threshold (in bytes) for small object optimization.
            Objects smaller than this threshold are not indexed.
    :param write_buffer_size:
            The size (in bytes) for the write buffer.
    :param read_buffer_size:
            The size (in bytes) for the read buffer.
    :param fast_loading:
            Flag to enable or disable fast loading.
            If enabled, the container will be read in one go, instead of reading each child separately.
    :param fast_loading_threshold:
            The threshold (0 to 1) for fast loading.
            With the fast loading flag turned on, fast loading will be performed if the number of
            already read children over the total number of children is smaller than this threshold.
    :param trivial_size:
            The size (in bytes) considered trivial, around a dozen bytes.
            Objects smaller than this size are considered trivial.
            For a list of trivial objects, the container will be indexed in a blocked fashion.
    :param disable_gc:
            Flag to enable or disable garbage collection.
    :param simple_repr:
            Flag to enable or disable simple representation used in the __repr__ method.
            If turned on, __repr__ will not incur any disk I/O.
    :param copy_chunk_size:
            The size (in bytes) for the copy chunk.
    :param numpy_encoder:
            Flag to enable or disable the `numpy` support.
            If enabled, the `numpy` arrays will be encoded using the `dumps` method provided by `numpy`.
            The arrays are stored as binary data directly.
            If disabled, the `numpy` arrays will be converted to lists before encoding.
    :param magic:
            Magic bytes (max length: 30) to set, used to identify the file format version.
    """
    if (
        isinstance(small_obj_optimization_threshold, int)
        and small_obj_optimization_threshold > 0
    ):
        config.small_obj_optimization_threshold = small_obj_optimization_threshold
        if config.trivial_size > config.small_obj_optimization_threshold:
            config.trivial_size = config.small_obj_optimization_threshold

    if isinstance(write_buffer_size, int) and write_buffer_size > 0:
        config.write_buffer_size = write_buffer_size

    if isinstance(read_buffer_size, int) and read_buffer_size > 0:
        config.read_buffer_size = read_buffer_size

    if isinstance(fast_loading, bool):
        config.fast_loading = fast_loading

    if (
        isinstance(fast_loading_threshold, (int, float))
        and 0 <= fast_loading_threshold <= 1
    ):
        config.fast_loading_threshold = fast_loading_threshold

    if isinstance(trivial_size, int) and trivial_size > 0:
        config.trivial_size = trivial_size
        if config.trivial_size > config.small_obj_optimization_threshold:
            config.small_obj_optimization_threshold = config.trivial_size

    if isinstance(disable_gc, bool):
        config.disable_gc = disable_gc

    if isinstance(simple_repr, bool):
        config.simple_repr = simple_repr

    if isinstance(copy_chunk_size, int) and copy_chunk_size > 0:
        config.copy_chunk_size = copy_chunk_size

    if isinstance(numpy_encoder, bool):
        config.numpy_encoder = numpy_encoder

    if isinstance(magic, bytes) and 0 < len(magic) <= max_magic_len:
        from msglc import LazyWriter
