Examples

Serialization

Dumping one object to a file

Use `dump` to serialize a JSON-compatible object to a file.

Python
from msglc import dump

data = {"a": [1, 2, 3], "b": {"c": 4, "d": 5, "e": [0x221548313] * 10}}
dump("data.msg", data)

Combining several files

Use `combine` to merge several serialized files into one. A combined file can itself be used as input to a further `combine` call.

Combine as `dict`

Python
from msglc import dump, combine, FileInfo
from msglc.reader import LazyReader

dump("dict.msg", {str(v): v for v in range(1000)})
dump("list.msg", [float(v) for v in range(1000)])

combine("combined.msg", [FileInfo("dict.msg", "dict"), FileInfo("list.msg", "list")])
# support recursively combining files
# ...

# the combined file uses a dict layout
# { 'dict' : {'0':0,'1':1,'2':2,...}, 'list' : [0.0,1.0,2.0,...] }
# so it can be read as follows; details are in the Deserialization section
with LazyReader("combined.msg") as reader:
    assert reader['dict/101'] == 101  # also reader['dict'][101]
    assert reader['list/101'] == 101.0  # also reader['list'][101]

Combine as `list`

Python
from msglc import dump, combine, FileInfo
from msglc.reader import LazyReader

dump("dict.msg", {str(v): v for v in range(1000)})
dump("list.msg", [float(v) for v in range(1000)])

combine("combined.msg", [FileInfo("dict.msg"), FileInfo("list.msg")])
# support recursively combining files
# ...

# the combined file uses a list layout
# [ {'0':0,'1':1,'2':2,...}, [0.0,1.0,2.0,...] ]
# so it can be read as follows; details are in the Deserialization section
with LazyReader("combined.msg") as reader:
    assert reader['0/101'] == 101  # also reader[0][101]
    assert reader['1/101'] == 101.0  # also reader[1][101]
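
As a mental model only, the two layouts correspond to the plain Python objects below. This sketch does not call msglc; the names `as_dict` and `as_list` are illustrative and simply mirror the layout comments in the two examples above.

```python
# The same two payloads that the examples dump to dict.msg and list.msg.
dict_part = {str(v): v for v in range(1000)}   # keys are strings: '0', '1', ...
list_part = [float(v) for v in range(1000)]    # values are 0.0, 1.0, ...

# Names given for each input -> dict layout keyed by those names.
as_dict = {"dict": dict_part, "list": list_part}

# No names given -> list layout, addressed by position.
as_list = [dict_part, list_part]

assert as_dict["dict"]["101"] == 101
assert as_dict["list"][101] == 101.0
assert as_list[0]["101"] == 101
assert as_list[1][101] == 101.0
```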

Deserialization

Use `LazyReader` to read a file. Data is loaded lazily, so only the nodes that are actually visited need to be read.

Python
from msglc.reader import LazyReader, to_obj

with LazyReader("data.msg") as reader:
    data = reader.read()  # returns a LazyDict, LazyList, dict, list or primitive value
    data = reader["b/c"]  # subscriptable if the actual data is subscriptable
    # data = reader[2:]  # also support slicing if its underlying data is list compatible
    data = reader.read("b/c")  # or provide a path to visit a particular node
    print(data)  # 4
    b_dict = reader.read("b")
    print(b_dict.__class__)  # <class 'msglc.reader.LazyDict'>
    for k, v in b_dict.items():  # dict compatible
        if k != "e":
            print(k, v)  # c 4, d 5
    b_json = to_obj(b_dict)  # ensure plain dict
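
To make the path syntax concrete, the sketch below resolves a `/`-separated path against ordinary nested dicts and lists. The helper `resolve` is hypothetical and is not how msglc is implemented; it only assumes that each path segment selects a dict key or, for a list, an integer index.

```python
from collections.abc import Mapping, Sequence


def resolve(obj, path: str):
    """Walk a '/'-separated path through nested dicts and lists (hypothetical helper)."""
    node = obj
    for part in path.split("/"):
        if isinstance(node, Mapping):
            node = node[part]  # segment is used as a string key
        elif isinstance(node, Sequence) and not isinstance(node, (str, bytes)):
            node = node[int(part)]  # segment is parsed as a list index
        else:
            raise KeyError(f"cannot descend into {type(node).__name__} at {part!r}")
    return node


data = {"a": [1, 2, 3], "b": {"c": 4, "d": 5}}
assert resolve(data, "b/c") == 4  # same node as reader.read("b/c") in the example above
assert resolve(data, "a/2") == 3  # list elements are addressed by index
```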

Streaming Data

The data fed to the writer does not need to be fully generated in advance; it can be produced on the fly.

The writer expects and recognizes `collections.abc.Mapping` objects. It is thus possible to mimic a dictionary whose items come from a generator.

The following is a minimal implementation.

Python
from collections.abc import Generator, Mapping


class DictStream(Mapping):
    def __init__(self, generator: Generator, length: int):
        self._len = length
        self._gen = generator

    def __iter__(self): ...  # not used by writer but has to be implemented

    def __getitem__(self, key, /): ...  # not used by writer but has to be implemented

    def __len__(self):
        # required
        # note that the length needs to be known in advance
        # you do not want to get it from the generator as doing so consumes it
        return self._len

    def items(self):
        # required
        yield from self._gen

length requirement

The writer invokes only two things: `len()` and `.items()`. Thus, `__len__(self)` and `items(self)` must be properly implemented. If the length is not known in advance, streaming is not feasible.

With the above, one can do the following.

Python
from msglc import dump
from msglc.reader import LazyReader


def example():
    yield "a", 1
    yield "b", 2


target = "example.msg"
dump(target, DictStream(example(), 2))

with LazyReader(target) as reader:
    assert reader == {"a": 1, "b": 2}
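
The sketch below illustrates, without msglc, why the length has to be supplied separately and why such a stream can be consumed only once. The function `consume` is a hypothetical stand-in for a writer: it is assumed to ask for the length first and then make a single pass over `.items()`. The class is repeated so that the block runs on its own.

```python
from collections.abc import Mapping


class DictStream(Mapping):
    def __init__(self, generator, length: int):
        self._gen = generator
        self._len = length

    def __iter__(self):  # required by the Mapping ABC, unused here
        raise NotImplementedError

    def __getitem__(self, key, /):  # required by the Mapping ABC, unused here
        raise NotImplementedError

    def __len__(self):
        return self._len  # known up front, never derived from the generator

    def items(self):
        yield from self._gen


def consume(mapping: Mapping) -> list:
    """Hypothetical stand-in for a writer: read the length, then stream the items once."""
    expected = len(mapping)
    pairs = list(mapping.items())
    if len(pairs) != expected:
        raise ValueError(f"declared {expected} items but received {len(pairs)}")
    return pairs


def squares(n: int):
    for i in range(n):
        yield str(i), i * i


stream = DictStream(squares(3), 3)
assert isinstance(stream, Mapping)
assert consume(stream) == [("0", 0), ("1", 1), ("2", 4)]
assert list(stream.items()) == []  # the generator is exhausted after one pass
```

A declared length that disagrees with the number of generated items is rejected by this stand-in; how msglc itself reacts to such a mismatch is not covered here.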