S3 Support

S3 storage is supported via the s3fs package, which must be installed first.

Bash
pip install s3fs
# or in one step
pip install "msglc[s3fs]"

Read

LazyReader can work with s3fs in two ways.

S3File

Pass an S3File object to LazyReader directly.

Python
from s3fs import S3FileSystem
from msglc.reader import LazyReader

s3 = S3FileSystem()
with s3.open('my-bucket/my-file', mode='rb') as f:
    with LazyReader(f) as reader:
        ...

Path String + S3FileSystem

Pass a path string to LazyReader together with an explicit S3FileSystem.

Python
from s3fs import S3FileSystem
from msglc.reader import LazyReader

s3 = S3FileSystem()
with LazyReader('my-bucket/my-file', s3fs=s3) as reader:
    ...

Write, Combine, Append

For LazyWriter and LazyCombiner, passing in an S3File object is not feasible. Only the second option (path string plus S3FileSystem) is supported.

Python
from s3fs import S3FileSystem
from msglc.writer import LazyWriter

s3 = S3FileSystem()
with LazyWriter('my-bucket/my-file', s3fs=s3) as writer:
    writer.write(...)

implementation details

S3 storage does not support random seeks in write mode, but the TOC can only be updated after all data has been serialized. It is therefore impossible to operate directly on S3File objects in write mode. Instead, the implementation creates a local temporary file and writes the byte stream to it. Once all content has been written and updated, the whole blob is uploaded to S3.
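The general stage-locally-then-upload pattern can be sketched as follows. This is a toy illustration only, not msglc's actual file layout or code: the 8-byte header, the offset-list TOC, the helper name `write_with_trailing_toc`, and the `upload` callback are all hypothetical, and an in-memory buffer stands in for the remote object.

```python
import io
import shutil
import struct
import tempfile


def write_with_trailing_toc(chunks, upload):
    """Stage a blob in a local temp file, patch its header, then call `upload`."""
    with tempfile.TemporaryFile() as tmp:
        # Reserve 8 bytes for the TOC position, which is unknown until the end.
        tmp.write(struct.pack("<Q", 0))
        offsets = []
        for chunk in chunks:
            offsets.append(tmp.tell())
            tmp.write(chunk)
        toc_start = tmp.tell()
        # Toy TOC: one 8-byte offset per chunk.
        tmp.write(struct.pack(f"<{len(offsets)}Q", *offsets))
        # Seeking back to patch the header is the step that needs random access.
        tmp.seek(0)
        tmp.write(struct.pack("<Q", toc_start))
        # Rewind and stream the finished blob sequentially to its destination.
        tmp.seek(0)
        upload(tmp)


remote = io.BytesIO()  # hypothetical stand-in for the remote object
write_with_trailing_toc(
    [b"hello", b"world!"],
    lambda f: shutil.copyfileobj(f, remote),
)
blob = remote.getvalue()
toc_start = struct.unpack_from("<Q", blob, 0)[0]
```

Only the final `upload` step touches the destination, and it reads the staged file strictly front to back, which is the access pattern a write-mode S3 object can handle.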

minimum local storage requirement

Due to the above limitation, the local storage must have sufficient space to write the whole blob.
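A quick pre-flight check of free local space might look like the sketch below. It assumes the temporary file lands in the system default temporary directory, which the text above does not state; the helper name `has_local_room` is hypothetical and not part of msglc.

```python
import shutil
import tempfile
from typing import Optional


def has_local_room(blob_size: int, directory: Optional[str] = None) -> bool:
    """Return True if `directory` has at least `blob_size` bytes free."""
    # Assumption: fall back to the system default temporary directory.
    target = directory or tempfile.gettempdir()
    return shutil.disk_usage(target).free >= blob_size
```

Calling `has_local_room(estimated_size)` before opening a writer gives an early failure instead of running out of disk halfway through serialization.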

Local Demo

Here is a local demo using rustfs with minimal configuration. The following command starts a rustfs container in the background.

Bash
docker run -d --rm -p 9000:9000 rustfs/rustfs:latest

Connect using the default credentials. Once connected, write and read as usual.

Python
import s3fs
from msglc.writer import LazyWriter
from msglc.reader import LazyReader

fs = s3fs.S3FileSystem(
    key="rustfsadmin",
    secret="rustfsadmin",
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)

bucket = "test-bucket"

fs.mkdir(bucket)

object_name = f'{bucket}/example'

with LazyWriter(object_name, s3fs=fs) as writer:
    writer.write({'hello': 'world'})

with LazyReader(object_name, s3fs=fs) as reader:
    print(reader.to_obj())