Utility Functions
The following are the main utility functions for creating an archive.
FileInfo
Wraps a file path or in-memory buffer, together with an optional name, in a FileInfo object.
The name is optional and is only used when the files are combined in dictionary (key-value) mode.
The s3fs can be different for each FileInfo object, so files from different sources can be combined.
The s3fs of a FileInfo object is not affected by the global s3fs object stored in config.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| BinaryIO` | a string representing the file path or an in-memory buffer | required |
| `name` | `str \| None` | key name of the content in the combined dict | `None` |
| `s3fs` | | s3fs object (`s3fs.S3FileSystem`) to read the object from | `None` |
Source code in src/msglc/__init__.py
dump
Writes the object to the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file` | `str \| BytesIO` | a string representing the file path | required |
| `obj` | | the object to be written to the file | required |
| `kwargs` | | additional keyword arguments to be passed to the | `{}` |
Returns:
| Type | Description |
|---|---|
| | None |
Source code in src/msglc/__init__.py
combine
Combines multiple serialized files into a single archive.
If s3fs is given, the combined archive is uploaded to S3.
The files to be combined must exist in the local filesystem regardless of whether s3fs is given.
In other words, only local files can be combined.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `archive` | `str \| BytesIO` | a string representing the file path of the archive | required |
| `files` | `FileInfo \| list[FileInfo]` | a list of FileInfo objects | required |
| `mode` | `Literal['a', 'w']` | a string representing the combination mode, 'w' for write and 'a' for append | `'w'` |
| `validate` | `bool` | switch on to validate the files before combining | `True` |
| `s3fs` | | s3fs object (`s3fs.S3FileSystem`) to be used for storing | `None` |
Returns:
| Type | Description |
|---|---|
| | None |
Source code in src/msglc/__init__.py
append
Appends multiple serialized files to an existing archive.
If s3fs is given, the target is downloaded first if it exists in the bucket.
The final archive is then uploaded to S3.
The files to be appended must exist in the local filesystem regardless of whether s3fs is given.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `archive` | `str \| BytesIO` | a string representing the file path of the archive | required |
| `files` | `FileInfo \| list[FileInfo]` | a list of FileInfo objects | required |
| `validate` | `bool` | switch on to validate the files before combining | `True` |
| `s3fs` | | s3fs object (`s3fs.S3FileSystem`) to be used for storing | `None` |
|
Returns:
| Type | Description |
|---|---|
| | None |
Source code in src/msglc/__init__.py
configure
Configures the settings. It accepts any number of keyword arguments and updates only the configuration parameters that are provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `small_obj_optimization_threshold` | `int \| None` | The threshold (in bytes) for small object optimization. Objects smaller than this threshold are not indexed. | `None` |
| `write_buffer_size` | `int \| None` | The size (in bytes) of the write buffer. | `None` |
| `read_buffer_size` | `int \| None` | The size (in bytes) of the read buffer. | `None` |
| `fast_loading` | `bool \| None` | Flag to enable or disable fast loading. If enabled, the container is read in one go instead of reading each child separately. | `None` |
| `fast_loading_threshold` | `int \| float \| None` | The threshold (0 to 1) for fast loading. With the fast loading flag turned on, fast loading is performed if the ratio of already read children to the total number of children is smaller than this threshold. | `None` |
| `trivial_size` | `int \| None` | The size (in bytes) considered trivial, around a dozen bytes. Objects smaller than this size are considered trivial. For a list of trivial objects, the container is indexed in a blocked fashion. | `None` |
| `disable_gc` | `bool \| None` | Flag to enable or disable garbage collection. | `None` |
| `simple_repr` | `bool \| None` | Flag to enable or disable the simple representation used in the repr method. If turned on, repr does not incur any disk I/O. | `None` |
| `copy_chunk_size` | `int \| None` | The size (in bytes) of the copy chunk. | `None` |
| `numpy_encoder` | `bool \| None` | Flag to enable or disable the | `None` |
| `numpy_fast_int_pack` | `bool \| None` | If enabled, integer numpy arrays are packed assuming each element has an identical size (4 or 8 bytes). This improves packing performance by avoiding the overhead of checking the size of each element. However, depending on the backend, for example, | `None` |
| `magic` | `bytes \| None` | Magic bytes (max length: 30) to set, used to identify the file format version. | `None` |
| `s3fs` | | The global | `None` |
Source code in src/msglc/config.py