Merged
3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -119,7 +119,8 @@ else()
 include(FetchContent)
 FetchContent_Declare(blosc2
 GIT_REPOSITORY https://github.com/Blosc/c-blosc2
-GIT_TAG 1386ef42f58b61c876edf714a2af84bd7b59dc5d # v2.23.1
+GIT_TAG 25197eb96d05318c939b3252a6b373ccd6ae49fe # variable-length chunks support in schunks
+# SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../c-blosc2
 )
 FetchContent_MakeAvailable(blosc2)
 include_directories("${blosc2_SOURCE_DIR}/include")
1 change: 1 addition & 0 deletions doc/getting_started/tutorials.rst
@@ -16,3 +16,4 @@ Tutorials
 tutorials/08.schunk-slicing_and_beyond
 tutorials/09.ucodecs-ufilters
 tutorials/10.prefilters
+tutorials/11.vlarray
325 changes: 325 additions & 0 deletions doc/getting_started/tutorials/11.vlarray.ipynb
@@ -0,0 +1,325 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with VLArray\n",
"\n",
"A `VLArray` is a list-like container for variable-length Python values backed by a single `SChunk`. Each entry is stored in its own compressed chunk, and values are serialized with msgpack before reaching storage.\n",
"\n",
"This makes `VLArray` a good fit for heterogeneous, variable-length payloads such as small dictionaries, strings, tuples, byte blobs, or nested list/dict structures."
],
"id": "ceb4789a488cc07f"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.563663Z",
"start_time": "2026-03-14T16:57:57.294290Z"
}
},
"source": [
"import blosc2\n",
"\n",
"\n",
"def show(label, value):\n",
" print(f\"{label}: {value}\")\n",
"\n",
"\n",
"urlpath = \"vlarray_tutorial.b2frame\"\n",
"copy_path = \"vlarray_tutorial_copy.b2frame\"\n",
"blosc2.remove_urlpath(urlpath)\n",
"blosc2.remove_urlpath(copy_path)"
],
"id": "f264f2e4bcb57029",
"outputs": [],
"execution_count": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating and populating a VLArray\n",
"\n",
"Entries can be appended one by one or in batches with `extend()`. The container accepts the msgpack-safe Python types supported by the implementation: `bytes`, `str`, `int`, `float`, `bool`, `None`, `list`, `tuple`, and `dict`."
],
"id": "24ceae332dfa437"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.609603Z",
"start_time": "2026-03-14T16:57:57.569987Z"
}
},
"source": [
"vla = blosc2.VLArray(urlpath=urlpath, mode=\"w\")\n",
"vla.append({\"name\": \"alpha\", \"count\": 1})\n",
"vla.extend([b\"bytes\", (\"a\", 2), [\"x\", \"y\"], 42, None])\n",
"vla.insert(1, \"between\")\n",
"\n",
"show(\"Initial entries\", list(vla))\n",
"show(\"Length\", len(vla))"
],
"id": "10e4e9ce600cda9d",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initial entries: [{'name': 'alpha', 'count': 1}, 'between', b'bytes', ('a', 2), ['x', 'y'], 42, None]\n",
"Length: 7\n"
]
}
],
"execution_count": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexing and slicing\n",
"\n",
"Indexing behaves like a Python list. Negative indexes are supported, and slice reads return a plain Python list."
],
"id": "2f2dbe81b7653d8f"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.677796Z",
"start_time": "2026-03-14T16:57:57.623048Z"
}
},
"source": [
"show(\"Last entry\", vla[-1])\n",
"show(\"Slice [1:6:2]\", vla[1:6:2])\n",
"show(\"Reverse slice\", vla[::-2])"
],
"id": "82ea38dca631efb9",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last entry: None\n",
"Slice [1:6:2]: ['between', ('a', 2), 42]\n",
"Reverse slice: [None, ['x', 'y'], b'bytes', {'name': 'alpha', 'count': 1}]\n"
]
}
],
"execution_count": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Updating, inserting, and deleting\n",
"\n",
"Single entries can be overwritten by index. Slice assignment follows Python list rules: slices with `step == 1` may resize the container, while extended slices require matching lengths."
],
"id": "a871bb9b21d6f36c"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.727569Z",
"start_time": "2026-03-14T16:57:57.678936Z"
}
},
"source": [
"vla[2:5] = [\"replaced\", {\"nested\": True}]\n",
"show(\"After slice replacement\", list(vla))\n",
"\n",
"vla[::2] = [\"even-0\", \"even-1\", \"even-2\"]\n",
"show(\"After extended-slice update\", list(vla))\n",
"\n",
"del vla[1::3]\n",
"show(\"After slice deletion\", list(vla))\n",
"\n",
"removed = vla.pop()\n",
"show(\"Popped entry\", removed)\n",
"show(\"After pop\", list(vla))"
],
"id": "e22e4f90499ae02",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"After slice replacement: [{'name': 'alpha', 'count': 1}, 'between', 'replaced', {'nested': True}, 42, None]\n",
"After extended-slice update: ['even-0', 'between', 'even-1', {'nested': True}, 'even-2', None]\n",
"After slice deletion: ['even-0', 'even-1', {'nested': True}, None]\n",
"Popped entry: None\n",
"After pop: ['even-0', 'even-1', {'nested': True}]\n"
]
}
],
"execution_count": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Copying with new storage or compression parameters\n",
"\n",
"`copy()` duplicates the container and accepts new storage options (such as `urlpath` and `contiguous`) or new compression parameters (`cparams`) for the duplicate."
],
"id": "f41af458cb5faa9f"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.747309Z",
"start_time": "2026-03-14T16:57:57.730015Z"
}
},
"source": [
"vla_copy = vla.copy(\n",
" urlpath=copy_path,\n",
" contiguous=False,\n",
" cparams={\"codec\": blosc2.Codec.LZ4, \"clevel\": 5},\n",
")\n",
"\n",
"show(\"Copied entries\", list(vla_copy))\n",
"show(\"Copy storage is contiguous\", vla_copy.schunk.contiguous)\n",
"show(\"Copy codec\", vla_copy.cparams.codec)"
],
"id": "6e752260e010272e",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Copied entries: ['even-0', 'even-1', {'nested': True}]\n",
"Copy storage is contiguous: False\n",
"Copy codec: Codec.LZ4\n"
]
}
],
"execution_count": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Round-tripping through cframes and reopening from disk\n",
"\n",
"A persistent store tagged as a `VLArray` is reopened as one automatically by `blosc2.open()`, and a serialized cframe buffer is restored the same way by `blosc2.from_cframe()`."
],
"id": "bb576497d4b6f537"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.759998Z",
"start_time": "2026-03-14T16:57:57.748296Z"
}
},
"source": [
"cframe = vla.to_cframe()\n",
"restored = blosc2.from_cframe(cframe)\n",
"show(\"from_cframe type\", type(restored).__name__)\n",
"show(\"from_cframe entries\", list(restored))\n",
"\n",
"reopened = blosc2.open(urlpath, mode=\"r\", mmap_mode=\"r\")\n",
"show(\"Reopened type\", type(reopened).__name__)\n",
"show(\"Reopened entries\", list(reopened))"
],
"id": "42d59dccf6ea9c44",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"from_cframe type: VLArray\n",
"from_cframe entries: ['even-0', 'even-1', {'nested': True}]\n",
"Reopened type: VLArray\n",
"Reopened entries: ['even-0', 'even-1', {'nested': True}]\n"
]
}
],
"execution_count": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clearing and reusing a container\n",
"\n",
"Calling `clear()` resets the backing storage so the container remains ready for new variable-length entries."
],
"id": "53778312cc1a03bc"
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.778160Z",
"start_time": "2026-03-14T16:57:57.761236Z"
}
},
"source": [
"scratch = vla.copy()\n",
"scratch.clear()\n",
"scratch.extend([\"fresh\", 123, {\"done\": True}])\n",
"show(\"After clear + extend on in-memory copy\", list(scratch))\n",
"\n",
"blosc2.remove_urlpath(urlpath)\n",
"blosc2.remove_urlpath(copy_path)"
],
"id": "55b9ea793a41f38a",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"After clear + extend on in-memory copy: ['fresh', 123, {'done': True}]\n"
]
}
],
"execution_count": 7
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2026-03-14T16:57:57.789994Z",
"start_time": "2026-03-14T16:57:57.779434Z"
}
},
"cell_type": "code",
"source": "",
"id": "34e77790ab2a0f94",
"outputs": [],
"execution_count": 7
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
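The slice rules described in the notebook's "Updating, inserting, and deleting" cell are those of a built-in `list`, so the recorded cell outputs can be cross-checked without blosc2. A minimal sketch, using a plain Python list as a stand-in for the `VLArray` (an assumption for illustration; it does not exercise the blosc2 API):

```python
# Plain-list stand-in for the notebook's VLArray; no blosc2 calls are made here.
items = [{"name": "alpha", "count": 1}, "between", b"bytes", ("a", 2), ["x", "y"], 42, None]

# step == 1: the replacement may differ in length, so the list shrinks from 7 to 6.
items[2:5] = ["replaced", {"nested": True}]
after_replace = list(items)

# Extended slice (step != 1): the replacement must match the slice length exactly.
items[::2] = ["even-0", "even-1", "even-2"]
after_extended = list(items)

# A length mismatch on an extended slice raises ValueError for a built-in list
# and leaves the list unchanged.
try:
    items[::2] = ["too", "short"]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# Slice deletion and pop() complete the sequence used in the notebook.
del items[1::3]
popped = items.pop()

print(after_replace)
print(after_extended)
print(type(mismatch_error).__name__)
print(items, popped)
```

The intermediate lists match the outputs stored in the notebook. Whether `VLArray` raises the same exception type on a length mismatch is not shown in the notebook; the sketch only confirms the list behavior it is modeled on.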
2 changes: 2 additions & 0 deletions doc/reference/classes.rst
@@ -16,6 +16,7 @@ Main Classes
 DictStore
 TreeStore
 EmbedStore
+VLArray
 Proxy
 ProxySource
 ProxyNDSource
@@ -33,6 +34,7 @@ Main Classes
 dict_store
 tree_store
 embed_store
+vlarray
 proxy
 proxysource
 proxyndsource
2 changes: 2 additions & 0 deletions doc/reference/misc.rst
@@ -134,6 +134,8 @@ This page documents the miscellaneous members of the ``blosc2`` module that do n
 TreeStore,
 DictStore,
 EmbedStore,
+VLArray,
+vlarray_from_cframe,
 abs,
 acos,
 acosh,