zarr-python-3.2.1/ 0000775 0000000 0000000 00000000000 15176357430 0014025 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/.git-blame-ignore-revs 0000664 0000000 0000000 00000000232 15176357430 0020122 0 ustar 00root root 0000000 0000000 # lint codebase with black and ruff
4e348d6b80c96da461fd866576c971b8a659ba15
# migrate from black to ruff format
22cea005629913208a85799372e045f353744add
zarr-python-3.2.1/.git_archival.txt 0000664 0000000 0000000 00000000201 15176357430 0017271 0 ustar 00root root 0000000 0000000 node: 85890b3bb404fd1d401267c508a2694f5734559e
node-date: 2026-05-05T08:14:16-04:00
describe-name: v3.2.1
ref-names: tag: v3.2.1
zarr-python-3.2.1/.gitattributes 0000664 0000000 0000000 00000000134 15176357430 0016716 0 ustar 00root root 0000000 0000000 *.py linguist-language=python
*.ipynb linguist-documentation
.git_archival.txt export-subst
zarr-python-3.2.1/.github/ 0000775 0000000 0000000 00000000000 15176357430 0015365 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/.github/CODEOWNERS 0000664 0000000 0000000 00000000070 15176357430 0016755 0 ustar 00root root 0000000 0000000 zarr/_storage/absstore.py @zarr-developers/azure-team
zarr-python-3.2.1/.github/CONTRIBUTING.md 0000664 0000000 0000000 00000000262 15176357430 0017616 0 ustar 00root root 0000000 0000000 Contributing
============
Please see the [project documentation](https://zarr.readthedocs.io/en/stable/developers/contributing.html) for information about contributing to Zarr.
zarr-python-3.2.1/.github/ISSUE_TEMPLATE/ 0000775 0000000 0000000 00000000000 15176357430 0017550 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/.github/ISSUE_TEMPLATE/bug_report.yml 0000664 0000000 0000000 00000005133 15176357430 0022445 0 ustar 00root root 0000000 0000000 name: Bug Report
description: Report incorrect behaviour in the library.
labels: ["bug"]
body:
- type: markdown
attributes:
value: |
Please provide the following information.
- type: input
id: Zarr-version
attributes:
label: Zarr version
description: Value of ``zarr.__version__``
placeholder: v2.10.2, v2.11.3, v2.12.0, etc.
validations:
required: true
- type: input
id: Numcodecs-version
attributes:
label: Numcodecs version
description: Value of ``numcodecs.__version__``
placeholder: v0.8.1, v0.9.0, v0.10.0, etc.
validations:
required: true
- type: input
id: Python-version
attributes:
label: Python Version
description: Version of Python interpreter
placeholder: 3.10, 3.11, 3.12 etc.
validations:
required: true
- type: input
id: OS
attributes:
label: Operating System
description: Operating System
placeholder: (Linux/Windows/Mac)
validations:
required: true
- type: input
id: installation
attributes:
label: Installation
description: How was Zarr installed?
placeholder: e.g., "using pip into virtual environment", or "using conda"
validations:
required: true
- type: textarea
id: description
attributes:
label: Description
description: Explain why the current behaviour is a problem, what the expected output/behaviour is, and why the expected output/behaviour is a better solution.
validations:
required: true
- type: textarea
id: reproduce
attributes:
label: Steps to reproduce
description: Minimal, reproducible code sample. Must list dependencies in [inline script metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/#example). When put in a file named `issue.py` calling `uv run issue.py` should show the issue.
value: |
```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import zarr
# your reproducer code
# zarr.print_debug_info()
```
validations:
required: true
- type: textarea
id: additional-output
attributes:
label: Additional output
description: If you think it might be relevant, please provide the output from ``pip freeze`` or ``conda env export`` depending on which was used to install Zarr.
zarr-python-3.2.1/.github/ISSUE_TEMPLATE/config.yml 0000664 0000000 0000000 00000001222 15176357430 0021535 0 ustar 00root root 0000000 0000000 blank_issues_enabled: true
contact_links:
- name: Propose a new Zarr specification feature
url: https://github.com/zarr-developers/zarr-specs
about: A new feature for the Zarr storage specification should be opened on the zarr-specs repository.
- name: Discuss something on ZulipChat
url: https://ossci.zulipchat.com/
about: For questions like "How do I do X with Zarr?", consider posting your question to our developer chat.
- name: Discuss something on GitHub Discussions
url: https://github.com/zarr-developers/zarr-python/discussions
about: For questions like "How do I do X with Zarr?", you can also post on GitHub Discussions.
zarr-python-3.2.1/.github/ISSUE_TEMPLATE/documentation.yml 0000664 0000000 0000000 00000001202 15176357430 0023137 0 ustar 00root root 0000000 0000000 name: Documentation Improvement
description: Report missing or wrong documentation. Alternatively, you can just open a pull request with the suggested change.
title: "DOC: "
labels: [documentation, help wanted]
body:
- type: textarea
attributes:
label: Describe the issue linked to the documentation
description: >
Please provide a description of what documentation you believe needs to be fixed/improved.
validations:
required: true
- type: textarea
attributes:
label: Suggested fix for documentation
description: >
Please explain the suggested fix and why it's better than the existing documentation.
zarr-python-3.2.1/.github/ISSUE_TEMPLATE/feature_request.yml 0000664 0000000 0000000 00000000511 15176357430 0023473 0 ustar 00root root 0000000 0000000 name: Feature Request
description: Request a new feature for zarr-python
# labels: []
body:
- type: textarea
attributes:
label: Describe the new feature you'd like
description: >
Please provide a description of what new feature or functionality you'd like to see in zarr-python.
validations:
required: true
zarr-python-3.2.1/.github/ISSUE_TEMPLATE/release-checklist.md 0000664 0000000 0000000 00000005610 15176357430 0023463 0 ustar 00root root 0000000 0000000 ---
name: Zarr-Python release checklist
about: Checklist for a new Zarr-Python release. [For project maintainers only!]
title: Release Zarr-Python vX.Y.Z
labels: release-checklist
assignees: ''
---
**Release**: [v3.x.x](https://github.com/zarr-developers/zarr-python/milestones/?)
**Scheduled Date**: 20YY/MM/DD
**Priority PRs/issues to complete prior to release**
- [ ] Priority pull request #X
**Before release**:
- [ ] Check [SPEC 0](https://scientific-python.org/specs/spec-0000/#support-window) to see if the minimum supported version of Python or NumPy needs bumping.
- [ ] Verify that the latest CI workflows on `main` are passing: [Tests](https://github.com/zarr-developers/zarr-python/actions/workflows/test.yml), [GPU Tests](https://github.com/zarr-developers/zarr-python/actions/workflows/gpu_test.yml), [Hypothesis](https://github.com/zarr-developers/zarr-python/actions/workflows/hypothesis.yaml), [Docs](https://github.com/zarr-developers/zarr-python/actions/workflows/docs.yml), [Lint](https://github.com/zarr-developers/zarr-python/actions/workflows/lint.yml), [Wheels](https://github.com/zarr-developers/zarr-python/actions/workflows/releases.yml).
- [ ] Run the ["Prepare release" workflow](https://github.com/zarr-developers/zarr-python/actions/workflows/prepare_release.yml) with the target version. This will build the changelog and open a release PR with the `run-downstream` label.
- [ ] Verify that the [downstream tests](https://github.com/zarr-developers/zarr-python/actions/workflows/downstream.yml) (triggered automatically by the `run-downstream` label) pass on the release PR.
- [ ] Review the release PR and verify the changelog in `docs/release-notes.md` looks correct.
- [ ] Merge the release PR.
**Release**:
- [ ] [Draft a new GitHub Release](https://github.com/zarr-developers/zarr-python/releases/new) with tag `vX.Y.Z` targeting `main`. Use "Generate release notes" for the description.
- [ ] Verify the release is published on [PyPI](https://pypi.org/project/zarr/) and [ReadTheDocs](https://zarr.readthedocs.io/en/stable/).
**After release**:
- [ ] Review and merge the automatically generated pull request on the conda-forge [zarr-feedstock](https://github.com/conda-forge/zarr-feedstock).
---
- [ ] Party :tada:
---
**Releasing from a branch other than main**
In rare cases (e.g. patch releases for an older minor version), you may need to release from a dedicated release branch (e.g. `3.1.x`):
- Create the release branch from the appropriate tag if it doesn't already exist.
- Cherry-pick or backport the necessary commits onto the branch.
- Run `towncrier build --version x.y.z` and commit the result to the release branch instead of `main`.
- When drafting the GitHub Release, set the target to the release branch instead of `main`.
- After the release, ensure any relevant changelog updates are also reflected on `main`.
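
The branch and tag names used in the steps above follow a simple pattern, which the sketch below illustrates. It is a hypothetical helper, not part of zarr-python or its release tooling: the name `release_names` and its return layout are assumptions made for illustration. It assumes a plain `X.Y.Z` version, a release branch named `X.Y.x` (as in the `3.1.x` example), and a tag named `vX.Y.Z`.

```python
import re

# Plain X.Y.Z versions only; pre-release suffixes are out of scope for this sketch.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def release_names(version: str) -> dict[str, str]:
    """Return the tag, release branch, and changelog command for a version.

    Hypothetical helper for illustration; not part of zarr-python.
    """
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"expected a version like '3.1.4', got {version!r}")
    major, minor, _patch = match.groups()
    return {
        "tag": f"v{version}",
        "branch": f"{major}.{minor}.x",
        "changelog_command": f"towncrier build --version {version}",
    }


if __name__ == "__main__":
    # Print the names a maintainer would use for a patch release of 3.1.
    for key, value in release_names("3.1.4").items():
        print(f"{key}: {value}")
```

The helper only derives names; creating the branch, cherry-picking commits, and running towncrier remain manual steps as listed above.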
zarr-python-3.2.1/.github/PULL_REQUEST_TEMPLATE.md 0000664 0000000 0000000 00000000557 15176357430 0021175 0 ustar 00root root 0000000 0000000 [Description of PR]
TODO:
* [ ] Add unit tests and/or doctests in docstrings
* [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
* [ ] New/modified features documented in `docs/user-guide/*.md`
* [ ] Changes documented as a new file in `changes/`
* [ ] GitHub Actions have all passed
* [ ] Test coverage is 100% (Codecov passes)
zarr-python-3.2.1/.github/dependabot.yml 0000664 0000000 0000000 00000000723 15176357430 0020217 0 ustar 00root root 0000000 0000000 ---
version: 2
updates:
# Updates for main
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
groups:
actions:
patterns:
- "*"
cooldown:
default-days: 7
- package-ecosystem: "github-actions"
directory: "/"
target-branch: "support/v2"
schedule:
interval: "weekly"
groups:
actions:
patterns:
- "*"
cooldown:
default-days: 7
zarr-python-3.2.1/.github/labeler.yml 0000664 0000000 0000000 00000000143 15176357430 0017514 0 ustar 00root root 0000000 0000000 needs release notes:
- all:
- changed-files:
- all-globs-to-all-files: '!changes/*.md'
zarr-python-3.2.1/.github/workflows/ 0000775 0000000 0000000 00000000000 15176357430 0017422 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/.github/workflows/check_changelogs.yml 0000664 0000000 0000000 00000001202 15176357430 0023407 0 ustar 00root root 0000000 0000000 name: Check changelog entries
on:
pull_request:
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
check-changelogs:
name: Check changelog entries
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
persist-credentials: false
- name: Install uv
uses: astral-sh/setup-uv@61cb8a9741eeb8a550a1b8544337180c0fc8476b # v7.2.0
- name: Check changelog entries
run: uv run --no-sync python ci/check_changelog_entries.py
zarr-python-3.2.1/.github/workflows/codspeed.yml 0000664 0000000 0000000 00000002064 15176357430 0021735 0 ustar 00root root 0000000 0000000 name: CodSpeed Benchmarks
on:
schedule:
- cron: '0 9 * * 1' # Every Monday at 9am UTC
pull_request:
types: [labeled]
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
benchmarks:
name: Run benchmarks
runs-on: codspeed-macro
if: |
github.event_name == 'schedule' ||
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'benchmark'))
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Install Hatch
uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
with:
version: '1.16.5'
- name: Run the benchmarks
uses: CodSpeedHQ/action@1c8ae4843586d3ba879736b7f6b7b0c990757fab # v4.12.1
with:
mode: walltime
run: hatch run test.py3.12-minimal:pytest tests/benchmarks --codspeed
zarr-python-3.2.1/.github/workflows/docs.yml 0000664 0000000 0000000 00000001371 15176357430 0021077 0 ustar 00root root 0000000 0000000 name: Docs
on:
push:
branches: [main]
pull_request:
branches: [main]
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
docs:
name: Check docs
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- uses: astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb # v6.1.0
- run: uv sync --group docs
- run: uv run mkdocs build
env:
DISABLE_MKDOCS_2_WARNING: "true"
NO_MKDOCS_2_WARNING: "true"
- run: uv run python ci/check_unlinked_types.py
continue-on-error: true
zarr-python-3.2.1/.github/workflows/downstream.yml 0000664 0000000 0000000 00000006552 15176357430 0022340 0 ustar 00root root 0000000 0000000 name: Downstream
on:
workflow_dispatch:
pull_request:
types: [labeled]
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
xarray:
name: Xarray zarr backend tests
if: github.event_name == 'workflow_dispatch' || github.event.label.name == 'run-downstream'
runs-on: ubuntu-latest
steps:
- name: Check out zarr-python
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Check out xarray
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
repository: pydata/xarray
path: xarray
persist-credentials: false
- name: Set up pixi
uses: prefix-dev/setup-pixi@19eac09b398e3d0c747adc7921926a6d802df4da # v0.8.8
with:
manifest-path: xarray/pixi.toml
- name: Install zarr-python from branch
working-directory: xarray
run: pixi run -e test-py313 -- pip install --no-deps ..
- name: Show versions
working-directory: xarray
run: |
pixi run -e test-py313 -- python -c "
import zarr; print(f'zarr {zarr.__version__}')
import xarray; print(f'xarray {xarray.__version__}')
"
- name: Run xarray zarr backend tests
working-directory: xarray
run: |
pixi run -e test-py313 -- python -m pytest -x --no-header -q \
xarray/tests/test_backends.py -k zarr \
xarray/tests/test_backends_api.py -k zarr \
xarray/tests/test_backends_datatree.py -k zarr
numcodecs:
name: numcodecs zarr3 codec tests
if: github.event_name == 'workflow_dispatch' || github.event.label.name == 'run-downstream'
runs-on: ubuntu-latest
steps:
- name: Check out zarr-python
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Check out numcodecs
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
repository: zarr-developers/numcodecs
fetch-depth: 0
path: numcodecs
submodules: recursive
persist-credentials: false
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.13'
- name: Install uv
uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7
- name: Install numcodecs with test-zarr-main group
working-directory: numcodecs
run: |
uv venv
uv pip install --group dev
uv sync --group dev --group test-zarr-main
uv pip install --no-build-isolation -e .
- name: Override zarr-python with branch version
working-directory: numcodecs
run: uv pip install --no-deps ..
- name: Show versions
working-directory: numcodecs
run: |
uv run python -c "
import zarr; print(f'zarr {zarr.__version__}')
import numcodecs; print(f'numcodecs {numcodecs.__version__}')
"
- name: Run numcodecs zarr3 tests
working-directory: numcodecs
run: uv run python -m pytest -x --no-header -q tests/test_zarr3.py
zarr-python-3.2.1/.github/workflows/gpu_test.yml 0000664 0000000 0000000 00000004630 15176357430 0022002 0 ustar 00root root 0000000 0000000 # This workflow installs Python dependencies and runs the GPU test suite with coverage on a GPU runner
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: GPU Test
on:
push:
branches: [ main, 3.1.x ]
pull_request:
branches: [ main, 3.1.x ]
workflow_dispatch:
env:
LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
test:
name: py=${{ matrix.python-version }}
environment:
name: codecov-upload
deployment: false
runs-on: gpu-runner
strategy:
matrix:
python-version: ['3.12']
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0 # grab all branches and tags
persist-credentials: false
# - name: cuda-toolkit
# uses: Jimver/cuda-toolkit@v0.2.16
# id: cuda-toolkit
# with:
# cuda: '12.4.1'
- name: Set up CUDA
run: |
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6
echo "/usr/local/cuda/bin" >> $GITHUB_PATH
- name: GPU check
run: |
nvidia-smi
echo $PATH
echo $LD_LIBRARY_PATH
nvcc -V
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
- name: Install Hatch
uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
with:
version: '1.16.5'
- name: Set Up Hatch Env
env:
HATCH_ENV: gputest.py${{ matrix.python-version }}
run: |
hatch env create "$HATCH_ENV"
hatch env run -e "$HATCH_ENV" list-env
- name: Run Tests
env:
HATCH_ENV: gputest.py${{ matrix.python-version }}
run: |
hatch env run --env "$HATCH_ENV" run-coverage
- name: Upload coverage
uses: codecov/codecov-action@13ce06bfc6bbe3ecf90edbbf1bc32fe5978ca1d3 # v5.3.1
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: gpu
verbose: true # optional (default = false)
zarr-python-3.2.1/.github/workflows/hypothesis.yaml 0000664 0000000 0000000 00000007042 15176357430 0022510 0 ustar 00root root 0000000 0000000 name: Slow Hypothesis CI
on:
push:
branches: [main, 3.1.x]
pull_request:
branches: [main, 3.1.x]
types: [opened, reopened, synchronize, labeled]
schedule:
- cron: "0 0 * * *" # Daily at 00:00 UTC
workflow_dispatch: # allows you to trigger manually
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
FORCE_COLOR: 3
jobs:
hypothesis:
name: Slow Hypothesis Tests
environment:
name: codecov-upload
deployment: false
runs-on: "ubuntu-latest"
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
python-version: ['3.12']
dependency-set: ["optional"]
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Set HYPOTHESIS_PROFILE based on trigger
env:
EVENT_NAME: ${{ github.event_name }}
run: |
if [[ "$EVENT_NAME" == "schedule" || "$EVENT_NAME" == "workflow_dispatch" ]]; then
echo "HYPOTHESIS_PROFILE=nightly" >> $GITHUB_ENV
else
echo "HYPOTHESIS_PROFILE=ci" >> $GITHUB_ENV
fi
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
- name: Install Hatch
uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
with:
version: '1.16.5'
- name: Set Up Hatch Env
env:
HATCH_ENV: test.py${{ matrix.python-version }}-${{ matrix.dependency-set }}
run: |
hatch env create "$HATCH_ENV"
hatch env run -e "$HATCH_ENV" list-env
# https://github.com/actions/cache/blob/main/tips-and-workarounds.md#update-a-cache
- name: Restore cached hypothesis directory
id: restore-hypothesis-cache
uses: actions/cache/restore@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5.0.4
with:
path: .hypothesis/
key: cache-hypothesis-${{ runner.os }}-${{ github.run_id }}
restore-keys: |
cache-hypothesis-
- name: Run slow Hypothesis tests
if: success()
id: status
env:
HATCH_ENV: test.py${{ matrix.python-version }}-${{ matrix.dependency-set }}
run: |
echo "Using Hypothesis profile: $HYPOTHESIS_PROFILE"
hatch env run --env "$HATCH_ENV" run-hypothesis
# explicitly save the cache so it gets updated, also do this even if it fails.
- name: Save cached hypothesis directory
id: save-hypothesis-cache
if: always() && steps.status.outcome != 'skipped'
uses: actions/cache/save@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5.0.4
with:
path: .hypothesis/
key: cache-hypothesis-${{ runner.os }}-${{ github.run_id }}
- name: Upload coverage
uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: tests
verbose: true # optional (default = false)
- name: Generate and publish the report
if: |
failure()
&& steps.status.outcome == 'failure'
&& github.event_name == 'schedule'
&& github.repository_owner == 'zarr-developers'
uses: scientific-python/issue-from-pytest-log-action@8e905db353437cda1d6a773de245343fbfc940dd # v1.5.0
with:
log-path: output-${{ matrix.python-version }}-log.jsonl
issue-title: "Nightly Hypothesis tests failed"
issue-label: "topic-hypothesis"
zarr-python-3.2.1/.github/workflows/issue-metrics.yml 0000664 0000000 0000000 00000002651 15176357430 0022745 0 ustar 00root root 0000000 0000000 name: Monthly issue metrics
on:
workflow_dispatch:
schedule:
- cron: '3 2 1 * *'
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: issue metrics
runs-on: ubuntu-latest
permissions:
issues: write # Required to create the metrics report issue
pull-requests: read # Required to read PR metrics
steps:
- name: Get dates for last month
shell: bash
run: |
# Calculate the first day of the previous month
first_day=$(date -d "last month" +%Y-%m-01)
# Calculate the last day of the previous month
last_day=$(date -d "$first_day +1 month -1 day" +%Y-%m-%d)
# Set an environment variable with the date range
echo "$first_day..$last_day"
echo "last_month=$first_day..$last_day" >> "$GITHUB_ENV"
- name: Run issue-metrics tool
uses: github/issue-metrics@67526e7bd8100b870f10b1c120780a8375777b43 # v3.25.5
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SEARCH_QUERY: 'repo:zarr-developers/zarr-python is:issue created:${{ env.last_month }} -reason:"not planned"'
- name: Create issue
uses: peter-evans/create-issue-from-file@fca9117c27cdc29c6c4db3b86c48e4115a786710 # v6.0.0
with:
title: Monthly issue metrics report
token: ${{ secrets.GITHUB_TOKEN }}
content-filepath: ./issue_metrics.md
zarr-python-3.2.1/.github/workflows/lint.yml 0000664 0000000 0000000 00000001010 15176357430 0021103 0 ustar 00root root 0000000 0000000 name: Lint
on:
push:
branches: [main, 3.1.x]
pull_request:
branches: [main, 3.1.x]
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
name: Lint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- uses: j178/prek-action@0bb87d7f00b0c99306c8bcb8b8beba1eb581c037 # v1.1.1
zarr-python-3.2.1/.github/workflows/needs_release_notes.yml 0000664 0000000 0000000 00000001637 15176357430 0024162 0 ustar 00root root 0000000 0000000 name: "Pull Request Labeler"
on:
# pull_request_target is needed to label PRs from forks.
# This workflow only runs actions/labeler (no code checkout), so it's safe.
pull_request_target: # zizmor: ignore[dangerous-triggers]
types: [opened, reopened, synchronize]
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true
jobs:
labeler:
name: Label pull request
if: ${{ github.event.pull_request.user.login != 'dependabot[bot]' && github.event.pull_request.user.login != 'pre-commit-ci[bot]' }}
permissions:
contents: read # Required to read label configuration
pull-requests: write # Required to add labels to PRs
runs-on: ubuntu-latest
steps:
- uses: actions/labeler@634933edcd8ababfe52f92936142cc22ac488b1b # v6.0.1
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
sync-labels: true
zarr-python-3.2.1/.github/workflows/nightly_wheels.yml 0000664 0000000 0000000 00000002315 15176357430 0023173 0 ustar 00root root 0000000 0000000 name: Nightly Wheels
on:
schedule:
# Run nightly at 2 AM UTC
- cron: '0 2 * * *'
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build_and_upload_nightly:
name: Build and upload nightly wheels
environment:
name: nightly-wheel-upload
deployment: false
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
fetch-depth: 0
persist-credentials: false
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
name: Install Python
with:
python-version: '3.14'
- name: Install Hatch
uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
with:
version: '1.16.5'
- name: Build wheel and sdist
run: hatch build
- name: Upload nightly wheels
uses: scientific-python/upload-nightly-action@5748273c71e2d8d3a61f3a11a16421c8954f9ecf
with:
artifacts_path: dist
anaconda_nightly_upload_token: ${{ secrets.ANACONDA_ORG_UPLOAD_TOKEN }}
zarr-python-3.2.1/.github/workflows/prepare_release.yml 0000664 0000000 0000000 00000005224 15176357430 0023306 0 ustar 00root root 0000000 0000000 name: Prepare release notes
on:
workflow_dispatch:
inputs:
version:
description: 'Release version (e.g. 3.2.0)'
required: true
type: string
target_branch:
description: 'Branch to target'
required: false
default: 'main'
type: string
permissions:
contents: write
pull-requests: write
jobs:
prepare:
name: Build changelog and open PR
runs-on: ubuntu-latest
steps:
- name: Validate inputs
run: |
if [[ ! "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+([-\.][a-zA-Z0-9]+)*$ ]]; then
echo "::error::Invalid version format: '$VERSION'"
exit 1
fi
if [[ ! "$TARGET_BRANCH" =~ ^[a-zA-Z0-9._/-]+$ ]]; then
echo "::error::Invalid branch name: '$TARGET_BRANCH'"
exit 1
fi
env:
VERSION: ${{ inputs.version }}
TARGET_BRANCH: ${{ inputs.target_branch }}
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.target_branch }}
fetch-depth: 0
persist-credentials: false
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.12'
- name: Install towncrier
run: pip install towncrier
- name: Build changelog
run: towncrier build --version "$VERSION" --yes
env:
VERSION: ${{ inputs.version }}
- name: Create pull request
uses: peter-evans/create-pull-request@271a8d0340265f705b14b6d32b9829c1cb33d45e # v7.0.8
with:
branch: release/v${{ inputs.version }}
base: ${{ inputs.target_branch }}
title: "Release v${{ inputs.version }}"
body: |
Automated release preparation for v${{ inputs.version }}.
This PR was generated by the "Prepare release" workflow. It includes:
- Rendered changelog via `towncrier build --version ${{ inputs.version }}`
- Removal of consumed changelog fragments from `changes/`
## Checklist
- [ ] Review the rendered changelog in `docs/release-notes.md`
- [ ] Downstream tests pass (see [downstream workflow](https://github.com/zarr-developers/zarr-python/actions/workflows/downstream.yml))
- [ ] Merge this PR, then [draft a GitHub Release](https://github.com/zarr-developers/zarr-python/releases/new) targeting `${{ inputs.target_branch }}` with tag `v${{ inputs.version }}`
commit-message: "chore: build changelog for v${{ inputs.version }}"
labels: run-downstream
delete-branch: true
zarr-python-3.2.1/.github/workflows/releases.yml 0000664 0000000 0000000 00000004456 15176357430 0021761 0 ustar 00root root 0000000 0000000 name: Wheels
on:
release:
types:
- published
push:
branches: [main]
pull_request:
branches: [main]
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build_artifacts:
name: Build wheel on ubuntu-latest
runs-on: ubuntu-latest
strategy:
fail-fast: false
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
fetch-depth: 0
persist-credentials: false
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
name: Install Python
with:
python-version: '3.12'
- name: Install Hatch
uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
with:
version: '1.16.5'
- name: Build wheel and sdist
run: hatch build
- uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
with:
name: releases
path: dist
test_dist_pypi:
name: Test distribution artifacts
needs: [build_artifacts]
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
with:
name: releases
path: dist
- name: test
run: |
ls
ls dist
upload_pypi:
name: Upload to PyPI
needs: [build_artifacts, test_dist_pypi]
runs-on: ubuntu-latest
if: github.event_name == 'release'
environment:
name: releases
url: https://pypi.org/p/zarr
permissions:
id-token: write # Required for OIDC trusted publishing to PyPI
attestations: write # Required for artifact attestation
artifact-metadata: write # Required for artifact attestation metadata
steps:
- uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
with:
name: releases
path: dist
- name: Generate artifact attestation
uses: actions/attest@59d89421af93a897026c735860bf21b6eb4f7b26 # v4.1.0
with:
subject-path: dist/*
- name: Publish package to PyPI
uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0
zarr-python-3.2.1/.github/workflows/test.yml 0000664 0000000 0000000 00000013456 15176357430 0021135 0 ustar 00root root 0000000 0000000 # This workflow installs Python dependencies and runs the test suite across a matrix of Python versions, dependency sets, and operating systems
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: Test
on:
  push:
    branches: [ main, 3.1.x ]
  pull_request:
    branches: [ main, 3.1.x ]
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  test:
    name: os=${{ matrix.os }}, py=${{ matrix.python-version }}, deps=${{ matrix.dependency-set }}
    environment:
      name: codecov-upload
      deployment: false
    defaults:
      run:
        shell: bash
    strategy:
      matrix:
        python-version: ['3.12', '3.13', '3.14']
        dependency-set: ["minimal", "optional"]
        os: ["ubuntu-latest"]
        include:
          - python-version: '3.12'
            dependency-set: 'optional'
            os: 'macos-latest'
          - python-version: '3.14'
            dependency-set: 'optional'
            os: 'macos-latest'
          - python-version: '3.12'
            dependency-set: 'optional'
            os: 'windows-latest'
          - python-version: '3.14'
            dependency-set: 'optional'
            os: 'windows-latest'
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 0 # grab all branches and tags
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install Hatch
        uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
        with:
          version: '1.16.5'
      - name: Set Up Hatch Env
        env:
          HATCH_ENV: test.py${{ matrix.python-version }}-${{ matrix.dependency-set }}
        run: |
          hatch env create "$HATCH_ENV"
          hatch env run -e "$HATCH_ENV" list-env
      - name: Run Tests
        env:
          HYPOTHESIS_PROFILE: ci
          HATCH_ENV: test.py${{ matrix.python-version }}-${{ matrix.dependency-set }}
        run: |
          hatch env run --env "$HATCH_ENV" run-coverage
      - name: Upload coverage
        if: ${{ matrix.dependency-set == 'optional' && matrix.os == 'ubuntu-latest' }}
        uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          flags: tests
          verbose: true # optional (default = false)
  test-upstream-and-min-deps:
    name: py=${{ matrix.python-version }}-${{ matrix.dependency-set }}
    environment:
      name: codecov-upload
      deployment: false
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.12', "3.14"]
        dependency-set: ["upstream", "min_deps"]
        exclude:
          - python-version: "3.14"
            dependency-set: min_deps
          - python-version: "3.12"
            dependency-set: upstream
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 0
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install Hatch
        uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
        with:
          version: '1.16.5'
      - name: Set Up Hatch Env
        env:
          HATCH_ENV: ${{ matrix.dependency-set }}
        run: |
          hatch env create "$HATCH_ENV"
          hatch env run -e "$HATCH_ENV" list-env
      - name: Run Tests
        env:
          HATCH_ENV: ${{ matrix.dependency-set }}
        run: |
          hatch env run --env "$HATCH_ENV" run-coverage
      - name: Upload coverage
        uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          flags: tests
          verbose: true # optional (default = false)
  doctests:
    name: doctests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 0 # required for hatch version discovery, which is needed for numcodecs.zarr3
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
        with:
          python-version: '3.13'
          cache: 'pip'
      - name: Install Hatch
        uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
        with:
          version: '1.16.5'
      - name: Set Up Hatch Env
        run: |
          hatch run doctest:pip list
      - name: Run Tests
        run: |
          hatch run doctest:test
  benchmarks:
    name: Benchmark smoke test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 0
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
        with:
          python-version: '3.13'
          cache: 'pip'
      - name: Install Hatch
        uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
        with:
          version: '1.16.5'
      - name: Run Benchmarks
        run: |
          hatch env run --env "test.py3.13-minimal" run-benchmark
  test-complete:
    name: Test complete
    needs:
      [
        test,
        test-upstream-and-min-deps,
        doctests,
        benchmarks
      ]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Check failure
        if: |
          contains(needs.*.result, 'failure') ||
          contains(needs.*.result, 'cancelled')
        run: exit 1
      - name: Success
        run: echo Success!
zarr-python-3.2.1/.github/workflows/zarr-metadata.yml 0000664 0000000 0000000 00000005333 15176357430 0022705 0 ustar 00root root 0000000 0000000 name: zarr-metadata
on:
  push:
    branches: [main]
    paths:
      - 'packages/zarr-metadata/**'
      - '.github/workflows/zarr-metadata.yml'
  pull_request:
    paths:
      - 'packages/zarr-metadata/**'
      - '.github/workflows/zarr-metadata.yml'
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  test:
    name: pytest py=${{ matrix.python-version }}
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: packages/zarr-metadata
    strategy:
      fail-fast: false
      matrix:
        python-version: ['3.11', '3.12', '3.13', '3.14']
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
        with:
          enable-cache: true
      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}
      - name: Sync test dependency group
        run: uv sync --group test --python ${{ matrix.python-version }}
      - name: Run pytest
        run: uv run --group test pytest tests
  ruff:
    name: ruff
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: packages/zarr-metadata
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
      - name: Run ruff
        run: uvx ruff check .
  pyright:
    name: pyright
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: packages/zarr-metadata
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
        with:
          enable-cache: true
      - name: Set up Python
        run: uv python install 3.11
      - name: Sync test dependency group
        run: uv sync --group test --python 3.11
      - name: Run pyright
        run: uv run --group test --with pyright pyright src
  zarr-metadata-complete:
    name: zarr-metadata complete
    needs: [test, ruff, pyright]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Check failure
        if: |
          contains(needs.*.result, 'failure') ||
          contains(needs.*.result, 'cancelled')
        run: exit 1
      - name: Success
        run: echo Success!
zarr-python-3.2.1/.github/workflows/zizmor.yml 0000664 0000000 0000000 00000001502 15176357430 0021475 0 ustar 00root root 0000000 0000000 name: GitHub Actions Security Analysis
on:
  push:
    branches: [main]
    paths:
      - '.github/workflows/**'
      - '.github/actions/**'
  pull_request:
    branches: ["**"]
    paths:
      - '.github/workflows/**'
      - '.github/actions/**'
  workflow_dispatch:
permissions: {}
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  zizmor:
    name: Run zizmor
    runs-on: ubuntu-latest
    permissions:
      security-events: write # Required by zizmor-action to upload SARIF files
    steps:
      - name: Checkout repository
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Run zizmor
        uses: zizmorcore/zizmor-action@71321a20a9ded102f6e9ce5718a2fcec2c4f70d8 # v0.5.2
zarr-python-3.2.1/.gitignore 0000664 0000000 0000000 00000002110 15176357430 0016007 0 ustar 00root root 0000000 0000000 # Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# Distribution / packaging
.Python
env/
.venv/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.coverage
.coverage.*
.cache
coverage.xml
*,cover
# Translations
*.mo
*.pot
# Django stuff:
*.log
# Documentation
site/
docs/_build/
docs/data
data
data.zip
# PyBuilder
target/
# PyCharm
.idea
# Jupyter
.ipynb_checkpoints/
# VCS versioning
src/zarr/_version.py
# emacs
*~
# VSCode
.vscode/
# test data
#*.zarr
#*.zip
#example*
#doesnotexist
#test_sync*
data/*
src/fixture/
fixture/
junit.xml
.DS_Store
tests/.hypothesis
.hypothesis/
zarr/version.py
zarr.egg-info/
# zarr-metadata package lockfile (a library, not an app)
packages/zarr-metadata/uv.lock
zarr-python-3.2.1/.pre-commit-config.yaml 0000664 0000000 0000000 00000004111 15176357430 0020303 0 ustar 00root root 0000000 0000000 ci:
  autoupdate_commit_msg: "chore: update pre-commit hooks"
  autoupdate_schedule: "monthly"
  autofix_prs: false
  skip: [] # pre-commit.ci only checks for updates, prek runs hooks locally
default_stages: [pre-commit, pre-push]
default_language_version:
  python: python3.12
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.4
    hooks:
      - id: ruff-check
        args: ["--fix", "--show-fixes"]
      - id: ruff-format
  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        args: ["-L", "fo,ihs,kake,te", "-S", "fixture"]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-yaml
        exclude: mkdocs.yml
      - id: trailing-whitespace
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.19.1
    hooks:
      - id: mypy
        files: ^(src|tests)/
        additional_dependencies:
          # Package dependencies
          - packaging
          - donfig
          - numcodecs
          - google-crc32c>=1.5
          - numpy==2.1 # https://github.com/zarr-developers/zarr-python/issues/3780 + https://github.com/zarr-developers/zarr-python/issues/3688
          - typing_extensions
          - universal-pathlib
          - obstore>=0.5.1
          # Tests
          - pytest
          - hypothesis
          - s3fs
  - repo: https://github.com/scientific-python/cookie
    rev: 2026.03.02
    hooks:
      - id: sp-repo-review
  - repo: https://github.com/numpy/numpydoc
    rev: v1.10.0
    hooks:
      - id: numpydoc-validation
  - repo: local
    hooks:
      - id: ban-lstrip-rstrip
        name: ban lstrip/rstrip
        language: pygrep
        # Matches .lstrip() or .rstrip() where the string argument is 2+ characters.
        entry: "\\.(lstrip|rstrip)\\([\"'][^\"']{2,}[\"']\\)"
        types: [python]
        files: ^(src|tests)/
  - repo: https://github.com/zizmorcore/zizmor-pre-commit
    rev: v1.23.1
    hooks:
      - id: zizmor
  - repo: https://github.com/twisted/towncrier
    rev: 25.8.0
    hooks:
      - id: towncrier-check
zarr-python-3.2.1/.pyup.yml 0000664 0000000 0000000 00000000513 15176357430 0015622 0 ustar 00root root 0000000 0000000 # pyup.io config file
# see https://pyup.io/docs/configuration/ for all available options
schedule: every month
requirements:
  - requirements_dev_minimal.txt:
      pin: True
      update: all
  - requirements_dev_numpy.txt:
      pin: True
      update: all
  - requirements_dev_optional.txt:
      pin: True
      update: all
zarr-python-3.2.1/.readthedocs.yaml 0000664 0000000 0000000 00000000703 15176357430 0017254 0 ustar 00root root 0000000 0000000 version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
  jobs:
    install:
      - pip install --upgrade pip
      - pip install .[remote] --group docs
    pre_build:
      - |
        if [ "$READTHEDOCS_VERSION_TYPE" != "tag" ];
        then
          towncrier build --version Unreleased --yes;
        fi
    build:
      html:
        - mkdocs build --strict --site-dir $READTHEDOCS_OUTPUT/html
mkdocs:
  configuration: mkdocs.yml
zarr-python-3.2.1/FUNDING.yml 0000664 0000000 0000000 00000000103 15176357430 0015634 0 ustar 00root root 0000000 0000000 github: [numfocus]
custom: ['https://numfocus.org/donate-to-zarr']
zarr-python-3.2.1/LICENSE.txt 0000664 0000000 0000000 00000002144 15176357430 0015651 0 ustar 00root root 0000000 0000000 The MIT License (MIT)
Copyright (c) 2015-2025 Zarr Developers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
zarr-python-3.2.1/README.md 0000664 0000000 0000000 00000014510 15176357430 0015305 0 ustar 00root root 0000000 0000000
# Zarr
## What is it?
Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays, designed for use in parallel computing. See the [documentation](https://zarr.readthedocs.io) for more information.
## Main Features
- [**Create**](https://zarr.readthedocs.io/en/stable/user-guide/arrays.html#creating-an-array) N-dimensional arrays with any NumPy `dtype`.
- [**Chunk arrays**](https://zarr.readthedocs.io/en/stable/user-guide/performance.html#chunk-optimizations) along any dimension.
- [**Compress**](https://zarr.readthedocs.io/en/stable/user-guide/arrays.html#compressors) and/or filter chunks using any NumCodecs codec.
- [**Store arrays**](https://zarr.readthedocs.io/en/stable/user-guide/storage.html) in memory, on disk, inside a zip file, on S3, etc...
- [**Read**](https://zarr.readthedocs.io/en/stable/user-guide/arrays.html#reading-and-writing-data) an array [**concurrently**](https://zarr.readthedocs.io/en/stable/user-guide/performance.html#parallel-computing-and-synchronization) from multiple threads or processes.
- [**Write**](https://zarr.readthedocs.io/en/stable/user-guide/arrays.html#reading-and-writing-data) to an array concurrently from multiple threads or processes.
- Organize arrays into hierarchies via [**groups**](https://zarr.readthedocs.io/en/stable/quickstart.html#hierarchical-groups).
## Where to get it
Zarr can be installed from PyPI using `pip`:
```bash
pip install zarr
```
or via `conda`:
```bash
conda install -c conda-forge zarr
```
For more details, including how to install from source, see the [installation documentation](https://zarr.readthedocs.io/en/stable/index.html#installation).
zarr-python-3.2.1/TEAM.md 0000664 0000000 0000000 00000001352 15176357430 0015076 0 ustar 00root root 0000000 0000000 ## Active core-developers
- @joshmoore (Josh Moore)
- @jni (Juan Nunez-Iglesias)
- @rabernat (Ryan Abernathey)
- @jhamman (Joe Hamman)
- @d-v-b (Davis Bennett)
- @jakirkham (jakirkham)
- @martindurant (Martin Durant)
- @normanrz (Norman Rzepka)
- @dstansby (David Stansby)
- @dcherian (Deepak Cherian)
- @TomAugspurger (Tom Augspurger)
- @maxrjones (Max Jones)
## Emeritus core-developers
- @alimanfoo (Alistair Miles)
- @shoyer (Stephan Hoyer)
- @ryan-williams (Ryan Williams)
- @jrbourbeau (James Bourbeau)
- @mzjp2 (Zain Patel)
- @grlee77 (Gregory Lee)
## Former core-developers
- @jeromekelleher (Jerome Kelleher)
- @tjcrone (Tim Crone)
- @funkey (Jan Funke)
- @shikharsg
- @Carreau (Matthias Bussonnier)
- @dazzag24
- @WardF (Ward Fisher)
zarr-python-3.2.1/bench/ 0000775 0000000 0000000 00000000000 15176357430 0015104 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/bench/compress_normal.py 0000664 0000000 0000000 00000001655 15176357430 0020670 0 ustar 00root root 0000000 0000000 import sys
import timeit
import blosc
import line_profiler
import numpy as np
import zarr
if __name__ == "__main__":
    sys.path.insert(0, "..")

    # setup
    a = np.random.normal(2000, 1000, size=200000000).astype("u2")
    z = zarr.empty_like(
        a,
        chunks=1000000,
        compression="blosc",
        compression_opts={"cname": "lz4", "clevel": 5, "shuffle": 2},
    )
    print(z)

    print("*" * 79)

    # time
    t = timeit.repeat("z[:] = a", repeat=10, number=1, globals=globals())
    print(t)
    print(min(t))
    print(z)

    # profile
    profile = line_profiler.LineProfiler(blosc.compress)
    profile.run("z[:] = a")
    profile.print_stats()

    print("*" * 79)

    # time
    t = timeit.repeat("z[:]", repeat=10, number=1, globals=globals())
    print(t)
    print(min(t))

    # profile
    profile = line_profiler.LineProfiler(blosc.decompress)
    profile.run("z[:]")
    profile.print_stats()
zarr-python-3.2.1/bench/compress_normal.txt 0000664 0000000 0000000 00000025234 15176357430 0021056 0 ustar 00root root 0000000 0000000 zarr.core.Array((200000000,), uint16, chunks=(1000000,), order=C)
compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 2}
nbytes: 381.5M; nbytes_stored: 294; ratio: 1360544.2; initialized: 0/200
store: builtins.dict
*******************************************************************************
[0.27119584499996563, 0.2855067059999783, 0.2887747180002407, 0.3058794240005227, 0.3139041080003153, 0.3021271820007314, 0.31543190899992624, 0.31403100900024583, 0.3272544129995367, 0.31834129100025166]
0.27119584499996563
zarr.core.Array((200000000,), uint16, chunks=(1000000,), order=C)
compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 2}
nbytes: 381.5M; nbytes_stored: 314.1M; ratio: 1.2; initialized: 200/200
store: builtins.dict
Timer unit: 1e-06 s
Total time: 0.297223 s
File: /home/aliman/code/github/alimanfoo/zarr/zarr/blosc.pyx
Function: compress at line 137
Line # Hits Time Per Hit % Time Line Contents
==============================================================
137 def compress(source, char* cname, int clevel, int shuffle):
138 """Compress data in a numpy array.
139
140 Parameters
141 ----------
142 source : array-like
143 Data to be compressed.
144 cname : bytes
145 Name of compression library to use.
146 clevel : int
147 Compression level.
148 shuffle : int
149 Shuffle filter.
150
151 Returns
152 -------
153 dest : bytes-like
154 Compressed data.
155
156 """
157
158 cdef:
159 char *source_ptr
160 char *dest_ptr
161 Py_buffer source_buffer
162 size_t nbytes, cbytes, itemsize
163 200 506 2.5 0.2 array.array char_array_template = array.array('b', [])
164 array.array dest
165
166 # setup source buffer
167 200 458 2.3 0.2 PyObject_GetBuffer(source, &source_buffer, PyBUF_ANY_CONTIGUOUS)
168 200 119 0.6 0.0 source_ptr = source_buffer.buf
169
170 # setup destination
171 200 239 1.2 0.1 nbytes = source_buffer.len
172 200 103 0.5 0.0 itemsize = source_buffer.itemsize
173 200 2286 11.4 0.8 dest = array.clone(char_array_template, nbytes + BLOSC_MAX_OVERHEAD,
174 zero=False)
175 200 129 0.6 0.0 dest_ptr = dest.data.as_voidptr
176
177 # perform compression
178 200 1734 8.7 0.6 if _get_use_threads():
179 # allow blosc to use threads internally
180 200 167 0.8 0.1 compressor_set = blosc_set_compressor(cname)
181 200 94 0.5 0.0 if compressor_set < 0:
182 raise ValueError('compressor not supported: %r' % cname)
183 200 288570 1442.8 97.1 with nogil:
184 cbytes = blosc_compress(clevel, shuffle, itemsize, nbytes,
185 source_ptr, dest_ptr,
186 nbytes + BLOSC_MAX_OVERHEAD)
187
188 else:
189 with nogil:
190 cbytes = blosc_compress_ctx(clevel, shuffle, itemsize, nbytes,
191 source_ptr, dest_ptr,
192 nbytes + BLOSC_MAX_OVERHEAD, cname,
193 0, 1)
194
195 # release source buffer
196 200 616 3.1 0.2 PyBuffer_Release(&source_buffer)
197
198 # check compression was successful
199 200 120 0.6 0.0 if cbytes <= 0:
200 raise RuntimeError('error during blosc compression: %d' % cbytes)
201
202 # resize after compression
203 200 1896 9.5 0.6 array.resize(dest, cbytes)
204
205 200 186 0.9 0.1 return dest
*******************************************************************************
[0.24293352799941204, 0.2324290420001489, 0.24935673900017719, 0.25716222699975333, 0.24246313799994823, 0.23272456500035332, 0.2636815870000646, 0.2576046349995522, 0.2781278639995435, 0.23824110699933954]
0.2324290420001489
Timer unit: 1e-06 s
Total time: 0.240178 s
File: /home/aliman/code/github/alimanfoo/zarr/zarr/blosc.pyx
Function: decompress at line 75
Line # Hits Time Per Hit % Time Line Contents
==============================================================
75 def decompress(source, dest):
76 """Decompress data.
77
78 Parameters
79 ----------
80 source : bytes-like
81 Compressed data, including blosc header.
82 dest : array-like
83 Object to decompress into.
84
85 Notes
86 -----
87 Assumes that the size of the destination buffer is correct for the size of
88 the uncompressed data.
89
90 """
91 cdef:
92 int ret
93 char *source_ptr
94 char *dest_ptr
95 Py_buffer source_buffer
96 array.array source_array
97 Py_buffer dest_buffer
98 size_t nbytes
99
100 # setup source buffer
101 200 573 2.9 0.2 if PY2 and isinstance(source, array.array):
102 # workaround fact that array.array does not support new-style buffer
103 # interface in PY2
104 release_source_buffer = False
105 source_array = source
106 source_ptr = source_array.data.as_voidptr
107 else:
108 200 112 0.6 0.0 release_source_buffer = True
109 200 144 0.7 0.1 PyObject_GetBuffer(source, &source_buffer, PyBUF_ANY_CONTIGUOUS)
110 200 98 0.5 0.0 source_ptr = source_buffer.buf
111
112 # setup destination buffer
113 200 552 2.8 0.2 PyObject_GetBuffer(dest, &dest_buffer,
114 PyBUF_ANY_CONTIGUOUS | PyBUF_WRITEABLE)
115 200 100 0.5 0.0 dest_ptr = dest_buffer.buf
116 200 84 0.4 0.0 nbytes = dest_buffer.len
117
118 # perform decompression
119 200 1856 9.3 0.8 if _get_use_threads():
120 # allow blosc to use threads internally
121 200 235286 1176.4 98.0 with nogil:
122 ret = blosc_decompress(source_ptr, dest_ptr, nbytes)
123 else:
124 with nogil:
125 ret = blosc_decompress_ctx(source_ptr, dest_ptr, nbytes, 1)
126
127 # release buffers
128 200 754 3.8 0.3 if release_source_buffer:
129 200 326 1.6 0.1 PyBuffer_Release(&source_buffer)
130 200 165 0.8 0.1 PyBuffer_Release(&dest_buffer)
131
132 # handle errors
133 200 128 0.6 0.1 if ret <= 0:
134 raise RuntimeError('error during blosc decompression: %d' % ret)
zarr-python-3.2.1/changes/ 0000775 0000000 0000000 00000000000 15176357430 0015435 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/changes/.gitignore 0000664 0000000 0000000 00000000014 15176357430 0017420 0 ustar 00root root 0000000 0000000 !.gitignore
zarr-python-3.2.1/changes/README.md 0000664 0000000 0000000 00000000575 15176357430 0016723 0 ustar 00root root 0000000 0000000 Writing a changelog entry
-------------------------
Please put a new file in this directory named `xxxx.<type>.md`, where
- `xxxx` is the pull request number associated with this entry
- `<type>` is one of:
- feature
- bugfix
- doc
- removal
- misc
Inside the file, please write a short description of what you have changed, and how it impacts users of `zarr-python`.
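
As a quick illustration of the naming rule, the sketch below checks whether a candidate filename consists of a pull request number, one of the entry types listed above, and the `.md` suffix. The helper `is_valid_entry_name` is hypothetical and is not part of this repository; it only mirrors the rule described in this file.

```python
# Hypothetical helper for illustration; not part of zarr-python.
VALID_TYPES = ("feature", "bugfix", "doc", "removal", "misc")


def is_valid_entry_name(filename: str) -> bool:
    """Return True if filename has the form '<PR number>.<type>.md'."""
    parts = filename.split(".")
    if len(parts) != 3:
        return False
    number, kind, suffix = parts
    # The number must be plain ASCII digits, the type must be a known one.
    return number.isascii() and number.isdigit() and kind in VALID_TYPES and suffix == "md"


print(is_valid_entry_name("1234.bugfix.md"))
print(is_valid_entry_name("1234.fix.md"))
```

The first call accepts the name because `bugfix` is a listed type; the second rejects it because `fix` is not.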
zarr-python-3.2.1/ci/ 0000775 0000000 0000000 00000000000 15176357430 0014420 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/ci/check_changelog_entries.py 0000664 0000000 0000000 00000003311 15176357430 0021605 0 ustar 00root root 0000000 0000000 """
Check changelog entries have the correct filename structure.
"""
import sys
from pathlib import Path
VALID_CHANGELOG_TYPES = ["feature", "bugfix", "doc", "removal", "misc"]
CHANGELOG_DIRECTORY = (Path(__file__).parent.parent / "changes").resolve()
def is_int(s: str) -> bool:
    try:
        int(s)
    except ValueError:
        return False
    else:
        return True


if __name__ == "__main__":
    print(f"Looking for changelog entries in {CHANGELOG_DIRECTORY}")
    entries = CHANGELOG_DIRECTORY.glob("*")
    entries = [e for e in entries if e.name not in [".gitignore", "README.md"]]
    print(f"Found {len(entries)} entries")
    print()

    bad_suffix = [e for e in entries if e.suffix != ".md"]
    bad_issue_no = [e for e in entries if not is_int(e.name.split(".")[0])]
    bad_type = [e for e in entries if e.name.split(".")[1] not in VALID_CHANGELOG_TYPES]

    if len(bad_suffix) or len(bad_issue_no) or len(bad_type):
        if len(bad_suffix):
            print("Changelog entries without .md suffix")
            print("-------------------------------------")
            print("\n".join([p.name for p in bad_suffix]))
            print()
        if len(bad_issue_no):
            print("Changelog entries without integer issue number")
            print("----------------------------------------------")
            print("\n".join([p.name for p in bad_issue_no]))
            print()
        if len(bad_type):
            print("Changelog entries without valid type")
            print("------------------------------------")
            print("\n".join([p.name for p in bad_type]))
            print(f"Valid types are: {VALID_CHANGELOG_TYPES}")
            print()
        sys.exit(1)

    sys.exit(0)
zarr-python-3.2.1/ci/check_unlinked_types.py 0000664 0000000 0000000 00000005517 15176357430 0021174 0 ustar 00root root 0000000 0000000 """Check for unlinked type annotations in built documentation.
mkdocstrings renders resolved types as <a href="..."> links and unresolved
types as <span title="fully.qualified.Name">Name</span> without an anchor.
This script finds all such unlinked types in the built HTML and reports them.
Usage:
python ci/check_unlinked_types.py [site_dir]
Raises ValueError if unlinked types are found.
"""
from __future__ import annotations
import re
import sys
from pathlib import Path
# Matches the griffe/mkdocstrings pattern for unlinked cross-references:
# <span title="fully.qualified.Name">Name</span>
UNLINKED_PATTERN = re.compile(
    r'<span title="(?P<qualname>[^"]+)">(?P<name>[^<]+)</span>'
)
# Patterns to exclude from the report
EXCLUDE_PATTERNS = [
    # TypeVars and type parameters (single brackets like Foo[T])
    re.compile(r"\[.+\]$"),
    # Dataclass field / namedtuple field references (contain parens)
    re.compile(r"\("),
    # Private names
    re.compile(r"\._"),
    # Dunder attributes
    re.compile(r"\.__\w+__$"),
    # Testing utilities
    re.compile(r"^zarr\.testing\."),
    # Third-party types (hypothesis, pytest, etc.)
    re.compile(r"^(hypothesis|pytest|typing_extensions|builtins|dataclasses)\."),
]


def should_exclude(qualname: str) -> bool:
    return any(p.search(qualname) for p in EXCLUDE_PATTERNS)


def find_unlinked_types(site_dir: Path) -> dict[str, set[str]]:
    """Find all unlinked types in built HTML files.

    Returns a dict mapping qualified type names to the set of pages where they appear.
    """
    api_dir = site_dir / "api"
    if not api_dir.exists():
        raise FileNotFoundError(f"{api_dir} does not exist. Run 'mkdocs build' first.")

    unlinked: dict[str, set[str]] = {}
    for html_file in api_dir.rglob("*.html"):
        content = html_file.read_text(errors="replace")
        rel_path = str(html_file.relative_to(site_dir))
        for match in UNLINKED_PATTERN.finditer(content):
            qualname = match.group("qualname")
            if not should_exclude(qualname):
                unlinked.setdefault(qualname, set()).add(rel_path)
    return unlinked


def main() -> None:
    site_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("site")
    unlinked = find_unlinked_types(site_dir)

    if not unlinked:
        print("No unlinked types found.")
        return

    lines = [f"Found {len(unlinked)} unlinked types:\n"]
    for qualname in sorted(unlinked):
        pages = sorted(unlinked[qualname])
        lines.append(f" {qualname}")
        lines.extend(f" - {page}" for page in pages)

    all_pages = {p for ps in unlinked.values() for p in ps}
    lines.append(f"\nTotal: {len(unlinked)} unlinked types across {len(all_pages)} pages")
    report = "\n".join(lines)
    raise ValueError(report)


if __name__ == "__main__":
    main()
zarr-python-3.2.1/codecov.yml 0000664 0000000 0000000 00000001277 15176357430 0016201 0 ustar 00root root 0000000 0000000 coverage:
  status:
    patch:
      default:
        target: auto
        informational: true
    project:
      default:
        target: auto
        threshold: 0.1
        flags:
          - tests
flags:
  tests:
    paths:
      - src/
    carryforward: true
  gpu:
    paths:
      - src/
    carryforward: true
codecov:
  notify:
    # 6 = test.yml: 3 (optional+ubuntu) + 2 (upstream + min_deps), hypothesis: 1
    after_n_builds: 6
    wait_for_ci: yes
comment:
  layout: "diff, files"
  behavior: default
  require_changes: true # if true: only post the comment if coverage changes
  branches: # branch names that can post comment
    - "main"
github_checks:
  annotations: false
zarr-python-3.2.1/design/ 0000775 0000000 0000000 00000000000 15176357430 0015276 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/design/chunk-grid.md 0000664 0000000 0000000 00000133506 15176357430 0017663 0 ustar 00root root 0000000 0000000 # Unified Chunk Grid
Version: 6
Design document for adding rectilinear (variable) chunk grid support to **zarr-python**, conforming to the [rectilinear chunk grid extension spec](https://github.com/zarr-developers/zarr-extensions/pull/25).
**Related:**
- [#3750](https://github.com/zarr-developers/zarr-python/issues/3750) (single ChunkGrid proposal)
- [#3534](https://github.com/zarr-developers/zarr-python/pull/3534) (rectilinear implementation)
- [#3735](https://github.com/zarr-developers/zarr-python/pull/3735) (chunk grid module/registry)
- [ZEP0003](https://github.com/zarr-developers/zeps/blob/main/draft/ZEP0003.md) (variable chunking spec)
- [zarr-specs#370](https://github.com/zarr-developers/zarr-specs/pull/370) (sharding v1.1: non-divisible subchunks)
- [zarr-extensions#25](https://github.com/zarr-developers/zarr-extensions/pull/25) (rectilinear extension)
- [zarr-extensions#34](https://github.com/zarr-developers/zarr-extensions/issues/34) (sharding + rectilinear)
## Problem
Chunk grids form a hierarchy — the rectilinear grid is strictly more general than the regular grid. Any regular grid is expressible as a rectilinear grid. There is no known chunk grid that is both (a) more general than rectilinear and (b) retains the axis-aligned tessellation properties Zarr assumes. All known grids are special cases:
| Grid type | Description | Example |
|---|---|---|
| Regular | Uniform chunk size, boundary chunks padded with fill_value | `[10, 10, 10, 10]` |
| Regular-bounded (zarrs) | Uniform chunk size, boundary chunks trimmed to array extent | `[10, 10, 10, 5]` |
| HPC boundary-padded | Regular interior, larger boundary chunks ([VirtualiZarr#217](https://github.com/zarr-developers/VirtualiZarr/issues/217)) | `[10, 8, 8, 8, 10]` |
| Fully variable | Arbitrary per-chunk sizes | `[5, 12, 3, 20]` |
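
To make the hierarchy concrete, here is a minimal sketch of how a regular grid can be written out as the explicit per-chunk edge list that a rectilinear grid uses. It is not zarr-python code: `regular_to_edges` is a hypothetical helper, and it assumes the padded convention from the "Regular" row above, where the boundary chunk keeps the full chunk size.

```python
def regular_to_edges(extent: int, chunk_size: int) -> list[int]:
    """Expand a regular grid into explicit per-chunk edge lengths.

    Hypothetical helper for illustration. Boundary chunks keep the full
    chunk size, so the edges may sum to more than the array extent.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    nchunks = -(-extent // chunk_size)  # ceiling division
    return [chunk_size] * nchunks


# An array of length 35 with chunk size 10 needs four chunks.
print(regular_to_edges(35, 10))  # [10, 10, 10, 10]
```

The reverse direction does not always exist: an edge list such as `[5, 12, 3, 20]` has no single chunk size, which is why the regular grid is the special case.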
Earlier iterations of the chunk grid design followed the Zarr V3 spec, which defines chunk grids as an extension point alongside codecs, dtypes, etc., so we initially designed the implementation around a similar registry-based approach. In practice, however, chunk grids are fundamentally different from codecs. Codecs are independent; supporting `zstd` tells you nothing about `gzip`. Chunk grids are not: every regular grid is a valid rectilinear grid. A registry-based plugin system makes sense for codecs but adds complexity without clear benefit for chunk grids. Here we start from some basic goals and propose a more fitting design for supporting different chunk grids in zarr-python.
## Goals
1. **Follow the zarr extension proposal.** The implementation should conform to the [rectilinear chunk grid spec](https://github.com/zarr-developers/zarr-extensions/tree/main/chunk-grids/rectilinear), not innovate on the metadata format.
2. **Minimize changes to the public API.** Users creating regular arrays should see no difference. Rectilinear is additive.
3. **Maintain backwards compatibility.** Existing code using `.chunks`, `isinstance` checks, or importing `RegularChunkGrid`/`RectilinearChunkGrid` from `zarr.core.chunk_grids` should continue to work where practical (with deprecation warnings where appropriate). Internal code paths/imports may be broken with justification.
4. **Design for future iteration.** The internal architecture should allow refactoring (e.g., metadata/array separation, new dimension types) without breaking the public API.
5. **Minimize downstream changes.** xarray, VirtualiZarr, Icechunk, Cubed, etc. should need minimal updates.
6. **Minimize time to stable release.** Ship behind a feature flag, stabilize through real-world usage, promote to stable API.
7. **The new API should be useful.** `read_chunk_sizes`/`write_chunk_sizes`, `ChunkGrid.__getitem__`, `is_regular` — these should solve real problems, not just expose internals.
8. **Extensible for other serialization structures.** The per-dimension design should support future encodings (tile, temporal) without changes to indexing or codecs.
## Design
### Design choices
1. **A chunk grid is a concrete arrangement of chunks.** Not an abstract tiling pattern. This means that the chunk grid is bound to specific array dimensions, which enables the chunk grid to answer any question about any chunk (offset, size, count) without external parameters.
2. **One implementation, multiple serialization forms.** A single `ChunkGrid` class handles all chunking logic. The serialization format (`"regular"` vs `"rectilinear"`) is chosen by the metadata layer, not the grid.
3. **No chunk grid registry.** Simple name-based dispatch in the metadata layer's `parse_chunk_grid()`.
4. **Fixed vs Varying per dimension.** `FixedDimension(size, extent)` for uniform chunks; `VaryingDimension(edges, extent)` for per-chunk edge lengths with precomputed prefix sums. Avoids expanding regular dimensions into lists of identical values.
5. **Transparent transitions.** Operations like `resize()` can move an array from regular to rectilinear chunking.
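
The prefix-sum lookup behind design choice 4 can be sketched in isolation as follows. This is an illustrative stand-alone example rather than the zarr-python implementation; the function names `build_cumulative` and `index_to_chunk` are assumptions made for the sketch, and it uses the standard-library `bisect` module for the O(log n) search.

```python
from bisect import bisect_right
from itertools import accumulate


def build_cumulative(edges: tuple[int, ...]) -> tuple[int, ...]:
    """Prefix sums of the edge lengths: cumulative[i] is the end offset of chunk i."""
    return tuple(accumulate(edges))


def index_to_chunk(cumulative: tuple[int, ...], idx: int) -> int:
    """Map an array index to the chunk that contains it, via binary search."""
    if not 0 <= idx < cumulative[-1]:
        raise IndexError(f"index {idx} is out of bounds")
    # The first chunk whose end offset is strictly greater than idx holds idx.
    return bisect_right(cumulative, idx)


edges = (5, 12, 3, 20)
cumulative = build_cumulative(edges)  # (5, 17, 20, 40)
print(index_to_chunk(cumulative, 0))   # 0
print(index_to_chunk(cumulative, 5))   # 1
print(index_to_chunk(cumulative, 19))  # 2
print(index_to_chunk(cumulative, 39))  # 3
```

A fixed dimension needs none of this, since the chunk index is simply `idx // size`; that is the motivation for keeping the two dimension types separate instead of expanding uniform chunks into a list.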
### Internal representation
```python
@dataclass(frozen=True)
class FixedDimension:
    """Uniform chunk size. Boundary chunks contain less data but are
    encoded at full size by the codec pipeline."""
    size: int  # chunk edge length (>= 0)
    extent: int  # array dimension length

    def __post_init__(self) -> None:
        # validates size >= 0 and extent >= 0
        ...

    @property
    def nchunks(self) -> int:
        if self.size == 0:
            return 0
        return ceildiv(self.extent, self.size)

    def index_to_chunk(self, idx: int) -> int:
        return idx // self.size  # raises IndexError if OOB

    def chunk_offset(self, chunk_ix: int) -> int:
        return chunk_ix * self.size  # raises IndexError if OOB

    def chunk_size(self, chunk_ix: int) -> int:
        return self.size  # always uniform; raises IndexError if OOB

    def data_size(self, chunk_ix: int) -> int:
        return max(0, min(self.size, self.extent - chunk_ix * self.size))  # raises IndexError if OOB

    @property
    def unique_edge_lengths(self) -> Iterable[int]:
        return (self.size,)  # O(1)

    def indices_to_chunks(self, indices: NDArray) -> NDArray:
        return indices // self.size

    def with_extent(self, new_extent: int) -> FixedDimension:
        return FixedDimension(size=self.size, extent=new_extent)

    def resize(self, new_extent: int) -> FixedDimension:
        return FixedDimension(size=self.size, extent=new_extent)

@dataclass(frozen=True)
class VaryingDimension:
    """Explicit per-chunk sizes. The last chunk may extend past the array
    extent (extent < sum(edges)), in which case data_size clips to the
    valid region while chunk_size returns the full edge length for codec
    processing. This underflow is allowed to match how regular grids
    handle boundary chunks, and to support shrinking an array without
    rewriting chunk edges (the spec allows trailing edges beyond the extent)."""
    edges: tuple[int, ...]  # per-chunk edge lengths (all > 0)
    cumulative: tuple[int, ...]  # prefix sums for O(log n) lookup
    extent: int  # array dimension length (may be < sum(edges))

    def __init__(self, edges: Sequence[int], extent: int) -> None:
        # validates edges non-empty, all > 0, extent >= 0, extent <= sum(edges)
        # computes cumulative via itertools.accumulate
        # uses object.__setattr__ for frozen dataclass
        ...

    @property
    def nchunks(self) -> int:
        # number of chunks that overlap [0, extent)
        if self.extent == 0:
            return 0
        return bisect.bisect_left(self.cumulative, self.extent) + 1

    @property
    def ngridcells(self) -> int:
        return len(self.edges)

    def index_to_chunk(self, idx: int) -> int:
        return bisect.bisect_right(self.cumulative, idx)  # raises IndexError if OOB

    def chunk_offset(self, chunk_ix: int) -> int:
        return self.cumulative[chunk_ix - 1] if chunk_ix > 0 else 0  # raises IndexError if OOB

    def chunk_size(self, chunk_ix: int) -> int:
        return self.edges[chunk_ix]  # raises IndexError if OOB

    def data_size(self, chunk_ix: int) -> int:
        offset = self.chunk_offset(chunk_ix)
        return max(0, min(self.edges[chunk_ix], self.extent - offset))  # raises IndexError if OOB

    @property
    def unique_edge_lengths(self) -> Iterable[int]:
        # lazy generator: yields unseen values, short-circuits deduplication
        ...

    def indices_to_chunks(self, indices: NDArray) -> NDArray:
        return np.searchsorted(self.cumulative, indices, side='right')

    def with_extent(self, new_extent: int) -> VaryingDimension:
        # validates cumulative[-1] >= new_extent (O(1)), re-binds extent
        return VaryingDimension(self.edges, extent=new_extent)

    def resize(self, new_extent: int) -> VaryingDimension:
        # grow past edge sum: append chunk of size (new_extent - sum(edges))
        # shrink or grow within edge sum: preserve all edges, re-bind extent
        ...
```
Both types implement the `DimensionGrid` protocol: `nchunks`, `ngridcells`, `extent`, `index_to_chunk`, `chunk_offset`, `chunk_size`, `data_size`, `indices_to_chunks`, `unique_edge_lengths`, `with_extent`, `resize`. Memory usage scales with the number of edges stored in *varying* dimensions, not with the total number of chunks in the grid.
All per-chunk methods (`chunk_offset`, `chunk_size`, `data_size`) raise `IndexError` for out-of-bounds chunk indices, providing consistent fail-fast behavior across both dimension types.
The two size methods serve different consumers:
| Method | Returns | Consumer |
|---|---|---|
| `chunk_size` | Buffer size for codec processing | Codec pipeline (`ArraySpec.shape`) |
| `data_size` | Valid data region within the buffer | Indexing pipeline (`chunk_selection` slicing) |
For `FixedDimension`, these differ only at the boundary. For `VaryingDimension`, these differ only when the last chunk extends past the extent (i.e., `extent < sum(edges)`). This matches current zarr-python behavior: `get_chunk_spec` passes the full `chunk_shape` to the codec for all chunks, and the indexer generates a `chunk_selection` that clips the decoded buffer.
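To make the distinction concrete, here is a minimal stand-alone sketch that reimplements the two size computations as plain functions. The function names `fixed_sizes` and `varying_sizes` are hypothetical and exist only for illustration; they are not part of zarr-python.

```python
from itertools import accumulate


def fixed_sizes(size: int, extent: int, chunk_ix: int) -> tuple[int, int]:
    """Return (chunk_size, data_size) for a uniform dimension."""
    chunk_size = size  # the codec buffer is always the declared size
    data_size = max(0, min(size, extent - chunk_ix * size))
    return chunk_size, data_size


def varying_sizes(edges: list[int], extent: int, chunk_ix: int) -> tuple[int, int]:
    """Return (chunk_size, data_size) for explicit per-chunk edges."""
    cumulative = list(accumulate(edges))  # prefix sums of the edges
    offset = cumulative[chunk_ix - 1] if chunk_ix > 0 else 0
    chunk_size = edges[chunk_ix]
    data_size = max(0, min(chunk_size, extent - offset))
    return chunk_size, data_size


interior = fixed_sizes(size=10, extent=95, chunk_ix=0)
boundary = fixed_sizes(size=10, extent=95, chunk_ix=9)
clipped = varying_sizes([10, 20, 30], extent=55, chunk_ix=2)
print(interior, boundary, clipped)
```

In this sketch the interior chunk has equal sizes, while the boundary chunk of the fixed dimension and the clipped last chunk of the varying dimension keep the full buffer size but report a smaller data size.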
### DimensionGrid Protocol
```python
@runtime_checkable
class DimensionGrid(Protocol):
    """Structural interface shared by FixedDimension and VaryingDimension."""
    @property
    def nchunks(self) -> int: ...
    @property
    def ngridcells(self) -> int: ...
    @property
    def extent(self) -> int: ...
    def index_to_chunk(self, idx: int) -> int: ...
    def chunk_offset(self, chunk_ix: int) -> int: ...  # raises IndexError if OOB
    def chunk_size(self, chunk_ix: int) -> int: ...  # raises IndexError if OOB
    def data_size(self, chunk_ix: int) -> int: ...  # raises IndexError if OOB
    def indices_to_chunks(self, indices: NDArray[np.intp]) -> NDArray[np.intp]: ...
    @property
    def unique_edge_lengths(self) -> Iterable[int]: ...
    def with_extent(self, new_extent: int) -> DimensionGrid: ...
    def resize(self, new_extent: int) -> DimensionGrid: ...
```
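The protocol is `@runtime_checkable`. Callers can handle both dimension types polymorphically through the shared interface, without `isinstance` checks against the concrete classes; `isinstance(dim, DimensionGrid)` remains available where a structural check is needed.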
`nchunks` and `ngridcells` can differ when `extent < sum(edges)`: `nchunks` counts only chunks that overlap `[0, extent)`, while `ngridcells` counts total defined grid cells (i.e., `len(edges)`). For `FixedDimension`, both are equal. For `VaryingDimension`, they differ once a resize shrinks the extent far enough that one or more trailing edges no longer overlap `[0, extent)`.
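The following sketch shows the two counts side by side for a varying dimension. It mirrors the `bisect`-based `nchunks` logic shown earlier; the helper name `count_chunks` is hypothetical.

```python
import bisect
from itertools import accumulate


def count_chunks(edges: list[int], extent: int) -> tuple[int, int]:
    """Return (nchunks, ngridcells) for explicit edges bound to an extent."""
    cumulative = list(accumulate(edges))
    ngridcells = len(edges)  # every declared edge is a grid cell
    if extent == 0:
        return 0, ngridcells
    # chunks overlapping [0, extent): first prefix sum >= extent, plus one
    nchunks = bisect.bisect_left(cumulative, extent) + 1
    return nchunks, ngridcells


full = count_chunks([10, 20, 30], extent=60)
clipped = count_chunks([10, 20, 30], extent=55)
shrunk = count_chunks([10, 20, 30], extent=25)
print(full, clipped, shrunk)
```

Note that clipping the last chunk (extent 55) leaves both counts equal; the counts only diverge once a whole trailing chunk falls outside the extent (extent 25).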
### ChunkSpec
```python
@dataclass(frozen=True)
class ChunkSpec:
    slices: tuple[slice, ...]  # valid data region in array coordinates
    codec_shape: tuple[int, ...]  # buffer shape for codec processing

    @property
    def shape(self) -> tuple[int, ...]:
        return tuple(s.stop - s.start for s in self.slices)

    @property
    def is_boundary(self) -> bool:
        return self.shape != self.codec_shape
```
For interior chunks, `shape == codec_shape`. For boundary chunks of a regular grid, `codec_shape` is the full declared chunk size while `shape` is clipped. For rectilinear grids, `shape == codec_shape` unless the last chunk extends past the extent.
### API
```python
# Creating arrays
arr = zarr.create_array(shape=(100, 200), chunks=(10, 20)) # regular
rect_arr = zarr.create_array(shape=(60, 100), chunks=[[10, 20, 30], [25, 25, 25, 25]]) # rectilinear (named separately so `arr` stays the regular array used below)
# ChunkGrid as a collection
grid = arr._chunk_grid # ChunkGrid (bound to array shape)
grid.grid_shape # (10, 10) — number of chunks per dimension
grid.ndim # 2
grid.is_regular # True if all dimensions are Fixed
spec = grid[0, 1] # ChunkSpec for chunk at grid position (0, 1)
spec.slices # (slice(0, 10), slice(20, 40))
spec.shape # (10, 20) — data shape
spec.codec_shape # (10, 20) — same for interior chunks
boundary = grid[9, 0] # last chunk along axis 0 (extent=100, size=10)
boundary.shape # (10, 20) here; would be (5, 20) if the extent were 95 (clipped data shape)
boundary.codec_shape # (10, 20): the codec always sees the full buffer
grid[99, 99] # None — out of bounds
for spec in grid: # iterate all chunks
    ...
# .chunks property: retained for regular grids, raises NotImplementedError for rectilinear
arr.chunks # (10, 20)
# .read_chunk_sizes / .write_chunk_sizes: works for all grids (dask-style)
arr.write_chunk_sizes # ((10, 10, ..., 10), (20, 20, ..., 20))
```
`ChunkGrid.__getitem__` constructs `ChunkSpec` using `chunk_size` for `codec_shape` and `data_size` for `slices`:
```python
def __getitem__(self, coords: int | tuple[int, ...]) -> ChunkSpec | None:
    if isinstance(coords, int):
        coords = (coords,)
    slices = []
    codec_shape = []
    for dim, ix in zip(self.dimensions, coords):
        if ix < 0 or ix >= dim.nchunks:
            return None
        offset = dim.chunk_offset(ix)
        slices.append(slice(offset, offset + dim.data_size(ix)))
        codec_shape.append(dim.chunk_size(ix))
    return ChunkSpec(tuple(slices), tuple(codec_shape))
```
#### Construction
`from_sizes` requires `array_shape`, binding the extent per dimension at construction time. This is a core design choice: a chunk grid is a concrete arrangement for a specific array, not an abstract tiling pattern.
```python
# Regular grid — all FixedDimension
grid = ChunkGrid.from_sizes(array_shape=(100, 200), chunk_sizes=(10, 20))
# Rectilinear grid — extent = sum(edges) when shape matches
grid = ChunkGrid.from_sizes(array_shape=(60, 100), chunk_sizes=[[10, 20, 30], [25, 25, 25, 25]])
# Rectilinear grid with boundary clipping — last chunk extends past array extent
# e.g., shape=(55, 90) but edges sum to (60, 100): data_size clips at extent
grid = ChunkGrid.from_sizes(array_shape=(55, 90), chunk_sizes=[[10, 20, 30], [25, 25, 25, 25]])
# Direct construction
grid = ChunkGrid(dimensions=(FixedDimension(10, 100), VaryingDimension([10, 20, 30], 55)))
```
When `extent < sum(edges)`, the dimension is always stored as `VaryingDimension` (even if all edges are identical) to preserve the explicit edge count. The last chunk's `chunk_size` returns the full declared edge (codec buffer) while `data_size` clips to the extent. This mirrors how `FixedDimension` handles boundary chunks in regular grids.
#### Serialization
```python
# Regular grid:
{"name": "regular", "configuration": {"chunk_shape": [10, 20]}}
# Rectilinear grid (with RLE compression and "kind" field):
{"name": "rectilinear", "configuration": {"kind": "inline", "chunk_shapes": [[10, 20, 30], [[25, 4]]]}}
```
Both names deserialize to the same `ChunkGrid` class. The serialized form does not include the array extent — that comes from `shape` in array metadata and is combined with the chunk grid when constructing a `ChunkGrid` via `ChunkGrid.from_metadata()`.
**The `ChunkGrid` does not serialize itself.** The format choice (`"regular"` vs `"rectilinear"`) belongs to `ArrayV3Metadata`. Serialization and deserialization are handled by the metadata-layer chunk grid classes (`RegularChunkGridMetadata` and `RectilinearChunkGridMetadata` in `metadata/v3.py`), which provide `to_dict()` and `from_dict()` methods.
For `create_array`, the format is inferred from the `chunks` argument: a flat tuple produces `"regular"`, a nested list produces `"rectilinear"`. The `_is_rectilinear_chunks()` helper detects nested sequences like `[[10, 20], [5, 5]]`.
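A nested-sequence check of this kind could look roughly like the sketch below. This is an assumption about the general shape of such a helper, not the actual `_is_rectilinear_chunks()` implementation; the name `looks_rectilinear` is hypothetical.

```python
from collections.abc import Sequence


def looks_rectilinear(chunks: object) -> bool:
    """Return True if `chunks` is a non-empty sequence of per-dimension sequences."""
    # Strings are sequences too, so exclude them explicitly (e.g. chunks="auto").
    if isinstance(chunks, (str, bytes)) or not isinstance(chunks, Sequence):
        return False
    return len(chunks) > 0 and all(
        isinstance(dim, Sequence) and not isinstance(dim, (str, bytes))
        for dim in chunks
    )


print(looks_rectilinear((10, 20)))            # flat tuple of ints
print(looks_rectilinear([[10, 20], [5, 5]]))  # nested per-dimension edges
```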
##### Rectilinear spec compliance
The rectilinear format requires `"kind": "inline"` (validated by `validate_rectilinear_kind()`). Per the spec, each element of `chunk_shapes` can be:
- A bare integer `m`: repeated until `sum >= array_extent`
- A list of bare integers: explicit per-chunk sizes
- A mixed array of bare integers and `[value, count]` RLE pairs
RLE compression is used when serializing: runs of identical sizes become `[value, count]` pairs, singletons stay as bare integers.
```python
# compress_rle([10, 10, 10, 5]) -> [[10, 3], 5]
# expand_rle([[10, 3], 5]) -> [10, 10, 10, 5]
```
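For readers who want to see the round trip spelled out, here is a minimal sketch of RLE compression and expansion consistent with the examples above. The `_sketch` suffix marks these as illustrative functions; the real `compress_rle` and `expand_rle` may differ in validation and edge cases.

```python
from itertools import groupby


def compress_rle_sketch(sizes: list[int]) -> list:
    """Collapse runs of identical sizes into [value, count] pairs."""
    encoded = []
    for value, run in groupby(sizes):
        count = sum(1 for _ in run)
        # singletons stay as bare integers, longer runs become pairs
        encoded.append([value, count] if count > 1 else value)
    return encoded


def expand_rle_sketch(encoded: list) -> list[int]:
    """Expand a mixed list of bare integers and [value, count] pairs."""
    sizes = []
    for item in encoded:
        if isinstance(item, int):
            sizes.append(item)
        else:
            value, count = item
            sizes.extend([value] * count)
    return sizes


encoded = compress_rle_sketch([10, 10, 10, 5])
decoded = expand_rle_sketch(encoded)
print(encoded, decoded)
```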
For a single-element `chunk_shapes` tuple like `(10,)`, `RectilinearChunkGridMetadata.to_dict()` serializes it as a bare integer `10`. Per the rectilinear spec, a bare integer is repeated until the sum >= extent, preserving the full codec buffer size for boundary chunks.
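The bare-integer rule can be sketched as follows. The helper name `expand_bare_integer` is hypothetical, and the sketch assumes a positive extent, since rectilinear grids cannot represent zero-extent dimensions.

```python
def expand_bare_integer(m: int, extent: int) -> list[int]:
    """Repeat edge length `m` until the edges cover `extent`."""
    if m <= 0 or extent <= 0:
        raise ValueError("this sketch assumes m > 0 and extent > 0")
    nchunks = -(-extent // m)  # ceiling division
    return [m] * nchunks


edges = expand_bare_integer(10, extent=95)
print(edges, sum(edges))
```

The expanded edges sum to at least the extent, so the final chunk keeps its full declared size for the codec while the valid data region is clipped.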
**Zero-extent handling:** Regular grids serialize zero-extent dimensions without issue (the format encodes only `chunk_shape`, no edges). Rectilinear grids cannot represent zero-extent dimensions because the spec requires at least one positive-integer edge length per axis.
#### read_chunk_sizes / write_chunk_sizes
The `read_chunk_sizes` and `write_chunk_sizes` properties provide universal access to per-dimension chunk data sizes, matching the dask `Array.chunks` convention. They work for both regular and rectilinear grids:
- `write_chunk_sizes`: always returns outer (storage) chunk sizes
- `read_chunk_sizes`: returns inner chunk sizes when sharding is used, otherwise same as `write_chunk_sizes`
```python
>>> arr = zarr.create_array(store, shape=(100, 80), chunks=(30, 40))
>>> arr.write_chunk_sizes
((30, 30, 30, 10), (40, 40))
>>> arr = zarr.create_array(store, shape=(60, 100), chunks=[[10, 20, 30], [50, 50]])
>>> arr.write_chunk_sizes
((10, 20, 30), (50, 50))
```
The underlying `ChunkGrid.chunk_sizes` property (on the grid, not the array) returns the same as `write_chunk_sizes`.
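For a regular dimension, the dask-style tuple can be derived from the chunk size and extent alone. The sketch below reproduces the first example above; `dask_style_sizes` is a hypothetical helper name.

```python
def dask_style_sizes(extent: int, size: int) -> tuple[int, ...]:
    """Per-chunk data sizes along one regular dimension."""
    nchunks = -(-extent // size) if size > 0 else 0  # ceiling division
    # every chunk is full-sized except possibly the last one
    return tuple(min(size, extent - i * size) for i in range(nchunks))


sizes = (dask_style_sizes(100, 30), dask_style_sizes(80, 40))
print(sizes)
```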
#### Resize
```python
arr.resize((80, 100)) # re-binds extent; FixedDimension stays fixed
arr.resize((200, 100)) # VaryingDimension grows by appending a new chunk
arr.resize((30, 100)) # VaryingDimension shrinks: preserves all edges, re-binds extent
```
Resize uses `ChunkGrid.update_shape(new_shape)`, which delegates to each dimension's `.resize()` method:
- `FixedDimension.resize()`: simply re-binds the extent (identical to `with_extent`)
- `VaryingDimension.resize()`: grow past `sum(edges)` appends a chunk covering the gap; shrink or grow within `sum(edges)` preserves all edges and re-binds the extent (the spec allows trailing edges beyond the array extent)
**Known limitation (deferred):** When growing a `VaryingDimension`, the current implementation always appends a single chunk covering the new region. For example, `[10, 10, 10]` resized from 30 to 45 produces `[10, 10, 10, 15]` instead of the more natural `[10, 10, 10, 10, 10]`. A future improvement should add an optional `chunks` parameter to `resize()` that controls how the new region is partitioned, with a sane default (e.g., repeating the last chunk size). This is safely deferrable because:
- `FixedDimension` already handles resize correctly (regular grids stay regular)
- The single-chunk default produces valid state, just suboptimal chunk layout
- Rectilinear arrays are behind an experimental feature flag
- Adding an optional parameter is backwards-compatible
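The current single-chunk growth behavior can be summarized with a short sketch. It operates on plain tuples rather than `VaryingDimension` objects, and the name `resize_edges` is hypothetical.

```python
def resize_edges(edges: tuple[int, ...], new_extent: int) -> tuple[tuple[int, ...], int]:
    """Return (edges, extent) after resizing a varying dimension."""
    total = sum(edges)
    if new_extent > total:
        # grow past the edge sum: append one chunk covering the gap
        edges = edges + (new_extent - total,)
    # shrink, or grow within the edge sum: keep all edges, re-bind the extent
    return edges, new_extent


grown = resize_edges((10, 10, 10), 45)
shrunk = resize_edges((10, 20, 30), 30)
print(grown, shrunk)
```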
Open design questions for the `chunks` parameter:
- Does it describe the new region only, or the entire post-resize array?
- Must the overlapping portion agree with existing chunks (no rechunking)?
- What is the type? Same as `chunks` in `create_array`?
#### from_array
The `from_array()` function handles both regular and rectilinear source arrays:
```python
src = zarr.create_array(store, shape=(60, 100), chunks=[[10, 20, 30], [50, 50]])
new = zarr.from_array(data=src, store=new_store, chunks="keep")
# Preserves rectilinear structure: new.write_chunk_sizes == ((10, 20, 30), (50, 50))
```
When `chunks="keep"`, the logic checks `data._chunk_grid.is_regular`:
- Regular: extracts `data.chunks` (flat tuple) and preserves shards
- Rectilinear: extracts `data.write_chunk_sizes` (nested tuples) and forces shards to None
### Indexing
The indexing pipeline is coupled to regular grid assumptions — every per-dimension indexer takes a scalar `dim_chunk_len: int` and uses `//` and `*`:
```python
dim_chunk_ix = self.dim_sel // self.dim_chunk_len # IntDimIndexer
dim_offset = dim_chunk_ix * self.dim_chunk_len # SliceDimIndexer
```
Replace `dim_chunk_len: int` with the dimension object (`FixedDimension | VaryingDimension`). The shared interface means the indexer code structure stays the same — `dim_sel // dim_chunk_len` becomes `dim_grid.index_to_chunk(dim_sel)`. O(1) for regular, binary search for varying.
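The sketch below illustrates why the indexer structure can stay the same: a single helper locates an array index within a chunk without knowing which dimension type it was given. `FixedDim`, `VaryingDim`, and `locate` are simplified, hypothetical stand-ins, and the prefix sums are recomputed on each call here rather than precomputed as in the design above.

```python
import bisect
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class FixedDim:
    size: int

    def index_to_chunk(self, idx: int) -> int:
        return idx // self.size  # O(1)

    def chunk_offset(self, chunk_ix: int) -> int:
        return chunk_ix * self.size


@dataclass(frozen=True)
class VaryingDim:
    edges: tuple[int, ...]

    def _cumulative(self) -> list[int]:
        return list(accumulate(self.edges))

    def index_to_chunk(self, idx: int) -> int:
        return bisect.bisect_right(self._cumulative(), idx)  # binary search

    def chunk_offset(self, chunk_ix: int) -> int:
        return self._cumulative()[chunk_ix - 1] if chunk_ix > 0 else 0


def locate(dim, idx: int) -> tuple[int, int]:
    """Return (chunk index, offset within that chunk) for an array index."""
    chunk_ix = dim.index_to_chunk(idx)
    return chunk_ix, idx - dim.chunk_offset(chunk_ix)


print(locate(FixedDim(10), 25))
print(locate(VaryingDim((10, 20, 30)), 25))
```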
### Codec pipeline
Today, `get_chunk_spec()` returns the same `ArraySpec(shape=chunk_grid.chunk_shape)` for every chunk. For rectilinear grids, each chunk has a different codec shape:
```python
def get_chunk_spec(self, chunk_coords, array_config, prototype) -> ArraySpec:
    spec = self._chunk_grid[chunk_coords]
    return ArraySpec(shape=spec.codec_shape, ...)
```
Note `spec.codec_shape`, not `spec.shape`. For regular grids, `codec_shape` is uniform (preserving current behavior). The boundary clipping flow is unchanged:
```
Write: user data → pad to codec_shape with fill_value → encode → store
Read: store → decode to codec_shape → slice via chunk_selection → user data
```
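As a one-dimensional illustration of this flow, the sketch below pads a short boundary chunk to the codec size on write and slices it back on read. Plain lists stand in for buffers, encoding is omitted, and the function names are hypothetical.

```python
def pad_for_write(data: list[int], codec_size: int, fill_value: int = 0) -> list[int]:
    """Pad valid data up to the codec buffer size with the fill value."""
    if len(data) > codec_size:
        raise ValueError("data is larger than the codec buffer")
    return list(data) + [fill_value] * (codec_size - len(data))


def slice_for_read(buffer: list[int], data_size: int) -> list[int]:
    """Keep only the valid data region of a decoded buffer."""
    return buffer[:data_size]


user_data = [1, 2, 3, 4, 5]                            # data_size == 5
stored = pad_for_write(user_data, codec_size=10)       # chunk_size == 10
restored = slice_for_read(stored, data_size=len(user_data))
print(stored, restored)
```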
### Sharding
The `ShardingCodec` constructs a `ChunkGrid` per shard using the shard shape as extent and the subchunk shape as `FixedDimension`. Each shard is self-contained — it doesn't need to know whether the outer grid is regular or rectilinear. Validation checks that every unique edge length per dimension is divisible by the inner chunk size, using `dim.unique_edge_lengths` for efficient polymorphic iteration (O(1) for fixed dimensions, lazy-deduplicated for varying).
```
Level 1 — Outer chunk grid (shard boundaries): regular or rectilinear
Level 2 — Inner subchunk grid (within each shard): always regular
Level 3 — Shard index: ceil(shard_dim / subchunk_dim) entries per dimension
```
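The divisibility validation and the shard index size can be sketched as below. Both function names are hypothetical, and the real validation lives in the sharding codec and may report errors differently.

```python
from collections.abc import Iterable


def check_shard_divisibility(unique_edge_lengths: Iterable[int], inner_size: int) -> None:
    """Raise if any outer edge length is not a multiple of the inner chunk size."""
    for edge in unique_edge_lengths:
        if edge % inner_size != 0:
            raise ValueError(
                f"shard edge {edge} is not divisible by inner chunk size {inner_size}"
            )


def shard_index_entries(shard_dim: int, subchunk_dim: int) -> int:
    """Number of shard index entries along one dimension (ceiling division)."""
    return -(-shard_dim // subchunk_dim)


check_shard_divisibility([100], 10)          # regular outer grid: one unique edge
check_shard_divisibility([10, 20, 30], 10)   # rectilinear outer grid
print(shard_index_entries(100, 30))
```

Because only unique edge lengths are checked, a regular outer grid costs a single modulo operation per dimension regardless of how many shards exist.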
[zarr-specs#370](https://github.com/zarr-developers/zarr-specs/pull/370) lifts the requirement that subchunk shapes evenly divide the shard shape. With the proposed `ChunkGrid`, this just means removing the `shard_shape % subchunk_shape == 0` validation — `FixedDimension` already handles boundary clipping via `data_size`.
| Outer grid | Subchunk divisibility | Required change |
|---|---|---|
| Regular | Evenly divides (v1.0) | None |
| Regular | Non-divisible (v1.1) | Remove divisibility validation |
| Rectilinear | Evenly divides | Remove "sharding incompatible" guard |
| Rectilinear | Non-divisible | Both changes |
### What this replaces
| Current | Proposed |
|---|---|
| `ChunkGrid` ABC + `RegularChunkGrid` subclass | Single concrete `ChunkGrid` with `is_regular` |
| `RectilinearChunkGrid` (#3534) | Same `ChunkGrid` class |
| Chunk grid registry + entrypoints (#3735) | Direct name dispatch |
| `arr.chunks` | Retained for regular; `arr.read_chunk_sizes`/`arr.write_chunk_sizes` for general use |
| `get_chunk_shape(shape, coord)` | `grid[coord].codec_shape` or `grid[coord].shape` |
## Design decisions
### Why store the extent in ChunkGrid?
The chunk grid is a concrete arrangement, not an abstract tiling pattern. A finite collection naturally has an extent. Storing it enables `__getitem__`, eliminates `dim_len` parameters from every method, and makes the grid self-describing.
This does *not* mean `ArrayV3Metadata.shape` should delegate to the grid. The array shape remains an independent field in metadata. The extent is passed into the grid at construction time so it can answer boundary questions without external parameters. It is **not** serialized as part of the chunk grid JSON — it comes from the `shape` field in array metadata and is combined with the chunk grid configuration in `ChunkGrid.from_metadata()`.
### Why distinguish chunk_size from data_size?
A chunk in a regular grid has two sizes. `chunk_size` is the buffer size the codec processes: always `size` for `FixedDimension`, even at the boundary (padded with `fill_value`). `data_size` is the valid data region: at the boundary it is clipped to `extent - offset`, which equals `extent % size` when the extent is not a multiple of `size`. The indexing layer uses `data_size` to generate `chunk_selection` slices.
This matches current zarr-python behavior and matters for:
1. **Backward compatibility.** Existing stores have boundary chunks encoded at full `chunk_shape`.
2. **Codec simplicity.** Codecs assume uniform input shapes for regular grids.
3. **Shard index correctness.** The index assumes `subchunk_dim`-sized entries.
For `VaryingDimension`, `chunk_size == data_size` when `extent == sum(edges)`. When `extent < sum(edges)` (e.g., after a resize that keeps the last chunk oversized), `data_size` clips the last chunk. This is the fundamental difference: `FixedDimension` has a declared size plus an extent that clips data; `VaryingDimension` has explicit sizes that normally *are* the extent but can also extend past it.
### Why not a chunk grid registry?
There is no known chunk grid outside the rectilinear family that retains the tessellation properties zarr-python assumes. A `match` on the grid name is sufficient.
### Why a single ChunkGrid class instead of RegularChunkGrid + RectilinearChunkGrid?
[Discussed in #3534.](https://github.com/zarr-developers/zarr-python/pull/3534) @d-v-b argued that `RegularChunkGrid` is unnecessary since rectilinear is more general; @dcherian argued that downstream libraries need a fast way to detect regular grids without inspecting potentially millions of chunk edges (see [xarray#9808](https://github.com/pydata/xarray/pull/9808)).
The resolution: a single `ChunkGrid` class with an `is_regular` property (O(1), cached at construction). This gives downstream code the fast-path detection @dcherian needed without the class hierarchy complexity @d-v-b wanted to avoid. The metadata document's `name` field (`"regular"` vs `"rectilinear"`) is also available for clients who inspect JSON directly.
A backwards-compatibility shim in `chunk_grids.py` preserves the old `RegularChunkGrid` / `RectilinearChunkGrid` import paths with deprecation warnings — see [Backwards compatibility](#backwards-compatibility).
### Why is ChunkGrid a concrete class instead of a Protocol/ABC?
The old design had `ChunkGrid` as an ABC with `RegularChunkGrid` as its only subclass. #3534 added `RectilinearChunkGrid` as a second subclass. This branch makes `ChunkGrid` a single concrete class instead, with separate metadata DTOs (`RegularChunkGridMetadata` and `RectilinearChunkGridMetadata` in `metadata/v3.py`) for serialization.
All known grids are special cases of rectilinear, so there's no need for a class hierarchy at the grid level. A `ChunkGrid` Protocol/ABC would mean every caller programs against an abstract interface and adding a grid type requires implementing ~15 methods. A single class is simpler.
Note: the *dimension* types (`FixedDimension`, `VaryingDimension`) do use a `DimensionGrid` Protocol — that's where the polymorphism lives. The grid-level class is concrete; the dimension-level types are polymorphic. If a genuinely novel grid type emerges that can't be expressed as a combination of per-dimension types, a grid-level Protocol can be extracted.
### Why `.chunks` raises for rectilinear grids
[Debated in #3534.](https://github.com/zarr-developers/zarr-python/pull/3534) @d-v-b suggested making `.chunks` return `tuple[tuple[int, ...], ...]` (dask-style) for all grids. @dcherian strongly objected: every downstream consumer expects `tuple[int, ...]`, and silently returning a different type would be worse than raising. Materializing O(10M) chunk edges into a Python tuple is also a real performance risk ([xarray#8902](https://github.com/pydata/xarray/issues/8902#issuecomment-2546127373)).
The resolution:
- `.chunks` is retained for regular grids (returns `tuple[int, ...]` as before)
- `.chunks` raises `NotImplementedError` for rectilinear grids with a message pointing to `.read_chunk_sizes`/`.write_chunk_sizes`
- `.read_chunk_sizes` and `.write_chunk_sizes` return `tuple[tuple[int, ...], ...]` (dask convention) for all grids
@maxrjones noted in review that deprecating `.chunks` for regular grids was not desirable. The current branch does not deprecate it.
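The resolved behavior can be illustrated with a small stand-in class. `ArraySketch` is hypothetical and is not the zarr-python array class; it only models the three rules listed above.

```python
class ArraySketch:
    """Minimal model of the `.chunks` / `.write_chunk_sizes` contract."""

    def __init__(self, write_chunk_sizes, regular_chunk_shape=None):
        # dask-style nested tuples, available for every grid
        self.write_chunk_sizes = tuple(tuple(dim) for dim in write_chunk_sizes)
        # only set when the grid is regular
        self._regular_chunk_shape = regular_chunk_shape

    @property
    def chunks(self) -> tuple[int, ...]:
        if self._regular_chunk_shape is None:
            raise NotImplementedError(
                "chunks is undefined for rectilinear grids; "
                "use read_chunk_sizes or write_chunk_sizes instead"
            )
        return tuple(self._regular_chunk_shape)


regular = ArraySketch([(30, 30, 30, 10), (40, 40)], regular_chunk_shape=(30, 40))
rect = ArraySketch([(10, 20, 30), (50, 50)])
print(regular.chunks, rect.write_chunk_sizes)
```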
### User control over grid serialization format
@d-v-b raised in #3534 that users need a way to say "these chunks are regular, but serialize as rectilinear" (e.g., to allow future append/extend workflows without format changes). @jhamman initially made nested-list input always produce `RectilinearChunkGridMetadata`.
The current branch resolves this via the metadata-layer chunk grid classes. When metadata is deserialized, the original name (from `{"name": "regular"}` or `{"name": "rectilinear"}`) determines which metadata class is instantiated (`RegularChunkGridMetadata` or `RectilinearChunkGridMetadata`), and that class handles serialization via `to_dict()`. Current inference behavior for `create_array`:
- `chunks=(10, 20)` (flat tuple) → infers `"regular"`
- `chunks=[[10, 20], [5, 5]]` (nested lists with varying sizes) → infers `"rectilinear"`
- `chunks=[[10, 10], [20, 20]]` (nested lists with uniform sizes) → `from_sizes` collapses to `FixedDimension`, so `is_regular=True` and infers `"regular"`
**Open question:** Should uniform nested lists preserve `"rectilinear"` to support future append workflows without a format change? This could be addressed by checking the input form before collapsing, or by allowing users to pass `chunk_grid_name` explicitly through the `create_array` API.
### Deferred: Tiled/periodic chunk patterns
[#3750 discussion](https://github.com/zarr-developers/zarr-python/issues/3750) identified periodic chunk patterns as a use case not efficiently served by RLE alone. RLE compresses runs of identical values (`np.repeat`), but periodic patterns like days-per-month (`[31, 28, 31, 30, ...]` repeated 30 years) need a tile encoding (`np.tile`). Real-world examples include:
- **Oceanographic models** (ROMS): HPC boundary-padded chunks like `[10, 8, 8, 8, 10]` — handled by RLE
- **Temporal axes**: days-per-month, hours-per-day — need tile encoding for compact metadata
- **Temporal-aware grids**: date/time-aware chunk grids that layer over other axes (raised by @LDeakin)
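To see why RLE alone is a poor fit for periodic patterns, the sketch below counts RLE entries for 30 non-leap years of days-per-month edges and compares that with a tile description. The `(pattern, repeats)` pair is purely illustrative: no tile encoding is currently defined in the rectilinear spec.

```python
from itertools import groupby

DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year

edges = DAYS_PER_MONTH * 30  # 30 years of monthly chunks

# RLE: only adjacent equal values merge (Jul/Aug, and Dec with the next Jan)
rle_entries = []
for value, run in groupby(edges):
    count = sum(1 for _ in run)
    rle_entries.append([value, count] if count > 1 else value)

# Hypothetical tile description: one period plus a repeat count
tile = (DAYS_PER_MONTH, 30)
expanded = tile[0] * tile[1]

print(len(edges), len(rle_entries), len(tile[0]) + 1)
```

RLE barely shrinks the list because the pattern has few adjacent repeats, whereas the tile form needs only the 12-element period and a count.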
A `TiledDimension` prototype was built ([commit 9c0f582](https://github.com/maxrjones/zarr-python/commit/9c0f582f)) demonstrating that the per-dimension design supports this without changes to indexing or the codec pipeline. However, it was intentionally excluded from this release because:
1. **Metadata format must come first.** Tile encoding requires a new `kind` value in the rectilinear spec (currently only `"inline"` is defined). This should go through [zarr-extensions#25](https://github.com/zarr-developers/zarr-extensions/pull/25), not zarr-python unilaterally.
2. **The per-dimension architecture doesn't preclude it.** A future `TiledDimension` can implement the `DimensionGrid` protocol alongside `FixedDimension` and `VaryingDimension` with no changes to indexing, codecs, or the `ChunkGrid` class.
3. **RLE covers the MVP.** Most real-world variable chunk patterns (HPC boundaries, irregular partitions) are efficiently encoded with RLE. Tile encoding is an optimization for a specific (temporal) subset.
### Metadata / Array separation (partially implemented)
An earlier design doc proposed decoupling `ChunkGrid` (runtime) from `ArrayV3Metadata` (serialization), so that metadata would store only a plain dict and the array layer would construct the `ChunkGrid`.
The current implementation partially realizes this separation:
- **Metadata DTOs** (`RegularChunkGridMetadata`, `RectilinearChunkGridMetadata` in `metadata/v3.py`): Pure data, frozen dataclasses, no array shape. These live on `ArrayV3Metadata.chunk_grid` and represent only what goes into `zarr.json`.
- **`ChunkGrid`** (`chunk_grids.py`): Shape-bound, supports indexing, iteration, and chunk specs. Lives on `AsyncArray._chunk_grid`, constructed from metadata + `shape` via `ChunkGrid.from_metadata()`.
This means `ArrayV3Metadata.chunk_grid` is now a `ChunkGridMetadata` (the DTO union type), **not** the runtime `ChunkGrid`. Code that previously accessed runtime methods on `metadata.chunk_grid` (e.g., `all_chunk_coords()`, `__getitem__`) must now use the grid from the array layer instead.
The name controls serialization format; each metadata DTO class provides its own `to_dict()` method for serialization. The `ChunkGrid` handles all runtime queries.
## Prior art
**zarrs (Rust):** Three independent grid types behind a `ChunkGridTraits` trait. Key patterns adopted: Fixed vs Varying per dimension, prefix sums + binary search, `Option` for out-of-bounds, `NonZeroU64` for chunk dimensions, separate subchunk grid per shard, array shape at construction.
**TensorStore (C++):** Stores only `chunk_shape`; boundary clipping is applied via `valid_data_bounds` at query time. Uses both `RegularGridRef` and `IrregularGrid` internally. No registry.
## Migration
### Public API compatibility
The user-facing API is fully backward-compatible. Existing code that creates, opens, reads, and writes zarr arrays continues to work without changes:
- `zarr.create_array`, `zarr.open`, `zarr.open_array`, `zarr.open_group` -- unchanged signatures. The `chunks` parameter type is *widened* (now also accepts nested sequences for rectilinear grids), but all existing call patterns still work.
- `arr.chunks` -- returns `tuple[int, ...]` for regular arrays, same as before.
- `arr.shape`, `arr.dtype`, `arr.ndim`, `arr.shards` -- unchanged.
- Top-level `zarr` exports -- unchanged.
- Rectilinear chunks are gated behind `zarr.config.set({'array.rectilinear_chunks': True})`, so they cannot be created accidentally.
New additions (purely additive): `arr.read_chunk_sizes`, `arr.write_chunk_sizes`, `zarr.experimental.ChunkGrid`, `zarr.experimental.ChunkSpec`.
The breaking changes discussed below are confined to **internal modules** (`zarr.core.chunk_grids`, `zarr.core.metadata.v3`, `zarr.core.indexing`) that downstream libraries like cubed and VirtualiZarr access directly.
### Internal API compatibility trade-off analysis
This section analyzes the internal breaking changes from the metadata/array separation and evaluates two strategies: (A) add backward-compatibility shims in zarr-python, vs. (B) require downstream packages to update. The baseline is **no shims at all**.
#### What breaks without any shims
Three API changes affect downstream code:
1. **`RegularChunkGrid` class removed from `zarr.core.chunk_grids`.** On `main`, `RegularChunkGrid` is defined in `chunk_grids.py` as a `Metadata` subclass. This branch replaces it with `RegularChunkGridMetadata` in `metadata/v3.py`. Without a shim, `from zarr.core.chunk_grids import RegularChunkGrid` raises `ImportError`.
2. **`RegularChunkGrid` no longer available from `zarr.core.metadata.v3`.** On `main`, `v3.py` imports `RegularChunkGrid` from `chunk_grids.py` for internal use. VirtualiZarr imports it from this location (`from zarr.core.metadata.v3 import RegularChunkGrid`). Without the internal import, this raises `ImportError`.
3. **`OrthogonalIndexer` constructor expects `ChunkGrid`, not `RegularChunkGrid`/`RegularChunkGridMetadata`.** Even if the import shims above resolve to `RegularChunkGridMetadata`, the indexer constructors access `chunk_grid._dimensions`, which only exists on the runtime `ChunkGrid` class. Cubed constructs `OrthogonalIndexer(selection, shape, RegularChunkGrid(chunk_shape=chunks))` directly.
#### Downstream impact without shims
**VirtualiZarr** (5 line changes across 2 files):
```python
# manifests/array.py (line 6): import
- from zarr.core.metadata.v3 import ArrayV3Metadata, RegularChunkGrid
+ from zarr.core.metadata.v3 import ArrayV3Metadata, RegularChunkGridMetadata
# manifests/array.py (line 53): isinstance check
- if not isinstance(_metadata.chunk_grid, RegularChunkGrid):
+ if not isinstance(_metadata.chunk_grid, RegularChunkGridMetadata):
# parsers/zarr.py (line 16): import
- from zarr.core.chunk_grids import RegularChunkGrid
+ from zarr.core.metadata.v3 import RegularChunkGridMetadata
# parsers/zarr.py (line 270): isinstance check
- if not isinstance(array_v3_metadata.chunk_grid, RegularChunkGrid):
+ if not isinstance(array_v3_metadata.chunk_grid, RegularChunkGridMetadata):
# parsers/zarr.py (line 390): cast
- cast(RegularChunkGrid, metadata.chunk_grid).chunk_shape
+ cast(RegularChunkGridMetadata, metadata.chunk_grid).chunk_shape
```
The `manifests/array.py` import is from `zarr.core.metadata.v3` (never a documented export; VirtualiZarr relied on a transitive import). The `parsers/zarr.py` import is from `zarr.core.chunk_grids` (the canonical location on `main`). Both are straightforward renames. The `.chunk_shape` attribute is unchanged on the new class.
If VirtualiZarr needs to support both old and new zarr-python, a version-conditional import adds ~5 more lines.
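Such a version-conditional import could be sketched as follows. The alias `RegularGridType` is hypothetical, and the final fallback to `None` exists only so the sketch runs in an environment without zarr-python installed.

```python
try:
    # newer zarr-python: metadata class lives in metadata/v3.py
    from zarr.core.metadata.v3 import RegularChunkGridMetadata as RegularGridType
except ImportError:
    try:
        # older zarr-python 3.x: canonical location on main
        from zarr.core.chunk_grids import RegularChunkGrid as RegularGridType
    except ImportError:
        RegularGridType = None  # zarr-python not available (sketch only)


def is_regular_grid(chunk_grid: object) -> bool:
    """isinstance check that works against whichever class was importable."""
    return RegularGridType is not None and isinstance(chunk_grid, RegularGridType)


print(is_regular_grid(object()))
```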
**Cubed** (3 line changes in 1 file):
```python
# core/ops.py (lines 626-631)
  def _create_zarr_indexer(selection, shape, chunks):
      if zarr.__version__[0] == "3":
-         from zarr.core.chunk_grids import RegularChunkGrid
+         from zarr.core.chunk_grids import ChunkGrid
          from zarr.core.indexing import OrthogonalIndexer
-         return OrthogonalIndexer(selection, shape, RegularChunkGrid(chunk_shape=chunks))
+         return OrthogonalIndexer(selection, shape, ChunkGrid.from_sizes(shape, chunks))
```
Note that `ChunkGrid` is *not* a renamed class. `RegularChunkGrid(chunk_shape=chunks)` took only chunk sizes; `ChunkGrid.from_sizes(shape, chunks)` also requires the array shape. The `shape` parameter is already available at this call site.
If cubed needs to support both old and new zarr-python:
```python
def _create_zarr_indexer(selection, shape, chunks):
    if zarr.__version__[0] == "3":
        from zarr.core import chunk_grids
        from zarr.core.indexing import OrthogonalIndexer

        # `ChunkGrid` also exists on `main` (as an ABC without `from_sizes`), so the
        # import succeeds on old versions too; feature-detect the method instead
        # of relying on ImportError.
        if hasattr(chunk_grids.ChunkGrid, "from_sizes"):
            grid = chunk_grids.ChunkGrid.from_sizes(shape, chunks)
        else:
            grid = chunk_grids.RegularChunkGrid(chunk_shape=chunks)
        return OrthogonalIndexer(selection, shape, grid)
    else:
        from zarr.indexing import OrthogonalIndexer
        return OrthogonalIndexer(selection, ZarrArrayIndexingAdaptor(shape, chunks))
```
#### What shims can cover
**Shim 1: `__getattr__` in `chunk_grids.py`** (~15 lines)
Maps `RegularChunkGrid` to `RegularChunkGridMetadata` with a deprecation warning. Covers:
- The `from zarr.core.chunk_grids import RegularChunkGrid` import pattern (used by cubed and VirtualiZarr's `parsers/zarr.py`)
- `isinstance(x, RegularChunkGrid)` checks (because the name resolves to the actual class)
- `RegularChunkGrid(chunk_shape=(...))` construction (because `RegularChunkGridMetadata` accepts the same arguments)
Does **not** cover: passing the result to `OrthogonalIndexer`, because `RegularChunkGridMetadata` lacks `._dimensions`.
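The mechanism behind such a shim is the module-level `__getattr__` hook from PEP 562. The following is a self-contained sketch of that mechanism only: `RegularChunkGridMetadata` here is a stand-in class, and `demo_module`, `_DEPRECATED_NAMES`, and `_module_getattr` are illustrative names, not zarr-python code.

```python
import types
import warnings


class RegularChunkGridMetadata:
    """Stand-in for the real metadata class (assumption for this demo)."""

    def __init__(self, chunk_shape):
        self.chunk_shape = tuple(chunk_shape)


# Old public name -> replacement object.
_DEPRECATED_NAMES = {"RegularChunkGrid": RegularChunkGridMetadata}


def _module_getattr(name):
    """PEP 562 hook: called only when normal module attribute lookup fails."""
    if name in _DEPRECATED_NAMES:
        replacement = _DEPRECATED_NAMES[name]
        warnings.warn(
            f"{name} is deprecated; use {replacement.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    raise AttributeError(f"module has no attribute {name!r}")


# In a real module the function would simply be named `__getattr__` at module
# level; a throwaway module object is used here so the sketch runs on its own.
demo_module = types.ModuleType("demo_chunk_grids")
demo_module.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    OldName = demo_module.RegularChunkGrid  # resolves via the hook

grid = OldName(chunk_shape=(10, 20))
```

Because the old name resolves to the replacement class itself rather than a wrapper, `isinstance` checks and keyword construction behave the same under either name.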
**Shim 2: `__getattr__` in `metadata/v3.py`** (~12 lines)
Same pattern, covers VirtualiZarr's import from `zarr.core.metadata.v3`. Mirrors Shim 1 for a different import path.
**Shim 3: Auto-coerce `ChunkGridMetadata` in indexer constructors** (~30 lines)
A helper function + 1-line insertion in each of `BasicIndexer`, `OrthogonalIndexer`, `CoordinateIndexer`, and `MaskIndexer`:
```python
import warnings


def _resolve_chunk_grid(chunk_grid, shape):
    """Coerce ChunkGridMetadata to runtime ChunkGrid if needed."""
    from zarr.core.chunk_grids import ChunkGrid as _ChunkGrid
    from zarr.core.metadata.v3 import ChunkGridMetadata

    # Already a runtime grid: nothing to do.
    if isinstance(chunk_grid, _ChunkGrid):
        return chunk_grid
    if isinstance(chunk_grid, ChunkGridMetadata):
        warnings.warn(
            "Passing ChunkGridMetadata to indexers is deprecated. "
            "Use ChunkGrid.from_sizes() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Regular metadata exposes `chunk_shape`; otherwise use `chunk_shapes`.
        if hasattr(chunk_grid, "chunk_shape"):
            return _ChunkGrid.from_sizes(shape, tuple(chunk_grid.chunk_shape))
        return _ChunkGrid.from_sizes(shape, chunk_grid.chunk_shapes)
    raise TypeError(f"Expected ChunkGrid or ChunkGridMetadata, got {type(chunk_grid)}")
```
This covers cubed's `OrthogonalIndexer(selection, shape, RegularChunkGrid(...))` pattern end-to-end (combined with Shim 1).
#### Comparison
| | No shims | Shims 1+2 only | Shims 1+2+3 |
|---|---|---|---|
| **zarr-python additions** | 0 lines | ~27 lines | ~57 lines |
| **VirtualiZarr changes** | 5 lines | 0 lines | 0 lines |
| **Cubed changes** | 3 lines | 3 lines | 0 lines |
| **Maintenance burden** | None | Low (deprecation shims are well-understood) | Medium (indexer coercion blurs metadata/runtime boundary) |
| **API clarity** | Clean (metadata DTOs and runtime types are distinct) | Good (old names redirect to new names) | Weaker (indexers implicitly accept two type families) |
With Shims 1+2 only, VirtualiZarr's `manifests/array.py` import from `zarr.core.metadata.v3` is covered by Shim 2, and the `parsers/zarr.py` import from `zarr.core.chunk_grids` is covered by Shim 1. The `isinstance` checks work because both shims resolve to `RegularChunkGridMetadata`. The `cast` works because `.chunk_shape` is unchanged. So VirtualiZarr needs 0 changes with Shims 1+2. The 3 lines for cubed remain because Shim 1 resolves the import but `OrthogonalIndexer` still needs a runtime `ChunkGrid`.
### Downstream migration
Migration from `main` (where only `RegularChunkGrid` and the `ChunkGrid` ABC exist):
| Old pattern (on `main`) | New pattern |
|---|---|
| `from zarr.core.chunk_grids import RegularChunkGrid` | `from zarr.core.metadata.v3 import RegularChunkGridMetadata` |
| `from zarr.core.chunk_grids import ChunkGrid` (ABC) | `from zarr.core.chunk_grids import ChunkGrid` (concrete class, different API) |
| `isinstance(cg, RegularChunkGrid)` | `isinstance(cg, RegularChunkGridMetadata)` or `grid.is_regular` on the runtime `ChunkGrid` |
| `cg.chunk_shape` on `RegularChunkGrid` | `cg.chunk_shape` on `RegularChunkGridMetadata` (unchanged) |
| `ChunkGrid.from_dict(data)` | `parse_chunk_grid(data)` from `zarr.core.metadata.v3` |
| `chunk_grid.all_chunk_coords(array_shape)` | `chunk_grid.all_chunk_coords()` (shape now stored in grid) |
| `chunk_grid.get_nchunks(array_shape)` | `chunk_grid.get_nchunks()` (shape now stored in grid) |
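The last two rows reflect the grid owning the array extent. A toy illustration of that calling convention follows; `ShapeAwareGrid` is a hypothetical stand-in for a regular grid only and is not zarr-python's `ChunkGrid` implementation.

```python
import math
from itertools import product


class ShapeAwareGrid:
    """Hypothetical regular grid that stores the array shape (not zarr's class)."""

    def __init__(self, shape, chunk_shape):
        if len(shape) != len(chunk_shape):
            raise ValueError("shape and chunk_shape must have the same length")
        self.shape = tuple(shape)
        self.chunk_shape = tuple(chunk_shape)

    @classmethod
    def from_sizes(cls, shape, chunk_shape):
        return cls(shape, chunk_shape)

    def _chunks_per_dim(self):
        # The last chunk along a dimension may be partial, hence the ceiling.
        return tuple(math.ceil(s / c) for s, c in zip(self.shape, self.chunk_shape))

    def get_nchunks(self):
        # No array_shape argument: the extent was supplied at construction.
        return math.prod(self._chunks_per_dim())

    def all_chunk_coords(self):
        return product(*(range(n) for n in self._chunks_per_dim()))


grid = ShapeAwareGrid.from_sizes(shape=(10, 7), chunk_shape=(4, 3))
nchunks = grid.get_nchunks()  # 3 chunks along each axis
coords = list(grid.all_chunk_coords())
```

Callers that previously threaded `array_shape` through every query only need to pass it once, when the grid is built.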
During the earlier [#3534](https://github.com/zarr-developers/zarr-python/pull/3534) effort (which used separate `RegularChunkGrid`/`RectilinearChunkGrid` classes), downstream PRs and issues were opened to explore compatibility:
- xarray ([#10880](https://github.com/pydata/xarray/pull/10880)), VirtualiZarr ([#877](https://github.com/zarr-developers/VirtualiZarr/pull/877)), Icechunk ([#1338](https://github.com/earth-mover/icechunk/issues/1338)), cubed ([#876](https://github.com/cubed-dev/cubed/issues/876))
These target #3534's API, not this branch's unified `ChunkGrid` design. New downstream POC branches for this design are linked in [Proofs of concepts](#proofs-of-concepts).
### Credits
This implementation builds on prior work:
- **[#3534](https://github.com/zarr-developers/zarr-python/pull/3534)** (@jhamman) — RLE helpers, validation logic, test cases, and the review discussion that shaped the architecture.
- **[#3737](https://github.com/zarr-developers/zarr-python/pull/3737)** — extent-in-grid idea (adopted per-dimension).
- **[#1483](https://github.com/zarr-developers/zarr-python/pull/1483)** — original variable chunking POC.
- **[#3736](https://github.com/zarr-developers/zarr-python/pull/3736)** — resolved by storing extent per-dimension.
## Open questions
1. **Resize defaults (deferred):** When growing a rectilinear array, should `resize()` accept an optional `chunks` parameter? See the [Resize section](#resize) for details and open design questions. Regular arrays already stay regular on resize.
2. **`ChunkSpec` complexity:** `ChunkSpec` carries both `slices` and `codec_shape`. Should the grid expose separate methods for codec vs data queries instead?
3. **`__getitem__` with slices:** Should `grid[0, :]` or `grid[0:3, :]` return a sub-grid or an iterator of `ChunkSpec`s?
4. **Uniform nested lists:** Should `chunks=[[10, 10], [20, 20]]` serialize as `"rectilinear"` (preserving user intent for future append) or `"regular"` (current behavior, collapses uniform edges)? See [User control over grid serialization format](#user-control-over-grid-serialization-format).
5. **`zarr.open` with rectilinear:** @tomwhite noted in #3534 that `zarr.open(mode="w")` doesn't support rectilinear chunks directly. This could be addressed in a follow-up.
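To make question 4 concrete, here is one plausible reading of "collapses uniform edges", written as a standalone sketch. `classify_chunks` is a hypothetical name, and the rule shown (every dimension's edge list contains a single distinct value) is an assumption; the branch may apply different rules, for example around a trailing partial chunk.

```python
def classify_chunks(chunks):
    """Classify nested per-dimension edge lists as regular or rectilinear.

    Assumed rule: if every dimension has uniform edges, collapse to a single
    chunk shape; otherwise keep the explicit edges.
    """
    edges = [tuple(dim) for dim in chunks]
    if all(len(set(dim)) == 1 for dim in edges):
        return "regular", tuple(dim[0] for dim in edges)
    return "rectilinear", tuple(edges)


uniform = classify_chunks([[10, 10], [20, 20]])
mixed = classify_chunks([[10, 5], [20, 20]])
```

Under this rule the user's nested-list input is indistinguishable, after serialization, from `chunks=(10, 20)`, which is the loss of intent the question is about.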
## Proofs of concepts
- Zarr-Python:
  - branch - https://github.com/maxrjones/zarr-python/tree/poc/unified-chunk-grid
  - diff - https://github.com/zarr-developers/zarr-python/compare/main...maxrjones:zarr-python:poc/unified-chunk-grid?expand=1
- Xarray:
  - branch - https://github.com/maxrjones/xarray/tree/poc/unified-zarr-chunk-grid
  - diff - https://github.com/pydata/xarray/compare/main...maxrjones:xarray:poc/unified-zarr-chunk-grid?expand=1
- VirtualiZarr:
  - branch - https://github.com/maxrjones/VirtualiZarr/tree/poc/unified-chunk-grid
  - diff - https://github.com/zarr-developers/VirtualiZarr/compare/main...maxrjones:VirtualiZarr:poc/unified-chunk-grid?expand=1
- Virtual TIFF:
  - branch - https://github.com/virtual-zarr/virtual-tiff/tree/poc/unified-chunk-grid
  - diff - https://github.com/virtual-zarr/virtual-tiff/compare/main...poc/unified-chunk-grid?expand=1
- Cubed:
  - branch - https://github.com/maxrjones/cubed/tree/poc/unified-chunk-grid
- Microbenchmarks:
  - https://github.com/maxrjones/zarr-chunk-grid-tests/tree/unified-chunk-grid
zarr-python-3.2.1/docs/ 0000775 0000000 0000000 00000000000 15176357430 0014755 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/docs/_static/ 0000775 0000000 0000000 00000000000 15176357430 0016403 5 ustar 00root root 0000000 0000000 zarr-python-3.2.1/docs/_static/favicon-96x96.png 0000664 0000000 0000000 00000030652 15176357430 0021347 0 ustar 00root root 0000000 0000000 PNG