Chunkwise image loader #279

lucas-diedrich · 2025-02-17T17:02:53Z

Description

This PR addresses the challenge that the currently implemented and planned image loaders require loading imaging data entirely into memory, typically as NumPy arrays. Given the large size of microscopy datasets, this is not always feasible.

To mitigate this issue, and as discussed with @LucaMarconato, this PR aims to introduce a generalizable approach for reading large microscopy files in chunks, enabling efficient handling of data that does not fit into memory.

Some related discussions.

Strategy

In this PR, we focus on .tiff images, as implemented in the _tiff_to_chunks function.

1. Get a lazy representation of the image via a suitable reader function (here: tifffile.memmap).
2. Pre-define chunks that fit into memory, based on the dimensions of the image (_compute_chunks).
3. Load small chunks from the memory-mapped image via a custom reader function and pass them to dask.array, which avoids memory overflow (_read_chunks).
4. Reassemble the chunks into a dask.array (via dask.array.block).
5. Parse to Image2DModel.
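
For readers skimming the thread, the steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (compute_chunk_coords, read_region, load_chunkwise) are hypothetical, np.memmap on a raw temporary file stands in for tifffile.memmap, and the final parse to Image2DModel is omitted.

```python
import os
import tempfile

import dask
import dask.array as da
import numpy as np


def compute_chunk_coords(shape, chunk_size):
    """Return a nested list of (y, x, height, width) tuples covering `shape`.

    The outer list iterates over y, the inner list over x.
    """
    (n_y, n_x), (c_y, c_x) = shape, chunk_size
    return [
        [(y, x, min(c_y, n_y - y), min(c_x, n_x - x)) for x in range(0, n_x, c_x)]
        for y in range(0, n_y, c_y)
    ]


def read_region(lazy_image, y, x, height, width):
    """Copy a single chunk from the lazy (memory-mapped) image into memory."""
    return np.array(lazy_image[y : y + height, x : x + width])


def load_chunkwise(lazy_image, chunk_size):
    """Build a dask array whose blocks are only read when computed."""
    coords = compute_chunk_coords(lazy_image.shape, chunk_size)
    blocks = [
        [
            da.from_delayed(
                dask.delayed(read_region)(lazy_image, y, x, h, w),
                shape=(h, w),
                dtype=lazy_image.dtype,
            )
            for (y, x, h, w) in row
        ]
        for row in coords
    ]
    # da.block joins the inner lists along the last axis (x)
    # and the outer list along the second-to-last axis (y).
    return da.block(blocks)


# Stand-in for a large image on disk (assumption: raw uint16 data, no TIFF header).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "image.dat")
source = np.arange(100 * 150, dtype=np.uint16).reshape(100, 150)
source.tofile(path)

lazy = np.memmap(path, dtype=np.uint16, mode="r", shape=(100, 150))
image = load_chunkwise(lazy, chunk_size=(29, 71))
```

Nothing is read from disk until image.compute() (or a downstream writer) requests the blocks, so peak memory is bounded by the chunk size rather than the image size.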

The strategy is implemented in

src/spatialdata_io/readers/generic.py and
src/spatialdata_io/readers/_utils/_image.py

Future extensions

The strategy can be implemented for any image type, as long as it is possible to

implement a lazy image-data loader
define a custom reader function

We have implemented similar readers for openslide-compatible whole slide images and the Carl-Zeiss microscopy format.

codecov-commenter · 2025-02-17T17:04:55Z

Codecov Report

❌ Patch coverage is 95.69892% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.91%. Comparing base (2ebfab0) to head (4b4a00e).
⚠️ Report is 16 commits behind head on main.

Files with missing lines	Patch %	Lines
src/spatialdata_io/readers/generic.py	91.11%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #279      +/-   ##
==========================================
+ Coverage   55.16%   62.91%   +7.74%     
==========================================
  Files          26       27       +1     
  Lines        2844     3117     +273     
==========================================
+ Hits         1569     1961     +392     
+ Misses       1275     1156     -119

Files with missing lines	Coverage Δ
src/spatialdata_io/readers/_utils/_image.py	`100.00% <100.00%> (ø)`
src/spatialdata_io/readers/generic.py	`88.88% <91.11%> (+3.17%)`	⬆️

... and 4 files with indirect coverage changes

src/spatialdata_io/readers/_utils/_image.py

melonora

Thanks for your contribution! I have 2 minor suggestions. I also saw that you use the width-by-height convention. Personally, I don't have a strong opinion here, though we could also stick to array API conventions. @LucaMarconato WDYT? Pre-approving for now.

src/spatialdata_io/readers/generic.py

melonora

Sorry, had to change due to rethinking memmap. As far as I am aware, it does not always work, for example when dealing with compressed TIFFs.

Co-authored-by: Wouter-Michiel Vierdag <w-mv@hotmail.com>

lucas-diedrich · 2025-05-04T07:56:56Z

Thanks! Addressed your comments:

Naming, simplification 46ed3e5, 99731fe
Read compressed images with dask_image.imread instead of tifffile 9e057de

lucas-diedrich · 2026-01-12T17:33:01Z

Hi @LucaMarconato - Just saw that you pushed some updates. Could you comment on the current implementation/is this PR still interesting for you?

LucaMarconato · 2026-01-13T13:19:57Z

@lucas-diedrich yes, still interested in it, and I have time now to go through it. I added my "code review" as "TODO" items for myself. I'll check the code and push some modifications if I see that minor fixes are needed; I'll comment on any larger changes, but the code looks good. I'll also do a benchmark with asv for performance.

LucaMarconato · 2026-01-13T13:21:11Z

There was a problem with the dimensions of the returned data. Image2DModel.parse() always returns a cyx image, but the reference variable in test_read_tiff() was a non-cyx image. Now it's fixed.

LucaMarconato · 2026-01-13T16:19:40Z

@lucas-diedrich I'm done with edits, please double check the changes if you have time. The main comment, added in the notes in _read_chunks(), is the following:

    Notes
    -----
    As shown in `_compute_chunks()`, `coords` are in the form `(x, y, width,
    height)`. In that function, the inner list (dim = -1) iterates over `y`
    values, and the outer list (dim = -2) iterates over `x` values. In
    `_read_chunks()`, we use the more common `(y, x)` ordering: the inner list
    (dim = -1) iterates over `x` values, and the outer list (dim = -2) iterates
    over `y` values.

    This mismatch can be confusing. A straightforward fix is to standardize `coords`
    to `(y, x, height, width)` instead of `(x, y, width, height)`.

In summary, redefining coords as (y, x, height, width) would make the code more readable, but it also works as is, so no strong opinion.
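
To make the two layouts concrete, here is a small sketch of converting one into the other. The function name to_yx_layout and the example coordinates are made up for illustration and are not part of the PR; the sketch only assumes the nested-list layouts described in the note above.

```python
def to_yx_layout(coords_xy):
    """Convert coords from the (x, y, width, height) layout to (y, x, height, width).

    Input:  outer list iterates over x, inner list iterates over y.
    Output: outer list iterates over y, inner list iterates over x.
    """
    return [
        [(y, x, height, width) for (x, y, width, height) in row]
        # zip(*...) transposes the nested list, so each `row` has a fixed y.
        for row in zip(*coords_xy)
    ]


# Hypothetical image, 10 px wide and 9 px high, with chunks 6 px wide and 4 px high.
coords_xy = [
    [(0, 0, 6, 4), (0, 4, 6, 4), (0, 8, 6, 1)],  # x = 0, iterating over y
    [(6, 0, 4, 4), (6, 4, 4, 4), (6, 8, 4, 1)],  # x = 6, iterating over y
]
coords_yx = to_yx_layout(coords_xy)
```

In the converted layout the nested list can be passed row by row to a block-assembly routine that expects the inner list to run along x, without any further transposition.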

I will now do the benchmark with asv.

lucas-diedrich · 2026-01-16T08:53:42Z

Hi @LucaMarconato, thanks for implementing the changes and apologies for the confusing coords convention!

The tests are currently failing when I pass an asymmetric chunk size (e.g. (29, 71) or (30, 50) in test_read_tiff), presumably because of a subtle error in the dimension ordering. I'll look into it.
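
As a rough illustration of why asymmetric chunk sizes are useful in such a test: on a square image with square chunks, swapping the two axes produces exactly the same chunk grid, so an ordering bug goes unnoticed. The sketch below is not the PR's test code; the function names and the 100 x 100 image shape are assumptions, and only the chunk size (29, 71) is taken from the comment above.

```python
def chunk_extents(length, chunk):
    """Split `length` pixels into consecutive chunk extents along one axis."""
    return tuple(min(chunk, length - start) for start in range(0, length, chunk))


def grid_chunks(shape, chunk_size):
    """Per-axis chunk extents for a (y, x) image and a (y, x) chunk size."""
    return tuple(chunk_extents(n, c) for n, c in zip(shape, chunk_size))


def grid_chunks_swapped(shape, chunk_size):
    """Same computation with the chunk axes accidentally swapped (the bug)."""
    return grid_chunks(shape, chunk_size[::-1])


shape = (100, 100)  # assumed square test image

# Symmetric chunks: the correct and the buggy grid are identical.
symmetric_hides_bug = grid_chunks(shape, (50, 50)) == grid_chunks_swapped(shape, (50, 50))

# Asymmetric chunks: the grids differ, so a comparison against a reference fails.
asymmetric_reveals_bug = grid_chunks(shape, (29, 71)) != grid_chunks_swapped(shape, (29, 71))
```

With the asymmetric size, the correct grid has four chunks along y and two along x, while the swapped version has two along y and four along x.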

…d documentation
1. Enforce standard dimension order convention (y, x).
2. Change local variable names to better distinguish between chunk-level coordinates and pixel-level coordinates
- All chunk indices are indicated as such with the prefix.
- The pixel-coordinates of individual chunks are now consistently named (y, x, height, width)

lucas-diedrich added 11 commits February 16, 2025 13:36

feature: Initial lazy tiff reader

8027c0b

Updated comments

7ff1d23

Updated comments

826133a

Move utility functions to designated submodule readers._utils._image

df258bc

Initial tests utils

db0d782

Fixes edge cases for min coordinate

c03932f

Added test for negative coordinates

339cbd8

Add support for png/jpg again

24c6eec

Add initial test

dbdc7c7

Fix: Fix jpeg and png reader, fix issues with local variable name

da98469

Merge branch 'main' of https://github.com/scverse/spatialdata-io into…

b7e5874

… image-reader-chunkwise

lucas-diedrich marked this pull request as draft February 17, 2025 17:03

lucas-diedrich marked this pull request as ready for review March 21, 2025 15:44

melonora reviewed Mar 24, 2025

src/spatialdata_io/readers/_utils/_image.py Outdated

melonora reviewed Mar 24, 2025

src/spatialdata_io/readers/_utils/_image.py Outdated

melonora approved these changes Mar 24, 2025

melonora reviewed Mar 24, 2025

src/spatialdata_io/readers/generic.py Outdated

melonora requested changes Mar 24, 2025

lucas-diedrich and others added 6 commits May 2, 2025 17:41

Update src/spatialdata_io/readers/_utils/_image.py

2349be7

Co-authored-by: Wouter-Michiel Vierdag <w-mv@hotmail.com>

[Refactor|API] Rename dimensions to shape to stick to numpy convention

46ed3e5

[Refactor] Make suggested simplification of code, suggested by @melonora

99731fe

[Test] Add test for compressed tiffs

f03ca8e

[Fix] Account for compressed images

9e057de

[Refactor] Remove unnecessary type hint

c05b718

lucas-diedrich requested a review from melonora September 1, 2025 07:12

LucaMarconato added 2 commits January 12, 2026 17:16

Merge branch 'main' into image-reader-chunkwise

5705515

fix pre-commit

9f3cc3c

This was referenced Jan 12, 2026

[FEATURE] Direct Zarr Streaming for Memory-Efficient MSI Conversion Tomatokeftes/thyra#68

Open

feat: streaming converter for memory-efficient large dataset conversion Tomatokeftes/thyra#69

Draft

fix transpose in image(); wip code review

9d44f25

LucaMarconato added 5 commits January 13, 2026 15:18

add test for dask-image fallback for compressed tiffs

c14cb22

remove unused min_coordinate

a7a2b92

fix wrong dimension _compute_chunks(); cover with test

80a931d

np._int -> np.number

fc26342

fix indices in _read_chunks()

3b296da

LucaMarconato and others added 4 commits January 13, 2026 17:22

better english

f7ad81a

better docstring

7f5b7ce

wip benchmark (bugs)

0c482ac

[Test] Use small assymetric chunk sizes to capture any issues with th…

1466596

…e computation order of chunks/assembly

lucas-diedrich and others added 11 commits January 16, 2026 10:16

Add comment to clarify use of asymmetric chunk sizes in test_read_tiff

a2930d5

Follow standard convention of image dimensions throughout reader func…

a9c2a2b

…tion. Switch axes order everywhere from (x, y) to (y, x)

[Fix] Shape dimensions were inversed. Fix shape specification and mak…

5d9ece2

…e it self-documenting

chore: Remove Note and TODO

21f8fc6

Clarify documentation

93554e7

fix pre-commit benchmark_image

5a20f41

wip fix chunks

a890951

improve chunks support

19df325

benchmark for image() with synthetic data

163d8b0

fix pre-commit

4b4a00e

4 participants
