Skip to content

Conversation

@YooSunYoung
Copy link
Member

Fixes #128

I limited the scope of coordinate handling to keep it simple.
We can always handle the coordinates separately in more complicated situations if needed.

@nvaytet can you please have a look...?

@YooSunYoung YooSunYoung requested a review from nvaytet January 20, 2026 09:08
)


def _has_bin_edge(img: sc.DataArray, coord_name: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of implementing this, you can just use da.coords.is_edges('x') which will return True if the x coord is a bin-edge coord.

image: sc.Variable | sc.DataArray,
sizes: dict[str, int],
method: str | Callable = 'sum',
keep: tuple[str, ...] = ('position',),
Copy link
Member

@nvaytet nvaytet Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just not have an argument (I also find the position to be a strange default, as images will often have other coords than position which is more ESS-data specific?).
(I know we had some special handling of 'position' in the original code, and I understand it's not something you introduced, but I think it was a bad idea in the first place).

Instead, just keep the coordinates that it can keep, and throw away the others?
Basically keep all coordinates that are 1d and sorted. If you want to be even more picky, you could keep only coords that are linspace, as computing a mean of midpoints that are not evenly spaced is probably not completely correct... But I would say 1d and sorted is probably good for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, just keep the coordinates that it can keep, and throw away the others?
Basically keep all coordinates that are 1d and sorted. If you want to be even more picky, you could keep only coords that are linspace, as computing a mean of midpoints that are not evenly spaced is probably not completely correct... But I would say 1d and sorted is probably good for now?

Yeah that was what I was worried, non-linear coordinates.
But some coordinates, like detector_number, doesn't make sense to keep...

return out


def _is_1d_sorted_bin_edge(img: sc.DataArray, coord_name: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing a check to verify that the coord is sorted? (which is indeed important).
Use sc.issorted(da, dim='x').

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌚 wow... that's stupid of me. I added a test too.

if _is_1d_sorted_bin_edge(image, keep_coordinate):
orig_dim = next(iter(image.coords[keep_coordinate].dims))
reduced_dim = next(iter(reduced_dims))
out.coords[keep_coordinate] = sc.concat(
Copy link
Member

@nvaytet nvaytet Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would implement this differently:

Midpoint coords:

In the case of midpoint coords, use the blocked data to compute (before the reduction operation) a mean of the coords. So something like

for dim in sizes:
    # check first that coord exists, is sorted and 1d

    # Then compute mean over extra dims that were not in import for
    out.coords[dim] = blocked.coords[dim].mean(set(blocked.coords[dim].dims) - set(image.dims))

Note that here, to get the correct mean, the dimension we should mean over should be the inner dim. Thankfully, blockify already does the folding in the correct order (putting the dim first in out.fold({dim: -1, ...)), so we are fine here.

Bin edge coords:

I think in this case we just need to use a stride to sample the original coords and add them to the output?
So could be

for dim, stride in sizes.items():
    # check that coord exists and is bin edges

    out.coords[dim] = image.coords[dim][dim, ::stride]

You probably have to combine the 2 cases above inside the same for loop, but that should be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood the stride part but I didn't understand the first part about the midpoint coord...

out.coords[dim] won't work unless the coordinate name is same as the dimension name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out.coords[dim] won't work unless the coordinate name is same as the dimension name.

True, so instead we need to loop over all coords and check which ones have dims that have been folded.

I didn't understand the first part about the midpoint coord...

Which part was not clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an implementation that seems to work (tested on a couple of simple cases):

def resample(
    image: sc.Variable | sc.DataArray,
    sizes: dict[str, int],
    method: str | Callable = 'sum',
) -> sc.Variable | sc.DataArray:

    blocked = blockify(image, sizes=sizes)
    _method = getattr(sc, method) if isinstance(method, str) else method
    out = _method(blocked, set(blocked.dims) - set(image.dims))

    dim_set = set(sizes)
    for name, coord in image.coords.items():
        # Check that coord dim was folded, and that it can be kept
        if (not (set(coord.dims) & dim_set)) or (coord.ndim > 1) or (not sc.issorted(coord, dim=coord.dim)):
            continue

        if image.coords.is_edges(name):
            out.coords[name] = image.coords[name][::sizes[coord.dim]]
        else:
            out.coords[name] = blocked.coords[name].mean(set(blocked.coords[name].dims) - dim_set)

    return out

Does this help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced that we should do it automatically though...
So you're suggesting that we should drop all non-1d coordinates even though they are not bin edges?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think whether they are bin edges or not has any influence?
I was thinking this would add back in the coords that had been dropped during the folding and reducing (at least the ones that are 1d and sorted). But we should double-check that this is indeed the case.

If I fold and then sum for the x dimension, all coords that depend on the x dimension are dropped in the implementation that is currently on main. All coords that do not depend on x are preserved.
You can for example do

da = load_scitiff(siemens_star_path())["image"]
resampled = img.tools.resample(da, sizes={'x': 2})

and resampled will have coords y and t.

With the main implementation, resampled is

<scipp.DataArray>
Dimensions: Sizes[t:3, y:1280, x:640, ]
Coordinates:
* t                           int64              [s]  (t)  [0, 1, 2]
* y                           int64  [dimensionless]  (y)  [0, 1, ..., 1278, 1279]
Data:
                            float32         [counts]  (t, y, x)  [510, 510, ..., 510, 510]

while with the new one I proposed here, it is now

<scipp.DataArray>
Dimensions: Sizes[t:3, y:1280, x:640, ]
Coordinates:
* t                           int64              [s]  (t)  [0, 1, 2]
* x                         float64  [dimensionless]  (x)  [0.5, 2.5, ..., 1276.5, 1278.5]
* y                           int64  [dimensionless]  (y)  [0, 1, ..., 1278, 1279]
Data:
                            float32         [counts]  (t, y, x)  [510, 510, ..., 510, 510]

Was this what you wanted resample to produce?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think whether they are bin edges or not has any influence?

I thought users might want to keep multi-dimensional coordinates as well after resampling the image.
But bin-edges are more difficult to handle if they are bin-edges in one dimension and non-bin-edges in the other dimension so that's why I wanted to ignore them if they are multi-dimensional.

But I don't have a strong use-case of the multi-dimensional coordinate so we can ignore all multi-dimensional coordinates...

And I thought it'd be waste of time to automatically check all coordinates in the folded dimension if they are sorted but if it's only 1 dimensinoal it's probably fine....?

Was this what you wanted resample to produce?

Yes. Wasn't that what mine did too...? Just had to specify which coordinate to keep.

image: sc.Variable | sc.DataArray,
sizes: dict[str, int],
method: str | Callable = 'sum',
keep: tuple[str, ...] = ('position',),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove arg

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the argument at all or just the default argument?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant remove the keep arg, as in resample.

)
block_sizes[dim] = image.sizes[dim] // size
return resample(image, sizes=block_sizes, method=method)
return resample(image, sizes=block_sizes, method=method, keep=keep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, maybe we should also add to our docs the option of using image.rebin() if the image has bin edges, which can be used to create a new image of any size, not just sizes divisible by the original size?

Or maybe we can add a rebin function in this module that would auto-add bin edges if the input doesn't have them?
One could imagine this would just be inside resize, but since rebin can really only safely be used for the sum operation and not other ops like mean, I think it might be good to keep them as separate functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For x, y resizing, it's more intuitive to have these resize/resample function separately I think...

But tof or event_time_offset coordinate will be very likely to be bin-edges so rebin will make more sense for them. So maybe it's just worth mentioning that along t timension, use rebin instead of resize...?

@YooSunYoung
Copy link
Member Author

YooSunYoung commented Jan 21, 2026

@nvaytet I had to add couple more conditions in the if statements to cover some cases, like when resample sizes are just 1... doing nothing on that dimension.

Other than that, I made it work like you said above.
Can you please have another look?


for name in _dropped_cnames:
coord = image.coords[name]
if _sizes_not_changed(coord, sizes):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm can't we return earlier? In the case of a 1 in the sizes, we shouldn't need to fold and reduce either?
Can we instead remove any 1s found in the sizes at the start of the function?
Then, if sizes ends up being empty, just return the original image (or maybe a copy of it?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll update it in that way.

YooSunYoung and others added 3 commits January 22, 2026 11:41
Co-authored-by: Neil Vaytet <39047984+nvaytet@users.noreply.github.com>
Comment on lines 102 to 106
# Filter the resample sizes first.
sizes = {dim: size for dim, size in sizes.items() if size != 1}
if not sizes:
return image.copy()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the sizes are filtered here so we don't have to check if the coordinates are folded later.

@YooSunYoung YooSunYoung requested a review from nvaytet January 22, 2026 12:08
# Filter the resample sizes first.
sizes = {dim: size for dim, size in sizes.items() if size != 1}
if not sizes:
return image.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should return a shallow copy instead? (using deep=False)
This is potentially expensive if the data is large, for an operation which should do nothing?

@YooSunYoung YooSunYoung enabled auto-merge January 22, 2026 14:19
@YooSunYoung YooSunYoung merged commit 7386f2b into main Jan 22, 2026
4 checks passed
@YooSunYoung YooSunYoung deleted the blockify-coordinate branch January 22, 2026 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatic coordinate handling for resize

3 participants