-
Notifications
You must be signed in to change notification settings - Fork 1
Keep coordinates after resizing/resampling. #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/ess/imaging/tools/analysis.py
Outdated
| ) | ||
|
|
||
|
|
||
| def _has_bin_edge(img: sc.DataArray, coord_name: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of implementing this, you can just use da.coords.is_edges('x') which will return True if the x coord is a bin-edge coord.
src/ess/imaging/tools/analysis.py
Outdated
| image: sc.Variable | sc.DataArray, | ||
| sizes: dict[str, int], | ||
| method: str | Callable = 'sum', | ||
| keep: tuple[str, ...] = ('position',), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just not have an argument (I also find the position to be a strange default, as images will often have other coords than position which is more ESS-data specific?).
(I know we had some special handling of 'position' in the original code, and I understand it's not something you introduced, but I think it was a bad idea in the first place).
Instead, just keep the coordinates that it can keep, and throw away the others?
Basically keep all coordinates that are 1d and sorted. If you want to be even more picky, you could keep only coords that are linspace, as computing a mean of midpoints that are not evenly spaced is probably not completely correct... But I would say 1d and sorted is probably good for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, just keep the coordinates that it can keep, and throw away the others?
Basically keep all coordinates that are 1d and sorted. If you want to be even more picky, you could keep only coords that are linspace, as computing a mean of midpoints that are not evenly spaced is probably not completely correct... But I would say 1d and sorted is probably good for now?
Yeah that was what I was worried, non-linear coordinates.
But some coordinates, like detector_number, doesn't make sense to keep...
src/ess/imaging/tools/analysis.py
Outdated
| return out | ||
|
|
||
|
|
||
| def _is_1d_sorted_bin_edge(img: sc.DataArray, coord_name: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing a check to verify that the coord is sorted? (which is indeed important).
Use sc.issorted(da, dim='x').
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌚 wow... that's stupid of me. I added a test too.
src/ess/imaging/tools/analysis.py
Outdated
| if _is_1d_sorted_bin_edge(image, keep_coordinate): | ||
| orig_dim = next(iter(image.coords[keep_coordinate].dims)) | ||
| reduced_dim = next(iter(reduced_dims)) | ||
| out.coords[keep_coordinate] = sc.concat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would implement this differently:
Midpoint coords:
In the case of midpoint coords, use the blocked data to compute (before the reduction operation) a mean of the coords. So something like
for dim in sizes:
# check first that coord exists, is sorted and 1d
# Then compute mean over extra dims that were not in import for
out.coords[dim] = blocked.coords[dim].mean(set(blocked.coords[dim].dims) - set(image.dims))Note that here, to get the correct mean, the dimension we should mean over should be the inner dim. Thankfully, blockify already does the folding in the correct order (putting the dim first in out.fold({dim: -1, ...)), so we are fine here.
Bin edge coords:
I think in this case we just need to use a stride to sample the original coords and add them to the output?
So could be
for dim, stride in sizes.items():
# check that coord exists and is bin edges
out.coords[dim] = image.coords[dim][dim, ::stride]You probably have to combine the 2 cases above inside the same for loop, but that should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood the stride part but I didn't understand the first part about the midpoint coord...
out.coords[dim] won't work unless the coordinate name is same as the dimension name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out.coords[dim] won't work unless the coordinate name is same as the dimension name.
True, so instead we need to loop over all coords and check which ones have dims that have been folded.
I didn't understand the first part about the midpoint coord...
Which part was not clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an implementation that seems to work (tested on a couple of simple cases):
def resample(
image: sc.Variable | sc.DataArray,
sizes: dict[str, int],
method: str | Callable = 'sum',
) -> sc.Variable | sc.DataArray:
blocked = blockify(image, sizes=sizes)
_method = getattr(sc, method) if isinstance(method, str) else method
out = _method(blocked, set(blocked.dims) - set(image.dims))
dim_set = set(sizes)
for name, coord in image.coords.items():
# Check that coord dim was folded, and that it can be kept
if (not (set(coord.dims) & dim_set)) or (coord.ndim > 1) or (not sc.issorted(coord, dim=coord.dim)):
continue
if image.coords.is_edges(name):
out.coords[name] = image.coords[name][::sizes[coord.dim]]
else:
out.coords[name] = blocked.coords[name].mean(set(blocked.coords[name].dims) - dim_set)
return outDoes this help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced that we should do it automatically though...
So you're suggesting that we should drop all non-1d coordinates even though they are not bin edges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think whether they are bin edges or not has any influence?
I was thinking this would add back in the coords that had been dropped during the folding and reducing (at least the ones that are 1d and sorted). But we should double-check that this is indeed the case.
If I fold and then sum for the x dimension, all coords that depend on the x dimension are dropped in the implementation that is currently on main. All coords that do not depend on x are preserved.
You can for example do
da = load_scitiff(siemens_star_path())["image"]
resampled = img.tools.resample(da, sizes={'x': 2})and resampled will have coords y and t.
With the main implementation, resampled is
<scipp.DataArray>
Dimensions: Sizes[t:3, y:1280, x:640, ]
Coordinates:
* t int64 [s] (t) [0, 1, 2]
* y int64 [dimensionless] (y) [0, 1, ..., 1278, 1279]
Data:
float32 [counts] (t, y, x) [510, 510, ..., 510, 510]
while with the new one I proposed here, it is now
<scipp.DataArray>
Dimensions: Sizes[t:3, y:1280, x:640, ]
Coordinates:
* t int64 [s] (t) [0, 1, 2]
* x float64 [dimensionless] (x) [0.5, 2.5, ..., 1276.5, 1278.5]
* y int64 [dimensionless] (y) [0, 1, ..., 1278, 1279]
Data:
float32 [counts] (t, y, x) [510, 510, ..., 510, 510]
Was this what you wanted resample to produce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think whether they are bin edges or not has any influence?
I thought users might want to keep multi-dimensional coordinates as well after resampling the image.
But bin-edges are more difficult to handle if they are bin-edges in one dimension and non-bin-edges in the other dimension so that's why I wanted to ignore them if they are multi-dimensional.
But I don't have a strong use-case of the multi-dimensional coordinate so we can ignore all multi-dimensional coordinates...
And I thought it'd be waste of time to automatically check all coordinates in the folded dimension if they are sorted but if it's only 1 dimensinoal it's probably fine....?
Was this what you wanted resample to produce?
Yes. Wasn't that what mine did too...? Just had to specify which coordinate to keep.
src/ess/imaging/tools/analysis.py
Outdated
| image: sc.Variable | sc.DataArray, | ||
| sizes: dict[str, int], | ||
| method: str | Callable = 'sum', | ||
| keep: tuple[str, ...] = ('position',), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the argument at all or just the default argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant remove the keep arg, as in resample.
src/ess/imaging/tools/analysis.py
Outdated
| ) | ||
| block_sizes[dim] = image.sizes[dim] // size | ||
| return resample(image, sizes=block_sizes, method=method) | ||
| return resample(image, sizes=block_sizes, method=method, keep=keep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, maybe we should also add to our docs the option of using image.rebin() if the image has bin edges, which can be used to create a new image of any size, not just sizes divisible by the original size?
Or maybe we can add a rebin function in this module that would auto-add bin edges if the input doesn't have them?
One could imagine this would just be inside resize, but since rebin can really only safely be used for the sum operation and not other ops like mean, I think it might be good to keep them as separate functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For x, y resizing, it's more intuitive to have these resize/resample function separately I think...
But tof or event_time_offset coordinate will be very likely to be bin-edges so rebin will make more sense for them. So maybe it's just worth mentioning that along t timension, use rebin instead of resize...?
|
@nvaytet I had to add couple more conditions in the if statements to cover some cases, like when resample sizes are just 1... doing nothing on that dimension. Other than that, I made it work like you said above. |
src/ess/imaging/tools/analysis.py
Outdated
|
|
||
| for name in _dropped_cnames: | ||
| coord = image.coords[name] | ||
| if _sizes_not_changed(coord, sizes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm can't we return earlier? In the case of a 1 in the sizes, we shouldn't need to fold and reduce either?
Can we instead remove any 1s found in the sizes at the start of the function?
Then, if sizes ends up being empty, just return the original image (or maybe a copy of it?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'll update it in that way.
Co-authored-by: Neil Vaytet <39047984+nvaytet@users.noreply.github.com>
| # Filter the resample sizes first. | ||
| sizes = {dim: size for dim, size in sizes.items() if size != 1} | ||
| if not sizes: | ||
| return image.copy() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the sizes are filtered here so we don't have to check if the coordinates are folded later.
src/ess/imaging/tools/analysis.py
Outdated
| # Filter the resample sizes first. | ||
| sizes = {dim: size for dim, size in sizes.items() if size != 1} | ||
| if not sizes: | ||
| return image.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should return a shallow copy instead? (using deep=False)
This is potentially expensive if the data is large, for an operation which should do nothing?
Fixes #128
I limited the scope of coordinate handling to keep it simple.
We can always handle the coordinates separately in more complicated situations if needed.
@nvaytet can you please have a look...?