Replace pickle with JSON in Auto3DSeg algo serialization #8695
Conversation
📝 Walkthrough

The PR replaces pickle-based Algo serialization in Auto3DSeg with JSON-based serialization.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @monai/auto3dseg/utils.py:
- Around line 474-494: The loop over paths_to_try manipulates sys.path
non-atomically (sys.path.insert/remove) around ConfigParser/
parser.get_parsed_content which can race if concurrent; fix by introducing a
module-level lock (e.g., SYS_PATH_LOCK) and acquire it before any sys.path
modification and release it after cleanup, wrapping the path insert, parser
instantiation (ConfigParser and parser.get_parsed_content) and the removal in a
try/finally so removal always happens and the lock is released; reference the
paths_to_try loop, sys.path.insert/sys.path.remove, ConfigParser,
parser.get_parsed_content, and used_template_path when applying the lock.
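As a sketch, the fix described in the prompt above amounts to something like the following. The names `SYS_PATH_LOCK` and `instantiate_with_path`, and the callable standing in for the `ConfigParser`/`get_parsed_content` calls, are assumptions for illustration, not the repository's actual code:

```python
import sys
import threading
from typing import Any, Callable, Optional

# Hypothetical module-level lock (name taken from the prompt above).
SYS_PATH_LOCK = threading.Lock()


def instantiate_with_path(path: Optional[str], instantiate: Callable[[], Any]) -> Any:
    """Insert ``path`` into sys.path, run ``instantiate``, and always clean up.

    ``instantiate`` stands in for the ConfigParser / get_parsed_content calls.
    """
    with SYS_PATH_LOCK:  # serialize all paired sys.path mutations
        inserted = False
        try:
            if path and path not in sys.path:
                sys.path.insert(0, path)
                inserted = True
            return instantiate()
        finally:
            # Removal happens even if instantiate() raises, so sys.path
            # is never left polluted by a failed attempt.
            if inserted:
                sys.path.remove(path)
```

The `try/finally` guarantees the removal the reviewer asked for, and holding the lock across the insert/instantiate/remove sequence makes the pairing atomic with respect to other threads using the same helper.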
🧹 Nitpick comments (10)

monai/apps/auto3dseg/utils.py (1)

67-70: Minor readability nit: condition can be simplified. The `or not only_trained` is a double negative. Consider inverting for clarity, though the current logic is correct.

monai/auto3dseg/utils.py (9)

281-304: LGTM with minor observation. Good coverage of common types. The fallback to `str(value)` on line 304 silently converts unknown types; consider logging a warning for unexpected types to aid debugging.
307-312: Missing docstring parameter/return documentation. Per coding guidelines, docstrings should describe parameters and return values.

Suggested docstring:

```diff
 def _add_path_with_parent(paths: list[str], path: str | None) -> None:
-    """Add a path and its parent directory to the list if the path is a valid directory."""
+    """
+    Add a path and its parent directory to the list if the path is a valid directory.
+
+    Args:
+        paths: List to append paths to (modified in-place).
+        path: Directory path to add; skipped if None or not a valid directory.
+    """
```
327-333: Hardcoded attribute list may become stale. Consider defining `SERIALIZABLE_ATTRS` as a module-level constant or documenting why these specific attributes are serialized; that would be easier to maintain and extend.
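One way to realize this suggestion, as a sketch (the constant name `SERIALIZABLE_ATTRS` comes from the review comment, and `collect_state` is a hypothetical helper mirroring the loop in `algo_to_json`, not repository code):

```python
# Hoist the hardcoded attribute list out of algo_to_json into one documented,
# module-level constant so it can be maintained and extended in a single place.
SERIALIZABLE_ATTRS: tuple = (
    "template_path",
    "data_stats_files",
    "data_list_file",
    "mlflow_tracking_uri",
    "mlflow_experiment_name",
    "output_path",
    "name",
    "best_metric",
    "fill_records",
    "device_setting",
)


def collect_state(algo: object) -> dict:
    """Gather only the attributes actually present on ``algo``."""
    return {attr: getattr(algo, attr) for attr in SERIALIZABLE_ATTRS if hasattr(algo, attr)}
```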
346-348: Missing encoding parameter for file write. Explicit `encoding="utf-8"` is recommended for cross-platform consistency.

Fix:

```diff
-    with open(json_filename, "w") as f:
+    with open(json_filename, "w", encoding="utf-8") as f:
```
414-414: Unused `kwargs` parameter. Static analysis (ARG001) notes `kwargs` is unused. The docstring says "reserved for future use", which is acceptable, but consider the `_` prefix convention (`**_kwargs`) to suppress linter warnings.
446-447: Missing encoding parameter for file read. Same as the write path: explicit `encoding="utf-8"` is recommended.

Fix:

```diff
-    with open(filename) as f:
+    with open(filename, encoding="utf-8") as f:
```
449-450: Consider `TypeError` for type validation. Static analysis (TRY004) suggests `TypeError` over `ValueError` for invalid types. Minor pedantic issue.
479-481: Simplify dictionary access. Static analysis (RUF019) notes an unnecessary key check. Use `state.get("template_path")` instead.

Fix:

```diff
     algo_config: dict[str, Any] = {"_target_": target}
-    if "template_path" in state and state["template_path"]:
-        algo_config["template_path"] = state["template_path"]
+    if state.get("template_path"):
+        algo_config["template_path"] = state["template_path"]
```
499-502: State restoration silently skips unknown attributes. If the JSON contains an attribute not present on the algo object, it's silently ignored. This is likely intentional for forward compatibility, but logging a debug message would help troubleshoot version mismatches.
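A sketch of the suggested debug logging during state restoration (`restore_state` and its shape are assumptions for illustration; the real loop lives inside `algo_from_json`):

```python
import logging

logger = logging.getLogger(__name__)


def restore_state(algo: object, state: dict) -> None:
    """Restore known attributes from ``state``; log unknown keys instead of silently dropping them."""
    for attr, value in state.items():
        if hasattr(algo, attr):
            setattr(algo, attr, value)
        else:
            # Unknown keys are tolerated for forward compatibility, but a debug
            # message makes writer/reader version mismatches easier to diagnose.
            logger.debug("Skipping unknown serialized attribute %r", attr)
```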
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)

- monai/apps/auto3dseg/utils.py
- monai/auto3dseg/__init__.py
- monai/auto3dseg/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.
Files:
- monai/apps/auto3dseg/utils.py
- monai/auto3dseg/__init__.py
- monai/auto3dseg/utils.py
🧬 Code graph analysis (2)
monai/apps/auto3dseg/utils.py (2)
- monai/auto3dseg/utils.py (2): algo_from_json (414-510), algo_to_json (315-350)
- monai/utils/enums.py (1): AlgoKeys (687-699)

monai/auto3dseg/__init__.py (1)
- monai/auto3dseg/utils.py (3): algo_from_json (414-510), algo_from_pickle (669-670), algo_to_json (315-350)
🪛 Ruff (0.14.10)
monai/auto3dseg/utils.py
384-384: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
391-391: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
414-414: Unused function argument: kwargs
(ARG001)
450-450: Prefer TypeError exception for invalid type
(TRY004)
450-450: Avoid specifying long messages outside the exception class
(TRY003)
455-455: Avoid specifying long messages outside the exception class
(TRY003)
480-480: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
497-497: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: min-dep-pytorch (2.7.1)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: min-dep-pytorch (2.8.0)
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-os (ubuntu-latest)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: min-dep-os (windows-latest)
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: quick-py3 (windows-latest)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: build-docs
- GitHub Check: packaging
- GitHub Check: flake8-py3 (pytype)
- GitHub Check: flake8-py3 (codeformat)
🔇 Additional comments (7)

monai/apps/auto3dseg/utils.py (3)

17-17: LGTM! Import updated to use new JSON-based functions.

45-56: LGTM! Clean preference logic: JSON first, pickle fallback. Good migration strategy.

75-84: LGTM! Docstring and function body updated to use JSON serialization.

monai/auto3dseg/utils.py (3)

353-411: Legacy pickle loader retained for backward compatibility. Static analysis flags pickle security (S301). Acceptable here since it's deprecated legacy support with clear warnings. No action required beyond eventual removal.

504-504: `used_template_path` overwrites state-restored `template_path`. Lines 501-502 restore `template_path` from state, but line 504 immediately overwrites it with `used_template_path` (which could be different). Verify this is the intended behavior.

662-669: LGTM! Clean deprecated aliases delegating to new JSON functions.

monai/auto3dseg/__init__.py (1)

27-39: LGTM! New JSON functions properly exported alongside deprecated pickle aliases.
```python
for path in paths_to_try:
    try:
        if path and path not in sys.path:
            sys.path.insert(0, path)

        algo_config: dict[str, Any] = {"_target_": target}
        if "template_path" in state and state["template_path"]:
            algo_config["template_path"] = state["template_path"]

        parser = ConfigParser(algo_config)
        algo = parser.get_parsed_content()
        used_template_path = path

        if path and path in sys.path:
            sys.path.remove(path)
        break
    except (ModuleNotFoundError, ImportError) as e:
        if path and path in sys.path:
            sys.path.remove(path)
        logging.debug(f"Failed to instantiate {target} with path {path}: {e}")
        continue
```
🧩 Analysis chain
🏁 Scripts executed (repository: Project-MONAI/MONAI; outputs omitted):

```shell
# First, let's see the function containing this code
sed -n '450,500p' monai/auto3dseg/utils.py

# Check what function this is part of
grep -n "^def\|^class" monai/auto3dseg/utils.py | grep -B1 "474"

# Search for calls to algo_from_json or algo_from_pickle
rg "algo_from_json|algo_from_pickle" --type py -C 2

# Check if there's any threading or async pattern in this file
rg "Thread|threading|asyncio|concurrent|Executor" monai/auto3dseg/utils.py

# Check if the function is exposed in __init__.py or used in tests
rg "algo_from_json|algo_from_pickle" --type py | head -20

# Look for tests of algo_from_json
find . -path ./build -prune -o -name "*test*.py" -type f -exec grep -l "algo_from_json\|algo_from_pickle" {} \;

# Check the context of calls in hpo_gen.py to see if concurrent
sed -n '1,100p' monai/apps/auto3dseg/hpo_gen.py | head -60

# Get the full context of algo_from_json and algo_from_pickle function signatures
sed -n '420,510p' monai/auto3dseg/utils.py
```
sys.path manipulation lacks thread-safety guarantees, though current usage is sequential.

While `sys.path.insert()` and `sys.path.remove()` are individually atomic due to the GIL, the paired operations aren't atomic together and could race if called concurrently. Current callers invoke this sequentially during initialization (in `hpo_gen.py`'s `__init__` and the iteration in `utils.py`), so the practical risk is low. Consider a lock or context manager if concurrent calls are anticipated.
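One way to package that suggestion is a small context manager (a sketch under the reviewer's framing; `temp_sys_path` and `_PATH_LOCK` are hypothetical names, not existing MONAI helpers):

```python
import sys
import threading
from contextlib import contextmanager

_PATH_LOCK = threading.Lock()  # hypothetical guard for the paired insert/remove


@contextmanager
def temp_sys_path(path):
    """Temporarily prepend ``path`` to sys.path, restoring it on exit, even on error."""
    with _PATH_LOCK:
        inserted = bool(path) and path not in sys.path
        if inserted:
            sys.path.insert(0, path)
        try:
            yield
        finally:
            if inserted:
                sys.path.remove(path)
```

Call sites would then read `with temp_sys_path(path): algo = parser.get_parsed_content()`, with cleanup guaranteed by the `finally` block regardless of import failures.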
🧰 Tools
🪛 Ruff (0.14.10)
480-480: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
🤖 Prompt for AI Agents
In @monai/auto3dseg/utils.py around lines 474 - 494, The loop over paths_to_try
manipulates sys.path non-atomically (sys.path.insert/remove) around
ConfigParser/ parser.get_parsed_content which can race if concurrent; fix by
introducing a module-level lock (e.g., SYS_PATH_LOCK) and acquire it before any
sys.path modification and release it after cleanup, wrapping the path insert,
parser instantiation (ConfigParser and parser.get_parsed_content) and the
removal in a try/finally so removal always happens and the lock is released;
reference the paths_to_try loop, sys.path.insert/sys.path.remove, ConfigParser,
parser.get_parsed_content, and used_template_path when applying the lock.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@monai/auto3dseg/utils.py`:
- Around line 482-501: The current loop that tries paths_to_try inserts a path
into sys.path but only removes it on ImportError; change the logic in the for
loop around the insertion and parser.get_parsed_content() to guarantee cleanup
by using a try/finally: set a local inserted flag when you do sys.path.insert(0,
path), perform parser = ConfigParser(algo_config) and algo =
parser.get_parsed_content() in the try block, and in the finally block remove
the path from sys.path if inserted (and set used_template_path only after
success); ensure exceptions are either re-raised or handled the same way as the
previous ImportError branch so sys.path is never left polluted.
- Around line 395-407: The loop over template_paths_candidates currently inserts
p into sys.path but only removes it on ModuleNotFoundError or after successful
load; to guarantee cleanup wrap the insertion and pickle.loads(algo_bytes) in a
try/finally: set an inserted flag when you add p to sys.path, call
pickle.loads(algo_bytes) in the try block, and in the finally block remove p
from sys.path if inserted; keep the existing ModuleNotFoundError handling logic
but avoid leaving sys.path polluted for any other exception and only break the
loop after a successful load (i.e., after pickle.loads completes).
- Around line 315-358: The algo_meta_data passed into algo_to_json may contain
non-JSON-native types (e.g., numpy types) and must be converted before dumping;
update algo_to_json to apply _make_json_serializable to each value in
algo_meta_data (similar to how state is built) and use the sanitized values when
constructing the data dict (keep the existing keys: "_target_", "_state_",
"template_path", plus the serialized algo_meta_data). Ensure you do not change
the keys, only transform the values via _make_json_serializable before creating
json_filename and calling json.dump.
- Around line 281-313: The dict branch in _make_json_serializable fails when
keys are non-strings; change the dict comprehension to coerce keys to strings
(e.g., {str(k): _make_json_serializable(v) for k, v in value.items()}) so
json.dump() won't error on tuple/Path keys, and update both
_make_json_serializable and _add_path_with_parent docstrings to full
Google-style sections (Args, Returns, and Raises if applicable) describing
parameter types, return values, and any exceptions or side effects; keep the
existing behavior for arrays, numpy types, torch.Tensor, Path-like objects, and
the fallback to str(value).
🧹 Nitpick comments (1)
monai/auto3dseg/utils.py (1)
670-678: Add docstrings for deprecated alias functions. These aliases lack docstrings. Add short Google-style docstrings for API completeness, as per coding guidelines.
💡 Proposed fix

```diff
 @deprecated(since="1.6", msg_suffix="Use algo_to_json instead.")
 def algo_to_pickle(algo: Algo, template_path: PathLike | None = None, **algo_meta_data: Any) -> str:
+    """Deprecated alias for algo_to_json."""
     return algo_to_json(algo, template_path, **algo_meta_data)
@@
 @deprecated(since="1.6", msg_suffix="Use algo_from_json instead.")
 def algo_from_pickle(filename: str, template_path: PathLike | None = None, **kwargs: Any) -> Any:
+    """Deprecated alias for algo_from_json."""
     return algo_from_json(filename, template_path, **kwargs)
```
```diff
+def _make_json_serializable(value: Any) -> Any:
     """
-    Export the Algo object to pickle file.
-
-    Args:
-        algo: Algo-like object.
-        template_path: a str path that is needed to be added to the sys.path to instantiate the class.
-        algo_meta_data: additional keyword to save into the dictionary, for example, model training info
-            such as acc/best_metrics
+    Convert a value to a JSON-serializable type.
 
-    Returns:
-        filename of the pickled Algo object
+    Handles numpy arrays, Path objects, torch tensors, and other common types.
     """
-    data = {"algo_bytes": pickle.dumps(algo), "template_path": str(template_path)}
-    pkl_filename = os.path.join(algo.get_output_path(), "algo_object.pkl")
-    for k, v in algo_meta_data.items():
-        data.update({k: v})
-    data_bytes = pickle.dumps(data)
-    with open(pkl_filename, "wb") as f_pi:
-        f_pi.write(data_bytes)
-    return pkl_filename
-
-
-def algo_from_pickle(pkl_filename: str, template_path: PathLike | None = None, **kwargs: Any) -> Any:
+    if value is None:
+        return None
+    if isinstance(value, (str, int, float, bool)):
+        return value
+    if isinstance(value, (list, tuple)):
+        return [_make_json_serializable(v) for v in value]
+    if isinstance(value, dict):
+        return {k: _make_json_serializable(v) for k, v in value.items()}
+    if hasattr(value, "__fspath__"):  # Path-like objects
+        return str(value)
+    if isinstance(value, np.ndarray):
+        return value.tolist()
+    if isinstance(value, (np.integer, np.floating)):
+        return value.item()
+    if isinstance(value, torch.Tensor):
+        return value.detach().cpu().numpy().tolist()
+    # Fallback to string representation
+    return str(value)
+
+
+def _add_path_with_parent(paths: list[str], path: str | None) -> None:
+    """Add a path and its parent directory to the list if the path is a valid directory."""
+    if path and os.path.isdir(str(path)):
+        abs_path = os.path.abspath(str(path))
+        paths.append(abs_path)
+        paths.append(os.path.abspath(os.path.join(abs_path, "..")))
```
🧩 Analysis chain
🏁 Scripts executed (repository: Project-MONAI/MONAI; outputs omitted):

```shell
# First, locate and read the file
fd -t f "utils.py" | grep auto3dseg

# Read the specific lines from the file
sed -n '281,313p' monai/auto3dseg/utils.py

# Also check surrounding context to understand the full functions
sed -n '270,325p' monai/auto3dseg/utils.py

# Check if there are any tests for these functions
fd -t f test | xargs grep -l "_make_json_serializable\|_add_path_with_parent" 2>/dev/null | head -5

# Let's also verify the imports at the top of the file
head -30 monai/auto3dseg/utils.py

# Let me verify json.dump behavior with non-string keys by testing
python3 << 'EOF'
import json

# Test with non-string keys
test_dict = {("a", "b"): "value", 1: "another"}
try:
    json.dumps(test_dict)
    print("No error with non-string keys")
except TypeError as e:
    print(f"Error with non-string keys: {e}")

# Test with converted keys
converted_dict = {str(k): "value" for k in test_dict.keys()}
try:
    result = json.dumps(converted_dict)
    print(f"Success with converted keys: {result}")
except TypeError as e:
    print(f"Error even with converted keys: {e}")
EOF

# Check if _make_json_serializable is used with json.dump/dumps anywhere
rg -A 5 "_make_json_serializable" monai/auto3dseg/utils.py | head -40

# Check the broader context of how this function is used
rg -B 3 -A 3 "json.dump" monai/auto3dseg/utils.py

# Check function signatures to understand what arguments they expect
rg -B 2 "def algo_to_json" monai/auto3dseg/utils.py -A 10

# Verify json.dump behavior with non-string dict keys
python3 << 'EOF'
import json

# Test case: dict with non-string key
test_data = {("path", "key"): "value", "normal": "data"}
try:
    json.dumps(test_data)
    print("JSON serialization succeeded")
except TypeError as e:
    print(f"JSON serialization failed: {e}")

# Test the fix
converted = {str(k): v for k, v in test_data.items()}
try:
    result = json.dumps(converted)
    print(f"Fixed version works: {result}")
except TypeError as e:
    print(f"Fixed version failed: {e}")
EOF

# Search for tests related to these functions
find . -name "*test*" -type f -exec grep -l "_make_json_serializable\|algo_to_json" {} \; 2>/dev/null | head -5

# Check if there are any existing tests in the auto3dseg module
fd -t f "test.*\.py" | grep auto3dseg | head -5

# Look for imports used in algo_to_json function to understand full context
sed -n '330,380p' monai/auto3dseg/utils.py
```
Handle non-string dict keys and expand docstrings.

`json.dump()` fails with non-string dict keys (e.g., tuple, Path). Convert keys to `str` in the dict comprehension. Additionally, add full Google-style Args/Returns (and Raises if applicable) sections to both `_make_json_serializable` and `_add_path_with_parent` per coding guidelines.

💡 Proposed fix

```diff
-    if isinstance(value, dict):
-        return {k: _make_json_serializable(v) for k, v in value.items()}
+    if isinstance(value, dict):
+        return {str(k): _make_json_serializable(v) for k, v in value.items()}
```

🤖 Prompt for AI Agents
In `@monai/auto3dseg/utils.py` around lines 281 - 313, The dict branch in
_make_json_serializable fails when keys are non-strings; change the dict
comprehension to coerce keys to strings (e.g., {str(k):
_make_json_serializable(v) for k, v in value.items()}) so json.dump() won't
error on tuple/Path keys, and update both _make_json_serializable and
_add_path_with_parent docstrings to full Google-style sections (Args, Returns,
and Raises if applicable) describing parameter types, return values, and any
exceptions or side effects; keep the existing behavior for arrays, numpy types,
torch.Tensor, Path-like objects, and the fallback to str(value).
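The failure mode and the fix can be reproduced in isolation (a standalone demo, independent of MONAI):

```python
import json

# A metadata dict whose keys are not all strings (tuple keys mimic the problem case).
data = {("a", "b"): "value", "normal": "data"}

try:
    json.dumps(data)
    serialized = True
except TypeError:
    # json rejects tuple keys: "keys must be str, int, float, bool or None"
    serialized = False

# Coercing keys to str, as the proposed fix does, makes the dict serializable.
fixed = {str(k): v for k, v in data.items()}
encoded = json.dumps(fixed)
```

Note that the tuple key round-trips as its string representation (`"('a', 'b')"`), which is lossy but sufficient for diagnostic metadata.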
```diff
+def algo_to_json(algo: Algo, template_path: PathLike | None = None, **algo_meta_data: Any) -> str:
     """
-    Import the Algo object from a pickle file.
+    Export the Algo object to a JSON file (pickle-free serialization).
 
     Args:
-        pkl_filename: the name of the pickle file.
-        template_path: a folder containing files to instantiate the Algo. Besides the `template_path`,
-            this function will also attempt to use the `template_path` saved in the pickle file and a directory
-            named `algorithm_templates` in the parent folder of the folder containing the pickle file.
+        algo: Algo-like object (typically BundleAlgo or subclass).
+        template_path: path needed for sys.path setup when loading custom Algo classes.
+        algo_meta_data: additional metadata to save (e.g., best_metric, score).
 
     Returns:
-        algo: the Algo object saved in the pickle file.
-        algo_meta_data: additional keyword saved in the pickle file, for example, acc/best_metrics.
-
-    Raises:
-        ValueError if the pkl_filename does not contain a dict, or the dict does not contain `algo_bytes`.
-        ModuleNotFoundError if it is unable to instantiate the Algo class.
+        Filename of the saved Algo object (algo_object.json).
     """
+    state = {}
+    for attr in [
+        "template_path",
+        "data_stats_files",
+        "data_list_file",
+        "mlflow_tracking_uri",
+        "mlflow_experiment_name",
+        "output_path",
+        "name",
+        "best_metric",
+        "fill_records",
+        "device_setting",
+    ]:
+        if hasattr(algo, attr):
+            state[attr] = _make_json_serializable(getattr(algo, attr))
+
+    # Build target string for dynamic class instantiation
+    cls = algo.__class__
+    target = f"{cls.__module__}.{cls.__name__}"
+
+    data: dict[str, Any] = {
+        "_target_": target,
+        "_state_": state,
+        "template_path": str(template_path) if template_path else None,
+        **algo_meta_data,
+    }
+
+    json_filename = os.path.join(algo.get_output_path(), "algo_object.json")
+    with open(json_filename, "w") as f:
+        json.dump(data, f, separators=(",", ":"))
+
+    return json_filename
```
🧩 Analysis chain
🏁 Scripts executed (repository: Project-MONAI/MONAI; outputs omitted):

```shell
# First, let's locate and examine the utils.py file to understand the _make_json_serializable function
fd "utils.py" monai/auto3dseg --type f

# Search for the _make_json_serializable function
rg "_make_json_serializable" monai/auto3dseg/

# Let's examine the algo_to_json function and surrounding context
cat -n monai/auto3dseg/utils.py | sed -n '300,380p'

# Let's see the full _make_json_serializable function
cat -n monai/auto3dseg/utils.py | sed -n '280,305p'

# Check if there are any test cases for algo_to_json that might use numpy data
rg "algo_to_json" monai/auto3dseg/ -A 5 -B 2

# Let's check if json.dump would fail with numpy scalars/tensors
python3 << 'EOF'
import json
import numpy as np

# Test 1: numpy scalar
try:
    data = {"value": np.float32(1.5)}
    json.dumps(data)
    print("Test 1 PASS: numpy scalar serialized directly")
except TypeError as e:
    print(f"Test 1 FAIL: numpy scalar causes TypeError: {e}")

# Test 2: torch tensor
try:
    import torch
    data = {"value": torch.tensor(1.5)}
    json.dumps(data)
    print("Test 2 PASS: torch tensor serialized directly")
except TypeError as e:
    print(f"Test 2 FAIL: torch tensor causes TypeError: {e}")

# Test 3: nested structure with numpy
try:
    data = {"nested": {"value": np.int64(42)}}
    json.dumps(data)
    print("Test 3 PASS: nested numpy scalar serialized directly")
except TypeError as e:
    print(f"Test 3 FAIL: nested numpy scalar causes TypeError: {e}")
EOF
```
Serialize algo_meta_data before dumping to JSON.

`algo_meta_data` can contain numpy scalars/tensors and will raise `TypeError` when passed to `json.dump`. Apply `_make_json_serializable` to each value, matching the treatment of the state attributes above.

Proposed fix

```diff
-    data: dict[str, Any] = {
+    meta_data = {k: _make_json_serializable(v) for k, v in algo_meta_data.items()}
+    data: dict[str, Any] = {
         "_target_": target,
         "_state_": state,
         "template_path": str(template_path) if template_path else None,
-        **algo_meta_data,
+        **meta_data,
     }
```

🤖 Prompt for AI Agents
In `@monai/auto3dseg/utils.py` around lines 315 - 358, The algo_meta_data passed
into algo_to_json may contain non-JSON-native types (e.g., numpy types) and must
be converted before dumping; update algo_to_json to apply
_make_json_serializable to each value in algo_meta_data (similar to how state is
built) and use the sanitized values when constructing the data dict (keep the
existing keys: "_target_", "_state_", "template_path", plus the serialized
algo_meta_data). Ensure you do not change the keys, only transform the values
via _make_json_serializable before creating json_filename and calling json.dump.
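A standalone illustration of why sanitizing matters. The `to_jsonable` helper below is a reduced stand-in for `_make_json_serializable` (numpy-only, hypothetical name), not the MONAI function itself:

```python
import json

import numpy as np


def to_jsonable(value):
    """Reduced stand-in for _make_json_serializable: numpy scalars/arrays only."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, (np.integer, np.floating)):
        return value.item()
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    return value


# Metadata of the kind algo_meta_data might carry after training.
meta = {"best_metric": np.float32(0.91), "epochs": np.int64(100)}

# Dumping raw numpy scalars raises TypeError; sanitizing each value first works.
sanitized = {k: to_jsonable(v) for k, v in meta.items()}
encoded = json.dumps(sanitized)
```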
b106e24 to 5d22044
- Rename algo_to_pickle() to algo_to_json():
  - Serialize algo state as JSON with _target_ for class reconstruction
  - Uses MONAI's ConfigParser pattern for dynamic instantiation
  - Truly pickle-free serialization (for algo metadata; model weights still use torch.save)
- Add _make_json_serializable() to handle numpy arrays, tensors, Path objects
- Rename algo_from_pickle() to algo_from_json()
- Add deprecated aliases for backward compatibility
- Update import_bundle_algo_history() to prefer .json files

Fixes Project-MONAI#8586

Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
5d22044 to 5d39780
Summary

- Replaces `algo_to_pickle()` with `algo_to_json()` using compact JSON + MONAI's existing `_target_` ConfigParser pattern for truly pickle-free algo metadata serialization
- Adds a `_make_json_serializable()` helper to handle numpy arrays, tensors, and Path objects
- `algo_from_json()` can still load legacy `.pkl` files (with a deprecation warning)
- Keeps `algo_to_pickle()`/`algo_from_pickle()` as deprecated aliases

Note: Model weights still use `torch.save` separately—this PR focuses on the algo object serialization.

Test plan

- Code format checks pass (`./runtests.sh --codeformat`, `./runtests.sh --ruff`)
- Unit tests cover the `_make_json_serializable()` and `_add_path_with_parent()` helpers
- `tests/apps/test_auto3dseg_bundlegen.py` (requires optional deps)
./runtests.sh --codeformat,./runtests.sh --ruff)_make_json_serializable()and_add_path_with_parent()helperstests/apps/test_auto3dseg_bundlegen.py(requires optional deps)Fixes #8586