-
Notifications
You must be signed in to change notification settings - Fork 1
refactor(oocana): simplify __wrap_output_value with extracted helper methods #455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -344,87 +344,116 @@ def __store_ref(self, handle: str): | |||||||||||||||||||
| job_id=self.job_id, | ||||||||||||||||||||
| session_id=self.session_id, | ||||||||||||||||||||
| ) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| def __is_basic_type(self, value: Any) -> bool: | ||||||||||||||||||||
| return isinstance(value, (int, float, str, bool)) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| def __wrap_output_value(self, handle: str, value: Any): | ||||||||||||||||||||
|
|
||||||||||||||||||||
| def __is_dataframe_like(self, value: Any) -> bool: | ||||||||||||||||||||
| """Check if value is DataFrame-like by duck-typing. | ||||||||||||||||||||
| Supports pandas DataFrame and subclasses (GeoDataFrame, etc.) | ||||||||||||||||||||
| """ | ||||||||||||||||||||
| wrap the output value: | ||||||||||||||||||||
| if the value is a var handle, store it in the store and return the reference. | ||||||||||||||||||||
| if the value is a bin handle, store it in the store and return the reference. | ||||||||||||||||||||
| if the handle is not defined in the block outputs schema, raise an ValueError. | ||||||||||||||||||||
| otherwise, return the value. | ||||||||||||||||||||
| :param handle: the handle of the output | ||||||||||||||||||||
| :param value: the value of the output | ||||||||||||||||||||
| :return: the wrapped value | ||||||||||||||||||||
| return ( | ||||||||||||||||||||
| hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and | ||||||||||||||||||||
| hasattr(value, 'copy') and callable(getattr(value, 'copy', None)) | ||||||||||||||||||||
| ) | ||||||||||||||||||||
|
Comment on lines
+355
to
+358
|
||||||||||||||||||||
|
|
||||||||||||||||||||
| def __serialize_dataframe_for_cache(self, handle: str, value: Any) -> Optional[str]: | ||||||||||||||||||||
| """Serialize DataFrame to cache file in background thread. | ||||||||||||||||||||
| Returns the serialize_path if successful, None otherwise. | ||||||||||||||||||||
| """ | ||||||||||||||||||||
| from .serialization import compression_suffix, compression_options | ||||||||||||||||||||
| from .internal import string_hash | ||||||||||||||||||||
|
||||||||||||||||||||
| from .internal import string_hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
关键错误:导入了不存在的符号 string_hash。
string_hash 函数已在本文件第 22 行定义,但此处却尝试从 .internal 模块导入。这是导致 CI 流水线失败的直接原因。
🐛 修复导入错误
- from .internal import string_hash直接使用本文件中已定义的 string_hash 函数即可,无需导入。
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from .internal import string_hash |
🧰 Tools
🪛 GitHub Actions: layer
[error] 365-365: "string_hash" is unknown import symbol (reportAttributeAccessIssue)
🪛 GitHub Actions: pr
[error] 365-365: pyright: 'string_hash' is an unknown import symbol (reportAttributeAccessIssue).
🤖 Prompt for AI Agents
In `@oocana/oocana/context.py` at line 365, The import statement is pulling a
nonexistent symbol string_hash from .internal while string_hash is already
defined in this file (line ~22); remove the erroneous import ("from .internal
import string_hash") and ensure all references use the local string_hash
function (avoid re-import or shadowing), keeping any existing usages intact and
adjusting any tests or callers if they expected an external symbol.
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import threading statement is placed inside the function at line 366. While this is valid Python, it's unconventional. The import should ideally be at the module level with other imports, or if it must be local, it should come before the other imports from the package. The current placement between two from imports is unusual and could be confusing.
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to value.copy() at line 375 is outside the try-except block, but in the original code it was inside the try-except. If copy() raises an exception (e.g., MemoryError or AttributeError), it will now propagate up and potentially crash the serialization, whereas the original code would silently handle it. This breaks the original behavior where all exceptions during the caching attempt were silently ignored.
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handling has changed from catching a broad exception and passing silently to only catching IOError. The original code had except IOError as e: pass which captured the exception variable but didn't use it. The new code catches only IOError and returns None. However, other exceptions like AttributeError (if copy() or to_pickle() fail) will now propagate up instead of being silently ignored. This could be a breaking change if the original code relied on silently ignoring all exceptions during serialization.
| except IOError: | |
| except Exception: | |
| logging.getLogger(__name__).exception( | |
| "Failed to serialize DataFrame for cache for handle %s", handle | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
后台线程未被跟踪,可能导致数据丢失。
write_pickle 线程启动后立即返回 serialize_path,但没有任何机制确保线程完成。如果程序在序列化完成前退出,缓存文件可能不完整或损坏。
此外,当前仅捕获 IOError,但 value.copy() 或 to_pickle() 可能抛出其他异常(如 MemoryError、PicklingError)。
♻️ 建议:改进异常处理并考虑线程管理
try:
copy_value = value.copy()
def write_pickle():
- copy_value.to_pickle(serialize_path, compression=compression)
+ try:
+ copy_value.to_pickle(serialize_path, compression=compression)
+ except Exception:
+ pass # 后台序列化失败不应影响主流程
thread = threading.Thread(target=write_pickle)
thread.start()
return serialize_path
- except IOError:
+ except Exception:
return None🧰 Tools
🪛 Ruff (0.14.14)
[warning] 380-380: Consider moving this statement to an else block
(TRY300)
🤖 Prompt for AI Agents
In `@oocana/oocana/context.py` around lines 374 - 382, The background thread
started for write_pickle (which calls value.copy() and
copy_value.to_pickle(serialize_path, compression=...)) is not tracked and the
code only catches IOError, risking incomplete/corrupt files; change the logic so
the serialization is either performed synchronously or the thread is joined
before returning (e.g., create the thread via
threading.Thread(target=write_pickle), start it, then call thread.join() or
return a handle/future instead of immediately returning serialize_path), and
broaden exception handling to catch Exception as e around value.copy() and
to_pickle so you log/propagate the actual error (include the exception object)
rather than only catching IOError. Ensure you reference write_pickle,
value.copy(), to_pickle, serialize_path, and the threading.Thread when making
these changes.
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring for __wrap_output_value has been updated but is now less detailed than the original. The original docstring explained what happens when the handle is not defined in the block outputs schema (raises ValueError), and mentioned the return type ("return the wrapped value"). The new docstring omits these details. Consider adding back the information about the ValueError being raised and what the return value is.
| - other: return value as-is | |
| - other: return value as-is | |
| Raises: | |
| ValueError: If the given handle is not defined in the block outputs schema. | |
| Returns: | |
| The wrapped value for var/bin handles (as a VarValueDict or BinValueDict), | |
| or the original value if no wrapping is applied or if binary handling fails. |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling flow for non-bytes binary values has changed. The original code sent a warning and returned the original value directly. The new code raises a ValueError in __handle_bin_output(), which is then caught in __wrap_output_value() to return the original value. While the end result is the same, this approach uses exceptions for control flow, which is generally considered an anti-pattern. The original simpler approach of returning the value directly after warning was clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The redundant check
callable(getattr(value, 'to_pickle', None))afterhasattr(value, 'to_pickle')is unnecessary. Ifhasattr(value, 'to_pickle')returns True, thengetattr(value, 'to_pickle', None)will never be None (it will be the attribute itself). The callable check could be simplified to just check the attribute directly:callable(value.to_pickle). The same applies to the 'copy' check on line 357.