refactor(oocana): simplify __wrap_output_value with extracted helper methods #455
Conversation
- Extract `__is_dataframe_like()` for duck-typing DataFrame detection
- Extract `__serialize_dataframe_for_cache()` for cache serialization
- Extract `__handle_var_output()` for var handle processing
- Extract `__handle_bin_output()` for bin handle processing
- Reduce max indentation from 5 levels to 2 levels
- Improve code readability and testability
Overview: This change introduces new private helper methods for handling DataFrame-like outputs and caching, covering DataFrame detection, background serialization, and output handling.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 2 passed
Pull request overview
This PR refactors the `__wrap_output_value()` method to reduce nesting complexity by extracting four helper methods. The goal is to improve code readability and maintainability by splitting a 78-line method with 5 levels of nesting into focused helper methods with a maximum of 2 levels of indentation.
Changes:
- Extracted DataFrame detection logic into `__is_dataframe_like()` helper method
- Extracted DataFrame serialization into `__serialize_dataframe_for_cache()` helper method
- Extracted var output handling into `__handle_var_output()` helper method
- Extracted binary output handling into `__handle_bin_output()` helper method
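The PR body does not include the refactored code itself, but the review comments below make its overall shape clear. As a rough sketch (helper names are from the PR; the schema lookup and handle-type field are assumptions, not the actual oocana code):

```python
# A minimal sketch of the dispatch shape this PR describes. `outputs_def` and
# its "handle_type" field are hypothetical; the helper names come from the PR.
def __wrap_output_value(self, handle: str, value):
    output_def = self.outputs_def.get(handle)  # hypothetical schema lookup
    if output_def is None:
        raise ValueError(
            f"Output handle key: [{handle}] is not defined in Block outputs schema."
        )
    if output_def.get("handle_type") == "var":   # hypothetical field name
        return self.__handle_var_output(handle, value)
    if output_def.get("handle_type") == "bin":
        try:
            return self.__handle_bin_output(handle, value)
        except ValueError:
            return value  # the helper already sent a warning
    return value
```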
```python
    Returns the serialize_path if successful, None otherwise.
    """
    from .serialization import compression_suffix, compression_options
    from .internal import string_hash
```
Copilot AI · Jan 31, 2026
The import `from .internal import string_hash` will fail because `string_hash` is not defined in the `internal.py` module. The `string_hash` function is actually defined at module level in `context.py` itself (lines 22-26). Either move the function to `internal.py` or use the local `string_hash` function directly without importing it.
```diff
-from .internal import string_hash
```
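A minimal sketch of the fix, assuming `string_hash` is the module-level helper the comment describes; the hash body shown here is illustrative, not the real implementation:

```python
# context.py (sketch) -- string_hash already lives at module level, so methods
# in the same file can call it directly; no `.internal` import is needed.
import hashlib

def string_hash(s: str) -> str:
    # hypothetical stand-in for the real helper at context.py:22-26
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def cache_file_name(job_key: str, suffix: str) -> str:
    # callers simply reference the module-level function
    return string_hash(job_key) + suffix
```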
```python
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )
```
Copilot AI · Jan 31, 2026
The DataFrame detection logic has changed behavior. The original code checked `value.__class__.__name__ == 'DataFrame'`, which only matched objects with the exact class name "DataFrame". The new code uses duck-typing with `hasattr(value, 'to_pickle')` and `hasattr(value, 'copy')`, which will match any object with these methods (like GeoDataFrame, Dask DataFrame, etc.). While the PR description mentions supporting subclasses like GeoDataFrame, this is actually a behavioral change, not a pure refactoring. The new implementation is more flexible but differs from the original behavior.
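A small self-contained illustration of the widening described here; the two check bodies are taken from the old and new code, while the harness around them is invented for demonstration:

```python
import pandas as pd

class PickleableThing:
    """Not a DataFrame, but it satisfies the duck-typed check."""
    def to_pickle(self, path, **kwargs): ...
    def copy(self): return self

def old_check(value) -> bool:
    return value.__class__.__name__ == 'DataFrame'   # exact class name only

def new_check(value) -> bool:
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )

df = pd.DataFrame({"a": [1]})
print(old_check(df), new_check(df))                                 # True True
print(old_check(PickleableThing()), new_check(PickleableThing()))   # False True
```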
```python
# in __handle_bin_output:
try:
    with open(bin_file, "wb") as f:
        f.write(value)
except IOError as e:
    raise IOError(
        f"Output handle key: [{handle}] is defined as binary, but an error occurred while writing the file: {e}"
    )

# in __wrap_output_value:
    return self.__handle_bin_output(handle, value)
except ValueError:
    return value  # Already warned, return original value
```
Copilot AI · Jan 31, 2026
The error handling flow for non-bytes binary values has changed. The original code sent a warning and returned the original value directly. The new code raises a `ValueError` in `__handle_bin_output()`, which is then caught in `__wrap_output_value()` to return the original value. While the end result is the same, this approach uses exceptions for control flow, which is generally considered an anti-pattern. The original simpler approach of returning the value directly after warning was clearer.
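A sketch of the simpler flow this comment prefers, warning and returning early rather than raising `ValueError` for control flow; the method name mirrors the PR, but the path helper and reference wrapper are hypothetical:

```python
def __handle_bin_output(self, handle: str, value):
    if not isinstance(value, bytes):
        self.send_warning(
            f"Output handle key: [{handle}] is defined as binary, but the value is not bytes."
        )
        return value  # early return; the caller needs no try/except
    bin_file = self.__bin_file_path(handle)   # hypothetical path helper
    with open(bin_file, "wb") as f:
        f.write(value)
    return self.__wrap_bin_value(bin_file)    # hypothetical reference wrapper
```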
```python
        thread = threading.Thread(target=write_pickle)
        thread.start()
        return serialize_path
    except IOError:
```
Copilot AI · Jan 31, 2026
The exception handling has changed from catching a broad exception and passing silently to only catching `IOError`. The original code had `except IOError as e: pass`, which captured the exception variable but didn't use it. The new code catches only `IOError` and returns `None`. However, other exceptions like `AttributeError` (if `copy()` or `to_pickle()` fail) will now propagate up instead of being silently ignored. This could be a breaking change if the original code relied on silently ignoring all exceptions during serialization.
```diff
-        except IOError:
+        except Exception:
+            logging.getLogger(__name__).exception(
+                "Failed to serialize DataFrame for cache for handle %s", handle
+            )
```
```python
os.makedirs(os.path.dirname(serialize_path), exist_ok=True)

try:
    copy_value = value.copy()
```
Copilot AI · Jan 31, 2026
The call to `value.copy()` at line 375 is outside the try-except block, but in the original code it was inside the try-except. If `copy()` raises an exception (e.g., `MemoryError` or `AttributeError`), it will now propagate up and potentially crash the serialization, whereas the original code would silently handle it. This breaks the original behavior where all exceptions during the caching attempt were silently ignored.
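A sketch of the guarded arrangement the comment argues for, with `value.copy()` inside the same `try` as the pickling so a failing copy degrades to a cache miss; names and signature are assumptions:

```python
import threading

def serialize_dataframe_for_cache(value, serialize_path, compression):
    try:
        copy_value = value.copy()  # covered by the except below

        def write_pickle():
            copy_value.to_pickle(serialize_path, compression=compression)

        threading.Thread(target=write_pickle).start()
        return serialize_path
    except Exception:
        return None  # caching is best-effort; fall back to the no-cache path
```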
| """ | ||
| from .serialization import compression_suffix, compression_options | ||
| from .internal import string_hash | ||
| import threading |
Copilot AI · Jan 31, 2026
The `import threading` statement is placed inside the function at line 366. While this is valid Python, it's unconventional. The import should ideally be at the module level with other imports, or if it must be local, it should come before the other `from` imports from the package. The current placement between two `from` imports is unusual and could be confusing.
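For illustration, the conventional grouping if the imports must stay local (standard library first, then intra-package, following PEP 8's import grouping):

```python
def __serialize_dataframe_for_cache(self, handle, value, compression):
    import threading                                                     # stdlib first
    from .serialization import compression_suffix, compression_options   # then package imports
    ...
```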
```python
hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
```
Copilot AI · Jan 31, 2026
The redundant check `callable(getattr(value, 'to_pickle', None))` after `hasattr(value, 'to_pickle')` is unnecessary. If `hasattr(value, 'to_pickle')` returns True, then `getattr(value, 'to_pickle', None)` will never be None (it will be the attribute itself). The callable check could be simplified to just check the attribute directly: `callable(value.to_pickle)`. The same applies to the `'copy'` check on line 357.
```diff
-hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
-hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
+hasattr(value, 'to_pickle') and callable(value.to_pickle) and
+hasattr(value, 'copy') and callable(value.copy)
```
```python
- var handle: store in store and return reference
- bin handle: write to file and return reference
- other: return value as-is
```
Copilot AI · Jan 31, 2026
The docstring for `__wrap_output_value` has been updated but is now less detailed than the original. The original docstring explained what happens when the handle is not defined in the block outputs schema (raises `ValueError`), and mentioned the return type ("return the wrapped value"). The new docstring omits these details. Consider adding back the information about the `ValueError` being raised and what the return value is.
```diff
 - other: return value as-is
+
+Raises:
+    ValueError: If the given handle is not defined in the block outputs schema.
+
+Returns:
+    The wrapped value for var/bin handles (as a VarValueDict or BinValueDict),
+    or the original value if no wrapping is applied or if binary handling fails.
```
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@oocana/oocana/context.py`:
- Around line 374-382: The background thread started for write_pickle (which
calls value.copy() and copy_value.to_pickle(serialize_path, compression=...)) is
not tracked and the code only catches IOError, risking incomplete/corrupt files;
change the logic so the serialization is either performed synchronously or the
thread is joined before returning (e.g., create the thread via
threading.Thread(target=write_pickle), start it, then call thread.join() or
return a handle/future instead of immediately returning serialize_path), and
broaden exception handling to catch Exception as e around value.copy() and
to_pickle so you log/propagate the actual error (include the exception object)
rather than only catching IOError. Ensure you reference write_pickle,
value.copy(), to_pickle, serialize_path, and the threading.Thread when making
these changes.
- Line 365: The import statement is pulling a nonexistent symbol string_hash
from .internal while string_hash is already defined in this file (line ~22);
remove the erroneous import ("from .internal import string_hash") and ensure all
references use the local string_hash function (avoid re-import or shadowing),
keeping any existing usages intact and adjusting any tests or callers if they
expected an external symbol.
🧹 Nitpick comments (2)
oocana/oocana/context.py (2)
405-409: Consider raising `TypeError` instead of `ValueError` for this type check. By Python convention, `TypeError` signals an argument of the wrong type, while `ValueError` is for values that have the correct type but are invalid.

♻️ Use TypeError

```diff
 if not isinstance(value, bytes):
     self.send_warning(
         f"Output handle key: [{handle}] is defined as binary, but the value is not bytes."
     )
-    raise ValueError("Binary handle requires bytes value")
+    raise TypeError("Binary handle requires bytes value")
```

Note: if this is changed to `TypeError`, the `except ValueError` at line 454 of `__wrap_output_value` must be updated to `except TypeError` as well.
417-420: Preserve the exception chain when re-raising. Using `raise ... from e` keeps the original traceback, which makes debugging easier.

♻️ Use exception chaining

```diff
 try:
     with open(bin_file, "wb") as f:
         f.write(value)
 except IOError as e:
     raise IOError(
         f"Output handle key: [{handle}] is defined as binary, but an error occurred while writing the file: {e}"
-    )
+    ) from e
```
```python
    Returns the serialize_path if successful, None otherwise.
    """
    from .serialization import compression_suffix, compression_options
    from .internal import string_hash
```
Critical bug: this imports a nonexistent symbol, `string_hash`.

The `string_hash` function is already defined at line 22 of this file, yet this line tries to import it from the `.internal` module. This is the direct cause of the CI pipeline failure.

🐛 Fix the import error

```diff
-from .internal import string_hash
```

Use the `string_hash` function already defined in this file directly; no import is needed.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```diff
-from .internal import string_hash
```
🧰 Tools
🪛 GitHub Actions: layer
[error] 365-365: "string_hash" is unknown import symbol (reportAttributeAccessIssue)
🪛 GitHub Actions: pr
[error] 365-365: pyright: 'string_hash' is an unknown import symbol (reportAttributeAccessIssue).
```python
try:
    copy_value = value.copy()
    def write_pickle():
        copy_value.to_pickle(serialize_path, compression=compression)
    thread = threading.Thread(target=write_pickle)
    thread.start()
    return serialize_path
except IOError:
    return None
```
The background thread is not tracked, which can lead to data loss.

After the `write_pickle` thread starts, the method immediately returns `serialize_path`, but nothing guarantees the thread completes. If the program exits before serialization finishes, the cache file may be incomplete or corrupt.

In addition, only `IOError` is caught, yet `value.copy()` or `to_pickle()` may raise other exceptions (such as `MemoryError` or `PicklingError`).

♻️ Suggestion: improve exception handling and consider thread management
```diff
 try:
     copy_value = value.copy()
     def write_pickle():
-        copy_value.to_pickle(serialize_path, compression=compression)
+        try:
+            copy_value.to_pickle(serialize_path, compression=compression)
+        except Exception:
+            pass  # a failed background serialization must not affect the main flow
     thread = threading.Thread(target=write_pickle)
     thread.start()
     return serialize_path
-except IOError:
+except Exception:
     return None
```

🧰 Tools
🪛 Ruff (0.14.14)
[warning] 380-380: Consider moving this statement to an else block (TRY300)
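One possible mitigation, sketched below rather than taken from the PR: write to a temporary file and rename atomically, so a reader never observes a half-written cache file, and return the thread so callers can `join()` it before relying on `serialize_path`. All names here are illustrative:

```python
import os
import threading

def start_background_pickle(copy_value, serialize_path: str, compression) -> threading.Thread:
    tmp_path = serialize_path + ".tmp"

    def write_pickle():
        try:
            copy_value.to_pickle(tmp_path, compression=compression)
            os.replace(tmp_path, serialize_path)  # atomic rename on POSIX and Windows
        except Exception:
            # best-effort cache: clean up the partial file and swallow the error
            try:
                os.remove(tmp_path)
            except OSError:
                pass

    thread = threading.Thread(target=write_pickle)
    thread.start()
    return thread  # callers may join() before depending on serialize_path
```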
Summary

- Extracted `__is_dataframe_like()` for duck-typing DataFrame detection
- Extracted `__serialize_dataframe_for_cache()` for cache serialization logic
- Extracted `__handle_var_output()` for var handle processing
- Extracted `__handle_bin_output()` for bin handle processing

Problem

The original `__wrap_output_value()` was 78 lines with 5 levels of nesting.

Solution

Split it into focused helper methods.

Test Plan
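As an illustration of the testability claim, a minimal sketch of the kind of focused unit test the extracted helpers enable; `make_context` is a hypothetical fixture, and `_Context__is_dataframe_like` assumes the enclosing class is named `Context` (Python name mangling for double-underscore methods):

```python
import pandas as pd

def test_is_dataframe_like_accepts_dataframes():
    ctx = make_context()  # hypothetical test fixture
    assert ctx._Context__is_dataframe_like(pd.DataFrame({"a": [1]}))

def test_is_dataframe_like_rejects_plain_values():
    ctx = make_context()
    assert not ctx._Context__is_dataframe_like({"a": 1})   # dict: has copy, no to_pickle
    assert not ctx._Context__is_dataframe_like(b"bytes")   # bytes: neither method
```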