fix(oocana): use duck-typing for DataFrame detection instead of class name #458
base: main
Conversation
Replaced the hardcoded 'DataFrame' string check with an __is_dataframe_like() method that uses duck-typing. DataFrame subclasses such as GeoDataFrame are now supported correctly.
Walkthrough: A private helper method for duck-typed DataFrame detection was added to the Context class, and the hardcoded class-name check was replaced with it.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 2 passed
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
oocana/oocana/context.py (1)
387-399: ⚠️ Potential issue | 🟠 Major

Prevent silent failures in the background thread when `to_pickle` parameters are incompatible.

The duck-typing check `__is_dataframe_like` matches any object that has `to_pickle` and `copy` methods, but `to_pickle` signatures can differ between implementations. When `copy_value.to_pickle(serialize_path, compression=compression)` is called, an object whose `to_pickle` does not accept a `compression` argument will raise a `TypeError`. The key issues are:

- The call happens in a background thread (line 398).
- The current try-except only catches `IOError`, not `TypeError` (line 401), and exceptions in the thread do not propagate to the main thread, so the failure is silent.
- Yet `serialize_path` is set to a non-None value before the thread starts (line 392), so the caller ends up with a path to a file that was never written.

A graceful fallback is recommended:

Suggested fix
```diff
 def write_pickle():
-    copy_value.to_pickle(serialize_path, compression=compression)
+    try:
+        copy_value.to_pickle(serialize_path, compression=compression)
+    except TypeError:
+        # If to_pickle does not support the compression argument, write without compression
+        copy_value.to_pickle(serialize_path)
```
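For reference, a self-contained sketch of a background-thread write with that fallback might look like the following (the threading setup and function names here are illustrative, not oocana's actual code):

```python
import threading

def serialize_in_background(copy_value, serialize_path, compression):
    """Illustrative sketch: pickle a DataFrame-like object in a background thread,
    falling back to an uncompressed write when to_pickle() rejects the
    compression keyword."""
    def write_pickle():
        try:
            copy_value.to_pickle(serialize_path, compression=compression)
        except TypeError:
            # Duck-typed objects whose to_pickle() takes no `compression`
            # keyword land here; retry without it instead of failing silently.
            copy_value.to_pickle(serialize_path)
        except IOError:
            # Disk errors are swallowed, mirroring the original code;
            # callers must tolerate a missing cache file.
            pass

    threading.Thread(target=write_pickle, daemon=True).start()
```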
Pull request overview
Updates DataFrame detection in Context.__wrap_output_value() from a hardcoded class-name check to a duck-typed helper so that DataFrame subclasses can be serialized for cache.
Changes:
- Added a `Context.__is_dataframe_like()` helper to detect DataFrame-like objects via the presence of `to_pickle()` and `copy()`.
- Replaced the `value.__class__.__name__ == "DataFrame"` conditional with the new helper when deciding whether to serialize outputs for cache.
Comments suppressed due to low confidence (1)
oocana/oocana/context.py:392
- The condition allows `len(self.__block_info.stacks) < 2`, but the body unconditionally indexes `self.__block_info.stacks[-1]`. Since `node_id` explicitly handles `stacks` being empty, this path can raise an `IndexError` when `stacks` is empty and caching is enabled. Update the condition to require at least one stack entry (e.g., `0 < len(...) < 2`), or compute `flow_node` without indexing when `stacks` is empty.
```python
if len(self.__block_info.stacks) < 2 and output_def.need_serialize_var_for_cache() and self.__is_dataframe_like(value):
    from .serialization import compression_suffix, compression_options
    suffix = compression_suffix(context=self)
    compression = compression_options(context=self)
    flow_node = self.__block_info.stacks[-1].get("flow", "unknown") + "-" + self.node_id
    serialize_path = f"{self.pkg_data_dir}/.cache/{string_hash(flow_node)}/{handle}{suffix}"
```
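One way to make the second suggested option concrete is a small helper along these lines (a sketch only; `safe_flow_node` is a hypothetical name, not part of the PR):

```python
def safe_flow_node(stacks: list, node_id: str) -> str:
    """Illustrative: build the flow_node cache key without indexing an empty stacks list."""
    if stacks:
        return stacks[-1].get("flow", "unknown") + "-" + node_id
    # When stacks is empty, fall back to the node id alone instead of raising IndexError.
    return node_id
```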
```python
def __is_dataframe_like(self, value: Any) -> bool:
    """Check if value is DataFrame-like by duck-typing.

    Supports pandas DataFrame and subclasses (GeoDataFrame, etc.)
    """
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )
```
Copilot AI · Jan 31, 2026
__is_dataframe_like() returns True based only on presence/callability of to_pickle/copy, but the serialization path later calls to_pickle(..., compression=compression). Objects with a to_pickle() method that doesn’t accept the compression kwarg (or whose copy() returns an object without to_pickle) will pass this check and then fail at runtime. Consider either tightening the duck-type check to the actual required API (including kwarg support), or making the serialization call resilient (e.g., fallback call without compression / catch TypeError and skip caching).
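A minimal sketch of the first option, assuming `inspect.signature` is used to verify that `to_pickle` accepts the `compression` keyword (illustrative only; not code from this PR):

```python
import inspect
from typing import Any

def is_dataframe_like(value: Any) -> bool:
    """Illustrative duck-type check that also verifies to_pickle() can take
    the compression keyword used later in the serialization path."""
    to_pickle = getattr(value, 'to_pickle', None)
    copy = getattr(value, 'copy', None)
    if not callable(to_pickle) or not callable(copy):
        return False
    try:
        params = inspect.signature(to_pickle).parameters
    except (TypeError, ValueError):
        # Some C-implemented callables have no inspectable signature;
        # accept the object and rely on handling TypeError at call time.
        return True
    # Accept either an explicit `compression` parameter or a **kwargs catch-all.
    return 'compression' in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
```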
Summary
Replaced the hardcoded 'DataFrame' string check with the duck-typing method __is_dataframe_like().
Problem
Original code used hardcoded class name check:
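```python
# Old detection: compares the class name string, so subclasses are missed.
# (Simplified: in context.py this was one clause of a larger condition.)
if value.__class__.__name__ == "DataFrame":
    ...
```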
This fails for:
- DataFrame subclasses such as geopandas GeoDataFrame, whose class name is not "DataFrame" even though they support to_pickle() and copy().
Solution
Use duck-typing to check for the required methods, to_pickle() and copy(), via the __is_dataframe_like() helper shown in the diff above.
Test Plan
- Objects exposing to_pickle() and copy() methods will be handled correctly.
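A minimal sketch of how this could be exercised in a test, assuming a pytest-style layout and a dummy duck-typed class (names are illustrative, not the PR's actual tests):

```python
import pandas as pd

class FakeDataFrameLike:
    """Dummy object exposing the two methods the duck-type check looks for."""
    def to_pickle(self, path, compression=None):
        pass
    def copy(self):
        return self

class PlainObject:
    pass

def is_dataframe_like(value):
    # Same logic as Context.__is_dataframe_like in the PR, extracted for testing.
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )

def test_detects_dataframe_and_duck_typed_objects():
    assert is_dataframe_like(pd.DataFrame({"a": [1]}))
    assert is_dataframe_like(FakeDataFrameLike())
    assert not is_dataframe_like(PlainObject())
```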