
Conversation

@leavesster
Contributor

Summary

  • Replace hardcoded 'DataFrame' string check with duck-typing method __is_dataframe_like()
  • Now correctly supports DataFrame subclasses (e.g., GeoDataFrame)

Problem

Original code used hardcoded class name check:

if value.__class__.__name__ == 'DataFrame':

This fails for:

  • DataFrame subclasses (GeoDataFrame, etc.)
  • Different DataFrame implementations

Solution

Use duck-typing to check for required methods:

def __is_dataframe_like(self, value):
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )
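As a standalone illustration of why duck-typing covers subclasses while the class-name check does not (FakeDataFrame and FakeGeoDataFrame are stand-ins, not the real pandas/geopandas classes):

```python
# Stand-ins for pandas.DataFrame and geopandas.GeoDataFrame (assumptions,
# not the real classes) to contrast the two checks.
class FakeDataFrame:
    def to_pickle(self, path, **kwargs):
        pass
    def copy(self):
        return FakeDataFrame()

class FakeGeoDataFrame(FakeDataFrame):  # subclass, like GeoDataFrame
    pass

def is_dataframe_like(value):
    # same duck-typed check as the PR's helper
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )

gdf = FakeGeoDataFrame()
print(gdf.__class__.__name__ == 'DataFrame')  # False: name check misses the subclass
print(is_dataframe_like(gdf))                 # True: duck-typing matches it
```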

Test Plan

  • All existing tests pass
  • Any object with to_pickle() and copy() methods will be handled correctly

Replaced hardcoded 'DataFrame' string check with __is_dataframe_like()
method that uses duck-typing. Now correctly supports DataFrame subclasses
like GeoDataFrame.
Copilot AI review requested due to automatic review settings January 31, 2026 09:27

coderabbitai bot commented Jan 31, 2026

Walkthrough

Adds a private helper method __is_dataframe_like() to the Context class for duck-typed detection, identifying DataFrame-like objects by checking for to_pickle and copy methods. Replaces the strict class-name check with the new duck-typed detection, improving the serialize-and-cache logic for output variables.

Changes

Cohort / File(s) Summary
Duck-typing detection refactor
oocana/oocana/context.py
Added the private helper method __is_dataframe_like(), which duck-types DataFrame-like objects by checking for to_pickle and copy methods. Replaced the strict class-name check (value.__class__.__name__ == 'DataFrame') in __wrap_output_value with the new duck-typed condition, used to decide whether to serialize and cache output variables.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description is directly related to the changeset, explaining the problem, solution, and test plan for the DataFrame type checking change.
Title check ✅ Passed The PR title follows the required format and accurately describes the main change: detecting DataFrames via duck-typing instead of a class-name check.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
oocana/oocana/context.py (1)

387-399: ⚠️ Potential issue | 🟠 Major

Guard against silent failures in the background thread when to_pickle rejects the compression argument.

The duck-typed check __is_dataframe_like matches any object that has to_pickle and copy methods, but to_pickle signatures differ between implementations. When copy_value.to_pickle(serialize_path, compression=compression) is called on an object whose to_pickle does not accept a compression parameter, it raises TypeError.

The key issues:

  • The call runs in a background thread (line 398)
  • The current try-except only catches IOError, not TypeError (line 401)
  • Exceptions in the thread do not propagate to the main thread, so the failure is silent
  • serialize_path is set to a non-None value before the thread starts (line 392), so the caller is handed a path to a file that never gets written

Consider adding a fallback:

Suggested fix
                     def write_pickle():
-                        copy_value.to_pickle(serialize_path, compression=compression)
+                        try:
+                            copy_value.to_pickle(serialize_path, compression=compression)
+                        except TypeError:
+                            # to_pickle does not accept the compression argument; write uncompressed
+                            copy_value.to_pickle(serialize_path)
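Run standalone, the suggested fallback behaves like this (StubFrame is a hypothetical object whose to_pickle takes no compression parameter):

```python
# StubFrame is a hypothetical stand-in whose to_pickle() lacks the
# `compression` parameter, which triggers the TypeError fallback path.
class StubFrame:
    def to_pickle(self, path):
        self.saved = path

def write_pickle(frame, path, compression):
    try:
        frame.to_pickle(path, compression=compression)
    except TypeError:
        # to_pickle does not support compression; write uncompressed
        frame.to_pickle(path)

frame = StubFrame()
write_pickle(frame, "/tmp/out.pkl", "zstd")
print(frame.saved)  # → /tmp/out.pkl (fallback wrote the file instead of failing silently)
```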


Copilot AI left a comment


Pull request overview

Updates DataFrame detection in Context.__wrap_output_value() from a hardcoded class-name check to a duck-typed helper so that DataFrame subclasses can be serialized for cache.

Changes:

  • Added Context.__is_dataframe_like() helper to detect DataFrame-like objects via presence of to_pickle() and copy().
  • Replaced value.__class__.__name__ == "DataFrame" conditional with the new helper when deciding whether to serialize outputs for cache.
Comments suppressed due to low confidence (1)

oocana/oocana/context.py:392

  • The condition allows len(self.__block_info.stacks) < 2, but the body unconditionally indexes self.__block_info.stacks[-1]. Since node_id explicitly handles stacks being empty, this path can raise IndexError when stacks is empty and caching is enabled. Update the condition to require at least one stack entry (e.g., 0 < len(...) < 2), or compute flow_node without indexing when stacks is empty.
            if len(self.__block_info.stacks) < 2 and output_def.need_serialize_var_for_cache() and self.__is_dataframe_like(value):
                from .serialization import compression_suffix, compression_options
                suffix = compression_suffix(context=self)
                compression = compression_options(context=self)
                flow_node = self.__block_info.stacks[-1].get("flow", "unknown") + "-" + self.node_id
                serialize_path = f"{self.pkg_data_dir}/.cache/{string_hash(flow_node)}/{handle}{suffix}"
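A minimal sketch of the suggested guard (flow_node_name and the plain-list stacks are illustrative stand-ins, not the real Context internals):

```python
# Illustrative sketch of the guard: require at least one stack entry
# before indexing stacks[-1]. The names here are assumptions, not the
# actual Context attributes.
def flow_node_name(stacks, node_id):
    if 0 < len(stacks) < 2:  # was: len(stacks) < 2, which admits the empty case
        return stacks[-1].get("flow", "unknown") + "-" + node_id
    return None  # empty stacks: skip caching instead of raising IndexError

print(flow_node_name([], "n1"))                # None
print(flow_node_name([{"flow": "f1"}], "n1"))  # f1-n1
```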


Comment on lines +351 to +358
def __is_dataframe_like(self, value: Any) -> bool:
    """Check if value is DataFrame-like by duck-typing.

    Supports pandas DataFrame and subclasses (GeoDataFrame, etc.)
    """
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )

Copilot AI Jan 31, 2026


__is_dataframe_like() returns True based only on presence/callability of to_pickle/copy, but the serialization path later calls to_pickle(..., compression=compression). Objects with a to_pickle() method that doesn’t accept the compression kwarg (or whose copy() returns an object without to_pickle) will pass this check and then fail at runtime. Consider either tightening the duck-type check to the actual required API (including kwarg support), or making the serialization call resilient (e.g., fallback call without compression / catch TypeError and skip caching).
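One way to tighten the check, sketched here as an assumption rather than code from the PR, is to inspect whether to_pickle can actually accept a compression keyword:

```python
import inspect

# Hypothetical tightened check (not the PR's code): only treat a value as
# serializable with compression if its to_pickle can accept a `compression`
# keyword argument, either explicitly or via **kwargs.
def accepts_compression(value):
    to_pickle = getattr(value, 'to_pickle', None)
    if not callable(to_pickle):
        return False
    try:
        sig = inspect.signature(to_pickle)
    except (TypeError, ValueError):
        return True  # signature not introspectable; assume compatible
    return any(
        p.name == 'compression' or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in sig.parameters.values()
    )

class WithCompression:
    def to_pickle(self, path, compression=None): pass

class WithoutCompression:
    def to_pickle(self, path): pass

print(accepts_compression(WithCompression()))     # True
print(accepts_compression(WithoutCompression()))  # False
```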

@leavesster leavesster changed the title fix: use duck-typing for DataFrame detection instead of class name fix(oocana): use duck-typing for DataFrame detection instead of class name Jan 31, 2026