
Conversation

@leavesster
Contributor

Summary

  • Replace hardcoded 'DataFrame' string check with duck-typing method __is_dataframe_like()
  • Now correctly supports DataFrame subclasses (e.g., GeoDataFrame)

Problem

Original code used hardcoded class name check:

if value.__class__.__name__ == 'DataFrame':

This fails for:

  • DataFrame subclasses (GeoDataFrame, etc.)
  • Different DataFrame implementations

Solution

Use duck-typing to check for required methods:

def __is_dataframe_like(self, value):
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )
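As a standalone illustration of why duck-typing covers subclasses while the class-name check does not (FakeDataFrame and FakeGeoDataFrame are stand-ins, not the real pandas/geopandas classes):

```python
# Stand-ins for pandas.DataFrame and geopandas.GeoDataFrame (assumptions,
# not the real classes) to contrast the two checks.
class FakeDataFrame:
    def to_pickle(self, path, **kwargs):
        pass
    def copy(self):
        return FakeDataFrame()

class FakeGeoDataFrame(FakeDataFrame):  # subclass, like GeoDataFrame
    pass

def is_dataframe_like(value):
    # same duck-typed check as the PR's helper
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )

gdf = FakeGeoDataFrame()
print(gdf.__class__.__name__ == 'DataFrame')  # False: name check misses the subclass
print(is_dataframe_like(gdf))                 # True: duck-typing matches it
```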

Test Plan

  • All existing tests pass
  • Any object with to_pickle() and copy() methods will be handled correctly

Replaced hardcoded 'DataFrame' string check with __is_dataframe_like()
method that uses duck-typing. Now correctly supports DataFrame subclasses
like GeoDataFrame.
Copilot AI review requested due to automatic review settings January 31, 2026 09:27

coderabbitai bot commented Jan 31, 2026

Walkthrough

Adds a private helper method __is_dataframe_like() to the Context class for duck-typed detection, identifying DataFrame-like objects by checking for to_pickle and copy methods. Replaces the strict class-name check with the new duck-typed detection, improving the serialize-and-cache logic for output variables.

Changes

Cohort / File(s) Summary
Duck-typing detection refactor
oocana/oocana/context.py
Added the private helper method __is_dataframe_like(), which duck-types DataFrame-like objects by checking for to_pickle and copy methods. Replaced the strict class-name check (value.__class__.__name__ == 'DataFrame') in __wrap_output_value with the new duck-typed condition, used to decide whether to serialize and cache output variables.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description is directly related to the changeset, explaining the problem, solution, and test plan for the DataFrame type checking change.
Title check ✅ Passed The PR title follows the required format and accurately describes the main change: detecting DataFrames via duck-typing instead of a class-name check.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
oocana/oocana/context.py (1)

387-399: ⚠️ Potential issue | 🟠 Major

Guard against silent failures in the background thread when to_pickle rejects the compression argument.

The duck-typed check __is_dataframe_like matches any object that has to_pickle and copy methods, but to_pickle signatures differ between implementations. When copy_value.to_pickle(serialize_path, compression=compression) is called on an object whose to_pickle does not accept a compression parameter, it raises TypeError.

The key issues:

  • The call runs in a background thread (line 398)
  • The current try-except only catches IOError, not TypeError (line 401)
  • Exceptions in the thread do not propagate to the main thread, so the failure is silent
  • serialize_path is set to a non-None value before the thread starts (line 392), so the caller is handed a path to a file that never gets written

Consider adding a fallback:

Suggested fix
                     def write_pickle():
-                        copy_value.to_pickle(serialize_path, compression=compression)
+                        try:
+                            copy_value.to_pickle(serialize_path, compression=compression)
+                        except TypeError:
+                            # to_pickle does not accept the compression argument; write uncompressed
+                            copy_value.to_pickle(serialize_path)
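Run standalone, the suggested fallback behaves like this (StubFrame is a hypothetical object whose to_pickle takes no compression parameter):

```python
# StubFrame is a hypothetical stand-in whose to_pickle() lacks the
# `compression` parameter, which triggers the TypeError fallback path.
class StubFrame:
    def to_pickle(self, path):
        self.saved = path

def write_pickle(frame, path, compression):
    try:
        frame.to_pickle(path, compression=compression)
    except TypeError:
        # to_pickle does not support compression; write uncompressed
        frame.to_pickle(path)

frame = StubFrame()
write_pickle(frame, "/tmp/out.pkl", "zstd")
print(frame.saved)  # → /tmp/out.pkl (fallback wrote the file instead of failing silently)
```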


Copilot AI left a comment


Pull request overview

Updates DataFrame detection in Context.__wrap_output_value() from a hardcoded class-name check to a duck-typed helper so that DataFrame subclasses can be serialized for cache.

Changes:

  • Added Context.__is_dataframe_like() helper to detect DataFrame-like objects via presence of to_pickle() and copy().
  • Replaced value.__class__.__name__ == "DataFrame" conditional with the new helper when deciding whether to serialize outputs for cache.
Comments suppressed due to low confidence (1)

oocana/oocana/context.py:392

  • The condition allows len(self.__block_info.stacks) < 2, but the body unconditionally indexes self.__block_info.stacks[-1]. Since node_id explicitly handles stacks being empty, this path can raise IndexError when stacks is empty and caching is enabled. Update the condition to require at least one stack entry (e.g., 0 < len(...) < 2), or compute flow_node without indexing when stacks is empty.
            if len(self.__block_info.stacks) < 2 and output_def.need_serialize_var_for_cache() and self.__is_dataframe_like(value):
                from .serialization import compression_suffix, compression_options
                suffix = compression_suffix(context=self)
                compression = compression_options(context=self)
                flow_node = self.__block_info.stacks[-1].get("flow", "unknown") + "-" + self.node_id
                serialize_path = f"{self.pkg_data_dir}/.cache/{string_hash(flow_node)}/{handle}{suffix}"
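A minimal sketch of the suggested guard (flow_node_name and the plain-list stacks are illustrative stand-ins, not the real Context internals):

```python
# Illustrative sketch of the guard: require at least one stack entry
# before indexing stacks[-1]. The names here are assumptions, not the
# actual Context attributes.
def flow_node_name(stacks, node_id):
    if 0 < len(stacks) < 2:  # was: len(stacks) < 2, which admits the empty case
        return stacks[-1].get("flow", "unknown") + "-" + node_id
    return None  # empty stacks: skip caching instead of raising IndexError

print(flow_node_name([], "n1"))                # None
print(flow_node_name([{"flow": "f1"}], "n1"))  # f1-n1
```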


Comment on lines +351 to +358
def __is_dataframe_like(self, value: Any) -> bool:
    """Check if value is DataFrame-like by duck-typing.

    Supports pandas DataFrame and subclasses (GeoDataFrame, etc.)
    """
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )

Copilot AI Jan 31, 2026


__is_dataframe_like() returns True based only on presence/callability of to_pickle/copy, but the serialization path later calls to_pickle(..., compression=compression). Objects with a to_pickle() method that doesn’t accept the compression kwarg (or whose copy() returns an object without to_pickle) will pass this check and then fail at runtime. Consider either tightening the duck-type check to the actual required API (including kwarg support), or making the serialization call resilient (e.g., fallback call without compression / catch TypeError and skip caching).
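One way to tighten the check, sketched here as an assumption rather than code from the PR, is to inspect whether to_pickle can actually accept a compression keyword:

```python
import inspect

# Hypothetical tightened check (not the PR's code): only treat a value as
# serializable with compression if its to_pickle can accept a `compression`
# keyword argument, either explicitly or via **kwargs.
def accepts_compression(value):
    to_pickle = getattr(value, 'to_pickle', None)
    if not callable(to_pickle):
        return False
    try:
        sig = inspect.signature(to_pickle)
    except (TypeError, ValueError):
        return True  # signature not introspectable; assume compatible
    return any(
        p.name == 'compression' or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in sig.parameters.values()
    )

class WithCompression:
    def to_pickle(self, path, compression=None): pass

class WithoutCompression:
    def to_pickle(self, path): pass

print(accepts_compression(WithCompression()))     # True
print(accepts_compression(WithoutCompression()))  # False
```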

@leavesster leavesster changed the title fix: use duck-typing for DataFrame detection instead of class name fix(oocana): use duck-typing for DataFrame detection instead of class name Jan 31, 2026