Skip to content

Conversation

@leavesster
Copy link
Contributor

Summary

  • Extract __is_dataframe_like() for duck-typing DataFrame detection
  • Extract __serialize_dataframe_for_cache() for cache serialization logic
  • Extract __handle_var_output() for var handle processing
  • Extract __handle_bin_output() for bin handle processing
  • Reduce max indentation from 5 levels to 2 levels

Problem

Original __wrap_output_value() was 78 lines with 5 levels of nesting:

  • Hard to read and maintain
  • Violated "3 levels of indentation" rule
  • Mixed multiple concerns in one method

Solution

Split into focused helper methods:

  • Each method handles one concern
  • Max 2 levels of indentation
  • Clear separation of responsibilities

Test Plan

  • All existing tests pass
  • Behavior unchanged (pure refactoring)

- Extract __is_dataframe_like() for duck-typing DataFrame detection
- Extract __serialize_dataframe_for_cache() for cache serialization
- Extract __handle_var_output() for var handle processing
- Extract __handle_bin_output() for bin handle processing
- Reduce max indentation from 5 levels to 2 levels
- Improve code readability and testability
Copilot AI review requested due to automatic review settings January 31, 2026 09:27
@coderabbitai
Copy link

coderabbitai bot commented Jan 31, 2026

概览

该变更引入了新的私有辅助方法来处理类DataFrame的输出和缓存,包括数据帧检测、后台序列化和输出处理。通过重构__wrap_output_value以委托给模块化处理器,集中化了错误处理和后台I/O逻辑,共增加95行、删除66行代码。

变更

队列 / 文件(s) 摘要
DataFrame和缓存处理
oocana/oocana/context.py
新增__is_dataframe_like()使用鸭子类型检测,__serialize_dataframe_for_cache()在后台线程中序列化DataFrame,__handle_var_output()存储变量输出并条件性缓存DataFrame数据,__handle_bin_output()验证字节并写入二进制文件。重构__wrap_output_value()以委托给新处理器,集中化序列化、缓存和错误处理逻辑。

估算代码审查工作量

🎯 4 (复杂) | ⏱️ ~45分钟

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed 描述清晰地说明了问题、解决方案和测试计划,准确反映了提取四个辅助方法、降低代码嵌套深度的改动,与代码变更充分相关。
Title check ✅ Passed The pull request title follows the required format with type and scope, and accurately describes the main refactoring change to simplify __wrap_output_value through extracted helper methods.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the __wrap_output_value() method to reduce nesting complexity by extracting four helper methods. The goal is to improve code readability and maintainability by splitting a 78-line method with 5 levels of nesting into focused helper methods with maximum 2 levels of indentation.

Changes:

  • Extracted DataFrame detection logic into __is_dataframe_like() helper method
  • Extracted DataFrame serialization into __serialize_dataframe_for_cache() helper method
  • Extracted var output handling into __handle_var_output() helper method
  • Extracted binary output handling into __handle_bin_output() helper method

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Returns the serialize_path if successful, None otherwise.
"""
from .serialization import compression_suffix, compression_options
from .internal import string_hash
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import from .internal import string_hash will fail because string_hash is not defined in the internal.py module. The string_hash function is actually defined at the module level in context.py itself (lines 22-26). Either move the function to internal.py or use the local string_hash function directly without importing it.

Suggested change
from .internal import string_hash

Copilot uses AI. Check for mistakes.
Comment on lines +355 to +358
return (
hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
)
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DataFrame detection logic has changed behavior. The original code checked value.__class__.__name__ == 'DataFrame' which only matched objects with the exact class name "DataFrame". The new code uses duck-typing with hasattr(value, 'to_pickle') and hasattr(value, 'copy'), which will match any object with these methods (like GeoDataFrame, Dask DataFrame, etc.). While the PR description mentions supporting subclasses like GeoDataFrame, this is actually a behavioral change, not a pure refactoring. The new implementation is more flexible but differs from the original behavior.

Copilot uses AI. Check for mistakes.
Comment on lines 452 to +455
try:
with open(bin_file, "wb") as f:
f.write(value)
except IOError as e:
raise IOError(
f"Output handle key: [{handle}] is defined as binary, but an error occurred while writing the file: {e}"
)
return self.__handle_bin_output(handle, value)
except ValueError:
return value # Already warned, return original value
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling flow for non-bytes binary values has changed. The original code sent a warning and returned the original value directly. The new code raises a ValueError in __handle_bin_output(), which is then caught in __wrap_output_value() to return the original value. While the end result is the same, this approach uses exceptions for control flow, which is generally considered an anti-pattern. The original simpler approach of returning the value directly after warning was clearer.

Copilot uses AI. Check for mistakes.
thread = threading.Thread(target=write_pickle)
thread.start()
return serialize_path
except IOError:
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling has changed from catching a broad exception and passing silently to only catching IOError. The original code had except IOError as e: pass which captured the exception variable but didn't use it. The new code catches only IOError and returns None. However, other exceptions like AttributeError (if copy() or to_pickle() fail) will now propagate up instead of being silently ignored. This could be a breaking change if the original code relied on silently ignoring all exceptions during serialization.

Suggested change
except IOError:
except Exception:
logging.getLogger(__name__).exception(
"Failed to serialize DataFrame for cache for handle %s", handle
)

Copilot uses AI. Check for mistakes.
os.makedirs(os.path.dirname(serialize_path), exist_ok=True)

try:
copy_value = value.copy()
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to value.copy() at line 375 is outside the try-except block, but in the original code it was inside the try-except. If copy() raises an exception (e.g., MemoryError or AttributeError), it will now propagate up and potentially crash the serialization, whereas the original code would silently handle it. This breaks the original behavior where all exceptions during the caching attempt were silently ignored.

Copilot uses AI. Check for mistakes.
"""
from .serialization import compression_suffix, compression_options
from .internal import string_hash
import threading
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import threading statement is placed inside the function at line 366. While this is valid Python, it's unconventional. The import should ideally be at the module level with other imports, or if it must be local, it should come before the other imports from the package. The current placement between two from imports is unusual and could be confusing.

Copilot uses AI. Check for mistakes.
Comment on lines +356 to +357
hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundant check callable(getattr(value, 'to_pickle', None)) after hasattr(value, 'to_pickle') is unnecessary. If hasattr(value, 'to_pickle') returns True, then getattr(value, 'to_pickle', None) will never be None (it will be the attribute itself). The callable check could be simplified to just check the attribute directly: callable(value.to_pickle). The same applies to the 'copy' check on line 357.

Suggested change
hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
hasattr(value, 'to_pickle') and callable(value.to_pickle) and
hasattr(value, 'copy') and callable(value.copy)

Copilot uses AI. Check for mistakes.

- var handle: store in store and return reference
- bin handle: write to file and return reference
- other: return value as-is
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for __wrap_output_value has been updated but is now less detailed than the original. The original docstring explained what happens when the handle is not defined in the block outputs schema (raises ValueError), and mentioned the return type ("return the wrapped value"). The new docstring omits these details. Consider adding back the information about the ValueError being raised and what the return value is.

Suggested change
- other: return value as-is
- other: return value as-is
Raises:
ValueError: If the given handle is not defined in the block outputs schema.
Returns:
The wrapped value for var/bin handles (as a VarValueDict or BinValueDict),
or the original value if no wrapping is applied or if binary handling fails.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@oocana/oocana/context.py`:
- Around line 374-382: The background thread started for write_pickle (which
calls value.copy() and copy_value.to_pickle(serialize_path, compression=...)) is
not tracked and the code only catches IOError, risking incomplete/corrupt files;
change the logic so the serialization is either performed synchronously or the
thread is joined before returning (e.g., create the thread via
threading.Thread(target=write_pickle), start it, then call thread.join() or
return a handle/future instead of immediately returning serialize_path), and
broaden exception handling to catch Exception as e around value.copy() and
to_pickle so you log/propagate the actual error (include the exception object)
rather than only catching IOError. Ensure you reference write_pickle,
value.copy(), to_pickle, serialize_path, and the threading.Thread when making
these changes.
- Line 365: The import statement is pulling a nonexistent symbol string_hash
from .internal while string_hash is already defined in this file (line ~22);
remove the erroneous import ("from .internal import string_hash") and ensure all
references use the local string_hash function (avoid re-import or shadowing),
keeping any existing usages intact and adjusting any tests or callers if they
expected an external symbol.
🧹 Nitpick comments (2)
oocana/oocana/context.py (2)

405-409: 建议使用 TypeError 而非 ValueError 进行类型检查。

根据 Python 惯例,当参数类型不正确时应抛出 TypeErrorValueError 用于值不合法但类型正确的情况。

♻️ 使用 TypeError
         if not isinstance(value, bytes):
             self.send_warning(
                 f"Output handle key: [{handle}] is defined as binary, but the value is not bytes."
             )
-            raise ValueError("Binary handle requires bytes value")
+            raise TypeError("Binary handle requires bytes value")

注意:如果改为 TypeError,需要同步更新 __wrap_output_value 中第 454 行的 except ValueErrorexcept TypeError


417-420: 重新抛出异常时应保留异常链。

使用 raise ... from e 可以保留原始异常的堆栈信息,便于调试。

♻️ 使用异常链
         try:
             with open(bin_file, "wb") as f:
                 f.write(value)
         except IOError as e:
-            raise IOError(
+            raise IOError(
                 f"Output handle key: [{handle}] is defined as binary, but an error occurred while writing the file: {e}"
-            )
+            ) from e

Returns the serialize_path if successful, None otherwise.
"""
from .serialization import compression_suffix, compression_options
from .internal import string_hash
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

关键错误:导入了不存在的符号 string_hash

string_hash 函数已在本文件第 22 行定义,但此处却尝试从 .internal 模块导入。这是导致 CI 流水线失败的直接原因。

🐛 修复导入错误
-        from .internal import string_hash

直接使用本文件中已定义的 string_hash 函数即可,无需导入。

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from .internal import string_hash
🧰 Tools
🪛 GitHub Actions: layer

[error] 365-365: "string_hash" is unknown import symbol (reportAttributeAccessIssue)

🪛 GitHub Actions: pr

[error] 365-365: pyright: 'string_hash' is an unknown import symbol (reportAttributeAccessIssue).

🤖 Prompt for AI Agents
In `@oocana/oocana/context.py` at line 365, The import statement is pulling a
nonexistent symbol string_hash from .internal while string_hash is already
defined in this file (line ~22); remove the erroneous import ("from .internal
import string_hash") and ensure all references use the local string_hash
function (avoid re-import or shadowing), keeping any existing usages intact and
adjusting any tests or callers if they expected an external symbol.

Comment on lines +374 to +382
try:
copy_value = value.copy()
def write_pickle():
copy_value.to_pickle(serialize_path, compression=compression)
thread = threading.Thread(target=write_pickle)
thread.start()
return serialize_path
except IOError:
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

后台线程未被跟踪,可能导致数据丢失。

write_pickle 线程启动后立即返回 serialize_path,但没有任何机制确保线程完成。如果程序在序列化完成前退出,缓存文件可能不完整或损坏。

此外,当前仅捕获 IOError,但 value.copy()to_pickle() 可能抛出其他异常(如 MemoryErrorPicklingError)。

♻️ 建议:改进异常处理并考虑线程管理
         try:
             copy_value = value.copy()
             def write_pickle():
-                copy_value.to_pickle(serialize_path, compression=compression)
+                try:
+                    copy_value.to_pickle(serialize_path, compression=compression)
+                except Exception:
+                    pass  # 后台序列化失败不应影响主流程
             thread = threading.Thread(target=write_pickle)
             thread.start()
             return serialize_path
-        except IOError:
+        except Exception:
             return None
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 380-380: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In `@oocana/oocana/context.py` around lines 374 - 382, The background thread
started for write_pickle (which calls value.copy() and
copy_value.to_pickle(serialize_path, compression=...)) is not tracked and the
code only catches IOError, risking incomplete/corrupt files; change the logic so
the serialization is either performed synchronously or the thread is joined
before returning (e.g., create the thread via
threading.Thread(target=write_pickle), start it, then call thread.join() or
return a handle/future instead of immediately returning serialize_path), and
broaden exception handling to catch Exception as e around value.copy() and
to_pickle so you log/propagate the actual error (include the exception object)
rather than only catching IOError. Ensure you reference write_pickle,
value.copy(), to_pickle, serialize_path, and the threading.Thread when making
these changes.

@leavesster leavesster changed the title refactor: simplify __wrap_output_value with extracted helper methods refactor(oocana): simplify __wrap_output_value with extracted helper methods Jan 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants