Fix JVM crash caused by lifecycle management of the WeCom message audit SDK #3848
base: develop
Co-authored-by: binarywang <1343140+binarywang@users.noreply.github.com>
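🤖 Augment PR Summary
Summary: This PR fixes the intermittent JVM crash (SIGSEGV) caused by improper lifecycle management of the WeCom message audit SDK.
Technical Notes: The core idea is to let the framework manage the SDK cache and lifecycle in one place, so that a manual Destroy call in business code no longer conflicts with the internal cache-reuse mechanism.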

    WxCpConfigStorage configStorage = cpService.getWxCpConfigStorage();

    // Increment the reference count
    configStorage.incrementMsgAuditSdkRefCount(sdk);


      throw new WxErrorException(chatDatas.toJson());
    }

    return chatDatas.getChatData();


      outputStream.write(i);
      outputStream.close();
    } catch (Exception e) {
      e.printStackTrace();


    // Reserve a 200-second safety margin
    this.msgAuditSdkExpiresTime = System.currentTimeMillis() + (expiresInSeconds - 200) * 1000L;
    // Reset the reference count
    this.msgAuditSdkRefCount = 0;

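The expiry arithmetic in the snippet above can be checked in isolation. The following is only a sketch: `ExpiryMargin`, `expiryTimestamp` and the sample lifetimes are hypothetical, and only the `(expiresInSeconds - 200) * 1000L` expression comes from the diff.

```java
// Illustrative only: ExpiryMargin and expiryTimestamp are hypothetical names.
final class ExpiryMargin {
  /** Mirrors the diff: expire 200 seconds early, converting seconds to milliseconds. */
  static long expiryTimestamp(long nowMillis, int expiresInSeconds) {
    return nowMillis + (expiresInSeconds - 200) * 1000L;
  }

  public static void main(String[] args) {
    long now = 1_000_000L;
    // A 7200 s lifetime (assumed value) stays usable for 7000 s.
    System.out.println(expiryTimestamp(now, 7200) - now); // 7000000
    // A lifetime shorter than the margin yields a timestamp that is already in the past.
    System.out.println(expiryTimestamp(now, 100) - now);  // -100000
  }
}
```

The second case suggests that a caller passing a lifetime under 200 seconds would get an SDK that counts as expired immediately; whether that can happen in practice depends on the values the service passes in, which this page does not show.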
Pull request overview
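This PR fixes the JVM crash caused by disorderly lifecycle management of the WeCom message audit SDK. The core solution introduces a reference-counting mechanism that manages the SDK lifecycle automatically, and provides new safe API methods that encapsulate SDK management entirely inside the framework.
Changes:
- Introduce a reference-counting mechanism in the config storage layer (incrementMsgAuditSdkRefCount, decrementMsgAuditSdkRefCount, getMsgAuditSdkRefCount)
- Add 4 safe API methods that do not expose the SDK to callers (getChatRecords, getDecryptChatData, getChatRecordPlainText, downloadMediaFile)
- Mark the old APIs as @deprecated and describe the migration path in their comments
- Add a complete migration guide and test cases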
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| weixin-java-cp/src/main/java/me/chanjar/weixin/cp/config/WxCpConfigStorage.java | Defines the reference-counting methods in the interface, used for SDK lifecycle management |
| weixin-java-cp/src/main/java/me/chanjar/weixin/cp/config/impl/WxCpDefaultConfigImpl.java | Implements reference counting, made thread-safe with synchronized; destroys the SDK automatically when the count reaches zero |
| weixin-java-cp/src/main/java/me/chanjar/weixin/cp/config/impl/WxCpRedisConfigImpl.java | Adds reference counting to the Redis config implementation, consistent with the default implementation |
| weixin-java-cp/src/main/java/me/chanjar/weixin/cp/api/WxCpMsgAuditService.java | Defines the 4 new safe API methods and marks the old methods as @deprecated |
| weixin-java-cp/src/main/java/me/chanjar/weixin/cp/api/impl/WxCpMsgAuditServiceImpl.java | Implements the new safe APIs, using the try-finally pattern to manage the reference count automatically |
| weixin-java-cp/src/test/java/me/chanjar/weixin/cp/api/WxCpMsgAuditTest.java | Adds test cases that verify the new APIs |
| docs/CP_MSG_AUDIT_SDK_SAFE_USAGE.md | Provides a detailed migration guide covering background, solution, usage examples, and FAQ |
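
The combination listed in the table (a synchronized counter that destroys the SDK at zero, driven by try-finally in the service layer) can be sketched as follows. This is a minimal illustration rather than the PR's code: `CountingStore`, `TryFinallyDemo` and `withSdk` are hypothetical names, and a log entry stands in for the native `Finance.DestroySdk` call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the config storage; a log replaces the native destroy call.
final class CountingStore {
  private final long sdk;
  private int refCount;
  final List<String> log = new ArrayList<>();

  CountingStore(long sdk) {
    this.sdk = sdk;
  }

  synchronized int increment(long handle) {
    return handle == sdk ? ++refCount : 0;
  }

  synchronized int decrement(long handle) {
    if (handle != sdk || refCount == 0) {
      return 0;
    }
    if (--refCount == 0) {
      log.add("destroy " + handle); // last user is gone: release the handle
    }
    return refCount;
  }
}

final class TryFinallyDemo {
  /** Runs work while holding one reference; the finally block releases it even on failure. */
  static void withSdk(CountingStore store, long sdk, Runnable work) {
    store.increment(sdk);
    try {
      work.run();
    } finally {
      store.decrement(sdk);
    }
  }

  public static void main(String[] args) {
    CountingStore store = new CountingStore(42L);
    try {
      withSdk(store, 42L, () -> {
        throw new IllegalStateException("pull failed");
      });
    } catch (IllegalStateException expected) {
      // The exception still reaches the caller, but the reference was released first.
    }
    System.out.println(store.log); // [destroy 42]
  }
}
```

The point of the pattern is that callers never see the handle's lifetime: a failed call cannot leave the count stuck above zero.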

    public synchronized int decrementMsgAuditSdkRefCount(long sdk) {
      if (this.msgAuditSdk == sdk && this.msgAuditSdkRefCount > 0) {
        int newCount = --this.msgAuditSdkRefCount;
        // When the reference count drops to 0, destroy the SDK automatically to release resources
        if (newCount == 0) {
          Finance.DestroySdk(sdk);

Copilot AI, Jan 14, 2026
In a multi-threaded environment the SDK may be re-initialized (via initSdk()), which changes this.msgAuditSdk. The current implementation destroys the SDK when the reference count drops to 0, but it does not account for an SDK that has expired and been re-initialized. Before destroying, check again that the SDK is still the currently cached one, so that an old SDK that has already been replaced is not destroyed. Consider also destroying the old SDK first in updateMsgAuditSdk (if its reference count is 0).
Fixed. decrementMsgAuditSdkRefCount() and the newly added releaseMsgAuditSdk() now verify this.msgAuditSdk == sdk again before destroying the SDK. updateMsgAuditSdk() also gained cleanup logic for the old SDK. (commit: 1d4e08c)
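
The guard discussed in this thread (ignore a stale handle on decrement, and clean up an idle old handle when a new one is stored) might look roughly like the sketch below. It is an assumption-laden illustration, not the code from commit 1d4e08c: `GuardedSdkHolder` and its methods are hypothetical, `0` is assumed to mean "no SDK cached", and a list records destroyed handles instead of calling `Finance.DestroySdk`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder; `destroyed` records handles instead of calling the native destroy.
final class GuardedSdkHolder {
  private long current; // 0 means "no SDK cached" (assumption of this sketch)
  private int refCount;
  final List<Long> destroyed = new ArrayList<>();

  /** Stores a new handle; an idle old handle is destroyed before the swap. */
  synchronized void update(long newSdk) {
    if (current != 0 && current != newSdk && refCount == 0) {
      destroyed.add(current);
    }
    // An old handle that is still in use is left alone in this sketch.
    current = newSdk;
    refCount = 0;
  }

  synchronized int increment(long sdk) {
    return sdk == current ? ++refCount : -1;
  }

  /** Decrements only for the cached handle; a stale handle is ignored and never destroyed here. */
  synchronized int decrement(long sdk) {
    if (sdk != current || refCount == 0) {
      return -1;
    }
    if (--refCount == 0) {
      destroyed.add(sdk);
      current = 0; // forget the handle so it cannot be destroyed twice
    }
    return refCount;
  }

  public static void main(String[] args) {
    GuardedSdkHolder h = new GuardedSdkHolder();
    h.update(1L);
    h.increment(1L);
    h.update(2L);                        // handle 1 is still in use: update does not destroy it
    System.out.println(h.decrement(1L)); // -1: stale handle, nothing is destroyed
    h.update(3L);                        // handle 2 is idle: destroyed before the swap
    System.out.println(h.destroyed);     // [2]
  }
}
```

Note that in this sketch a handle replaced while still in use is never destroyed; how the actual fix treats that case is not visible in this thread.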

    public synchronized int decrementMsgAuditSdkRefCount(long sdk) {
      if (this.msgAuditSdk == sdk && this.msgAuditSdkRefCount > 0) {
        int newCount = --this.msgAuditSdkRefCount;
        // When the reference count drops to 0, destroy the SDK automatically to release resources
        if (newCount == 0) {
          Finance.DestroySdk(sdk);

Copilot AI, Jan 14, 2026
Same issue as in WxCpDefaultConfigImpl: in a multi-threaded environment the SDK may be re-initialized, which changes this.msgAuditSdk. Before destroying, check again that the SDK is still the currently cached one, so that an old SDK that has already been replaced is not destroyed.
Fixed. WxCpRedisConfigImpl received the same improvements as WxCpDefaultConfigImpl, including the second check before destruction and the cleanup logic for the old SDK. (commit: 1d4e08c)

    // Get or initialize the SDK
    long sdk = this.initSdk();
    WxCpConfigStorage configStorage = cpService.getWxCpConfigStorage();

    // Increment the reference count
    configStorage.incrementMsgAuditSdkRefCount(sdk);

    try {

Copilot AI, Jan 14, 2026
There is a time window between the return of initSdk() and the call to incrementMsgAuditSdkRefCount(). If the SDK expires and another thread re-initializes it during that window, the reference-count operation may be applied to the wrong SDK. Merge SDK acquisition and the reference-count increment into one atomic operation, or provide a getOrInitSdkWithRefCount() method in the config storage layer to guarantee atomicity.
Fixed. A new atomic method, acquireMsgAuditSdk(), obtains the SDK and increments the reference count inside a single synchronized block. All new API methods now go through the acquireSdk() and releaseSdk() helper methods, which eliminates the race condition. (commit: 1d4e08c)
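
To make the atomicity argument concrete, here is a minimal sketch of "check validity and take a reference under one lock". The real signatures of acquireMsgAuditSdk() and releaseMsgAuditSdk() are not shown in this thread, so `AtomicSdkStore`, `acquire`, `release` and the convention that `0` means "no valid SDK" are all assumptions; destruction at zero is omitted to keep the focus on the race.

```java
// Hypothetical storage; acquire()/release() only stand in for the PR's atomic methods.
final class AtomicSdkStore {
  private long sdk; // 0 means no cached SDK (assumption of this sketch)
  private long expiresAt;
  private int refCount;

  synchronized void update(long newSdk, long newExpiresAt) {
    sdk = newSdk;
    expiresAt = newExpiresAt;
    refCount = 0;
  }

  /** Validity check and increment happen under one lock, so no swap can slip in between. */
  synchronized long acquire(long nowMillis) {
    if (sdk == 0 || nowMillis >= expiresAt) {
      return 0; // caller would have to initialize a fresh SDK
    }
    refCount++;
    return sdk;
  }

  synchronized int release(long handle) {
    if (handle != sdk || refCount == 0) {
      return -1;
    }
    return --refCount;
  }

  synchronized int refCount() {
    return refCount;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicSdkStore store = new AtomicSdkStore();
    store.update(99L, Long.MAX_VALUE);
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(() -> {
        for (int n = 0; n < 1000; n++) {
          long handle = store.acquire(0L);
          try {
            // a native SDK call using `handle` would go here
          } finally {
            store.release(handle);
          }
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      t.join();
    }
    System.out.println(store.refCount()); // 0
  }
}
```

Because every acquire is paired with a release in finally and both run under the same monitor, the count returns to zero regardless of how the threads interleave.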

    // Get or initialize the SDK
    long sdk = this.initSdk();
    WxCpConfigStorage configStorage = cpService.getWxCpConfigStorage();

    // Increment the reference count
    configStorage.incrementMsgAuditSdkRefCount(sdk);

Copilot AI, Jan 14, 2026
Same concurrency issue as in the getChatRecords method: there is a race condition between obtaining the SDK and incrementing the reference count.
Fixed. The race condition is eliminated by the atomic methods. (commit: 1d4e08c)

    // Get or initialize the SDK
    long sdk = this.initSdk();
    WxCpConfigStorage configStorage = cpService.getWxCpConfigStorage();

    // Increment the reference count
    configStorage.incrementMsgAuditSdkRefCount(sdk);

Copilot AI, Jan 14, 2026
Same concurrency issue as in the getChatRecords method: there is a race condition between obtaining the SDK and incrementing the reference count.
Fixed. The race condition is eliminated by the atomic methods. (commit: 1d4e08c)

    // Get or initialize the SDK
    long sdk;
    try {
      sdk = this.initSdk();
    } catch (WxErrorException e) {
      throw e;
    } catch (Exception e) {
      throw new WxErrorException(e);
    }

    WxCpConfigStorage configStorage = cpService.getWxCpConfigStorage();

    // Increment the reference count
    configStorage.incrementMsgAuditSdkRefCount(sdk);

Copilot AI, Jan 14, 2026
Same concurrency issue as in the other methods: there is a race condition between obtaining the SDK and incrementing the reference count.
Fixed. The race condition is eliminated by the atomic methods. (commit: 1d4e08c)

    // Get or initialize the SDK
    long sdk;
    try {
      sdk = this.initSdk();
    } catch (WxErrorException e) {
      throw e;
    } catch (Exception e) {
      throw new WxErrorException(e);
    }

    WxCpConfigStorage configStorage = cpService.getWxCpConfigStorage();

    // Increment the reference count
    configStorage.incrementMsgAuditSdkRefCount(sdk);

Copilot AI, Jan 14, 2026
Same concurrency issue as in the other methods: there is a race condition between obtaining the SDK and incrementing the reference count.
Fixed. The race condition is eliminated by the atomic methods. (commit: 1d4e08c)

    public synchronized int incrementMsgAuditSdkRefCount(long sdk) {
      if (this.msgAuditSdk == sdk) {
        return ++this.msgAuditSdkRefCount;
      }
      return 0;
    }

Copilot AI, Jan 14, 2026
When the sdk passed in does not match the currently cached sdk, the method returns 0. Callers then cannot distinguish two cases: (1) the SDK does not match, and (2) the SDK matches but the reference count is 0. Throw an exception on mismatch, or return an explicit error value (such as -1), so that callers can handle this abnormal case correctly.
Fixed. An SDK mismatch now returns -1 (instead of 0), so callers can clearly tell an SDK mismatch apart from a reference count of 0. (commit: 1d4e08c)
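
A small sketch of why the sentinel helps: with -1 reserved for "not the cached SDK", a return value of 0 can only mean "cached SDK, no holders". `SentinelDemo` and the `SDK_MISMATCH` constant are hypothetical names used for illustration; only the -1 convention comes from the reply above.

```java
// Hypothetical class; only the "-1 on mismatch" convention is taken from the thread.
final class SentinelDemo {
  static final int SDK_MISMATCH = -1; // hypothetical constant name

  private final long cachedSdk;
  private int refCount;

  SentinelDemo(long cachedSdk) {
    this.cachedSdk = cachedSdk;
  }

  synchronized int getRefCount(long sdk) {
    return sdk == cachedSdk ? refCount : SDK_MISMATCH;
  }

  synchronized int increment(long sdk) {
    return sdk == cachedSdk ? ++refCount : SDK_MISMATCH;
  }

  public static void main(String[] args) {
    SentinelDemo d = new SentinelDemo(7L);
    System.out.println(d.getRefCount(7L)); // 0: handle matches, nobody holds it
    System.out.println(d.getRefCount(8L)); // -1: handle is not the cached one
    System.out.println(d.increment(8L));   // -1: mismatch, count untouched
    System.out.println(d.increment(7L));   // 1
  }
}
```

An exception would make the mismatch harder to ignore, as the review also suggested; a sentinel keeps the existing int-returning signatures unchanged.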

    public synchronized int incrementMsgAuditSdkRefCount(long sdk) {
      if (this.msgAuditSdk == sdk) {
        return ++this.msgAuditSdkRefCount;
      }
      return 0;
    }

Copilot AI, Jan 14, 2026
Same issue as in WxCpDefaultConfigImpl: when the sdk passed in does not match the currently cached sdk, the method returns 0, so callers may be unable to handle this abnormal case correctly. Throw an exception on mismatch or return an explicit error value.
Fixed. An SDK mismatch now returns -1 (instead of 0), consistent with WxCpDefaultConfigImpl. (commit: 1d4e08c)
@copilot apply changes based on the comments in this thread
Co-authored-by: binarywang <1343140+binarywang@users.noreply.github.com>
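Fix JVM crash caused by lifecycle management of the WeCom message audit SDK
Main improvements (code review feedback addressed)
1. Resolve the concurrency race condition
Problem: There is a time window between initSdk() and incrementMsgAuditSdkRefCount(), so the reference-count operation may be applied to the wrong SDK.
Solution:
- acquireMsgAuditSdk(): atomic method that obtains the SDK and increments the reference count within one synchronized block
- releaseMsgAuditSdk(): atomic method that decrements the reference count and performs the SDK destruction check within one synchronized block
- acquireSdk() and releaseSdk(): helper methods that wrap the atomic-operation logic
2. Improve the return value on SDK mismatch
Problem: When the SDK passed in does not match the cached SDK, returning 0 makes it impossible to tell an SDK mismatch from a reference count of 0.
Solution:
- incrementMsgAuditSdkRefCount() returns -1 (instead of 0) on SDK mismatch
- decrementMsgAuditSdkRefCount() returns -1 (instead of 0) on SDK mismatch
- getMsgAuditSdkRefCount() returns -1 (instead of 0) on SDK mismatch
3. Strengthen the check when destroying the SDK
Problem: When the SDK is destroyed, it must be checked again that it is still the currently cached SDK.
Solution:
- In decrementMsgAuditSdkRefCount() and releaseMsgAuditSdk(), verify this.msgAuditSdk == sdk again before destroying
- In updateMsgAuditSdk(), if there is an old SDK and its reference count is 0, destroy the old SDK before updating
4. Add atomic-operation interfaces
Added to the WxCpConfigStorage interface:
Core improvement comparison
Before the fix:
After the fix:
Technical implementation details
acquireSdk() method logic:
- … (acquireMsgAuditSdk())
- … initSdk() initialization
releaseSdk() method logic:
- releaseMsgAuditSdk() atomically decrements the reference count
Test verification
Code review feedback handling
Scope of impact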
Original prompt
This section details on the original issue you should resolve
<issue_title>Downloading WeCom message audit data intermittently fails and crashes the whole JVM</issue_title>
<issue_description>
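Brief description
Downloading WeCom message audit data intermittently fails with an error that crashes the whole JVM
Module version
Detailed description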
public List pullWeworkChatData(long seq, long limit, Predicate duplicateMessageFilter) throws Exception {