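Current new-video detection logic
Drawbacks
The current detection logic works correctly most of the time, but it does not guard against a few edge cases, mainly the following two.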
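1. A video is added while the API requests are in progress
While the first page is being requested and written to the database, a new video is favorited or published, which pushes the last video of the first page to the top of the second page.
When the second page is then processed, the database already contains a row with the same bvid and time, which wrongly triggers the stop logic, so the remaining unprocessed videos are lost.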
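2. One page request succeeds but the next one fails
To avoid accumulating objects and save memory, videos are currently written to the database in the same order in which the API returns them: newest to oldest.
If there are several pages of new videos, and the first page is fetched and written correctly but the request for the second page fails (for example, because of a network error), the current logic simply stops and waits for the next run.
On the next run, the newer videos from the first page trigger the stop logic, so the later pages are lost permanently and can never be processed.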
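To make the two failure modes concrete, here is a minimal in-memory simulation of the old stop rule as described above: pages are fetched newest first, each video is written immediately, and the scan stops at the first (bvid, time) pair already in the database. Everything in it is hypothetical: `Video`, `detect_old`, the three-item page size, and the Python set standing in for the database are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Video:
    bvid: str
    time: int  # hypothetical: the timestamp the list is sorted by (newest first)


# Returns the videos on a page, [] past the last page, or None if the request fails.
Fetcher = Callable[[int], Optional[list]]


def detect_old(fetch: Fetcher, db: set) -> bool:
    """Sketch of the old rule: write each page newest first and stop at the first
    (bvid, time) pair already in the database. Returns False on an aborted run."""
    page = 1
    while True:
        videos = fetch(page)
        if videos is None:
            return False  # request failed: stop and wait for the next run
        if not videos:
            return True  # ran past the last page
        for v in videos:
            if (v.bvid, v.time) in db:
                return True  # assumed to be where the previous run left off
            db.add((v.bvid, v.time))  # written immediately, in request order
        page += 1


PAGE_SIZE = 3


def paged(feed: list, page: int) -> list:
    start = (page - 1) * PAGE_SIZE
    return feed[start:start + PAGE_SIZE]


def original_feed() -> list:
    # BV0 (time 100) is the newest, BV5 (time 95) the oldest.
    return [Video(f"BV{i}", 100 - i) for i in range(6)]


def run_case1() -> set:
    """Edge case 1: a video is added right after page 1 is fetched, so BV2
    slides from the end of page 1 to the start of page 2."""
    feed, db = original_feed(), set()

    def fetch(page: int) -> list:
        result = paged(feed, page)
        if page == 1:
            feed.insert(0, Video("BVnew", 101))
        return result

    detect_old(fetch, db)
    return {bvid for bvid, _ in db}


def run_case2() -> set:
    """Edge case 2: page 2 fails on the first run, then the task runs again."""
    feed, db = original_feed(), set()

    def flaky(page: int) -> Optional[list]:
        return None if page == 2 else paged(feed, page)

    assert detect_old(flaky, db) is False
    detect_old(lambda page: paged(feed, page), db)  # stops at BV0 immediately
    return {bvid for bvid, _ in db}


print(sorted(run_case1()))  # ['BV0', 'BV1', 'BV2']
print(sorted(run_case2()))  # ['BV0', 'BV1', 'BV2']
```

In both cases BV3 to BV5 never reach the database, and because the newest rows are already stored, every later run stops before it gets to them.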
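Solution
The root cause is that the detection logic trusts partial results produced during a detection pass too much, so it misjudges where the previous run left off. Fixing this requires a value that is updated only after the entire detection pass has completed.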
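This PR therefore adds a `latest_row_at` column to every video list table. It records the time of the last processed video and is updated to the largest video time only after a new-video detection run has fully completed. The only stop condition is that the time of the video currently being processed is `<= latest_row_at`.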
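A minimal in-memory sketch of the rule described above, replaying the same two edge cases. Only `latest_row_at` and the `<= latest_row_at` stop condition come from this PR; `VideoList`, `detect_new`, the page fetcher, and the assumption that writing an already-stored video is a harmless upsert are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Video:
    bvid: str
    time: int  # hypothetical: the timestamp the list is sorted by (newest first)


# Returns the videos on a page, [] past the last page, or None if the request fails.
Fetcher = Callable[[int], Optional[list]]


@dataclass
class VideoList:
    """Hypothetical in-memory stand-in for one video list and its stored videos."""
    latest_row_at: int = 0  # only advanced after a fully completed run
    videos: dict = field(default_factory=dict)  # bvid -> time (assumed upsert)


def detect_new(fetch: Fetcher, vl: VideoList) -> bool:
    """Sketch of the new rule: stop only when time <= latest_row_at, and move
    latest_row_at to the largest time seen once the whole pass has finished.
    Returns False if the pass was aborted by a failed request."""
    max_seen = vl.latest_row_at
    page = 1
    done = False
    while not done:
        videos = fetch(page)
        if videos is None:
            return False  # abort: latest_row_at is left unchanged
        if not videos:
            break  # ran past the last page
        for v in videos:
            if v.time <= vl.latest_row_at:
                done = True  # reached videos handled by an earlier complete run
                break
            vl.videos[v.bvid] = v.time  # re-writing an existing video is harmless
            max_seen = max(max_seen, v.time)
        page += 1
    vl.latest_row_at = max_seen  # the only place the value is updated
    return True


PAGE_SIZE = 3


def paged(feed: list, page: int) -> list:
    start = (page - 1) * PAGE_SIZE
    return feed[start:start + PAGE_SIZE]


def original_feed() -> list:
    # BV0 (time 100) is the newest, BV5 (time 95) the oldest.
    return [Video(f"BV{i}", 100 - i) for i in range(6)]


def run_case1() -> VideoList:
    """Edge case 1: a video is added right after page 1 is fetched."""
    feed, vl = original_feed(), VideoList()

    def fetch(page: int) -> list:
        result = paged(feed, page)
        if page == 1:
            feed.insert(0, Video("BVnew", 101))
        return result

    detect_new(fetch, vl)  # BV2 is seen twice, but that no longer stops the scan
    detect_new(lambda page: paged(feed, page), vl)  # next run picks up BVnew
    return vl


def run_case2() -> VideoList:
    """Edge case 2: page 2 fails on the first run, then the task runs again."""
    feed, vl = original_feed(), VideoList()

    def flaky(page: int) -> Optional[list]:
        return None if page == 2 else paged(feed, page)

    assert detect_new(flaky, vl) is False
    assert vl.latest_row_at == 0  # the partial run did not move the value
    detect_new(lambda page: paged(feed, page), vl)
    return vl


c1, c2 = run_case1(), run_case2()
print(sorted(c1.videos), c1.latest_row_at)
# ['BV0', 'BV1', 'BV2', 'BV3', 'BV4', 'BV5', 'BVnew'] 101
print(sorted(c2.videos), c2.latest_row_at)
# ['BV0', 'BV1', 'BV2', 'BV3', 'BV4', 'BV5'] 100
```

Because the value only moves at the end of a complete pass, in this sketch a page shift or a failed page can at worst cause some videos to be written twice, never skipped, which is why the write step is assumed to tolerate duplicates.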