
feat: 阮老师小分队 - accelerated S3 Object access via a p2p distributed protocol built on async HTTP #2

Open · wants to merge 6 commits into master
Conversation

IRONICBo

Submission info

  • Team name: 阮老师小分队
  • Team size: 1
  • Team topic: accelerated S3 Object access via a p2p distributed protocol built on async HTTP

@IRONICBo (Author)

I am very sorry that, because of time taken up by laboratory work, some documents are not fully complete, which may affect your understanding of this project.

@IRONICBo (Author)

Core features (p2p-with-tracker):

  1. Clear code modules with functional reuse; the logic is split along server/api/task lines, etc.
  2. An HTTP server exposes external interfaces, making it easy to test the code, start everything in one step, and observe the cluster state.
  3. An integrated client makes it easy to measure latency and success rates, and a docker-compose file starts a test S3 service for easy reproduction.
  4. An internal cache on each node stores seed information, piece information, online status, download records, heartbeat information, timing information, etc. (see the sketch after this list).
  5. A native p2p protocol is implemented; test results are good, and the logs are rich.
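
The internal cache in point 4 could be modeled as a handful of concurrent maps. The code below is only a rough sketch, assuming the once_cell and dashmap crates and a hypothetical PieceMeta type; the actual cache layout in the repository may differ, although PIECE_CACHE itself does appear in the reviewed code.

```rust
use dashmap::DashMap;
use once_cell::sync::Lazy;

/// Hypothetical metadata kept per piece checksum.
#[derive(Clone, Debug)]
struct PieceMeta {
    checksum: String,
    size: u64,
    /// Endpoints of nodes known to hold this piece.
    holders: Vec<String>,
}

/// Per-node caches, keyed by piece checksum or node endpoint.
static PIECE_CACHE: Lazy<DashMap<String, Vec<u8>>> = Lazy::new(DashMap::new);
static SEED_CACHE: Lazy<DashMap<String, PieceMeta>> = Lazy::new(DashMap::new);
static HEARTBEAT_CACHE: Lazy<DashMap<String, u64>> = Lazy::new(DashMap::new);

/// Record (or refresh) a node's heartbeat; a timer thread can later
/// evict endpoints whose timestamp has grown too old.
fn record_heartbeat(endpoint: &str, timestamp: u64) {
    HEARTBEAT_CACHE.insert(endpoint.to_string(), timestamp);
}
```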

The general process:

  1. The tracker node comes online, node instances come online, and a timer thread maintains a heartbeat list of online nodes (the tracker and a node can reuse the same process or be split apart; the tracker also acts as the bootstrap node).
  2. A download task is requested on a node and that node starts downloading. There are two cases (a rough sketch of case 2.1 follows this list):
    2.1. Initially the cluster has no piece information for the S3 object. The node first registers the file information with the tracker and retrieves it as seed information; subsequent nodes read from this seed. A piece that nobody has downloaded yet is fetched directly from S3; a piece that another node already holds is fetched from that node's endpoint instead. The seed is shuffled when a download task is taken, which reduces contention right after startup.
    2.2. In the steady state, pieces can mostly be obtained from the surrounding online nodes; a newly added node can look up which pieces its neighbors hold and download them directly, avoiding requests to S3.
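
As an illustration of case 2.1, the per-piece decision might look roughly like the code below. This is a hedged sketch, not the repository's actual code: the Seed type and the fetch_from_peer/fetch_from_s3 helpers are hypothetical stand-ins, and the shuffle uses the rand crate (0.8-style API).

```rust
use rand::seq::SliceRandom;

/// Hypothetical seed entry handed out by the tracker.
struct Seed {
    checksum: String,
    /// Endpoints of online nodes that already hold this piece.
    holders: Vec<String>,
}

fn download_piece(mut seed: Seed) -> Result<Vec<u8>, String> {
    // Shuffle the holders so nodes that start at the same time do not
    // all hit the same peer (reduces contention right after startup).
    seed.holders.shuffle(&mut rand::thread_rng());

    for endpoint in &seed.holders {
        // Prefer fetching from a peer that already has the piece.
        if let Ok(bytes) = fetch_from_peer(endpoint, &seed.checksum) {
            return Ok(bytes);
        }
    }
    // No peer holds the piece yet: fall back to the origin S3 bucket.
    fetch_from_s3(&seed.checksum)
}

// Hypothetical transport helpers; the real project exposes these over HTTP.
fn fetch_from_peer(endpoint: &str, checksum: &str) -> Result<Vec<u8>, String> {
    let _ = (endpoint, checksum);
    Err("not implemented in this sketch".to_string())
}

fn fetch_from_s3(checksum: &str) -> Result<Vec<u8>, String> {
    let _ = checksum;
    Err("not implemented in this sketch".to_string())
}
```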

Follow-up work:

  1. mDNS- and DHT-based discovery is still being tested; a manual implementation turned out to be difficult, so it is not included in this version. In addition, gossip could be combined in to broadcast each node's cache information, reducing the dependence on the tracker, or the tracker's tasks could be split across the other nodes directly.
  2. Regarding transport optimization: to keep debugging and understanding easy, all interfaces are currently exposed over HTTP, and these HTTP endpoints can be used to inspect the cluster's piece information and status directly.

Comment on lines +311 to +323
```rust
for piece in pieces {
    let piece_id: String = piece.get_checksum().clone();
    match PIECE_CACHE.get(&piece_id.clone()) {
        Some(piece) => {
            file.extend(piece);
        },
        None => {
            debug!("Piece: {:?} not found", piece);
            continue;
        },
    }
}
```

Reto911

Would it be possible that, if a piece of a file is not found, the file ends up broken with pieces missing?

@IRONICBo (Author)

@Reto911 Thanks for the suggestion :) Yes, in ruan_lao_shi_xiao_fen_dui/p2p-with-tracker/src/server/logic/file.rs (lines 250 and 275) I already try to fetch that piece from the raw S3 storage and from the other online nodes; if both fail, the piece is temporarily unavailable and the complete file cannot be stitched together.

I think I can add a periodic retry for this case, or report the missing piece to S3/the tracker for more complete handling.
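
A periodic retry could be as simple as a bounded loop with backoff around the existing fetch path. The sketch below is hypothetical: try_fetch_piece stands in for the real peer/S3 fetch logic, and the attempt count and delays are arbitrary.

```rust
use std::time::Duration;

/// Hypothetical retry wrapper: re-attempts a missing piece a few times
/// with exponential backoff before giving up.
fn fetch_piece_with_retry(checksum: &str, attempts: u32) -> Option<Vec<u8>> {
    for attempt in 0..attempts {
        if let Some(bytes) = try_fetch_piece(checksum) {
            return Some(bytes);
        }
        // The piece may only be temporarily unavailable (holder offline,
        // heartbeat not yet refreshed), so back off and try again.
        std::thread::sleep(Duration::from_secs(1u64 << attempt));
    }
    // Still missing: the caller could report the piece to the tracker here.
    None
}

// Stand-in for the real fetch path (peers first, then S3).
fn try_fetch_piece(checksum: &str) -> Option<Vec<u8>> {
    let _ = checksum;
    None
}
```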

Comment on lines +22 to +23
```rust
// Update global time
GLOBAL_TIMESTAMP_CACHE.insert(timestamp);
```
Reto911

Could you tell me why you insert the timestamp here? Also, what does GLOBAL_TIMESTAMP_CACHE do exactly?

@IRONICBo (Author)

@Reto911 Hello, this cache is used by calc_download_time in ruan_lao_shi_xiao_fen_dui/p2p-with-tracker/src/server/logic/file.rs to measure how long the current node takes to download a file. It is just a simple list that holds timestamp information: to compute a download's duration, you pop the latest timestamp and take the difference directly.

For multi-node testing, I did not use this mechanism directly; instead I measured the time spent on calls to the synchronous HTTP interface.
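
The pattern described above (push a timestamp when a download starts, pop the latest one when it finishes, and take the difference) could look roughly like this. This is a sketch using std::time::Instant and a mutex-guarded list, not the actual GLOBAL_TIMESTAMP_CACHE implementation.

```rust
use once_cell::sync::Lazy;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// A mutex-guarded list of start times; the real cache stores raw
// timestamps, but Instant keeps this sketch self-contained.
static TIMESTAMP_CACHE: Lazy<Mutex<Vec<Instant>>> =
    Lazy::new(|| Mutex::new(Vec::new()));

/// Record the start of a download.
fn mark_download_start() {
    TIMESTAMP_CACHE.lock().unwrap().push(Instant::now());
}

/// Pop the most recent start timestamp and return the elapsed time,
/// mirroring what calc_download_time is described as doing.
fn calc_download_time() -> Option<Duration> {
    TIMESTAMP_CACHE.lock().unwrap().pop().map(|start| start.elapsed())
}

fn main() {
    mark_download_start();
    // ... download happens here ...
    if let Some(elapsed) = calc_download_time() {
        println!("download took {:?}", elapsed);
    }
}
```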
