
feat: 阮老师小分队 - accelerated S3 Object access via a p2p distributed protocol built on async HTTP #2

Open · wants to merge 6 commits into master
Conversation

IRONICBo

Submission info

  • Team name: 阮老师小分队
  • Team size: 1
  • Team topic: accelerated S3 Object access via a p2p distributed protocol built on async HTTP

@IRONICBo (Author)

I am very sorry that, because of time taken up by laboratory work, some documents are not fully complete, which may affect your understanding of this project.

@IRONICBo (Author)

Core features (p2p-with-tracker):

  1. Clear code modules with functional reuse; the logic is split along server/api/task lines, etc.
  2. An HTTP server exposes external interfaces, making it easy to test the code, start everything in one step, and observe the cluster state.
  3. An integrated client makes it easy to measure latency and success rates, and a docker-compose file starts a test S3 service for easy reproduction.
  4. An internal cache on each node stores seed information, piece information, online status, download records, heartbeat information, timing information, etc. (see the sketch after this list).
  5. A native p2p protocol is implemented; test results are good, and the logs are rich.
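
The internal cache in point 4 could be modeled as a handful of concurrent maps. The code below is only a rough sketch, assuming the once_cell and dashmap crates and a hypothetical PieceMeta type; the actual cache layout in the repository may differ, although PIECE_CACHE itself does appear in the reviewed code.

```rust
use dashmap::DashMap;
use once_cell::sync::Lazy;

/// Hypothetical metadata kept per piece checksum.
#[derive(Clone, Debug)]
struct PieceMeta {
    checksum: String,
    size: u64,
    /// Endpoints of nodes known to hold this piece.
    holders: Vec<String>,
}

/// Per-node caches, keyed by piece checksum or node endpoint.
static PIECE_CACHE: Lazy<DashMap<String, Vec<u8>>> = Lazy::new(DashMap::new);
static SEED_CACHE: Lazy<DashMap<String, PieceMeta>> = Lazy::new(DashMap::new);
static HEARTBEAT_CACHE: Lazy<DashMap<String, u64>> = Lazy::new(DashMap::new);

/// Record (or refresh) a node's heartbeat; a timer thread can later
/// evict endpoints whose timestamp has grown too old.
fn record_heartbeat(endpoint: &str, timestamp: u64) {
    HEARTBEAT_CACHE.insert(endpoint.to_string(), timestamp);
}
```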

The general process:

  1. The tracker node comes online, node instances come online, and a timer thread maintains a heartbeat list of online nodes (the tracker and a node can reuse the same process or be split apart; the tracker also acts as the bootstrap node).
  2. A download task is requested on a node and that node starts downloading. There are two cases (a rough sketch of case 2.1 follows this list):
    2.1. Initially the cluster has no piece information for the S3 object. The node first registers the file information with the tracker and retrieves it as seed information; subsequent nodes read from this seed. A piece that nobody has downloaded yet is fetched directly from S3; a piece that another node already holds is fetched from that node's endpoint instead. The seed is shuffled when a download task is taken, which reduces contention right after startup.
    2.2. In the steady state, pieces can mostly be obtained from the surrounding online nodes; a newly added node can look up which pieces its neighbors hold and download them directly, avoiding requests to S3.
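
As an illustration of case 2.1, the per-piece decision might look roughly like the code below. This is a hedged sketch, not the repository's actual code: the Seed type and the fetch_from_peer/fetch_from_s3 helpers are hypothetical stand-ins, and the shuffle uses the rand crate (0.8-style API).

```rust
use rand::seq::SliceRandom;

/// Hypothetical seed entry handed out by the tracker.
struct Seed {
    checksum: String,
    /// Endpoints of online nodes that already hold this piece.
    holders: Vec<String>,
}

fn download_piece(mut seed: Seed) -> Result<Vec<u8>, String> {
    // Shuffle the holders so nodes that start at the same time do not
    // all hit the same peer (reduces contention right after startup).
    seed.holders.shuffle(&mut rand::thread_rng());

    for endpoint in &seed.holders {
        // Prefer fetching from a peer that already has the piece.
        if let Ok(bytes) = fetch_from_peer(endpoint, &seed.checksum) {
            return Ok(bytes);
        }
    }
    // No peer holds the piece yet: fall back to the origin S3 bucket.
    fetch_from_s3(&seed.checksum)
}

// Hypothetical transport helpers; the real project exposes these over HTTP.
fn fetch_from_peer(endpoint: &str, checksum: &str) -> Result<Vec<u8>, String> {
    let _ = (endpoint, checksum);
    Err("not implemented in this sketch".to_string())
}

fn fetch_from_s3(checksum: &str) -> Result<Vec<u8>, String> {
    let _ = checksum;
    Err("not implemented in this sketch".to_string())
}
```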

Follow-up work:

  1. mDNS- and DHT-based discovery is still being tested; a manual implementation turned out to be difficult, so it is not included in this version. In addition, gossip could be combined in to broadcast each node's cache information, reducing the dependence on the tracker, or the tracker's tasks could be split across the other nodes directly.
  2. Regarding transport optimization: to keep debugging and understanding easy, all interfaces are currently exposed over HTTP, and these HTTP endpoints can be used to inspect the cluster's piece information and status directly.

Comment on lines +311 to +323
```rust
for piece in pieces {
    let piece_id: String = piece.get_checksum().clone();
    match PIECE_CACHE.get(&piece_id.clone()) {
        Some(piece) => {
            file.extend(piece);
        },
        None => {
            debug!("Piece: {:?} not found", piece);
            continue;
        },
    }
}
```

Reto911

Would it be possible that, if a piece of a file is not found, the file ends up broken with pieces missing?

@IRONICBo (Author)

@Reto911 Thanks for the suggestion :) Yes, in ruan_lao_shi_xiao_fen_dui/p2p-with-tracker/src/server/logic/file.rs (lines 250 and 275) I already try to fetch that piece from the raw S3 storage and from the other online nodes; if both fail, the piece is temporarily unavailable and the complete file cannot be stitched together.

I think I can add a periodic retry for this case, or report the missing piece to S3/the tracker for more complete handling.
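
A periodic retry could be as simple as a bounded loop with backoff around the existing fetch path. The sketch below is hypothetical: try_fetch_piece stands in for the real peer/S3 fetch logic, and the attempt count and delays are arbitrary.

```rust
use std::time::Duration;

/// Hypothetical retry wrapper: re-attempts a missing piece a few times
/// with exponential backoff before giving up.
fn fetch_piece_with_retry(checksum: &str, attempts: u32) -> Option<Vec<u8>> {
    for attempt in 0..attempts {
        if let Some(bytes) = try_fetch_piece(checksum) {
            return Some(bytes);
        }
        // The piece may only be temporarily unavailable (holder offline,
        // heartbeat not yet refreshed), so back off and try again.
        std::thread::sleep(Duration::from_secs(1u64 << attempt));
    }
    // Still missing: the caller could report the piece to the tracker here.
    None
}

// Stand-in for the real fetch path (peers first, then S3).
fn try_fetch_piece(checksum: &str) -> Option<Vec<u8>> {
    let _ = checksum;
    None
}
```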

Comment on lines +22 to +23
```rust
// Update global time
GLOBAL_TIMESTAMP_CACHE.insert(timestamp);
```
Reto911

Could you tell me why you insert the timestamp here? Also, what does GLOBAL_TIMESTAMP_CACHE do exactly?

@IRONICBo (Author)

@Reto911 Hello, this cache is used by calc_download_time in ruan_lao_shi_xiao_fen_dui/p2p-with-tracker/src/server/logic/file.rs to measure how long the current node takes to download a file. It is just a simple list that holds timestamp information: to compute a download's duration, you pop the latest timestamp and take the difference directly.

For multi-node testing, I did not use this mechanism directly; instead I measured the time spent on calls to the synchronous HTTP interface.
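
The pattern described above (push a timestamp when a download starts, pop the latest one when it finishes, and take the difference) could look roughly like this. This is a sketch using std::time::Instant and a mutex-guarded list, not the actual GLOBAL_TIMESTAMP_CACHE implementation.

```rust
use once_cell::sync::Lazy;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// A mutex-guarded list of start times; the real cache stores raw
// timestamps, but Instant keeps this sketch self-contained.
static TIMESTAMP_CACHE: Lazy<Mutex<Vec<Instant>>> =
    Lazy::new(|| Mutex::new(Vec::new()));

/// Record the start of a download.
fn mark_download_start() {
    TIMESTAMP_CACHE.lock().unwrap().push(Instant::now());
}

/// Pop the most recent start timestamp and return the elapsed time,
/// mirroring what calc_download_time is described as doing.
fn calc_download_time() -> Option<Duration> {
    TIMESTAMP_CACHE.lock().unwrap().pop().map(|start| start.elapsed())
}

fn main() {
    mark_download_start();
    // ... download happens here ...
    if let Some(elapsed) = calc_download_time() {
        println!("download took {:?}", elapsed);
    }
}
```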
