zsync2 does not support downloading large files. #31

Open
ghuls opened this issue Dec 21, 2018 · 7 comments

Comments

@ghuls

ghuls commented Dec 21, 2018

zsync2 does not support downloading large files.
failed to parse content-range headerError while parsing headersOther error? -1

I patched zsync2 so that it prints the from and to values:

$ git diff -U10
diff --git a/src/legacy_http.c b/src/legacy_http.c
index 41310da..290603e 100644
--- a/src/legacy_http.c
+++ b/src/legacy_http.c
@@ -626,20 +626,21 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
             p[len] = 0;
         }
         /* buf is the header name (lower-cased), p the value */
         /* Switch based on header */
 
         if (status == 206 && !strcmp(buf, "content-range")) {
             /* Okay, we're getting a non-MIME block from the remote. Get the
              * range and set our state appropriately */
             int from, to;
             sscanf(p, "bytes " OFF_T_PF "-" OFF_T_PF "/", &from, &to);
+            fprintf(stderr, "content-range from: %d  to: %d\n", from, to);
             if (from <= to) {
                 rf->block_left = to + 1 - from;
                 rf->offset = from;
             } else {
                 fprintf(stderr, "failed to parse content-range header");
             }
 
             /* Can only have got one range. */
             rf->rangesdone++;
             rf->rangessent = rf->rangesdone;
$ ./zsync2 -v https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather.zsync
zsync2 version 2.0.0-alpha-1 (commit 7857ff1), build <local dev build> built on 2018-12-21 09:58:27 UTC
Checking for changes...
Cannot find file hg19-regions-9species.all_regions.mc8nr.feather, triggering full download
/ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part found, using as seed file
Target file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather
Reading seed file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part
Usable data from seed files: 0.000000%
Renaming temp file
Fetching remaining blocks
Downloading from https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather

-------------------- 0.0%* Hostname was NOT found in DNS cache
*   Trying 134.58.50.8...
* Adding handle: conn: 0x1654a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 3 (0x1654a00) send_pipe: 1, recv_pipe: 0
* Connected to resources.aertslab.org (134.58.50.8) port 443 (#3)
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
* 	 subject: CN=resources.aertslab.org
* 	 start date: 2018-11-25 04:49:48 GMT
* 	 expire date: 2019-02-23 04:49:48 GMT
* 	 subjectAltName: resources.aertslab.org matched
* 	 issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* 	 SSL certificate verify ok.
> GET /cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather HTTP/1.1
Range: bytes=0-3475369983
Host: resources.aertslab.org
Accept: */*

< HTTP/1.1 206 Partial Content
< Date: Fri, 21 Dec 2018 10:09:34 GMT
* Server Apache/2.4.37 (Ubuntu) is not blacklisted
< Server: Apache/2.4.37 (Ubuntu)
< Strict-Transport-Security: max-age=15768000
< Last-Modified: Wed, 23 May 2018 07:38:22 GMT
< ETag: "16cf25e760-56cda9e8f304e"
< Accept-Ranges: bytes
< Content-Length: 3475369984
< Content-Range: bytes 0-3475369983/97964648288
< 
content-range from: 0  to: -819597313
failed to parse content-range headerError while parsing headersOther error? -1
-1 returned
-------------------- 0.0% 0.0 kBps aborted    

* Closing connection 3
failed to retrieve from hg19-regions-9species.all_regions.mc8nr.feather, status -1

As you can see, int (signed int) is not big enough: from and to should be uint (unsigned int, at least 32 bits).
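
The value -819597313 in the log above is what 3475369983 turns into when only its low 32 bits are kept and read as a signed int (3475369983 - 4294967296). A minimal sketch of parsing the header into fixed 64-bit integers instead follows. Both `parse_content_range` and `truncate_to_int32` are hypothetical helpers written for illustration, not zsync2 functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not zsync2 code: parse "bytes FROM-TO/..." into
 * 64-bit values. Returns 0 on success, -1 on a malformed or inverted range. */
static int parse_content_range(const char *value, int64_t *from, int64_t *to)
{
    assert(value != NULL && from != NULL && to != NULL);

    /* SCNd64 expands to the scanf conversion that matches int64_t. */
    if (sscanf(value, "bytes %" SCNd64 "-%" SCNd64 "/", from, to) != 2)
        return -1;
    if (*from < 0 || *to < *from)
        return -1;
    return 0;
}

/* Reproduce the truncation seen in the log by keeping only the low 32 bits.
 * The unsigned-to-signed conversion is implementation-defined before C23,
 * but wraps on common two's-complement platforms. */
static int32_t truncate_to_int32(int64_t v)
{
    return (int32_t)(uint32_t)v;
}
```

With the header from the log, `parse_content_range("bytes 0-3475369983/97964648288", &from, &to)` stores 0 and 3475369983, while `truncate_to_int32(3475369983)` gives -819597313 on typical platforms, which matches the patched debug output.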

@probonopd
Member

probonopd commented Dec 21, 2018

Thanks @ghuls. Could you please send a PR that includes

  • The added verbosity
  • Using uint

Again, thank you very much.

@ghuls
Author

ghuls commented Dec 21, 2018

@probonopd Adding this change is not enough.

It seems that there are a lot of issues with files bigger than 2GiB:

$ ./zsync2 -v https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather.zsync
zsync2 version 2.0.0-alpha-1 (commit 7857ff1), build <local dev build> built on 2018-12-21 12:30:07 UTC
Checking for changes...
Cannot find file hg19-regions-9species.all_regions.mc8nr.feather, triggering full download
/ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part found, using as seed file
Target file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather
Reading seed file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part
Usable data from seed files: 3.547576%
Renaming temp file
Fetching remaining blocks
Downloading from https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather

-------------------- 3.5%* Hostname was NOT found in DNS cache
*   Trying 134.58.50.8...
* Adding handle: conn: 0x16f74f0
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 3 (0x16f74f0) send_pipe: 1, recv_pipe: 0
* Connected to resources.aertslab.org (134.58.50.8) port 443 (#3)
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
* 	 subject: CN=resources.aertslab.org
* 	 start date: 2018-11-25 04:49:48 GMT
* 	 expire date: 2019-02-23 04:49:48 GMT
* 	 subjectAltName: resources.aertslab.org matched
* 	 issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* 	 SSL certificate verify ok.
> GET /cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather HTTP/1.1
Range: bytes=3475369984-3475369983
Host: resources.aertslab.org
Accept: */*

< HTTP/1.1 200 OK
< Date: Fri, 21 Dec 2018 12:36:54 GMT
* Server Apache/2.4.37 (Ubuntu) is not blacklisted
< Server: Apache/2.4.37 (Ubuntu)
< Strict-Transport-Security: max-age=15768000
< Last-Modified: Wed, 23 May 2018 07:38:22 GMT
< ETag: "16cf25e760-56cda9e8f304e"
< Accept-Ranges: bytes
< Content-Length: 97964648288
< 

zsync received a data response (code 200) but this is not a partial content response
zsync can only work with servers that support returning partial content from files. The person/entity creating this .zsync has tried to use a server that is not returning partial content. zsync cannot be used with this server.
See http://zsync.moria.orc.uk/server-issues
Other error? -1
-1 returned
-------------------- 3.5% 0.0 kBps aborted    

* Closing connection 3
failed to retrieve from hg19-regions-9species.all_regions.mc8nr.feather, status -1

It seems to request an invalid byte range: Range: bytes=3475369984-3475369983
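
HTTP byte ranges are inclusive on both ends, so a request whose first byte position is greater than its last (here 3475369984 > 3475369983) is not a valid range, and the server in the log answered with a full 200 response instead of 206. A minimal sketch of a guard that refuses to build such a header is shown here. `format_range_header` is a hypothetical name used for illustration, not a zsync2 function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guard, not zsync2 code: write "bytes=FROM-TO" into buf.
 * Returns 0 on success, -1 if the range is negative or inverted, or if
 * buf is too small to hold the whole string. */
static int format_range_header(char *buf, size_t buflen, int64_t from, int64_t to)
{
    assert(buf != NULL);

    if (from < 0 || to < from)
        return -1;

    /* PRId64 expands to the printf conversion that matches int64_t. */
    int n = snprintf(buf, buflen, "bytes=%" PRId64 "-%" PRId64, from, to);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

Called with the values from the failing request, `format_range_header(buf, sizeof buf, 3475369984, 3475369983)` returns -1, so the inverted range would be caught before it reaches the server.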

Tested with these changes:

$ git diff
diff --git a/src/legacy_http.c b/src/legacy_http.c
index 41310da..ccf4f06 100644
--- a/src/legacy_http.c
+++ b/src/legacy_http.c
@@ -53,8 +53,8 @@ struct http_file
     } handle;
 
     char *buffer;
-    size_t buffer_len;
-    size_t buffer_pos;
+    off_t buffer_len;
+    off_t buffer_pos;
     int still_running;
 };
 
@@ -391,9 +391,9 @@ static int fill_buffer(HTTP_FILE *file, size_t want, CURLM* multi_handle)
  *
  * Removes `want` bytes from the front of the buffer.
  */
-static int use_buffer(HTTP_FILE *file, int want)
+static off_t use_buffer(HTTP_FILE *file, off_t want)
 {
-    if((file->buffer_pos - want) <= 0){
+    if(file->buffer_pos <= want){
         /* trash the buffer */
         if(file->buffer){
             free(file->buffer);
@@ -416,7 +416,7 @@ static int use_buffer(HTTP_FILE *file, int want)
  */
 size_t http_fread(void *ptr, size_t size, size_t nmemb, HTTP_FILE *file, struct range_fetch *rf)
 {
-    size_t want;
+    off_t want;
     want = nmemb * size;
     fill_buffer(file, want, rf->multi_handle);
 
@@ -560,14 +560,14 @@ static void buflwr(char *s) {
 int range_fetch_read_http_headers(struct range_fetch *rf) {
     char buf[512];
     int status;
-    int seen_location = 0;
+   uint seen_location = 0;
 
     {                           /* read status line */
         char *p;
 
         if (rfgets(buf, sizeof(buf), rf) == NULL){
             /* most likely unexpected EOF from server */
-            fprintf(stderr, "EOF from server");
+            fprintf(stderr, "EOF from server\n");
             return -1;
         }
         if (buf[0] == 0)
@@ -622,7 +622,7 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
         p += 2;
         buflwr(buf);
         {   /* Remove the trailing \r\n from the value */
-            int len = strcspn(p, "\r\n");
+            uint len = strcspn(p, "\r\n");
             p[len] = 0;
         }
         /* buf is the header name (lower-cased), p the value */
@@ -631,13 +631,14 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
         if (status == 206 && !strcmp(buf, "content-range")) {
             /* Okay, we're getting a non-MIME block from the remote. Get the
              * range and set our state appropriately */
-            int from, to;
+            off_t from, to;
             sscanf(p, "bytes " OFF_T_PF "-" OFF_T_PF "/", &from, &to);
+            fprintf(stderr, "content-range from: %d  to: %d\n", from, to);
             if (from <= to) {
                 rf->block_left = to + 1 - from;
                 rf->offset = from;
             } else {
-                fprintf(stderr, "failed to parse content-range header");
+                fprintf(stderr, "failed to parse content-range header\n");
             }
 
             /* Can only have got one range. */
@@ -678,7 +679,7 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
          */
     }
 
-    fprintf(stderr, "Error while parsing headers");
+    fprintf(stderr, "Error while parsing headers\n");
     return -1;
 }
 
diff --git a/src/zsclient.cpp b/src/zsclient.cpp
index 06a993b..c5fd3f0 100644
--- a/src/zsclient.cpp
+++ b/src/zsclient.cpp
@@ -269,12 +269,14 @@ namespace zsync2 {
 
                 // if interested in headers only, download 1 kiB chunks until end of zsync header is found
                 if (headersOnly) {
-                    static const auto chunkSize = 1024;
-                    unsigned long currentChunk = 0;
+issueStatusMessage("headersOnly");
+                    static const off_t chunkSize = 1024;
+                    off_t currentChunk = 0;
 
                     // download a chunk at a time
                     while (true) {
                         std::ostringstream bytes;
+issueStatusMessage("headersOnly:" + std::to_string(currentChunk) + " " + std::to_string( chunkSize) + " " + std::to_string(currentChunk + chunkSize - 1) + "\n");
                         bytes << "bytes=" << currentChunk << "-" << currentChunk + chunkSize - 1;
                         session.SetHeader(cpr::Header{{"range", bytes.str()}});
 

@TheAssassin
Member

It'll be much easier to review if you send a PR right away.

@probonopd
Member

probonopd commented Dec 21, 2018

I think he is not sending a PR because, despite his changes, it is not working yet.

@JustTNE

JustTNE commented Mar 24, 2021

Hmm, applying this diff file (manually, thanks a lot git apply for never working) makes it work on the Garuda Linux ISO file I tested this on: https://builds.garudalinux.org/iso/garuda/dr460nized/210324/garuda-dr460nized-linux-zen-210324.iso.zsync

JustTNE added a commit to JustTNE/zsync2 that referenced this issue Mar 25, 2021
@JustTNE

JustTNE commented Mar 25, 2021

I've noticed while compiling this in Cygwin that the following is sometimes wrong and uses 32-bit stuff instead: https://github.com/AppImage/zsync2/blob/86cfd3a1d6a27483ec40edd62c1a6bd409cbbe5d/src/format_string.h#L24-L36

Forcing it to use 64-bit stuff fixed any issues I had on the Cygwin-compiled version.

@TheAssassin
Member

This patch goes in the right direction, but it actually doesn't solve the issue. See my comments in #59. A fix must use fixed 64-bit types. size_t and off_t are compiler-dependent and typically just 32-bit in size on 32-bit machines.
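
To illustrate the point about type widths: whether a byte count fits depends only on the width of the type holding it, and `off_t` or `size_t` may be just 4 bytes wide on a 32-bit build. A small sketch follows, using a hypothetical helper `fits_in_signed` that is an illustration only and not code from zsync2 or #59.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: can a non-negative byte count be stored in a
 * signed integer type that is `width` bytes wide (1 to 8)? */
static int fits_in_signed(int64_t value, size_t width)
{
    assert(width >= 1 && width <= 8);

    if (value < 0)
        return 0;
    if (width == 8)
        return 1; /* int64_t itself is 8 bytes wide */

    /* Largest positive value of a signed type with width * 8 bits. */
    int64_t max = (INT64_C(1) << (width * 8 - 1)) - 1;
    return value <= max;
}
```

For the 97964648288-byte file in this issue, `fits_in_signed(97964648288, sizeof(off_t))` is false wherever `off_t` is 4 bytes wide, whereas an `int64_t` always holds it.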

TheAssassin added a commit that referenced this issue Apr 2, 2021
Apparently, only a few lines have to be changed in order to support
large(r) files on 64-bit machines.

This commit doesn't (yet) fix the issue on 32-bit machines (it also
doesn't test this explicitly). In comparison to #59, however, it uses
types that help get this to work on 32-bit machines as well, as it
doesn't use compiler-dependent types, but types that are known to be
large enough even there.

Closes #59.

CC #31.
NiLuJe pushed a commit to NiLuJe/zsync2 that referenced this issue Jan 19, 2024
(cherry picked from commit a8e2d68)