OPDS: Issues with acquisition across domains #2512

Closed
mpdunlop opened this issue Aug 19, 2024 · 25 comments

Comments

@mpdunlop
Contributor

mpdunlop commented Aug 19, 2024

  • If an OPDS Bookshelf that requires authentication is hosted on a domain (e.g. https://bookshelf.contoso.org/purchased.json)
  • The OAuth server is on a different domain (e.g. https://auth.contoso.org/connect/authorize); and
  • An OPDS publication's acquisition link is on a different domain (e.g. https://download.contoso.org/epub/1d5235d7-044d-46b1-bc8e-913868f20003.epub)

Thorium does not pass the access_token obtained from the OAuth response at auth.contoso.org to download.contoso.org and the request will fail to authenticate.

It appears that the authorization response's bearer token is cached against the OPDS feed's domain (bookshelf.contoso.org), so Thorium cannot retrieve it later when using download.contoso.org as the lookup key.
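
The suspected behaviour can be illustrated with a minimal sketch, assuming the client keys its token cache by the host of the URL. The names `tokensByHost`, `saveToken` and `findToken` are hypothetical and are not taken from Thorium's code.

```typescript
// Hypothetical sketch of a token cache keyed by URL host (not Thorium's real code).
const tokensByHost = new Map<string, string>();

// Store the token under the host of the feed that triggered authentication.
function saveToken(feedUrl: string, accessToken: string): void {
    tokensByHost.set(new URL(feedUrl).host, accessToken);
}

// Look the token up again using the host of the outgoing request.
function findToken(requestUrl: string): string | undefined {
    return tokensByHost.get(new URL(requestUrl).host);
}

saveToken("https://bookshelf.contoso.org/purchased.json", "token-abc");

const feedToken = findToken("https://bookshelf.contoso.org/purchased.json");
const downloadToken = findToken(
    "https://download.contoso.org/epub/1d5235d7-044d-46b1-bc8e-913868f20003.epub",
);

console.log(feedToken);     // "token-abc"
console.log(downloadToken); // undefined: nothing is stored for download.contoso.org
```

With this kind of keying, the acquisition request goes out without an Authorization header even though the user has already signed in.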

To see if it would work, I attempted to return an OPDS Authentication Document when accessing an acquisition link and the request was missing a bearer token. I hoped that the id of the OPDS Authentication Document would be used to look up the existing bearer token, but this also failed to work.

What is the correct way to represent this data and give the user a seamless experience?

@panaC
Member

panaC commented Aug 19, 2024

In my view, to access another domain or subdomain with authentication you have to generate a signed URL. These URLs don't rely on a bearer token and can be hosted on any storage provider such as AWS S3 or GCP, for example.
For security, the bearer token is attached to the current host, bookshelf.contoso.org.
Another way to authenticate your download subdomain is to use a reverse proxy such as Nginx, Caddy or Envoy.
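
As a rough illustration of the signed-URL idea, here is a minimal sketch using an HMAC over the path and an expiry time. The secret, the `expires` and `sig` parameter names, and the functions `signUrl` and `verifyUrl` are all assumptions made for this example; cloud providers use their own signing schemes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: a secret shared by the feed server and the download server.
const SECRET = "demo-secret";

// Compute an HMAC-SHA256 over the path and query string.
function hmac(data: string): string {
    return createHmac("sha256", SECRET).update(data).digest("hex");
}

// Append an expiry time and a signature to the acquisition URL.
function signUrl(href: string, expiresAt: number): string {
    const url = new URL(href);
    url.searchParams.set("expires", String(expiresAt));
    url.searchParams.set("sig", hmac(url.pathname + url.search));
    return url.toString();
}

// Recompute the signature on the download server and check the expiry.
function verifyUrl(href: string, now: number): boolean {
    const url = new URL(href);
    const sig = url.searchParams.get("sig") ?? "";
    url.searchParams.delete("sig");
    const expires = Number(url.searchParams.get("expires"));
    if (!Number.isFinite(expires) || expires < now) return false;
    const given = Buffer.from(sig, "hex");
    const expected = Buffer.from(hmac(url.pathname + url.search), "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
}

const signedLink = signUrl(
    "https://download.contoso.org/epub/1d5235d7-044d-46b1-bc8e-913868f20003.epub",
    2000,
);
console.log(verifyUrl(signedLink, 1000)); // true: signature valid and not expired
console.log(verifyUrl(signedLink, 3000)); // false: link has expired
```

The download server never sees a bearer token in this scheme; the cost is that every acquisition link in the feed has to be signed when the feed is generated.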

@mpdunlop
Contributor Author

mpdunlop commented Aug 20, 2024

I feel there are shortcomings with each of these approaches:

Signed URLs

This does not scale well. For large OPDS feeds this will require the generation of a large number of pre-signed links each time the bookshelf is requested.

In our case, there may be over 2000 titles and each of these will have an LCP encrypted Epub and PDF available meaning we must generate over 4000 pre-signed links every time the OPDS feed is accessed, for each user. Pagination would offset this somewhat but it is still unnecessary effort when most acquisition links will not be accessed.

Reverse Proxy like Nginx/Caddy/Envoy

This requires OPDS feed providers to set up an additional service, and acts as a potential point of failure.

For security, the bearer token is attached to the current host, bookshelf.contoso.org.

That makes sense and I agree with this approach.

However Authentication for OPDS 1.0 states (bold emphasis added by me):

The primary objective is to allow access to specific feeds (such as bookshelf and subscriptions) along with support for interactions that require authentication (specific acquisition links such as buy, borrow and subscribe).

Thorium currently does not respect OPDS Authentication Documents returned when accessing Acquisition Links and receiving a 401 Unauthorized response, instead raising an error and not downloading the content. I think the correct solution is for Thorium to add support for this scenario.

@mpdunlop mpdunlop changed the title OPDS: Issues with aquisition across domains OPDS: Issues with acquisition across domains Aug 20, 2024
@danielweck
Member

cross-posting from another issue, but just in case somebody in the future comes across this thread, here is a separate discussion about OPDS Auth:

opds-community/drafts#82

@HadrienGardeur
Member

I can confirm that OPDS clients should store Access Tokens and Refresh Tokens in a way where they're tied to the id of the Authentication Document.

As mentioned above, upon encountering a 401 with an Authentication Document in its response, Thorium should use the id to look up a known Access Token.

To avoid this situation where the client always needs to do an initial GET request without a Bearer Token, I plan on adding a new authenticate property as discussed on opds-community/drafts#43

This approach has been implemented in Aldiko a while back and is much more efficient when interacting with an OPDS catalog.
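
A minimal sketch of the id-based lookup described above, assuming tokens are stored in a map keyed by the Authentication Document id. The types and the function `authorizationForRetry` are hypothetical, and the sketch deliberately omits any check on where the document was served from.

```typescript
// Hypothetical, simplified shapes for this example.
interface AuthDocument {
    id: string;
    title?: string;
}

interface HttpResponse {
    status: number;
    authDocument?: AuthDocument; // parsed body of a 401 response, if any
}

// Access tokens keyed by the id of the Authentication Document used to obtain them.
const tokensByAuthId = new Map<string, string>();

// On a 401 carrying an Authentication Document, return the header to retry with.
function authorizationForRetry(res: HttpResponse): string | undefined {
    if (res.status !== 401 || !res.authDocument) return undefined;
    const token = tokensByAuthId.get(res.authDocument.id);
    return token ? `Bearer ${token}` : undefined;
}

tokensByAuthId.set("https://auth.contoso.org/odps-auth.json", "token-abc");

const retryHeader = authorizationForRetry({
    status: 401,
    authDocument: { id: "https://auth.contoso.org/odps-auth.json" },
});
console.log(retryHeader); // "Bearer token-abc"
```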

@mpdunlop
Contributor Author

mpdunlop commented Aug 22, 2024

Apologies if that has already been discussed elsewhere - I was not able to find mention of this in the draft and wanted to raise a potential issue with using the id of the Authentication Document as the primary identifier for OPDS Clients determining whether to send an access token to a URL in a request.

Tying authentication tokens to the id of the OPDS Authentication Document has security considerations.

An attacker wanting to gain access to a user's account could craft an OPDS Feed that contains an OPDS Authentication Document that uses the same id as a legitimate OPDS Feed provider. If the user has previously authenticated using that document, and the client looks up the auth token by the document's id, that token will be sent to the attacker.

  • https://opds.contoso.org/publications.json requires authentication, first access returns OPDS Authentication Document with id https://auth.contoso.org/odps-auth.json
  • User signs in and accesses books
  • User is later coerced into loading feed from attacker's domain (e.g. https://opds.contosoo.org/publications.json, a domain similar in appearance to an authentic one which is common in scams).
  • This also requires authentication, and the first access returns OPDS Authentication Document with the same id as a legitimate OPDS provider, e.g. https://auth.contoso.org/odps-auth.json

OPDS Clients must not provide the token in this scenario. In the case of Basic Auth, this could result in usernames and passwords being entered by users and shared with the attacker. For OAuth, the access_token would be shared with the attacker.

Some validation of the Authentication Document's id, ensuring that it matches the URL the document is being served from, will be necessary so that users are not deceived.
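
One possible form of that validation is sketched below, assuming the id is an absolute URL and that comparing origins is an acceptable rule. The function name `idMatchesServingOrigin` is hypothetical; the draft specification does not define this check.

```typescript
// Accept an Authentication Document only if its id has the same origin
// as the URL that actually served it.
function idMatchesServingOrigin(authDocId: string, servedFrom: string): boolean {
    try {
        return new URL(authDocId).origin === new URL(servedFrom).origin;
    } catch {
        // The id is not an absolute URL, or one of the URLs is malformed.
        return false;
    }
}

// Legitimate: the document is served from the origin named in its id.
console.log(idMatchesServingOrigin(
    "https://auth.contoso.org/odps-auth.json",
    "https://auth.contoso.org/odps-auth.json",
)); // true

// Attack from the scenario above: look-alike domain reusing the legitimate id.
console.log(idMatchesServingOrigin(
    "https://auth.contoso.org/odps-auth.json",
    "https://opds.contosoo.org/publications.json",
)); // false
```

Note that a strict origin comparison also rejects the legitimate case where opds.contoso.org returns a document whose id is on auth.contoso.org, so it is only a starting point.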

@HadrienGardeur
Member

@mpdunlop would you mind posting this message on https://github.com/opds-community/drafts as an issue?

I think that this is a legitimate concern that could be addressed by adding a new property in the Authentication Document that would list all the domains where the id can be legitimately used.
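
A sketch of how a client might use such a list, under two assumptions: the property is called `authorizedDomains` (a placeholder name, since no name has been specified), and the client only trusts the copy of the document it recorded when the user signed in, not a copy presented later by an arbitrary server.

```typescript
// Hypothetical shape: "authorizedDomains" is a placeholder, not a specified property.
interface TrustedAuthDocument {
    id: string;
    authorizedDomains: string[];
}

// Documents recorded at sign-in time, keyed by id.
const trustedDocs = new Map<string, TrustedAuthDocument>();

// A token tied to authDocId may be sent only to HTTPS hosts listed by the trusted document.
function mayUseToken(authDocId: string, requestUrl: string): boolean {
    const doc = trustedDocs.get(authDocId);
    if (!doc) return false;
    const url = new URL(requestUrl);
    return url.protocol === "https:" && doc.authorizedDomains.includes(url.hostname);
}

trustedDocs.set("https://auth.contoso.org/odps-auth.json", {
    id: "https://auth.contoso.org/odps-auth.json",
    authorizedDomains: ["opds.contoso.org", "download.contoso.org"],
});

console.log(mayUseToken(
    "https://auth.contoso.org/odps-auth.json",
    "https://download.contoso.org/epub/1d5235d7-044d-46b1-bc8e-913868f20003.epub",
)); // true

console.log(mayUseToken(
    "https://auth.contoso.org/odps-auth.json",
    "https://opds.contosoo.org/publications.json",
)); // false: look-alike domain is not listed
```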

@mpdunlop
Contributor Author

Done 👍

At this stage I'm not sure how this is going to affect our discussion here.

I would still like to see Thorium re-auth the user when accessing an acquisition link that returns an OPDS Authentication Document as it would resolve our immediate issues. The token obtained during this re-authentication could be stored against the acquisition link's host without compromising security, although it isn't strictly in-line with the OPDS specification.

@HadrienGardeur
Member

Thank you for your contribution, it's a very useful discussion under the current context where we're evaluating how Authentication for OPDS could better align with OAuth 2.1.

In order to get a bit more context, is it correct to assume that ebooks.com is working on:

  • LCP support for EPUB, PDF and audiobooks
  • would use Thorium or a fork of Thorium for desktop reading
  • is working on its own mobile apps based on Readium Mobile
  • and would use OPDS both for its own mobile apps and Thorium?

@mpdunlop
Contributor Author

We're not planning on forking Thorium or utilizing Readium Mobile at this stage.

Currently we're hoping to implement two feature requests for partners we work with:

The first would like us to support Thorium Reader. We don't plan to fork it at this stage, and in terms of the user-experience it should be better in most scenarios, or at least similar to, our users who currently read on desktop with Adobe Digital Editions. As an aside, the main reason they've requested Thorium is for its focus on accessibility (we agree - it's great!).

The second is planning to use our OPDS feeds to load content purchased by our users into their own app. We won't have any direct control over this reader, so we've been using Thorium as a sort of testing ground for our implementation. The thinking is that if ebooks.com adheres to the OPDS 1.2/2.0 specification, utilizes LCP for DRM, and it works in Thorium, then we should be able to refer this partner to all the relevant standards and documentation for them to use in their own implementation.

There are some other projects we're working on that will benefit from this work, but it's not my place to talk about those yet :)

@HadrienGardeur
Member

Understood, that's very helpful.

The first would like us to support Thorium Reader. We don't plan to fork it at this stage, and in terms of the user-experience it should be better in most scenarios, or at least similar to, our users who currently read on desktop with Adobe Digital Editions. As an aside, the main reason they've requested Thorium is for its focus on accessibility (we agree - it's great!).

Since you can use auto-discovery for the LCP passphrase in OPDS, the overall UX can be much better than ADE and LCP becomes completely seamless for the end-user.
LCP can also protect audiobooks, which might be less significant for a retailer called ebooks.com but who knows 😇

The second is planning to use our OPDS feeds to load content purchased by our users into their own app. We won't have any direct control over this reader, so we've been using Thorium as a sort of testing ground for our implementation. The thinking is that if ebooks.com adheres to the OPDS 1.2/2.0 specification, utilizes LCP for DRM, and it works in Thorium, then we should be able to refer this partner to all the relevant standards and documentation for them to use in their own implementation.

Good to know. At this point I would say that you can skip OPDS 1.2 entirely and just support 2.0, that's what a lot of organizations have been doing for the last few years.
Readium Mobile has a built-in OPDS parser which should be helpful for apps based on these toolkits. Most LCP-compatible apps are built on top of Readium Mobile.

@danielweck
Member

Hello, @panaC what is the current status of the OPDS improvements merged to the develop branch, in relation to this particular issue? This remains unsolved, right? I will re-read this thread to make sure I am not missing anything, and let's discuss Monday afternoon during our triage concall?

@danielweck
Member

danielweck commented Dec 13, 2024

  • If an OPDS Bookshelf that requires authentication is hosted on a domain (e.g. https://bookshelf.contoso.org/purchased.json)

  • The OAuth server is on a different domain (e.g. https://auth.contoso.org/connect/authorize); and

  • An OPDS publication's acquisition link is on a different domain (e.g. https://download.contoso.org/epub/1d5235d7-044d-46b1-bc8e-913868f20003.epub)

What about implementing a cookie SameSite=Lax approach, i.e. site (Lax) vs. origin (Strict) distinction, as determined by the server's HTTP Set-Cookie response? This could be a better system than manually managing access+refresh tokens and their origin (or site) mapping in Thorium (which is currently based on ad-hoc strict-origin heuristics, if I remember correctly).

https://portswigger.net/web-security/csrf/bypassing-samesite-restrictions#what-s-the-difference-between-a-site-and-an-origin

Thorium currently delegates the cookie logic to

@HadrienGardeur
Member

HadrienGardeur commented Dec 14, 2024

Having a separate authorization server is one of the most common deployment scenarios for OAuth.

I don't think that restricting Access Tokens and Refresh Tokens to a single sub-domain is the right approach here.

This is a screenshot taken from the latest OAuth 2.1 draft:
[Screenshot from the OAuth 2.1 draft, taken 2024-12-14; image not available]

@danielweck
Member

danielweck commented Dec 14, 2024

Ideally, the auth server sets first-party cookies for a particular same-site (potentially different-origin, strictly speaking) API endpoint, even restricting them to specific URL paths for additional security, and marking them HttpOnly (no JavaScript access) and Secure (HTTPS only).
This way, Thorium or any other client can just rely on standard HTTP/cookie mechanics to pass along the access token representing the user session. The refresh token interaction can be secured in the same manner.

Absent cookies, how do you suggest the client should implement a same-site=strict/lax policy? The same-site=none approach is only suitable for data flows unrelated to security or access control, so Thorium currently implements same-site=strict (same origin only). We could switch over to same-site=lax (subdomains allowed) with a few trivial changes, I believe, but I just want to make sure we also cover the cross-origin case if needed (third party cookies equivalent).
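
The strict versus lax distinction can be sketched as follows. This is an illustration only: `naiveSite` simply keeps the last two labels of the hostname, which is wrong for suffixes such as co.uk, so a real implementation would need the Public Suffix List. The function names and the `Policy` type are hypothetical and do not reflect Thorium's code.

```typescript
type Policy = "strict" | "lax";

// Naive "site" approximation: the last two labels of the hostname.
// Assumption for illustration only; real code needs the Public Suffix List.
function naiveSite(hostname: string): string {
    return hostname.split(".").slice(-2).join(".");
}

// Decide whether a token obtained for tokenUrl may be sent to requestUrl.
function mayReuseToken(tokenUrl: string, requestUrl: string, policy: Policy): boolean {
    const a = new URL(tokenUrl);
    const b = new URL(requestUrl);
    if (a.origin === b.origin) return true;  // same origin: allowed under both policies
    if (policy === "strict") return false;   // strict: nothing beyond same origin
    // lax: same site over HTTPS, e.g. two subdomains of example.com
    return a.protocol === "https:" && b.protocol === "https:"
        && naiveSite(a.hostname) === naiveSite(b.hostname);
}

const feed = "https://bookshelf.example.com/v1/bookshelf";
const download = "https://download.example.com/v1/download/123456/Pdf/lcpl/";

console.log(mayReuseToken(feed, download, "strict")); // false
console.log(mayReuseToken(feed, download, "lax"));    // true
```

Neither policy covers the true cross-origin case, where the feed and the acquisition link are on unrelated domains.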

@HadrienGardeur
Member

I don't think that cookies are needed at all in this case.

An OPDS client can send an Access Token either when:

  • requested by the server through the use of a 401 either serving an Authentication Document or pointing to one
  • or if there's an authenticate hint present on a link in order to avoid the usual HTTP roundtrip (request -> 401 -> new request containing a token or credentials)
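
The second case might look like the sketch below. The shape of the hint (an `authenticate` object with an `href` inside the link's `properties`) and the choice to key tokens by that `href` are assumptions made for this example; the name `initialHeaders` is hypothetical.

```typescript
// Assumed shape of a link carrying an authenticate hint.
interface OpdsLink {
    href: string;
    properties?: {
        authenticate?: { href: string; type?: string };
    };
}

// Build the headers for the first request, skipping the 401 roundtrip when
// the link carries a hint and a token is already known for it.
function initialHeaders(
    link: OpdsLink,
    tokensByAuthDocHref: Map<string, string>,
): Record<string, string> {
    const hint = link.properties?.authenticate;
    const token = hint ? tokensByAuthDocHref.get(hint.href) : undefined;
    return token ? { Authorization: `Bearer ${token}` } : {};
}

const knownTokens = new Map([["https://auth.contoso.org/odps-auth.json", "token-abc"]]);

const hintedLink: OpdsLink = {
    href: "https://download.contoso.org/epub/1d5235d7-044d-46b1-bc8e-913868f20003.epub",
    properties: { authenticate: { href: "https://auth.contoso.org/odps-auth.json" } },
};

console.log(initialHeaders(hintedLink, knownTokens));
// { Authorization: "Bearer token-abc" }
console.log(initialHeaders({ href: hintedLink.href }, knownTokens));
// {} : no hint, so the client falls back to the 401 roundtrip
```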

@danielweck
Member

Sure, so absent a cookie jar from which to automatically transmit access credentials or session data, what secure heuristics do you recommend the client implements in order to determine when to include a specific auth bearer header (for example in a publication download HTTP(S) GET request, when the access token was obtained from a different site, or maybe even a different origin)?
I am still not clear what to do when no access control information is available: we only receive access/refresh tokens, which by their very definition are usable by the bearer without proof of possession, and no indication of intended utilisation context.
My suggestion is (in cookie analogous language) to downgrade same-site=strict to same-site=lax, which would solve the original problem (subdomains) but wouldn't resolve true cross-origin auth. In this case, what do you recommend?
@panaC am I getting this right? (just want to make sure I am not mis-remembering how we handle access tokens in Thorium, outside of cookie mechanics)

@danielweck
Member

@HadrienGardeur to be clear I am asking about cases when there is no 'authenticate' on the 'download' link.

@danielweck
Member

@mpdunlop could you please share an OPDS feed that demonstrates the problem Thorium needs to solve? This would help with testing and validating the implementation, thank you :)

@mpdunlop
Contributor Author

mpdunlop commented Dec 16, 2024

@danielweck I can't make it available publicly, but I can provide access to our testing environment. If you email me at [email protected], I'll send you the necessary details.

In the meantime, here's the authentication flow where the issue occurs:

  1. User adds OPDS feed: https://bookshelf.example.com/v1/bookshelf
  2. Thorium receives a 401 Unauthorized with an OPDS Authentication document:
{
    "id": "opds-auth.example.com",
    "title": "Authentication Required",
    "authentication": [
        {
            "type": "http://opds-spec.org/auth/oauth/implicit",
            "links": [
                {
                    "href": "https://opds-auth.example.com/connect/authorize?scope=bookshelf%20download",
                    "rel": "authenticate",
                    "type": "text/html"
                }
            ]
        }
    ],
    "description": "You must be signed in to view your bookshelf.",
    "links": [
        {
            "href": "https://www.example.com/logo.png",
            "rel": "logo",
            "type": "image/png",
            "title": "Logo",
            "height": 46,
            "width": 240
        },
        {
            "href": "mailto:[email protected]",
            "rel": "help"
        },
        {
            "href": "https://support.example.com/",
            "rel": "help",
            "type": "text/html"
        }
    ]
}
  3. User authenticates via OAuth 2.0 Implicit Authentication → Thorium receives access_token
  4. Thorium successfully retrieves OPDS 2.0 feed from https://bookshelf.example.com/v1/bookshelf:
{
    "metadata": {
        "title": "User's Bookshelf",
        "numberOfItems": 1
    },
    "links": [
        {
            "href": "https://bookshelf.example.com/v1/bookshelf",
            "type": "application/opds+json",
            "rel": "self"
        }
    ],
    "publications": [
        {
            "metadata": {
                "identifier": "https://www.example.com/book/123456/",
                "title": "Example Book",
                "published": "2020-01-01T00:00:00",
                "language": "en",
                "author": {
                    "name": "John Doe",
                    "identifier": "https://www.example.com/en-au/author/jon-doe/123456",
                    "sortAs": "Doe, John",
                    "role": "author"
                },
                "publisher": [
                    {
                        "name": "Example Publisher",
                        "role": "Publisher"
                    }
                ],
                "subject": [
                    {
                        "name": "MATHEMATICS > General",
                        "sortAs": "MATHEMATICS > General",
                        "scheme": "https://www.bisg.org/#bisac",
                        "code": "MAT000000"
                    }
                ],
                "description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam in tellus ac mauris convallis mattis. Sed sit amet eleifend turpis. Cras ac magna feugiat, placerat massa a, pulvinar tortor."
            },
            "links": [
                {
                    "properties": {
                        "lcp_hashed_passphrase": "MV9b23bQeMQ7isAGTkoBZGErH853yGk0W/yUx1iU7dM=",
                        "indirectAcquisition": [
                            {
                                "type": "application/pdf"
                            }
                        ]
                    },
                    "href": "https://download.example.com/v1/download/123456/Pdf/lcpl/",
                    "type": "application/vnd.readium.lcp.license.v1.0+json",
                    "title": "Download PDF",
                    "rel": "http://opds-spec.org/acquisition",
                    "fileInfo": {
                        "hashValue": "315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3",
                        "hashAlgorithm": "http://www.w3.org/2001/04/xmlenc#sha256"
                    }
                }
            ],
            "images": [
                {
                    "href": "https://image.example.com/cover/123456.jpg",
                    "type": "image/jpeg",
                    "height": 250,
                    "width": 166
                }
            ]
        }
    ]
}
  5. User selects a publication and clicks "Import"
  6. Thorium requests the publication from the acquisition link: https://download.example.com/v1/download/123456/Pdf/lcpl/
  7. Thorium does not include the previously obtained access_token in this request, presumably because this is a different subdomain.
  8. The server replies with a 401 Unauthorized response with the same Authentication document as step 2, with the exception of the description field, which is context-sensitive:
    • The Bookshelf API returns a message saying you must sign in to access your bookshelf
    • The Download API returns a message saying you must sign in to download books
    • In both cases, the document's id remains: opds-auth.example.com
  9. Issue: Thorium doesn't retry with the access_token (which I would expect since the OPDS Authentication document id is identical) nor does Thorium prompt for re-authentication. Instead, it fails with a 401 error in the UI.
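
The behaviour expected in the final step can be sketched as a small decision function. This is a hypothetical model of the client's choices after each response, not Thorium's implementation; the `Action` type and `nextAction` are names invented for this example.

```typescript
type Action =
    | { kind: "done" }
    | { kind: "retryWithToken"; token: string }
    | { kind: "promptAuth"; authDocId: string }
    | { kind: "fail"; status: number };

// Decide what to do after a response to an acquisition request.
function nextAction(
    status: number,
    authDocId: string | undefined,  // id of the Authentication Document in a 401 body
    knownToken: string | undefined, // token already held for that id, if any
    tokenAlreadySent: boolean,      // true if this request already carried the token
): Action {
    if (status >= 200 && status < 300) return { kind: "done" };
    if (status === 401 && authDocId !== undefined) {
        if (knownToken !== undefined && !tokenAlreadySent) {
            return { kind: "retryWithToken", token: knownToken };
        }
        // No usable token (or it was rejected): ask the user to sign in again.
        return { kind: "promptAuth", authDocId };
    }
    return { kind: "fail", status };
}

// First download attempt, sent without a token, answered with a 401:
console.log(nextAction(401, "opds-auth.example.com", "token-abc", false));
// { kind: "retryWithToken", token: "token-abc" }

// The retry with the token is rejected too:
console.log(nextAction(401, "opds-auth.example.com", "token-abc", true));
// { kind: "promptAuth", authDocId: "opds-auth.example.com" }
```

Only a 401 without an Authentication Document, or another error status, ends in the "fail" branch that the UI currently shows for every 401.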

Note: Rather than hosting a static OPDS Authentication document at a specific URI, we return it as part of a 401 Unauthorized response whenever a resource is accessed without proper authorization. This follows the requirements specified in the draft Authentication for OPDS 1.0. If we were to host one, we could utilize Link-Level Hints for Authentication, but this appears to be an optional approach (usage of may vs must in the specification) so we didn't think it would be necessary.

@danielweck
Member

issue: Thorium doesn't retry with the access_token (which I would expect since the OPDS Authentication document id is identical)

Couldn't an attacker replay / reuse an existing auth access token against a known ID? That being said, a subdomain is essentially the same site (albeit not the same origin), so Thorium could adopt a same-site:lax approach instead of its current same-site:strict policy, which would solve the security problem of relying on a mapping with document IDs.

nor does Thorium prompt for re-authentication. Instead, it fails with a 401 error in the UI.

That's clearly a bug, we must fix this auth flow. @panaC I guess the download phase is not wired like the OPDS "browse" phase.

@mpdunlop we would love to be able to access your auth'ed OPDS feed. My email is [email protected]

@mpdunlop
Contributor Author

Couldn't an attacker replay / reuse an existing auth access token against a known ID?

Yes, this is correct. The suggestions in the discussion thread would effectively mitigate this risk - and would be our preferred approach. However, since these are only suggestions at this stage, we can't rely on them being implemented by OPDS clients.

That being said, a subdomain is essentially the same site (albeit not the same origin), so Thorium could adopt a same-site:lax approach instead of its current same-site:strict policy, which would solve the security problem of relying on a mapping with document IDs.

While this approach would solve our immediate problem, it could potentially interfere with OPDS providers who use multiple authentication systems across different subdomains (e.g., auth.customer.storefront.com and auth.staff.storefront.com). Though this would be an edge case, it's worth considering.

I believe @HadrienGardeur's suggestion in opds-community provides the most robust solution: explicitly listing trusted domains in the OPDS Authentication document and requiring authentication documents to be served from the same domain as their ID. This approach is both easier to explain and more secure.

Given that the trusted domains feature isn't yet specified, would adopting same-site:lax serve as a reasonable temporary solution?

@panaC
Member

panaC commented Dec 18, 2024

That's clearly a bug, we must fix this auth flow. @panaC I guess the download phase is not wired like the OPDS "browse" phase.

Indeed, the download part is entrusted to the publication/importFromLink API endpoint and not to the OPDS flow.

verifyImport: (...data: Parameters<typeof importActions.verify.build>) => {
    dispatch(dialogActions.closeRequest.build());
    dispatch(importActions.verify.build(...data));
},

yield apiSaga("publication/importFromLink",
    REQUEST_ID,
    link,
    pub,
);

export function* importFromLinkService(
    link: IOpdsLinkView,
    pub?: IOpdsPublicationView,
): SagaGenerator<[publicationDocument: PublicationDocument | undefined, alreadyImported: boolean]> {
    let url: URL;
    try {
        url = new URL(link?.url);
    } catch (e) {
        debug("bad url", link, e);
        throw new Error("Unable to get acquisition url from opds publication");
    }
    if (!link.type) {
        try {
            const response = yield* callTyped(() => nodeFetch(url.toString()));
            const contentType = response?.headers?.get("Content-Type");
            if (contentType) {
                link.type = contentType;
            } else {
                link.type = "";
            }
        } catch (_e) {
            debug("can't fetch url to determine the type", url.toString());
            link.type = "";
        }
    }
    const contentTypeArray = link.type.replace(/\s/g, "").split(";");
    const title = link.title || link.url;
    const isLcpFile = contentTypeArray.includes(ContentType.Lcp);
    const isEpubFile = contentTypeArray.includes(ContentType.Epub);
    const isAudioBookPacked = contentTypeArray.includes(ContentType.AudioBookPacked);
    const isAudioBookPackedLcp = contentTypeArray.includes(ContentType.AudioBookPackedLcp);
    const isHtml = contentTypeArray.includes(ContentType.Html);
    const isDivinaPacked = contentTypeArray.includes(ContentType.DivinaPacked);
    const isPdf = contentTypeArray.includes(ContentType.pdf);
    const isLcpPdf = contentTypeArray.includes(ContentType.lcppdf);
    const isJson = contentTypeArray.includes(ContentType.Json)
        || contentTypeArray.includes(ContentType.AudioBook)
        || contentTypeArray.includes(ContentType.JsonLd)
        || contentTypeArray.includes(ContentType.Divina)
        || contentTypeArray.includes(ContentType.webpub);
    debug(contentTypeArray, isHtml, isJson);
    if (!isLcpFile && !isEpubFile && !isAudioBookPacked && !isAudioBookPackedLcp && !isDivinaPacked && !isPdf && !isLcpPdf) {
        debug(`OPDS download link is not EPUB or AudioBook or Divina or Pdf ! ${link.url} ${link.type}`);
    }
    if (isHtml || isJson) {
        link = { url: url.toString() };
    }
    const downloadMayBePackageLink = function*() {
        if (isHtml || isJson) {
            debug("the link need to be packaged");
            return yield* callTyped(packageFromLink, url.toString(), isHtml);
        } else {
            debug("Start the download", link);
            const [downloadPath] = yield* callTyped(downloader, [{ href: link.url, type: link.type }], title);
            return downloadPath;
        }
    };
    const fileOrPackagePath = yield* callTyped(downloadMayBePackageLink);
    if (fileOrPackagePath) {
        return yield* callTyped(importLinkFromPath, fileOrPackagePath, link, pub);
    } else {
        debug("downloaded file path or package path is empty");
    }
    return [undefined, false];
}

@danielweck
Member

Thank you @panaC for investigating in this PR: #2732

@HadrienGardeur
Member

I believe @HadrienGardeur's suggestion in opds-community provides the most robust solution: explicitly listing trusted domains in the OPDS Authentication document and requiring authentication documents to be served from the same domain as their ID. This approach is both easier to explain and more secure.

I plan on opening up a PR next week focusing on improvements to the OPDS Authentication draft:

  • addition of the authenticate hint to the draft
  • same-domain requirement for serving Authentication Documents
  • new property for listing authorized domains

However, since these are only suggestions at this stage, we can't rely on them being implemented by OPDS clients.

While we can't influence all OPDS clients, we can have an impact on a number of them through Readium projects. It's already in our plans to re-work our OPDS support on iOS in 2025 with full support for Authentication for OPDS.

@danielweck
Member

Thorium is now updated with the auth flow fix implemented by @panaC: a download from a different origin now triggers re-authentication, and the credentials are stored as usual for that domain, so the user is only asked once.
