Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Middleware process_resource called before 405 Method Not Allowed #2433

Open
numberoverzero opened this issue Feb 20, 2025 · 4 comments
Open

Comments

@numberoverzero
Copy link

Thanks for such a great framework! I've been happily using Falcon for a few years, and just ran into what was the first surprising (at least to me) behavior. I'd like help understanding if this is intended, and if so any recommendations for working around it.

Repro

import falcon
import falcon.testing

class Middleware:
    calls = 0
    def process_resource(self, req, resp, resource, params):
        self.calls += 1

class Resource:
    calls = 0
    def on_get(self, req, resp, **params)
        self.calls += 1

app = falcon.App()
app.add_route("/path", resource := Resource())
app.add_middleware(middleware := Middleware())

client = falcon.testing.TestClient(app)
assert client.simulate_get("/path").status == "200 OK"
assert client.simulate_post("/path").status == "405 Method Not Allowed"
assert resource.calls == 1
assert middleware.calls == 2

print("why was process_resource called the second time if Resource.on_post isn't defined?")

Expectation

I expected that process_resource would not be called since there is no route to the resource for the POST method (but there is a route to the resource for some methods). This can break middleware that expects process_resource to only be called when the request is going to land on (some method) of the resource passed to it.

As a concrete example, our authentication is based on adding tags to resources. To ensure we fail closed when someone forgets to annotate a class, the middleware throws an error if the passed in resource doesn't have a strategy tag. Something like:

class ExampleAuthMiddleware:
    def process_resource(self, req, resp, resource, params):
        if not (strategy := get_tags(req, resource, "auth:strategy")):
            raise error.ProgrammingError(
                f"resource {resource} must define an auth strategy for {req.method} {req.uri_template}")
        self.strategies[strategy].authenticate(req, resp, resource, params)

When a client tries to use an unsupported method for an existing route, get_tags will fail to find any tags on the corresponding methods of the resource, the middleware raises a ProgrammingError which we catch (via app.add_error_handler) and then return to the caller as a 500 Internal Error, instead of a more helpful 405 Method Not Allowed. It would be nice for the client to get the 405 (and for our alarms to not get the 500).

References

Workaround ideas

Based on this note I thought I could try wrapping App._get_responder to check if the first value of the tuple was one of the known responders.path_not_found or responders.bad_request but then realized each route's method map is filled with a local function, so I won't be able to check that based on identity. I considered then also subclassing and overriding CompiledRouter.add_route but the amount of logic in there after the call to set_default_responders scared me off.

Is there a much easier way forward that I'm missing?

@vytas7
Copy link
Member

vytas7 commented Feb 20, 2025

Hi @numberoverzero!
First of all, on behalf of the maintainer team, thank you for your kind words about Falcon!

Personally I would say that this behaviour is intended, although I can see why it is neither very intuitive nor convenient when one thinks from the perspective of your use case.
In RESTful API design, a resource is usually denoted by its URL. So if the server application can find a resource at the requested URL, IMHO, it is fair that it is present when calling the process_resource methods.

I agree that the details are not clearly documented at the moment. We also have an open issue to document the how and why of default responders: #735.

Let me check what the most elegant workaround could be for your use case.

@numberoverzero
Copy link
Author

Thanks for looking into this! I agree that by RESTful design there is a resource behind the URL; I'd just prefer not to run middleware once the route has been decided and we know there's not a supported method.

If 405 was raised on line 420 instead of waiting to invoke middleware, then the middleware handling would be consistent between 404 and 405.

Extending the original example to highlight that difference:

assert client.simulate_get("/path").status == "200 OK"
assert resource.calls == 1
assert middleware.calls == 1  # called middleware - OK

assert client.simulate_get("/unknown").status == "404 Not Found"
assert resource.calls == 1
assert middleware.calls == 1  # did not call middleware - OK

assert client.simulate_post("/path").status == "405 Method Not Allowed"
assert resource.calls == 1
assert middleware.calls == 2  # called middleware - ???

@vytas7
Copy link
Member

vytas7 commented Feb 20, 2025

Hi again,
one way to tackle this could be to simply expose more routing information. It is actually something we are looking to add to the framework itself in the 4.x cycle, e.g, req.route or similar that would point to a route object holding attributes like URI template, method map, suffix, OpenAPI path spec, etc.

For the time being, you could subclass the default router to at least surface the current responder (note that we manually extract the method map once again using falcon.routing.map_http_methods(), so we only add the responders that actually exist in your resource, not the default ones that are set separately).

We opt to store the currently routed responder in req.context.responder:

import logging

import falcon
import falcon.routing
import falcon.testing

logging.basicConfig(
    format='%(asctime)s [%(levelname)s] %(message)s', level=logging.INFO
)


class CustomRouter(falcon.routing.CompiledRouter):
    def __init__(self):
        super().__init__()
        self._known_responders = {}

    def get_responder(self, uri_template, method):
        return self._known_responders.get((uri_template, method))

    def add_route(self, uri_template, resource, **kwargs):
        super().add_route(uri_template, resource, **kwargs)

        suffix = kwargs.get('suffix')
        method_map = falcon.routing.map_http_methods(resource, suffix=suffix)
        for method, responder in method_map.items():
            self._known_responders[(uri_template, method)] = responder


class RoutingMiddleware:
    def __init__(self, router):
        self._router = router

    def process_request(self, req, resp):
        req.context.responder = None

    def process_resource(self, req, resp, resource, params):
        req.context.responder = self._router.get_responder(req.uri_template, req.method)
        if req.context.responder is not None:
            logging.info(f'routed to responder: {req.context.responder.__name__}')


class Middleware:
    all_calls = 0
    calls = 0

    def process_resource(self, req, resp, resource, params):
        logging.info(f'process_resource: {resource=} {params=}')

        self.all_calls += 1
        if req.context.responder is not None:
            self.calls += 1


class Resource:
    def __init__(self, mw):
        self._mw = mw
        self.calls = 0
        self.message = 'Hello, World!\n'

    def on_get(self, req, resp, **params):
        self.calls += 1

        resp.media = {
            'middleware': {
                'all_calls': self._mw.all_calls,
                'calls': self._mw.calls,
            },
            'resource': {
                'calls': self.calls,
            },
        }

    def on_get_message(self, req, resp):
        resp.context_type = falcon.MEDIA_TEXT
        resp.text = self.message

    def __repr__(self):
        return f'Resource<{self.message!r}>'


router = CustomRouter()
routing_mw = RoutingMiddleware(router)
mw = Middleware()
resource = Resource(mw)

app = falcon.App(router=router, middleware=[routing_mw, mw])
app.add_route('/path', resource)
app.add_route('/message', resource, suffix='message')

So now when I perform a GET request on /path twice, and try DELETE once inbetween, that is reflected in my call counts:

GET http://localhost:8000/path

HTTP/1.1 200 OK
Content-Length: 70
Content-Type: application/json

{
    "middleware": {
        "all_calls": 3,
        "calls": 2
    },
    "resource": {
        "calls": 2
    }
}

I know that subclassing the default router is a bit awkward, we are actually planning improvements in that direction too: #2219.

You can also check out the routing discussion that I started in our Discussions: #1818.

@numberoverzero
Copy link
Author

numberoverzero commented Feb 21, 2025

That is excellent, thank you! Just calling map_http_methods again is a clean solution and sidesteps all the patching of CompiledRouter I was worried about.

I've adjusted your impl slightly to avoid using another middleware, and since the router can access the req object it seemed ok to update the context there.

# falcon_ext.py
from __future__ import annotations

import typing

import falcon
import falcon.routing
import falcon.routing.util

if typing.TYPE_CHECKING:
    from falcon._typing import MethodDict  # NOTE: not a public type, but the router interface is public

    FindResult: typing.TypeAlias = tuple[object, MethodDict, dict[str, typing.Any], str | None] | None


class KnownResponderRouter(falcon.routing.DefaultRouter):
    """
    raises 405 during `.find()` if the resource exists but that resource does not define
    a corresponding `on_<method>`.  this prevents middleware from running `process_resource()`.
    this causes identical middleware invocation across 404 and 405.

    see: https://github.com/falconry/falcon/issues/2433
    """

    def __init__(self):
        super().__init__()
        self._known_responders: dict[tuple[str | None, str], t.Any] = {}

    def add_route(self, uri_template, resource, **kwargs):
        super().add_route(uri_template, resource, **kwargs)

        method_map = falcon.routing.util.map_http_methods(resource, suffix=kwargs.get("suffix"))
        for method, responder in method_map.items():
            self._known_responders[(uri_template, method)] = responder

    def find(self, uri: str, req: falcon.Request | None = None) -> FindResult:
        # req doesn't have uri_template yet, so we need to get it from super
        if (resp := super().find(uri, req)) and req:
            (_resource, method_map, _params, uri_template) = resp
            known_responder = self._known_responders.get((uri_template, req.method))
            req.context.responder = known_responder
            if not known_responder:
                # not very RESTful...
                raise falcon.HTTPMethodNotAllowed([])
        return resp

And to use the router:

import falcon
from falcon_ext import KnownResponderRouter

auth_middleware = AuthenticationMiddleware(...)  # never invoked for 405 unknown method
public_api = App(
    middleware=[auth_middleware],
    router=KnownResponderRouter()
)

The issue and discussion you linked both look relevant. Of the two I think that focusing on ease of subclassing CompiledRouter reduces the need for by-default attaching route or responder info to the req object. If it's simple and obvious enough how to subclass the router, then a recipe in the User Guide may be sufficient, with pointers into that recipe from other parts of the docs (eg. from Request, Middleware sections). And that could avoid the global perf hit for users that don't need the req.route object.

Thanks again for the detailed solution and reasoning!


(edit 2025/02/20 - fixed KnownResponderRouter.find impl to get uri_template from super().find)

(edit 2025/02/20 - simplified KnownResponderRouter sample code for others to copy/paste)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants