Reranking API #1066

yanxi0830 · 2025-02-12T18:33:10Z

🚀 Describe the new functionality needed

Our current retrieval stage for RAG only indexes documents and returns preliminary results from an embedding-based retrieval system.

It's common practice to apply re-ranking on top of the retrieved candidates in a RAG workflow.

Reranking Providers

Remote

Inline

sentence-transformers
Using LLM inference as a reranker
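
As a rough illustration of what an inline sentence-transformers provider might do internally, the sketch below scores each (query, document) pair with a cross-encoder. The helper names (`PairScorer`, `load_cross_encoder_scorer`, `score_documents`) and the default model name are assumptions made for this example, not part of Llama Stack. The scorer is injected so that the same function could also wrap an LLM-based reranker.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, document) pairs and returns one relevance score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def load_cross_encoder_scorer(
    model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",  # assumed example model
) -> PairScorer:
    """Build a scorer backed by a sentence-transformers cross-encoder.

    The import is deferred so this module loads without the dependency installed.
    """
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return lambda pairs: [float(s) for s in model.predict(list(pairs))]


def score_documents(query: str, documents: List[str], scorer: PairScorer) -> List[float]:
    """Score every document against the query; output order matches input order."""
    if not documents:
        return []
    pairs = [(query, doc) for doc in documents]
    scores = list(scorer(pairs))
    if len(scores) != len(documents):
        raise ValueError("scorer must return exactly one score per document")
    return scores
```

Calling `score_documents(query, documents, load_cross_encoder_scorer())` would download and run the model; any other callable with the same shape can be passed instead.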

API Reference

@json_schema_type
class RerankedDocument(BaseModel):
    """A single reranked document.

    :param index: The index of the document in the original list
    :param relevance_score: The relevance score of the document
    """

    index: int
    relevance_score: float


@json_schema_type
class RerankResponse(BaseModel):
    """Response containing reranked documents."""

    reranked_documents: List[RerankedDocument]
    metadata: Dict[str, Any]

    @webmethod(route="/inference/rerank", method="POST")
    async def rerank(
        self,
        model_id: str,
        query: str,
        documents: List[str],
        top_n: Optional[int] = None,
    ) -> RerankResponse:
        """Rerank a list of documents based on a query.

        :param model_id: The identifier of the model to use. The model must be a reranking model registered with Llama Stack and available via the /models endpoint.
        :param query: The search query
        :param documents: List of texts that will be compared to the query
        :param top_n: (Optional) The number of returned rerank results. If not specified, all rerank results will be returned.
        :returns: A list of reranked documents
        """
        ...

NOTE: The API above is a reference based on existing providers' entry points. We should think about whether to expose reranking as a Tool or as an Inference endpoint.
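
To make the `index` and `top_n` semantics concrete, here is a minimal sketch of how a provider could turn raw scores into the response entries and how a caller could map them back to the original texts. It uses a plain dataclass in place of the pydantic model so it runs standalone. The helper names and the descending-score ordering are assumptions for illustration, since the proposal does not specify an order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RerankedDocument:
    index: int  # position of the document in the original list
    relevance_score: float


def build_reranked_documents(
    scores: List[float], top_n: Optional[int] = None
) -> List[RerankedDocument]:
    """Order documents by descending score and keep the top_n best (all if None)."""
    if top_n is not None and top_n < 0:
        raise ValueError("top_n must be non-negative")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return [RerankedDocument(index=i, relevance_score=scores[i]) for i in ranked]


def resolve_documents(documents: List[str], reranked: List[RerankedDocument]) -> List[str]:
    """Map the returned indices back onto the caller's original document list."""
    return [documents[item.index] for item in reranked]


documents = ["intro to llamas", "reranking for RAG", "cooking pasta"]
scores = [0.12, 0.91, 0.03]  # made-up scores for illustration
top_two = build_reranked_documents(scores, top_n=2)
print(resolve_documents(documents, top_two))
# prints ['reranking for RAG', 'intro to llamas']
```

Returning indices rather than document text keeps the response small and lets the caller keep any metadata attached to its own copy of the documents.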

💡 Why is this needed? What if we don't build it?

Reranking adds value to RAG workflows by improving their performance.

Other thoughts

Relevance scores from reranking can be integrated with Telemetry to provide additional observability and support performance tuning.
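
One possible shape for this, sketched under assumptions: summarize the relevance scores of a single rerank call into a few numeric attributes that a telemetry span or metric could carry. The function name and the `rerank.*` attribute keys are hypothetical and do not correspond to an existing Llama Stack telemetry schema.

```python
from statistics import mean
from typing import Dict, List


def summarize_relevance(scores: List[float]) -> Dict[str, float]:
    """Reduce one rerank call's scores to a small set of numeric attributes."""
    if not scores:
        return {"rerank.count": 0}
    return {
        "rerank.count": len(scores),
        "rerank.max": max(scores),
        "rerank.min": min(scores),
        "rerank.mean": mean(scores),
    }
```

Tracking the spread between the maximum and mean score over time is one way such attributes could help spot queries where retrieval returns few relevant candidates.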

kevincogan · 2025-02-19T14:01:44Z

Hi @yanxi0830 and team,

I'd love to contribute to implementing the Reranking API. Before I start, I wanted to check in and see if any work has already been done on this. Are there any ongoing discussions or branches related to reranking? Also, would you be open to collaboration if someone else has started? Looking forward to your thoughts! Thanks 😄

varshaprasad96 · 2025-02-19T19:38:07Z

Hi @yanxi0830, I would be interested in implementing it.

Adding some thoughts: it would be helpful to implement reranking as part of the existing RAGRuntime class. We could also add a knob so that it is enabled only when required.

Introduce RAGRerankResult as a separate schema in https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/tools/rag_tool.py
Modify query() to support reranking if required, along the lines of:

@runtime_checkable
@trace_protocol
class RAGToolRuntime(Protocol):
    @webmethod(route="/tool-runtime/rag-tool/query", method="POST")
    async def query(
        self,
        content: InterleavedContent,
        vector_db_ids: List[str],
        query_config: Optional[RAGQueryConfig] = None,
        rerank: bool = True,  # reranking is on by default; pass False to opt out
    ) -> RAGQueryResult:
        ...

Add a rerank method in RAGRuntime

    @webmethod(route="/tool-runtime/rag-tool/rerank", method="POST") 
    async def rerank(
        self,
        query: InterleavedContent,
        retrieved_docs: List[RAGDocument],
        reranker_model: str = "xxxx",
        top_k: int = 5,
    ) -> RAGRerankResult:
        """Re-rank retrieved documents based on relevance"""
        ...

This way, when a user calls query() they get the best-ranked outputs by default, with the flexibility to opt out if needed. Implementing reranking providers would also be easier with this structure. Just want to get some thoughts on this approach.
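
The control flow of the proposal above could look roughly like the following sketch. Retrieval and reranking are injected as plain async callables, and strings stand in for `InterleavedContent` and `RAGDocument`. The stub functions and the word-overlap scoring are placeholders invented for this example, and returning the candidates unchanged on opt-out is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, List

Retriever = Callable[[str], Awaitable[List[str]]]
Reranker = Callable[[str, List[str], int], Awaitable[List[str]]]


async def query(
    content: str,
    retrieve: Retriever,
    rerank_docs: Reranker,
    rerank: bool = True,
    top_k: int = 5,
) -> List[str]:
    """Retrieve candidates, then rerank them unless the caller opts out."""
    candidates = await retrieve(content)
    if not rerank:
        return candidates  # opt-out: keep the retrieval order as-is
    return await rerank_docs(content, candidates, top_k)


async def fake_retrieve(content: str) -> List[str]:
    # Placeholder for a vector DB lookup.
    return ["pasta recipes", "rerank models for RAG", "RAG overview"]


async def overlap_rerank(content: str, docs: List[str], top_k: int) -> List[str]:
    # Placeholder scorer: count query words shared with each document.
    terms = set(content.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:top_k]


print(asyncio.run(query("rag rerank", fake_retrieve, overlap_rerank, top_k=2)))
# prints ['rerank models for RAG', 'RAG overview']
```

Passing `rerank=False` skips the second stage entirely, which keeps latency unchanged for callers who do not need reranking.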

varshaprasad96 · 2025-02-19T19:39:33Z

@kevincogan sorry, missed your message. Would be open for collaboration :)

kevincogan · 2025-02-19T20:38:59Z

Hey @yanxi0830 and @varshaprasad96,
Given the scope of the reranking API, I think it might be beneficial to create a draft RFC before diving into implementation. This would help us:

Align on whether reranking should be a Tool or an Inference endpoint.
Define the API structure and schema.
Decide on provider integration and how reranking fits into RAGRuntime.

I’m happy to take the lead on drafting this RFC and can include:

An overview of the problem and the proposed solution.
API design, including endpoint structure and schemas.
Considerations for performance, flexibility, and testing.

Let me know if you think this is a good approach. If so, I can get started and share a draft within the next few days for feedback.

And of course, always open for collaboration @varshaprasad96! 😄

yanxi0830 · 2025-02-19T22:23:13Z

@kevincogan @varshaprasad96 Thanks for the interest! Very excited about it!

@kevincogan Yes, the points you've mentioned are very important, please draft an RFC for the API design for discussion before diving into the implementation. Looking forward to it!

Some additional considerations:

How to utilize & showcase the reranking API to enhance RAG performance.
Integrate reranking results (e.g. relevance scores) with telemetry for better observability.

kevincogan · 2025-02-20T15:21:47Z

Sounds great, thanks! @yanxi0830

kevincogan · 2025-02-24T13:58:24Z

Would it be possible to add me as one of the assignees while I am working on the RFC for this? @yanxi0830

yanxi0830 added the enhancement (New feature or request) and RAG (Relates to RAG functionality of the agents API) labels Feb 12, 2025

yanxi0830 modified the milestones: RAG, v0.1 Feb 12, 2025
