Reranking API #1066

Open
yanxi0830 opened this issue Feb 12, 2025 · 7 comments
Labels
enhancement New feature or request RAG Relates to RAG functionality of the agents API

Comments

@yanxi0830
Contributor

yanxi0830 commented Feb 12, 2025

🚀 Describe the new functionality needed

Our current retrieval stage for RAG only indexes documents and retrieves preliminary results from an embedding-based retrieval system.

It's common practice to apply reranking on top of the retrieved candidates in a RAG workflow.

Reranking Providers

Remote

Inline

  • sentence-transformers
  • Using LLM inference as a reranker
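
To make the "LLM inference as a reranker" option concrete, here is a minimal sketch, not a proposed implementation. It assumes a hypothetical `complete` callable that sends a prompt to some LLM and returns its text reply; the prompt wording, the 0-10 scale, and the function names are illustrative assumptions and are not part of Llama Stack.

```python
import re
from typing import Callable, List, Tuple

# Illustrative prompt; the wording and the 0-10 scale are assumptions.
PROMPT_TEMPLATE = (
    "On a scale from 0 to 10, how relevant is the document to the query?\n"
    "Query: {query}\n"
    "Document: {document}\n"
    "Answer with a single number."
)


def parse_score(reply: str) -> float:
    """Extract the first number from the model reply; return 0.0 if none is found."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return 0.0
    # Clamp to the 0-10 range requested in the prompt, then normalize to 0-1.
    return min(max(float(match.group()), 0.0), 10.0) / 10.0


def llm_rerank(
    query: str,
    documents: List[str],
    complete: Callable[[str], str],  # hypothetical: prompt in, text reply out
) -> List[Tuple[int, float]]:
    """Score each document with the LLM and return (index, score) pairs, best first."""
    scored = []
    for index, document in enumerate(documents):
        prompt = PROMPT_TEMPLATE.format(query=query, document=document)
        scored.append((index, parse_score(complete(prompt))))
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

This makes one LLM call per document, so latency and cost grow linearly with the number of candidates; that trade-off is one reason a dedicated reranking model may be preferable.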

API Reference

@json_schema_type
class RerankedDocument(BaseModel):
    """A single reranked document.

    :param index: The index of the document in the original list
    :param relevance_score: The relevance score of the document
    """

    index: int
    relevance_score: float


@json_schema_type
class RerankResponse(BaseModel):
    """Response containing reranked documents."""

    reranked_documents: List[RerankedDocument]
    metadata: Dict[str, Any]

    @webmethod(route="/inference/rerank", method="POST")
    async def rerank(
        self,
        model_id: str,
        query: str,
        documents: List[str],
        top_n: Optional[int] = None,
    ) -> RerankResponse:
        """Rerank a list of documents based on a query.

        :param model_id: The identifier of the model to use. The model must be a reranking model registered with Llama Stack and available via the /models endpoint.
        :param query: The search query
        :param documents: List of texts that will be compared to the query
        :param top_n: (Optional) The number of returned rerank results. If not specified, all rerank results will be returned.
        :returns: A list of reranked documents
        """
        ...

NOTE: The API above serves as a reference based on the entry points of existing providers. We should think about whether to incorporate reranking as a Tool or as an Inference endpoint.
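
For illustration, the request/response contract above (original `index`, `relevance_score`, optional `top_n`) can be exercised with a toy scorer. This is only a sketch under stated assumptions: it uses plain dataclasses instead of the pydantic models, returns the list directly rather than a `RerankResponse`, and `overlap_score` is a made-up lexical stand-in for a real reranking model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RerankedDocument:
    index: int              # position of the document in the input list
    relevance_score: float  # higher means more relevant


def overlap_score(query: str, document: str) -> float:
    """Toy scorer (assumption): fraction of distinct query tokens found in the document."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    document_tokens = set(document.lower().split())
    return len(query_tokens & document_tokens) / len(query_tokens)


def rerank(
    query: str,
    documents: List[str],
    top_n: Optional[int] = None,
) -> List[RerankedDocument]:
    """Return documents sorted by descending score; all of them if top_n is None."""
    if top_n is not None and top_n < 0:
        raise ValueError("top_n must be non-negative")
    scored = [
        RerankedDocument(index=i, relevance_score=overlap_score(query, doc))
        for i, doc in enumerate(documents)
    ]
    # Stable sort: documents with equal scores keep their input order.
    scored.sort(key=lambda doc: doc.relevance_score, reverse=True)
    return scored if top_n is None else scored[:top_n]
```

Returning the original `index` rather than the document text keeps the response small and lets the caller map results back to its own chunk metadata.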

💡 Why is this needed? What if we don't build it?

  • Reranking brings value to RAG workflows by improving performance.

Other thoughts

  • Relevance scores from reranking can be integrated with Telemetry to provide additional observability and support performance tuning.

yanxi0830 added the enhancement and RAG labels Feb 12, 2025
yanxi0830 modified the milestones: RAG, v0.1 Feb 12, 2025

@kevincogan
Contributor

Hi @yanxi0830 and team,

I'd love to contribute to implementing the Reranking API. Before I start, I wanted to check in and see if any work has already been done on this. Are there any ongoing discussions or branches related to reranking? Also, would you be open to collaboration if someone else has started? Looking forward to your thoughts! Thanks 😄

@varshaprasad96

varshaprasad96 commented Feb 19, 2025

Hi @yanxi0830, I would be interested in implementing it.

Adding some thoughts: it would be helpful to implement reranking as part of the existing RAGRuntime class. We could also add a knob so that it is enabled only when required.

  1. Introduce RAGRerankResult as a separate schema in https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/tools/rag_tool.py
  2. Modify query() to support reranking if required, along the lines of:
@runtime_checkable
@trace_protocol
class RAGToolRuntime(Protocol):
    @webmethod(route="/tool-runtime/rag-tool/query", method="POST")
    async def query(
        self,
        content: InterleavedContent,
        vector_db_ids: List[str],
        query_config: Optional[RAGQueryConfig] = None,
        rerank: bool = True,  # reranking is on by default; pass False to opt out
    ) -> RAGQueryResult:
        ...
  3. Add a rerank method in RAGRuntime:
    @webmethod(route="/tool-runtime/rag-tool/rerank", method="POST")
    async def rerank(
        self,
        query: InterleavedContent,
        retrieved_docs: List[RAGDocument],
        reranker_model: str = "xxxx",
        top_k: int = 5,
    ) -> RAGRerankResult:
        """Re-rank retrieved documents based on relevance"""
        ...

This way, when a user calls query(), they get the best-ranked outputs by default, with the flexibility to opt out if needed. With this in place, implementing reranking providers would also be easier. I'd like to get some thoughts on this approach.
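
The control flow described here (retrieve, optionally rerank, then truncate) can be sketched as follows. This is a simplified illustration, not the actual RAGToolRuntime code: `retrieve` and `score` are hypothetical callables standing in for the vector DB lookup and the reranking model, and plain strings replace InterleavedContent and RAGDocument.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]  # (query, chunk) -> relevance score


def query(
    content: str,
    retrieve: Callable[[str], List[str]],  # hypothetical first-stage retriever
    score: Scorer,                         # hypothetical reranking model
    rerank: bool = True,                   # on by default, callers can opt out
    top_k: int = 5,
) -> List[str]:
    """Retrieve candidate chunks, optionally rerank them, and keep the top_k."""
    candidates = retrieve(content)
    if rerank:
        # Reorder candidates by descending relevance to the query.
        candidates = sorted(
            candidates,
            key=lambda chunk: score(content, chunk),
            reverse=True,
        )
    return candidates[:top_k]
```

With `rerank=False` the first-stage order is returned unchanged, so existing callers can keep their current behavior.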

@varshaprasad96

@kevincogan sorry, missed your message. Would be open for collaboration :)

@kevincogan
Contributor

Hey @yanxi0830 and @varshaprasad96,
Given the scope of the reranking API, I think it might be beneficial to create a draft RFC before diving into implementation. This would help us:

  • Align on whether reranking should be a Tool or an Inference endpoint.
  • Define the API structure and schema.
  • Decide on provider integration and how reranking fits into RAGRuntime.

I’m happy to take the lead on drafting this RFC and can include:

  • An overview of the problem and the proposed solution.
  • API design, including endpoint structure and schemas.
  • Considerations for performance, flexibility, and testing.

Let me know if you think this is a good approach. If so, I can get started and share a draft within the next few days for feedback.

And of course, always open to collaboration, @varshaprasad96! 😄

@yanxi0830
Contributor Author

@kevincogan @varshaprasad96 Thanks for the interest! Very excited about it!

@kevincogan Yes, the points you've mentioned are very important. Please draft an RFC on the API design for discussion before diving into the implementation. Looking forward to it!

Some additional considerations:

  • How to use and showcase the reranking API to enhance RAG performance.
  • Integrate reranking results (e.g. relevance scores) with telemetry for better observability.
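
As one possible shape for the telemetry idea, the sketch below summarizes the relevance scores of a rerank call into a single structured log line. It uses only the Python standard library; the logger name, event fields, and `query_id` parameter are assumptions for illustration and do not reflect the Llama Stack Telemetry API.

```python
import json
import logging
from statistics import mean
from typing import Dict, List

logger = logging.getLogger("rerank.telemetry")  # hypothetical logger name


def summarize_scores(scores: List[float]) -> Dict[str, float]:
    """Compute count/min/max/mean of the relevance scores for one rerank call."""
    if not scores:
        return {"count": 0}
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
    }


def log_rerank_event(query_id: str, scores: List[float]) -> str:
    """Emit one JSON log line per rerank call and return it for inspection."""
    payload = json.dumps(
        {"event": "rerank", "query_id": query_id, **summarize_scores(scores)},
        sort_keys=True,
    )
    logger.info(payload)
    return payload
```

Tracking the score distribution over time could help spot queries where even the best candidate scores low, which usually points at a retrieval gap rather than a ranking problem.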

@kevincogan
Contributor

Sounds great, thanks! @yanxi0830

@kevincogan
Contributor

kevincogan commented Feb 24, 2025

Would it be possible to add me as one of the assignees while I am working on the RFC for this? @yanxi0830


3 participants