Response Relevancy: TypeError: object of type 'StringPromptValue' has no len() #1892

Open
ishachinniah-hds opened this issue Jan 30, 2025 · 9 comments

@ishachinniah-hds

ishachinniah-hds commented Jan 30, 2025

[✓ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
When running the ResponseRelevancy metric on my query (user_input) and generated LLM answer (response), I get a TypeError referring to 'StringPromptValue', a type I am not using anywhere in my code.

Ragas version: 0.2.12
Python version: 3.9.20

Code to Reproduce
async def evaluate(context, response, query):
    """
    Run LLM response evaluations for several criteria
    """
    eval_llm = evaluator_llm()
    eval_embeddings = HuggingFaceEmbeddings(model_name=embedding_model)

    # RAGAS EVALUATIONS
    sample = SingleTurnSample(
        user_input=query,
        response=response,
        retrieved_contexts=context
    )

    # Response Relevancy
    scorer = ResponseRelevancy(llm=eval_llm, embeddings=eval_embeddings)
    output = await scorer.single_turn_ascore(sample)
    return output

Error trace
output = await scorer.single_turn_ascore(sample)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 541, in single_turn_ascore
raise e
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 534, in single_turn_ascore
score = await asyncio.wait_for(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
return await fut
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 134, in _single_turn_ascore
return await self._ascore(row, callbacks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 148, in _ascore
responses = await asyncio.gather(*tasks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
future.result()
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 256, in __step
result = coro.send(None)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/prompt/pydantic_prompt.py", line 127, in generate
output_single = await self.generate_multiple(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/prompt/pydantic_prompt.py", line 188, in generate_multiple
resp = await llm.generate(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/langchain_core/language_models/chat_models.py", line 684, in generate
batch_size=len(messages),
TypeError: object of type 'StringPromptValue' has no len()

Expected behavior
I expect a Response Relevancy score to be output at the end of the function call.

Additional context
For context, I also printed the types of the response, query, and context variables I am passing in as arguments, and none of them is 'StringPromptValue':
Context Type: <class 'list'>
Response Type: <class 'str'>
Query Type: <class 'str'>

Thank you, let me know what additional information would be insightful.

@ishachinniah-hds ishachinniah-hds added the bug Something isn't working label Jan 30, 2025
@dosubot dosubot bot added the module-metrics this is part of metrics module label Jan 30, 2025
@sahusiddharth
Collaborator

Hi @ishachinniah-hds,

I believe the error you're encountering might be related to the LLM model you're using. Could you kindly let me know which model you're working with and how you're initializing it?

@ishachinniah-hds
Author

ishachinniah-hds commented Jan 31, 2025

Hi @sahusiddharth,

I have tried using LLMs provided by Hugging Face (specifically the meta-llama/Llama-3.2-3B-Instruct model) as well as Azure OpenAI (specifically the gpt-4o-mini model).

Azure OpenAI LLM model initialization

## create LLM instance for Azure OpenAI
import os

from langchain_openai.chat_models import AzureChatOpenAI

def evaluator_llm():
    """
    Load the Azure OpenAI LLM model for evaluation
    """
    # Define Azure ML properties
    os.environ["OPENAI_API_TYPE"] = openai_api_type
    os.environ["OPENAI_API_VERSION"] = openai_api_version
    os.environ["OPENAI_API_KEY"] = openai_api_key
    os.environ["AZURE_OPENAI_ENDPOINT"] = azure_openai_endpoint

    # Init the Azure OpenAI model
    llm = AzureChatOpenAI(
        deployment_name = openai_deployment_name,
        model = openai_model,
        temperature=0.1,
        max_tokens=256
    )
    return llm

Hugging Face LLM model initialization

## create LLM instance for Hugging Face LLM
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline  # or from langchain_community.llms, depending on the install

def evaluator_llm():
    """
    Load the Hugging Face LLM model for evaluation
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    hf_pipeline = pipeline(
        "text-generation", 
        model=model, 
        tokenizer=tokenizer, 
        model_kwargs={"torch_dtype": torch.bfloat16}, 
        device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=256
    )
    llm = HuggingFacePipeline(pipeline=hf_pipeline)
    return llm

RAGAs evaluation invocation

async def evaluate(context, response, query):
    """
    Run LLM response evaluations for several criteria
    """
    eval_llm = evaluator_llm()
    eval_embeddings = HuggingFaceEmbeddings(model_name=embedding_model)

    ## RAGAS
    sample = SingleTurnSample(
        user_input=query,
        response=response,
        # retrieved_contexts=context
    )

    # Response Relevancy
    scorer = ResponseRelevancy(llm=eval_llm, embeddings=eval_embeddings)
    output = await scorer.single_turn_ascore(sample)
    return output

Azure OpenAI model error traceback:

output = await scorer.single_turn_ascore(sample)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 541, in single_turn_ascore
raise e
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 534, in single_turn_ascore
score = await asyncio.wait_for(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
return await fut
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 134, in _single_turn_ascore
return await self._ascore(row, callbacks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 148, in _ascore
responses = await asyncio.gather(*tasks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
future.result()
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 256, in __step
result = coro.send(None)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/prompt/pydantic_prompt.py", line 127, in generate
output_single = await self.generate_multiple(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/prompt/pydantic_prompt.py", line 188, in generate_multiple
resp = await llm.generate(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/langchain_core/language_models/chat_models.py", line 684, in generate
batch_size=len(messages),
TypeError: object of type 'StringPromptValue' has no len()

Hugging Face model error traceback:

output = await scorer.single_turn_ascore(sample)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 541, in single_turn_ascore
raise e
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 534, in single_turn_ascore
score = await asyncio.wait_for(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
return await fut
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 134, in _single_turn_ascore
return await self._ascore(row, callbacks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 148, in _ascore
responses = await asyncio.gather(*tasks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
future.result()
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 256, in __step
result = coro.send(None)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/prompt/pydantic_prompt.py", line 127, in generate
output_single = await self.generate_multiple(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/prompt/pydantic_prompt.py", line 188, in generate_multiple
resp = await llm.generate(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/langchain_core/language_models/llms.py", line 857, in generate
raise ValueError(msg) # noqa: TRY004
ValueError: Argument 'prompts' is expected to be of type List[str], received argument of type <class 'langchain_core.prompt_values.StringPromptValue'>.

Next Steps

Where can I find documentation on correct LLM usage and compatibility for setting up these evaluations? Let me know if there is anything else I can provide. Thank you!

@sahusiddharth
Collaborator

Hi @ishachinniah-hds,

I noticed that when using the Azure OpenAI model, it is not wrapped in the Ragas LangchainLLMWrapper. Starting from your factory function:

def evaluator_llm():
    """
    Load the Azure OpenAI LLM model for evaluation
    """
    # Set Azure ML properties
    os.environ["OPENAI_API_TYPE"] = openai_api_type
    os.environ["OPENAI_API_VERSION"] = openai_api_version
    os.environ["OPENAI_API_KEY"] = openai_api_key
    os.environ["AZURE_OPENAI_ENDPOINT"] = azure_openai_endpoint

    # Initialize the Azure OpenAI model
    llm = AzureChatOpenAI(
        deployment_name=openai_deployment_name,
        model=openai_model,
        temperature=0.1,
        max_tokens=256
    )
    return llm

You can wrap it with the LangchainLLMWrapper like this:

from langchain_openai.chat_models import AzureChatOpenAI
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(AzureChatOpenAI(model="gpt-4o-mini"))
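Since ResponseRelevancy also calls the embeddings, those will likely need the Ragas wrapper too, and the HuggingFacePipeline LLM would presumably need the same treatment as the Azure one. A minimal sketch, assuming the HuggingFaceEmbeddings and embedding_model from your snippets above:

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.metrics import ResponseRelevancy

# Wrap both the LLM and the embeddings so Ragas drives them through its own interface
eval_llm = LangchainLLMWrapper(AzureChatOpenAI(model="gpt-4o-mini"))
# The HuggingFacePipeline LLM would presumably need the same wrapping, e.g.
# eval_llm = LangchainLLMWrapper(HuggingFacePipeline(pipeline=hf_pipeline))
eval_embeddings = LangchainEmbeddingsWrapper(HuggingFaceEmbeddings(model_name=embedding_model))

scorer = ResponseRelevancy(llm=eval_llm, embeddings=eval_embeddings)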

Let me know if this works for you!

@ishachinniah-hds
Author

ishachinniah-hds commented Feb 3, 2025

Hi @sahusiddharth,

Thank you for your response and the code suggestions. I followed the Ragas documentation on customizing models:

Code

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

azure_configs = {
    "base_url": "https://test-poc-spriha-isha.openai.azure.com/",
    "model_deployment": "gpt-4o-mini-spriha",
    "model_name": "gpt-4o-mini",
    "embedding_deployment": "text-embedding-ada-002",
    "embedding_name": "text-embedding-ada-002",  # most likely
}

azure_llm = AzureChatOpenAI(
    openai_api_version="2024-05-01-preview",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

# init the embeddings for answer_relevancy, answer_correctness and answer_similarity
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
)

azure_llm = LangchainLLMWrapper(azure_llm)
azure_embeddings = LangchainEmbeddingsWrapper(azure_embeddings)

async def evaluate(context, response, query):
    """
    Run LLM response evaluations for several criteria
    """
    ## RAGAS
    sample = SingleTurnSample(
        user_input=query,
        response=response,
        # retrieved_contexts=context
    )

    # Response Relevancy
    scorer = ResponseRelevancy(llm=azure_llm, embeddings=azure_embeddings)
    output = await scorer.single_turn_ascore(sample)
    return output

Traceback Error

I am getting the following error that I am working on debugging:

output = await scorer.single_turn_ascore(sample)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 541, in single_turn_ascore
raise e
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/base.py", line 534, in single_turn_ascore
score = await asyncio.wait_for(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
return await fut
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 134, in _single_turn_ascore
return await self._ascore(row, callbacks)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 150, in _ascore
return self._calculate_score(responses, row)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 125, in _calculate_score
cosine_sim = self.calculate_similarity(question, gen_questions)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/metrics/_answer_relevance.py", line 99, in calculate_similarity
question_vec = np.asarray(self.embeddings.embed_query(question)).reshape(1, -1)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/ragas/embeddings/base.py", line 124, in embed_query
return self.embeddings.embed_query(text)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/langchain_openai/embeddings/base.py", line 629, in embed_query
return self.embed_documents([text])[0]
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/langchain_openai/embeddings/base.py", line 588, in embed_documents
return self._get_len_safe_embeddings(texts, engine=engine)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/langchain_openai/embeddings/base.py", line 483, in _get_len_safe_embeddings
response = self.client.create(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/openlit/instrumentation/openai/init.py", line 304, in wrapper
return completion_func(wrapped, instance, args, kwargs)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/openlit/instrumentation/openai/openai.py", line 451, in wrapper
response = wrapped(*args, **kwargs)
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/openai/resources/embeddings.py", line 124, in create
return self._post(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/openai/_base_client.py", line 1283, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/openai/_base_client.py", line 960, in request
return self._request(
File "/opt/homebrew/anaconda3/envs/chatbot/lib/python3.9/site-packages/openai/_base_client.py", line 1064, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'code': 'DeploymentNotFound', 'message': 'The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.'}}

Thoughts

I am unsure why my Azure OpenAI resource is causing this issue, since it works for response generation but not for the evaluation section of the code. Is a new Azure OpenAI API key required within this evaluation section, or can the same key be reused elsewhere in the code?

@sahusiddharth
Collaborator

Is a new Azure OpenAI API key required within this evaluation section, or can the same key be reused elsewhere in the code?

I don’t think a new Azure OpenAI API key is required for this evaluation section, and it should be reusable elsewhere in the code.

However, while reviewing your error trace, I noticed the following message, which could be causing the issue:

openai.NotFoundError: Error code: 404 - {'error': {'code': 'DeploymentNotFound', 'message': 'The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.'}}
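If it helps to narrow things down, the trace shows the 404 is raised inside embed_query, so one option is to call the embeddings object directly, outside of Ragas. A minimal sketch, assuming the azure_embeddings object from your snippet (before it is wrapped):

# If this raises DeploymentNotFound, the embedding endpoint/deployment pair is wrong,
# independent of the Ragas evaluation code
vec = azure_embeddings.embed_query("test")
print(len(vec))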

@ishachinniah-hds
Author

Right, I have been trying to debug this error. It seems to suggest an issue with the resource deployment, yet I am able to get the LLM to generate a response with this same resource; it only throws the error when called for evaluation, which is what is confusing me.

@jjmachan
Member

jjmachan commented Feb 4, 2025

@ishachinniah-hds which version are you using?

@ishachinniah-hds
Author

Hi @jjmachan ,

I was just able to work out my issue with the Azure OpenAI resource: I had created the embedding model endpoint incorrectly. Once I updated it to use the same endpoint URL as the LLM, the issue was resolved. Thank you for the help and support.
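For anyone hitting the same DeploymentNotFound error, a sketch of what the corrected setup looks like (the endpoint value is a placeholder; the key point is that the LLM and the embeddings share it):

# Both clients must point at the Azure OpenAI resource that actually hosts their deployments;
# in my case that meant reusing the LLM's endpoint for the embeddings
shared_endpoint = "https://<your-resource>.openai.azure.com/"  # placeholder

azure_llm = AzureChatOpenAI(
    openai_api_version="2024-05-01-preview",
    azure_endpoint=shared_endpoint,
    azure_deployment=azure_configs["model_deployment"],
)
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=shared_endpoint,  # previously pointed at an incorrectly created endpoint
    azure_deployment=azure_configs["embedding_deployment"],
)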

@sahusiddharth sahusiddharth added the answered 🤖 The question has been answered. Will be closed automatically if no new comments label Feb 5, 2025
@mail2sachinkhanna

I am also getting the same issue when running the sample SQL evaluation to check LLMSQLEquivalence:

Error during batch scoring: object of type 'StringPromptValue' has no len()
Error initializing Bedrock model: object of type 'StringPromptValue' has no len()

The sample is taken from
https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/sql/#sql-query-semantic-equivalence

from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample
from langchain_aws import ChatBedrock
import asyncio
from asyncio import TimeoutError, CancelledError
from langchain_community.embeddings import BedrockEmbeddings

from random import sample
from langchain_aws import BedrockLLM

async def process_samples(scorer, sample):
    try:
        return await scorer.single_turn_ascore(sample)
    except Exception as e:
        print(f"Error during batch scoring: {str(e)}")
        raise

async def main(scorer, sample):
    result = await process_samples(scorer, sample)
    return result

if __name__ == "__main__":

    try:
        config = {
            "credentials_profile_name": "default",  # E.g "default"
            "region_name": "us-east-1",  # E.g. "us-east-1"
            "model_id": "anthropic.claude-3-5-sonnet-20240620-v1:0",  # E.g "anthropic.claude-3-5-sonnet-20241022-v2:0"
            "embeddings": "amazon.titan-embed-text-v2:0",  # E.g "amazon.titan-embed-text-v2:0"
            "model_kwargs": {"temperature": 0.4},
        }

        bedrock_model = ChatBedrock(
            credentials_profile_name=config["credentials_profile_name"],
            region_name=config["region_name"],
            endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
            model_id=config["model_id"],
            model_kwargs=config["model_kwargs"],
        )

        bedrock_embeddings = BedrockEmbeddings(
            credentials_profile_name=config["credentials_profile_name"],
            region_name=config["region_name"],
            model_id=config["embeddings"],
        )

        # Test the connection with a simple prompt
        response = bedrock_model.invoke("Test connection")
        print(response.content)

        sample = SingleTurnSample(
            response="""
                SELECT p.product_name, SUM(oi.quantity) AS total_quantity
                FROM order_items oi
                JOIN products p ON oi.product_id = p.product_id
                GROUP BY p.product_name;
            """,
            reference="""
                SELECT p.product_name, COUNT(oi.quantity) AS total_quantity
                FROM order_items oi
                JOIN products p ON oi.product_id = p.product_id
                GROUP BY p.product_name;
            """,
            reference_contexts=[
                """
                Table order_items:
                - order_item_id: INT
                - order_id: INT
                - product_id: INT
                - quantity: INT
                """,
                """
                Table products:
                - product_id: INT
                - product_name: VARCHAR
                - price: DECIMAL
                """
            ]
        )

        scorer = LLMSQLEquivalence(llm=bedrock_model)
        result = asyncio.run(main(scorer, sample))
        print(result)
    except Exception as e:
        print(f"Error initializing Bedrock model: {str(e)}")
Please help
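
Based on the fix discussed earlier in this thread, the ChatBedrock model likely needs to be wrapped in the Ragas LangchainLLMWrapper before it is passed to LLMSQLEquivalence; a minimal sketch under that assumption:

from ragas.llms import LangchainLLMWrapper

# Wrap the LangChain Bedrock chat model so Ragas can call it through its own interface
evaluator_llm = LangchainLLMWrapper(bedrock_model)
scorer = LLMSQLEquivalence(llm=evaluator_llm)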
