annotation by alpaca_eval_llama3_70b_fn causes IndexError #417

Open
reihig-ut opened this issue Nov 2, 2024 · 0 comments

reihig-ut commented Nov 2, 2024

Details of the error

I installed alpaca_eval==0.6.5, set OPENAI_API_KEY, and ran the following commands; both worked fine.

alpaca_eval --model_outputs "alpaca_test.json"
alpaca_eval --model_outputs "alpaca_test.json" --annotators_config "alpaca_eval_gpt4_turbo_fn"

However, when I set OPENAI_API_KEY to a together.ai API key and tried the annotator alpaca_eval_llama3_70b_fn,

alpaca_eval --model_outputs "alpaca_test.json" --annotators_config "alpaca_eval_llama3_70b_fn"

it caused the following error:

WARNING:root:precomputed_leaderboard = 'auto'. But we have found no corresponding leaderboard
INFO:root:Evaluating the test outputs.
INFO:root:Creating the annotator from `alpaca_eval_llama3_70b_fn`.
INFO:root:Saving annotations to `/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/annotations_seed0_configs.json`.
INFO:root:Loading all annotations from /home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/annotations_seed0_configs.json.
WARNING:root:The length of outputs before and after merge are not the same. We have len(outputs_1)==805, len(outputs_2)==5, and len(df_annotated)==5. This means that there are missing examples or duplicates. We are taking a SQL inner join.
Annotation chunk:   0%|                                                                    | 0/1 [00:00<?, ?it/s]
INFO:root:Annotating 5 examples with alpaca_eval_llama3_70b_fn
INFO:root:Using `openai_completions` on 5 prompts using meta-llama/Llama-3-70b-chat-hf.
INFO:root:Kwargs to completion: {'model': 'meta-llama/Llama-3-70b-chat-hf', 'temperature': 0, 'client_kwargs': {'base_url': 'https://api.together.xyz/v1'}, 'is_chat': True}. num_procs=5
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
prompt_batches:   0%|                                                                      | 0/5 [00:00<?, ?it/s]
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
Annotation chunk:   0%|                                                                    | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/hoge/bin/alpaca_eval", line 8, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/main.py", line 611, in main
    fire.Fire(evaluate)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/main.py", line 155, in evaluate
    annotations = annotator.annotate_head2head(
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 237, in annotate_head2head
    out = self.__call__(df_to_annotate, **decoding_kwargs)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/annotators/base.py", line 203, in __call__
    df_annotated = self._annotate(curr_df_to_annotate, **decoding_kwargs)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/annotators/base.py", line 319, in _annotate
    curr_annotated = self.annotators[annotator](
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/annotators/base.py", line 692, in __call__
    completions = self.fn_completions(prompts=prompts, **self.completions_kwargs, **decoding_kwargs)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/decoders/openai.py", line 153, in openai_completions
    completions = list(
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/decoders/openai.py", line 248, in _openai_completion_helper
    choices[i]["text"] = choice.message.tool_calls[0].function.arguments
IndexError: list index out of range
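The failing line indexes `choice.message.tool_calls[0]` unconditionally. With the together.ai backend, the response apparently comes back with `tool_calls` set to an empty list rather than `None`, so the index lookup raises. A minimal sketch of that failure mode, using a hypothetical stand-in object rather than a real API response:

```python
from types import SimpleNamespace

# Hypothetical stand-in for an OpenAI-style response message: some
# providers return tool_calls == [] (empty list, not None) when the
# model emits no function call, so indexing [0] raises IndexError.
message = SimpleNamespace(tool_calls=[])

try:
    arguments = message.tool_calls[0].function.arguments
except IndexError:
    arguments = None  # no tool call in the response

print(arguments)  # -> None
```

A `tool_calls is not None` check alone does not catch this case, which is why the guard below also has to test the list's length.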

The content of alpaca_test.json is:

[
  {
    "generator": "test",
    "instruction": "You are given a sentence in first-order logic. Translate it into natural language.\n\n\u2203x\u2200y(soldier(x)\u2192general(x,y))",
    "output": "Every soldier is a general of someone.",
    "reference": "There exists an x such that for all y, if x is a soldier then x is a general of y."
  },
  {
    "generator": "test",
    "instruction": "Hi, I'm interested in learning to play badminton. Can you explain the game to me?",
    "output": "Two opposing players (singles) or two opposing pairs (doubles) rally a shuttlecock across a net.  Point is scored when the shuttlecock fails to be returned legally.  First to a predetermined number of points wins.",
    "reference": "Sure! Badminton is a sport played by two or four players. It is played with a shuttlecock and racquets. The game is played by hitting the shuttlecock back and forth over a net that divides the court. The objective is to hit the shuttlecock over the net and have it land in the opposing player's court. Points are scored when the shuttlecock is not returned properly. The first player or team to reach 21 points wins the game."
  },
  {
    "generator": "test",
    "instruction": "Provide me with a list of 10 names from various cultures that mean love, care, and/or empathy.",
    "output": "1. Amor (Spanish) 2. Caridad (Spanish) 3. Prem (Sanskrit) 4. Agape (Greek) 5. Hani (Korean) 6. Melek (Turkish) 7. Anjali (Indian) 8. Lian (Chinese) 9. Rahma (Arabic) 10. Seren (Welsh)",
    "reference": "1. Agapi (Greek)\n2. Amor (Latin)\n3. Prema (Sanskrit)\n4. Riko (Japanese)\n5. Asha (Hebrew)\n6. Bahiya (Arabic)\n7. Sehnsucht (German)\n8. Uyiosa (African)\n9. Pyar (Hindi)\n10. Chikondi (Chichewa)"
  },
  {
    "generator": "test",
    "instruction": "List pros and cons of lowering the capabilities of my ears by listening to loud music (70-75db) in order to withstand the incredibly loud sound of screaming kids at school.",
    "output": "Pros: None. Hearing damage is irreversible.\nCons: Hearing loss, tinnitus, other auditory problems. This is not a viable solution.",
    "reference": "Pros:\n\n1. Improved ability to tolerate loud noises like screaming kids.\n2. Increased awareness of lower-level sounds that may be drowned out by louder noises.\n3. Improved focus and concentration while in noisy environments.\n\nCons:\n\n1. Increased risk of hearing loss due to prolonged exposure to loud sounds.\n2. Difficulty hearing softer sounds, including conversations and music.\n3. Possible damage to the delicate structures of the ear and inner ear, leading to permanent hearing loss."
  },
  {
    "generator": "test",
    "instruction": "The given article should be checked for any scientific errors and corrected if necessary.\n\nZinc is an essential mineral that is required for good health. It is involved in many biochemical processes in the body and is necessary for the proper functioning of the immune system. Excess zink is not harmful as it is excreted in the urine.",
    "output": "Excess zinc is harmful; it's not all excreted.",
    "reference": "Zinc is an essential mineral that is required for good health. It is involved in many biochemical processes in the body and is necessary for the proper functioning of the immune system. Excess zinc is not harmful as it is excreted in the urine."
  }
]

Possible solution

After I changed line 243 in src/alpaca_eval/decoders/openai.py

                    if choice.message.tool_calls is not None:

into

                    if choice.message.tool_calls is not None and len(choice.message.tool_calls) > 0:

it worked (though I'm not sure how this quick fix affects the evaluation):

WARNING:root:precomputed_leaderboard = 'auto'. But we have found no corresponding leaderboard
INFO:root:Evaluating the test outputs.
INFO:root:Creating the annotator from `alpaca_eval_llama3_70b_fn`.
INFO:root:Saving annotations to `/home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/annotations_seed0_configs.json`.
INFO:root:Loading all annotations from /home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/annotations_seed0_configs.json.
WARNING:root:The length of outputs before and after merge are not the same. We have len(outputs_1)==805, len(outputs_2)==5, and len(df_annotated)==5. This means that there are missing examples or duplicates. We are taking a SQL inner join.
Annotation chunk:   0%|                                                                    | 0/1 [00:00<?, ?it/s]
INFO:root:Annotating 5 examples with alpaca_eval_llama3_70b_fn
INFO:root:Using `openai_completions` on 5 prompts using meta-llama/Llama-3-70b-chat-hf.
INFO:root:Kwargs to completion: {'model': 'meta-llama/Llama-3-70b-chat-hf', 'temperature': 0, 'client_kwargs': {'base_url': 'https://api.together.xyz/v1'}, 'is_chat': True}. num_procs=5
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
WARNING:root:/home/user/miniconda3/envs/hoge/lib/python3.10/client_configs/openai_configs.yaml wasn't found. We are using environment variables to construct the client configs.This is the old and non-recommended way of doing it. Please see `client_configs/README.md` for the recommended way of specifying client configs.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:root:Using OAI client number 1 out of 1.
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.together.xyz/v1/chat/completions "HTTP/1.1 200 OK"
prompt_batches: 100%|██████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  3.71it/s]
INFO:root:Completed 5 examples in 1.4 seconds.
INFO:root:Saving all annotations to /home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/annotations_seed0_configs.json.
INFO:root:Loading all annotations from /home/user/miniconda3/envs/hoge/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/annotations_seed0_configs.json.
Annotation chunk: 100%|████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.60s/it]
INFO:root:Saving all results to samples/alpaca_eval_llama3_70b_fn
INFO:root:Not saving the result to the cached leaderboard because precomputed_leaderboard is not a path but <class 'NoneType'>.
      length_controlled_winrate  win_rate  standard_error  n_total  avg_length
test                       0.09      0.00            0.00        5         121
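The quick fix's behavior can be sketched as a small standalone helper (hypothetical names; the real code in openai.py writes the result into `choices[i]["text"]` rather than returning it):

```python
from types import SimpleNamespace

def extract_tool_call_arguments(choice):
    """Mirror of the guarded check from the quick fix: treat an empty
    tool_calls list the same as None instead of indexing into it.
    Hypothetical helper, not the actual alpaca_eval code."""
    tool_calls = choice.message.tool_calls
    if tool_calls is not None and len(tool_calls) > 0:
        return tool_calls[0].function.arguments
    return None  # no function call returned; the caller decides the fallback

# A response that does contain a function call:
call = SimpleNamespace(function=SimpleNamespace(arguments='{"rank": 1}'))
with_call = SimpleNamespace(message=SimpleNamespace(tool_calls=[call]))
# The together.ai case: an empty tool_calls list.
without_call = SimpleNamespace(message=SimpleNamespace(tool_calls=[]))

print(extract_tool_call_arguments(with_call))     # -> {"rank": 1}
print(extract_tool_call_arguments(without_call))  # -> None
```

Note that when the guard fails, the annotator silently gets no function-call arguments for that example, which is presumably why the reporter is unsure how the fix affects the evaluation — a skipped tool call changes what the parser sees downstream.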