(spike) Determine open source LLMs to evaluate on #717
Comments
Since SynthIA-7B is our default model, it will be included in the list of initial evaluations. We have also done our own quantization of a fine-tuned Mistral-7B, so that should be another model included in the initial evaluations.
Potential models for consideration:
To balance scope and time, the following models will be used:
GPT-4o will also be used as a point of comparison in the results.
Hi, @jalling97! Have you seen NVIDIA's distilled Llama 3.1 8B models?
Here's the paper on model distillation, if interested: https://arxiv.org/abs/2408.11796
Hey @jxtngx! I have not, I'll be sure to take a look at these. The primary issue with using Llama 3.1 is the license; based on the wording, we're still not sure whether we're able to use it. Not being an expert on the legal side of things, I'm not sure if the NVIDIA Open Model License changes any of that, but I'll be sure to experiment with these models regardless. The benchmark comparison looks promising. Thanks for sending this over!
There's also NVIDIA's own Minitron 4B and 8B. Again, though, the proprietary license may be a blocker. I've added all of the models to this collection for easy reference: https://huggingface.co/collections/jxtngx/nvidia-minitron-models-66d714aebae0e60d003a9693
@jalling97 the 24GB vRAM requirement should be lowered to 16GB (or ideally 12GB), and that change should be reflected in our documentation. If we are considering lower-end GPUs (e.g., laptop GPUs, V100s, etc.) and/or laptop CPU RAM as the only offloading target in a worst-case scenario - think a government laptop with 16GB of RAM (roughly 13GB free in an ideal situation) - then we should reduce our minimums. If the "example/demo" model for our repository is to fit on these minimum-requirement laptops and machines, then we should focus on heavily quantized, pruned, and/or lower-parameter models (down to ~2B effective parameters). Until we enable CPU + GPU offloading in vLLM (still not production ready) or another engine, we are constrained to one or the other's compute RAM. Other factors besides parameter count include architecture, context size, and engine parameters we may be able to tune to lower vRAM usage at the edge. An oversimplified example of these considerations can be seen in the ADDITIONAL CONTEXT section of this WIP PR: #854 (comment). Additional parameters for vLLM can be found here: https://docs.vllm.ai/en/v0.4.3/models/engine_args.html (pinned to v0.4.3 due to issues in 0.5.x, as described in the aforementioned WIP PR).
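For concreteness, here is a minimal sketch (my own illustration, not the project's actual config) of the kinds of vLLM engine arguments that can pull vRAM usage down toward a 12-16GB budget, assuming vLLM ~v0.4.3. The model name is a placeholder for any pre-quantized checkpoint, and the values are illustrative, not recommendations:

```python
# Sketch: running a pre-quantized 7B model with vLLM while capping vRAM usage.
# Model name and values are placeholders; tune against the actual target GPU.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/SynthIA-7B-v2.0-AWQ",  # hypothetical AWQ-quantized checkpoint
    quantization="awq",                    # serve 4-bit weights instead of fp16
    dtype="auto",
    max_model_len=4096,                    # cap the context length to shrink the KV cache
    gpu_memory_utilization=0.85,           # leave headroom below the GPU's total vRAM
    enforce_eager=True,                    # skip CUDA graph capture to save some memory
)

outputs = llm.generate(
    ["Summarize the purpose of this evaluation spike in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

The two biggest levers in practice are the quantized weights and `max_model_len`, since the KV cache grows linearly with the allowed context.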
Good call out! When I listed the 24GB vRAM requirement, I don't think I properly conveyed that the intention was for the model to be fully "usable" on 24GB of vRAM (i.e., not hitting OOM errors when using the model's full context). Keeping that definition of success the same, I agree it makes sense to reduce the vRAM cap to 12GB or 16GB. I'll try to keep to 12GB but may increase to 16GB if a very promising model comes into play that requires it. Happy to discuss further if you'd like.
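As a rough sanity check on the "usable at full context" criterion, here is a back-of-the-envelope KV-cache estimate (my own sketch, not a project utility). The architecture numbers below assume a Mistral-7B-like model (32 layers, 8 KV heads with GQA, head dim 128, 32k context) and are illustrative only:

```python
# Estimate KV-cache memory needed to run a model at its full context length.
def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2, batch: int = 1) -> float:
    """Memory for keys + values across all layers, in GiB."""
    elems = 2 * num_layers * num_kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / (1024 ** 3)

# Mistral-7B-like: 32 layers, 8 KV heads, head_dim 128, 32k context, fp16 cache
print(f"{kv_cache_gib(32, 8, 128, 32_768):.2f} GiB")  # ~4 GiB, on top of the model weights
```

With fp16 weights for a 7B model already around 13-14 GiB, this is why an unquantized 7B at full context does not fit a 12-16GB budget, and why quantized or smaller models are the focus.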
This research spike will be closed. The three models that will initially be evaluated are:
Description
As part of the deliverables for an MVP Evals framework, we need a short list of LLMs to evaluate as part of LFAI. The chosen models should fit the following criteria (with room for potential exceptions):
Relevant Links
Inspiration can be found on the HuggingFace Open LLM Leaderboard