Can't run GGUF downloaded from Ollama #1137
I am running on c9ac321 with a small patch to fix the CUDA problem (cudarc does not support CUDA 12.8):

```diff
From ff7f52bf4ab3ed1b67bbc7fe240554fd1e792929 Mon Sep 17 00:00:00 2001
From: Sherlock Holo <[email protected]>
Date: Thu, 13 Feb 2025 15:56:04 +0800
Subject: [PATCH] fix cuda 12.8

---
 Cargo.lock                | 4 ++--
 Cargo.toml                | 4 ++++
 mistralrs-core/Cargo.toml | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index bfa84a5b5..1013af2ee 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -811,8 +811,7 @@ dependencies = [
 [[package]]
 name = "cudarc"
 version = "0.13.4"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3b68d7c284d40d96a4251330ab583c2718b412f4fc53239d295b3a1f8735f426"
+source = "git+https://github.com/wcork/cudarc?branch=feat-cuda12080#a749c6c1bd1bd6d08c0d03cadb63b34585f6c7cc"
 dependencies = [
  "half",
  "libloading",
@@ -2388,6 +2387,7 @@ dependencies = [
  "chrono",
  "clap",
  "csv",
+ "cudarc",
 "derive-new",
 "derive_more",
 "dirs",
diff --git a/Cargo.toml b/Cargo.toml
index 188f2700f..de2e47382 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -11,6 +11,9 @@ members = [
 ]
 resolver = "2"
 
+[patch.crates-io]
+cudarc = { git = "https://github.com/wcork/cudarc", branch = "feat-cuda12080" }
+
 [workspace.package]
 version = "0.4.0"
 edition = "2021"
@@ -23,6 +26,7 @@ license = "MIT"
 rust-version = "1.82"
 
 [workspace.dependencies]
+cudarc = {version="0.13",features=["cuda-12080"]}
 anyhow = "1.0.80"
 candle-core = { git = "https://github.com/EricLBuehler/candle.git", version = "0.8.0", rev = "fb5cc8c" }
 candle-nn = { git = "https://github.com/EricLBuehler/candle.git", version = "0.8.0", rev = "fb5cc8c" }
diff --git a/mistralrs-core/Cargo.toml b/mistralrs-core/Cargo.toml
index 91ded26ef..861e88e75 100644
--- a/mistralrs-core/Cargo.toml
+++ b/mistralrs-core/Cargo.toml
@@ -12,6 +12,7 @@ license.workspace = true
 homepage.workspace = true
 
 [dependencies]
+cudarc.workspace = true
 anyhow.workspace = true
 candle-core.workspace = true
 candle-nn.workspace = true
--
2.48.1
```
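For anyone who wants to reproduce this setup, a minimal sketch of applying the mail-formatted patch above (the patch file name is hypothetical, and the `cuda` feature flag follows the mistral.rs build instructions):

```bash
# Apply the patch on top of commit c9ac321 and rebuild with CUDA enabled.
# "fix-cuda-12.8.patch" is a placeholder for wherever you saved the patch.
git checkout c9ac321
git am fix-cuda-12.8.patch
cargo build --release --features cuda
```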
llama.cpp can run the GGUF file, so I think the model file itself is fine.
@Sherlock-Holo I pushed 87a7c23 which fixes the loading. It looks like the chat template for these models in the GGUF file might be incorrect, as it does not match the official one from Qwen. I'm not really sure what a good workaround is, other than creating your own or using ISQ.
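For reference, a sketch of what overriding the template could look like, assuming the server's `--chat-template` option (which takes a template file) behaves as documented; the template path here is hypothetical:

```bash
# Hypothetical workaround: override the GGUF's embedded chat template with
# a local Jinja template (e.g. one copied from Qwen's tokenizer_config.json).
~/git/mistral.rs/target/release/mistralrs-server -i \
  --chat-template ./qwen-chat-template.jinja \
  gguf -m . -f /var/lib/ollama/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e
```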
I tried 87a7c23 and can confirm it fixed the problem.
I want to try using mistral.rs to run a GGUF model. I downloaded DeepSeek R1 14B with Ollama, which stores the GGUF file as /var/lib/ollama/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e.
Then I cloned https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B without LFS, because I already have the model file and I think I just need the other files in the repo, as in the sketch below.
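A sketch of that clone step, assuming git-lfs is installed; `GIT_LFS_SKIP_SMUDGE` leaves the LFS-tracked weights as pointer stubs:

```bash
# Clone only the small repo files (tokenizer, config, chat template);
# the multi-GB weight files are not downloaded.
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
```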
Then I run:

```bash
~/git/mistral.rs/target/release/mistralrs-server -i gguf -m . -f /var/lib/ollama/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e
```
The log shows:

```
Error: Cannot find tensor info for blk.0.attn_output.bias
```

What causes this problem?
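To check whether that tensor is actually absent from the file, one way is to list the tensor names with the `gguf-dump` tool shipped by the `gguf` Python package (assuming it is installed; the grep pattern just narrows the output):

```bash
# Dump the GGUF metadata/tensor info and look for the attention output tensors.
pip install gguf
gguf-dump /var/lib/ollama/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e | grep attn_output
```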