
Issue/Bug with ChatFormat initialization and the new tokens added in the last update #9

Open
SkyAphid opened this issue Jul 31, 2024 · 7 comments

Comments


SkyAphid commented Jul 31, 2024

Hello,

I've been using this project to study AI, and I'm also planning to use it in a small project, so I've written a wrapper around your class for easier access. Anyway, the last update added some new features, but it also appears to have introduced a bug.

In ChatFormat's constructor, you added this:
this.endOfMessage = specialTokens.get("<|eom_id|>");

The problem is that the model you suggest using with this program doesn't actually seem to recognize this special token, so the lookup returns null and unboxing it throws a NullPointerException, since all of the special token fields are final and assigned in the constructor. For now I've just commented it out and it works fine again, but I think you might need to add a check that all of the tokens you're trying to map actually exist.
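One way the check could look, as a minimal sketch (the class and method names here are hypothetical, not the project's actual code): guard the lookup and fall back to a known token when the vocabulary lacks one.

```java
import java.util.Map;

public class TokenLookup {
    // Hypothetical sketch: guard special-token lookups so that vocabularies
    // lacking a token (e.g. "<|eom_id|>" in older GGUF files) don't cause a
    // NullPointerException when the boxed Integer is unboxed into a final int.
    static int getSpecialToken(Map<String, Integer> specialTokens, String name, int fallback) {
        Integer id = specialTokens.get(name);
        if (id == null) {
            System.err.println("Warning: special token " + name + " not found; using fallback " + fallback);
            return fallback;
        }
        return id;
    }

    public static void main(String[] args) {
        // Simulated vocabulary that has "<|eot_id|>" but not "<|eom_id|>"
        Map<String, Integer> specialTokens = Map.of("<|eot_id|>", 128009);
        int eom = getSpecialToken(specialTokens, "<|eom_id|>", 128009);
        System.out.println(eom); // falls back to 128009
    }
}
```

Falling back to the end-of-turn token (or skipping the assignment entirely) keeps older quantized models loadable without touching the rest of the chat format.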

The model I used in this case was Meta-Llama-3-8B-Instruct-Q4_0.gguf

One other minor thing: in the runInteractive() function, the in Scanner is never closed. You can wrap it in a try-with-resources block to make it a little cleaner:

try (Scanner in = new Scanner(System.in)) {}
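In context, the pattern looks something like this (a standalone sketch, not the actual runInteractive() code; reading from an InputStream parameter instead of System.in so it's easy to test):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

public class InteractiveLoop {
    // try-with-resources closes the Scanner automatically when the block
    // exits, whether normally or via an exception.
    static int echoLines(InputStream input) {
        int count = 0;
        try (Scanner in = new Scanner(input)) {
            while (in.hasNextLine()) {
                System.out.println("> " + in.nextLine());
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Simulated stdin; in runInteractive() this would be System.in
        InputStream fake = new ByteArrayInputStream("hello\nworld\n".getBytes());
        System.out.println(echoLines(fake)); // prints both lines, then 2
    }
}
```

Note that closing a Scanner wrapped around System.in also closes System.in itself, which is fine at the end of an interactive session but worth keeping in mind.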

Anyway, nice work on this. It's super helpful! I'm happy to help out, push some tweaks, or handle the above myself if you're strapped for time. Thank you!

EDIT:
Also, I want to thank you again for making this. I've been looking for a raw Java approach like this for a while. I have a question as well, since I'm very new to this stuff (and have learned a lot from this class): how would you suggest writing code that throws out old tokens/responses when nearing the maxToken limit? I want to use this in a game, so I'll need to keep conversations going without them stopping.
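A common approach to this is a sliding context window: keep the system prompt pinned and evict the oldest exchanges once the token budget is exceeded. The sketch below is hypothetical (its Message record and token counts are illustrative, not part of Llama3.java):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class ContextWindow {
    // Hypothetical message type; a real implementation would get tokenCount
    // from the tokenizer's encoding of the message.
    record Message(String role, String text, int tokenCount) {}

    // Keep the system prompt plus as many of the most recent messages as fit
    // within maxTokens, dropping the oldest ones first.
    static List<Message> trim(List<Message> history, Message system, int maxTokens) {
        Deque<Message> kept = new ArrayDeque<>();
        int budget = maxTokens - system.tokenCount();
        // Walk backwards so the most recent messages survive
        for (int i = history.size() - 1; i >= 0; i--) {
            Message m = history.get(i);
            if (budget - m.tokenCount() < 0) break;
            budget -= m.tokenCount();
            kept.addFirst(m);
        }
        kept.addFirst(system);
        return List.copyOf(kept);
    }

    public static void main(String[] args) {
        Message sys = new Message("system", "You are an NPC.", 10);
        List<Message> history = List.of(
            new Message("user", "old question", 50),
            new Message("assistant", "old answer", 60),
            new Message("user", "new question", 30));
        // Budget of 110 tokens: the oldest message no longer fits
        List<Message> trimmed = trim(history, sys, 110);
        System.out.println(trimmed.size()); // prints 3
    }
}
```

Trimming on whole-message boundaries (rather than individual tokens) keeps the chat template well-formed; some implementations also summarize the evicted messages into the system prompt so the NPC retains long-term context.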

@SkyAphid SkyAphid changed the title Issue with ChatFormat initialization and the new tokens added in the last update Issue/Bug with ChatFormat initialization and the new tokens added in the last update Jul 31, 2024

mukel commented Jul 31, 2024

The missing token was fixed in 5f60250

I'm on the go ATM; could you please open a PR for the Scanner issue?
Also, would you mind posting your hardware config and tokens/s?


SkyAphid commented Aug 1, 2024

Sure, and take your time answering; I'm not in any rush. I'll make a new issue for the Scanner tweak. Just keep in mind that it really is a one-line change; that's why I didn't open a separate thread for it originally, since I figured a change that small didn't warrant one.

I have another small question, if you don't mind humoring me: do you think it would be possible to get Llama3.java to load a model like tinyllama-1.1b-chat-v1.0.Q4_0.gguf? I did some experiments and it mostly seems to load, apart from the vocabulary/tokenizer not being compatible.

The reason I ask is that even a 4GB model has too much of a delay before it starts speaking. My goal is to load a very lightweight model that can generate basic NPC dialogue based on what I feed into the system prompt. In the performance data below, I've also recorded how many milliseconds pass before the AI begins streaming output. Having a player wait more than 2-3 seconds for dialogue isn't really feasible. Do you have any tips on this?

Anyway, thanks for all the help!

AI Performance:

Parse models\Meta-Llama-3.1-8B-Instruct-Q4_0.gguf: 2456 ms
5.31 tokens/s
10468 ms wait before it begins to speak

Specs:

Operating System:
Microsoft Windows 11 x64

CPU Specifications:
AMD Ryzen 9 3900X 12-Core Processor            
 1 physical CPU package(s)
 12 physical CPU core(s)
 24 logical CPU(s)
Identifier: AuthenticAMD Family 23 Model 113 Stepping 0
Microarchitecture: Zen 2

Physical Memory Specifications:
  Bank label: BANK 1, Capacity: 16 GiB, Clock speed: 2.1 GHz, Manufacturer: Corsair, Memory type: DDR4
  Bank label: BANK 3, Capacity: 16 GiB, Clock speed: 2.1 GHz, Manufacturer: Corsair, Memory type: DDR4

GPU Specifications:
  Vendor: NVIDIA Corporation
  Renderer: NVIDIA GeForce RTX 2070 SUPER/PCIe/SSE2
  OpenGL version: 4.6.0 NVIDIA 552.44
  GLSL version: 4.60 NVIDIA


mukel commented Aug 1, 2024

Take a look at the Qwen2 models; they also ship 0.5B and 1.5B variants (the 1.5B is very decent) that you can run with https://github.com/mukel/qwen2.svm.java (a GraalVM native-image-compatible port with no Vector API support).
I also have Phi3 and Gemma ports, and Google just released a small Gemma model.


SkyAphid commented Aug 1, 2024

Alright, I'll take a look. I've sent the PR as well. I really appreciate the help. This project has increased my understanding of AI greatly and I've learned a lot.


mukel commented Aug 2, 2024

This project has increased my understanding of AI greatly and I've learned a lot.

I'm glad it helps.
Even after you grasp what's behind them, just simple math operations that anyone could understand, LLMs still seem magical to me: the idea that some form of intelligence and consciousness can hide in plain sight behind some matrices...


SkyAphid commented Aug 3, 2024

I think the real key to AI finally becoming self-aware will be further developments in memory. In a way, I think memories are part of what gives something a "soul." I've been wanting to catch up on AI, its developments, and coding for it, so this project has been invaluable for that as well. Is there any way I could contact you outside of GitHub? It would be interesting to have some sort of correspondence so that I could learn more and share ideas with you.


SkyAphid commented Oct 8, 2024

Hello @mukel, I just wanted to pop back in and ask again whether you would be interested in some correspondence. I'm building a project that I want to incorporate some of your code into, and I'd like to be able to ask you questions if possible. My website has a contact form, so if you don't want to share your email publicly, you can reach me there instead:

https://www.nokoriware.com/contact

Thank you again for your time! I hope you've been well!
