Issue/Bug with ChatFormat initialization and the new tokens added in the last update #9
The missing token was fixed in 5f60250. I'm on the go ATM; could you please open a PR for the Scanner issue?
Sure, and take your time answering; I'm not in any rush. I'll make a new issue for the Scanner tweak, though keep in mind it really is just a one-line change. That's why I didn't open a new thread for it: I figured a change that small didn't warrant one.

I have another small question, if you don't mind humoring me: do you think it would be possible to get Llama3.java to load a model like tinyllama-1.1b-chat-v1.0.Q4_0.gguf? In my experiments it mostly seems to load, apart from the vocabulary/tokenizer not being compatible. The reason I ask is that even a 4GB model has too much of a delay before it starts speaking. My goal is essentially to load a very lightweight model that can generate some basic NPC dialogue based on what I feed into the system prompt. In the AI performance data I'm including below, I've also recorded how many milliseconds it takes before the AI begins to stream an output. Having a player wait more than 2-3 seconds for dialogue isn't really feasible, quality-wise. Do you have any tips on this? Anyway, thanks for all the help!

AI Performance:

Specs:
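For context on the latency numbers mentioned above, here is a minimal sketch of how time-to-first-token can be measured around a streaming generator. This is not the project's API; `generateStream` and its callback are hypothetical stand-ins (the simulated delay stands in for prompt processing).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

public class FirstTokenTimer {
    // Hypothetical stand-in for a streaming generator: invokes onToken
    // once per generated token after a simulated prompt-processing delay.
    static void generateStream(IntConsumer onToken) throws InterruptedException {
        Thread.sleep(50); // simulated delay before the first token
        for (int t = 0; t < 3; t++) {
            onToken.accept(t);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        final long[] firstTokenMs = { -1 };
        generateStream(token -> {
            if (firstTokenMs[0] < 0) {
                // Record the latency only at the first streamed token.
                firstTokenMs[0] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            }
        });
        System.out.println("time to first token: " + firstTokenMs[0] + " ms");
    }
}
```

The same pattern works regardless of the underlying model: start the clock before inference and stop it inside the token callback on its first invocation.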
Take a look at the Qwen2 models; they also shipped 0.5B and 1.5B variants (the latter is very decent) that you can run with https://github.com/mukel/qwen2.svm.java (a GraalVM native-image compatible port with no Vector API support).
Alright, I'll take a look. I've sent the PR as well. I really appreciate the help; this project has greatly increased my understanding of AI, and I've learned a lot.
I'm glad it helps.
I think the real key to AI finally becoming self-aware will be further developments in memory. In a way, I think memories are part of what gives something a "soul." I've been wanting to catch up with AI, its developments, and coding for it, so this project has been invaluable for that as well. Is there any way I could contact you outside of GitHub? It would be interesting to have some sort of correspondence so I could potentially learn more and share ideas with you.
Hello @mukel, I just wanted to pop back in here and ask again whether you would be interested in some correspondence. I'm building a project that I want to implement some of your code into, and I'd like to be able to reach you with questions if possible. My website has a contact form, so if you don't want to share your email publicly, you can contact me there instead: https://www.nokoriware.com/contact Thank you again for your time! I hope you've been well!
Hello,
I've been using this project to study AI, and I'm also looking to implement it into a small project of mine; I've written a wrapper around your class to give me easier access. Anyway, the last update added some new things, but it also appears to have introduced a new bug.
In ChatFormat's constructor, you added this:
this.endOfMessage = specialTokens.get("<|eom_id|>");
The problem is that the model you suggest using with this program doesn't actually recognize this special token. Because the special tokens are final int fields, the missing (null) Integer is unboxed and throws a NullPointerException. For now I've just commented the line out and it works fine again, but I think you might need to add a check that the model actually has all of the tokens you're trying to map.
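One way the check could look (a sketch only, not the project's actual class; the field names and map type are assumed from the snippet above): treat `<|eom_id|>` as optional and fall back to a sentinel when the model's vocabulary lacks it.

```java
import java.util.Map;

public class ChatFormat {
    // Sketch: <|eom_id|> is absent from some GGUF vocabularies,
    // so it is resolved with a fallback instead of a bare get().
    private final int endOfText;
    private final int endOfMessage;

    public ChatFormat(Map<String, Integer> specialTokens) {
        this.endOfText = specialTokens.get("<|eot_id|>");
        // getOrDefault avoids the NullPointerException thrown when a
        // missing (null) Integer is unboxed into the final int field.
        this.endOfMessage = specialTokens.getOrDefault("<|eom_id|>", -1);
    }

    public int endOfMessage() {
        return endOfMessage;
    }
}
```

Callers would then need to treat `-1` (or whatever sentinel is chosen) as "this model has no end-of-message token."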
The model I used in this case was Meta-Llama-3-8B-Instruct-Q4_0.gguf.
One other minor thing: in the runInteractive() function, you didn't close the `in` Scanner. You can just wrap it in a try-with-resources block to make it a little cleaner: try (Scanner in = new Scanner(System.in)) {}
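In context, the suggested change would look roughly like this (a sketch; the loop body here is a placeholder, not the project's actual runInteractive() logic):

```java
import java.util.Scanner;

public class InteractiveLoop {
    // Sketch of the tweak: try-with-resources closes the Scanner
    // automatically when the method returns, even on exceptions.
    static void runInteractive() {
        try (Scanner in = new Scanner(System.in)) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                if ("/exit".equals(line)) {
                    break;
                }
                // Placeholder for the real prompt-handling logic.
                System.out.println("echo: " + line);
            }
        } // in.close() is invoked here by try-with-resources
    }

    public static void main(String[] args) {
        runInteractive();
    }
}
```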
Anyway, nice work on this. It's super helpful! I'm happy to help out, push some tweaks, or handle the above myself if you're strapped for time. Thank you!
EDIT:
Also, I want to thank you again for making this. I've been looking for a raw Java approach like this for a while. I have a question as well, since I'm very new to this stuff (and have learned a lot from this class): how would you suggest going about writing code to throw out old tokens/responses when nearing the maxToken limit? I want to implement this into a game, so I'll need to be able to keep conversations going without them stopping.
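One common approach to the question above, sketched under assumptions (this is not the project's API; the class and method names are made up): pin the system prompt, keep conversation turns in a queue, and evict the oldest whole turns once the total token count nears the limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ContextTrimmer {
    // Hypothetical sketch: each turn is a pre-tokenized message.
    // The system prompt is pinned; oldest turns are evicted first.
    private final List<Integer> systemTokens;
    private final Deque<List<Integer>> turns = new ArrayDeque<>();
    private final int maxTokens;

    public ContextTrimmer(List<Integer> systemTokens, int maxTokens) {
        this.systemTokens = systemTokens;
        this.maxTokens = maxTokens;
    }

    public void addTurn(List<Integer> turnTokens) {
        turns.addLast(turnTokens);
        trim();
    }

    private int totalTokens() {
        int total = systemTokens.size();
        for (List<Integer> turn : turns) {
            total += turn.size();
        }
        return total;
    }

    private void trim() {
        // Drop whole turns from the front until the budget fits,
        // always keeping at least the most recent turn.
        while (totalTokens() > maxTokens && turns.size() > 1) {
            turns.removeFirst();
        }
    }

    public List<Integer> context() {
        List<Integer> all = new ArrayList<>(systemTokens);
        turns.forEach(all::addAll);
        return all;
    }
}
```

Evicting whole turns (rather than truncating mid-message) keeps the chat template well-formed; the trade-off is that the model forgets the oldest exchanges, which is usually acceptable for NPC dialogue.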