Releases: BBC-Esq/VectorDB-Plugin-for-LM-Studio
v6.9.0 - Welcome Kobold!!
Welcome Kobold edition
Ask Jeeves!
![](https://private-user-images.githubusercontent.com/108230321/376094972-d0e84ffe-3a6f-409b-a208-fad558d8532b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzc2MDk0OTcyLWQwZTg0ZmZlLTNhNmYtNDA5Yi1hMjA4LWZhZDU1OGQ4NTMyYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01MmE0OTJiYmI1OWVjMTNmYzUwN2E3YTdiZWMzODlmZWM0ZmFmNzlhY2M3YzM1M2MzZmJlODc5Mjk4YmQ5YjQ4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.9vsUgmRRToudUbY8uY8n9sJxac4Dudb4xXQmeTi8MG4)
- Exciting new "Ask Jeeves" helper who answers questions about how to use the program. Simply click "Jeeves" in the upper left.
- "Jeeves" gets his knowledge from a vector database that comes shipped with this release! NO MORE USER GUIDE TAB - just ASK JEEVES!
- IMPORTANT: After running `setup_windows.py` you must go into the `Assets` folder, right-click on `koboldcpp_nocuda.exe`, and check the "Unblock" checkbox first! If the checkbox isn't there, try starting Jeeves and see if it works. Since Ask Jeeves is a new feature, create a Github Issue if it doesn't work.
- IMPORTANT: You may also need to disable, or create an exception for, any firewall you have. Submit a Github Issue if you encounter any problems.
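For background, the "Unblock" checkbox simply deletes the `Zone.Identifier` alternate data stream that Windows attaches to downloaded files (the "mark of the web"). Purely as an illustrative sketch (not part of this program), the same thing can be done from Python:

```python
import os

def unblock(path: str) -> bool:
    """Remove the NTFS Zone.Identifier alternate data stream, which is what
    the Windows "Unblock" checkbox deletes. Returns True if the stream
    existed and was removed, False otherwise."""
    ads = path + ":Zone.Identifier"  # alternate data streams are addressed with a colon
    try:
        os.remove(ads)
        return True
    except OSError:  # stream absent, non-NTFS filesystem, or non-Windows OS
        return False

if __name__ == "__main__":
    print(unblock(r"Assets\koboldcpp_nocuda.exe"))
```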
Scrape Python Library Documentation
![](https://private-user-images.githubusercontent.com/108230321/376091596-ab1848f4-e729-45cc-b3a0-e1342b9ba754.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzc2MDkxNTk2LWFiMTg0OGY0LWU3MjktNDVjYy1iM2EwLWUxMzQyYjliYTc1NC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04MjhlN2QxODIyYzEyZGY5NDRiMThlN2E2OWQzZWI5MzA3M2ZlNGMzZWE1MzgzZTBmM2FkMmUyODAwMmNiMjU3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.mJ2f4r4ppnVlnOO2XRja2ibpC-mGtv9nMGcNtJkA-dc)
- In the Tools Tab, simply select a python library, click `Scrape`, and all the `.html` files will be downloaded to the `Scraped_Documentation` folder.
- Create a vector database out of all of the `.html` files for a given library, then use one of the coding-specific models to answer questions!
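Turning `.html` files into database entries means extracting their visible text before chunking and embedding. A minimal stdlib sketch of that extraction step (the program's actual parsing logic may differ):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```

Each extracted document can then be chunked and embedded like any other text file.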
Huggingface Access Token
![](https://private-user-images.githubusercontent.com/108230321/376092818-cfea0187-f83c-4f85-88ea-4782529d825d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzc2MDkyODE4LWNmZWEwMTg3LWY4M2MtNGY4NS04OGVhLTQ3ODI1MjlkODI1ZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00MDE2NTM3Y2Y2ZTQwZmM3NDE4MzI1OTY2YzYxMjkyNzJhZmEzNzhiN2IyMDYwYTU3ODM2NmU3MDcyOTc0ZGY2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.B1xCsH5jM_2UavNBhgKHrZUW3Ah60rl_yckH_aw_WsM)
- You can now enter an "access token" and access models that are "gated" on huggingface. Currently, `llama 3.2 - 3b` and `mistral-small - 22b` are the only gated models.
- Ask Jeeves how to get a huggingface access token.
Other Improvements
- The vector models are now downloaded using the `snapshot_download` functionality from `huggingface_hub`, which can exclude unnecessary files such as `onnx`, `.bin` (when an equivalent `.safetensors` version is available), and others. This significantly reduces the amount of data that this program downloads, which increases speed and usability.
- This speedup should pertain to vector, chat, and whisper models; implementing `snapshot_download` for TTS models is planned.
- New `Compare GPUs` button in the Tools Tab, which displays metrics for various GPUs so you can better determine your settings. Charts and graphs for chat/vision models will be added in the near future.
- New metrics bar with speedometer-style widgets.
- Removed the User Guide Tab altogether to free up space. You can now simply Ask Jeeves instead.
- Lots and lots of refactoring to improve various things...
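As a rough sketch of the `snapshot_download` exclusion idea (the exact glob patterns this program uses may differ, and a real implementation should first confirm the repo actually ships `.safetensors` before skipping `.bin`):

```python
def build_ignore_patterns(prefer_safetensors: bool = True) -> list:
    """Glob patterns for files snapshot_download should skip entirely."""
    patterns = ["onnx/*", "*.onnx", "*.msgpack", "*.h5"]
    if prefer_safetensors:
        patterns.append("*.bin")  # skip pickled weights when .safetensors exist
    return patterns

if __name__ == "__main__":
    # Requires: pip install huggingface_hub (downloads over the network)
    from huggingface_hub import snapshot_download
    path = snapshot_download(
        repo_id="BAAI/bge-small-en-v1.5",  # example vector model repo
        ignore_patterns=build_ignore_patterns(),
    )
    print(path)
```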
Added/Removed Chat Models
- Added `Qwen 2.5 - 1.5b`, `Llama 3.2 - 3b`, `Internlm 2.5 - 1.8b`, `Dolphin-Llama 3.1 - 8b`, and `Mistral-Small - 22b`.
- Removed `Longwriter Llama 3.1 - 8b`, `Longwriter GLM4 - 9b`, `Yi - 9b`, and `Solar Pro Preview - 22.1b`.
Added/Removed Vision Models
- Removed `Llava 1.5`, `Bakllava`, `Falcon-vlm - 11b`, and `Phi-3-Vision` models as either under-performing or eclipsed by pre-existing models that have additional benefits.
Roadmap
- Add `Kobold` as a backend in addition to `LM Studio` and `Local Models`, at which point I'll probably have to rename this github repo.
- Add `OpenAI` backend.
- Remove LM Studio Server settings and revise the instructions, since LM Studio has changed significantly since they were last written.
Full Changelog: v6.8.2...v6.9.0
v6.8.2 - quality focus
Due to the growing number of chat and vector models with larger contexts, this mini-release focused on extensive testing at longer contexts. From this point forward, 4k chat models will only be included if they're exceptional, and 8k++ models only if they're high quality and/or offer unique characteristics (e.g. focused on coding).
Added Models (all 8k++ context):
- `LongWriter Llama 3.1 - 8b` - Exceptional at long responses when an unusually large amount of context is thrown at it.
- `Yi - 9b` - Long-context Yi 9b, replacing Dolphin-Yi 9b, which was under-performing at long context.
- `Solar Pro Preview - 22.1b` - Exceptional 4k model whose 8k parent model is coming out in a few months; replaces Solar 10.7b.
Removed Models:
- `Danube 3 - 4b` - under-performing at long context
- `Dolphin-Qwen 2 - 1.5b` - under-performing at long context
- `Orca 2 - 7b` - superseded
- `Neural-Chat - 7b` - superseded
- `Dolphin-Llama 3.1 - 8b` - superseded
- `Hermes-3-Llama-3.1 - 8b` - superseded
- `Dolphin-Yi 1.5 - 9b` - redundant
- `Dolphin-Qwen 2 - 7b` - superseded
- `Dolphin-Phi 3 - Medium` - too difficult to work with and superseded
- `Llama 2 - 13b` - superseded
- `Dolphin-Mistral-Nemo - 12b` - too difficult to work with and superseded
- `SOLAR - 10.7b` - superseded
See Release 6.8 for full release notes, including how to upgrade old databases.
Current Chat Models:
![chart_chat](https://private-user-images.githubusercontent.com/108230321/367541047-78396cf8-2e1f-4002-aaf3-60c6ff1da3d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzY3NTQxMDQ3LTc4Mzk2Y2Y4LTJlMWYtNDAwMi1hYWYzLTYwYzZmZjFkYTNkOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04NTlmNGIyNjFiZDYwNTU1NzQwZWVhZmYzN2I2ZWZhZjQ0YWE0MWY2YWFlNDJjM2I4OWNhNGQ3OTU0MTU3ZWI5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.bb2VJXEuktqvDlsZtMZyi8LF2mx_BzshSzABleemxXo)
Current Vision Models
![chart_vision](https://private-user-images.githubusercontent.com/108230321/367541076-8532e873-5be6-46b5-b162-c2e29788eb50.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzY3NTQxMDc2LTg1MzJlODczLTViZTYtNDZiNS1iMTYyLWMyZTI5Nzg4ZWI1MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iMjFlMmNkY2Y3YzdiOTk4OTNlNTc1MmU5NjczZWRkMjY3YjQ0MjhlNGQ2YmQ1NTlhZWQ5N2ZhOWJmZjM0NDk1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.zlcn8LlYZuhIMcVjejYgwEk9jHI4v4RmzE02h6XOBFg)
Current TTS Models
- Does not include Google's, which runs online and hence uses no GPU or VRAM.
v6.8.1 - coding models!
Fix
- Added a single missing dependency from the last release.
See Release 6.8.0 for full notes, including how to update databases.
v6.8.0 - coding models!
Breaking Changes
Within the `Manage Databases` tab, what's displayed is no longer derived from parsing multiple JSON files. Rather, `sqlite3` is used for much, much faster response/latency.
As such, databases created prior to this release will not function properly. To migrate old databases rather than creating them anew, use the `create_sqlite3.py` script attached to this release as follows:
- Run the script and select the folder containing the old JSON files. The folder location is as follows:
- Go into the `Vector_DB` folder.
- Each folder within `Vector_DB` constitutes a database.
- Within each folder constituting a database there is a folder named `json`.
- Run the `create_sqlite3.py` script, selecting the `json` folder for each of your database folders.
- A new file named `metadata.db` should be created in each database folder.
- It's now safe to remove the `json` folder altogether.
- Update the backup folder.
  - The `Vector_DB_Backup` folder contains a mirror image of the `Vector_DB` folder - this is the "backup" folder.
  - Delete all the contents within the `Vector_DB_Backup` folder and copy all the contents of the `Vector_DB` folder into it.
- Start the program and it should run as usual.
These steps are unnecessary if you're creating a vector database for the first time using Release v6.8.0, obviously.
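For the curious, the migration boils down to folding each `json` folder's files into a single `metadata.db`. This is a simplified stdlib sketch of the idea; the real `create_sqlite3.py` attached to the release defines its own schema, so treat the table layout here as illustrative:

```python
import json
import sqlite3
from pathlib import Path

def migrate_json_folder(json_dir: str, db_path: str) -> int:
    """Store every *.json file from json_dir as a row in an SQLite table.
    Returns the number of records migrated."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (file_name TEXT PRIMARY KEY, payload TEXT)"
    )
    count = 0
    for fp in sorted(Path(json_dir).glob("*.json")):
        # Round-trip through json to validate the file before storing it.
        payload = json.dumps(json.loads(fp.read_text(encoding="utf-8")))
        conn.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?)", (fp.name, payload)
        )
        count += 1
    conn.commit()
    conn.close()
    return count
```

One `metadata.db` per database folder replaces the many small JSON reads with a single indexed lookup, which is where the latency win comes from.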
New Chat Models for Coding Questions
- Deepseek-Coder-V2 - 16b (best)
- Yi-Coder - 9b (second best)
- CodeQwen1.5 - 7b (third, but still good)
Benchmarks will be forthcoming, but their metrics are comparable to similarly-sized models.
Misc.
- Now using a `distil-whisper` model for the voice recorder for an approximate 2x speedup.
- Added buttons to back up all databases at once or restore all backups at once (Tools Tab).
v6.7.1 - patch update
Patch
- Fixed Florence models not showing up as an option for vision models when a GPU was detected.
See v6.7.0 for all other release notes:
https://github.com/BBC-Esq/VectorDB-Plugin-for-LM-Studio/releases/tag/V6.7.0
v6.7.0 - LONG CONTEXT no see!
General Updates
- CITATIONS! with hyperlinks when searching the Vector DB and getting a response.
- Display of a chat model's max context and how many tokens you've used.
2X Speed Increase
Choose "half" in the database creation settings. It will automatically choose `bfloat16` or `float16` based on your GPU.
This results in a 2x speed increase with extremely low loss in quality.
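The selection amounts to preferring `bfloat16` on GPUs that support it and falling back to `float16` otherwise. A sketch of that decision, with the hardware check factored out so the logic is visible without a GPU present (function name is illustrative):

```python
def pick_half_dtype(bf16_supported: bool) -> str:
    """Return the half-precision dtype name to use for database creation."""
    return "bfloat16" if bf16_supported else "float16"

if __name__ == "__main__":
    # With PyTorch installed, the capability check itself would be:
    # import torch
    # dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    print(pick_half_dtype(True))
```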
Chat Models
Removed `Internlm2_5 - 1.8b` and `Qwen 1.5 - 1.6b` as under-performing.
Removed `Dolphin-Llama 3 - 8b` and `Internlm2 - 20b` as superseded.
Added `Danube 3 - 4b` with 8k context.
Added `Phi 3.5 Mini - 4b` with 8k context.
Added `Hermes-3-Llama-3.1 - 8b` with 8k context.
Added `Internlm2_5 - 20b` with 8k context.
The following models now have 8192 context:
Model Name | Parameters (billion) | Context Length |
---|---|---|
Danube 3 - 4b | 4 | 8192 |
Dolphin-Qwen 2 - 1.5b | 1.5 | 8192 |
Phi 3.5 Mini - 4b | 4 | 8192 |
Internlm2_5 - 7b | 7 | 8192 |
Dolphin-Llama 3.1 - 8b | 8 | 8192 |
Hermes-3-Llama-3.1 - 8b | 8 | 8192 |
Dolphin-Qwen 2 - 7b | 7 | 8192 |
Dolphin-Mistral-Nemo - 12b | 12 | 8192 |
Internlm2_5 - 20b | 20 | 8192 |
Text to Speech Models
- Excited to add additional models to choose from when using `whisperspeech` as the text-to-speech backend - see the chart below for the various `s2a` and `t2s` model combinations and "relative" compute times, along with real VRAM usage stats.
![chart_tts](https://private-user-images.githubusercontent.com/108230321/360691630-0fc9122a-a223-4ca6-be27-5b349ae31dd4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzYwNjkxNjMwLTBmYzkxMjJhLWEyMjMtNGNhNi1iZTI3LTViMzQ5YWUzMWRkNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yN2EyMTQ2MGFmNDAwOTc1YzdkNTI3Y2JkOTkxZWYwYTNjYWJjYmYwYzMzNDQyNDgxMWI2MWM0Y2Q2ZDE5OWU1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.8BE5rqGmUeRdaeI2K6x9R4aMEkV2tl2idbrapEKyVZ0)
Current Chat and Vision Models
![chart_chat](https://private-user-images.githubusercontent.com/108230321/360691925-8094e2dd-d00f-4de2-8e44-ee82c6aefa01.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzYwNjkxOTI1LTgwOTRlMmRkLWQwMGYtNGRlMi04ZTQ0LWVlODJjNmFlZmEwMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lNjBhMDAwOGU0YTNkYWFkYWZjOGEwNDIxY2MwYzZlNjk5Y2Y5ZDJmZTkyZTQ5ZjBmMDQ0OTc1ZjdmN2VkMDA1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.jZvsg-gilOlsXoqztPcVogAKpDVHiEE0GfePSDhswy0)
![chart_vision](https://private-user-images.githubusercontent.com/108230321/360691945-20e5a819-fe40-48fb-8174-00c98923255a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzYwNjkxOTQ1LTIwZTVhODE5LWZlNDAtNDhmYi04MTc0LTAwYzk4OTIzMjU1YS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yY2I0YmFhMDU2ZmE0MDg2YjUyODY3YWE5YzI4OGM2OWUwMGFiNmEwZDJhNzVlNDI5NmViMDUzMjVhODRiNDEwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.l46orSGRCjBiIxIPkpXVLWUDNfYV2QD3C8GVCcYnxJw)
v6.6.0 - 8192 CONTEXT!
General Updates
- Ensured that vector model pulldown menu auto-updates.
- Made the vector model pulldown menu more descriptive.
Local Models
- Added `Internlm v 2.5 1.8b`. In the last release, version 2.0 of Internlm's 1.8b model was removed. However, the quality increased noticeably with version 2.5, so I'm re-adding it.
Vector Models
- Excited to add `Alibaba-NLP/gte-base-en-v1.5` and `Alibaba-NLP/gte-large-en-v1.5`. These vector models have a context limit of 8192, which is automatically set within the program. With a conservative estimate of 3 characters per token, that means you can set the chunk size to approximately `24,576`!!
- Removed `Stella` as it was under-performing and too difficult to work with. There is no love lost, since the prior release marked it as "experimental" anyway.
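The 24,576 figure is just the 8192-token limit multiplied by the conservative 3-characters-per-token estimate:

```python
def max_chunk_chars(context_tokens: int, chars_per_token: int = 3) -> int:
    """Largest chunk size (in characters) that safely fits a vector model's context."""
    return context_tokens * chars_per_token

assert max_chunk_chars(8192) == 24576
```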
Current Chat and Vision Models
![chart_chat](https://private-user-images.githubusercontent.com/108230321/356916676-c08980c1-c5dc-4efc-b6d5-5ce997c76352.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzU2OTE2Njc2LWMwODk4MGMxLWM1ZGMtNGVmYy1iNmQ1LTVjZTk5N2M3NjM1Mi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05NzE3OWZmNzU3ZDRkODRhZjc5YWMzOGVlOTkxNGIwOTQ2NDc4NTEzZGZmYjlhZTk5ZWEwMDI5NDI4ZTlmOWYyJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9._agnOcLmMXapWp5Dn2SugIns2Z3y0ObA42yBSTIXjxA)
![chart_vision](https://private-user-images.githubusercontent.com/108230321/356916682-e0047cfc-1114-4689-87c0-0885b4c748e0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzU2OTE2NjgyLWUwMDQ3Y2ZjLTExMTQtNDY4OS04N2MwLTA4ODViNGM3NDhlMC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02ODE5OGEwYjM4NjMwYmQ2MjBjNDIxYjY0NTUyZWIyZjU0ODYxZjkwNzQyZTkyMjljMTFiYmRkZTQwNmIzYzU0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.ClIGxumEW6_V_s1IXNzMevk1msI7gGhUZL7q9w6lqLs)
v6.5.0 - Llama 3.1 & MiniCPM v2
General updates
- Removed the `triton` dependency, as the `cogvlm` vision model is also removed.
- Redid all benchmarks with more-accurate parameters.
Local Models
Overall, the large number of chat models was becoming unnecessary or redundant. Therefore, I removed models that weren't providing optimal responses to simplify the user's experience, and added `Llama 3.1`.
Removed Models
- `Qwen 2 - 0.5b`
- `Qwen 1.5 - 0.5b`
- `Qwen 2 - 1.5b`
- `Qwen 2 - 7b` - Redundant with `Dolphin Qwen 2 - 7b`
- `Yi 1.5 - 6b`
- `Stablelm2 - 12b`
- `Llama 3 - 8b` - Redundant with `Dolphin Llama 3 - 8b`
Added Models
- `Dolphin Llama 3.1 - 8b`
Vision Models
Overall, two vision models were removed as unnecessary, and `MiniCPM-V-2_6 - 8b` was added. As of the date of this release, `MiniCPM-V-2_6 - 8b` is the best model in terms of quality. I currently recommend using this model if you have the time and VRAM.
Removed Models
cogvlm
MiniCPM-Llama3
Vector Models
- Added `Stella_en_1.5B_v5`, which ranks very high on the leaderboard.
  - Note: this is a work in progress, as the results currently seem sub-optimal.
Current Chat and Vision Models
![chart_chat](https://private-user-images.githubusercontent.com/108230321/355933287-111a1179-e85e-4135-b386-df537dd5f2bb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzU1OTMzMjg3LTExMWExMTc5LWU4NWUtNDEzNS1iMzg2LWRmNTM3ZGQ1ZjJiYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01NzA5NTlkZmZkOGM3ZDM5ODQ5ZTY4MzI1NDZlYmI5NjU5YTQ0MzczNGZkYmNlODY1ZTcyOTAyOTcyMjg3ZjVkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.uxQC6zNvaWeZJQPcF3p4HEf4Nf83_EesulTRHRJUkGM)
![chart_vision](https://private-user-images.githubusercontent.com/108230321/355933679-cd06591f-ddea-4bb9-a777-767ac50a5490.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzU1OTMzNjc5LWNkMDY1OTFmLWRkZWEtNGJiOS1hNzc3LTc2N2FjNTBhNTQ5MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hZWMzYTNlYmZmZDhjN2JjMDJmNTYyYzU0OTc2YTJjZTM1NjM4M2E1MmRmYmFlMjI3ZGE5MjRhYWI1YjFlMGIzJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9._mdXhwFpq-7xo82xVKTm--_qwyBTrFN90ykvzz85tCo)
v6.4 - stream responses
Improvements
- All "local models" now stream their responses for a better user experience.
- Various small improvements.
Local Models
- Fixed `Dolphin Phi3-Medium`.
- Added `Yi 1.5 - 6b`.
- Added `H2O Danube3 - 4b` - Great quality small model.
- Removed `Mistral v.03 - 7b` - The model is gated, so it's difficult to implement in a program. Plus, there is a plethora of other good models.
- Removed `Llama 3.1 - 8b` - Same as with Mistral.
- Added `Internlm 2.5 - 7b`.
- Fixed `Dolphin-Mistral-Nemo`.
Vision Models
- Added `Falcon-vlm - 11b` - Great quality. Uses Llava 1.6's processor.

`Falcon-vlm`, `Llava 1.6 Vicuna - 7b`, and `Llava 1.6 Vicuna - 13b` have arguably surpassed `Cogvlm` and are faster while using less VRAM. Thus, `Cogvlm` may be deprecated in the future.
Misc.
- Most, but not all, models should now download to the `Models` folder so you can take your folder with you. FYI, ensuring that all models do so is a work in progress; the goal is to carry all of the necessary files plus the program on a flash drive.
Current Chat and Vision Models
![chart_chat](https://private-user-images.githubusercontent.com/108230321/354835318-ba3b46d7-7925-4ec8-a32c-da447f0d0297.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzU0ODM1MzE4LWJhM2I0NmQ3LTc5MjUtNGVjOC1hMzJjLWRhNDQ3ZjBkMDI5Ny5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02MTVhOWI2ZDFkMjUxZDJjNTc5Y2UyOTdiMjM5ZWVjMjZjMjE2NzFkZjFkZTM0ZDI5OTFmMDlmNjJiODllMmZkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.tSb6C6zxltFlB-7pW65rVOKD4kGQNZqtWcS6uHTSpNg)
![chart_vision](https://private-user-images.githubusercontent.com/108230321/354835321-8e843e50-06a5-4aa1-b0f4-5a73f89ca278.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTM1NTYsIm5iZiI6MTczOTAxMzI1NiwicGF0aCI6Ii8xMDgyMzAzMjEvMzU0ODM1MzIxLThlODQzZTUwLTA2YTUtNGFhMS1iMGY0LTVhNzNmODljYTI3OC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQxMTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00MjA1ZGE5ZDlhZWIwNGIwZTZlMDgwMWU2YTIzNDExNGYxMjBhNjljYWFkZjI1MGI4Mzg0M2UzNzc3MzQzZjNlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.yaxy-9LrVKGTeT5sO4n8Tga-GL4ghHg6OhnM2LOtshA)
6.3.0 - whisper upgrade
NOTE
This release has been deleted a few times because of errors, but this one should work now.
Updates:
- Added the large-v3 whisper model and removed large-v2.
- Added all three distil whisper model sizes.
- Ensured that all whisper model files are downloaded to the `Models/whisper` folder in the source code folder.
- Added error handling in the metrics bar for if/when the numbers go over 100% - e.g. a model overflows the VRAM.
- Modified `gui.py` to specify the multiprocessing start method earlier in the script to avoid some errors.
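That kind of over-100% guard reduces to a clamp; a minimal sketch of the sort of handling described (names are illustrative, not the program's actual code):

```python
def clamp_percent(value: float) -> float:
    """Keep a metrics-bar reading within 0-100, e.g. when a model overflows
    VRAM and the reported usage exceeds 100%."""
    return max(0.0, min(100.0, value))
```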