An experimental library for calling LLM APIs from Kotlin.
Everything is bound to change.
| Feature | Status |
|---|---|
| Raw prompting | ✅ |
| Chat prompting | ✅ |
| Chat templates (with DSL) | ✅ |
| Result streaming | ✅ |
| Generation flags | ✅ |
| Message flags | ✅ |
| Images in prompts | ✅ (only JVM + JS currently) |
| Grammars (with DSL) | ✅ |
| Function/tool calling | ✅ (using the Functions submodule) |
| JSON schemas | ✅ (for non-recursive schemas) |
| JSON schema to grammar converter | ✅ |
| JSON schema fixer (ensures a JSON schema can be streamed) | ✅ |
... More coming soon!
To implement your own API, make a class that extends `API<S : Settings, F : Flags<*>>`. `Settings` is an object provided to the API that can be used to store an endpoint, an API key, or similar. In raw-only APIs, the settings object can also be used to store a template, allowing chats to be formatted. A rough sketch is shown below.
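A hypothetical sketch of what a custom backend could look like. The names `MyApiSettings`, `MyApiFlags`, and `GenerationResult`, the constructor shape, and the `rawGen` signature are all assumptions inferred from the examples below, so check the real `API` base class before copying:

```kotlin
// Hypothetical sketch: the real base class defines which members to override;
// the names, constructor shape, and result types here are assumptions.
class MyApiSettings(val endpoint: String = "http://localhost:8080") : Settings

class MyApiFlags : Flags<MyApiFlags> {
    var max_length: Int? = null
}

class MyApi(settings: MyApiSettings = MyApiSettings()) :
    API<MyApiSettings, MyApiFlags>(settings) {

    // rawGen is assumed here based on the usage examples below
    override suspend fun rawGen(prompt: String, flags: MyApiFlags): Result<GenerationResult> {
        TODO("Call settings.endpoint and wrap the response in a GenerationResult")
    }
}
```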
- Function calling (+ tools)
  - Grammar-based ✅
  - Tools (API built-in) ✅ (TODO: support proper tool returns using call IDs)

See the submodules.
```kotlin
// Init KoboldCPP with the default localhost endpoint
val api = KoboldCPP()

// Init KoboldCPP with a custom endpoint (localhost with port 5002 instead of the default 5001)
val api = KoboldCPP(KoboldCPPSettings("http://localhost:5002"))

// Init KoboldCPP with a Llama 3 chat template
// (the original KoboldCPP object stays available through api.getWrapped())
val api = KoboldCPP(KoboldCPPSettings()).toChat(Llama3Template())
```
```kotlin
val start = "This is a short story about a"

// buildFlags creates a flags object for whichever API you're using (here, KoboldCPP).
// Some APIs might have flags that others don't.
val flags = api.buildFlags {
    max_length = 50
}

// Assuming we're already in a suspend context
val result = api.rawGen(start, flags).getOrThrow()
print(result.getText())
```
```kotlin
val start = "This is a short story about a"

// Some APIs might have flags which others don't.
// If you need a method to support any API, use runIfImpl<Type> { }.
val flags = api.buildFlags {
    max_length = 50
    stream = true
}

// Assuming we're already in a suspend context
// Either the concrete type:
val result = api.rawGen(start, flags).getOrThrow() as KoboldCPPGenerationResultsStreamed
// Or the more generic:
val result = api.rawGen(start, flags).getOrThrow() as StreamedGenerationResult<*>

// Now we can listen on the stream
result.registerStreamer {
    val chunk = it.getOrThrow()        // Each chunk arrives wrapped in a Result
    print(chunk.getToken())            // Print the new token
    System.out.flush()                 // Show new tokens even before a newline (print doesn't flush)
    if (chunk.isLastToken()) println() // End of the stream
}
```
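If you also want the full text once the stream ends, you can accumulate the tokens yourself. This sketch uses only the accessors shown above:

```kotlin
val sb = StringBuilder()
result.registerStreamer {
    val chunk = it.getOrThrow()
    sb.append(chunk.getToken()) // Collect each token as it arrives
    if (chunk.isLastToken()) {
        // sb now holds the complete generation
        println(sb.toString())
    }
}
```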
Chat generation requires APIs which implement the `ChatGen` interface. Chat support can be added to any `RawGen` API using a template, with `api.toChat(template)`. Custom chat templates can be created by extending `ChatTemplate` or by using `ChatTemplateDSL`.
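As a purely illustrative example, a ChatML-style template could be expressed along these lines; `formatMessage` and `stopStrings` are invented names, not the library's real `ChatTemplate` members, so consult the actual interface and `ChatTemplateDSL` first:

```kotlin
// Hypothetical sketch: the member names below are invented for illustration;
// the real ChatTemplate / ChatTemplateDSL surface will differ.
class ChatMLTemplate : ChatTemplate() {
    // ChatML wraps each message as <|im_start|>role\ncontent<|im_end|>
    fun formatMessage(role: String, content: String): String =
        "<|im_start|>$role\n$content<|im_end|>\n"

    // Templates typically also declare the stop string that ends a turn
    val stopStrings: List<String> = listOf("<|im_end|>")
}
```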
```kotlin
// Universal syntax for creating chat messages from any chat API
val exampleChat = api.createChat {
    createMessage {
        content = "You are a helpful AI assistant."
        role = ChatGen.ChatRole.System
    }
    createMessage {
        content = "Who are you?"
        role = ChatGen.ChatRole.User
    }
}

val flags = api.buildFlags {
    max_length = 50
}

// Assuming we're already in a suspend context
val result = api.chatGen(exampleChat, flags).getOrThrow()
```
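As with raw generation, the generated text should then be readable from the result (assuming the chat result exposes the same `getText()` accessor as the raw example above):

```kotlin
print(result.getText())
```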