Plugins are the units that TaskWeaver orchestrates. One can view them as tools that the LLM uses to accomplish certain tasks.
In TaskWeaver, each plugin is represented as a Python function that can be called within a code snippet. Orchestration is essentially the process of generating Python code snippets that call one or more plugins. A concrete example is pulling data from a database and applying anomaly detection to it. The generated code (simplified) looks like the following:
df, data_description = sql_pull_data(query="pull data from time_series table")
anomaly_df, anomaly_description = anomaly_detection(df, time_col_name="ts", value_col_name="val")
A plugin has two files:
- Plugin Implementation: a Python file that defines the plugin
- Plugin Schema: a YAML file that defines the schema of the plugin
The plugin function must be implemented in Python. So that TaskWeaver can orchestrate it, a plugin Python file consists of two parts:
- Plugin function implementation code
- TaskWeaver plugin decorator
The following code shows an example implementation of the anomaly detection plugin:
import pandas as pd
from pandas.api.types import is_numeric_dtype

from taskweaver.plugin import Plugin, register_plugin


@register_plugin
class AnomalyDetectionPlugin(Plugin):
    def __call__(self, df: pd.DataFrame, time_col_name: str, value_col_name: str):
        """
        anomaly_detection function identifies anomalies from an input dataframe of time series.
        It will add a new column "Is_Anomaly", where each entry will be marked with "True" if the value is an anomaly
        or "False" otherwise.

        :param df: the input data, must be a dataframe
        :param time_col_name: name of the column that contains the datetime
        :param value_col_name: name of the column that contains the numeric values.
        :return df: a new df that adds an additional "Is_Anomaly" column based on the input df.
        :return description: the description about the anomaly detection results.
        """
        try:
            df[time_col_name] = pd.to_datetime(df[time_col_name])
        except Exception:
            print("Time column is not datetime")
            return

        if not is_numeric_dtype(df[value_col_name]):
            try:
                df[value_col_name] = df[value_col_name].astype(float)
            except ValueError:
                print("Value column is not numeric")
                return

        # 3-sigma rule: flag values more than three standard deviations from the mean
        mean, std = df[value_col_name].mean(), df[value_col_name].std()
        cutoff = std * 3
        lower, upper = mean - cutoff, mean + cutoff
        df["Is_Anomaly"] = df[value_col_name].apply(lambda x: x < lower or x > upper)
        anomaly_count = df["Is_Anomaly"].sum()
        description = "There are {} anomalies in the time series data".format(anomaly_count)

        self.ctx.add_artifact(
            name="anomaly_detection_results",  # a brief description of the artifact
            file_name="anomaly_detection_results.csv",  # artifact file name
            type="df",  # artifact data type, support chart/df/file/txt/svg
            val=df,  # variable to be dumped
        )

        return df, description
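The 3-sigma rule at the core of this plugin can be tried without TaskWeaver or pandas. The following is a minimal sketch using only the Python standard library; the function name `three_sigma_flags` and the sample readings are hypothetical and not part of TaskWeaver. It uses the sample standard deviation, which is also what the pandas `std()` call in the plugin computes by default.

```python
from statistics import mean, stdev


def three_sigma_flags(values):
    """Return (flags, description), mirroring the plugin's return convention."""
    mu = mean(values)
    cutoff = 3 * stdev(values)  # sample standard deviation (ddof=1)
    lower, upper = mu - cutoff, mu + cutoff
    flags = [v < lower or v > upper for v in values]
    description = "There are {} anomalies in the time series data".format(sum(flags))
    return flags, description


# Twenty ordinary readings followed by one spike (made-up data).
readings = [10.0] * 20 + [100.0]
flags, description = three_sigma_flags(readings)
```

One property worth keeping in mind: with fewer than 11 data points, no single value can lie more than three sample standard deviations from the mean, so this rule only becomes useful on reasonably long series.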
To implement your own plugin, go through the following steps:
- Import the TaskWeaver plugin base class and decorator:
  from taskweaver.plugin import Plugin, register_plugin
- Create your plugin class, inheriting from the Plugin parent class (e.g., AnomalyDetectionPlugin(Plugin)) and decorated with @register_plugin.
- Implement your plugin function in the __call__ method of the plugin class. Most importantly, it is mandatory to include descriptions of your execution results in the return values of your plugin function. These descriptions can be utilized by the LLM to effectively summarize your execution results.
💡 A key difference between a plugin implementation and a normal Python function is that the plugin always returns a description of the result in natural language. As LLMs only understand natural language, it is important to let the model understand what the execution result is. In the example implementation above, the description says how many anomalies were detected. Behind the scenes, only the description is passed to the LLM. In contrast, the execution result (e.g., df in the above example) is not handled by the LLM.
- If the functionality of your plugin depends on additional libraries or packages, it is essential to ensure that they are installed before proceeding.
- If you wish to persist intermediate results, such as data, figures, or prompts, in your plugin implementation, TaskWeaver provides an add_artifact API that allows you to store these results in the workspace. In the example we provide, after performing anomaly detection you can use the add_artifact API to save the results as a CSV file artifact. The artifacts are stored in the project/workspace/session_id/cwd folder in the project directory.
  self.ctx.add_artifact(
      name="anomaly_detection_results",  # a brief description of the artifact
      file_name="anomaly_detection_results.csv",  # artifact file name
      type="df",  # artifact data type, support chart/df/file/txt/svg
      val=df,  # variable to be dumped
  )
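The steps above can be sketched end to end. So that the block runs without TaskWeaver installed, it defines stand-ins for `Plugin`, `register_plugin`, and the context object. These stand-ins, as well as the `RowCountPlugin` example and its artifact names, are assumptions for illustration only and do not reproduce TaskWeaver's real classes. In an actual plugin file you would import `Plugin` and `register_plugin` from the TaskWeaver package instead.

```python
# Stand-ins so the sketch runs without TaskWeaver (hypothetical, not the real classes).
class FakeContext:
    def __init__(self):
        self.artifacts = []

    def add_artifact(self, name, file_name, type, val):
        # The real API persists the value in the session workspace;
        # this stand-in only records that the call happened.
        self.artifacts.append({"name": name, "file_name": file_name, "type": type})


class Plugin:
    def __init__(self, ctx):
        self.ctx = ctx


def register_plugin(cls):
    # The real decorator registers the class with TaskWeaver;
    # this stand-in returns the class unchanged.
    return cls


@register_plugin
class RowCountPlugin(Plugin):
    def __call__(self, rows):
        count = len(rows)
        # Persist an intermediate result as an artifact.
        self.ctx.add_artifact(
            name="row_count",
            file_name="row_count.txt",
            type="txt",
            val=str(count),
        )
        # Always return a natural-language description alongside the result.
        description = "The input contains {} rows".format(count)
        return count, description


plugin = RowCountPlugin(FakeContext())
count, description = plugin(rows=[{"val": 1}, {"val": 2}, {"val": 3}])
```

The point of the pattern is the last line of `__call__`: the raw result goes back to the generated code, while the description is the part the LLM gets to read.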
The plugin schema is composed of several parts:
- name: The main function name of the Python code.
- enabled: Determines whether the plugin is enabled for selection during conversations. The default value is true.
- description: A brief description that introduces the plugin function.
- parameters: This section lists all the input parameter information. It includes the parameter's name, type, whether it is required or optional, and a description providing more details about the parameter.
- returns: This section lists all the return value information. It includes the return value's name, type, and description that provides information about the value that is returned by the function.
Note: Adding any extra fields causes validation of the plugin schema to fail.
The plugin schema must be written in YAML format. Here is the plugin schema for the anomaly detection plugin above:
name: anomaly_detection
enabled: true
required: false
description: >-
  anomaly_detection function identifies anomalies from an input DataFrame of
  time series. It will add a new column "Is_Anomaly", where each entry will be
  marked with "True" if the value is an anomaly or "False" otherwise.

parameters:
  - name: df
    type: DataFrame
    required: true
    description: >-
      the input data from which we can identify the anomalies with the 3-sigma
      algorithm.
  - name: time_col_name
    type: str
    required: true
    description: name of the column that contains the datetime
  - name: value_col_name
    type: str
    required: true
    description: name of the column that contains the numeric values.

returns:
  - name: df
    type: DataFrame
    description: >-
      This DataFrame extends the input DataFrame with a newly-added column
      "Is_Anomaly" containing the anomaly detection result.
  - name: description
    type: str
    description: This is a string describing the anomaly detection results.
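Since the schema's parameters describe the arguments of `__call__`, the two are easiest to maintain when they agree. The sketch below compares them using `inspect.signature` from the standard library. The helper `schema_matches_signature` and the stub class are hypothetical, the schema is written as a Python dict rather than loaded from YAML, and nothing here is meant to imply that TaskWeaver itself performs this check.

```python
import inspect


def schema_matches_signature(schema, func):
    """Compare schema parameter names with the function's argument names (ignoring self)."""
    schema_names = [p["name"] for p in schema["parameters"]]
    func_names = [n for n in inspect.signature(func).parameters if n != "self"]
    return schema_names == func_names


# The parameters section of the anomaly_detection schema, as a dict.
schema = {
    "name": "anomaly_detection",
    "parameters": [
        {"name": "df", "type": "DataFrame", "required": True},
        {"name": "time_col_name", "type": "str", "required": True},
        {"name": "value_col_name", "type": "str", "required": True},
    ],
}


class AnomalyDetectionStub:
    # Same signature as the plugin's __call__, without the implementation.
    def __call__(self, df, time_col_name, value_col_name):
        raise NotImplementedError


ok = schema_matches_signature(schema, AnomalyDetectionStub.__call__)

# A function whose second argument was renamed no longer matches the schema.
renamed = schema_matches_signature(
    schema, lambda self, df, ts, value_col_name: None
)
```

A check like this could run in a unit test next to the plugin, so that a renamed argument is caught before the LLM generates a call with the old name.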
In addition, the schema supports two optional fields:
- code: In cases where multiple plugins map to the same Python code (i.e., the plugin name is different from the code name), it is essential to specify the code name (code file) in the plugin schema to ensure clarity and accuracy.
- configurations: When using common code that requires some configuration parameter modifications for different plugins, it is important to specify these configuration parameters in the plugin schema.
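To make the relationship between these two fields concrete, here is a small sketch with the schemas written as Python dicts. Everything in it is an assumption for illustration: the plugin names, the `sigma` setting, and in particular the fallback rule in `resolve_code_name`, which is a guess at one reasonable behaviour rather than a description of TaskWeaver's loader.

```python
def resolve_code_name(schema):
    # Assumed behaviour: use the "code" field when present,
    # otherwise fall back to the plugin name.
    return schema.get("code", schema["name"])


# Two hypothetical plugins that share one code file but differ in configuration.
default_detector = {
    "name": "default_anomaly_detection",
    "code": "anomaly_detection",
    "configurations": {"sigma": 3},
}
strict_detector = {
    "name": "strict_anomaly_detection",
    "code": "anomaly_detection",
    "configurations": {"sigma": 2},
}
# A plugin whose name already equals its code name needs no "code" field.
plain_detector = {"name": "anomaly_detection"}

code_names = [
    resolve_code_name(s)
    for s in (default_detector, strict_detector, plain_detector)
]
```

All three schemas resolve to the same code name, while the first two carry different configuration values for the shared implementation to read.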