> [!CAUTION]
> ~~As for [Tip #2], I found out why the models from Hugging Face outperform the ones from Ollama with the current code. I use the OpenAI API to access Ollama, but because of that, the context window size is fixed (at 2048). Please refer here for more details. So, I'm currently working on fixing the issue. In the meantime, you should use the `huggingface` platform in order to get accurate inference until I fix it.~~
>
> The struck-through part above is fixed now. I will check whether [Tip #2] still holds true.
- Introduction
- Modification
- Overview
- Environmental Setup
- Program Execution
- Tips for Better Research Outcomes
- Regarding Local LLMs
- Regarding LLMs Overall
- [Tip #4] Make sure to write extensive notes!
- [Tip #5] Using more powerful models generally leads to better research
- [Tip #6] You can load previous saves from checkpoints
- [Tip #7] If you are running in a language other than English
- [Tip #8] There is a lot of room for improvement
- License
- Reference
This repository is based on AgentLaboratory, but it supports local LLMs, which the original repo does not. It is especially useful if you or your organization has enough GPUs at your disposal. Some possible advantages are as follows.
- You don't need to send your data to cloud LLMs.
- Once you download a model, you don't need internet access to utilize LLMs.
- You can flexibly fine-tune an LLM locally if you want to.
- You don't need to spend money on pay-as-you-go APIs, whose total costs are hard to estimate in advance. Moreover, even if you decide to use a cloud LLM later on, running local LLMs first can give you an approximation of the costs, such as how many tokens an LLM would produce to solve your problem.
- It is also useful if you would like to experiment with what kind of outcomes to expect when you feed in your data, for internal investigation purposes.
I have made some modifications in this repo compared to the original one.
- Enabled the use of local LLMs, including the DeepSeek R1 models, instead of cloud ones
- Fixed some prompts for clearer instructions
- Made arguments and some crucial parameters configurable through a JSON config file. For details, please check config.json
- Enabled overriding arguments, such as the model selection for each phase, through the config file explained above, even when you restart from a previously saved state file. The original repo does not allow that unless you modify the code, since all the instance variables of `LaboratoryWorkflow` that contain those arguments are set when the class is instantiated and are saved as part of the state files. Because of that, when you restart from one of those state files, the instance still uses the same arguments that were set at instantiation (see the sketch after this list)
- Made the import dependencies explicit, because the original code frequently uses `import *`, which is ambiguous and not recommended
- Created a Dockerfile in order to locally and efficiently build a development environment. For details, please check Environmental Setup
- Included some examples that were created using local LLMs. Please refer to the examples directory for details
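To illustrate the override-on-restart behavior described above, here is a minimal sketch; the function, attribute, and config key names are hypothetical, not the ones actually used in this repo.

```python
import json
import pickle

def resume_with_overrides(state_path: str, config_path: str):
    """Illustrative sketch: unpickle a saved LaboratoryWorkflow and
    override selected instance variables from the JSON config before
    resuming. Names here are assumptions, not this repo's actual API."""
    # Load the previously saved workflow state.
    with open(state_path, "rb") as f:
        workflow = pickle.load(f)

    # Override matching instance variables from the config,
    # e.g. a per-phase model selection or temperature.
    with open(config_path, "r") as f:
        config = json.load(f)
    for key, value in config.get("overrides", {}).items():
        if hasattr(workflow, key):
            setattr(workflow, key, value)

    return workflow
```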
> [!TIP]
> The examples were created using the end-to-end autonomous mode, which means no human intervention from start to finish. One way to get better results is to include human intervention in some or all of the phases, which is called co-pilot mode in the paper. For more details, please check Co-Pilot Mode.
> [!TIP]
> Other ways to get better results are to adjust temperatures and prompts, conduct various trials with saved states, and so on. For more details, please check Tips for Better Research Outcomes.
- Local Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas. Agent Laboratory consists of specialized agents driven by large language models to support you through the entire research workflow, from conducting literature reviews and formulating plans to executing experiments and writing comprehensive reports.
- This system is not designed to replace your creativity but to complement it, enabling you to focus on ideation and critical thinking while automating repetitive and time-intensive tasks like coding and documentation. By accommodating varying levels of computational resources and human involvement, Local Agent Laboratory aims to accelerate scientific discovery and optimize your research productivity.
- Agent Laboratory consists of three primary phases that systematically guide the research process: (1) Literature Review, (2) Experimentation, and (3) Report Writing. During each phase, specialized agents driven by LLMs collaborate to accomplish distinct objectives, integrating external tools like arXiv, Hugging Face, Python, and LaTeX to optimize outcomes. This structured workflow begins with the independent collection and analysis of relevant research papers, progresses through collaborative planning and data preparation, and results in automated experimentation and comprehensive report generation. Details on specific agent roles and their contributions across these phases are discussed in the paper.
For this repo, any models from Ollama or Hugging Face, including the recently announced DeepSeek R1 models, can be used as local LLMs.
Pick a platform, either `huggingface` or `ollama`, using the `--platform` argument, e.g. `--platform huggingface`.
If you'd like to check the thought processes when you use one of the DeepSeek R1 models, set the flag named `--show-r1-thought` to `true`. That way, you can see the thought processes in the console!
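As background for what `--show-r1-thought` surfaces: DeepSeek R1 models typically emit their reasoning between `<think>` and `</think>` tags before the final answer. Here is a minimal sketch of separating the two; the function is hypothetical, not this repo's actual implementation.

```python
import re

def split_r1_output(text: str):
    """Split a DeepSeek R1 response into (thought, answer).

    R1 models usually wrap their chain of thought in <think>...</think>
    before the final answer; if no tags are present, the whole text is
    treated as the answer. This is an illustrative helper, not part of
    this repository.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```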
> [!TIP]
> Remove the `#`s from the two lines after setting your time zone, so that you can avoid an interruption while building the environment.
Follow the installation steps below.

(Type the following commands on the host)

```bash
git clone https://github.com/Masao-Taketani/LocalAgentLaboratory.git
docker build -t agentlab .
docker run -it --rm --gpus '"device=[device id(s)]"' -v .:/work agentlab:latest
```

(If you decide to use the Ollama platform, type the following commands after starting the container)

```bash
# Start a screen session so that Ollama can run in another session
screen -S ollama
ollama serve
# Press [Ctrl+a], then [d] to detach from the screen session
ollama pull [ollama model name]
```
Execute the following command. As for `[your config path]`, please refer to config.json.

```bash
python ai_lab_repo.py --config_path [your config path]
```

If you would like to use co-pilot mode, modify the provided config file. You can intervene in any phase(s) you want. To do that, modify here.
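Conceptually, co-pilot mode boils down to a per-phase human-in-the-loop switch. The sketch below only illustrates that idea in Python; the actual field names and schema are defined in config.json.

```python
# Illustrative per-phase co-pilot switches; the real schema lives in
# config.json, and these key names are assumptions.
human_in_loop = {
    "literature review": True,    # pause for human feedback in this phase
    "plan formulation": True,
    "data preparation": False,    # run fully autonomously
    "running experiments": False,
    "report writing": True,
}
```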
Since local LLMs' capabilities are not on par with those of cloud LLMs such as GPT-4o, adjusting the temperature is crucial. As I have experienced several times during experiments with this repo, I have encountered many errors, especially when LLMs are dealing with writing code and the paper. Oftentimes adjusting the temperature for those phases works well even though the initial temperature setting does not. As I said, the `data preparation`, `running experiments`, and `report writing` phases are the most notorious ones! So, be patient, and conduct a grid search or whatever you feel like. For reference, I have tried temperatures from 0.0 to 1.0. It sometimes worked and sometimes did not. So, see for yourself! You can adjust each temperature here.
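If you want to automate that trial and error, a simple per-phase grid search over temperatures is one option. A minimal sketch, assuming hypothetical `run_phase` and `succeeded` helpers (neither is part of this repo):

```python
def grid_search_temperatures(run_phase, succeeded):
    """Try temperatures per phase until one produces a usable result.

    run_phase(phase, temperature=...) and succeeded(result) are
    hypothetical helpers standing in for a phase execution and a
    success check; adapt them to your own setup.
    """
    phases = ["data preparation", "running experiments", "report writing"]
    best = {}
    for phase in phases:
        for temp in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
            result = run_phase(phase, temperature=temp)
            if succeeded(result):
                best[phase] = temp  # keep the first temperature that works
                break
    return best
```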
As far as I've experimented, I can say that the performance of the same model from the `huggingface` platform is better than the `ollama` one. For example, `Qwen/Qwen2.5-72B-Instruct` from `huggingface` is better than `qwen2.5:72b-instruct-fp16` (which presumably is the best and non-quantized Qwen2.5 model available from Ollama) from `ollama`. To be more specific, what I mean here is that a model from `ollama` does not follow given instructions in cases where the same model from `huggingface` follows them correctly. So, unless you have strict computational restrictions, I suggest you use models from `huggingface`, preferably models as capable as (or even better than) `Qwen/Qwen2.5-72B-Instruct`.
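As the caution at the top notes, part of this gap came from Ollama's context window being fixed at 2048 tokens when accessed through the OpenAI-compatible API (since fixed in this repo). If you ever need to control the context window yourself, Ollama's native API accepts a `num_ctx` option; a minimal sketch, assuming a local Ollama server on the default port:

```python
import requests

# Query a locally served Ollama model with an enlarged context window.
# num_ctx is an Ollama option; 8192 is just an example value, and the
# model name should be whichever model you pulled.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:72b-instruct-fp16",
        "messages": [{"role": "user", "content": "Hello!"}],
        "options": {"num_ctx": 8192},
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```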
[Tip #3] Qwen2.5-72B-Instruct for non-coding and DeepSeek-R1-Distill-Llama-70B for coding phases!
Also, as far as I've experimented, `Qwen2.5-72B-Instruct` follows given instructions very well in the non-coding phases, but not so much in the coding phases. On the other hand, `DeepSeek-R1-Distill-Llama-70B` does a good job when it comes to coding, but sometimes does not correctly follow non-coding instructions. Those are the things I've found so far. So, if things don't work out in your case, please try this tip. By the way, the best configuration I've found when it comes to model selection is written here.
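As a rough illustration of this split, a per-phase model selection might look like the sketch below; the mapping is hypothetical, and the actual per-phase settings live in the JSON config file.

```python
# Hypothetical per-phase model selection following this tip; the real
# per-phase settings are configured in config.json.
phase_models = {
    "literature review": "Qwen/Qwen2.5-72B-Instruct",                    # non-coding
    "plan formulation": "Qwen/Qwen2.5-72B-Instruct",                     # non-coding
    "data preparation": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",     # coding
    "running experiments": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",  # coding
    "report writing": "Qwen/Qwen2.5-72B-Instruct",                       # non-coding
}
```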
Writing extensive notes is important for helping your agent understand what you're looking to accomplish in your project, as well as any style preferences. Notes can include any experiments you want the agents to perform, API keys you want to provide, certain plots or figures you want included, or anything you want the agent to know when performing research.
This is also your opportunity to let the agent know what compute resources it has access to, e.g. GPUs (how many, what type of GPU, how many GBs), CPUs (how many cores, what type of CPUs), storage limitations, and hardware specs.
In order to add notes, you must modify the `task_notes_LLM` structure inside of `ai_lab_repo.py`.
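For reference, in the original AgentLaboratory this structure is a list of phase-scoped note entries. A sketch of what entries might look like (the note texts are illustrative):

```python
# Illustrative task notes; each entry scopes a note to one or more phases.
task_notes_LLM = [
    {
        "phases": ["running experiments"],
        "note": "Use a single NVIDIA A100 80GB GPU; keep runs under 4 hours.",
    },
    {
        "phases": ["report writing"],
        "note": "Include a loss-curve figure and report accuracy in a table.",
    },
]
```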
When conducting research, the choice of model can significantly impact the quality of results. More powerful models tend to have higher accuracy, better reasoning capabilities, and better report generation. If computational resources allow, prioritize the use of advanced models such as Qwen2.5-72B-Instruct or similar state-of-the-art local LLMs.
However, it's important to balance performance and cost-effectiveness. While powerful models may yield better results, they are often more expensive and time-consuming to run. Consider using them selectively, for instance for key experiments or final analyses, while relying on smaller, more efficient models for iterative tasks or initial prototyping.
When resources are limited, optimize by fine-tuning smaller models on your specific dataset or combining pre-trained models with task-specific prompts to achieve the desired balance between performance and computational efficiency.
If you lose progress, or if a subtask fails, you can always load from a previous state. All of your progress is saved by default in `state_saves`, which stores each individual checkpoint. Just set `load_existing` to `true`, which can be found here, and pass your saved state here when running ai_lab_repo.py.
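To see which checkpoints you have available, you can simply list them; the sketch below assumes pickled state files under a `state_saves/` directory (the layout and extension are assumptions).

```python
import glob
import os

# List available checkpoints and pick the most recent one; pass its
# path via the config when restarting.
checkpoints = sorted(glob.glob("state_saves/*.pkl"), key=os.path.getmtime)
if checkpoints:
    print("Latest checkpoint:", checkpoints[-1])
```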
If you are running Agent Laboratory in a language other than English, no problem, just make sure to provide a language flag to the agents to perform research in your preferred language. Note that we have not extensively studied running Local Agent Laboratory in other languages, so be sure to report any problems you encounter. You can adjust the language here.
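If task notes reach the agents verbatim, one lightweight way to request a language is a dedicated note entry; the example below is purely illustrative and follows the `task_notes_LLM` structure shown earlier, while the repo's actual language setting lives in the config.

```python
# Hypothetical language note following the task_notes_LLM entry structure.
language_note = {
    "phases": ["literature review", "report writing"],
    "note": "Please perform all research and write all outputs in Japanese.",
}
```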
There is a lot of room to improve this codebase, so if you end up making changes and want to help the community, please feel free to share the changes you've made! We hope this tool helps you!
Source Code Licensing: This repository's source code is licensed under the MIT License. This license permits the use, modification, and distribution of the code, subject to certain conditions outlined in the MIT License.