Stars
Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]
Stanford NLP Python library for Representation Finetuning (ReFT)
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentan…
Stanford NLP Python library for understanding and improving PyTorch models via interventions
How do transformers model physics? Just like we do!
A reading list for papers on causality for natural language processing (NLP)
Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024.
General-purpose activation steering library
Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipe…
Clustered SAE Steering Code and Experiments
A curated list of awesome Category Theory resources.
Real-time face swap and one-click video deepfake with only a single image
A collection of (mostly) technical things every software developer should know about
A resource repository for representation engineering in large language models
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
Steering vectors for transformer language models in PyTorch / Hugging Face
ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).
[ACL 2024] Language Models Don't Learn the Physical Manifestation of Language
Tools for understanding how transformer predictions are built layer-by-layer
Tools for studying developmental interpretability in neural networks.