We are thrilled to share that our paper has been accepted for publication! This is an exciting milestone, and we thank everyone involved for their hard work and support. We look forward to seeing this research contribute to the field. Please check out our survey paper: https://arxiv.org/abs/2310.08184
- Learn From Model Beyond Fine-Tuning: A Survey
- [mikecaptain] Improving language understanding by generative pretraining
- [arXiv] Better fine-tuning by reducing representational collapse
- [ACM Computing Surveys] Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing
- [arXiv] The power of scale for parameter-efficient prompt tuning
- [ACM Computing Surveys] Recent advances in natural language processing via large pre-trained language models: A survey
- [TMI] Convolutional neural networks for medical image analysis: Full training or fine tuning?
- [ACL] Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models
- [CVPR] Robust fine-tuning of zero-shot models
- [arXiv] Fine-tuning can distort pretrained features and underperform out-of-distribution
- [CVPR] Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation
- [arXiv] Knowledge is a Region in Weight Space for Fine-tuned Language Models
- [NeurIPS] Singular value fine-tuning: Few-shot segmentation requires few-parameters fine-tuning
- [AAAI] On the effectiveness of parameter-efficient fine-tuning
- ⭐ [nature] Parameter-efficient fine-tuning of large-scale pre-trained language models
- [arXiv] Lightweight adapter tuning for multilingual speech translation
- [arXiv] Multi-head adapter routing for cross-task generalization
- [arXiv] Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models
- [arXiv] Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models
- [arXiv] Adapterfusion: Non-destructive task composition for transfer learning
- [arXiv] On the effectiveness of adapter-based tuning for pretrained language model adaptation
- [SLT] Exploring efficient-tuning methods in self-supervised speech models
- [ICASSP] Using adapters to overcome catastrophic forgetting in end-to-end automatic speech recognition
- [arXiv] Learning to Prompt for Vision-Language Models
- [arXiv] Prefix-tuning: Optimizing continuous prompts for generation
- [ICLR] Progressive prompts: Continual learning for language models
- [arXiv] Rlprompt: Optimizing discrete text prompts with reinforcement learning
- [ICML] Black-Box Tuning for Language-Model-as-a-Service (BBTv1)
- [EMNLP] BBTv2: Towards a gradient-free future with large language models
- [arXiv] Gradient-regulated meta-prompt learning for generalizable vision-language models
- [arXiv] Adversarial Prompting for Black Box Foundation Models
- [ACM Computing Surveys] Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- [arXiv] Scalable Prompt Generation for Semi-supervised Learning with Language Models
- [arXiv] Dynamic Prompting: A Unified Framework for Prompt Tuning
- [arXiv] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
- [ICML] PromptBoosting: Black-Box Text Classification with Ten Forward Passes
- [arXiv] On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
- [arXiv] Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery
- [arXiv] Rethinking Efficient Tuning Methods from a Unified Perspective
- [arXiv] Model-tuning Via Prompts Makes NLP Models Adversarially Robust
- [arXiv] UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- [CVPR] Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
- [COLING] Linguist: Language model instruction tuning to generate annotated utterances for intent classification and slot tagging
- [arXiv] Visual Instruction Tuning
- [arXiv] Gpt4roi: Instruction tuning large language model on region-of-interest
- [ICML] The flan collection: Designing data and methods for effective instruction tuning
- [arXiv] Instruction tuning with gpt-4
- [NeurIPS] Training language models to follow instructions with human feedback
- [arXiv] Exploring the benefits of training expert language models over instruction tuning
- [arXiv] Otter: A Multi-Modal Model with In-Context Instruction Tuning
- [arXiv] Augmented Language Models: a Survey
- [arXiv] Improving neural language models with a continuous cache
- [arXiv] Generalization through memorization: Nearest neighbor language models
- [NeurIPS] Retrieval-augmented generation for knowledge-intensive nlp tasks
- [arXiv] Few-shot learning with retrieval augmented language models
- [arXiv] Replug: Retrieval-augmented black-box language models
- Loss function (KL divergence between the retrieval likelihood and the language model likelihood; a minimal sketch follows below):
$$\mathcal{L}=\frac{1}{|\mathcal{B}|} \sum_{x \in \mathcal{B}} \mathrm{KL}\left(P_R(d \mid x) \,\|\, Q_{\mathrm{LM}}(d \mid x, y)\right)$$
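  A minimal sketch of how this objective can be computed per batch, assuming hypothetical tensors `retrieval_scores` (retriever similarity scores for the k retrieved documents) and `lm_log_probs` (the frozen language model's log-likelihood of the ground truth given each document), plus illustrative temperatures `gamma` and `beta`; these names are not from the paper, and the language-model side is detached so that only the retriever receives gradients:

  ```python
  # Illustrative sketch of the KL objective above; not the authors' code.
  import torch
  import torch.nn.functional as F

  def retrieval_kl_loss(retrieval_scores: torch.Tensor,
                        lm_log_probs: torch.Tensor,
                        gamma: float = 1.0,
                        beta: float = 1.0) -> torch.Tensor:
      """
      retrieval_scores: (batch, k) retriever similarity scores for k retrieved documents d given input x.
      lm_log_probs:     (batch, k) log-likelihood the frozen LM assigns to the ground truth y given each d and x.
      Returns the batch-averaged KL(P_R(d|x) || Q_LM(d|x, y)).
      """
      # Retrieval likelihood P_R(d|x): softmax over retriever scores (trainable side).
      log_p_r = F.log_softmax(retrieval_scores / gamma, dim=-1)
      # LM likelihood Q_LM(d|x,y): softmax over the frozen LM's scores, detached so the
      # gradient only updates the retriever.
      log_q_lm = F.log_softmax(lm_log_probs.detach() / beta, dim=-1)
      # KL(P || Q) = sum_d P * (log P - log Q), then average over the batch B.
      kl = (log_p_r.exp() * (log_p_r - log_q_lm)).sum(dim=-1)
      return kl.mean()

  # Example usage with random scores for a batch of 4 queries and 8 retrieved documents.
  scores = torch.randn(4, 8, requires_grad=True)
  lm_ll = torch.randn(4, 8)
  loss = retrieval_kl_loss(scores, lm_ll)
  loss.backward()
  ```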
- [arXiv] Murag: Multimodal retrieval-augmented generator for open question answering over images and text
- [arXiv] Re-Imagen: Retrieval-Augmented Text-to-Image Generator
- [openreview] Retrieval-Augmented Multimodal Language Modeling
- [arXiv] A Survey on Retrieval-Augmented Text Generation
- [IJCV] Knowledge distillation: A survey
- [arXiv] Data-Free Knowledge Transfer: A Survey
- [arXiv] Data-free knowledge distillation for deep neural networks
- [CVPR] Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- [arXiv] Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!
- [arXiv] The Life Cycle of Knowledge in Big Language Models: A Survey
- [arXiv] Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
- [arXiv] Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data
- [CVPR] Generic-to-Specific Distillation of Masked Autoencoders
- [arXiv] Deep Classifier Mimicry without Data Access
- [Multiple Classifier Systems] Ensemble methods in machine learning
- ⭐ [arXiv] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
- ⭐ [arXiv] Tangent Model Composition for Ensembling and Continual Fine-tuning
- ⭐ [arXiv] Deep Model Fusion: A Survey
- ⭐ [arXiv] A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
- ⭐ [arXiv] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- [arXiv] FusionBench: A Comprehensive Benchmark of Deep Model Fusion
- [arXiv] SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
- ⭐ [ICML] Linear Mode Connectivity and the Lottery Ticket Hypothesis
- [openreview] On convexity and linear mode connectivity in neural networks
- ⭐ [ICML] Model soups: averaging weights of multiple finetuned models improves accuracy without increasing inference time
- [arXiv] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
- [arXiv] Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models
- [ICLR] Editing models with task arithmetic
- [arXiv] How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging
- [arXiv] LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
- [arXiv] Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation
- [arXiv] Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
- [arXiv] Evolutionary optimization of model merging recipes
- [ICLR 2024] AdaMerging: Adaptive Model Merging for Multi-Task Learning
- [NeurIPS 2022] Merging Models with Fisher-Weighted Averaging
- [arXiv] Rethinking Weight-Averaged Model-merging
- ⭐ [ICML] Linear Mode Connectivity and the Lottery Ticket Hypothesis
- [openreview] On convexity and linear mode connectivity in neural networks
- ⭐ [arXiv] Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
- [arXiv] Git Re-Basin: Merging Models modulo Permutation Symmetries
- [arXiv] ZipIt! Merging Models from Different Tasks without Training
- [arXiv] Dataless Knowledge Fusion by Merging Weights of Language Models
- [arXiv] AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
- [ICML 2024] Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
- [arXiv] Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
- [NeurIPS 2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
- [arXiv] SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
- [arXiv] Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
- [arXiv] Arcee's MergeKit: A Toolkit for Merging Large Language Models
- [arXiv] FusionBench: A Comprehensive Benchmark of Deep Model Fusion
- [arXiv] Realistic Evaluation of Model Merging for Compositional Generalization
- [arXiv] What Matters for Model Merging at Scale?
- [TCSVT] Progressive meta-learning with curriculum
- [CVPR] Metafscil: A meta-learning approach for few-shot class incremental learning
- [ICML] Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
- [UAI] Meta-learning without data via Wasserstein distributionally-robust model fusion
- [CVPR] Meta-Learning for Multi-Label Few-Shot Classification
- [ECCV] Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
- [CVPR] Learning To Learn and Remember Super Long Multi-Domain Task Sequence
- [ICSE] Cross-domain deep code search with meta learning
- [arXiv] Learning to Learn from APIs: Black-Box Data-Free Meta-Learning
- [CVPR] Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning
- [arXiv] FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?
- [AAAI] Training Meta-Surrogate Model for Transferable Adversarial Attack
- [SP] D-DAE: Defense-Penetrating Model Extraction Attacks
- [Neurocomputing] MGML: Momentum group meta-learning for few-shot image classification
- [ICRA] Meta-Learning-Based Optimal Control for Soft Robotic Manipulators to Interact with Unknown Environments
- [arXiv] Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator
- [Neuromorphic Computing and Engineering] Meta-learning spiking neural networks with surrogate gradient descent
- [PMLR] The Role of Deconfounding in Meta-learning
- [ITSP] Distributed Reptile Algorithm for Meta-Learning Over Multi-Agent Systems
- [NeurIPS] Efficient and Effective Multi-task Grouping via Meta Learning on Task Combinations
- [Computers & Graphics] An overview on Meta-learning approaches for Few-shot Weakly-supervised Segmentation
- [EMNLP] Editing Large Language Models: Problems, Methods, and Opportunities
- [EMNLP] Memory-assisted prompt editing to improve GPT-3 after deployment
- [arXiv] Transformer-Patcher: One Mistake worth One Neuron
- [arXiv] Calibrating Factual Knowledge in Pretrained Language Models
- [arXiv] Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge
- [arXiv] Fixing Model Bugs with Natural Language Patches
- [arXiv] Modifying Memories in Transformer Models
- [arXiv] Mass-Editing Memory in a Transformer
- [NeurIPS] Locating and Editing Factual Associations in GPT
- [arXiv] Rank-One Editing of Encoder-Decoder Models
- [arXiv] Prompt-Based Editing for Text Style Transfer
- [CVPR] Conditional Text Image Generation With Diffusion Models
- [arXiv] Crawling the Internal Knowledge-Base of Language Models
- [arXiv] The Life Cycle of Knowledge in Big Language Models: A Survey
If you find this repository useful, please consider citing this paper:
@article{zheng2023learn,
title={Learn From Model Beyond Fine-Tuning: A Survey},
author={Zheng, Hongling and Shen, Li and Tang, Anke and Luo, Yong and Hu, Han and Du, Bo and Tao, Dacheng},
journal={arXiv preprint arXiv:2310.08184},
year={2023}
}