SelfConsciousness

This repository contains the official data and code for the paper "From Imitation to Introspection: Probing Self-Consciousness in Language Models".

Experiment

Our experiment consists of four stages (quantification, representation, manipulation, and acquisition) and centers on four “How” questions:

(1) How far are we from self-conscious models? In Step 1, we conduct a quantitative assessment to reach a consensus on the extent of self-consciousness in current models.

(2) How do models represent self-consciousness? In Step 2, we investigate whether the models exhibit any representation of self-consciousness.

(3) How can we manipulate the self-consciousness representation? In Step 3, we test whether the models’ self-consciousness representation can be manipulated.

(4) How do models acquire self-consciousness? In Step 4, we explore whether self-consciousness concepts can be acquired through fine-tuning.

Dataset

Dataset Statistics

The SelfConsciousness project refines ten core concepts and curates a dedicated dataset for each of them. In total, we use 11,097 questions.

Quick Start

Installation

git clone https://github.com/OpenCausaLab/SelfConsciousness.git
cd SelfConsciousness
conda create -n self python
conda activate self
pip install -r requirements.txt

Step-wise Experiment

The experiment is organized into four distinct steps, which can be run independently according to your needs, so there is no need to run the entire experiment at once. (Steps 2 and 3 share the step2_and_step3 folder.)

For example, to run Step 1, execute the following commands:

cd step1
sh step1.sh
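
Because each step lives in its own folder, it can help to launch a step from the repository root inside a subshell, so the `cd` does not leak into your shell session. The sketch below assumes the repository's folder layout (step1, step2_and_step3, step4); `run_step` is a hypothetical helper, not part of the repository, and only step1/step1.sh is named in this README, so check the other folders for their script names.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run a step's script from
# inside its folder. The ( ... ) subshell scopes the `cd`, so the caller stays
# in the repository root afterwards.
# Usage: run_step <folder> <script>
run_step() {
  if [ "$#" -ne 2 ]; then
    echo "usage: run_step <folder> <script>" >&2
    return 2
  fi
  ( cd "$1" && sh "$2" )
}
```

For example, `run_step step1 step1.sh` is equivalent to the two commands above, and the function returns the script's exit status.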

For the Step 4 experiment, after running the corresponding script, you also need to run merge.py to merge the model weights.
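
Chaining the two commands with `&&` ensures merge.py only runs if the Step 4 script finished successfully. This is a sketch under assumptions: merge.py is assumed to sit in the step4 folder next to the Step 4 script, whose file name this README does not give, so it is passed in as an argument. `run_step4_and_merge` is a hypothetical helper, and any arguments merge.py may require are not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository). Assumes merge.py lives in
# step4/ alongside the Step 4 script, whose file name you pass as $1.
# `&&` skips the merge when the fine-tuning script exits with an error.
run_step4_and_merge() {
  if [ "$#" -ne 1 ]; then
    echo "usage: run_step4_and_merge <step4-script>" >&2
    return 2
  fi
  ( cd step4 && sh "$1" && python merge.py )
}
```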

Note that you need to provide your own API keys for the OpenAI and Anthropic models. For open-access models (e.g., Llama3.1-8B-Instruct), you first need to deploy the model locally and then specify its local path.
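
One minimal way to supply the API keys and local model path mentioned above is through environment variables. This sketch assumes the scripts create clients with the official openai and anthropic Python SDKs, which read `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` when no key is passed explicitly; if a script expects the key in code or in a config file instead, set it there. `MODEL_PATH` and `check_setup` are hypothetical names, and the check only looks for a Hugging Face-style `config.json`, which such checkpoint folders normally contain.

```shell
#!/bin/sh
# Placeholders: replace with your own keys. The official OpenAI and Anthropic
# Python SDKs read these variables by default; whether this repository's
# scripts use them is an assumption, so check how each script sets its key.
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"

# Hypothetical variable: local folder holding an open-access model such as
# Llama3.1-8B-Instruct. Keeps any value already set in the environment.
MODEL_PATH="${MODEL_PATH:-$HOME/models/Llama3.1-8B-Instruct}"

# Hypothetical pre-flight check: report every missing setting on stderr and
# return non-zero if anything is missing, before a long run starts.
check_setup() {
  status=0
  [ -n "${OPENAI_API_KEY:-}" ] || { echo "OPENAI_API_KEY is not set" >&2; status=1; }
  [ -n "${ANTHROPIC_API_KEY:-}" ] || { echo "ANTHROPIC_API_KEY is not set" >&2; status=1; }
  # Hugging Face-format checkpoint folders normally include config.json.
  [ -f "$MODEL_PATH/config.json" ] || { echo "no config.json in $MODEL_PATH" >&2; status=1; }
  return "$status"
}
```

For instance, `check_setup && sh step1.sh` starts Step 1 only when all settings are present.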

🖇️ Citation

Please cite our paper if you find this repository useful in your work.

@misc{chen2024imitationintrospectionprobingselfconsciousness,
      title={From Imitation to Introspection: Probing Self-Consciousness in Language Models}, 
      author={Sirui Chen and Shu Yu and Shengjie Zhao and Chaochao Lu},
      year={2024},
      eprint={2410.18819},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
}
