Inquiry About Neuron Activation Processing #8

Open
zlkqz opened this issue Jan 22, 2025 · 3 comments

zlkqz commented Jan 22, 2025

I hope this message finds you well. I am reaching out to share my experience using the tool at https://monitor.transluce.org/dashboard/chat and to seek your assistance with some challenges I have encountered in my own implementation.

I have found the neuron search via the activation mode on your platform extremely helpful. However, when I attempted to replicate it with my own code, I had difficulty obtaining the corresponding neuron activations: despite using the same input text, the activation values I obtained were inconsistent with those produced by your tool.

After looking at the code you have published on GitHub, I noticed that you appear to take the maximum and minimum 1e-4 quantiles of the activation values (is that so?).

[screenshot of the relevant code]

I implemented the same method in my code, yet I am unable to find the corresponding neurons in my outputs.
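
For reference, I am extracting the per-neuron MLP activations roughly as follows (a minimal sketch of my approach; the model name and hook point are just what I chose and may not match your pipeline):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model name; I am probing a Llama-style model whose MLP hidden size is 14336.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

acts = {}  # layer index -> (seq_len, 14336) tensor of per-neuron MLP activations

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # The input to down_proj is act_fn(gate_proj(x)) * up_proj(x),
        # i.e. the per-neuron MLP activations for this layer.
        acts[layer_idx] = inputs[0].detach()[0]  # drop the batch dimension
    return hook

handles = [
    layer.mlp.down_proj.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

with torch.no_grad():
    ids = tok("the same input text I used on the dashboard", return_tensors="pt")
    model(**ids)

for h in handles:
    h.remove()

# Print the top 10 raw activation magnitudes at the last token of each layer.
for i in sorted(acts):
    print(i, acts[i][-1].abs().topk(10).values.tolist())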

One observation I made is that the highest neuron activations in my results tend to concentrate in the last few layers of the model. The top 10 neuron activations per layer are as follows:

Layer 0
[0.271484375, 0.1962890625, 0.11572265625, 0.0712890625, 0.06884765625, 0.06689453125, 0.06201171875, 0.0556640625, 0.052490234375, 0.05029296875]
1 torch.Size([14336]) torch.Size([14336])
Layer 1
[0.1513671875, 0.107421875, 0.06591796875, 0.064453125, 0.064453125, 0.061767578125, 0.050537109375, 0.049560546875, 0.046875, 0.0439453125]
2 torch.Size([14336]) torch.Size([14336])
Layer 2
[0.392578125, 0.216796875, 0.1474609375, 0.1220703125, 0.12060546875, 0.12060546875, 0.11767578125, 0.10205078125, 0.07421875, 0.0732421875]
3 torch.Size([14336]) torch.Size([14336])
Layer 3
[0.416015625, 0.26953125, 0.216796875, 0.2001953125, 0.1953125, 0.1884765625, 0.1494140625, 0.1474609375, 0.134765625, 0.130859375]
4 torch.Size([14336]) torch.Size([14336])
Layer 4
[0.62109375, 0.283203125, 0.2431640625, 0.2177734375, 0.181640625, 0.1689453125, 0.16796875, 0.158203125, 0.142578125, 0.1318359375]
5 torch.Size([14336]) torch.Size([14336])
Layer 5
[0.494140625, 0.279296875, 0.263671875, 0.240234375, 0.1923828125, 0.19140625, 0.1884765625, 0.1865234375, 0.177734375, 0.17578125]
6 torch.Size([14336]) torch.Size([14336])
Layer 6
[1.28125, 0.244140625, 0.216796875, 0.20703125, 0.2060546875, 0.19921875, 0.193359375, 0.1884765625, 0.181640625, 0.177734375]
7 torch.Size([14336]) torch.Size([14336])
Layer 7
[0.33984375, 0.298828125, 0.298828125, 0.2890625, 0.2734375, 0.2421875, 0.2314453125, 0.21875, 0.21484375, 0.2138671875]
8 torch.Size([14336]) torch.Size([14336])
Layer 8
[0.515625, 0.453125, 0.443359375, 0.345703125, 0.345703125, 0.32421875, 0.318359375, 0.29296875, 0.287109375, 0.26953125]
9 torch.Size([14336]) torch.Size([14336])
Layer 9
[0.55859375, 0.53125, 0.384765625, 0.30078125, 0.296875, 0.296875, 0.28515625, 0.27734375, 0.263671875, 0.2578125]
10 torch.Size([14336]) torch.Size([14336])
Layer 10
[1.359375, 0.60546875, 0.3515625, 0.306640625, 0.3046875, 0.30078125, 0.251953125, 0.251953125, 0.2470703125, 0.24609375]
11 torch.Size([14336]) torch.Size([14336])
Layer 11
[0.7265625, 0.4609375, 0.435546875, 0.380859375, 0.376953125, 0.333984375, 0.333984375, 0.27734375, 0.267578125, 0.259765625]
12 torch.Size([14336]) torch.Size([14336])
Layer 12
[0.69140625, 0.5625, 0.4609375, 0.359375, 0.341796875, 0.32421875, 0.298828125, 0.296875, 0.255859375, 0.2431640625]
13 torch.Size([14336]) torch.Size([14336])
Layer 13
[1.1015625, 0.73046875, 0.466796875, 0.3515625, 0.3515625, 0.318359375, 0.28515625, 0.28515625, 0.283203125, 0.263671875]
14 torch.Size([14336]) torch.Size([14336])
Layer 14
[1.6640625, 0.69140625, 0.38671875, 0.35546875, 0.330078125, 0.306640625, 0.294921875, 0.28125, 0.27734375, 0.2734375]
15 torch.Size([14336]) torch.Size([14336])
Layer 15
[0.62890625, 0.46484375, 0.4296875, 0.388671875, 0.38671875, 0.373046875, 0.296875, 0.2890625, 0.287109375, 0.27734375]
16 torch.Size([14336]) torch.Size([14336])
Layer 16
[0.76171875, 0.69921875, 0.65234375, 0.5703125, 0.546875, 0.53515625, 0.53125, 0.439453125, 0.421875, 0.404296875]
17 torch.Size([14336]) torch.Size([14336])
Layer 17
[1.7890625, 0.9453125, 0.77734375, 0.77734375, 0.734375, 0.68359375, 0.431640625, 0.400390625, 0.3984375, 0.388671875]
18 torch.Size([14336]) torch.Size([14336])
Layer 18
[0.7734375, 0.7109375, 0.70703125, 0.66015625, 0.58984375, 0.5703125, 0.48828125, 0.4609375, 0.421875, 0.408203125]
19 torch.Size([14336]) torch.Size([14336])
Layer 19
[1.6171875, 1.15625, 0.96875, 0.5546875, 0.53125, 0.4609375, 0.458984375, 0.41796875, 0.39453125, 0.36328125]
20 torch.Size([14336]) torch.Size([14336])
Layer 20
[1.109375, 0.77734375, 0.7421875, 0.4453125, 0.404296875, 0.375, 0.341796875, 0.33984375, 0.33203125, 0.310546875]
21 torch.Size([14336]) torch.Size([14336])
Layer 21
[0.84765625, 0.7421875, 0.5859375, 0.546875, 0.5, 0.486328125, 0.44921875, 0.404296875, 0.37890625, 0.32421875]
22 torch.Size([14336]) torch.Size([14336])
Layer 22
[0.95703125, 0.9140625, 0.7890625, 0.73046875, 0.59375, 0.51171875, 0.484375, 0.46875, 0.466796875, 0.46484375]
23 torch.Size([14336]) torch.Size([14336])
Layer 23
[0.78125, 0.609375, 0.6015625, 0.51953125, 0.41796875, 0.416015625, 0.3828125, 0.369140625, 0.3671875, 0.35546875]
24 torch.Size([14336]) torch.Size([14336])
Layer 24
[3.625, 2.078125, 1.2265625, 1.0078125, 0.98046875, 0.81640625, 0.7421875, 0.609375, 0.5234375, 0.5078125]
25 torch.Size([14336]) torch.Size([14336])
Layer 25
[3.25, 1.09375, 0.66015625, 0.65625, 0.625, 0.58984375, 0.5703125, 0.55859375, 0.55078125, 0.50390625]
26 torch.Size([14336]) torch.Size([14336])
Layer 26
[1.2578125, 1.1484375, 0.91015625, 0.78515625, 0.60546875, 0.5546875, 0.51171875, 0.431640625, 0.423828125, 0.392578125]
27 torch.Size([14336]) torch.Size([14336])
Layer 27
[1.5078125, 0.92578125, 0.75, 0.63671875, 0.6328125, 0.5703125, 0.55859375, 0.55859375, 0.5078125, 0.50390625]
28 torch.Size([14336]) torch.Size([14336])
Layer 28
[1.9140625, 1.890625, 1.59375, 1.515625, 1.46875, 1.4609375, 1.2421875, 0.97265625, 0.90625, 0.84375]
29 torch.Size([14336]) torch.Size([14336])
Layer 29
[2.8125, 2.25, 1.9609375, 1.6875, 1.6640625, 1.3125, 1.2109375, 1.1171875, 1.109375, 0.9921875]
30 torch.Size([14336]) torch.Size([14336])
Layer 30
[7.90625, 2.96875, 2.75, 2.640625, 2.546875, 2.28125, 2.28125, 2.015625, 1.84375, 1.8203125]
31 torch.Size([14336]) torch.Size([14336])
Layer 31
[7.34375, 5.90625, 5.90625, 5.84375, 5.71875, 5.59375, 4.90625, 4.71875, 4.5, 4.375]

This pattern may indicate an issue in my code, so I would like to ask whether your implementation performs any additional processing on the neuron activations. For instance, do you apply normalization or other techniques that might affect the activation values?

Your guidance on this matter would be greatly appreciated, as it would significantly aid my understanding and help me replicate your results accurately.

Thank you for your time and assistance. I look forward to your response.

kmeng01 (Member) commented Jan 22, 2025

Hi, thanks for reaching out!

We do indeed normalize activations by their top or bottom quantile; see here for where it's done on the frontend. The quantiles are stored in the database in the quantiles table.

(We're looking for neurons that fire highly relative to their own distribution, not in terms of absolute magnitude.)
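
In pseudocode, the normalization amounts to something like this (an illustrative sketch with made-up names, not the literal frontend code):

# Illustrative sketch: scale each activation by that neuron's own extreme
# quantile, so neurons are compared against their own distribution rather
# than by absolute magnitude.
def normalized_score(act, top_q, bottom_q):
    # top_q / bottom_q: per-(layer, neuron) values from the quantiles table,
    # e.g. the (1 - 1e-4) and 1e-4 quantiles of that neuron's activations.
    if act >= 0:
        return act / top_q         # how far into the neuron's positive tail
    return act / abs(bottom_q)     # negative score, relative to the negative tail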

kmeng01 (Member) commented Jan 22, 2025

> I noticed that you appear to take the maximum and minimum 1e-4 quantiles of the activation values (is that so?)

Not quite: each value is the activation of a neuron on the corresponding token in that specific prompt. The 1e-4 means that we filter down to neurons that fired at least as positively as their top 1e-4 quantile or at least as negatively as their bottom 1e-4 quantile. This gets rid of neurons that were not highly active relative to their typical behavior.
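
Concretely, the filtering is along these lines (again just an illustrative sketch, not the exact implementation):

import torch

def filter_extreme_neurons(acts, top_q, bottom_q):
    """Illustrative sketch: keep only neurons firing at least as extremely as
    their own 1e-4 quantiles.

    acts:     (num_neurons,) activations of one layer at one token
    top_q:    (num_neurons,) per-neuron top (1 - 1e-4) quantiles
    bottom_q: (num_neurons,) per-neuron bottom 1e-4 quantiles
    """
    mask = (acts >= top_q) | (acts <= bottom_q)
    return torch.nonzero(mask, as_tuple=True)[0]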

zlkqz (Author) commented Jan 22, 2025

@kmeng01 Thank you very much! I understand how to normalize now. Let me double-check that my understanding is correct:

  1. First, the quantiles are per-(layer, neuron) quantiles computed from a large number of samples when the neuron descriptions were generated (see the sketch after this list).
  2. In the backend, all (layer, neuron) pairs whose activations exceed the quantile value (1e-4, both top and bottom) are returned to the frontend.
  3. On the frontend, each activation is divided by the corresponding quantile value (1e-5), and the neurons are sorted and displayed by the normalized value.
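
For step 1, I am picturing something like the following (my own guess with made-up names, just to confirm I understand):

import torch

def build_quantiles(samples, q=1e-4):
    """My guess at step 1: compute per-neuron quantiles from many samples.

    samples: (num_samples, num_neurons) activations of one layer, collected
             over a large corpus of prompts/tokens.
    Returns the bottom-q and top-(1 - q) quantiles for each neuron.
    """
    samples = samples.float()  # torch.quantile requires a floating-point dtype
    bottom = torch.quantile(samples, q, dim=0)
    top = torch.quantile(samples, 1 - q, dim=0)
    return bottom, top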

And one more question:

[screenshot showing the is_interesting argument]
What does the argument "is_interesting" mean?

And did you apply normalization in attribution mode as well?
