Inquiry About Neuron Activation Processing #8

Open
zlkqz opened this issue Jan 22, 2025 · 3 comments

zlkqz commented Jan 22, 2025

I hope this message finds you well. I am reaching out to share my experience using the tool at https://monitor.transluce.org/dashboard/chat and to seek your assistance with some challenges I have encountered in my own implementation.

I have found the neuron search via the activation mode on your platform extremely helpful. However, when I attempted to replicate it with my own code, I had difficulty obtaining the corresponding neuron activations: despite using the same input text, the activation values I obtained were inconsistent with those produced by your tool.

After looking at the code you have published on GitHub, I noticed that you appear to take the maximum and minimum 1e-4 quantiles of the activation values (is that so?).

[screenshot of the relevant code]

I implemented the same method in my code, yet I am unable to find the corresponding neurons in my outputs.
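
For reference, I am extracting the per-neuron MLP activations roughly as follows (a minimal sketch of my approach; the model name and hook point are just what I chose and may not match your pipeline):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model name; I am probing a Llama-style model whose MLP hidden size is 14336.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

acts = {}  # layer index -> (seq_len, 14336) tensor of per-neuron MLP activations

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # The input to down_proj is act_fn(gate_proj(x)) * up_proj(x),
        # i.e. the per-neuron MLP activations for this layer.
        acts[layer_idx] = inputs[0].detach()[0]  # drop the batch dimension
    return hook

handles = [
    layer.mlp.down_proj.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

with torch.no_grad():
    ids = tok("the same input text I used on the dashboard", return_tensors="pt")
    model(**ids)

for h in handles:
    h.remove()

# Print the top 10 raw activation magnitudes at the last token of each layer.
for i in sorted(acts):
    print(i, acts[i][-1].abs().topk(10).values.tolist())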

One observation I made is that the highest neuron activations in my results tend to concentrate in the last few layers of the model. The top 10 neuron activations per layer are as follows:

Layer 0
[0.271484375, 0.1962890625, 0.11572265625, 0.0712890625, 0.06884765625, 0.06689453125, 0.06201171875, 0.0556640625, 0.052490234375, 0.05029296875]
1 torch.Size([14336]) torch.Size([14336])
Layer 1
[0.1513671875, 0.107421875, 0.06591796875, 0.064453125, 0.064453125, 0.061767578125, 0.050537109375, 0.049560546875, 0.046875, 0.0439453125]
2 torch.Size([14336]) torch.Size([14336])
Layer 2
[0.392578125, 0.216796875, 0.1474609375, 0.1220703125, 0.12060546875, 0.12060546875, 0.11767578125, 0.10205078125, 0.07421875, 0.0732421875]
3 torch.Size([14336]) torch.Size([14336])
Layer 3
[0.416015625, 0.26953125, 0.216796875, 0.2001953125, 0.1953125, 0.1884765625, 0.1494140625, 0.1474609375, 0.134765625, 0.130859375]
4 torch.Size([14336]) torch.Size([14336])
Layer 4
[0.62109375, 0.283203125, 0.2431640625, 0.2177734375, 0.181640625, 0.1689453125, 0.16796875, 0.158203125, 0.142578125, 0.1318359375]
5 torch.Size([14336]) torch.Size([14336])
Layer 5
[0.494140625, 0.279296875, 0.263671875, 0.240234375, 0.1923828125, 0.19140625, 0.1884765625, 0.1865234375, 0.177734375, 0.17578125]
6 torch.Size([14336]) torch.Size([14336])
Layer 6
[1.28125, 0.244140625, 0.216796875, 0.20703125, 0.2060546875, 0.19921875, 0.193359375, 0.1884765625, 0.181640625, 0.177734375]
7 torch.Size([14336]) torch.Size([14336])
Layer 7
[0.33984375, 0.298828125, 0.298828125, 0.2890625, 0.2734375, 0.2421875, 0.2314453125, 0.21875, 0.21484375, 0.2138671875]
8 torch.Size([14336]) torch.Size([14336])
Layer 8
[0.515625, 0.453125, 0.443359375, 0.345703125, 0.345703125, 0.32421875, 0.318359375, 0.29296875, 0.287109375, 0.26953125]
9 torch.Size([14336]) torch.Size([14336])
Layer 9
[0.55859375, 0.53125, 0.384765625, 0.30078125, 0.296875, 0.296875, 0.28515625, 0.27734375, 0.263671875, 0.2578125]
10 torch.Size([14336]) torch.Size([14336])
Layer 10
[1.359375, 0.60546875, 0.3515625, 0.306640625, 0.3046875, 0.30078125, 0.251953125, 0.251953125, 0.2470703125, 0.24609375]
11 torch.Size([14336]) torch.Size([14336])
Layer 11
[0.7265625, 0.4609375, 0.435546875, 0.380859375, 0.376953125, 0.333984375, 0.333984375, 0.27734375, 0.267578125, 0.259765625]
12 torch.Size([14336]) torch.Size([14336])
Layer 12
[0.69140625, 0.5625, 0.4609375, 0.359375, 0.341796875, 0.32421875, 0.298828125, 0.296875, 0.255859375, 0.2431640625]
13 torch.Size([14336]) torch.Size([14336])
Layer 13
[1.1015625, 0.73046875, 0.466796875, 0.3515625, 0.3515625, 0.318359375, 0.28515625, 0.28515625, 0.283203125, 0.263671875]
14 torch.Size([14336]) torch.Size([14336])
Layer 14
[1.6640625, 0.69140625, 0.38671875, 0.35546875, 0.330078125, 0.306640625, 0.294921875, 0.28125, 0.27734375, 0.2734375]
15 torch.Size([14336]) torch.Size([14336])
Layer 15
[0.62890625, 0.46484375, 0.4296875, 0.388671875, 0.38671875, 0.373046875, 0.296875, 0.2890625, 0.287109375, 0.27734375]
16 torch.Size([14336]) torch.Size([14336])
Layer 16
[0.76171875, 0.69921875, 0.65234375, 0.5703125, 0.546875, 0.53515625, 0.53125, 0.439453125, 0.421875, 0.404296875]
17 torch.Size([14336]) torch.Size([14336])
Layer 17
[1.7890625, 0.9453125, 0.77734375, 0.77734375, 0.734375, 0.68359375, 0.431640625, 0.400390625, 0.3984375, 0.388671875]
18 torch.Size([14336]) torch.Size([14336])
Layer 18
[0.7734375, 0.7109375, 0.70703125, 0.66015625, 0.58984375, 0.5703125, 0.48828125, 0.4609375, 0.421875, 0.408203125]
19 torch.Size([14336]) torch.Size([14336])
Layer 19
[1.6171875, 1.15625, 0.96875, 0.5546875, 0.53125, 0.4609375, 0.458984375, 0.41796875, 0.39453125, 0.36328125]
20 torch.Size([14336]) torch.Size([14336])
Layer 20
[1.109375, 0.77734375, 0.7421875, 0.4453125, 0.404296875, 0.375, 0.341796875, 0.33984375, 0.33203125, 0.310546875]
21 torch.Size([14336]) torch.Size([14336])
Layer 21
[0.84765625, 0.7421875, 0.5859375, 0.546875, 0.5, 0.486328125, 0.44921875, 0.404296875, 0.37890625, 0.32421875]
22 torch.Size([14336]) torch.Size([14336])
Layer 22
[0.95703125, 0.9140625, 0.7890625, 0.73046875, 0.59375, 0.51171875, 0.484375, 0.46875, 0.466796875, 0.46484375]
23 torch.Size([14336]) torch.Size([14336])
Layer 23
[0.78125, 0.609375, 0.6015625, 0.51953125, 0.41796875, 0.416015625, 0.3828125, 0.369140625, 0.3671875, 0.35546875]
24 torch.Size([14336]) torch.Size([14336])
Layer 24
[3.625, 2.078125, 1.2265625, 1.0078125, 0.98046875, 0.81640625, 0.7421875, 0.609375, 0.5234375, 0.5078125]
25 torch.Size([14336]) torch.Size([14336])
Layer 25
[3.25, 1.09375, 0.66015625, 0.65625, 0.625, 0.58984375, 0.5703125, 0.55859375, 0.55078125, 0.50390625]
26 torch.Size([14336]) torch.Size([14336])
Layer 26
[1.2578125, 1.1484375, 0.91015625, 0.78515625, 0.60546875, 0.5546875, 0.51171875, 0.431640625, 0.423828125, 0.392578125]
27 torch.Size([14336]) torch.Size([14336])
Layer 27
[1.5078125, 0.92578125, 0.75, 0.63671875, 0.6328125, 0.5703125, 0.55859375, 0.55859375, 0.5078125, 0.50390625]
28 torch.Size([14336]) torch.Size([14336])
Layer 28
[1.9140625, 1.890625, 1.59375, 1.515625, 1.46875, 1.4609375, 1.2421875, 0.97265625, 0.90625, 0.84375]
29 torch.Size([14336]) torch.Size([14336])
Layer 29
[2.8125, 2.25, 1.9609375, 1.6875, 1.6640625, 1.3125, 1.2109375, 1.1171875, 1.109375, 0.9921875]
30 torch.Size([14336]) torch.Size([14336])
Layer 30
[7.90625, 2.96875, 2.75, 2.640625, 2.546875, 2.28125, 2.28125, 2.015625, 1.84375, 1.8203125]
31 torch.Size([14336]) torch.Size([14336])
Layer 31
[7.34375, 5.90625, 5.90625, 5.84375, 5.71875, 5.59375, 4.90625, 4.71875, 4.5, 4.375]

This pattern may indicate an issue in my code, so I would like to ask whether your implementation performs any additional processing on the neuron activations. For instance, do you apply normalization or other techniques that might affect the activation values?

Your guidance on this matter would be greatly appreciated, as it would significantly aid my understanding and help me replicate your results accurately.

Thank you for your time and assistance. I look forward to your response.

kmeng01 (Member) commented Jan 22, 2025

Hi, thanks for reaching out!

We do indeed normalize activations by their top or bottom quantile; see here for where it's done on the frontend. The quantiles are stored in the database in the quantiles table.

(We're looking for neurons that fire highly relative to their own distribution, not in terms of absolute magnitude.)
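
In pseudocode, the normalization amounts to something like this (an illustrative sketch with made-up names, not the literal frontend code):

# Illustrative sketch: scale each activation by that neuron's own extreme
# quantile, so neurons are compared against their own distribution rather
# than by absolute magnitude.
def normalized_score(act, top_q, bottom_q):
    # top_q / bottom_q: per-(layer, neuron) values from the quantiles table,
    # e.g. the (1 - 1e-4) and 1e-4 quantiles of that neuron's activations.
    if act >= 0:
        return act / top_q         # how far into the neuron's positive tail
    return act / abs(bottom_q)     # negative score, relative to the negative tail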

kmeng01 (Member) commented Jan 22, 2025

> I noticed that you appear to take the maximum and minimum 1e-4 quantiles of the activation values (is that so?)

Not quite: each value is the activation of a neuron on the corresponding token in that specific prompt. The 1e-4 means that we filter down to neurons that fired at least as positively as their top 1e-4 quantile or at least as negatively as their bottom 1e-4 quantile. This gets rid of neurons that were not highly active relative to their typical behavior.
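
Concretely, the filtering is along these lines (again just an illustrative sketch, not the exact implementation):

import torch

def filter_extreme_neurons(acts, top_q, bottom_q):
    """Illustrative sketch: keep only neurons firing at least as extremely as
    their own 1e-4 quantiles.

    acts:     (num_neurons,) activations of one layer at one token
    top_q:    (num_neurons,) per-neuron top (1 - 1e-4) quantiles
    bottom_q: (num_neurons,) per-neuron bottom 1e-4 quantiles
    """
    mask = (acts >= top_q) | (acts <= bottom_q)
    return torch.nonzero(mask, as_tuple=True)[0]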

zlkqz (Author) commented Jan 22, 2025

@kmeng01 Thank you very much! I understand how to normalize now. Let me double-check that my understanding is correct:

  1. First, the quantiles are per-(layer, neuron) quantiles computed from a large number of samples when the neuron descriptions were generated (see the sketch after this list).
  2. In the backend, all (layer, neuron) pairs whose activations exceed the quantile value (1e-4, both top and bottom) are returned to the frontend.
  3. On the frontend, each activation is divided by the corresponding quantile value (1e-5), and the neurons are sorted and displayed by the normalized value.
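
For step 1, I am picturing something like the following (my own guess with made-up names, just to confirm I understand):

import torch

def build_quantiles(samples, q=1e-4):
    """My guess at step 1: compute per-neuron quantiles from many samples.

    samples: (num_samples, num_neurons) activations of one layer, collected
             over a large corpus of prompts/tokens.
    Returns the bottom-q and top-(1 - q) quantiles for each neuron.
    """
    samples = samples.float()  # torch.quantile requires a floating-point dtype
    bottom = torch.quantile(samples, q, dim=0)
    top = torch.quantile(samples, 1 - q, dim=0)
    return bottom, top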

And one more question:

[screenshot showing the is_interesting argument]
What does the argument "is_interesting" mean?

And did you apply normalization in attribution mode as well?
