-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add new algorithm for systematic sampling #34
base: master
Are you sure you want to change the base?
Conversation
lib/split/experiment.rb
Outdated
@@ -17,6 +18,7 @@ class Experiment | |||
DEFAULT_OPTIONS = { | |||
:resettable => true, | |||
:retain_user_alternatives_after_reset => false, | |||
:cohorting_block => ["control", "alternative"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a default value of ["control", "alternative"] here, we can set this to whatever we want. The problem currently is that if someone is using alternatives other than 'control' and 'alternative' the default won't work and they will have to override it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure how I feel about storing specific metadata for an algorithm at the experiment level without a more generic way of handling/storing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question and comment.
Additionally, if we are storing a single block or segment and using it to determine the cohort for each upcoming chunk based on participant size, wouldn't that make the algorithm deterministic as opposed to random? Without being given the block I would technically be able to determine the "next" cohorted account/user after so many tries as a pattern clearly would have emerged. When you have a block size of 2 we are essentially toggling back and forth between the control and alternative.
I would imagine a better approach would be to configure a block size and allow split to keep track of the "current" block and then proceed to generate new ones as blocks are "filled".
Just playing around with random ideas I imagine we can achieve the same effect with the existing Split block randomization by creating multiple alternatives ie: 5 control and 5 alternative for example and toggling the experience based off whether its in one of the alternative groups. With that said this would look pretty ugly in our dashboards and analysis as this would likely mean we'll have to aggregate the data.
lib/split/experiment.rb
Outdated
@@ -17,6 +18,7 @@ class Experiment | |||
DEFAULT_OPTIONS = { | |||
:resettable => true, | |||
:retain_user_alternatives_after_reset => false, | |||
:cohorting_block => ["control", "alternative"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure how I feel about storing specific metadata for an algorithm at the experiment level without a more generic way of handling/storing it.
def self.choose_alternative(experiment) | ||
block = experiment.cohorting_block | ||
raise ArgumentError, "Experiment configuration is missing cohorting_block array" unless block | ||
index = experiment.participant_count % block.length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Is there any concerns/considerations that need to be taken here with experiment.participant_count and concurrency/race conditions ~ when say 2 accounts are being cohorted at the same time and both retrieve the same participant_count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thats definitely a concern, the participant count is incremented in choose!()
after this algorithm returns a value. I think the solution to this would be to store our own counter field in redis, then at the beginning of the algoritihm we can call incrby
on that values and use the return value from it as the index
Reply to @robin-phung 's comment
I think thats a valid concern. The idea was that if the block size is large enough then in a situation where, for example, every other account was being cohorted the block size would counteract that and keep results even between cohorts. But I don't know if that's a great solution because for example with a block of:
That could work. In that could we would need some sort of mutex lock when one of the accounts is generating a new block, OR we could seed the random generation of new blocks 🤔 We could also make an algorithm basically the same as |
module Algorithms | ||
module SystematicSampling | ||
def self.choose_alternative(experiment) | ||
count = experiment.next_cohorting_block_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the Redis call to increment a counter that keeps track of the number of users that have been run through this algorithm so far.
It will return an integer from Redis that will be the next value of the counter, because this value is supplied by Redis there shouldn't be any race conditions where multiple users have the same count
variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_cohorting_block_index
is more like a cohort count? The index
leads me to a misunderstanding, it's the index number in a block, which shouldn't be larger than the size of the block.
@@ -17,6 +19,7 @@ class Experiment | |||
DEFAULT_OPTIONS = { | |||
:resettable => true, | |||
:retain_user_alternatives_after_reset => false, | |||
:cohorting_block_magnitude => 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of block length, I chose to use cohorting_block_mangitude
which is the number of times each alternative repeats in a single block.
e.g cohorting_block_magnitude = 3
would produce blocks of length 6 like: [control, alt, control, control, alt, alt]
I did this because since there are two (or more) alternatives some block lengths would be invalid (e.g 5) since the blocks have to be balanced between alternatives
lib/split/experiment.rb
Outdated
@@ -43,6 +46,11 @@ def set_alternatives_and_options(options) | |||
self.metadata = options_with_defaults[:metadata] | |||
self.friendly_name = options_with_defaults[:friendly_name] || @name | |||
self.retain_user_alternatives_after_reset = options_with_defaults[:retain_user_alternatives_after_reset] | |||
|
|||
if self.algorithm == Split::Algorithms::SystematicSampling | |||
self.cohorting_block_seed = options_with_defaults[:cohorting_block_seed] || self.name.to_i(36) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seed is used so any instance of Rails can generate the blocks and they are all the same
Default seed if none is specified in YAML is based on the name of the experiment: self.name.to_i(36)
I think something like:
Digest::MD5.hexdigest('exp893_solo').to_i(16)
=> 214402640715795185602270768195163573728
would be better, but I wasn't sure if importing Digest module would be overkill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to point out, to_i
doesn't support special characters, and it may also end up with a really long integer. How about something like https://apidock.com/ruby/v2_5_5/String/sum ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is perfect for our use case, thanks!
block_length = experiment.cohorting_block_magnitude * experiment.alternatives.length | ||
block_num, index = count.divmod block_length | ||
|
||
r = Random.new(block_num + experiment.cohorting_block_seed) | ||
block = (experiment.alternatives*experiment.cohorting_block_magnitude).shuffle(random: r) | ||
block[index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we do here to create any block with equal # of each alternative:
take the list of alternatives:
['red', 'blue', 'green']
repeat it cohorting_block_magnitude
times:
['red', 'blue', 'green', 'red', 'blue', 'green', 'red', 'blue', 'green']
shuffle it (seeded by cohorting_block_seed
and the current block #)
['green', 'red', 'blue', 'red', 'blue', 'green', 'red', 'green', 'blue']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
results[Split::Algorithms::SystematicSampling.choose_alternative(experiment).name] += 1 | ||
end | ||
|
||
expect(experiment.cohorting_block_magnitude * experiment.alternatives.length).to eq(6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to make sure the choose_alternative
won't change the value of cohorting_block_magnitude
? I didn't quite get what you want to secure here...
end | ||
|
||
let(:seeded_experiment2) do | ||
Split::Experiment.new('link_highlight', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line for exp_name would be easier to compare to the one above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QA Passed 🚀 |
Kyle has requested a new algorithm that can reduce the difference in participant counts between each alternative.
From the original ticket: (https://github.com/clio/themis/issues/88629)
See more information about the feature request here: https://github.com/clio/themis/issues/88629
Test Plan
gem 'split', path: "../../clio/split", require: 'split/dashboard'
)experiments.yml
and add the following experiments:bundle install
rails s
exp999
to the board from the dropdown and start ithttp://localhost:3000/test/index?user=1&test=true&experiment=exp999
http://localhost:3000/test/index?user=2&test=true&experiment=exp999
http://localhost:3000/test/index?user=3&test=true&experiment=exp999
http://localhost:3000/test/index?user=4&test=true&experiment=exp999
http://localhost:3000/test/index?user=5&test=true&experiment=exp999
http://localhost:3000/test/index?user=5&test=true&experiment=exp999
exp888
to the board from the dropdown and start ithttp://localhost:3000/test/index?user=10&test=true&experiment=exp888
http://localhost:3000/test/index?user=11&test=true&experiment=exp888
http://localhost:3000/split/
and verify that there is one user in 'control' and one user in 'alternative'http://localhost:3000/test/index?user=12&test=true&experiment=exp888
http://localhost:3000/test/index?user=13&test=true&experiment=exp888
http://localhost:3000/split/
and verify that there are two users in 'control' and two in 'alternative'http://localhost:3000/test/index?user=14&test=true&experiment=exp888
exp888
http://localhost:3000/test/index?user=100&test=true&experiment=exp888
http://localhost:3000/test/index?user=200&test=true&experiment=exp888
http://localhost:3000/split/
and verify that there is now one user in 'control' and one in 'alternative'Tips
If ruby is 3.0.1 is not installed, run:
If redis is not installed, run: