A query vector is a tokenized version of a query sequence for a given timeframe. This method is still a work in progress. The focus so far has been on obfuscating the sequences to avoid re-identification of the client, which limits the effectiveness of the analysis. The token space is deliberately small, and tokens are generated by a hash function for which collisions are to be expected. Any new tokens found are submitted together with the vectors.
# | Name | Type | Required | Comment |
---|---|---|---|---|
1 | StartTime | Timestamp | yes | Starting point for vector |
2 | Duration | Integer | yes | Vector length in seconds |
3 | Vectors | list<Bytestring> | yes | Vectors for all clients for the given time window. Each vector consists of tokens, and each token is a 32-bit hash of the word it represents |
4 | Wordlist delta | list<Bytestring> | yes | Wordlist for all tokens not on the default list, i.e. the list of new words |
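
The fields above could be populated roughly as in the following sketch. It is illustrative only: the hash function is not named here, so truncating SHA-256 to 4 bytes is an assumption, as are the helper names `token_for` and `build_vector`, the representation of the default list as a set of words, and the choice to concatenate tokens into one bytestring per client.

```python
import hashlib

TOKEN_BYTES = 4  # 32-bit tokens, as described in the Vectors field


def token_for(word: str) -> bytes:
    """Map a word to a 32-bit token.

    Assumption: the first 4 bytes of SHA-256 stand in for the
    unspecified hash function. Collisions are possible and expected.
    """
    return hashlib.sha256(word.encode("utf-8")).digest()[:TOKEN_BYTES]


def build_vector(words, default_words):
    """Return (vector, wordlist_delta) for one client's query sequence.

    vector: the tokens of all words, concatenated in order (assumed layout).
    wordlist_delta: words not on the default list, each listed once,
    encoded as bytestrings.
    """
    vector = b"".join(token_for(w) for w in words)
    delta = []
    for w in words:
        encoded = w.encode("utf-8")
        if w not in default_words and encoded not in delta:
            delta.append(encoded)
    return vector, delta


# Hypothetical usage: "alpha" is on the default list, "beta" is new.
vector, delta = build_vector(["alpha", "beta", "alpha"], {"alpha"})
```

Repeated words map to the same token, so the vector preserves the order and repetition of the sequence while hiding the words themselves; only words missing from the default list are sent in clear in the delta.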