Vector: Query vector

A query vector is a tokenized version of a query sequence for a given timeframe. This method is still a work in progress. The focus so far has been on obfuscating the sequences to avoid re-identification of the client, and this obfuscation will limit the effectiveness of the analysis. The token space is deliberately small, and tokens are generated by a hash function for which collisions are to be expected. Any new tokens found are submitted with the vectors.
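As a rough illustration of the tokenization described above, the following sketch hashes each word of a query sequence to a 4-byte token and collects the words whose tokens are not yet known. The hash function (BLAKE2b truncated to 32 bits), the concatenation of tokens into one bytestring, and the names `token`, `query_vector` and `wordlist_delta` are assumptions made for this example; the actual hash function and encoding are not specified here.

```python
from __future__ import annotations

import hashlib

TOKEN_BYTES = 4  # 32-bit tokens, as stated for the Vectors field


def token(word: str) -> bytes:
    """Hash a word to a 32-bit token (assumed hash: truncated BLAKE2b)."""
    return hashlib.blake2b(word.encode("utf-8"), digest_size=TOKEN_BYTES).digest()


def query_vector(words: list[str]) -> bytes:
    """Concatenate the tokens of a query sequence, preserving order."""
    return b"".join(token(w) for w in words)


def wordlist_delta(words: list[str], default_tokens: set[bytes]) -> list[bytes]:
    """Return each word (UTF-8 encoded) whose token is not on the default list."""
    seen: set[bytes] = set()
    delta: list[bytes] = []
    for w in words:
        t = token(w)
        if t not in default_tokens and t not in seen:
            seen.add(t)
            delta.append(w.encode("utf-8"))
    return delta


# Hypothetical query sequence for one client within one time window.
sequence = ["alpha", "beta", "alpha", "gamma"]
known = {token("alpha")}  # hypothetical default list containing one word

vector = query_vector(sequence)
new_words = wordlist_delta(sequence, known)
```

With a 32-bit token space, two different words can map to the same token, so a token cannot always be traced back to a single word. That is consistent with the obfuscation goal, at the cost of some precision in the analysis.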

Data

| # | Name | Type | Required | Comment |
|---|------|------|----------|---------|
| 1 | StartTime | Timestamp | yes | Starting point for the vector |
| 2 | Duration | Integer | yes | Vector length in seconds |
| 3 | Vectors | `list<Bytestring>` | yes | Vectors for all clients for the given time window. Each vector consists of tokens, which are 32-bit hashes of the words they represent |
| 4 | Wordlist delta | `list<Bytestring>` | yes | Wordlist for all tokens not on the default list, i.e. the list of new words |
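
The fields above could be modelled roughly as follows. This is only a sketch: the class name `QueryVectorRecord`, the Python types chosen for `Timestamp`, `Integer` and `Bytestring`, and the validation rules are assumptions for illustration. In particular, the check that each vector is a whole number of 4-byte tokens assumes tokens are concatenated without separators, which is not stated in the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

TOKEN_BYTES = 4  # tokens are 32-bit hashes


@dataclass
class QueryVectorRecord:
    start_time: datetime                  # StartTime: starting point for the vector
    duration: int                         # Duration: vector length in seconds
    vectors: list[bytes]                  # Vectors: one bytestring per client
    wordlist_delta: list[bytes] = field(default_factory=list)  # new words

    def __post_init__(self) -> None:
        if self.duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        for v in self.vectors:
            # Assumption: a vector is a plain concatenation of 4-byte tokens.
            if len(v) % TOKEN_BYTES != 0:
                raise ValueError("vector length must be a multiple of 4 bytes")

    def tokens(self, client_index: int) -> list[bytes]:
        """Split one client's vector back into its 32-bit tokens."""
        v = self.vectors[client_index]
        return [v[i:i + TOKEN_BYTES] for i in range(0, len(v), TOKEN_BYTES)]


# Hypothetical record covering a five-minute window with two clients.
record = QueryVectorRecord(
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    duration=300,
    vectors=[bytes.fromhex("0a0b0c0d11223344"), bytes.fromhex("deadbeef")],
    wordlist_delta=[b"example"],
)
```

All four fields are marked as required in the table, so an empty wordlist delta would still be sent as an empty list under this reading.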