medialab/DSA-facebook-ads

DSA Ads data

Data source and documentation:

https://github.com/Lejo1/facebook_ad_library

Data conversion and preparation:

Install bsondump:

wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-amazon2-x86_64-100.10.0.tgz
tar -xzvf mongodb-database-tools-amazon2-x86_64-100.10.0.tgz
export PATH="$(pwd)/mongodb-database-tools-amazon2-x86_64-100.10.0/bin:$PATH"

Convert bson.gz files to JSON:

zcat ads.bson.gz | bsondump -vvvvv --type=json | gzip > ads.json.gz
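To sanity-check the converted file before going further, it can help to peek at the first few documents. The sketch below is not part of the repository; it assumes that the output holds one JSON document per line, and `read_json_lines` is a hypothetical helper name. Since bsondump emits MongoDB Extended JSON, some values may appear wrapped in objects such as `{"$oid": ...}` or `{"$numberInt": ...}`.

```python
import gzip
import json
from itertools import islice


def read_json_lines(path, limit=None):
    """Yield one parsed document per line from a gzipped JSON Lines file.

    `limit` caps the number of lines read (None reads the whole file).
    Hypothetical helper, assuming one JSON document per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in islice(handle, limit):
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Example usage (assumes ads.json.gz exists in the current directory):
#   for doc in read_json_lines("ads.json.gz", limit=3):
#       print(sorted(doc.keys()))
```

Reading lazily with a generator keeps memory usage flat, which matters if the dump is large.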

Install Python dependencies:

(Ideally in a dedicated Python environment)

pip install -r requirements.txt

Analyze the key structure of the JSON:

python analyze_data_structure.py ads.json.gz

Or add an extra integer argument to run only on the first N rows, for instance:

python analyze_data_structure.py ads.json.gz 1000000

This produces 2 output files (possibly named ads_first_N_lines when run on the first N rows only).
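For readers who want a feel for what a key structure analysis involves, here is a minimal sketch. It is not the code of analyze_data_structure.py, whose internals are not shown here; it assumes the goal is to count in how many documents each nested key path appears. The function names and the sample fields `id` and `page` are hypothetical (only `languages` is mentioned in this README).

```python
from collections import Counter


def iter_key_paths(value, prefix=""):
    """Yield a dotted key path for every leaf of a nested JSON value.

    List elements are marked with a "[]" suffix on the parent path.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from iter_key_paths(child, path)
    elif isinstance(value, list) and value:
        for child in value:
            yield from iter_key_paths(child, prefix + "[]")
    else:
        yield prefix


def count_key_paths(docs):
    """Count, for each key path, the number of documents containing it."""
    counts = Counter()
    for doc in docs:
        # A set ensures each path is counted once per document.
        counts.update(set(iter_key_paths(doc)))
    return counts


# Example usage with made-up documents:
#   docs = [{"id": "1", "languages": ["fr", "en"]}, {"id": "2", "page": {"name": "x"}}]
#   count_key_paths(docs).most_common()
```

Counting paths per document rather than per occurrence makes it easy to spot fields that are only present in a fraction of the ads.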

Convert JSON to CSV:

python convert_json_to_csv.py ads.json.gz | gzip > ads.csv.gz

Or add an extra integer argument to run only on the first N rows, for instance:

python convert_json_to_csv.py ads.json.gz 1000000 > ads_first_1000000_lines.csv
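The general idea behind such a conversion can be sketched as follows. This is an illustration only, not the logic of convert_json_to_csv.py: it assumes nested objects are flattened into dotted column names and lists are joined with `|`, and both `flatten` and `json_lines_to_csv` are hypothetical names. With Extended JSON input, wrapped values would end up in columns such as `_id.$oid`.

```python
import csv
import io
import json


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single level with dotted keys.

    Lists are joined with '|' (an assumption made for this sketch).
    """
    row = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))
        elif isinstance(value, list):
            row[name] = "|".join(str(item) for item in value)
        else:
            row[name] = value
    return row


def json_lines_to_csv(lines, out, fieldnames):
    """Write one CSV row per JSON line, keeping only the given columns."""
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(flatten(json.loads(line)))


# Example usage with made-up data:
#   buffer = io.StringIO()
#   json_lines_to_csv(['{"id": "1", "languages": ["fr"]}'], buffer, ["id", "languages"])
#   print(buffer.getvalue())
```

Fixing the column list up front lets the conversion stream row by row instead of loading every document to discover the headers.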

Filter French lines:

xan search -s languages "fr" ads.csv.gz | gzip > ads-fr.csv.gz
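If xan is not available, a rough Python equivalent of this filter could look like the sketch below. It assumes the CSV has a header row with a `languages` column (as used in the command above) and keeps rows where that column contains the given text; `filter_rows` and `filter_csv_gz` are hypothetical names and this is not meant to reproduce xan's exact matching behavior.

```python
import csv
import gzip


def filter_rows(rows, column, needle):
    """Yield the rows whose `column` value contains `needle`."""
    for row in rows:
        if needle in (row.get(column) or ""):
            yield row


def filter_csv_gz(src, dst, column="languages", needle="fr"):
    """Copy matching rows from one gzipped CSV file to another."""
    with gzip.open(src, "rt", encoding="utf-8", newline="") as fin, \
         gzip.open(dst, "wt", encoding="utf-8", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(filter_rows(reader, column, needle))


# Example usage (assumes ads.csv.gz exists in the current directory):
#   filter_csv_gz("ads.csv.gz", "ads-fr.csv.gz")
```

Note that this is a plain substring test, so any value containing the letters "fr" would match, not only an exact language code.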
