From fc3fb2eeb3b4b0c352f4dff622440b5c13100312 Mon Sep 17 00:00:00 2001
From: Daniel Paleka
Date: Mon, 16 Jan 2023 18:36:54 +0100
Subject: [PATCH] CItation (#271)

* Add citation

* code block
---
 README.md | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f5ab480..46cf8bb 100644
--- a/README.md
+++ b/README.md
@@ -463,11 +463,25 @@ It takes 3.7h to download 18M pictures
 downloading 2 parquet files of 18M items (result 936GB) took 7h24
 average of 1345 image/s
 
-## 190M benchmark
+### 190M benchmark
 
 downloading 190M images from the [crawling at home dataset](https://github.com/rom1504/cah-prepro) took 41h (result 5TB)
 average of 1280 image/s
 
-## 5B benchmark
+### 5B benchmark
 
 downloading 5.8B images from the [laion5B dataset](https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/) took 7 days (result 240TB), average of 9500 sample/s on 10 machines, [technical details](https://rom1504.medium.com/semantic-search-at-billions-scale-95f21695689a)
+
+
+
+## Citation
+```
+@misc{beaumont-2021-img2dataset,
+  author = {Romain Beaumont},
+  title = {img2dataset: Easily turn large sets of image urls to an image dataset},
+  year = {2021},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/rom1504/img2dataset}}
+}
+```