Merge pull request #1511 from DerekMelchin/bug-1500-object-store-data-source

Update Object Store streaming docs
DerekMelchin authored Oct 13, 2023
2 parents 3693a78 + afafcf5 commit 501637d
Showing 18 changed files with 53 additions and 29 deletions.
@@ -1,4 +1,7 @@
<p>The most common remote file providers to use are Dropbox, GitHub, and Google Sheets.</p>
<p>The most common file providers to use are the <a href='/docs/v2/writing-algorithms/object-store'>Object Store</a>, Dropbox, GitHub, and Google Sheets.</p>

<h4>Object Store</h4>
<p>The Object Store is the fastest file provider. If you import files from remote providers, you will be restricted by their rate limits and your download speed.</p>

<h4>Dropbox</h4>
<p>If you store your custom data in Dropbox, you need to create a link to the file and add <code>?dl=1</code> to the end of the file URL. To create file links, see <a href='https://help.dropbox.com/files-folders/share/share-file-or-folder' rel='nofollow' target='_blank'>How to share files or folders</a> in the Dropbox documentation. The <code>?dl=1</code> parameter lets you download the direct file link, not the HTML page of the file.</p>
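<p>As a minimal sketch of the <code>?dl=1</code> step, the following helper rewrites a share link so that its <code>dl</code> query parameter is <code>1</code>. The function name <code>to_direct_download_url</code> is hypothetical and not part of any QuantConnect or Dropbox API. It assumes the link is an ordinary URL and keeps any other query parameters untouched.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download_url(share_url: str) -> str:
    """Return share_url with its dl query parameter set to 1 (hypothetical helper)."""
    parts = urlsplit(share_url)
    # Keep every other query parameter, but drop an existing dl value such as dl=0.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(to_direct_download_url("https://www.dropbox.com/s/rsmg44jr6wexn2h/CNXNIFTY.csv?dl=0"))
```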
@@ -1,4 +1,6 @@
<p>The following table shows the number of files you can download during a single backtest or Research Environment session in QuantConnect Cloud:</p>
<p>There are no limits to the number of files you can load from the <a href='/docs/v2/research-environment/object-store'>Object Store</a> during a single backtest or Research Environment session in QuantConnect Cloud.</p>

<p>The following table shows the number of remote files you can download during a single backtest or Research Environment session in QuantConnect Cloud:</p>

<table class="qc-table table" id='file-quota-table'>
<thead>
@@ -38,6 +40,4 @@
}
</style>

<p>Each file can be up to 200 MB in size and have a file name up to 200 characters long.</p>

<p>If you need to import more files than your quota allows, save your custom data files in the <a href='/docs/v2/research-environment/object-store'>Object Store</a> and load them from there.</p>
<p>Remote files can be up to 200 MB in size and can have names up to 200 characters long.</p>
@@ -1 +1 @@
<p>We do not impose a rate limit on file downloads but often external providers do. Dropbox caps download speeds to 10 kb/s after 3-4 download requests. To ensure your algorithms run fast, only use a small number of small custom data files.</p>
<p>We do not impose a rate limit on file downloads but often external providers do. Dropbox caps download speeds to 10 kb/s after 3-4 download requests. To ensure your algorithms run fast, only use a small number of small custom data files or use the Object Store.</p>
@@ -1 +1 @@
<p>There are two techniques to import data into your algorithm. You can either manually import the entire file or stream the file line-by-line into your algorithm's <code>OnData</code> event. This page explores streaming a file's contents into your algorithm line-by-line.</p>
<p>There are two techniques to import data into your algorithm. You can either manually import the entire file or stream the file line-by-line into your algorithm's <code>OnData</code> event. This page explores streaming a file's contents into your algorithm line-by-line. The data you import can be from a remote server or the <a href='/docs/v2/writing-algorithms/object-store'>Object Store</a>.</p>
@@ -1,3 +1,3 @@
<p>Common data formats are <span class="public-file-name">CSV</span>, <span class="public-file-name">JSON</span>, and <span class="public-file-name">XML</span>, but you can use any file type that can be read over the internet. Each request has a one-second overhead, so you should package your custom data to minimize requests. Bundle dates together where possible to speed up execution. Just ensure the data in the file is in chronological order.</p>
<p>Common data formats are <span class="public-file-name">CSV</span>, <span class="public-file-name">JSON</span>, and <span class="public-file-name">XML</span>, but you can use any file type that can be read over the internet. For Excel files, double-check the raw data format before you parse it in the data reader, because the Excel application view formats values for convenient display. To avoid confusion about the data format, save the spreadsheet as a <span class="public-file-name">CSV</span> file and open it in a text editor to confirm the raw data format.</p>

<p>For Excel files, please double check the raw data format for parsing in the data reader, since data will be formatted for convenient visualization in Excel application view. To avoid confusion of data format, save the spreadsheet as a <span class="public-file-name">CSV</span> file and open it in a text editor to confirm the raw data format.</p>
<p>The data in the file must be in chronological order. If you import from a remote <a href=''>file provider</a>, each request has a one-second overhead, so package your custom data to minimize requests. Bundle dates together where possible to speed up execution. The Object Store file provider gives you the fastest execution because you don't need to download the files on every execution.</p>
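<p>One way to bundle several small files into a single chronologically ordered file is sketched below. The helper <code>bundle_rows</code> is hypothetical, and it assumes header-less CSV rows whose first column is a date in <code>%Y-%m-%d</code> format. Adjust the column index and format to match your own data.</p>

```python
import csv
import io
from datetime import datetime


def bundle_rows(csv_texts, date_column=0, date_format="%Y-%m-%d"):
    """Merge several header-less CSV strings into one string sorted by date."""
    rows = []
    for text in csv_texts:
        # Skip blank lines, which csv.reader yields as empty lists.
        rows.extend(row for row in csv.reader(io.StringIO(text)) if row)
    # Sort on the parsed date so the bundled file is in chronological order.
    rows.sort(key=lambda row: datetime.strptime(row[date_column], date_format))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


bundled = bundle_rows(["2023-01-03,101.5\n2023-01-02,100.0\n", "2023-01-04,102.25\n"])
print(bundled)
```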
@@ -0,0 +1 @@
<p>In live trading, we pass custom data to your algorithm as soon as it arrives. The time it arrives may not align with the time of other <a href='https://www.quantconnect.com/docs/v2/writing-algorithms/key-concepts/time-modeling/timeslices'>slices</a>. Design your algorithm to handle unsynchronized data so that you don't run into issues.</p>
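<p>One possible way to handle data that arrives out of step with other slices is to remember the latest value of each source together with its arrival time, and to ignore values that are too old when you make a decision. The class below is a sketch under that assumption. <code>LatestValueCache</code>, <code>update</code>, and <code>get_fresh</code> are hypothetical names, not LEAN APIs. In an algorithm, you could call <code>update</code> from <code>OnData</code> whenever a custom data point arrives.</p>

```python
from datetime import datetime, timedelta


class LatestValueCache:
    """Remember the most recent value of each data source and when it arrived."""

    def __init__(self, max_age: timedelta):
        self.max_age = max_age
        self._latest = {}  # source name -> (arrival time, value)

    def update(self, source: str, time: datetime, value: float) -> None:
        self._latest[source] = (time, value)

    def get_fresh(self, source: str, now: datetime):
        """Return the latest value, or None if it is missing or older than max_age."""
        entry = self._latest.get(source)
        if entry is None:
            return None
        time, value = entry
        return value if now - time <= self.max_age else None


cache = LatestValueCache(max_age=timedelta(minutes=5))
cache.update("custom", datetime(2023, 10, 13, 9, 31, 7), 42.0)
print(cache.get_fresh("custom", datetime(2023, 10, 13, 9, 32, 0)))
print(cache.get_fresh("custom", datetime(2023, 10, 13, 9, 40, 0)))
```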
@@ -53,6 +53,9 @@
        config: SubscriptionDataConfig,
        date: datetime,
        isLive: bool) -&gt; SubscriptionDataSource:

        if not isLive:
            return SubscriptionDataSource("&lt;custom_data_key&gt;", SubscriptionTransportMedium.ObjectStore, FileFormat.Csv)
        return SubscriptionDataSource("https://www.dropbox.com/s/rsmg44jr6wexn2h/CNXNIFTY.csv?dl=1", SubscriptionTransportMedium.RemoteFile)

    def Reader(self,
@@ -60,6 +60,8 @@
        config: SubscriptionDataConfig,
        date: datetime,
        isLive: bool) -&gt; SubscriptionDataSource:
        if not isLive:
            return SubscriptionDataSource("&lt;custom_data_key&gt;", SubscriptionTransportMedium.ObjectStore, FileFormat.Csv)
        return SubscriptionDataSource("https://raw.githubusercontent.com/QuantConnect/Documentation/master/Resources/datasets/custom-data/unfolding-collection-example.json", SubscriptionTransportMedium.RemoteFile, FileFormat.UnfoldingCollection)

    def Reader(self,
@@ -50,6 +50,8 @@
        config: SubscriptionDataConfig,
        date: datetime,
        isLive: bool) -&gt; SubscriptionDataSource:
        if not isLive:
            return SubscriptionDataSource("&lt;custom_data_key&gt;", SubscriptionTransportMedium.ObjectStore, FileFormat.Csv)
        return SubscriptionDataSource("https://www.dropbox.com/s/7xe7lfac52mdfpe/custom-universe.json?dl=1",
            SubscriptionTransportMedium.RemoteFile,
            FileFormat.UnfoldingCollection)
@@ -1 +1,3 @@
<p>There are two techniques to import data into your algorithm. You can either manually import the entire file or stream the file line-by-line into your algorithm's <code>OnData</code> event. This page explores importing an entire file for manual use.</p>

<p>Instead of downloading the file from a remote file provider, you can upload the file to the Object Store (with the <a href='/docs/v2/cloud-platform/organizations/object-store#03-Upload-Files'>Algorithm Lab</a> or with the <a href='/docs/v2/lean-cli/object-store'>CLI</a>) for faster execution.</p>
@@ -1 +1 @@
<?php echo file_get_contents(DOCS_RESOURCES."/datasets/custom-data/download-use-cases.html"); ?>
<? include(DOCS_RESOURCES."/datasets/custom-data/download-use-cases.html"); ?>
@@ -0,0 +1 @@
<p>When you download a remote file, save it into the Object Store so that you don't have to download the file again. If you need to import the file multiple times, it's faster to import it from the Object Store rather than repeatedly downloading the file from the remote file provider.</p>
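<p>The download-once pattern described above can be sketched as follows. This is an illustration, not the LEAN implementation: <code>load_with_cache</code> is a hypothetical helper, <code>store</code> is a plain dictionary standing in for the Object Store, and <code>download</code> is any callable standing in for the algorithm's <code>Download</code> method. In a real algorithm, you would presumably replace the dictionary operations with a key-existence check and the Object Store <code>Read</code> and <code>Save</code> methods.</p>

```python
def load_with_cache(key, url, store, download):
    """Return the contents stored under key, downloading from url only on a cache miss."""
    if key in store:
        # Cache hit: skip the remote provider entirely.
        return store[key]
    content = download(url)
    store[key] = content  # Save the content so later runs don't download again.
    return content


calls = []


def fake_download(url):
    # Stand-in for a real download; records each request so we can count them.
    calls.append(url)
    return "2023-01-02,100.0\n"


store = {}
first = load_with_cache("12345/data.csv", "https://example.com/data.csv", store, fake_download)
second = load_with_cache("12345/data.csv", "https://example.com/data.csv", store, fake_download)
print(len(calls))
```

<p>The key in the usage example is prefixed with a made-up project ID, following the convention of prefixing Object Store keys with your project ID.</p>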
@@ -1,7 +1,7 @@
<p>Follow these steps to transport binary files:</p>

<ol>
<li>Add the following imports to your program:</li>
<li>Add the following imports to your local program:</li>
<div class="section-example-container">
<pre class="python">import pickle
import base64</pre>
@@ -13,9 +13,12 @@
base64_str = base64.b64encode(pickle_bytes).decode('ascii')</pre>
</div>

<li>Save the string representation of your object into the <a href='/docs/v2/writing-algorithms/object-store'>Object Store</a> or one of the <a href='/docs/v2/writing-algorithms/importing-data/key-concepts#04-Remote-File-Providers'>supported external sources</a>.</li>
<li>Save the string representation of your object to one of the <a href='/docs/v2/writing-algorithms/importing-data/key-concepts#04-Remote-File-Providers'>supported external sources</a>.</li>

<li>Load the string representation of your object into your trading algorithm.</li>
<li><a href='/docs/v2/writing-algorithms/importing-data/bulk-downloads#03-Download-Files'>Download the remote file</a> into your project.</li>
<div class="section-example-container">
<pre class="python">base64_str = self.Download("&lt;fileURL&gt;")</pre>
</div>

<li>Restore the object.</li>
<div class="section-example-container">
1 change: 1 addition & 0 deletions Resources/datasets/custom-data/download-use-cases.html
@@ -1,5 +1,6 @@
<p>The batch import technique is outside of LEAN's awareness or control, so LEAN can't enforce good practices. However, the batch import technique is good for loading the following datasets:</p>
<ul>
<li>Loading data into the Object Store</li>
<li>Trained AI Models</li>
<li>Well-defined historical price datasets</li>
<li>Parameters and setting imports such as <code>Symbol</code> lists</li>
18 changes: 9 additions & 9 deletions Resources/object-store/read-data.php
@@ -27,15 +27,6 @@
</div>


<h4>Bytes</h4>

<p>To read a <code>Bytes</code> object, call the <code>ReadBytes</code> method.</p>

<div class='section-example-container'>
<pre class='csharp'>var bytesData = <?=$cSharpPrefix?>ObjectStore.ReadBytes($"{<?=$cSharpPrefix?>ProjectId}/bytesKey");</pre>
<pre class='python'>byte_data = <?=$pythonPrefix?>ObjectStore.ReadBytes(f"{<?=$pythonPrefix?>ProjectId}/bytes_key")</pre>
</div>

<h4>Strings</h4>

<p>To read a <code>string</code> object, call the <code>Read</code> or <code>ReadString</code> method.</p>
@@ -60,4 +51,13 @@
<p class='csharp'>If you created the XML object from a dictionary, reconstruct the dictionary.</p>
<div class='csharp section-example-container'>
<pre class='csharp'>var dict = xmlData.Elements().ToDictionary(x => x.Name.LocalName, x => int.Parse(x.Value));</pre>
</div>

<h4>Bytes</h4>

<p>To read a <code>Bytes</code> object, call the <code>ReadBytes</code> method.</p>

<div class='section-example-container'>
<pre class='csharp'>var bytesData = <?=$cSharpPrefix?>ObjectStore.ReadBytes($"{<?=$cSharpPrefix?>ProjectId}/bytesKey");</pre>
<pre class='python'>byte_data = <?=$pythonPrefix?>ObjectStore.ReadBytes(f"{<?=$pythonPrefix?>ProjectId}/bytes_key")</pre>
</div>
20 changes: 13 additions & 7 deletions Resources/object-store/save-data.php
@@ -22,13 +22,6 @@
<?=$writingAlgorithms ? "To store data, you need to provide a key. If you provide a key that is already in the Object Store, it will overwrite the data at that location. To avoid overwriting objects from other projects in your organization, prefix the key with your project ID. You can find the project ID in the URL of your browser when you open a project. For example, the ID of the project at <span class='public-file-name'>quantconnect.com/project/12345</span> is 12345." : ""?>
</p>

<h4>Bytes</h4>
<p>To save a <code>Bytes</code> object, call the <code>SaveBytes</code> method.</p>
<div class='section-example-container'>
<pre class='csharp'>var saveSuccessful = <?=$cSharpPrefix?>ObjectStore.SaveBytes($"{<?=$cSharpPrefix?>ProjectId}/bytesKey", bytesSample)</pre>
<pre class='python'>save_successful = <?=$pythonPrefix?>ObjectStore.SaveBytes(f"{<?=$pythonPrefix?>ProjectId}/bytes_key", bytes_sample)</pre>
</div>

<h4>Strings</h4>
<p>To save a <code>string</code> object, call the <code>Save</code> or <code>SaveString</code> method.</p>
<div class='section-example-container'>
@@ -47,3 +40,16 @@
<div class='csharp section-example-container'>
<pre class='csharp'>var saveSuccessful = <?=$cSharpPrefix?>ObjectStore.SaveXml&lt;XElement&gt;($"{<?=$cSharpPrefix?>ProjectId}/xmlKey", xmlSample);</pre>
</div>

<h4>Bytes</h4>
<p>To save a <code>Bytes</code> object (for example, zipped data), call the <code>SaveBytes</code> method.</p>
<div class='section-example-container'>
<pre class='csharp'>var saveSuccessful = <?=$cSharpPrefix?>ObjectStore.SaveBytes($"{<?=$cSharpPrefix?>ProjectId}/bytesKey", bytesSample);

var zippedDataSample = Compression.ZipBytes(Encoding.UTF8.GetBytes(stringSample), "data");
var zipSaveSuccessful = <?=$cSharpPrefix?>ObjectStore.SaveBytes($"{<?=$cSharpPrefix?>ProjectId}/bytesKey.zip", zippedDataSample);</pre>
<pre class='python'>save_successful = <?=$pythonPrefix?>ObjectStore.SaveBytes(f"{<?=$pythonPrefix?>ProjectId}/bytes_key", bytes_sample)

zipped_data_sample = Compression.ZipBytes(bytes(string_sample, "utf-8"), "data")
zip_save_successful = <?=$pythonPrefix?>ObjectStore.SaveBytes(f"{<?=$pythonPrefix?>ProjectId}/bytesKey.zip", zipped_data_sample)</pre>
</div>
