Data\Implementing BaseData
All data that can be added and used in LEAN derives from the BaseData
class with the exception
of static files that can be downloaded using the Download
method in your algorithm.
BaseData
was created with the intention of being able to generically support any
recurring time-series data, including custom user data that is not provided by QuantConnect.
Compared to static file downloads using Download
, implementing BaseData
should be considered when you
are dealing with the following.
- Repeating data that spans across multiple days
- Custom data must be simulated in a backtest similar to price data
- You plan on associating the custom data to equity or option tickers
To get started, follow these guidelines when considering implementing a new data source:
- You understand the data you plan on implementing
- Data has no look-ahead bias
- Data is in a single line
- If you cannot represent your data in a single line because of its shape (such as deeply nested objects), you can try representing the data in JSON. Note that the JSON must be contained in one line with no line breaks so that it is read completely into the Reader method for parsing.
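As a minimal sketch of the single-line requirement, the snippet below flattens a nested record into one line of JSON using Python's standard `json` module. The record and the helper name `to_single_line_json` are hypothetical and only serve as an illustration.

```python
import json

def to_single_line_json(record: dict) -> str:
    """Serialize a (possibly nested) record as compact JSON on a single line."""
    # json.dumps emits no line breaks unless `indent` is given, and newlines
    # inside string values are escaped, so the result is one physical line.
    return json.dumps(record, separators=(",", ":"))

# Hypothetical nested record, for illustration only
record = {
    "time": "20190101 00:00:00",
    "sentiment": {"bull": 0.5, "bear": 0.5},
    "note": "multi\nline text",
}
line = to_single_line_json(record)
```

Each record written this way occupies exactly one line of the file, so it can be parsed back with `json.loads` in a single call.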
If you are using a static file, for example a machine learning model hosted on Dropbox, you can use the Download
method in your
algorithm to download the data. An example is provided below.
public class MyCustomAlgorithm : QCAlgorithm {
public override void Initialize() {
var myCustomModel = Download("https://<YOUR_SITE_GOES_HERE>/<FILE>");
// Deserialize `myCustomModel` and load into a framework
}
}
class MyCustomAlgorithm(QCAlgorithm):
    def Initialize(self):
        myCustomModel = self.Download("https://<YOUR_SITE_GOES_HERE>/<FILE>")
        # Unpickle `myCustomModel` and load into a framework
If you are using a remote data source for your algorithm such as data hosted on an API, you can use the following template to get started.
using NodaTime;
using QuantConnect;
using QuantConnect.Data;
using QuantConnect.Util;
using System;
using System.Collections.Generic;
using System.IO;
namespace QuantConnect.Algorithm.CSharp {
// Algorithm implementation goes here
// ...
// ...
// ...
public class MyCustomDataSource : BaseData {
// define your values here. Examples: BullSentiment, BearSentiment
// Instructs LEAN to look for data at the given URL or location on disk.
public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
return new SubscriptionDataSource(
"<LOCATION>", // Location of the data. Can be a path or URL.
SubscriptionTransportMedium.<SOURCE>, // Specifies where to read the data from the source
FileFormat.<FORMAT> // Specifies how to read the file
);
}
// This will have to be implemented by you since almost all data sources differ in the way we parse them.
public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
// Your implementation goes here
}
// We include the Clone method override to ensure that our data implementation
// is robust. Including the Clone method makes the custom data implementation
// more durable against failure.
public override BaseData Clone() {
return new MyCustomDataSource {
// Don't forget to copy these two properties over
EndTime = EndTime,
Symbol = Symbol,
// Copy your fields here
};
}
// Specifies the time zone for this data source. This is useful for custom data types
public override DateTimeZone DataTimeZone()
{
// Select the time zone of your data here.
// Defaults to New York timezone if no implementation is provided
// Example:
return TimeZones.Utc;
}
// Indicates whether this data type is linked to an underlying equity Symbol.
public override bool RequiresMapping() {
// If your data has a relationship with equities via their tickers,
// you should set this to true.
// Example: SEC filings are based on stock tickers, so it should be `true`
// Example: sentiment data tickers are related to stock tickers, so it should be `true`.
// Example: Federal Reserve data tickers are not related to stock tickers, so it should be `false`.
// Example: Weather data has no relationship to equities, so it should be `false`
// Setting to `false` for example purposes
return false;
}
// Sets the default resolution of this data source.
public override Resolution DefaultResolution() {
// Setting to `Resolution.Minute` for example purposes
return Resolution.Minute;
}
// Sets the supported resolutions for this data source.
public override List<Resolution> SupportedResolutions() {
// Setting to all resolutions for example purposes
return AllResolutions;
}
}
}
from datetime import datetime
from QuantConnect import *
from QuantConnect.Data import *
from QuantConnect.Python import PythonData
# Algorithm implementation goes here
# ...
# ...
# ...
class MyCustomDataSource(PythonData):
    def GetSource(self, config, date, isLiveMode):
        '''
        Instructs LEAN to look for data at the given URL or location on disk.
        '''
        return SubscriptionDataSource(
            "<LOCATION>", # Location of the data. Can be a path or URL.
            SubscriptionTransportMedium.<SOURCE>, # Specifies where to read the data from the source
            FileFormat.<FORMAT> # Specifies how to read the file
        )
    def Reader(self, config, line, date, isLiveMode):
        '''
        This will have to be implemented by you since almost all data sources differ in the way we parse them.
        '''
        # Here we parse the custom fields that we've implemented. Define your values here. Example:
        # instance["BullSentiment"] = ...
        # instance["BearSentiment"] = ...
        pass
    # Python doesn't require the implementation of the `Clone` method.
    # It is important that you do not override the `Clone` method from Python.
    def DataTimeZone(self):
        '''
        Select the time zone of your data here.
        Defaults to New York timezone if no implementation is provided.
        '''
        # Setting to "UTC" for example purposes
        return TimeZones.Utc
    def RequiresMapping(self):
        '''
        Indicates whether this data type is linked to an underlying equity Symbol.
        '''
        # If your data has a relationship with equities via their tickers,
        # you should set this to True.
        # Example: SEC filings are based on stock tickers, so it should be `True`
        # Example: sentiment data tickers are related to stock tickers, so it should be `True`.
        # Example: Federal Reserve data tickers are not related to stock tickers, so it should be `False`.
        # Example: Weather data has no relationship to equities, so it should be `False`
        # Setting to `False` for example purposes
        return False
    def DefaultResolution(self):
        '''
        Sets the default resolution of this data source.
        '''
        # Setting to `Resolution.Minute` for example purposes
        return Resolution.Minute
    def SupportedResolutions(self):
        '''
        Sets the supported resolutions for this data source.
        '''
        # Setting to all resolutions for example purposes
        return self.AllResolutions
BaseData is the base class that defines the structure used to represent data in LEAN. All data, including equities, forex, crypto, futures, options, and CFDs, uses an implementation of BaseData to transmit data to the user building or running an algorithm. Because you cannot use BaseData directly, you must provide an implementation of it in order to load your data into LEAN. You can view the fields and properties of BaseData by clicking on the table below:
TABLE GOES HERE
BaseData includes overridable properties as well as some default properties left for convenience. The most used and important properties of BaseData are:
- Value: decimal - Data value at the given EndTime. Useful when you have a single series of data rather than a DataFrame-like structure of data.
- Time: DateTime - Beginning time of the data point. By default, this is also the time the data was emitted, because EndTime returns Time unless it is overridden.
- EndTime: DateTime - Explains when the data was emitted. By default, this returns the value of Time. This property is overridable in case we need to express a starting time with Time and the emit time with EndTime separately.
- Symbol: Symbol - Associates the data with a Symbol object. An example would be AAPL sentiment data. The Symbol object would be representative of an equity such as AAPL, FB, etc., depending on which assets you've subscribed to.
The rest of the BaseData properties are left for internal/specific use cases only.
BaseData contains a handful of virtual methods, which means that these methods are overridable.
The most important virtual methods provided in BaseData are:
- GetSource - Tells LEAN where to locate your data
- Reader - Creates a new instance of your BaseData implementation using data from disk. This is responsible for parsing data and converting it into a usable representation for your algorithm in LEAN.
- Clone - Creates a clone of the data (deep copy)
- RequiresMapping - Tells LEAN if rename events apply to this data source. Defaults to true if the Symbol SecurityType is equity or option
- DataTimeZone - Tells LEAN what timezone this data source is in. Defaults to New York time.
- IsSparseData - Tells LEAN whether to log for missing files if the data source is sparse (i.e. data missing between data points). Defaults to true if the data source is custom data.
- ToString - Converts the instance to a string. Defaults to Symbol: Value (e.g. AAPL: 0.5)
The GetSource method instructs LEAN on where to locate your data and which medium the data is in via the SubscriptionDataSource class.
GetSource has three parameters passed to it: config, date, and isLiveMode.
- The config parameter contains all of the data associated with an added equity, forex, crypto, future, option, CFD, or custom data, such as the SecurityType, Resolution, Market, and Symbol. It is included to help you locate the data you want depending on the configuration you provided when the data was initially added to the algorithm. Its type is SubscriptionDataConfig.
- The date parameter is the date the engine is requesting data for. It is included to help you determine which date to load data for. Its type is DateTime.
- The isLiveMode parameter indicates whether the algorithm is running live. It lets you choose a different source for live data than for backtesting, in case the two differ. Its type is bool.
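To illustrate how the date and isLiveMode parameters can drive the choice of location, here is a small, LEAN-independent sketch. The URL, the folder layout, and the helper name `choose_location` are assumptions made for this example; a real GetSource would wrap the resulting string in a SubscriptionDataSource.

```python
from datetime import datetime

# Hypothetical locations, for illustration only
LIVE_URL = "https://example.com/live/sentiment.csv"
BACKTEST_ROOT = "Data/my_custom_data"

def choose_location(ticker: str, date: datetime, is_live_mode: bool) -> str:
    """Return the location string a GetSource implementation could use."""
    if is_live_mode:
        # Live trading reads from a remote endpoint in this sketch
        return LIVE_URL
    # Backtests read one file per ticker per day,
    # e.g. Data/my_custom_data/aapl/20180101.csv
    return f"{BACKTEST_ROOT}/{ticker.lower()}/{date:%Y%m%d}.csv"
```

For example, `choose_location("AAPL", datetime(2018, 1, 1), False)` yields a dated path on disk, while passing `True` yields the live URL.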
You can view SubscriptionDataConfig
fields/properties in the table below.
TABLE GOES HERE
To return a SubscriptionDataSource to LEAN, we must first specify the FileFormat and SubscriptionTransportMedium. An explanation of the two types is provided below.
- SubscriptionTransportMedium - describes where the data is stored (e.g. local disk, remote). Tells LEAN where and how to find the data
- FileFormat - describes the format the file is in (e.g. CSV). Tells LEAN how to pump the data to Reader
The Reader method is the main place where all the parsing of the data takes place. In this method, we convert the raw data from the source into a class usable by LEAN.
Reader has four parameters passed to it: config, line, date, and isLiveMode.
If FileFormat.Csv was selected, the data is split into lines and each line is passed individually into Reader. This means that your Reader method is called once per line, n times for a file containing n lines. It is important that your Reader method is robust and your data is clean to prevent any errors from occurring.
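The line-by-line behavior described above can be pictured with the following simplified sketch. It is not LEAN's implementation: `read_all` and `parse_line` are hypothetical names, and the loop only mimics the idea that a reader is invoked once per line and that lines it rejects produce no data point.

```python
from typing import Callable, List, Optional

def read_all(contents: str, reader: Callable[[str], Optional[dict]]) -> List[dict]:
    """Call `reader` once per non-empty line and keep the points it returns."""
    points = []
    for line in contents.splitlines():
        if not line.strip():
            continue  # skip blank lines
        point = reader(line)
        if point is not None:
            points.append(point)
    return points

def parse_line(line: str) -> Optional[dict]:
    """Parse `TIME,BullSentiment,BearSentiment`; return None for anything else."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[0][:1].isdigit():
        return None  # e.g. a header row
    return {"time": parts[0], "bull": float(parts[1]), "bear": float(parts[2])}

contents = (
    "TIME,BullSentiment,BearSentiment\n"
    "20190101 00:00:00,0.5,0.5\n"
    "\n"
    "20190102 00:00:00,0.7,0.3\n"
)
points = read_all(contents, parse_line)
```

Here the header row and the blank line are skipped, so only the two data rows end up in `points`.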
The various parameters passed to Reader are:
- The config parameter contains all of the data associated with an added equity, forex, crypto, future, option, CFD, or custom data, such as the SecurityType, Resolution, Market, and Symbol. It is included to help you locate the data you want depending on the configuration you provided when the data was initially added to the algorithm. Its type is SubscriptionDataConfig.
- The line parameter contains a line of data from the data source specified in GetSource. This data is provided to you so that you can parse it. Its type is string.
- The date parameter is the date the engine is requesting data for. It is included to inform you which time the data is being requested for. Its type is DateTime.
- The isLiveMode parameter indicates whether the algorithm is running live. It lets you parse the data differently when trading live, since the data might come from another source. Its type is bool.
You can view SubscriptionDataConfig
fields/properties in the table below.
TABLE GOES HERE
An example of a Reader
implementation that parses CSV data is provided below:
public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
// Assuming our CSV is as follows:
// TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment
var csv = line.Split(',');
// Since MyCustomDataSource derives from BaseData, it is valid as a return type
return new MyCustomDataSource {
// This is the emit time, i.e. the time that the algorithm will output the event.
// Ensure you have this value set
EndTime = Parse.DateTimeExact(csv[0], "yyyyMMdd HH:mm:ss"),
// This is the Symbol associated with the data. Usually should be set to `config.Symbol`
// Ensure you have this value set.
Symbol = config.Symbol,
// Here we parse the custom fields that we've implemented.
BullSentiment = Parse.Decimal(csv[1]),
BearSentiment = Parse.Decimal(csv[2])
};
}
def Reader(self, config, line, date, isLiveMode):
    '''
    This will have to be implemented by you since almost all data sources differ in the way we parse them.
    Below we've provided an example showing how to correctly and idiomatically parse the data
    '''
    # Assuming our CSV is as follows:
    # TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment
    csv = line.split(",")
    # We must create an instance first to add our own custom data
    instance = MyCustomDataSource()
    # This is the emit time, i.e. the time that the algorithm will output the event.
    # Ensure you have this value set.
    instance.EndTime = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")
    # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
    # Ensure you have this value set.
    instance.Symbol = config.Symbol
    # Here we parse the custom fields that we've implemented. Define your values here
    instance["BullSentiment"] = float(csv[1])
    instance["BearSentiment"] = float(csv[2])
    return instance
An example of a Reader
implementation that parses JSON data is provided below.
Please note that the JSON data must not contain any new lines/line breaks (i.e. data must be in a single line).
using Newtonsoft.Json;
using QuantConnect.Data;
using QuantConnect.Util;
using System;
namespace QuantConnect.Algorithm.CSharp {
public class MyCustomDataSource : BaseData {
[JsonProperty("bull_sentiment")]
public decimal BullSentiment { get; set; }
[JsonProperty("bear_sentiment")]
public decimal BearSentiment { get; set; }
// Because we have a format that Json.NET can't parse, we need to define
// the format of the date. You can use the `DateTimeJsonConverter` class
// to define the format of the date easily.
[JsonProperty("time"), JsonConverter(typeof(DateTimeJsonConverter), "yyyyMMdd HH:mm:ss")]
public override DateTime EndTime { get; set; }
public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
// Assuming our JSON data is as follows:
// {"time": "20190101 00:00:00", "bull_sentiment": 0.5, "bear_sentiment": 0.5}
// If we're going to be parsing JSON, use Newtonsoft's JSON parser and decorate your types with `JsonProperty`
// Use the current class name (MyCustomDataSource) as the type parameter.
// Replace the type parameter with the name of your class.
var instance = JsonConvert.DeserializeObject<MyCustomDataSource>(line);
// This is the Symbol associated with the data. Usually should be set to `config.Symbol`
// Ensure you have this value set.
instance.Symbol = config.Symbol;
return instance;
}
}
}
import json
from datetime import datetime
class MyCustomDataSource(PythonData):
    def Reader(self, config, line, date, isLiveMode):
        # Assuming our JSON data is as follows:
        # {"time": "20190101 00:00:00", "bull_sentiment": 0.5, "bear_sentiment": 0.5}
        data = json.loads(line)
        instance = MyCustomDataSource()
        # Parse the time and set EndTime equal to it.
        # Ensure you have this value set.
        # The key must match the JSON above, which uses "time".
        instance.EndTime = datetime.strptime(data["time"], "%Y%m%d %H:%M:%S")
        # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
        # Ensure you have this value set.
        instance.Symbol = config.Symbol
        instance["BullSentiment"] = data["bull_sentiment"]
        instance["BearSentiment"] = data["bear_sentiment"]
        return instance
To ensure we have a robust BaseData type, we must implement the Clone
method (only in C#).
The Clone
method is called by the LEAN engine to create a copy of the data so that the original
data is not altered.
In addition, we guarantee a higher degree of robustness to your custom data source if you implement this method. More information can be found under the Debugging::Why is my data all null? section.
To implement the Clone method, copy over all the properties found in your data. The only property that is modified is the Time property, and because it is a DateTime (a value type), a full copy is guaranteed to occur.
In Python, do not override Clone
.
An example of the Clone
method implementation is provided below.
// We include the Clone method override to ensure that our data implementation
// is robust. Including the Clone method makes it more durable against failure.
public override BaseData Clone() {
return new MyCustomDataSource {
// Don't forget to copy these two properties over
Time = Time,
Symbol = Symbol,
BullSentiment = BullSentiment,
BearSentiment = BearSentiment
};
}
The DataTimeZone method informs LEAN what time zone this data source is in. To set a time zone, use the QuantConnect.TimeZones
static class to select the appropriate time zone for your data source. This ensures that the time is assigned the proper timezone so that it can be emitted at the right time.
An example implementing this method is provided below.
// Specifies the time zone for this data source. This is useful for custom data types
public override DateTimeZone DataTimeZone()
{
// Select the time zone of your data here
return TimeZones.Utc;
}
import QuantConnect
def DataTimeZone(self):
    return QuantConnect.TimeZones.Utc
This method will default to TimeZones.NewYork
if no implementation is provided.
The RequiresMapping method informs LEAN that the data source has a relationship with equity Symbols.
A few items to determine if you should enable RequiresMapping
are provided below.
- The custom data source you are implementing is for equities or options.
- The custom data source you are implementing uses the same Symbols/tickers as equities (e.g. AAPL for equities and AAPL for custom data)
If you checked both of these boxes, you should set RequiresMapping
to true. Otherwise, set it to false as it has
no relationship to equities or options.
An example implementation of this method is provided below:
// Indicates whether this data type is linked to an underlying Symbol.
public override bool RequiresMapping() {
return true;
}
# Indicates whether this data type is linked to an underlying Symbol
def RequiresMapping(self):
    return True
Note that this method will default to true
if the underlying Symbol is an Equity or Option and no implementation is provided.
Someday, you might receive malformed data from the data vendor due to a glitch or changes in the data spec. However, in our current implementation, if our parsing fails, the exception will be unhandled and cause the algorithm to terminate prematurely.
To deal with this issue, we can return null (C#) or None (Python) to indicate that the value
failed to parse properly. This gives us a way to handle the error gracefully, which can be critical for live trading uptime.
An example of this pattern is present in the CBOE custom data implementation:
public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
// Return null if we don't have a valid date for the first entry
if (!char.IsNumber(line.FirstOrDefault())) {
return null;
}
// ...
}
def Reader(self, config, line, date, isLiveMode):
    # Return None if we don't have a valid date for the first entry
    if not line or not line[0].isdigit():
        return None
    # ...
A recurring pattern in the implementation of Reader is the inclusion of exception handling. In this way, we are guaranteed
to only receive valid data from the Reader method. However, it has the potential to suppress or obscure errors, which would result in
fewer data points being emitted.
If you plan on implementing exception handling, we recommend logging an error with the exception so that you can review it at a later time.
Below are a few reasons you may or may not want to handle exceptions.
Reasons to handle exceptions:
- Provide fault-tolerant behavior in your algorithm
- You want to custom tailor the behavior of Reader when an error is encountered
- You want to provide redundant behavior inside Reader
Reasons to not handle exceptions:
- Enforce consistency and correctness in your data
- Stop execution immediately if invalid data is encountered
- Performance (only applies if the catch/except block is fired periodically)
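The trade-off above can be sketched without LEAN as follows. The function `parse_sentiment` and the logger name are hypothetical. The sketch catches only the errors expected from malformed input, logs them for later review, and returns None so that the bad line is skipped instead of terminating the run.

```python
import logging
from datetime import datetime
from typing import Optional

logger = logging.getLogger("custom_data")

def parse_sentiment(line: str) -> Optional[dict]:
    """Parse `TIME,BullSentiment,BearSentiment`; log and return None on failure."""
    try:
        csv = line.split(",")
        return {
            "end_time": datetime.strptime(csv[0], "%Y%m%d %H:%M:%S"),
            "bull": float(csv[1]),
            "bear": float(csv[2]),
        }
    except (ValueError, IndexError) as error:
        # Log the failure so that suppressed errors remain visible
        logger.error("Failed to parse line %r: %s", line, error)
        return None
```

Catching a narrow set of exception types keeps unexpected failures visible; removing the `try` block altogether gives the fail-fast behavior listed under the reasons not to handle exceptions.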
Sometimes, the structure in which your data comes in has already been implemented in LEAN.
A great example of this is data sources that use "open, high, low, close" (OHLC) bars.
For data sources that use OHLC, we can use the existing Bar
class and inherit from it instead of BaseData
.
In that way, we wouldn't have to reimplement the properties of the class and also gain
access to some of the helper methods that it provides.
Here is a guideline to help you decide which type to inherit from when constructing your custom data.
- Bar - Your data source has OHLC fields (no volume field)
- TradeBar - Your data source has OHLCV fields
- QuoteBar - Your data source has OHLC fields for both bid and ask sides
- BaseData - None of the above apply
These features are unsupported in Python due to the differences between the two languages. There are no plans to support equivalent features at this time.
When adding new value definitions to your class such as BullIntensity
, prefer using properties
over fields. This is done to ensure consistency with the rest of the codebase.
Prefer:
public decimal BullIntensity { get; set; }
Disprefer:
public decimal BullIntensity;
When EndTime
is overridden, it is normally done so that we can specify the starting time that the data
applies to and the ending time that the data ends at (i.e. when it should be emitted).
Period is usually also included in the custom data source to describe how much time has passed between Time and EndTime.
A common overridden implementation of EndTime
is shown below.
// Set a period of only a single minute. This means that a bar will encompass one minute
public TimeSpan Period { get; set; } = TimeSpan.FromMinutes(1);
// The end time of this data. Some data covers spans (trade bars) and as such we want
// to know the entire time span covered
public override DateTime EndTime
{
get { return Time + Period; }
set { Time = value - Period; }
}
Beware: if you override this property as in the example above, you should not copy EndTime in the Clone method. Copy only Time, and your EndTime will still be preserved.
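The relationship between Time, Period, and EndTime can be mimicked in plain Python as a rough illustration. The class `SentimentBar` below is hypothetical and does not use LEAN. It shows that EndTime is derived rather than stored, so a clone that copies only the stored fields keeps the same EndTime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SentimentBar:
    time: datetime
    period: timedelta = timedelta(minutes=1)

    @property
    def end_time(self) -> datetime:
        # Derived value: the start time plus the period
        return self.time + self.period

    @end_time.setter
    def end_time(self, value: datetime) -> None:
        # Setting the end time moves the start time back by one period
        self.time = value - self.period

    def clone(self) -> "SentimentBar":
        # Copy only the stored fields; end_time follows from them
        return SentimentBar(time=self.time, period=self.period)

original = SentimentBar(time=datetime(2019, 1, 1, 9, 30), period=timedelta(minutes=5))
copy = original.clone()
```

Note that this sketch also copies `period`, which matters when the period differs from the default; with the default period, copying the start time alone is enough.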
Here we present to you a complete example comprised of all the sections explained so far. Note that Python is lacking some features due to its inability to implement C# specific code across interop boundaries.
using NodaTime;
using QuantConnect;
using QuantConnect.Logging;
using QuantConnect.Util;
using System;
using System.IO;
namespace QuantConnect.Data.Custom {
public class MyCustomDataSource : BaseData {
// Define the period of the bar as one minute.
// This is the amount of time between the starting time and the ending time
public TimeSpan Period { get; set; } = TimeSpan.FromMinutes(1);
// Sets the EndTime of the bar. This is the time the data will be emitted
public override DateTime EndTime {
get { return Time + Period; }
set { Time = value - Period; }
}
// define your values here. Examples:
public decimal BullSentiment { get; set; }
public decimal BearSentiment { get; set; }
// Instructs LEAN to look for data at the given URL or location on disk.
public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
return new SubscriptionDataSource(
"https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", // Location of the data.
SubscriptionTransportMedium.RemoteFile, // Specifies to read a whole file from a remote source (URL)
FileFormat.Csv // Specifies to read the file line by line, like a CSV file
);
}
// Below we've provided an example showing how to correctly and idiomatically parse the data
public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
try {
// Assuming our CSV is as follows:
// TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment
var csv = line.Split(',');
// Since MyCustomDataSource derives from BaseData, it is valid as a return type
return new MyCustomDataSource {
// This is the emit time, i.e. the time that the algorithm will output the event.
// Ensure you have this value set
EndTime = Parse.DateTimeExact(csv[0], "yyyyMMdd HH:mm:ss"),
// This is the Symbol associated with the data. Usually should be set to `config.Symbol`
// Ensure you have this value set.
Symbol = config.Symbol,
// Here we parse the custom fields that we've implemented.
BullSentiment = Parse.Decimal(csv[1]),
BearSentiment = Parse.Decimal(csv[2])
};
}
catch (Exception e) {
// Log the error for future debugging
Log.Error(e);
// Return null if we couldn't parse the data.
return null;
}
}
// We include the Clone method override to ensure that our data implementation
// is robust. Including the Clone method makes it more durable against failure.
public override BaseData Clone() {
return new MyCustomDataSource {
// Don't forget to copy these two properties over.
// Copy `Time` instead of `EndTime` to prevent our time from shifting
// over one whole `Period`
Time = Time,
Symbol = Symbol,
BullSentiment = BullSentiment,
BearSentiment = BearSentiment
};
}
// Specifies the time zone for this data source. This is useful for custom data types
public override DateTimeZone DataTimeZone()
{
// Select the time zone of your data here
return TimeZones.Utc;
}
// Indicates whether this data type is linked to an underlying equity Symbol.
public override bool RequiresMapping() {
return true;
}
public override string ToString() {
return $"{EndTime} - {Symbol}: Bull sentiment: {BullSentiment}, Bear sentiment: {BearSentiment}";
}
}
}
from datetime import datetime
from QuantConnect import *
from QuantConnect.Data import *
from QuantConnect.Logging import Log
from QuantConnect.Python import PythonData
class MyCustomDataSource(PythonData):
    def GetSource(self, config, date, isLiveMode):
        '''
        Instructs LEAN to look for data at the given URL or location on disk.
        '''
        return SubscriptionDataSource(
            "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", # Location of the data.
            SubscriptionTransportMedium.RemoteFile, # Specifies to read a whole file from a remote source (URL)
            FileFormat.Csv # Specifies to read the file line by line, like a CSV file
        )
    def Reader(self, config, line, date, isLiveMode):
        '''
        This will have to be implemented by you since almost all data sources differ in the way we parse them.
        '''
        # Here we parse the custom fields that we've implemented.
        try:
            # Assuming our CSV is as follows:
            # TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment
            csv = line.split(',')
            # Since MyCustomDataSource derives from BaseData, it is valid as a return type
            instance = MyCustomDataSource()
            # This is the emit time, i.e. the time that the algorithm will output the event.
            # Ensure you have this value set
            instance.EndTime = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")
            # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
            # Ensure you have this value set.
            instance.Symbol = config.Symbol
            # Define your values here.
            instance["BullSentiment"] = float(csv[1])
            instance["BearSentiment"] = float(csv[2])
            return instance
        except Exception as e:
            # Log the error for future debugging
            Log.Error(str(e))
            # Return None if we couldn't parse the data.
            return None
    # Python doesn't require the implementation of the `Clone` method.
    # It is important that you do not override the `Clone` method in Python.
    def DataTimeZone(self):
        '''
        Select the time zone of your data here.
        Defaults to New York timezone if no implementation is provided.
        '''
        return TimeZones.Utc
    def RequiresMapping(self):
        '''
        Indicates whether this data type is linked to an underlying equity Symbol.
        '''
        # If your data has a relationship with equities via their tickers,
        # you should set this to True.
        # Example: SEC filings are based on stock tickers, so it should be `True`
        # Example: sentiment data tickers are related to stock tickers, so it should be `True`.
        # Example: Federal Reserve data tickers are not related to stock tickers, so it should be `False`.
        # Example: Weather data has no relationship to equities, so it should be `False`
        # Sentiment data is for equities. Set to True because we share the same set of tickers.
        return True
    def DefaultResolution(self):
        '''
        Sets the default resolution of this data source.
        '''
        return Resolution.Minute
    def SupportedResolutions(self):
        '''
        Sets the supported resolutions for this data source.
        '''
        return self.AllResolutions
If you require additional reference material or examples, please visit the LEAN repository on GitHub, which contains implementations of these concepts used in production.
To access the custom data in your algorithm in OnData
, we recommend using the Slice.Get
method. Because
Slice
is keyed by Symbol, we want to access the data itself. We can do this by calling .Values
on the outcome of Slice.Get
, which we can then iterate over.
An example of this is provided below.
public override void OnData(Slice data) {
foreach (var sentiment in data.Get<MyCustomDataSource>().Values) {
Log($"{sentiment.Symbol}: Got bullish sentiment of {sentiment.BullSentiment}");
}
}
def OnData(self, data):
    for sentiment in data.Get(MyCustomDataSource).Values:
        self.Log(f"{sentiment.Symbol}: Got bullish sentiment of {sentiment.BullSentiment}")
This can be caused by not implementing the Clone method properly, by GetSource pointing to a resource that does not exist, or by an error inside your Reader method.
To fix this, make sure you've done or verified the following:
- You have implemented the Clone method
- GetSource points to a valid location
- Parsing in Reader is successful
- Your line endings are \n or \r\n
- Use UTF-8 file encoding
- If using compression, ensure your archive is not corrupted
This can be caused by not implementing the Clone method properly, or by a silent failure in Reader.
To fix this, make sure you've done the following:
- You have implemented the Clone method
- Parsing in Reader is successful
This can be caused by accessing your custom data via the Slice
indexer (e.g. data[_symbol]
).
To fix this, we recommend accessing the data using the Slice.Get
method as it will respect and preserve
nullable types.
Prefer:
public override void OnData(Slice data) {
var customData = data.Get<MyCustomDataSource>(_symbol).Value;
}
def OnData(self, data):
    customData = data.Get(MyCustomDataSource, self.symbol).Value
Disprefer:
public override void OnData(Slice data) {
var customData = data[_symbol];
}
def OnData(self, data):
    customData = data[self.symbol]
This can be caused by the algorithm operating in a different time zone than the data.
To fix this, you can set the algorithm's time zone to the same time zone as the data.
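To make the mismatch concrete, the sketch below interprets the same raw timestamp once as UTC and once as New York standard time. It uses a fixed UTC-5 offset from the standard library as a stand-in for New York, which ignores daylight saving time, and the timestamp is invented for this example.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Fixed UTC-5 offset standing in for New York standard time (ignores daylight saving)
NEW_YORK_STANDARD = timezone(timedelta(hours=-5), "EST")

# Naive timestamp as it might appear in a data file
raw = datetime(2019, 1, 2, 14, 30)

# The same wall-clock value interpreted in two different time zones
as_utc = raw.replace(tzinfo=UTC)
as_new_york = raw.replace(tzinfo=NEW_YORK_STANDARD)

# The two interpretations are different instants, five hours apart
offset = as_new_york - as_utc
```

If the data is actually in UTC but is treated as New York time, every point arrives five hours late in this example, which is why the data time zone and the algorithm time zone should be considered together.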
This is caused by not implementing the DefaultResolution
and SupportedResolutions
methods.
To fix this, you can override those methods and provide a suitable resolution for your data.
If you are backtesting locally and retrieving data from a remote source, we recommend gathering the data you want to backtest on locally before backtesting.
If you are backtesting on our cloud platform using an officially supported alternative data and it is too slow for your purposes, please contact support via e-mail with an example algorithm attached.
If your issues persist after following these steps, please e-mail support with your custom data class, an algorithm attached, and if possible, example data to replicate the issue with.
Sometimes, compression is desired for our data sources to save as much disk space as possible.
If this is the case, it is still possible to access data within a ZIP archive by using the hash ("#") feature in GetSource.
In the <SOURCE> position in the Quick Start GetSource method, you can reference the ZIP file along with the file contained within it that you want to read.
The syntax is as follows:
<SOURCE>#<FILE>
An example of this concept is provided below.
public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
    return new SubscriptionDataSource(
        "Data/my_custom_data/20180101.zip#file.json",
        // ...
        // ...
    );
}
def GetSource(self, config, date, isLiveMode):
    return SubscriptionDataSource(
        "Data/my_custom_data/20180101.zip#file.json",
        # ...
        # ...
    )
Currently, we only support ZIP compression. If you require support for alternative forms of compression, please e-mail support with the compression format you would like supported.
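To check that a given <SOURCE>#<FILE> string points at a real entry before handing it to LEAN, you can resolve it by hand. The sketch below only mimics the syntax with Python's standard zipfile module and is not how LEAN reads the archive internally; `read_zip_entry_lines` is a hypothetical helper name.

```python
import zipfile


def read_zip_entry_lines(source):
    """Split '<SOURCE>#<FILE>' and return the entry's lines as a list of str."""
    archive_path, _, entry_name = source.partition("#")
    if not entry_name:
        raise ValueError("expected '<SOURCE>#<FILE>'")
    with zipfile.ZipFile(archive_path) as archive:
        # open() raises KeyError if the entry is not present in the archive.
        with archive.open(entry_name) as entry:
            return entry.read().decode("utf-8").splitlines()
```

Calling `read_zip_entry_lines("Data/my_custom_data/20180101.zip#file.json")` on a local copy returns the lines that Reader would be expected to receive one at a time.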
This file format is useful whenever you have a collection of tickers contained in a single piece of data, but don't want to duplicate the data itself for each ticker.
Similar to pointers, FileFormat.Index indicates that the file located under the ticker you want to access redirects to the final data that contains a collection of tickers, including the one that is being requested.
An example diagram of the concept is provided below.
                              +----------------------+
GetSource(...) returns -----> | ./aapl/20180101.json | -----> (which is then iterated on and "GetSourceForAnIndex(...)" is called)
                              | -------------------- |
                              | 1234.json            | -----> GetSourceForAnIndex(...) returns: ./contents/20180101.zip#1234.json
                              | 2345.json            | -----> GetSourceForAnIndex(...) returns: ./contents/20180101.zip#2345.json
                              +----------------------+
To implement FileFormat.Index, you need to do the following.
- Derive from IndexedBaseData instead of BaseData
- Implement GetSource so that it returns FileFormat.Index and points to the index file (./aapl/20180101.json)
- Override the GetSourceForAnIndex method so that it points to the final file containing the data and collection of tickers
An example implementation of the diagram above is provided below:
public class MyCustomDataSource : IndexedBaseData {
    // ...
    // This effectively tells LEAN where to find the index file
    public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
        return new SubscriptionDataSource(
            $"./{config.Symbol.Value.ToLower()}/{date:yyyyMMdd}.json", // Assuming `Symbol` is "AAPL" and `date` is 2018-01-01
            SubscriptionTransportMedium.LocalFile,
            FileFormat.Index
        );
    }
    // This tells LEAN where to find the real data for a given index.
    // We will be redirected to another file from here
    public override SubscriptionDataSource GetSourceForAnIndex(SubscriptionDataConfig config, DateTime date, string index, bool isLiveMode) {
        return new SubscriptionDataSource(
            $"./contents/{date:yyyyMMdd}.zip#{index}", // Assuming `index` is `1234.json` or `2345.json`
            SubscriptionTransportMedium.LocalFile,
            FileFormat.Csv
        );
    }
}
If you'd like to request this feature for Python, please e-mail support with a link to this section explaining your use case.
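The redirection in the diagram above can be modeled in a few lines of plain Python. This is only a model of the lookup and not a LEAN implementation. It assumes the index file holds one entry name per line, as the diagram suggests, and `resolve_index_sources` is a hypothetical name.

```python
from datetime import datetime


def resolve_index_sources(index_lines, date):
    """Map each entry listed in an index file to its '<SOURCE>#<FILE>' string."""
    return [
        f"./contents/{date:%Y%m%d}.zip#{entry.strip()}"
        for entry in index_lines
        if entry.strip()  # skip blank lines in the index file
    ]


for source in resolve_index_sources(["1234.json\n", "2345.json\n"], datetime(2018, 1, 1)):
    print(source)
```

Each printed string corresponds to one GetSourceForAnIndex result in the diagram.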
If you need to return a collection of your custom data type, FileFormat.Collection can be used to do so if your return type satisfies IEnumerable<BaseData>.
To implement this, do the following.
- Return from GetSource with FileFormat set to FileFormat.Collection
- In Reader, return a BaseDataCollection object containing the data as the final argument.
You can see an example implementation of it in SECReport10Q on GitHub.
Custom data sources are not cached at this time.
When you are implementing your data source for live trading, it is important to know if your data will be in a different shape or will require special parsing.
If your data retrieval or data parsing differs from the backtesting implementation, then you will need to implement a special branch inside the existing GetSource and/or Reader.
To do so, you can use the isLiveMode flag to determine whether the algorithm is trading live.
An example is provided below.
public class MyCustomDataSource : BaseData {
    // Place our API key here for use in live trading
    private string _apiKey = "<OMITTED>";

    // Custom fields populated by Reader
    public decimal BullSentiment { get; set; }
    public decimal BearSentiment { get; set; }

    // Instructs LEAN to look for data at the given URL or location on disk.
    public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
        if (isLiveMode) {
            return new SubscriptionDataSource(
                $"https://<SOME_API_SITE>.com/?key={_apiKey}&date={date:yyyyMMddTHH:mm}",
                SubscriptionTransportMedium.RemoteFile,
                FileFormat.Csv
            );
        }
        return new SubscriptionDataSource(
            "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", // Location of the data.
            SubscriptionTransportMedium.RemoteFile, // Specifies to read a whole file from a remote source (URL)
            FileFormat.Csv // Specifies to read the file line by line, like a CSV file
        );
    }

    // This will have to be implemented by you since almost all data sources differ in the way we parse them.
    // Below we've provided an example showing how to correctly and idiomatically parse the data
    public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
        // Both feeds are comma-separated, so split once. Declaring `csv` a second
        // time inside the `if` block below would not compile in C#.
        var csv = line.Split(',');

        if (isLiveMode) {
            // Assuming our CSV is as follows from our live endpoint:
            // TIME (yyyyMMddHH:mm:ss), BearSentiment, BullSentiment
            //
            // Notice how the data source format can be different. Having
            // the `isLiveMode` flag is advantageous in allowing us to implement
            // a custom parser for a live data source that differs from the backtesting feed
            return new MyCustomDataSource {
                EndTime = Parse.DateTimeExact(csv[0], "yyyyMMddHH:mm:ss"),
                Symbol = config.Symbol,
                BullSentiment = Parse.Decimal(csv[2]),
                BearSentiment = Parse.Decimal(csv[1])
            };
        }

        // Assuming our CSV is as follows:
        // TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment
        //
        // Since MyCustomDataSource derives from BaseData, it is valid as a return type
        return new MyCustomDataSource {
            // This is the emit time, i.e. the time that the algorithm will output the event.
            // Ensure you have this value set
            EndTime = Parse.DateTimeExact(csv[0], "yyyyMMdd HH:mm:ss"),
            // This is the Symbol associated with the data. Usually should be set to `config.Symbol`
            // Ensure you have this value set.
            Symbol = config.Symbol,
            // Here we parse the custom fields that we've implemented.
            BullSentiment = Parse.Decimal(csv[1]),
            BearSentiment = Parse.Decimal(csv[2])
        };
    }
}
from datetime import datetime
from QuantConnect import *
from QuantConnect.Data import *

class MyCustomDataSource(PythonData):
    def __init__(self):
        # Place our API key here for use in live trading
        self.apiKey = "<OMITTED>"

    def GetSource(self, config, date, isLiveMode):
        '''
        Instructs LEAN to look for data at the given URL or location on disk.
        '''
        if isLiveMode:
            return SubscriptionDataSource(
                f"https://<SOME_API_SITE>.com/?key={self.apiKey}&date={date.strftime('%Y%m%dT%H:%M')}",
                SubscriptionTransportMedium.RemoteFile,
                FileFormat.Csv
            )
        return SubscriptionDataSource(
            "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", # Location of the data.
            SubscriptionTransportMedium.RemoteFile, # Specifies to read a whole file from a remote source (URL)
            FileFormat.Csv # Specifies to read the file line by line, like a CSV file
        )

    def Reader(self, config, line, date, isLiveMode):
        '''
        This will have to be implemented by you since almost all data sources differ in the way we parse them.
        Below we've provided an example showing how to correctly and idiomatically parse the data
        '''
        if isLiveMode:
            # Assuming our CSV is as follows from our live endpoint:
            # TIME (yyyyMMddHH:mm:ss), BearSentiment, BullSentiment
            #
            # Notice how the data source format can be different. Having
            # the `isLiveMode` flag is advantageous in allowing us to implement
            # a custom parser for a live data source that differs from the backtesting feed
            csv = line.split(",")
            instance = MyCustomDataSource()
            instance.EndTime = datetime.strptime(csv[0], "%Y%m%d%H:%M:%S")
            instance.Symbol = config.Symbol
            instance.BullSentiment = float(csv[2])
            instance.BearSentiment = float(csv[1])
            return instance
        # Assuming our CSV is as follows:
        # TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment
        csv = line.split(",")
        # Since MyCustomDataSource derives from BaseData, it is valid as a return type
        instance = MyCustomDataSource()
        # This is the emit time, i.e. the time that the algorithm will output the event.
        # Ensure you have this value set
        instance.EndTime = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")
        # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
        # Ensure you have this value set.
        instance.Symbol = config.Symbol
        # Here we parse the custom fields that we've implemented.
        instance.BullSentiment = float(csv[1])
        instance.BearSentiment = float(csv[2])
        return instance
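Because the live and backtest layouts differ in both time format and column order, it can help to check outside of LEAN that the two branches agree on the same observation. The sketch below reproduces only the parsing logic from the example above as a standalone function; `parse_sentiment` is a hypothetical name and the sample values are made up for illustration.

```python
from datetime import datetime


def parse_sentiment(line, is_live_mode):
    """Return (end_time, bull, bear) from either the live or backtest layout."""
    csv = line.strip().split(",")
    if is_live_mode:
        # Live layout: yyyyMMddHH:mm:ss, BearSentiment, BullSentiment
        end_time = datetime.strptime(csv[0], "%Y%m%d%H:%M:%S")
        bear, bull = float(csv[1]), float(csv[2])
    else:
        # Backtest layout: yyyyMMdd HH:mm:ss, BullSentiment, BearSentiment
        end_time = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")
        bull, bear = float(csv[1]), float(csv[2])
    return end_time, bull, bear


live = parse_sentiment("2018010109:30:00,0.25,0.75", True)
backtest = parse_sentiment("20180101 09:30:00,0.75,0.25", False)
assert live == backtest  # same observation, two layouts
```

If the assertion fails for your own sample lines, the two branches disagree on the column order or the time format.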
QuantConnect/LEAN Engine is a community based open source initiative to revolutionize quantitative finance. We're united by a mission to radically open the tools and technology which drive quantitative finance and apply a modern community-first approach to the industry. Since its launch in 2012 QuantConnect has pioneered this with LEAN, the largest independent open source algorithmic trading project in the world.
Read more about our mission, documentation on using LEAN, or check out the community forum.