-
Notifications
You must be signed in to change notification settings - Fork 3
Open Questions
-
There is a lot of commonality between the
OphysManifest
class and theEphysManifest
class. It would make sense to gather this in a superclass, and theManifest
class is an obvious candidate. Would that be ok to do, or would that violate some principles of design?- VI: Much of OOP design pattern lore guides to "favor association over inheritance", which is what's done currently. However I believe this lore largely stems from Java & strongly reflects its lack of multiple inheritance. I agree with your instinct to favor inheritance where rational such as here. Seems a worthwhile refactor to consider. And regardless of association or inheritance relationship, refactoring to centralize common functionality (e.g.
updateManifest
as noted below) would be worthwhile. As your next question gets at, this may involve detective work to determine what is or could/should be common. - EH: This is now implemented and there were a handful of methods that are shared among
Ephys
- andOphysManifest
that are now methods of the Manifest superclass
- VI: Much of OOP design pattern lore guides to "favor association over inheritance", which is what's done currently. However I believe this lore largely stems from Java & strongly reflects its lack of multiple inheritance. I agree with your instinct to favor inheritance where rational such as here. Seems a worthwhile refactor to consider. And regardless of association or inheritance relationship, refactoring to centralize common functionality (e.g.
-
The
Ephys
- andOphysManifest
classes do not have the same internal logic for retrieving and caching manifests. In theOphysManifest
, all the different tables are fetched and put into a struct and immediately added to the disk cache, whereas in theEphysManifest
they are handled more individually and the final processed tables are initially cached in memory, and only added to the disk cache when the property get method for particular item types are invoked. Is there a particular reason for this (i.e performance considerations during retrieval)?- VI: Great that you're analyzing this. Perhaps DM has insights.
- DM: The
EphysManifest
handling adheres very closely to the Allen SDK. TheOphysManifest
handling predates my involvement, but it's also the case thatOphysManifest
tables are much much smaller, so re-fetching or re-processing portions is less costly. So the additional caching ofEPhysManifest
might anyway be warranted. - EH: I found that the biggest discrepancy was that item tables for the
OphysManifest
were all lumped together in one struct in the disk cache, whereas the ephys item tables were treated individually. I made the change to treat ophys tables individually as well, and I think this provides some improvements, e.g when fetching ophys sessions, it only fetches the session table where before it fetched the session, experiment and the cell tables (and the cell table is huge). Also, the underlying code is now more "symmetric". Thus, it should now be easier to do the same (post)processing of ophys tables as ephys tables, i.e joining tables and adding counts for subtables.
-
Would it be cleaner to use the
OnDemandProperties
mixin for memory caching of tables in theManifest
classes?- VI: So far OnDemandProperties has only been used for user-facing properties of item object properties. I guess adapting to private properties would be fairly trivial, but would have to analyze wrt to how intertwined/specialized it is for item objects today. Conceptually seems sound, but would tend to prioritize lower if it doesn't prove highly straightforward.
- EH: This was straight forward, and since item table get-access is public, they are in theory user facing (although users should not have to interact with them directly).
-
What is the purpose of the
UpdateManifests
method - Why is it called update? It seems more like a clear/reset method...- VI: I believe it was intended to allow a fresh check of the data source (whether API or now possibly S3), which is always subject to change (which has occurred from time to time during the project, affecting demos etc). Maybe "Refresh" would have been a better word than "Update", but the implementation is a clear/reset type operation as you say.
- EH: I changed the name to
resetManifest
, and added a new methodupdateManifest
that does the reset, but additionally fetches all the manifests again.
General note: I changed some syntax internally in these classes, so now all the tables are referred to as item tables (in general) and session table, unit table etc. in particular. A manifest would then be the collection of all item tables for a dataset. I think this makes things clearer, as before a manifest could be both an individual table, but also the collection of tables (and, as we discussed before, cache maps are also referred to as manifests).
General syntax suggestions:
- To give the manifests more of a matlab flavour I would suggest to rename the item table properties like ‘ophys_sessions' or ‘ephys_sessions' to ‘Sessions’.
- Following the same line, internally in the classes (manifest classes, but also others) to consistently refer to objects of the class with the variable name obj instead of various acronyms that are specific to specific classes.
-
In the bot.item.internal.mixin.OnDemandProps/fetch_cached method: a) If a property has previously been marked as Unavailable, right now a new attempt at fetching the property will not do anything. Maybe this makes sense in some circumstances, but I wonder if it would make more sense to make another attempt at fetching the property value. For example, I used the OnDemand properties for the Manifest classes, and sometimes there might be aborted file downloads that causes the property not to be cached. In these cases I want to reference the property again and hopefully fetch the value again. Does this make sense?
-
This is may be more from the developer’s point of view, but related to above, if the attempt at fetching the OnDemand property value fails, the error is now caught, but there is no indication to the user that it failed. Would it be better the throw an error if fetching a property fails? As a practical example, in issue #155 someone experienced a bug, but the error message was unrelated to the point of failure because the actual error was caught silently. This makes it also harder to debug such cases.
- In the
bot.internal.cache/CachedApiCall
method, there are a lot of optional arguments, like:nPageSize
,strFormat
,strRMAPrefix
,strHost
,strScheme
. However, none of the calls to this method makes use of them, and for some, likestrScheme
,strHost
andstrRMAPrefix
, it seems like they are not really variable in the context of the Allen Brain Observatory Api. Was there any past scenario where these were useful, or is there any future scenario where they might be put to use?- DM: This was my conservative implementation, as I learned how to use the back-end HTTP API. Future-proofing in case Allen changes the way the API should be accessed?
- Do you know if there exists documentation for the RMA API that is specific for the brain observatory? Or is the current API calls in matlab derived from the python toolbox?
- DM: Unknown to me
- Is there a reason why behavioral data and pupil data are not added as linked files? Are they not available as
well_known_files
from the API?
Are the visual stimuli used in the 2P dataset the same as those used in the neuropixels dataset? If no, can we find the 2P visual stimuli anywhere (maybe a question for someone at Allen)
-
Are there H5 files in the Visual Coding 2P dataset at this point? The "SessH5" linked file appears defunct in various ophys sessions I've retrieved lately.
- EH: The files exist in the data repository, and they contain analysis data, more specifically metrics about individual cells (rois). I don't think these data are used anywhere in matlab bot, but I imagine they could be in the future. The autodownload of these files are turned off, so they are not available through bot, and they also don't appear to have any implemented linked properties.