-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data integrity test: SIMBAD-resolvable sources? #64
base: main
Are you sure you want to change the base?
Conversation
Pretty sure tests will fail — still working on this. |
we want to instead do this for the |
SIMBAD as part of a set of monthly tests? |
todo: make this a ~monthly check. |
In SIMPLE, we called it Here's also the action file to use for inspiration. This runs every 30 days: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super cool but I don't really understand the difference between the two tests. I think it will be more clear if you make the initial query a fixture that's only executed once.
# Examine DB for each input, displaying results when more than one source matches | ||
t = db.search_object( | ||
simbad_names, output_table="Sources", fmt="astropy", fuzzy_search=False | ||
) | ||
if len(t) != 1: | ||
duplicate_rows += [row] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is a very thorough test. This is making sure every possibly SIMBAD identifier also identifies only one database source, right? This seems extra to me. It seems like just checking that the first identifier matches only one database source should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it currently does the latter? simbad_results
is only 3 rows for the 3 sources in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how simbad_results
can be only 3 rows. A quick search for WASP-76 yielded about a dozen names. The logic here should have that looping over each one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I see why it's 3 rows now: the DB has 3 rows and that's what Simbad returns back. Each row can have multiple names so for example simbad_names
can contain ['BD+01 316b', 'WASP-76b']
for a single row and search_object
considers both of them when searching (since it can take a list as an input).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to wrap my head around this test so I ended up making some readability comments.
Co-authored-by: Kelle Cruz <[email protected]>
Co-authored-by: Kelle Cruz <[email protected]>
Co-authored-by: Kelle Cruz <[email protected]>
thanks for all the comments! |
Tried running in CodeSpaces but this doesn't run as-is, so some more iteration is needed; see the comments above. |
Pushed a fix to the various issues I noticed above. The script runs successfully for me in CodeSpaces. |
Co-authored-by: Kelle Cruz <[email protected]>
todo: rename the files to be the source names |
do we want to test whether sources are simbad-resolvable?