
Order of returned dataframe columns does not match the order of columns passed via the list parameter 'columns' in read() and ReadRequest() #2004

Open
grusev opened this issue Nov 15, 2024 · 0 comments
Labels
bug Something isn't working


@grusev
Collaborator

grusev commented Nov 15, 2024

Describe the bug

The documentation does not clearly specify how the 'columns' attribute behaves. As a result, users may implicitly expect that passing it returns a dataframe whose columns are in the order given in the list.

For example, if you pass the list ['col_234', 'col_13', 'col_567', 'col_182'],
you would expect the returned dataframe to have its columns in that same order, not in the order in which they were defined when the symbol was written.

In a large dataframe you cannot remember the order in which the columns were defined, so this behavior is highly unexpected.

Currently, 'columns' acts more like a filter: it selects which columns are returned, but the order of the list is ignored and the result always follows the order in which the columns were defined when the symbol was written.
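To make the two possible semantics concrete, here is a minimal pandas-only sketch. It does not call ArcticDB: the helpers `select_as_filter` and `select_in_list_order` are hypothetical and only mimic the behaviors described above, and the stored column order is invented for illustration.

```python
import pandas as pd


def select_as_filter(df, columns):
    # Hypothetical: mimics the current behavior described in this issue.
    # 'columns' only selects which columns come back; they keep stored order.
    wanted = set(columns)
    return df[[c for c in df.columns if c in wanted]]


def select_in_list_order(df, columns):
    # Hypothetical: the behavior the issue expects, columns in list order.
    return df[list(columns)]


# Invented stored order, for illustration only.
stored = pd.DataFrame({"col_13": [1], "col_182": [2], "col_234": [3], "col_567": [4]})
requested = ["col_234", "col_13", "col_567", "col_182"]

print(list(select_as_filter(stored, requested).columns))
# ['col_13', 'col_182', 'col_234', 'col_567']
print(list(select_in_list_order(stored, requested).columns))
# ['col_234', 'col_13', 'col_567', 'col_182']
```

Both variants return the same set of columns with the same data; only the column order differs, which is exactly the ambiguity the documentation leaves open.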

That is also acceptable, but it must at least be documented, which it currently is not.

I am opening this issue to track our decision. There is already a test case for this:

Steps/Code to Reproduce

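from arcticdb import QueryBuilder
from arcticdb.version_store.library import ReadRequest
# ArcticDB test-suite helpers (module path assumed)
from arcticdb.util.test import get_sample_dataframe, assert_frame_equal_rebuild_index_first


def test_read_batch_query_and_columns_returned_order(arctic_library):
    '''
    Column order is expected to match the order of the 'columns' attribute list.
    '''

    def q(qb):
        # Works on both a QueryBuilder and a pandas DataFrame: keep rows where "bool" is True
        return qb[qb["bool"]]

    lib = arctic_library

    symbol = "sym"
    df = get_sample_dataframe(size=100)
    df.reset_index(inplace=True, drop=True)
    columns = ['int32', 'float64', 'strings', 'bool']

    lib.write(symbol, df)

    batch = lib.read_batch(symbols=[ReadRequest(symbol, as_of=0, query_builder=q(QueryBuilder()), columns=columns)])

    # Expected: rows filtered by the same predicate, columns in the requested order
    df_filtered = q(df)[columns]
    assert_frame_equal_rebuild_index_first(df_filtered, batch[0].data)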
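If the comparison helper relies on pandas' default, order-sensitive `assert_frame_equal`, the test fails whenever the data is correct but the columns come back in stored order. The sketch below uses plain pandas with made-up data to show the difference between an order-sensitive comparison and `assert_frame_equal(..., check_like=True)`, which ignores the order of index and column labels. `frames_equal` is a hypothetical helper, not part of ArcticDB.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal(left, right, ignore_column_order):
    """Hypothetical helper: True if the frames are equal, optionally ignoring column order."""
    try:
        # check_like=True ignores the order of index and column labels.
        assert_frame_equal(left, right, check_like=ignore_column_order)
    except AssertionError:
        return False
    return True


# Made-up data: same values, two different column orders.
stored = pd.DataFrame({"int32": [1, 2], "float64": [0.5, 1.5], "bool": [True, False]})
requested = ["float64", "int32", "bool"]

expected = stored[requested]                      # what the test builds via q(df)[columns]
returned = stored[["int32", "float64", "bool"]]   # same data, stored column order

print(frames_equal(expected, returned, ignore_column_order=False))  # False
print(frames_equal(expected, returned, ignore_column_order=True))   # True
```

Which of the two comparisons the test should use depends on the decision this issue tracks: `check_like=True` matches "columns is a filter", the default matches "columns defines the order".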

Expected Results

Either the columns of the returned dataframe follow the order of the 'columns' list, or the documentation clearly states that the returned columns will always be in the order in which they were defined when the symbol was created.
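Until a decision is made, a caller who needs a specific order can reorder on the client side. A minimal sketch, assuming `lib` is an ArcticDB `Library` (or any object whose `read(symbol, columns=...)` returns an item with a pandas DataFrame in `.data`); `read_in_requested_order` is a hypothetical helper, not part of ArcticDB.

```python
def read_in_requested_order(lib, symbol, columns, **read_kwargs):
    """Hypothetical helper: read `columns` of `symbol` and return them in list order.

    Assumes `lib.read(symbol, columns=..., **read_kwargs)` returns an object whose
    `.data` attribute is a pandas DataFrame, as arcticdb's Library.read does.
    """
    columns = list(columns)
    item = lib.read(symbol, columns=columns, **read_kwargs)
    # Reorder client-side so the result follows the order of the requested list,
    # whatever order the storage layer returned the columns in.
    return item.data[columns]


# Usage (assuming `lib` is an ArcticDB Library holding symbol "sym"):
# df = read_in_requested_order(lib, "sym", ["int32", "float64", "strings", "bool"], as_of=0)
```

The reorder is a cheap column selection on the already-read DataFrame, so it works regardless of which way the issue is resolved.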

OS, Python Version and ArcticDB Version

any

Backend storage used

No response

Additional Context

No response

grusev added the bug (Something isn't working) label Nov 15, 2024