Problem Description
I process the output of a third-party component that appends an arbitrary number of extra fields to some lines of a CSV file. The extra fields represent a variable number of qualifying attributes associated with the row. In my context, I don't care about these attributes at all and I'm happy to just let them be dropped.
The issue is that, to avoid the "Length of header or names does not match length of data. This leads to a loss of data with index_col=False." warning, the on_bad_lines callable needs to return the truncated list. The parser doesn't tell the callable the expected length, though. It would be nice if the parser passed col_len (the expected number of fields) to the callable, to make it easier to drop the additional fields.
Feature Description
In PythonParser:
    def _rows_to_cols(self, content: list[list[Scalar]]) -> list[np.ndarray]:
        col_len = self.num_original_columns

        if self._implicit_index:
            col_len += len(self.index_col)

        max_len = max(len(row) for row in content)

        # Check that there are no rows with too many
        # elements in their row (rows with too few
        # elements are padded with NaN).
        # error: Non-overlapping identity check (left operand type: "List[int]",
        # right operand type: "Literal[False]")
        if (
            max_len > col_len
            and self.index_col is not False  # type: ignore[comparison-overlap]
            and self.usecols is None
        ):
            footers = self.skipfooter if self.skipfooter else 0
            bad_lines = []

            iter_content = enumerate(content)
            content_len = len(content)
            content = []

            for i, _content in iter_content:
                actual_len = len(_content)

                if actual_len > col_len:
                    if callable(self.on_bad_lines):
                        # Proposed change: pass col_len to the callable.
                        new_l = self.on_bad_lines(_content, col_len)
                        if new_l is not None:
                            content.append(new_l)
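To show what this would look like from the caller's side, here is a minimal sketch of the proposed two-argument contract. It is a toy stand-in for the loop above, not pandas code: `apply_bad_line_handler`, `drop_extra_fields`, and the `Handler` alias are hypothetical names made up for illustration, and current pandas calls the handler with a single argument.

```python
from typing import Callable, Optional

Row = list[str]
# Hypothetical two-argument handler: (bad_line, col_len) -> repaired row or None.
Handler = Callable[[Row, int], Optional[Row]]


def drop_extra_fields(bad_line: Row, col_len: int) -> Row:
    """Keep only the first col_len fields of an over-long row."""
    return bad_line[:col_len]


def apply_bad_line_handler(
    content: list[Row], col_len: int, handler: Handler
) -> list[Row]:
    """Toy stand-in for the loop in _rows_to_cols (not the pandas implementation)."""
    kept: list[Row] = []
    for row in content:
        if len(row) > col_len:
            # Proposed behaviour: the expected width is handed to the callable.
            new_row = handler(row, col_len)
            if new_row is not None:
                kept.append(new_row)
        else:
            kept.append(row)
    return kept


rows = [["1", "a"], ["2", "b", "x", "y"], ["3", "c", "z"]]
cleaned = apply_bad_line_handler(rows, col_len=2, handler=drop_extra_fields)
```

With this shape, the handler needs no knowledge of the file's header; it simply slices each over-long row to the width the parser already computed.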
Alternative Solutions
Determine the expected number of columns some other way, for example by reading the header separately and counting its fields, or by hard-coding a specific value.
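A rough sketch of the header-counting workaround, using only the standard library to build the handler. The helper names `count_header_fields` and `make_truncator` are made up for this example, and it assumes the first line of the file is the header and that there is no implicit index column.

```python
import csv
import io
from typing import Callable


def count_header_fields(text: str, sep: str = ",") -> int:
    """Return the number of fields in the first (header) line of CSV text."""
    return len(next(csv.reader(io.StringIO(text), delimiter=sep)))


def make_truncator(col_len: int) -> Callable[[list[str]], list[str]]:
    """Build a one-argument handler that drops fields beyond col_len."""

    def truncate(bad_line: list[str]) -> list[str]:
        return bad_line[:col_len]

    return truncate


sample = "id,name\n1,a\n2,b,x,y\n3,c,z\n"
expected = count_header_fields(sample)
handler = make_truncator(expected)
```

The resulting handler has the one-argument shape that `on_bad_lines` accepts with the Python engine, so it should be usable as `pd.read_csv(io.StringIO(sample), engine="python", on_bad_lines=handler)`. The downside is that the header has to be read twice, which is what passing col_len to the callable would avoid.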
Additional Context
No response
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas