
ENH: Add to_numeric_br() function to convert Brazilian-formatted numbers #60998

Open
1 of 3 tasks
Veras-D opened this issue Feb 24, 2025 · 6 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@Veras-D

Veras-D commented Feb 24, 2025

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use Pandas to easily convert numbers formatted in the Brazilian style (1.234,56) into numeric types.

Currently, pd.to_numeric() does not support this format, and users have to manually chain .str.replace(".", "", regex=False).str.replace(",", ".", regex=False) before calling it, which is not intuitive.

This feature would simplify data handling for users in Brazil and other countries with similar numerical formats.

Feature Description

Add a new function to_numeric_br() to automatically convert strings with the Brazilian numeric format into floats.

Proposed Implementation (Pseudocode)

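import pandas as pd


def to_numeric_br(series, errors="raise"):
    """
    Convert Brazilian-style numeric strings (1.234,56) to float.

    Parameters
    ----------
    series : pandas.Series
        Data to be converted.
    errors : {'raise', 'coerce', 'ignore'}, default 'raise'
        - 'raise' : raise an exception for invalid values.
        - 'coerce' : convert invalid values to NaN.
        - 'ignore' : return the original data if any value is invalid.

    Returns
    -------
    pandas.Series
        Series with numeric values.
    """
    # Sketch of a possible body: remove the "." thousands separators, then
    # turn the "," decimal mark into "."; regex=False keeps "." literal.
    normalized = series.str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
    if errors == "ignore":
        # pd.to_numeric's own errors="ignore" is deprecated, so handle it here.
        try:
            return pd.to_numeric(normalized)
        except ValueError:
            return series
    return pd.to_numeric(normalized, errors=errors)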

Expected Behavior

import pandas as pd

df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")

print(df)

Expected Output:

      values  converted_values
0  1.234,56          1234.56
1  5.600,75          5600.75
2    100,50           100.50

Alternatively, instead of a standalone function, this could be implemented as an enhancement to pd.to_numeric(), adding a locale="br" parameter.

Alternative Solutions

Currently, users must manually apply string replacements before using pd.to_numeric(), like this:

df["values"] = df["values"].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
df["values"] = pd.to_numeric(df["values"], errors="coerce")

While this works, it is not user-friendly, especially for beginners.
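One pitfall in this workaround is the regex flag: with regex=True, the pattern "." matches any character, so the first str.replace call deletes the entire string and errors="coerce" then turns every value into NaN. Passing regex=False (or escaping the dot as r"\.") keeps the replacement literal. A minimal demonstration:

```python
import pandas as pd

s = pd.Series(["1.234,56", "5.600,75", "100,50"])

# With regex=True, "." is a regex wildcard, so every character is removed.
wildcard = s.str.replace(".", "", regex=True)
print(wildcard.tolist())  # ['', '', '']

# With regex=False, only the literal "." thousands separators are removed,
# and the "," decimal mark is then replaced with ".".
literal = s.str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
print(pd.to_numeric(literal).tolist())
```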

Another alternative is using third-party packages like babel, but this requires additional dependencies and is not built into Pandas.
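When the data comes from a CSV file, pandas.read_csv already accepts decimal and thousands arguments, so the conversion can happen at parse time without any string replacement. A minimal sketch with inline data; a ";" field separator is assumed so the decimal comma is not mistaken for a column delimiter:

```python
import io

import pandas as pd

# Inline sample data using ";" as the field separator.
csv_text = "values\n1.234,56\n5.600,75\n100,50\n"

# decimal="," and thousands="." tell the parser how to read the numbers,
# so the column is inferred as float64 directly.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", thousands=".")
print(df.dtypes)
print(df)
```

This only helps for file input; values already loaded as strings in a DataFrame still need a conversion step.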

Additional Context

  • Similar requests have been made by users handling locale-specific number formats.
  • Would the maintainers prefer a standalone function (to_numeric_br()) or a locale parameter in pd.to_numeric()?
  • Happy to implement this if maintainers approve!
@Veras-D Veras-D added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 24, 2025
@Liam3851
Contributor

See also #4674 and #56934, which would have added this support.

@itayg2341

I've run into the same need to handle Brazilian number formatting and wrote a function that might help. It also supports each of the errors options.

import numpy as np
import pandas as pd


def to_numeric_br(series, errors="raise"):
    """
    Convert Brazilian-style numeric strings (1.234,56) to float.

    Parameters
    ----------
    series : pandas.Series
        Data to be converted.
    errors : {'raise', 'coerce', 'ignore'}, default 'raise'
        - 'raise' : raise a ValueError for invalid values.
        - 'coerce' : convert invalid values to NaN.
        - 'ignore' : keep invalid values unchanged.

    Returns
    -------
    pandas.Series
        Series with numeric values.
    """
    # Validate up front so a bad errors value fails even when all data is valid.
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"Invalid errors value: {errors!r}")

    def converter(x):
        # Pass missing values (NaN/None) through unchanged.
        if pd.isna(x):
            return x
        try:
            # Drop the "." thousands separators, then turn the "," decimal mark into ".".
            return float(x.replace(".", "").replace(",", "."))
        except ValueError:
            if errors == "raise":
                raise
            if errors == "coerce":
                return np.nan
            return x  # errors == "ignore": keep the original value

    return series.apply(converter)

Example usage:

df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50", "invalid"]})

df["converted_coerce"] = to_numeric_br(df["values"], errors="coerce")
df["converted_ignore"] = to_numeric_br(df["values"], errors="ignore")

print(df)

try:
    df["converted_raise"] = to_numeric_br(df["values"], errors="raise")
except ValueError as e:
    print(f"Caught exception as expected: {e}")

This function passes missing values (NaN/None) through unchanged and lets the caller choose how invalid values are handled. While integrating this directly into pd.to_numeric with a locale option would be ideal, this standalone function could be a useful workaround in the meantime. I hope this contributes to the discussion!
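One caveat with a replace-based converter like the one above: it does not check that the input is well-formed, so "1.2.3,4" silently becomes 123.4 and "12.34" becomes 1234.0. If that matters, values can be validated against a pattern before conversion. A minimal sketch, assuming the strict layout (thousands groups of exactly three digits, an optional "," decimal part, an optional leading minus sign); to_numeric_br_strict and BR_NUMBER are hypothetical names, not pandas APIs:

```python
import re

import numpy as np
import pandas as pd

# Assumed strict format: digits grouped in threes with "." (e.g. 1.234.567)
# or plain digits, optionally followed by a "," decimal part.
BR_NUMBER = re.compile(r"-?(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d+)?")


def to_numeric_br_strict(series, errors="raise"):
    """Like a replace-based converter, but reject malformed strings."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"Invalid errors value: {errors!r}")

    def converter(x):
        if pd.isna(x):
            return x
        # Surrounding whitespace is tolerated; the number itself must match fully.
        if isinstance(x, str) and BR_NUMBER.fullmatch(x.strip()):
            return float(x.strip().replace(".", "").replace(",", "."))
        if errors == "raise":
            raise ValueError(f"Not a Brazilian-formatted number: {x!r}")
        return np.nan if errors == "coerce" else x

    return series.apply(converter)


s = pd.Series(["1.234,56", "100,50", "1.2.3,4", "12.34"])
print(to_numeric_br_strict(s, errors="coerce").tolist())  # [1234.56, 100.5, nan, nan]
```

The trade-off is that legitimate but unusual inputs (for example, a plain "12.34" meant as a decimal) are rejected rather than guessed at.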

@Veras-D
Author

Veras-D commented Feb 24, 2025

Thanks for pointing that out, @Liam3851! If #4674 and #56934 already added support for specifying decimal and thousands separators in pd.to_numeric(), then my proposal might be redundant.

Could you confirm whether this feature is already fully implemented in the latest Pandas release? If so, users in Brazil could simply use pd.to_numeric(..., decimal=',', thousands='.') instead of a separate function.

If there are any remaining gaps, I'd be happy to adjust my proposal accordingly.
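Until something like that exists in pd.to_numeric(), the decimal/thousands idea can be approximated in user code. A minimal sketch; to_numeric_sep is a hypothetical helper (not a pandas API) that only does literal separator replacement before deferring to pd.to_numeric, so it performs no format validation:

```python
import pandas as pd


def to_numeric_sep(series, decimal=".", thousands=None, errors="raise"):
    """
    Hypothetical helper: parse strings whose decimal and thousands
    separators are given explicitly, then defer to pd.to_numeric.
    errors is forwarded to pd.to_numeric ('raise' or 'coerce').
    """
    s = series
    if thousands is not None:
        # Remove grouping characters literally (regex=False: "." is not a wildcard).
        s = s.str.replace(thousands, "", regex=False)
    if decimal != ".":
        # Normalize the decimal mark to "." so pd.to_numeric can parse it.
        s = s.str.replace(decimal, ".", regex=False)
    return pd.to_numeric(s, errors=errors)


br = pd.Series(["1.234,56", "5.600,75", "100,50"])
us = pd.Series(["1,234.56", "5,600.75"])

print(to_numeric_sep(br, decimal=",", thousands=".").tolist())
print(to_numeric_sep(us, decimal=".", thousands=",").tolist())
```

Taking the separators as arguments rather than a locale="br" code keeps the same helper usable for other conventions, such as the en-US input above.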

@Veras-D
Author

Veras-D commented Feb 24, 2025

Thank you, @itayg2341, that looks like a great solution.

@Veras-D Veras-D closed this as completed Feb 24, 2025
@Liam3851
Contributor

@Veras-D I'd suggest you could re-open this, as I don't believe #56934 was ever merged. cc: @mroeschke

@Veras-D
Author

Veras-D commented Feb 25, 2025

Thanks for the clarification @Liam3851! I've reopened the issue. Let me know if there's anything I can do to help move this forward.

@Veras-D Veras-D reopened this Feb 25, 2025
Projects
None yet
Development

No branches or pull requests

3 participants