Transliterate Twig filter for Grav #3891

Open
wants to merge 1 commit into
base: develop

Conversation

pmoreno-rodriguez
Contributor

Transliterate is a Grav Twig filter that converts accented characters to their ASCII equivalents. For example, 'Ä' will become 'A'.

@rhukster
Member

This seems to perform a similar task to the existing Utils::toAscii() method. I do like this regex-based approach better than the more manual char-replace approach that already exists. I suggest, though, that the Utils::toAscii() method be updated to use this logic and that the transliterate method be renamed to to_ascii, perhaps with an alias to transliterate.

@pmoreno-rodriguez
Contributor Author

Hi @rhukster, are you referring to this code in Utils.php?

public static function toAscii(String $string): String
    {
        return strtr(utf8_decode($string),
            utf8_decode(
            'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
            'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
    }

I'm not sure where this method is used in the rest of the project (I can't find it in the code of the other files in the system folder), nor how I should create the alias you propose. Any suggestions?

@pmoreno-rodriguez
Contributor Author

Ok. It seems I found a solution you might like. In the GravExtension.php file we add this:

new TwigFilter('to_ascii', [Utils::class, 'toAscii']),

And in Utils.php the following:

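/**
 * Convert a UTF-8 string to plain ASCII.
 *
 * @param string $string
 * @return string
 */
public static function toAscii(string $string): string
{
    // Approximate each character in ASCII; iconv() returns false on failure.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
    if ($ascii === false) {
        $ascii = $string;
    }
    // Keep only letters, digits, spaces, hyphens and underscores.
    return preg_replace('/[^A-Za-z0-9 \-_]/', '', $ascii) ?? '';
}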

Is it okay if I make these changes to this PR?

@pmoreno-rodriguez
Contributor Author

If you accept these suggestions, all that remains is to document the filter in the Twig Filters section of the Grav documentation.

@rhukster
Member

rhukster commented Jan 24, 2025

sounds good!

@dimayakovlev
Contributor

What if iconv() is not available? A rarely used function adds another dependency for Grav.

@rhukster
Member

I think this may be why I ended up going with the basic version I already have in Utils. The best solution in this scenario is to check whether the function exists: if so, use the better version; otherwise fall back to the basic one (the existing string-replace version).
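
A minimal sketch of that check-then-fall-back idea, for illustration only. The function names basic_to_ascii() and to_ascii_with_fallback() are hypothetical and are not part of Grav; the short character map is an assumption standing in for the longer one in the existing Utils::toAscii(). What iconv() produces with //TRANSLIT depends on the platform's iconv implementation, so only the fallback path has a shown output.

```php
<?php
// Hypothetical helpers for illustration; these are not Grav's actual Utils methods.

/** Basic fallback: replace a few accented letters via a fixed map. */
function basic_to_ascii(string $string): string
{
    // Deliberately short map (an assumption); the existing method covers more characters.
    $map = [
        'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U', 'Ñ' => 'N', 'É' => 'E',
        'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ñ' => 'n', 'é' => 'e', 'ú' => 'u',
    ];
    return strtr($string, $map);
}

/** Use iconv() when the extension is loaded, otherwise the basic map. */
function to_ascii_with_fallback(string $string): string
{
    if (function_exists('iconv')) {
        // @ silences the notice iconv() raises on characters it cannot convert.
        $converted = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
        if ($converted !== false) {
            return $converted;
        }
    }
    return basic_to_ascii($string);
}

echo basic_to_ascii('Ñandú über Ärger'), PHP_EOL; // Nandu uber Arger
```

Keeping the basic map as a separate function means the fallback can be tested on its own, whether or not iconv is installed.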

@pmoreno-rodriguez
Contributor Author

Well, I'm not an expert programmer, I don't even know the whole internal structure of Grav in depth, but I'm trying to advance little by little in my knowledge of it.
Reading some articles about alternatives to utf8_decode, they suggest the use of other more modern functions like iconv() or mb_convert_encoding(), which are supported by the most modern versions of PHP.

https://www.dopethemes.com/utf8_decode-replacement-modern-php-encoding-alternatives/

What do you suggest, @dimayakovlev, for adding a filter that serves the purpose proposed here (as other CMSs such as Drupal offer with their transliterate filter)?

@pmoreno-rodriguez
Contributor Author

pmoreno-rodriguez commented Jan 24, 2025

I've been testing on my local Grav and so far it has worked without any issues. Nothing else needs to be added to Grav's code to use this filter, so I understand that iconv() shouldn't be an impediment. What's more, it includes some error handling: the //IGNORE flag discards characters that cannot be converted.

The PHP manual states that iconv() is included in the C standard library (libc-bin).

@pmoreno-rodriguez
Contributor Author

If preferred, you could use the mbstring library, which is already listed among Grav's PHP requirements, or add the iconv extension to those requirements.

@dimayakovlev
Contributor

@pmoreno-rodriguez, for transliteration it is better to use the Transliterator class if the intl extension is available.

@dimayakovlev
Contributor

The problem of transliteration is much more complicated than it seems. There is no single standard for transliteration, and different rules can be used in the same language. If the Twig transliteration filter is included in Grav CMS, it should be possible to define your own transliteration table to get an acceptable result.
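
To illustrate why one fixed table is not enough, here is a small sketch that applies a caller-supplied table with strtr(). The function name transliterate_with_table() and both tables are hypothetical examples, not code from Grav or from this PR.

```php
<?php
// Hypothetical helper: applies whatever transliteration table the caller passes in.
function transliterate_with_table(string $string, array $table): string
{
    // strtr() with an array replaces each key with its value, longest keys first.
    return strtr($string, $table);
}

// Two plausible conventions for the same German letters (illustrative tables only).
$expanded = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'];
$stripped = ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 's'];

echo transliterate_with_table('Müller', $expanded), PHP_EOL; // Mueller
echo transliterate_with_table('Müller', $stripped), PHP_EOL; // Muller
```

Both results are defensible, which is the reason a user-definable table matters.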

@pmoreno-rodriguez
Contributor Author

Ok. At this point I only dare to make a basic proposal built on the Transliterator class. I have tested it with phrases in Russian, Bulgarian and other languages, and it appears to work. If it is feasible to introduce this filter in Grav, I will be happy to look for more information and improve it; if not, I think I will spend my time on other projects that I still have pending for Grav.

Here is the code based on the PHP Transliterator class:

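public static function toAscii(string $string): string
{
    // Any script to Latin, Latin to ASCII, then remove what is left in U+0100..U+7FFF.
    $transliterator = \Transliterator::create('Any-Latin; Latin-ASCII; [\u0100-\u7FFF] Remove');

    // create() returns null on failure and transliterate() returns false on failure.
    if ($transliterator !== null) {
        $result = $transliterator->transliterate($string);
        if ($result !== false) {
            $string = $result;
        }
    }

    // Keep only letters, digits, spaces, hyphens and underscores.
    return preg_replace('/[^A-Za-z0-9 \-_]/', '', $string) ?? '';
}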

The function could be named transliterate instead of toAscii.
Here is an example of the output for the following template code:

{{ page.title|to_ascii }}

where the title is "Как дела в Мадриде?", the filter converts it into "Kak dela v Madrid".

@dimayakovlev
Contributor

In my opinion this is a good idea for a plugin.

Use Transliterator class by default, with the option to set your own rule. If the class is not available, you can use iconv() or a user defined transliteration table.

And if we're talking about simple transliteration, we don't need to remove characters, we just need to replace letters of one alphabet with letters of another, in our case, letters of the English alphabet.

So you can create two filters:

  • transliterate - transliterate text and return result as is;
  • to_ascii - convert and delete non-ASCII characters.
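
The two filters described above could look roughly like this. This is a sketch under assumptions, not the plugin's code: the function names transliterate_text() and to_ascii_text() are hypothetical, the default rule 'Any-Latin; Latin-ASCII' is an assumed default, and falling back to iconv() when intl is missing is just one of the options mentioned in this thread.

```php
<?php
// Hypothetical function names; neither is Grav's nor the plugin's actual API.

/** Transliterate and return the result as is, with an optional custom rule. */
function transliterate_text(string $string, string $rule = 'Any-Latin; Latin-ASCII'): string
{
    if (class_exists('Transliterator')) {
        // create() returns null when the rule is not accepted.
        $transliterator = \Transliterator::create($rule);
        if ($transliterator !== null) {
            $result = $transliterator->transliterate($string);
            if ($result !== false) {
                return $result;
            }
        }
    }
    // Assumed fallback when intl is unavailable or the rule was rejected.
    if (function_exists('iconv')) {
        $result = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
        if ($result !== false) {
            return $result;
        }
    }
    return $string;
}

/** Transliterate first, then delete every byte that is still outside ASCII. */
function to_ascii_text(string $string, string $rule = 'Any-Latin; Latin-ASCII'): string
{
    $clean = preg_replace('/[^\x00-\x7F]/', '', transliterate_text($string, $rule));
    return $clean ?? '';
}
```

Unlike the earlier toAscii() proposals, to_ascii_text() keeps ASCII punctuation, because it only deletes what is not ASCII.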

@pmoreno-rodriguez
Contributor Author

Good. I'll see what I can do with your last suggestion. I will try to create a plugin with basic options and see how it works.

@pmoreno-rodriguez
Contributor Author

pmoreno-rodriguez commented Jan 28, 2025

Hello @rhukster and @dimayakovlev. I already have a base for a future plugin to include the filter to transliterate text in Grav.
How about you take a look at it and we can all improve it? In principle I have done some tests and it works more or less well. Perhaps it is missing the option for the user to be able to create their own transliteration rules.

This is the GitHub repository so you can contribute your suggestions. Thanks in advance.

@pmoreno-rodriguez
Contributor Author

Hi again.

Today I've uploaded a new version of the Transliterate plugin, with a full README and some improvements to the transliterate and ASCII filters.

Please, @rhukster and @dimayakovlev, take a look to see whether it is a useful plugin to include in the Grav plugin repository.

3 participants