Merge branch 'master' of https://github.com/cbhaley/calibre

Template language: Add a transliterate function, useful to convert metadata in other languages into its approximate English representation, when exporting books from calibre. Fixes #1249562 [Feature Request: option to transliterate metadata when sending to device](https://bugs.launchpad.net/calibre/+bug/1249562)
Q-Qaysaneah · Nov 9, 2013 · 016977d
2 parents 3113191 + 60a6212

Showing 2 changed files with 33 additions and 11 deletions.
diff --git a/manual/template_lang.rst b/manual/template_lang.rst
@@ -131,12 +131,10 @@ The functions available are listed below. Note that the definitive documentation
     * ``language_codes(lang_strings)`` -- return the language codes for the strings passed in `lang_strings`. The strings must be in the language of the current locale. `Lang_strings` is a comma-separated list.
     * ``language_strings(lang_codes, localize)`` -- return the strings for the language codes passed in `lang_codes`. If `localize` is zero, return the strings in English. If localize is not zero, return the strings in the language of the current locale. `Lang_codes` is a comma-separated list.
     * ``list_item(index, separator)`` -- interpret the field as a list of items separated by `separator`, returning the `index`th item. The first item is number zero. The last item can be returned using `list_item(-1,separator)`. If the item is not in the list, then the empty value is returned. The separator has the same meaning as in the `count` function.
-    * ``re(pattern, replacement)`` -- return the field after applying the regular expression. All instances of `pattern` are replaced with `replacement`. As in all of |app|, these are python-compatible regular expressions.
-    * ``shorten(left chars, middle text, right chars)`` -- Return a shortened version of the field, consisting of `left chars` characters from the beginning of the field, followed by `middle text`, followed by `right chars` characters from the end of the string. `Left chars` and `right chars` must be integers. For example, assume the title of the book is `Ancient English Laws in the Times of Ivanhoe`, and you want it to fit in a space of at most 15 characters. If you use ``{title:shorten(9,-,5)}``, the result will be `Ancient E-nhoe`. If the field's length is less than ``left chars`` + ``right chars`` + the length of ``middle text``, then the field will be used intact. For example, the title `The Dome` would not be changed.
-    * ``swap_around_comma(val) `` -- given a value of the form ``B, A``, return ``A B``. This is most useful for converting names in LN, FN format to FN LN. If there is no comma, the function returns val unchanged.
-    * ``switch(pattern, value, pattern, value, ..., else_value)`` -- for each ``pattern, value`` pair, checks if the field matches the regular expression ``pattern`` and if so, returns that ``value``. If no ``pattern`` matches, then ``else_value`` is returned. You can have as many ``pattern, value`` pairs as you want.
     * ``lookup(pattern, field, pattern, field, ..., else_field)`` -- like switch, except the arguments are field (metadata) names, not text. The value of the appropriate field will be fetched and used. Note that because composite columns are fields, you can use this function in one composite field to use the value of some other composite field. This is extremely useful when constructing variable save paths (more later).
+    * ``re(pattern, replacement)`` -- return the field after applying the regular expression. All instances of `pattern` are replaced with `replacement`. As in all of |app|, these are python-compatible regular expressions.
     * ``select(key)`` -- interpret the field as a comma-separated list of items, with the items being of the form "id:value". Find the pair with the id equal to key, and return the corresponding value. This function is particularly useful for extracting a value such as an isbn from the set of identifiers for a book.
+    * ``shorten(left chars, middle text, right chars)`` -- Return a shortened version of the field, consisting of `left chars` characters from the beginning of the field, followed by `middle text`, followed by `right chars` characters from the end of the string. `Left chars` and `right chars` must be integers. For example, assume the title of the book is `Ancient English Laws in the Times of Ivanhoe`, and you want it to fit in a space of at most 15 characters. If you use ``{title:shorten(9,-,5)}``, the result will be `Ancient E-nhoe`. If the field's length is less than ``left chars`` + ``right chars`` + the length of ``middle text``, then the field will be used intact. For example, the title `The Dome` would not be changed.
     * ``str_in_list(val, separator, string, found_val, not_found_val)`` -- treat val as a list of items separated by separator, comparing the string against each value in the list. If the string matches a value, return found_val, otherwise return not_found_val. If the string contains separators, then it is also treated as a list and each value is checked.
     * ``subitems(val, start_index, end_index)`` -- This function is used to break apart lists of tag-like hierarchical items such as genres. It interprets the value as a comma-separated list of tag-like items, where each item is a period-separated list. Returns a new list made by first finding all the period-separated tag-like items, then for each such item extracting the components from `start_index` to `end_index`, then combining the results back together. The first component in a period-separated list has an index of zero. If an index is negative, then it counts from the end of the list. As a special case, an end_index of zero is assumed to be the length of the list. Examples::
 
@@ -154,7 +152,10 @@ The functions available are listed below. Note that the definitive documentation
         {tags:sublist(-1,0,\,)} returns "C"
         {tags:sublist(0,-1,\,)} returns "A, B"
         
+    * ``swap_around_comma(val) `` -- given a value of the form ``B, A``, return ``A B``. This is most useful for converting names in LN, FN format to FN LN. If there is no comma, the function returns val unchanged.
+    * ``switch(pattern, value, pattern, value, ..., else_value)`` -- for each ``pattern, value`` pair, checks if the field matches the regular expression ``pattern`` and if so, returns that ``value``. If no ``pattern`` matches, then ``else_value`` is returned. You can have as many ``pattern, value`` pairs as you want.
     * ``test(text if not empty, text if empty)`` -- return `text if not empty` if the field is not empty, otherwise return `text if empty`.
+    * ``transliterate()`` -- Returns a string in a latin alphabet formed by approximating the sound of the words in the source field. For example, if the source field is ``Фёдор Миха́йлович Достоевский`` the function returns ``Fiodor Mikhailovich Dostoievskii``.
 
 Now, what about using functions and formatting in the same field. Suppose you have an integer custom column called ``#myint`` that you want to see with leading zeros, as in ``003``. To do this, you would use a format of ``0>3s``. However, by default, if a number (integer or float) equals zero then the field produces the empty value, so zero values will produce nothing, not ``000``. If you really want to see ``000`` values, then you use both the format string and the ``ifempty`` function to change the empty value back to a zero. The field reference would be::
 

diff --git a/src/calibre/utils/formatter_functions.py b/src/calibre/utils/formatter_functions.py
@@ -1,3 +1,6 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+
 '''
 Created on 13 Jan 2011
 
@@ -132,6 +135,7 @@ def eval_(self, formatter, kwargs, mi, locals, *args):
             return unicode(ret)
 
 class BuiltinFormatterFunction(FormatterFunction):
+
     def __init__(self):
         formatter_functions().register_builtin(self)
         eval_func = inspect.getmembers(self.__class__,
@@ -352,7 +356,7 @@ class BuiltinLookup(BuiltinFormatterFunction):
             'variable save paths')
 
     def evaluate(self, formatter, kwargs, mi, locals, val, *args):
-        if len(args) == 2: # here for backwards compatibility
+        if len(args) == 2:  # here for backwards compatibility
             if val:
                 return formatter.vformat('{'+args[0].strip()+'}', [], kwargs)
             else:
@@ -741,11 +745,11 @@ def evaluate(self, formatter, kwargs, mi, locals, val, template):
             v1 = float(val)
         except:
             return ''
-        try: # Try formatting the value as a float
+        try:  # Try formatting the value as a float
             return template.format(v1)
         except:
             pass
-        try: # Try formatting the value as an int
+        try:  # Try formatting the value as an int
             v2 = trunc(v1)
             if v2 == v1:
                 return template.format(v2)
@@ -1285,6 +1289,21 @@ def evaluate(self, formatter, kwargs, mi, locals_):
             return mi._proxy_metadata.virtual_libraries
         return _('This function can be used only in the GUI')
 
+class BuiltinTransliterate(BuiltinFormatterFunction):
+    name = 'transliterate'
+    arg_count = 1
+    category = 'String manipulation'
+    __doc__ = doc = _(u'transliterate(a) -- Returns a string in a latin alphabet '
+                      u'formed by approximating the sound of the words in the '
+                      u'source string. For example, if the source is "Фёдор '
+                      u'Миха́йлович Достоевский" the function returns "Fiodor '
+                      u'Mikhailovich Dostoievskii".')
+
+    def evaluate(self, formatter, kwargs, mi, locals, source):
+        from calibre.utils.filenames import ascii_text
+        return ascii_text(source)
+
+
 _formatter_builtins = [
     BuiltinAdd(), BuiltinAnd(), BuiltinApproximateFormats(),
     BuiltinAssign(), BuiltinBooksize(),
@@ -1306,10 +1325,12 @@ def evaluate(self, formatter, kwargs, mi, locals_):
     BuiltinStrcmp(), BuiltinStrInList(), BuiltinStrlen(), BuiltinSubitems(),
     BuiltinSublist(),BuiltinSubstr(), BuiltinSubtract(), BuiltinSwapAroundComma(),
     BuiltinSwitch(), BuiltinTemplate(), BuiltinTest(), BuiltinTitlecase(),
-    BuiltinToday(), BuiltinUppercase(), BuiltinVirtualLibraries()
+    BuiltinToday(), BuiltinTransliterate(), BuiltinUppercase(),
+    BuiltinVirtualLibraries()
 ]
 
 class FormatterUserFunction(FormatterFunction):
+
     def __init__(self, name, doc, arg_count, program_text):
         self.name = name
         self.doc = doc
@@ -1319,9 +1340,9 @@ def __init__(self, name, doc, arg_count, program_text):
 tabs = re.compile(r'^\t*')
 def compile_user_function(name, doc, arg_count, eval_func):
     def replace_func(mo):
-        return  mo.group().replace('\t', '    ')
+        return mo.group().replace('\t', '    ')
 
-    func = '    ' + '\n    '.join([tabs.sub(replace_func, line )
+    func = '    ' + '\n    '.join([tabs.sub(replace_func, line)
                                    for line in eval_func.splitlines()])
     prog = '''
 from calibre.utils.formatter_functions import FormatterUserFunction
@@ -1356,4 +1377,4 @@ def load_user_template_functions(library_uuid, funcs):
     formatter_functions().register_functions(library_uuid, compiled_funcs)
 
 def unload_user_template_functions(library_uuid):
-    formatter_functions().unregister_functions(library_uuid)
+    formatter_functions().unregister_functions(library_uuid)
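
Note on the new function: `BuiltinTransliterate.evaluate` simply delegates to `calibre.utils.filenames.ascii_text`, whose implementation is not part of this diff. The following is only a rough, self-contained sketch of the general idea (table lookup plus stripping of combining marks), not calibre's actual code. The names `SKETCH_TABLE` and `transliterate_sketch` are hypothetical, and the table is deliberately tiny; it covers just enough letters for the demonstration.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping used only for this sketch.
# A real transliterator needs a far larger table.
SKETCH_TABLE = {
    'Ф': 'F', 'ё': 'io', 'д': 'd', 'о': 'o', 'р': 'r',
}


def transliterate_sketch(text):
    """Approximate text in Latin letters (illustrative sketch only)."""
    pieces = []
    for ch in text:
        # Look up the original character first: some letters (such as 'ё')
        # would otherwise be split into a base letter plus a combining mark.
        if ch in SKETCH_TABLE:
            pieces.append(SKETCH_TABLE[ch])
            continue
        # Otherwise decompose the character and drop combining marks
        # (stress accents, diacritics); anything unmapped passes through.
        for part in unicodedata.normalize('NFKD', ch):
            if unicodedata.combining(part):
                continue
            pieces.append(SKETCH_TABLE.get(part, part))
    return ''.join(pieces)


print(transliterate_sketch('Фёдор'))
print(transliterate_sketch('café'))
```

In a template, the function documented in the manual hunk above would presumably be applied to a field in the usual single-function form, for example `{title:transliterate()}`.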