Import kitchen_1.2.4.orig.tar.gz

Sergio Durigan Junior 2016-07-08 19:18:01 -04:00
parent dfb12f36e6
commit 2faeac3a1a
154 changed files with 11665 additions and 2152 deletions

.gitignore (new file)

@ -0,0 +1,6 @@
*.pyc
MANIFEST
dist
*.egg*
*.pdf
build

.travis.yml (new file)

@ -0,0 +1,12 @@
language: python
python:
- "2.6"
- "2.7"
- "3.4"
install: python setup.py develop
script: ./runtests.sh
notifications:
irc:
- "irc.freenode.net#threebean"
on_success: never
on_failure: always

.tx/config (new file)

@ -0,0 +1,7 @@
[main]
host = https://www.transifex.com
[kitchen.kitchenpot]
file_filter = po/<lang>.po
source_file = po/kitchen.pot
source_lang = en


@ -3,8 +3,9 @@ Some notes on hacking on kitchen
================================
:Author: Toshio Kuratomi
:Date: 2 Jan 2012
:Version: 1.1.x
:Maintainer: Ralph Bean
:Date: 2 Dec 2014
:Version: 1.2.x
For coding and kitchen, see the style guide in the documentation.
@ -40,20 +41,20 @@ be found in the `transifex user's guide`_.
.. `transifex user's guide`:: http://help.transifex.net/user-guide/translating.html
To generate the POT file (located in the po/ subdirectory), use pybabel to
extract the messages. Tun the following from the top level directory::
extract the messages. Run the following from the top level directory::
pybabel extract -o po/kitchen.pot kitchen -kb_ -kbN_
pybabel extract -o po/kitchen.pot kitchen2 kitchen3
Then commit this pot file and upload to transifex::
tx push -s
bzr commit -m 'Extract new strings from the source files' po/kitchen.pot
bzr push
git commit -m 'Extract new strings from the source files' po/kitchen.pot
git push
To pull messages from transifex prior to making a release, do::
tx pull -a
bzr commit -m 'Merge new translations from transifex' po/*.po
git commit -m 'Merge new translations from transifex' po/*.po
If you see a status message from transifex like this::
Pulling new translations for resource kitchen.kitchenpot (source: po/kitchen.pot)
@ -62,8 +63,8 @@ If you see a status message from transifex like this::
it means that transifex has created a brand new po file for you. You need to
add the new file to source control and commit it like this::
bzr add po/fr.po
bzr commit -m 'New French translation' po/fr.po
git add po/fr.po
git commit -m 'New French translation' po/fr.po
TODO: Add information about announcing string freeze. Using transifex's add
@ -130,7 +131,8 @@ Unittest
Kitchen has a large set of unittests. All of them should pass before release.
You can run the unittests with the following command::
nosetests --with-coverage --cover-package kitchen
./runtests.sh
This will run all the unittests under the tests directory and also generate
some statistics about which lines of code were not accessed when kitchen ran.
@ -144,48 +146,70 @@ some statistics about which lines of code were not accessed when kitchen ran.
a look at :file:`test_i18n.py` and :file:`test_converters.py` to see tests
that attempt to cover enough input values to detect problems.
Since kitchen is currently supported on python-2.3.1+, it is desirable to test
kitchen on at least one python major version from python-2.3 through
python-2.7. We currently have access to a buildbot that has access to
python-2.4, python-2.6, and python-2.7. You can view it at
http://ci.csh.rit.edu:8080/view/Kitchen/ . The buildbot checks the devel
repository hourly and if new checkins have occurred, it attempts to rebuild.
If you need access to invoke builds on the buildbot more regularly than that,
contact Toshio to get access.
Since kitchen is currently supported on python2 and python3, it is desirable to
run tests against as many python versions as possible. We currently have a
jenkins instance in the Fedora Infrastructure private cloud with a job set up
for kitchen at http://jenkins.cloud.fedoraproject.org/job/kitchen/
We were unable to get python-2.3 working in the buildbot so I manually run the
unittests on a CentOS-4 virtual machine (with python-2.3). I currently don't
test on python-2.5 but I'd be happy to take bug reports or get a new committer
that was interested in that platform.
It is not currently running tests against python-2.{3,4,5,6}. If you are
interested in getting those builds running automatically, please speak up in
the #fedora-apps channel on freenode.
Creating the release
====================
Then commit this pot file and upload to transifex:
1. Make sure that any feature branches you want have been merged.
2. Pull in new translations and verify they are valid::
2. Make a fresh branch for your release::
git flow release start $VERSION
3. Extract strings for translation and push them to transifex::
pybabel extract -o po/kitchen.pot kitchen2 kitchen3
tx push -s
git commit -m 'Extract new strings from the source files' po/kitchen.pot
git push
4. Wait for translations. In the meantime...
5. Update the version in ``kitchen/__init__.py`` and ``NEWS.rst``.
6. When they're all ready, pull in new translations and verify they are valid::
tx pull -a
# If msgfmt is installed, this will check that the catalogs are valid
./releaseutils.py
bzr commit -m 'Merge new translations from transifex.net'
3. Update the version in kitchen/__init__.py and NEWS.
4. Make a fresh clone of the repository::
cd $PATH_TO_MY_SHARED_REPO
bzr branch bzr://bzr.fedorahosted.org/bzr/kitchen/devel release
5. Make the source tarball in that directory::
cd release
git commit -m 'Merge new translations from transifex.net'
git push
7. Create a pull-request so someone else from #fedora-apps can review::
hub pull-request -b master
8. Once someone has given it a +1, then make a source tarball::
python setup.py sdist
6. Make sure that the source tarball contains all of the files we want in the release::
cd ..
tar -xzvf release/dist/kitchen*tar.gz
diff -uNr devel kitchen-$RELEASE_VERSION
7. Upload the docs to pypi::
cd release
9. Upload the docs to pypi::
mkdir -p build/sphinx/html
sphinx-build kitchen2/docs/ build/sphinx/html
python setup.py upload_docs
8. Upload the tarball to pypi::
python setup.py sdist upload --sign
9. Upload the tarball to fedorahosted::
scp dist/kitchen*tar.gz fedorahosted.org:/srv/web/releases/k/i/kitchen/
10. Tag the release::
cd ../devel
bzr tag $RELEASE_VERSION
bzr push
10. Upload the tarball to pypi::
python setup.py sdist upload --sign
11. Upload the tarball to fedorahosted::
scp dist/kitchen*tar.gz* fedorahosted.org:/srv/web/releases/k/i/kitchen/
12. Tag and bag it::
git flow release finish -m $VERSION -u $YOUR_GPG_KEY_ID $VERSION
git push origin develop:develop
git push origin master:master
git push origin --tags
# Your pull-request should automatically close. Double-check this, though.

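For readers unfamiliar with releaseutils.py, a hypothetical sketch of the kind of
catalog validation step 6 relies on (this assumes the msgfmt tool from GNU gettext
is on the PATH and that translations live under po/ as described above; it is an
illustration, not the actual script)::

    import glob
    import subprocess

    for po_file in glob.glob('po/*.po'):
        # msgfmt -c exits non-zero when a catalog is malformed
        ret = subprocess.call(['msgfmt', '-c', '-o', '/dev/null', po_file])
        if ret != 0:
            print('%s failed validation' % po_file)
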
MANIFEST.in (new file)

@ -0,0 +1,11 @@
include COPYING COPYING.LESSER
include *.rst
include releaseutils.py
recursive-include tests *.py *.po *.pot *.mo
recursive-include docs *
include po/*.pot
include po/*.po
include locale/*/*/*.mo
recursive-include kitchen2 *.py *.po *.mo *.pot
recursive-include kitchen3 *.py *.po *.mo *.pot
include runtests.sh


@ -2,9 +2,72 @@
NEWS
====
:Authors: Toshio Kuratomi
:Date: 14 Feb 2012
:Version: 1.1.1
:Author: Toshio Kuratomi
:Maintainer: Ralph Bean
:Date: 13 Nov 2015
:Version: 1.2.x
-----
1.2.4
-----
* Further compat fixes for python-3.5
-----
1.2.3
-----
* Compatibility with python-3.5
-----
1.2.2
-----
* Compatibility with python-3.4
* Compatibility with PEP 470
-----
1.2.1
-----
* Fix release-related problems with the 1.2.0 tarball.
- Include locale data for the test suite.
- Include NEWS.rst and README.rst.
- Include runtests.sh.
- Adjust trove classifiers to indicate python3 support.
-----
1.2.0
-----
* kitchen gained support for python3. The tarball release now includes a
``kitchen2/`` and a ``kitchen3/`` directory containing copies of the source
code modified to work against each of the two major python versions. When
installing with ``pip`` or ``setup.py``, the appropriate version should be
selected and installed.
* The canonical upstream repository location moved to git and github. See
https://github.com/fedora-infra/kitchen
* Added kitchen.text.misc.isbasestring(), kitchen.text.misc.isbytestring(),
and kitchen.text.misc.isunicodestring(). These are mainly useful for code
being ported to python3 as python3 lacks a basestring type and has two types
for byte strings. Code that has to run on both python2 and python3 or
wants to provide similar byte vs unicode semantics may find these functions
to be a good abstraction.
* Add a python2_api parameter to various i18n functions: NullTranslations
constructor, NewGNUTranslations constructor, and get_translation_object.
When set to True (the default), the python2 api for gettext objects is used.
When set to False, the python3 api is used. This option is intended to aid
in porting from python2 to python3.
* Exception messages are no longer translated. The idea is that exceptions
should be easily searched for via a web search.
* Fix a bug in unicode_to_xml() where, when a unicode string was turned into
a byte string using an encoding that lacks some of the needed characters,
the ampersands ("&") of the resulting xmlcharrefs were themselves escaped.
* Fix a bug in NewGNUTranslations.lngettext() if a fallback gettext object is
used and the message is not in any catalog.
* Speedups to process_control_chars() that are directly reflected in
unicode_to_xml() and byte_string_to_xml()
* Remove C1 Control Codes in to_xml() as well as C0 Control Codes
-----
1.1.1


@ -1,39 +0,0 @@
Metadata-Version: 1.0
Name: kitchen
Version: 1.1.1
Summary: Kitchen contains a cornucopia of useful code
Home-page: https://fedorahosted.org/kitchen
Author: Toshio Kuratomi
Author-email: toshio@fedoraproject.org
License: LGPLv2+
Download-URL: https://fedorahosted.org/releases/k/i/kitchen
Description:
We've all done it. In the process of writing a brand new application we've
discovered that we need a little bit of code that we've invented before.
Perhaps it's something to handle unicode text. Perhaps it's something to make
a bit of python-2.5 code run on python-2.3. Whatever it is, it ends up being
a tiny bit of code that seems too small to worry about pushing into its own
module so it sits there, a part of your current project, waiting to be cut and
pasted into your next project. And the next. And the next. And since that
little bittybit of code proved so useful to you, it's highly likely that it
proved useful to someone else as well. Useful enough that they've written it
and copy and pasted it over and over into each of their new projects.
Well, no longer! Kitchen aims to pull these small snippets of code into a few
python modules which you can import and use within your project. No more copy
and paste! Now you can let someone else maintain and release these small
snippets so that you can get on with your life.
Keywords: Useful Small Code Snippets
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.3
Classifier: Programming Language :: Python :: 2.4
Classifier: Programming Language :: Python :: 2.5
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General


@ -3,8 +3,9 @@ Kitchen.core Module
===================
:Author: Toshio Kuratomi
:Date: 2 Jan 2012
:Version: 1.1.x
:Maintainer: Ralph Bean
:Date: 13 Nov 2015
:Version: 1.2.x
The Kitchen module provides a python API for all sorts of little useful
snippets of code that everybody ends up writing for their projects but never
@ -38,12 +39,15 @@ Requirements
kitchen.core requires
:python: 2.3.1 or later
:python: 2.4 or later
Since version 1.2.0, this package has distributed both python2 and python3
compatible versions of the source.
Soft Requirements
=================
If found, these libraries will be used to make the implementation of soemthing
If found, these libraries will be used to make the implementation of something
better in some way. If they are not present, the API that they enable will
still exist but may function in a different manner.
@ -78,4 +82,5 @@ Testing
=======
You can run the unittests with this command::
nosetests --with-coverage --cover-package kitchen
./runtests.sh


@ -10,10 +10,10 @@ Style
* Run :command:`pylint` over the code and try to resolve most of its nitpicking
------------------------
Python 2.3 compatibility
Python 2.4 compatibility
------------------------
At the moment, we're supporting python-2.3 and above. Understand that there's
At the moment, we're supporting python-2.4 and above. Understand that there's
a lot of python features that we cannot use because of this.
Sometimes modules in the |stdlib|_ can be added to kitchen so that they're
@ -23,7 +23,7 @@ available. When we do that we need to be careful of several things:
:file:`maintainers/sync-copied-files.py` for this.
2. Sync the unittests as well as the module.
3. Be aware that not all modules are written to remain compatible with
Python-2.3 and might use python language features that were not present
Python-2.4 and might use python language features that were not present
then (generator expressions, relative imports, decorators, with, try: with
both except: and finally:, etc) These are not good candidates for
importing into kitchen as they require more work to keep synced.
@ -56,7 +56,7 @@ Unittests
* We're using nose for unittesting. Rather than depend on unittest2
functionality, use the functions that nose provides.
* Remember to maintain python-2.3 compatibility even in unittests.
* Remember to maintain python-2.4 compatibility even in unittests.
----------------------------
Docstrings and documentation

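The python-2.4 constraint above rules out, among other things, the unified
try/except/finally statement (python-2.5 and later). A minimal sketch of the
nested spelling that stays 2.4-compatible::

    def read_first_line(path):
        fh = open(path)
        try:
            try:
                return fh.readline()
            except IOError:
                return ''
        finally:
            fh.close()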

@ -9,7 +9,7 @@ Kitchen, everything but the sink
We've all done it. In the process of writing a brand new application we've
discovered that we need a little bit of code that we've invented before.
Perhaps it's something to handle unicode text. Perhaps it's something to make
a bit of python-2.5 code run on python-2.3. Whatever it is, it ends up being
a bit of python-2.5 code run on python-2.4. Whatever it is, it ends up being
a tiny bit of code that seems too small to worry about pushing into its own
module so it sits there, a part of your current project, waiting to be cut and
pasted into your next project. And the next. And the next. And since that
@ -37,11 +37,9 @@ Requirements
We've tried to keep the core kitchen module's requirements lightweight. At the
moment kitchen only requires
:python: 2.3.1 or later
:python: 2.4 or later
.. warning:: Kitchen-1.1.0 is likely to be the last release that supports
python-2.3.x. Future releases will target python-2.4 as the minimum
required version.
.. warning:: Kitchen-1.1.0 was the last release that supported python-2.3.x
Soft Requirements
=================
@ -73,9 +71,9 @@ now, I just mention them here:
lists and dicts, transforming the dicts to Bunch's.
`hashlib <http://code.krypto.org/python/hashlib/>`_
Python 2.5 and forward have a :mod:`hashlib` library that provides secure
hash functions to python. If you're developing for python2.3 or
python2.4, though, you can install the standalone hashlib library and have
access to the same functions.
hash functions to python. If you're developing for python2.4 though, you
can install the standalone hashlib library and have access to the same
functions.
`iterutils <http://pypi.python.org/pypi/iterutils/>`_
The python documentation for :mod:`itertools` has some examples
of other nice iterable functions that can be built from the


@ -35,7 +35,7 @@ from kitchen import versioning
(b_, bN_) = i18n.easy_gettext_setup('kitchen.core', use_unicode=False)
#pylint: enable-msg=C0103
__version_info__ = ((1, 1, 1),)
__version_info__ = ((1, 2, 4),)
__version__ = versioning.version_tuple_to_string(__version_info__)
__all__ = ('exceptions', 'release',)


@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (c) 2010-2011 Red Hat, Inc
# Copyright (c) 2010-2012 Red Hat, Inc
# Copyright (c) 2009 Milos Komarcevic
# Copyright (c) 2008 Tim Lauridsen
#
@ -89,7 +89,7 @@ See the documentation of :func:`easy_gettext_setup` and
from kitchen.versioning import version_tuple_to_string
__version_info__ = ((2, 1, 1),)
__version_info__ = ((2, 2, 0),)
__version__ = version_tuple_to_string(__version_info__)
import copy
@ -99,6 +99,7 @@ import itertools
import locale
import os
import sys
import warnings
# We use the _default_localedir definition in get_translation_object
try:
@ -107,7 +108,7 @@ except ImportError:
_DEFAULT_LOCALEDIR = os.path.join(sys.prefix, 'share', 'locale')
from kitchen.text.converters import to_bytes, to_unicode
from kitchen.text.misc import byte_string_valid_encoding
from kitchen.text.misc import byte_string_valid_encoding, isbasestring
# We cache parts of the translation objects just like stdlib's gettext so that
# we don't reparse the message files and keep them in memory separately if the
@ -199,9 +200,12 @@ class DummyTranslations(object, gettext.NullTranslations):
:func:`locale.getpreferredencoding`.
* Make setting :attr:`input_charset` and :attr:`output_charset` also
set those attributes on any fallback translation objects.
.. versionchanged:: kitchen-1.2.0 ; API kitchen.i18n 2.2.0
Add python2_api parameter to __init__()
'''
#pylint: disable-msg=C0103,C0111
def __init__(self, fp=None):
def __init__(self, fp=None, python2_api=True):
gettext.NullTranslations.__init__(self, fp)
# Python 2.3 compat
@ -212,6 +216,46 @@ class DummyTranslations(object, gettext.NullTranslations):
# 'utf-8' is only a default here. Users can override.
self._input_charset = 'utf-8'
# Decide whether to mimic the python2 or python3 api
self.python2_api = python2_api
def _set_api(self):
if self._python2_api:
warnings.warn('Kitchen.i18n provides gettext objects that'
' implement either the python2 or python3 gettext api.'
' You are currently using the python2 api. Consider'
' switching to the python3 api by setting'
' python2_api=False when creating the gettext object',
PendingDeprecationWarning, stacklevel=2)
self.gettext = self._gettext
self.lgettext = self._lgettext
self.ugettext = self._ugettext
self.ngettext = self._ngettext
self.lngettext = self._lngettext
self.ungettext = self._ungettext
else:
self.gettext = self._ugettext
self.lgettext = self._lgettext
self.ngettext = self._ungettext
self.lngettext = self._lngettext
self.ugettext = self._removed_method_factory('ugettext')
self.ungettext = self._removed_method_factory('ungettext')
def _removed_method_factory(self, name):
def _removed_method(*args, **kwargs):
raise AttributeError("'%s' object has no attribute '%s'" %
(self.__class__.__name__, name))
return _removed_method
def _set_python2_api(self, value):
self._python2_api = value
self._set_api()
def _get_python2_api(self):
return self._python2_api
python2_api = property(_get_python2_api, _set_python2_api)
def _set_input_charset(self, charset):
if self._fallback:
try:
@ -276,7 +320,7 @@ class DummyTranslations(object, gettext.NullTranslations):
# Make sure that we're returning a str of the desired encoding
return to_bytes(msg, encoding=output_encoding)
def gettext(self, message):
def _gettext(self, message):
# First use any fallback gettext objects. Since DummyTranslations
# doesn't do any translation on its own, this is a good first step.
if self._fallback:
@ -292,7 +336,7 @@ class DummyTranslations(object, gettext.NullTranslations):
return self._reencode_if_necessary(message, output_encoding)
def ngettext(self, msgid1, msgid2, n):
def _ngettext(self, msgid1, msgid2, n):
# Default
if n == 1:
message = msgid1
@ -313,7 +357,7 @@ class DummyTranslations(object, gettext.NullTranslations):
return self._reencode_if_necessary(message, output_encoding)
def lgettext(self, message):
def _lgettext(self, message):
if self._fallback:
try:
message = self._fallback.lgettext(message)
@ -329,7 +373,7 @@ class DummyTranslations(object, gettext.NullTranslations):
return self._reencode_if_necessary(message, output_encoding)
def lngettext(self, msgid1, msgid2, n):
def _lngettext(self, msgid1, msgid2, n):
# Default
if n == 1:
message = msgid1
@ -351,8 +395,8 @@ class DummyTranslations(object, gettext.NullTranslations):
return self._reencode_if_necessary(message, output_encoding)
def ugettext(self, message):
if not isinstance(message, basestring):
def _ugettext(self, message):
if not isbasestring(message):
return u''
if self._fallback:
msg = to_unicode(message, encoding=self.input_charset)
@ -365,7 +409,7 @@ class DummyTranslations(object, gettext.NullTranslations):
# Make sure we're returning unicode
return to_unicode(message, encoding=self.input_charset)
def ungettext(self, msgid1, msgid2, n):
def _ungettext(self, msgid1, msgid2, n):
# Default
if n == 1:
message = msgid1
@ -474,8 +518,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
def _parse(self, fp):
gettext.GNUTranslations._parse(self, fp)
def gettext(self, message):
if not isinstance(message, basestring):
def _gettext(self, message):
if not isbasestring(message):
return ''
tmsg = message
u_message = to_unicode(message, encoding=self.input_charset)
@ -495,13 +539,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
return self._reencode_if_necessary(tmsg, output_encoding)
def ngettext(self, msgid1, msgid2, n):
def _ngettext(self, msgid1, msgid2, n):
if n == 1:
tmsg = msgid1
else:
tmsg = msgid2
if not isinstance(msgid1, basestring):
if not isbasestring(msgid1):
return ''
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
try:
@ -521,8 +565,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
return self._reencode_if_necessary(tmsg, output_encoding)
def lgettext(self, message):
if not isinstance(message, basestring):
def _lgettext(self, message):
if not isbasestring(message):
return ''
tmsg = message
u_message = to_unicode(message, encoding=self.input_charset)
@ -542,13 +586,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
return self._reencode_if_necessary(tmsg, output_encoding)
def lngettext(self, msgid1, msgid2, n):
def _lngettext(self, msgid1, msgid2, n):
if n == 1:
tmsg = msgid1
else:
tmsg = msgid2
if not isinstance(msgid1, basestring):
if not isbasestring(msgid1):
return ''
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
try:
@ -557,7 +601,7 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
except KeyError:
if self._fallback:
try:
tmsg = self._fallback.ngettext(msgid1, msgid2, n)
tmsg = self._fallback.lngettext(msgid1, msgid2, n)
except (AttributeError, UnicodeError):
# Ignore UnicodeErrors: We'll do our own encoding next
pass
@ -569,8 +613,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
return self._reencode_if_necessary(tmsg, output_encoding)
def ugettext(self, message):
if not isinstance(message, basestring):
def _ugettext(self, message):
if not isbasestring(message):
return u''
message = to_unicode(message, encoding=self.input_charset)
try:
@ -586,13 +630,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
# Make sure that we're returning unicode
return to_unicode(message, encoding=self.input_charset)
def ungettext(self, msgid1, msgid2, n):
def _ungettext(self, msgid1, msgid2, n):
if n == 1:
tmsg = msgid1
else:
tmsg = msgid2
if not isinstance(msgid1, basestring):
if not isbasestring(msgid1):
return u''
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
try:
@ -612,7 +656,7 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
def get_translation_object(domain, localedirs=tuple(), languages=None,
class_=None, fallback=True, codeset=None):
class_=None, fallback=True, codeset=None, python2_api=True):
'''Get a translation object bound to the :term:`message catalogs`
:arg domain: Name of the message domain. This should be a unique name
@ -650,6 +694,15 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
:class:`str` objects. This is equivalent to calling
:meth:`~gettext.GNUTranslations.output_charset` on the Translations
object that is returned from this function.
:kwarg python2_api: When data:`True` (default), return Translation objects
that use the python2 gettext api
(:meth:`~gettext.GNUTranslations.gettext` and
:meth:`~gettext.GNUTranslations.lgettext` return byte
:class:`str`. :meth:`~gettext.GNUTranslations.ugettext` exists and
returns :class:`unicode` strings). When :data:`False`, return
Translation objects that use the python3 gettext api (gettext returns
:class:`unicode` strings and lgettext returns byte :class:`str`.
ugettext does not exist.)
:return: Translation object to get :mod:`gettext` methods from
If you need more flexibility than :func:`easy_gettext_setup`, use this
@ -730,7 +783,16 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
than simply cycling through until we find a directory that exists.
The new code is based heavily on the |stdlib|_
:func:`gettext.translation` function.
.. versionchanged:: kitchen-1.2.0 ; API kitchen.i18n 2.2.0
Add python2_api parameter
'''
if python2_api:
warnings.warn('get_translation_object returns gettext objects'
' that implement either the python2 or python3 gettext api.'
' You are currently using the python2 api. Consider'
' switching to the python3 api by setting python2_api=False'
' when you call the function.',
PendingDeprecationWarning, stacklevel=2)
if not class_:
class_ = NewGNUTranslations
@ -739,7 +801,7 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
mofiles.extend(gettext.find(domain, localedir, languages, all=1))
if not mofiles:
if fallback:
return DummyTranslations()
return DummyTranslations(python2_api=python2_api)
raise IOError(ENOENT, 'No translation file found for domain', domain)
# Accumulate a translation with fallbacks to all the other mofiles
@ -750,14 +812,22 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
if not translation:
mofile_fh = open(full_path, 'rb')
try:
translation = _translations.setdefault(full_path,
class_(mofile_fh))
try:
translation = _translations.setdefault(full_path,
class_(mofile_fh, python2_api=python2_api))
except TypeError:
# Only our translation classes have the python2_api
# parameter
translation = _translations.setdefault(full_path,
class_(mofile_fh))
finally:
mofile_fh.close()
# Shallow copy the object so that the fallbacks and output charset can
# differ but the data we read from the mofile is shared.
translation = copy.copy(translation)
translation.python2_api = python2_api
if codeset:
translation.set_output_charset(codeset)
if not stacked_translations:
@ -818,9 +888,9 @@ def easy_gettext_setup(domain, localedirs=tuple(), use_unicode=True):
Changed :func:`~kitchen.i18n.easy_gettext_setup` to return the lgettext
functions instead of gettext functions when use_unicode=False.
'''
translations = get_translation_object(domain, localedirs=localedirs)
translations = get_translation_object(domain, localedirs=localedirs, python2_api=False)
if use_unicode:
return(translations.ugettext, translations.ungettext)
return(translations.gettext, translations.ngettext)
return(translations.lgettext, translations.lngettext)
__all__ = ('DummyTranslations', 'NewGNUTranslations', 'easy_gettext_setup',

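A minimal sketch of opting in to the python3-style api added above (the
'mydomain' message domain is illustrative; it assumes catalogs are installed
somewhere gettext.find() can locate them)::

    from kitchen.i18n import get_translation_object

    translations = get_translation_object('mydomain', python2_api=False)
    # With python2_api=False, gettext() and ngettext() return unicode
    # strings; ugettext()/ungettext() are removed and raise AttributeError.
    _ = translations.gettext
    print(_('Hello world'))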

@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (c) 2010 Red Hat, Inc
# Copyright (c) 2012 Red Hat, Inc
#
# kitchen is free software; you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
@ -34,6 +34,8 @@ from kitchen.versioning import version_tuple_to_string
__version_info__ = ((0, 0, 1),)
__version__ = version_tuple_to_string(__version_info__)
from kitchen.text.misc import isbasestring
def isiterable(obj, include_string=False):
'''Check whether an object is an iterable
@ -46,7 +48,7 @@ def isiterable(obj, include_string=False):
:returns: :data:`True` if :attr:`obj` is iterable, otherwise
:data:`False`.
'''
if include_string or not isinstance(obj, basestring):
if include_string or not isbasestring(obj):
try:
iter(obj)
except TypeError:

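The behaviour preserved by the isbasestring() swap above, shown doctest-style
(strings are not treated as iterables unless explicitly requested)::

    >>> from kitchen.iterutils import isiterable
    >>> isiterable([1, 2, 3])
    True
    >>> isiterable('abc')
    False
    >>> isiterable('abc', include_string=True)
    True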

@ -78,8 +78,6 @@ the defaultdict class provided by python-2.5 and above.
import types
from kitchen import b_
# :C0103, W0613: We're implementing the python-2.5 defaultdict API so
# we have to use the same names as python.
# :C0111: We point people at the stdlib API docs for defaultdict rather than
@ -89,8 +87,8 @@ from kitchen import b_
class defaultdict(dict):
def __init__(self, default_factory=None, *args, **kwargs):
if (default_factory is not None and
not hasattr(default_factory, '__call__')):
raise TypeError(b_('First argument must be callable'))
not hasattr(default_factory, '__call__')):
raise TypeError('First argument must be callable')
dict.__init__(self, *args, **kwargs)
self.default_factory = default_factory

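Usage of the backported class is identical to the python-2.5 stdlib version;
a short doctest-style sketch (the import path is an assumption based on
kitchen's pycompat25 module layout)::

    >>> from kitchen.pycompat25.collections import defaultdict
    >>> d = defaultdict(list)
    >>> d['missing'].append(1)
    >>> d['missing']
    [1]
    >>> defaultdict(1)
    Traceback (most recent call last):
        ...
    TypeError: First argument must be callable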

@ -26,9 +26,9 @@ snippets so that you can get on with your life.
''')
AUTHOR = 'Toshio Kuratomi, Seth Vidal, others'
EMAIL = 'toshio@fedoraproject.org'
COPYRIGHT = '2011 Red Hat, Inc. and others'
COPYRIGHT = '2012 Red Hat, Inc. and others'
URL = 'https://fedorahosted.org/kitchen'
DOWNLOAD_URL = 'https://fedorahosted.org/releases/k/i/kitchen'
DOWNLOAD_URL = 'https://pypi.python.org/pypi/kitchen'
LICENSE = 'LGPLv2+'
__all__ = ('NAME', 'VERSION', 'DESCRIPTION', 'LONG_DESCRIPTION', 'AUTHOR',


@ -11,7 +11,7 @@ and displaying text on the screen.
from kitchen.versioning import version_tuple_to_string
__version_info__ = ((2, 1, 1),)
__version_info__ = ((2, 2, 0),)
__version__ = version_tuple_to_string(__version_info__)
__all__ = ('converters', 'exceptions', 'misc',)


@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (c) 2011 Red Hat, Inc.
# Copyright (c) 2012 Red Hat, Inc.
#
# kitchen is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
@ -50,15 +50,12 @@ import codecs
import warnings
import xml.sax.saxutils
# We need to access b_() for localizing our strings but we'll end up with
# a circular import if we import it directly.
import kitchen as k
from kitchen.pycompat24 import sets
sets.add_builtin_set()
from kitchen.text.exceptions import ControlCharError, XmlEncodeError
from kitchen.text.misc import guess_encoding, html_entities_unescape, \
process_control_chars
isbytestring, isunicodestring, process_control_chars
#: Aliases for the utf-8 codec
_UTF8_ALIASES = frozenset(('utf-8', 'UTF-8', 'utf8', 'UTF8', 'utf_8', 'UTF_8',
@ -127,6 +124,8 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
Deprecated :attr:`non_string` in favor of :attr:`nonstring` parameter and changed
default value to ``simplerepr``
'''
# Could use isbasestring/isunicode here but we want this code to be as
# fast as possible
if isinstance(obj, basestring):
if isinstance(obj, unicode):
return obj
@ -137,8 +136,8 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
return obj.decode(encoding, errors)
if non_string:
warnings.warn(k.b_('non_string is a deprecated parameter of'
' to_unicode(). Use nonstring instead'), DeprecationWarning,
warnings.warn('non_string is a deprecated parameter of'
' to_unicode(). Use nonstring instead', DeprecationWarning,
stacklevel=2)
if not nonstring:
nonstring = non_string
@ -162,21 +161,21 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
simple = obj.__str__()
except (UnicodeError, AttributeError):
simple = u''
if not isinstance(simple, unicode):
if isbytestring(simple):
return unicode(simple, encoding, errors)
return simple
elif nonstring in ('repr', 'strict'):
obj_repr = repr(obj)
if not isinstance(obj_repr, unicode):
if isbytestring(obj_repr):
obj_repr = unicode(obj_repr, encoding, errors)
if nonstring == 'repr':
return obj_repr
raise TypeError(k.b_('to_unicode was given "%(obj)s" which is neither'
' a byte string (str) or a unicode string') %
raise TypeError('to_unicode was given "%(obj)s" which is neither'
' a byte string (str) or a unicode string' %
{'obj': obj_repr.encode(encoding, 'replace')})
raise TypeError(k.b_('nonstring value, %(param)s, is not set to a valid'
' action') % {'param': nonstring})
raise TypeError('nonstring value, %(param)s, is not set to a valid'
' action' % {'param': nonstring})
def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
non_string=None):
@ -247,13 +246,15 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
Deprecated :attr:`non_string` in favor of :attr:`nonstring` parameter
and changed default value to ``simplerepr``
'''
# Could use isbasestring, isbytestring here but we want this to be as fast
# as possible
if isinstance(obj, basestring):
if isinstance(obj, str):
return obj
return obj.encode(encoding, errors)
if non_string:
warnings.warn(k.b_('non_string is a deprecated parameter of'
' to_bytes(). Use nonstring instead'), DeprecationWarning,
warnings.warn('non_string is a deprecated parameter of'
' to_bytes(). Use nonstring instead', DeprecationWarning,
stacklevel=2)
if not nonstring:
nonstring = non_string
@ -277,7 +278,7 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
simple = obj.__unicode__()
except (AttributeError, UnicodeError):
simple = ''
if isinstance(simple, unicode):
if isunicodestring(simple):
simple = simple.encode(encoding, 'replace')
return simple
elif nonstring in ('repr', 'strict'):
@ -285,17 +286,17 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
obj_repr = obj.__repr__()
except (AttributeError, UnicodeError):
obj_repr = ''
if isinstance(obj_repr, unicode):
if isunicodestring(obj_repr):
obj_repr = obj_repr.encode(encoding, errors)
else:
obj_repr = str(obj_repr)
if nonstring == 'repr':
return obj_repr
raise TypeError(k.b_('to_bytes was given "%(obj)s" which is neither'
' a unicode string or a byte string (str)') % {'obj': obj_repr})
raise TypeError('to_bytes was given "%(obj)s" which is neither'
' a unicode string or a byte string (str)' % {'obj': obj_repr})
raise TypeError(k.b_('nonstring value, %(param)s, is not set to a valid'
' action') % {'param': nonstring})
raise TypeError('nonstring value, %(param)s, is not set to a valid'
' action' % {'param': nonstring})
def getwriter(encoding):
'''Return a :class:`codecs.StreamWriter` that resists tracing back.
@ -375,9 +376,9 @@ def to_utf8(obj, errors='replace', non_string='passthru'):
to_bytes(obj, encoding='utf-8', non_string='passthru')
'''
warnings.warn(k.b_('kitchen.text.converters.to_utf8 is deprecated. Use'
warnings.warn('kitchen.text.converters.to_utf8 is deprecated. Use'
' kitchen.text.converters.to_bytes(obj, encoding="utf-8",'
' nonstring="passthru" instead.'), DeprecationWarning, stacklevel=2)
' nonstring="passthru" instead.', DeprecationWarning, stacklevel=2)
return to_bytes(obj, encoding='utf-8', errors=errors,
nonstring=non_string)
@ -400,9 +401,8 @@ def to_str(obj):
to_bytes(obj, nonstring='simplerepr')
'''
warnings.warn(k.b_('to_str is deprecated. Use to_unicode or to_bytes'
' instead. See the to_str docstring for'
' porting information.'),
warnings.warn('to_str is deprecated. Use to_unicode or to_bytes'
' instead. See the to_str docstring for porting information.',
DeprecationWarning, stacklevel=2)
return to_bytes(obj, nonstring='simplerepr')
@ -682,22 +682,23 @@ def unicode_to_xml(string, encoding='utf-8', attrib=False,
try:
process_control_chars(string, strategy=control_chars)
except TypeError:
raise XmlEncodeError(k.b_('unicode_to_xml must have a unicode type as'
raise XmlEncodeError('unicode_to_xml must have a unicode type as'
' the first argument. Use bytes_string_to_xml for byte'
' strings.'))
' strings.')
except ValueError:
raise ValueError(k.b_('The control_chars argument to unicode_to_xml'
' must be one of ignore, replace, or strict'))
raise ValueError('The control_chars argument to unicode_to_xml'
' must be one of ignore, replace, or strict')
except ControlCharError, exc:
raise XmlEncodeError(exc.args[0])
string = string.encode(encoding, 'xmlcharrefreplace')
# Escape characters that have special meaning in xml
if attrib:
string = xml.sax.saxutils.escape(string, entities={'"':"&quot;"})
else:
string = xml.sax.saxutils.escape(string)
string = string.encode(encoding, 'xmlcharrefreplace')
return string
def xml_to_unicode(byte_string, encoding='utf-8', errors='replace'):
@ -782,10 +783,10 @@ def byte_string_to_xml(byte_string, input_encoding='utf-8', errors='replace',
:func:`unicode_to_xml`
for other ideas on using this function
'''
if not isinstance(byte_string, str):
raise XmlEncodeError(k.b_('byte_string_to_xml can only take a byte'
if not isbytestring(byte_string):
raise XmlEncodeError('byte_string_to_xml can only take a byte'
' string as its first argument. Use unicode_to_xml for'
' unicode strings'))
' unicode strings')
# Decode the string into unicode
u_string = unicode(byte_string, input_encoding, errors)
@ -892,7 +893,7 @@ def guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
'''
# Unicode strings can just be run through unicode_to_xml()
if isinstance(string, unicode):
if isunicodestring(string):
return unicode_to_xml(string, encoding=output_encoding,
attrib=attrib, control_chars=control_chars)
@ -907,8 +908,8 @@ def guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
def to_xml(string, encoding='utf-8', attrib=False, control_chars='ignore'):
'''*Deprecated*: Use :func:`guess_encoding_to_xml` instead
'''
warnings.warn(k.b_('kitchen.text.converters.to_xml is deprecated. Use'
' kitchen.text.converters.guess_encoding_to_xml instead.'),
warnings.warn('kitchen.text.converters.to_xml is deprecated. Use'
' kitchen.text.converters.guess_encoding_to_xml instead.',
DeprecationWarning, stacklevel=2)
return guess_encoding_to_xml(string, output_encoding=encoding,
attrib=attrib, control_chars=control_chars)

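The reordering in unicode_to_xml() above (escape first, then encode) is what
keeps xmlcharrefreplace output intact; before the change the ampersand of
a generated charref was itself escaped, yielding 'caf&amp;#233;'. A
doctest-style sketch of the expected fixed behaviour for an ascii target
encoding::

    >>> from kitchen.text.converters import unicode_to_xml
    >>> unicode_to_xml(u'caf\xe9 & more', encoding='ascii')
    'caf&#233; &amp; more'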

@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (c) 2010 Red Hat, Inc.
# Copyright (c) 2013 Red Hat, Inc.
# Copyright (c) 2010 Ville Skyttä
# Copyright (c) 2009 Tim Lauridsen
# Copyright (c) 2007 Marcus Kuhn
@ -39,7 +39,6 @@ have the same width so we need helper functions for displaying them.
import itertools
import unicodedata
from kitchen import b_
from kitchen.text.converters import to_unicode, to_bytes
from kitchen.text.exceptions import ControlCharError
@ -101,7 +100,7 @@ def _interval_bisearch(value, table):
return False
while maximum >= minimum:
mid = (minimum + maximum) / 2
mid = divmod(minimum + maximum, 2)[0]
if value > table[mid][1]:
minimum = mid + 1
elif value < table[mid][0]:
@ -115,62 +114,64 @@ _COMBINING = (
(0x300, 0x36f), (0x483, 0x489), (0x591, 0x5bd),
(0x5bf, 0x5bf), (0x5c1, 0x5c2), (0x5c4, 0x5c5),
(0x5c7, 0x5c7), (0x600, 0x603), (0x610, 0x61a),
(0x64b, 0x65e), (0x670, 0x670), (0x6d6, 0x6e4),
(0x64b, 0x65f), (0x670, 0x670), (0x6d6, 0x6e4),
(0x6e7, 0x6e8), (0x6ea, 0x6ed), (0x70f, 0x70f),
(0x711, 0x711), (0x730, 0x74a), (0x7a6, 0x7b0),
(0x7eb, 0x7f3), (0x816, 0x819), (0x81b, 0x823),
(0x825, 0x827), (0x829, 0x82d), (0x901, 0x902),
(0x93c, 0x93c), (0x941, 0x948), (0x94d, 0x94d),
(0x951, 0x954), (0x962, 0x963), (0x981, 0x981),
(0x9bc, 0x9bc), (0x9c1, 0x9c4), (0x9cd, 0x9cd),
(0x9e2, 0x9e3), (0xa01, 0xa02), (0xa3c, 0xa3c),
(0xa41, 0xa42), (0xa47, 0xa48), (0xa4b, 0xa4d),
(0xa70, 0xa71), (0xa81, 0xa82), (0xabc, 0xabc),
(0xac1, 0xac5), (0xac7, 0xac8), (0xacd, 0xacd),
(0xae2, 0xae3), (0xb01, 0xb01), (0xb3c, 0xb3c),
(0xb3f, 0xb3f), (0xb41, 0xb43), (0xb4d, 0xb4d),
(0xb56, 0xb56), (0xb82, 0xb82), (0xbc0, 0xbc0),
(0xbcd, 0xbcd), (0xc3e, 0xc40), (0xc46, 0xc48),
(0xc4a, 0xc4d), (0xc55, 0xc56), (0xcbc, 0xcbc),
(0xcbf, 0xcbf), (0xcc6, 0xcc6), (0xccc, 0xccd),
(0xce2, 0xce3), (0xd41, 0xd43), (0xd4d, 0xd4d),
(0xdca, 0xdca), (0xdd2, 0xdd4), (0xdd6, 0xdd6),
(0xe31, 0xe31), (0xe34, 0xe3a), (0xe47, 0xe4e),
(0xeb1, 0xeb1), (0xeb4, 0xeb9), (0xebb, 0xebc),
(0xec8, 0xecd), (0xf18, 0xf19), (0xf35, 0xf35),
(0xf37, 0xf37), (0xf39, 0xf39), (0xf71, 0xf7e),
(0xf80, 0xf84), (0xf86, 0xf87), (0xf90, 0xf97),
(0xf99, 0xfbc), (0xfc6, 0xfc6), (0x102d, 0x1030),
(0x1032, 0x1032), (0x1036, 0x1037), (0x1039, 0x103a),
(0x1058, 0x1059), (0x108d, 0x108d), (0x1160, 0x11ff),
(0x135f, 0x135f), (0x1712, 0x1714), (0x1732, 0x1734),
(0x1752, 0x1753), (0x1772, 0x1773), (0x17b4, 0x17b5),
(0x17b7, 0x17bd), (0x17c6, 0x17c6), (0x17c9, 0x17d3),
(0x17dd, 0x17dd), (0x180b, 0x180d), (0x18a9, 0x18a9),
(0x1920, 0x1922), (0x1927, 0x1928), (0x1932, 0x1932),
(0x1939, 0x193b), (0x1a17, 0x1a18), (0x1a60, 0x1a60),
(0x1a75, 0x1a7c), (0x1a7f, 0x1a7f), (0x1b00, 0x1b03),
(0x1b34, 0x1b34), (0x1b36, 0x1b3a), (0x1b3c, 0x1b3c),
(0x1b42, 0x1b42), (0x1b44, 0x1b44), (0x1b6b, 0x1b73),
(0x1baa, 0x1baa), (0x1c37, 0x1c37), (0x1cd0, 0x1cd2),
(0x825, 0x827), (0x829, 0x82d), (0x859, 0x85b),
(0x901, 0x902), (0x93c, 0x93c), (0x941, 0x948),
(0x94d, 0x94d), (0x951, 0x954), (0x962, 0x963),
(0x981, 0x981), (0x9bc, 0x9bc), (0x9c1, 0x9c4),
(0x9cd, 0x9cd), (0x9e2, 0x9e3), (0xa01, 0xa02),
(0xa3c, 0xa3c), (0xa41, 0xa42), (0xa47, 0xa48),
(0xa4b, 0xa4d), (0xa70, 0xa71), (0xa81, 0xa82),
(0xabc, 0xabc), (0xac1, 0xac5), (0xac7, 0xac8),
(0xacd, 0xacd), (0xae2, 0xae3), (0xb01, 0xb01),
(0xb3c, 0xb3c), (0xb3f, 0xb3f), (0xb41, 0xb43),
(0xb4d, 0xb4d), (0xb56, 0xb56), (0xb82, 0xb82),
(0xbc0, 0xbc0), (0xbcd, 0xbcd), (0xc3e, 0xc40),
(0xc46, 0xc48), (0xc4a, 0xc4d), (0xc55, 0xc56),
(0xcbc, 0xcbc), (0xcbf, 0xcbf), (0xcc6, 0xcc6),
(0xccc, 0xccd), (0xce2, 0xce3), (0xd41, 0xd43),
(0xd4d, 0xd4d), (0xdca, 0xdca), (0xdd2, 0xdd4),
(0xdd6, 0xdd6), (0xe31, 0xe31), (0xe34, 0xe3a),
(0xe47, 0xe4e), (0xeb1, 0xeb1), (0xeb4, 0xeb9),
(0xebb, 0xebc), (0xec8, 0xecd), (0xf18, 0xf19),
(0xf35, 0xf35), (0xf37, 0xf37), (0xf39, 0xf39),
(0xf71, 0xf7e), (0xf80, 0xf84), (0xf86, 0xf87),
(0xf90, 0xf97), (0xf99, 0xfbc), (0xfc6, 0xfc6),
(0x102d, 0x1030), (0x1032, 0x1032), (0x1036, 0x1037),
(0x1039, 0x103a), (0x1058, 0x1059), (0x108d, 0x108d),
(0x1160, 0x11ff), (0x135d, 0x135f), (0x1712, 0x1714),
(0x1732, 0x1734), (0x1752, 0x1753), (0x1772, 0x1773),
(0x17b4, 0x17b5), (0x17b7, 0x17bd), (0x17c6, 0x17c6),
(0x17c9, 0x17d3), (0x17dd, 0x17dd), (0x180b, 0x180d),
(0x18a9, 0x18a9), (0x1920, 0x1922), (0x1927, 0x1928),
(0x1932, 0x1932), (0x1939, 0x193b), (0x1a17, 0x1a18),
(0x1a60, 0x1a60), (0x1a75, 0x1a7c), (0x1a7f, 0x1a7f),
(0x1b00, 0x1b03), (0x1b34, 0x1b34), (0x1b36, 0x1b3a),
(0x1b3c, 0x1b3c), (0x1b42, 0x1b42), (0x1b44, 0x1b44),
(0x1b6b, 0x1b73), (0x1baa, 0x1baa), (0x1be6, 0x1be6),
(0x1bf2, 0x1bf3), (0x1c37, 0x1c37), (0x1cd0, 0x1cd2),
(0x1cd4, 0x1ce0), (0x1ce2, 0x1ce8), (0x1ced, 0x1ced),
(0x1dc0, 0x1de6), (0x1dfd, 0x1dff), (0x200b, 0x200f),
(0x1dc0, 0x1de6), (0x1dfc, 0x1dff), (0x200b, 0x200f),
(0x202a, 0x202e), (0x2060, 0x2063), (0x206a, 0x206f),
(0x20d0, 0x20f0), (0x2cef, 0x2cf1), (0x2de0, 0x2dff),
(0x302a, 0x302f), (0x3099, 0x309a), (0xa66f, 0xa66f),
(0xa67c, 0xa67d), (0xa6f0, 0xa6f1), (0xa806, 0xa806),
(0xa80b, 0xa80b), (0xa825, 0xa826), (0xa8c4, 0xa8c4),
(0xa8e0, 0xa8f1), (0xa92b, 0xa92d), (0xa953, 0xa953),
(0xa9b3, 0xa9b3), (0xa9c0, 0xa9c0), (0xaab0, 0xaab0),
(0xaab2, 0xaab4), (0xaab7, 0xaab8), (0xaabe, 0xaabf),
(0xaac1, 0xaac1), (0xabed, 0xabed), (0xfb1e, 0xfb1e),
(0xfe00, 0xfe0f), (0xfe20, 0xfe26), (0xfeff, 0xfeff),
(0xfff9, 0xfffb), (0x101fd, 0x101fd), (0x10a01, 0x10a03),
(0x10a05, 0x10a06), (0x10a0c, 0x10a0f), (0x10a38, 0x10a3a),
(0x10a3f, 0x10a3f), (0x110b9, 0x110ba), (0x1d165, 0x1d169),
(0x1d16d, 0x1d182), (0x1d185, 0x1d18b), (0x1d1aa, 0x1d1ad),
(0x1d242, 0x1d244), (0xe0001, 0xe0001), (0xe0020, 0xe007f),
(0xe0100, 0xe01ef), )
(0x20d0, 0x20f0), (0x2cef, 0x2cf1), (0x2d7f, 0x2d7f),
(0x2de0, 0x2dff), (0x302a, 0x302f), (0x3099, 0x309a),
(0xa66f, 0xa66f), (0xa67c, 0xa67d), (0xa6f0, 0xa6f1),
(0xa806, 0xa806), (0xa80b, 0xa80b), (0xa825, 0xa826),
(0xa8c4, 0xa8c4), (0xa8e0, 0xa8f1), (0xa92b, 0xa92d),
(0xa953, 0xa953), (0xa9b3, 0xa9b3), (0xa9c0, 0xa9c0),
(0xaab0, 0xaab0), (0xaab2, 0xaab4), (0xaab7, 0xaab8),
(0xaabe, 0xaabf), (0xaac1, 0xaac1), (0xabed, 0xabed),
(0xfb1e, 0xfb1e), (0xfe00, 0xfe0f), (0xfe20, 0xfe26),
(0xfeff, 0xfeff), (0xfff9, 0xfffb), (0x101fd, 0x101fd),
(0x10a01, 0x10a03), (0x10a05, 0x10a06), (0x10a0c, 0x10a0f),
(0x10a38, 0x10a3a), (0x10a3f, 0x10a3f), (0x11046, 0x11046),
(0x110b9, 0x110ba), (0x1d165, 0x1d169), (0x1d16d, 0x1d182),
(0x1d185, 0x1d18b), (0x1d1aa, 0x1d1ad), (0x1d242, 0x1d244),
(0xe0001, 0xe0001), (0xe0020, 0xe007f), (0xe0100, 0xe01ef), )
'''
Internal table, provided by this module to list :term:`code points` which
combine with other characters and therefore should have no :term:`textual
@ -184,8 +185,8 @@ a combining character.
:func:`~kitchen.text.display._generate_combining_table`
for how this table is generated
This table was last regenerated on python-2.7.0 with
:data:`unicodedata.unidata_version` 5.1.0
This table was last regenerated on python-3.2.3 with
:data:`unicodedata.unidata_version` 6.0.0
'''
# New function from Toshio Kuratomi (LGPLv2+)
@ -341,8 +342,8 @@ def _ucp_width(ucs, control_chars='guess'):
if ucs < 32 or (ucs < 0xa0 and ucs >= 0x7f):
# Control character detected
if control_chars == 'strict':
raise ControlCharError(b_('_ucp_width does not understand how to'
' assign a width value to control characters.'))
raise ControlCharError('_ucp_width does not understand how to'
' assign a width value to control characters.')
if ucs in (0x08, 0x07F, 0x94):
# Backspace, delete, and clear delete remove a single character
return -1
@ -519,7 +520,7 @@ def textual_width_chop(msg, chop, encoding='utf-8', errors='replace'):
# if current width is high,
if width > chop:
# calculate new midpoint
mid = minimum + (eos - minimum) / 2
mid = minimum + (eos - minimum) // 2
if mid == eos:
break
if (eos - chop) < (eos - mid):
@ -537,7 +538,7 @@ def textual_width_chop(msg, chop, encoding='utf-8', errors='replace'):
# short-circuit above means that we never use this branch.
# calculate new midpoint
mid = eos + (maximum - eos) / 2
mid = eos + (maximum - eos) // 2
if mid == eos:
break
if (chop - eos) < (mid - eos):

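A doctest-style sketch of the display-width semantics the regenerated
combining table and the integer-division fixes above preserve (expected
values; a double-width kana occupies two cells, a combining accent zero)::

    >>> from kitchen.text.display import textual_width, textual_width_chop
    >>> textual_width(u'\u304f ku')
    5
    >>> textual_width(u'e\u0301')
    1
    >>> textual_width_chop(u'\u304f ku', 3)
    u'\u304f '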

@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
# Copyright (c) 2011 Red Hat, Inc
# Copyright (c) 2012 Red Hat, Inc
# Copyright (c) 2010 Seth Vidal
#
# kitchen is free software; you can redistribute it and/or
@ -27,6 +27,12 @@ Miscellaneous functions for manipulating text
---------------------------------------------
Collection of text functions that don't fit in another category.
.. versionchanged:: kitchen 1.2.0, API: kitchen.text 2.2.0
Added :func:`~kitchen.text.misc.isbasestring`,
:func:`~kitchen.text.misc.isbytestring`, and
:func:`~kitchen.text.misc.isunicodestring` to help tell which string type
is which on python2 and python3
'''
import htmlentitydefs
import itertools
@ -37,9 +43,6 @@ try:
except ImportError:
chardet = None
# We need to access b_() for localizing our strings but we'll end up with
# a circular import if we import it directly.
import kitchen as k
from kitchen.pycompat24 import sets
from kitchen.text.exceptions import ControlCharError
@ -49,13 +52,64 @@ sets.add_builtin_set()
# byte strings we're guessing about as latin1
_CHARDET_THRESHHOLD = 0.6
# ASCII control codes that are illegal in xml 1.0
_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32))
# ASCII control codes (the c0 codes) that are illegal in xml 1.0
# Also unicode control codes (the C1 codes): also illegal in xml
_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32) + range(128, 160))
_CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES))
_IGNORE_TABLE = dict(zip(_CONTROL_CODES, [None] * len(_CONTROL_CODES)))
_REPLACE_TABLE = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
# _ENTITY_RE
_ENTITY_RE = re.compile(r'(?s)<[^>]*>|&#?\w+;')
def isbasestring(obj):
'''Determine if obj is a byte :class:`str` or :class:`unicode` string
In python2 this is equivalent to isinstance(obj, basestring). In python3
it checks whether the object is an instance of str, bytes, or bytearray.
This is an aid to porting code that needed to test whether an object was
derived from basestring in python2 (commonly used in unicode-bytes
conversion functions)
:arg obj: Object to test
:returns: True if the object is a :class:`basestring`. Otherwise False.
.. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
'''
if isinstance(obj, basestring):
return True
return False
def isbytestring(obj):
'''Determine if obj is a byte :class:`str`
In python2 this is equivalent to isinstance(obj, str). In python3 it
checks whether the object is an instance of bytes or bytearray.
:arg obj: Object to test
:returns: True if the object is a byte :class:`str`. Otherwise, False.
.. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
'''
if isinstance(obj, str):
return True
return False
def isunicodestring(obj):
'''Determine if obj is a :class:`unicode` string
In python2 this is equivalent to isinstance(obj, unicode). In python3 it
checks whether the object is an instance of :class:`str`.
:arg obj: Object to test
:returns: True if the object is a :class:`unicode` string. Otherwise, False.
.. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
'''
if isinstance(obj, unicode):
return True
return False
def guess_encoding(byte_string, disable_chardet=False):
'''Try to guess the encoding of a byte :class:`str`
@ -79,8 +133,8 @@ def guess_encoding(byte_string, disable_chardet=False):
to every byte, decoding from ``latin-1`` to :class:`unicode` will not
cause :exc:`UnicodeErrors` although the output might be mangled.
'''
if not isinstance(byte_string, str):
raise TypeError(k.b_('byte_string must be a byte string (str)'))
if not isbytestring(byte_string):
raise TypeError('first argument must be a byte string (str)')
input_encoding = 'utf-8'
try:
unicode(byte_string, input_encoding, 'strict')
@ -98,7 +152,7 @@ def guess_encoding(byte_string, disable_chardet=False):
return input_encoding
def str_eq(str1, str2, encoding='utf-8', errors='replace'):
'''Compare two stringsi, converting to byte :class:`str` if one is
'''Compare two strings, converting to byte :class:`str` if one is
:class:`unicode`
:arg str1: First string to compare
@ -135,7 +189,7 @@ def str_eq(str1, str2, encoding='utf-8', errors='replace'):
except UnicodeError:
pass
if isinstance(str1, unicode):
if isunicodestring(str1):
str1 = str1.encode(encoding, errors)
else:
str2 = str2.encode(encoding, errors)
@ -166,26 +220,30 @@ def process_control_chars(string, strategy='replace'):
:attr:`string`
:returns: :class:`unicode` string with no :term:`control characters` in
it.
'''
if not isinstance(string, unicode):
raise TypeError(k.b_('process_control_char must have a unicode type as'
' the first argument.'))
if strategy == 'ignore':
control_table = dict(zip(_CONTROL_CODES, [None] * len(_CONTROL_CODES)))
elif strategy == 'replace':
control_table = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
elif strategy == 'strict':
control_table = None
# Test that there are no control codes present
data = frozenset(string)
if [c for c in _CONTROL_CHARS if c in data]:
raise ControlCharError(k.b_('ASCII control code present in string'
' input'))
else:
raise ValueError(k.b_('The strategy argument to process_control_chars'
' must be one of ignore, replace, or strict'))
if control_table:
.. versionchanged:: kitchen 1.2.0, API: kitchen.text 2.2.0
Strip out the C1 control characters in addition to the C0 control
characters.
'''
if not isunicodestring(string):
raise TypeError('process_control_char must have a unicode type as'
' the first argument.')
if strategy not in ('replace', 'ignore', 'strict'):
raise ValueError('The strategy argument to process_control_chars'
' must be one of ignore, replace, or strict')
# Most strings don't have control chars and translating carries
# a higher cost than testing whether the chars are in the string
# So only translate if necessary
if not _CONTROL_CHARS.isdisjoint(string):
if strategy == 'replace':
control_table = _REPLACE_TABLE
elif strategy == 'ignore':
control_table = _IGNORE_TABLE
else:
# strategy can only equal 'strict'
raise ControlCharError('ASCII control code present in string'
' input')
string = string.translate(control_table)
return string
@ -237,9 +295,9 @@ def html_entities_unescape(string):
return unicode(entity, "iso-8859-1")
return string # leave as is
if not isinstance(string, unicode):
raise TypeError(k.b_('html_entities_unescape must have a unicode type'
' for its first argument'))
if not isunicodestring(string):
raise TypeError('html_entities_unescape must have a unicode type'
' for its first argument')
return re.sub(_ENTITY_RE, fixup, string)
def byte_string_valid_xml(byte_string, encoding='utf-8'):
@ -264,7 +322,7 @@ def byte_string_valid_xml(byte_string, encoding='utf-8'):
processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
output_xml(processed_array)
'''
if not isinstance(byte_string, str):
if not isbytestring(byte_string):
# Not a byte string
return False
@ -309,5 +367,5 @@ def byte_string_valid_encoding(byte_string, encoding='utf-8'):
return True
__all__ = ('byte_string_valid_encoding', 'byte_string_valid_xml',
'guess_encoding', 'html_entities_unescape', 'process_control_chars',
'str_eq')
'guess_encoding', 'html_entities_unescape', 'isbasestring',
'isbytestring', 'isunicodestring', 'process_control_chars', 'str_eq')

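Doctest-style sketches of the helpers added above, as they behave on python2
(the same calls work on python3, where byte literals need the b'' prefix)::

    >>> from kitchen.text.misc import (isbasestring, isbytestring,
    ...     isunicodestring, process_control_chars)
    >>> isbasestring(u'caf\xe9'), isbasestring('caf\xc3\xa9'), isbasestring(5)
    (True, True, False)
    >>> isbytestring(u'caf\xe9'), isunicodestring(u'caf\xe9')
    (False, True)
    >>> process_control_chars(u'abc\x07def')
    u'abc?def'
    >>> process_control_chars(u'abc\x07def', strategy='ignore')
    u'abcdef'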

@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (c) 2011 Red Hat, Inc.
# Copyright (c) 2012 Red Hat, Inc.
# Copyright (c) 2010 Ville Skyttä
# Copyright (c) 2009 Tim Lauridsen
# Copyright (c) 2007 Marcus Kuhn
@ -50,9 +50,8 @@ Functions for operating on byte :class:`str` encoded as :term:`UTF-8`
'''
import warnings
from kitchen import b_
from kitchen.text.converters import to_unicode, to_bytes
from kitchen.text.misc import byte_string_valid_encoding
from kitchen.text.misc import byte_string_valid_encoding, isunicodestring
from kitchen.text.display import _textual_width_le, \
byte_string_textual_width_fill, fill, textual_width, \
textual_width_chop, wrap
@ -66,8 +65,8 @@ def utf8_valid(msg):
Use :func:`kitchen.text.misc.byte_string_valid_encoding` instead.
'''
warnings.warn(b_('kitchen.text.utf8.utf8_valid is deprecated. Use'
' kitchen.text.misc.byte_string_valid_encoding(msg) instead'),
warnings.warn('kitchen.text.utf8.utf8_valid is deprecated. Use'
' kitchen.text.misc.byte_string_valid_encoding(msg) instead',
DeprecationWarning, stacklevel=2)
return byte_string_valid_encoding(msg)
@ -76,8 +75,8 @@ def utf8_width(msg):
Use :func:`kitchen.text.display.textual_width` instead.
'''
warnings.warn(b_('kitchen.text.utf8.utf8_width is deprecated. Use'
' kitchen.text.display.textual_width(msg) instead'),
warnings.warn('kitchen.text.utf8.utf8_width is deprecated. Use'
' kitchen.text.display.textual_width(msg) instead',
DeprecationWarning, stacklevel=2)
return textual_width(msg)
@ -98,14 +97,14 @@ def utf8_width_chop(msg, chop=None):
>>> (textual_width(msg), to_bytes(textual_width_chop(msg, 5)))
(5, 'く ku')
'''
warnings.warn(b_('kitchen.text.utf8.utf8_width_chop is deprecated. Use'
' kitchen.text.display.textual_width_chop instead'), DeprecationWarning,
warnings.warn('kitchen.text.utf8.utf8_width_chop is deprecated. Use'
' kitchen.text.display.textual_width_chop instead', DeprecationWarning,
stacklevel=2)
if chop == None:
return textual_width(msg), msg
as_bytes = not isinstance(msg, unicode)
as_bytes = not isunicodestring(msg)
chopped_msg = textual_width_chop(msg, chop)
if as_bytes:
@ -117,8 +116,8 @@ def utf8_width_fill(msg, fill, chop=None, left=True, prefix='', suffix=''):
Use :func:`~kitchen.text.display.byte_string_textual_width_fill` instead
'''
warnings.warn(b_('kitchen.text.utf8.utf8_width_fill is deprecated. Use'
' kitchen.text.display.byte_string_textual_width_fill instead'),
warnings.warn('kitchen.text.utf8.utf8_width_fill is deprecated. Use'
' kitchen.text.display.byte_string_textual_width_fill instead',
DeprecationWarning, stacklevel=2)
return byte_string_textual_width_fill(msg, fill, chop=chop, left=left,
@ -130,11 +129,11 @@ def utf8_text_wrap(text, width=70, initial_indent='', subsequent_indent=''):
Use :func:`kitchen.text.display.wrap` instead
'''
warnings.warn(b_('kitchen.text.utf8.utf8_text_wrap is deprecated. Use'
' kitchen.text.display.wrap instead'),
warnings.warn('kitchen.text.utf8.utf8_text_wrap is deprecated. Use'
' kitchen.text.display.wrap instead',
DeprecationWarning, stacklevel=2)
as_bytes = not isinstance(text, unicode)
as_bytes = not isunicodestring(text)
text = to_unicode(text)
lines = wrap(text, width=width, initial_indent=initial_indent,
@ -150,8 +149,8 @@ def utf8_text_fill(text, *args, **kwargs):
Use :func:`kitchen.text.display.fill` instead.
'''
warnings.warn(b_('kitchen.text.utf8.utf8_text_fill is deprecated. Use'
' kitchen.text.display.fill instead'),
warnings.warn('kitchen.text.utf8.utf8_text_fill is deprecated. Use'
' kitchen.text.display.fill instead',
DeprecationWarning, stacklevel=2)
# This assumes that all args are utf8.
return fill(text, *args, **kwargs)
@ -160,8 +159,8 @@ def _utf8_width_le(width, *args):
'''**Deprecated** Convert the arguments to unicode and use
:func:`kitchen.text.display._textual_width_le` instead.
'''
warnings.warn(b_('kitchen.text.utf8._utf8_width_le is deprecated. Use'
' kitchen.text.display._textual_width_le instead'),
warnings.warn('kitchen.text.utf8._utf8_width_le is deprecated. Use'
' kitchen.text.display._textual_width_le instead',
DeprecationWarning, stacklevel=2)
# This assumes that all args are utf8.
return _textual_width_le(width, to_unicode(''.join(args)))
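The hunks above all repeat one change: the deprecation messages handed to warnings.warn are now plain str rather than b_() byte strings, since byte warning messages are rejected on python3. A minimal sketch of the shim pattern these wrappers follow (the deprecated() helper is hypothetical; kitchen writes each wrapper out by hand as shown)::

    import warnings

    def deprecated(old_name, new_name, replacement):
        '''Return a wrapper that warns about *old_name* then delegates.'''
        def wrapper(*args, **kwargs):
            warnings.warn('%s is deprecated. Use %s instead'
                    % (old_name, new_name),
                    DeprecationWarning, stacklevel=2)
            return replacement(*args, **kwargs)
        return wrapper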

View file

@ -89,10 +89,10 @@ def version_tuple_to_string(version_info):
if isinstance(values[0], int):
ver_components.append('.'.join(itertools.imap(str, values)))
else:
modifier = values[0]
if isinstance(values[0], unicode):
modifier = values[0].encode('ascii')
else:
modifier = values[0]
if modifier in ('a', 'b', 'c', 'rc'):
ver_components.append('%s%s' % (modifier,
'.'.join(itertools.imap(str, values[1:])) or '0'))
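This hunk makes version_tuple_to_string normalize a unicode release modifier to an ascii byte str before it is joined with the numeric components, so mixed unicode/bytes version tuples stringify consistently on python2. Assuming the rest of the function concatenates the components, the behaviour would be roughly::

    >>> from kitchen.versioning import version_tuple_to_string
    >>> version_tuple_to_string(((1, 2), (u'a', 1)))
    '1.2a1'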

View file

@ -9,6 +9,8 @@ from kitchen.text.converters import to_bytes
from kitchen.text import misc
class UnicodeTestData(object):
u_empty_string = u''
b_empty_string = ''
# This should encode fine -- sanity check
u_ascii = u'the quick brown fox jumped over the lazy dog'
b_ascii = 'the quick brown fox jumped over the lazy dog'
@ -16,7 +18,7 @@ class UnicodeTestData(object):
# First challenge -- what happens with latin-1 characters
u_spanish = u'El veloz murciélago saltó sobre el perro perezoso.'
# utf8 and latin1 both support these chars so no mangling
utf8_spanish = u_spanish.encode('utf8')
utf8_spanish = u_spanish.encode('utf-8')
latin1_spanish = u_spanish.encode('latin1')
# ASCII does not have the accented characters so it mangles
@ -62,7 +64,8 @@ class UnicodeTestData(object):
u_entity_escape = u'Test: &lt;&quot;&amp;&quot;&gt; &ndash; ' + unicode(u_japanese.encode('ascii', 'xmlcharrefreplace'), 'ascii') + u'&#xe9;'
utf8_entity_escape = 'Test: &lt;"&amp;"&gt; 速い茶色のキツネが怠惰な犬に\'増é'
utf8_attrib_escape = 'Test: &lt;&quot;&amp;&quot;&gt; 速い茶色のキツネが怠惰な犬に\'増é'
ascii_entity_escape = (u'Test: <"&"> ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace').replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;')
ascii_entity_escape = ('Test: <"&"> '.replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;')) + (u' ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace')
ascii_attrib_escape = ('Test: <"&"> '.replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')) + (u' ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace')
b_byte_chars = ' '.join(map(chr, range(0, 256)))
b_byte_encoded = 'ACABIAIgAyAEIAUgBiAHIAggCSAKIAsgDCANIA4gDyAQIBEgEiATIBQgFSAWIBcgGCAZIBogGyAcIB0gHiAfICAgISAiICMgJCAlICYgJyAoICkgKiArICwgLSAuIC8gMCAxIDIgMyA0IDUgNiA3IDggOSA6IDsgPCA9ID4gPyBAIEEgQiBDIEQgRSBGIEcgSCBJIEogSyBMIE0gTiBPIFAgUSBSIFMgVCBVIFYgVyBYIFkgWiBbIFwgXSBeIF8gYCBhIGIgYyBkIGUgZiBnIGggaSBqIGsgbCBtIG4gbyBwIHEgciBzIHQgdSB2IHcgeCB5IHogeyB8IH0gfiB/IIAggSCCIIMghCCFIIYghyCIIIkgiiCLIIwgjSCOII8gkCCRIJIgkyCUIJUgliCXIJggmSCaIJsgnCCdIJ4gnyCgIKEgoiCjIKQgpSCmIKcgqCCpIKogqyCsIK0griCvILAgsSCyILMgtCC1ILYgtyC4ILkguiC7ILwgvSC+IL8gwCDBIMIgwyDEIMUgxiDHIMggySDKIMsgzCDNIM4gzyDQINEg0iDTINQg1SDWINcg2CDZINog2yDcIN0g3iDfIOAg4SDiIOMg5CDlIOYg5yDoIOkg6iDrIOwg7SDuIO8g8CDxIPIg8yD0IPUg9iD3IPgg+SD6IPsg/CD9IP4g/w=='
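The reordered ascii_entity_escape / ascii_attrib_escape expressions escape the literal markup characters first and only then append the xmlcharrefreplace output; doing the replace after concatenation would also mangle the '&' that starts every generated numeric entity::

    >>> u'é'.encode('ascii', 'xmlcharrefreplace')
    '&#233;'
    >>> '&#233;'.replace('&', '&amp;')   # escaping too late double-escapes the entity
    '&amp;#233;'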
@ -127,3 +130,48 @@ u' * A powerful unrepr mode for storing basic datatypes']
u_ascii_no_ctrl = u''.join([c for c in u_ascii_chars if ord(c) not in misc._CONTROL_CODES])
u_ascii_ctrl_replace = u_ascii_chars.translate(dict([(c, u'?') for c in misc._CONTROL_CODES]))
utf8_ascii_chars = u_ascii_chars.encode('utf8')
# These are present in the test catalog as msgids or values
u_lemon = u'1 lemon'
utf8_lemon = u_lemon.encode('utf-8')
latin1_lemon = u_lemon.encode('latin-1')
u_lemons = u'4 lemons'
utf8_lemons = u_lemons.encode('utf-8')
latin1_lemons = u_lemons.encode('latin-1')
u_limao = u'一 limão'
utf8_limao = u_limao.encode('utf-8')
latin1_limao = u_limao.encode('latin-1', 'replace')
u_limoes = u'四 limões'
utf8_limoes = u_limoes.encode('utf-8')
latin1_limoes = u_limoes.encode('latin-1', 'replace')
u_not_in_catalog = u'café not matched in catalogs'
utf8_not_in_catalog = u_not_in_catalog.encode('utf-8')
latin1_not_in_catalog = u_not_in_catalog.encode('latin-1')
u_kitchen = u'kitchen sink'
utf8_kitchen = u_kitchen.encode('utf-8')
latin1_kitchen = u_kitchen.encode('latin-1')
u_pt_kitchen = u'pia da cozinha'
utf8_pt_kitchen = u_pt_kitchen.encode('utf-8')
latin1_pt_kitchen = u_pt_kitchen.encode('latin-1')
u_kuratomi = u'Kuratomi'
utf8_kuratomi = u_kuratomi.encode('utf-8')
latin1_kuratomi = u_kuratomi.encode('latin-1')
u_ja_kuratomi = u'くらとみ'
utf8_ja_kuratomi = u_ja_kuratomi.encode('utf-8')
latin1_ja_kuratomi = u_ja_kuratomi.encode('latin-1', 'replace')
u_in_fallback = u'Only café in fallback'
utf8_in_fallback = u_in_fallback.encode('utf-8')
latin1_in_fallback = u_in_fallback.encode('latin-1')
u_yes_in_fallback = u'Yes, only café in fallback'
utf8_yes_in_fallback = u_yes_in_fallback.encode('utf-8')
latin1_yes_in_fallback = u_yes_in_fallback.encode('latin-1')
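Note that the latin1_* variants of strings containing CJK characters are built with errors='replace', because latin-1 has no mapping for them; the unmappable characters come back as '?'::

    >>> u'一 limão'.encode('latin-1', 'replace')
    '? lim\xe3o'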

View file

@ -29,6 +29,9 @@ class Test__all__(object):
('kitchen', 'i18n', 'to_unicode'),
('kitchen', 'i18n', 'ENOENT'),
('kitchen', 'i18n', 'byte_string_valid_encoding'),
('kitchen', 'i18n', 'isbasestring'),
('kitchen', 'i18n', 'partial'),
('kitchen', 'iterutils', 'isbasestring'),
('kitchen', 'iterutils', 'version_tuple_to_string'),
('kitchen', 'pycompat24', 'version_tuple_to_string'),
('kitchen', 'pycompat25', 'version_tuple_to_string'),
@ -44,6 +47,8 @@ class Test__all__(object):
('kitchen.text', 'converters', 'ControlCharError'),
('kitchen.text', 'converters', 'guess_encoding'),
('kitchen.text', 'converters', 'html_entities_unescape'),
('kitchen.text', 'converters', 'isbytestring'),
('kitchen.text', 'converters', 'isunicodestring'),
('kitchen.text', 'converters', 'process_control_chars'),
('kitchen.text', 'converters', 'XmlEncodeError'),
('kitchen.text', 'misc', 'b_'),
@ -57,6 +62,7 @@ class Test__all__(object):
('kitchen.text', 'utf8', 'byte_string_textual_width_fill'),
('kitchen.text', 'utf8', 'byte_string_valid_encoding'),
('kitchen.text', 'utf8', 'fill'),
('kitchen.text', 'utf8', 'isunicodestring'),
('kitchen.text', 'utf8', 'textual_width'),
('kitchen.text', 'utf8', 'textual_width_chop'),
('kitchen.text', 'utf8', 'to_bytes'),

View file

@ -1,5 +1,4 @@
import unittest
from test import test_support
from kitchen.pycompat24.base64 import _base64 as base64
@ -183,6 +182,7 @@ class BaseXYTestCase(unittest.TestCase):
#from test import test_support
#def test_main():
# test_support.run_unittest(__name__)
#

View file

@ -13,16 +13,16 @@ def test_strict_dict_get_set():
d = collections.StrictDict()
d[u'a'] = 1
d['a'] = 2
tools.ok_(d[u'a'] != d['a'])
tools.ok_(len(d) == 2)
tools.assert_not_equal(d[u'a'], d['a'])
tools.eq_(len(d), 2)
d[u'\xf1'] = 1
d['\xf1'] = 2
d[u'\xf1'.encode('utf8')] = 3
tools.ok_(d[u'\xf1'] == 1)
tools.ok_(d['\xf1'] == 2)
tools.ok_(d[u'\xf1'.encode('utf8')] == 3)
tools.ok_(len(d) == 5)
d[u'\xf1'.encode('utf-8')] = 3
tools.eq_(d[u'\xf1'], 1)
tools.eq_(d['\xf1'], 2)
tools.eq_(d[u'\xf1'.encode('utf-8')], 3)
tools.eq_(len(d), 5)
class TestStrictDict(unittest.TestCase):
def setUp(self):
@ -32,15 +32,14 @@ class TestStrictDict(unittest.TestCase):
self.d[u'\xf1'] = 1
self.d['\xf1'] = 2
self.d[u'\xf1'.encode('utf8')] = 3
self.keys = [u'a', 'a', u'\xf1', '\xf1', u'\xf1'.encode('utf8')]
self.keys = [u'a', 'a', u'\xf1', '\xf1', u'\xf1'.encode('utf-8')]
def tearDown(self):
del(self.d)
def _compare_lists(self, list1, list2, debug=False):
'''We have a mixture of bytes and unicode and need python2.3 compat
So we have to compare these lists manually and inefficiently
'''We have a mixture of bytes and unicode so we have to compare these
lists manually and inefficiently
'''
def _compare_lists_helper(compare_to, dupes, idx, length):
if i not in compare_to:
@ -57,11 +56,11 @@ class TestStrictDict(unittest.TestCase):
list1_u = [l for l in list1 if isinstance(l, unicode)]
list1_b = [l for l in list1 if isinstance(l, str)]
list1_o = [l for l in list1 if not (isinstance(l, unicode) or isinstance(l, str))]
list1_o = [l for l in list1 if not (isinstance(l, (unicode, bytes)))]
list2_u = [l for l in list2 if isinstance(l, unicode)]
list2_b = [l for l in list2 if isinstance(l, str)]
list2_o = [l for l in list2 if not (isinstance(l, unicode) or isinstance(l, str))]
list2_o = [l for l in list2 if not (isinstance(l, (unicode, bytes)))]
for i in list1:
if isinstance(i, unicode):
@ -109,34 +108,38 @@ class TestStrictDict(unittest.TestCase):
def test_strict_dict_len(self):
'''StrictDict len'''
tools.ok_(len(self.d) == 5)
tools.eq_(len(self.d), 5)
def test_strict_dict_del(self):
'''StrictDict del'''
tools.ok_(len(self.d) == 5)
tools.eq_(len(self.d), 5)
del(self.d[u'\xf1'])
tools.assert_raises(KeyError, self.d.__getitem__, u'\xf1')
tools.ok_(len(self.d) == 4)
tools.eq_(len(self.d), 4)
def test_strict_dict_iter(self):
'''StrictDict iteration'''
keys = []
for k in self.d:
keys.append(k)
tools.ok_(self._compare_lists(keys, self.keys))
tools.ok_(self._compare_lists(keys, self.keys),
msg='keys != self.key: %s != %s' % (keys, self.keys))
keys = []
for k in self.d.iterkeys():
keys.append(k)
tools.ok_(self._compare_lists(keys, self.keys))
tools.ok_(self._compare_lists(keys, self.keys),
msg='keys != self.key: %s != %s' % (keys, self.keys))
keys = [k for k in self.d]
tools.ok_(self._compare_lists(keys, self.keys))
tools.ok_(self._compare_lists(keys, self.keys),
msg='keys != self.key: %s != %s' % (keys, self.keys))
keys = []
for k in self.d.keys():
keys.append(k)
tools.ok_(self._compare_lists(keys, self.keys))
tools.ok_(self._compare_lists(keys, self.keys),
msg='keys != self.key: %s != %s' % (keys, self.keys))
def test_strict_dict_contains(self):
'''StrictDict contains function'''
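Most of this file's churn converts tools.ok_(a == b) into tools.eq_(a, b) (plus explicit msg= arguments) so that a failure reports both values instead of a bare assertion error::

    >>> from nose import tools
    >>> tools.eq_(1, 2)
    Traceback (most recent call last):
        ...
    AssertionError: 1 != 2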

View file

@ -0,0 +1,415 @@
# -*- coding: utf-8 -*-
#
import unittest
from nose import tools
from nose.plugins.skip import SkipTest
import sys
import StringIO
import warnings
try:
import chardet
except ImportError:
chardet = None
from kitchen.text import converters
from kitchen.text.exceptions import XmlEncodeError
import base_classes
class UnicodeNoStr(object):
def __unicode__(self):
return u'El veloz murciélago saltó sobre el perro perezoso.'
class StrNoUnicode(object):
def __str__(self):
return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')
class StrReturnsUnicode(object):
def __str__(self):
return u'El veloz murciélago saltó sobre el perro perezoso.'
class UnicodeReturnsStr(object):
def __unicode__(self):
return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')
class UnicodeStrCrossed(object):
def __unicode__(self):
return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')
def __str__(self):
return u'El veloz murciélago saltó sobre el perro perezoso.'
class ReprUnicode(object):
def __repr__(self):
return u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'
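These helper classes deliberately return the "wrong" type from __str__ / __unicode__; the converter tests below pin down what to_unicode and to_bytes do with each of them. For instance, per the assertions further down::

    >>> from kitchen.text.converters import to_unicode
    >>> to_unicode(StrNoUnicode(), nonstring='simplerepr')
    u'El veloz murci\xe9lago salt\xf3 sobre el perro perezoso.'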
class TestConverters(unittest.TestCase, base_classes.UnicodeTestData):
def test_to_unicode(self):
'''Test to_unicode when the user gives good values'''
tools.eq_(converters.to_unicode(self.u_japanese, encoding='latin1'), self.u_japanese)
tools.eq_(converters.to_unicode(self.utf8_spanish), self.u_spanish)
tools.eq_(converters.to_unicode(self.utf8_japanese), self.u_japanese)
tools.eq_(converters.to_unicode(self.latin1_spanish, encoding='latin1'), self.u_spanish)
tools.eq_(converters.to_unicode(self.euc_jp_japanese, encoding='euc_jp'), self.u_japanese)
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'nonstring': 'foo'})
def test_to_unicode_errors(self):
tools.eq_(converters.to_unicode(self.latin1_spanish), self.u_mangled_spanish_latin1_as_utf8)
tools.eq_(converters.to_unicode(self.latin1_spanish, errors='ignore'), self.u_spanish_ignore)
tools.assert_raises(UnicodeDecodeError, converters.to_unicode,
*[self.latin1_spanish], **{'errors': 'strict'})
def test_to_unicode_nonstring(self):
tools.eq_(converters.to_unicode(5), u'5')
tools.eq_(converters.to_unicode(5, nonstring='empty'), u'')
tools.eq_(converters.to_unicode(5, nonstring='passthru'), 5)
tools.eq_(converters.to_unicode(5, nonstring='simplerepr'), u'5')
tools.eq_(converters.to_unicode(5, nonstring='repr'), u'5')
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'nonstring': 'strict'})
obj_repr = converters.to_unicode(object, nonstring='simplerepr')
tools.eq_(obj_repr, u"<type 'object'>")
tools.assert_true(isinstance(obj_repr, unicode))
def test_to_unicode_nonstring_with_objects_that_have__unicode__and__str__(self):
'''Test that to_unicode handles objects that have __unicode__ and __str__ methods'''
if sys.version_info < (3, 0):
# None of these apply on python3 because python3 does not use __unicode__
# and it enforces __str__ returning str
tools.eq_(converters.to_unicode(UnicodeNoStr(), nonstring='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(StrNoUnicode(), nonstring='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(UnicodeReturnsStr(), nonstring='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(StrReturnsUnicode(), nonstring='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(UnicodeStrCrossed(), nonstring='simplerepr'), self.u_spanish)
def test_to_bytes(self):
'''Test to_bytes when the user gives good values'''
tools.eq_(converters.to_bytes(self.utf8_japanese, encoding='latin1'), self.utf8_japanese)
tools.eq_(converters.to_bytes(self.u_spanish), self.utf8_spanish)
tools.eq_(converters.to_bytes(self.u_japanese), self.utf8_japanese)
tools.eq_(converters.to_bytes(self.u_spanish, encoding='latin1'), self.latin1_spanish)
tools.eq_(converters.to_bytes(self.u_japanese, encoding='euc_jp'), self.euc_jp_japanese)
def test_to_bytes_errors(self):
tools.eq_(converters.to_bytes(self.u_mixed, encoding='latin1'),
self.latin1_mixed_replace)
tools.eq_(converters.to_bytes(self.u_mixed, encoding='latin',
errors='ignore'), self.latin1_mixed_ignore)
tools.assert_raises(UnicodeEncodeError, converters.to_bytes,
*[self.u_mixed], **{'errors': 'strict', 'encoding': 'latin1'})
def _check_repr_bytes(self, repr_string, obj_name):
tools.assert_true(isinstance(repr_string, str))
match = self.repr_re.match(repr_string)
tools.assert_not_equal(match, None)
tools.eq_(match.groups()[0], obj_name)
def test_to_bytes_nonstring(self):
tools.eq_(converters.to_bytes(5), '5')
tools.eq_(converters.to_bytes(5, nonstring='empty'), '')
tools.eq_(converters.to_bytes(5, nonstring='passthru'), 5)
tools.eq_(converters.to_bytes(5, nonstring='simplerepr'), '5')
tools.eq_(converters.to_bytes(5, nonstring='repr'), '5')
# Raise a TypeError if the msg is nonstring and we're set to strict
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'nonstring': 'strict'})
# Raise a TypeError if given an invalid nonstring arg
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'nonstring': 'INVALID'})
obj_repr = converters.to_bytes(object, nonstring='simplerepr')
tools.eq_(obj_repr, "<type 'object'>")
tools.assert_true(isinstance(obj_repr, str))
def test_to_bytes_nonstring_with_objects_that_have__unicode__and__str__(self):
if sys.version_info < (3, 0):
# This object's __str__ returns a utf8 encoded object
tools.eq_(converters.to_bytes(StrNoUnicode(), nonstring='simplerepr'), self.utf8_spanish)
# No __str__ method so this returns repr
string = converters.to_bytes(UnicodeNoStr(), nonstring='simplerepr')
self._check_repr_bytes(string, 'UnicodeNoStr')
# This object's __str__ returns unicode which to_bytes converts to utf8
tools.eq_(converters.to_bytes(StrReturnsUnicode(), nonstring='simplerepr'), self.utf8_spanish)
# Unless we explicitly ask for something different
tools.eq_(converters.to_bytes(StrReturnsUnicode(),
nonstring='simplerepr', encoding='latin1'), self.latin1_spanish)
# This object has no __str__ so it returns repr
string = converters.to_bytes(UnicodeReturnsStr(), nonstring='simplerepr')
self._check_repr_bytes(string, 'UnicodeReturnsStr')
# This object's __str__ returns unicode which to_bytes converts to utf8
tools.eq_(converters.to_bytes(UnicodeStrCrossed(), nonstring='simplerepr'), self.utf8_spanish)
# This object's __repr__ returns unicode which to_bytes converts to utf8
tools.eq_(converters.to_bytes(ReprUnicode(), nonstring='simplerepr'),
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
tools.eq_(converters.to_bytes(ReprUnicode(), nonstring='repr'),
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
def test_unicode_to_xml(self):
tools.eq_(converters.unicode_to_xml(None), '')
tools.assert_raises(XmlEncodeError, converters.unicode_to_xml, *['byte string'])
tools.assert_raises(ValueError, converters.unicode_to_xml, *[u'string'], **{'control_chars': 'foo'})
tools.assert_raises(XmlEncodeError, converters.unicode_to_xml,
*[u'string\u0002'], **{'control_chars': 'strict'})
tools.eq_(converters.unicode_to_xml(self.u_entity), self.utf8_entity_escape)
tools.eq_(converters.unicode_to_xml(self.u_entity, attrib=True), self.utf8_attrib_escape)
tools.eq_(converters.unicode_to_xml(self.u_entity, encoding='ascii'), self.ascii_entity_escape)
tools.eq_(converters.unicode_to_xml(self.u_entity, encoding='ascii', attrib=True), self.ascii_attrib_escape)
def test_xml_to_unicode(self):
tools.eq_(converters.xml_to_unicode(self.utf8_entity_escape, 'utf8', 'replace'), self.u_entity)
tools.eq_(converters.xml_to_unicode(self.utf8_attrib_escape, 'utf8', 'replace'), self.u_entity)
tools.eq_(converters.xml_to_unicode(self.ascii_entity_escape, 'ascii', 'replace'), self.u_entity)
tools.eq_(converters.xml_to_unicode(self.ascii_attrib_escape, 'ascii', 'replace'), self.u_entity)
def test_xml_to_byte_string(self):
tools.eq_(converters.xml_to_byte_string(self.utf8_entity_escape, 'utf8', 'replace'), self.u_entity.encode('utf8'))
tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape, 'utf8', 'replace'), self.u_entity.encode('utf8'))
tools.eq_(converters.xml_to_byte_string(self.ascii_entity_escape, 'ascii', 'replace'), self.u_entity.encode('utf8'))
tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape, 'ascii', 'replace'), self.u_entity.encode('utf8'))
tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape,
output_encoding='euc_jp', errors='replace'),
self.u_entity.encode('euc_jp', 'replace'))
tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape,
output_encoding='latin1', errors='replace'),
self.u_entity.encode('latin1', 'replace'))
tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape,
output_encoding='euc_jp', errors='replace'),
self.u_entity.encode('euc_jp', 'replace'))
tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape,
output_encoding='latin1', errors='replace'),
self.u_entity.encode('latin1', 'replace'))
def test_byte_string_to_xml(self):
tools.assert_raises(XmlEncodeError, converters.byte_string_to_xml, *[u'test'])
tools.eq_(converters.byte_string_to_xml(self.utf8_entity), self.utf8_entity_escape)
tools.eq_(converters.byte_string_to_xml(self.utf8_entity, attrib=True), self.utf8_attrib_escape)
def test_bytes_to_xml(self):
tools.eq_(converters.bytes_to_xml(self.b_byte_chars), self.b_byte_encoded)
def test_xml_to_bytes(self):
tools.eq_(converters.xml_to_bytes(self.b_byte_encoded), self.b_byte_chars)
def test_guess_encoding_to_xml(self):
tools.eq_(converters.guess_encoding_to_xml(self.u_entity), self.utf8_entity_escape)
tools.eq_(converters.guess_encoding_to_xml(self.utf8_spanish), self.utf8_spanish)
tools.eq_(converters.guess_encoding_to_xml(self.latin1_spanish), self.utf8_spanish)
tools.eq_(converters.guess_encoding_to_xml(self.utf8_japanese), self.utf8_japanese)
def test_guess_encoding_to_xml_euc_japanese(self):
if chardet:
tools.eq_(converters.guess_encoding_to_xml(self.euc_jp_japanese),
self.utf8_japanese)
else:
raise SkipTest('chardet not installed, euc_japanese won\'t be detected')
def test_guess_encoding_to_xml_euc_japanese_mangled(self):
if chardet:
raise SkipTest('chardet installed, euc_japanese won\'t be mangled')
else:
tools.eq_(converters.guess_encoding_to_xml(self.euc_jp_japanese),
self.utf8_mangled_euc_jp_as_latin1)
class TestGetWriter(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.io = StringIO.StringIO()
def test_utf8_writer(self):
writer = converters.getwriter('utf-8')
io = writer(self.io)
io.write(self.u_japanese + u'\n')
io.seek(0)
result = io.read().strip()
tools.eq_(result, self.utf8_japanese)
io.seek(0)
io.truncate(0)
io.write(self.euc_jp_japanese + '\n')
io.seek(0)
result = io.read().strip()
tools.eq_(result, self.euc_jp_japanese)
io.seek(0)
io.truncate(0)
io.write(self.utf8_japanese + '\n')
io.seek(0)
result = io.read().strip()
tools.eq_(result, self.utf8_japanese)
def test_error_handlers(self):
'''Test setting alternate error handlers'''
writer = converters.getwriter('latin1')
io = writer(self.io, errors='strict')
tools.assert_raises(UnicodeEncodeError, io.write, self.u_japanese)
class TestExceptionConverters(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.exceptions = {}
tests = {'u_jpn': self.u_japanese,
'u_spanish': self.u_spanish,
'utf8_jpn': self.utf8_japanese,
'utf8_spanish': self.utf8_spanish,
'euc_jpn': self.euc_jp_japanese,
'latin1_spanish': self.latin1_spanish}
for test in tests.iteritems():
try:
raise Exception(test[1])
except Exception, self.exceptions[test[0]]:
pass
def test_exception_to_unicode_with_unicode(self):
tools.eq_(converters.exception_to_unicode(self.exceptions['u_jpn']), self.u_japanese)
tools.eq_(converters.exception_to_unicode(self.exceptions['u_spanish']), self.u_spanish)
def test_exception_to_unicode_with_bytes(self):
tools.eq_(converters.exception_to_unicode(self.exceptions['utf8_jpn']), self.u_japanese)
tools.eq_(converters.exception_to_unicode(self.exceptions['utf8_spanish']), self.u_spanish)
# Mangled latin1/utf8 conversion but no tracebacks
tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish']), self.u_mangled_spanish_latin1_as_utf8)
# Mangled euc_jp/utf8 conversion but no tracebacks
tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn']), self.u_mangled_euc_jp_as_utf8)
def test_exception_to_unicode_custom(self):
# If given custom functions, then we should not mangle
c = [lambda e: converters.to_unicode(e.args[0], encoding='euc_jp'),
lambda e: converters.to_unicode(e, encoding='euc_jp')]
tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn'],
converters=c), self.u_japanese)
c.extend(converters.EXCEPTION_CONVERTERS)
tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn'],
converters=c), self.u_japanese)
c = [lambda e: converters.to_unicode(e.args[0], encoding='latin1'),
lambda e: converters.to_unicode(e, encoding='latin1')]
tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish'],
converters=c), self.u_spanish)
c.extend(converters.EXCEPTION_CONVERTERS)
tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish'],
converters=c), self.u_spanish)
def test_exception_to_bytes_with_unicode(self):
tools.eq_(converters.exception_to_bytes(self.exceptions['u_jpn']), self.utf8_japanese)
tools.eq_(converters.exception_to_bytes(self.exceptions['u_spanish']), self.utf8_spanish)
def test_exception_to_bytes_with_bytes(self):
tools.eq_(converters.exception_to_bytes(self.exceptions['utf8_jpn']), self.utf8_japanese)
tools.eq_(converters.exception_to_bytes(self.exceptions['utf8_spanish']), self.utf8_spanish)
tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish']), self.latin1_spanish)
tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn']), self.euc_jp_japanese)
def test_exception_to_bytes_custom(self):
# If given custom functions, then we should not mangle
c = [lambda e: converters.to_bytes(e.args[0], encoding='euc_jp'),
lambda e: converters.to_bytes(e, encoding='euc_jp')]
tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn'],
converters=c), self.euc_jp_japanese)
c.extend(converters.EXCEPTION_CONVERTERS)
tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn'],
converters=c), self.euc_jp_japanese)
c = [lambda e: converters.to_bytes(e.args[0], encoding='latin1'),
lambda e: converters.to_bytes(e, encoding='latin1')]
tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish'],
converters=c), self.latin1_spanish)
c.extend(converters.EXCEPTION_CONVERTERS)
tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish'],
converters=c), self.latin1_spanish)
class TestDeprecatedConverters(TestConverters):
def setUp(self):
warnings.simplefilter('ignore', DeprecationWarning)
def tearDown(self):
warnings.simplefilter('default', DeprecationWarning)
def test_to_xml(self):
tools.eq_(converters.to_xml(self.u_entity), self.utf8_entity_escape)
tools.eq_(converters.to_xml(self.utf8_spanish), self.utf8_spanish)
tools.eq_(converters.to_xml(self.latin1_spanish), self.utf8_spanish)
tools.eq_(converters.to_xml(self.utf8_japanese), self.utf8_japanese)
def test_to_utf8(self):
tools.eq_(converters.to_utf8(self.u_japanese), self.utf8_japanese)
tools.eq_(converters.to_utf8(self.utf8_spanish), self.utf8_spanish)
def test_to_str(self):
tools.eq_(converters.to_str(self.u_japanese), self.utf8_japanese)
tools.eq_(converters.to_str(self.utf8_spanish), self.utf8_spanish)
tools.eq_(converters.to_str(object), "<type 'object'>")
def test_non_string(self):
'''Test deprecated non_string parameter'''
# unicode
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'non_string': 'foo'})
tools.eq_(converters.to_unicode(5, non_string='empty'), u'')
tools.eq_(converters.to_unicode(5, non_string='passthru'), 5)
tools.eq_(converters.to_unicode(5, non_string='simplerepr'), u'5')
tools.eq_(converters.to_unicode(5, non_string='repr'), u'5')
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'non_string': 'strict'})
tools.eq_(converters.to_unicode(UnicodeNoStr(), non_string='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(StrNoUnicode(), non_string='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(StrReturnsUnicode(), non_string='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(UnicodeReturnsStr(), non_string='simplerepr'), self.u_spanish)
tools.eq_(converters.to_unicode(UnicodeStrCrossed(), non_string='simplerepr'), self.u_spanish)
obj_repr = converters.to_unicode(object, non_string='simplerepr')
tools.eq_(obj_repr, u"<type 'object'>")
tools.assert_true(isinstance(obj_repr, unicode))
# Bytes
tools.eq_(converters.to_bytes(5), '5')
tools.eq_(converters.to_bytes(5, non_string='empty'), '')
tools.eq_(converters.to_bytes(5, non_string='passthru'), 5)
tools.eq_(converters.to_bytes(5, non_string='simplerepr'), '5')
tools.eq_(converters.to_bytes(5, non_string='repr'), '5')
# Raise a TypeError if the msg is non_string and we're set to strict
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'non_string': 'strict'})
# Raise a TypeError if given an invalid non_string arg
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'non_string': 'INVALID'})
# No __str__ method so this returns repr
string = converters.to_bytes(UnicodeNoStr(), non_string='simplerepr')
self._check_repr_bytes(string, 'UnicodeNoStr')
# This object's __str__ returns a utf8 encoded object
tools.eq_(converters.to_bytes(StrNoUnicode(), non_string='simplerepr'), self.utf8_spanish)
# This object's __str__ returns unicode which to_bytes converts to utf8
tools.eq_(converters.to_bytes(StrReturnsUnicode(), non_string='simplerepr'), self.utf8_spanish)
# Unless we explicitly ask for something different
tools.eq_(converters.to_bytes(StrReturnsUnicode(),
non_string='simplerepr', encoding='latin1'), self.latin1_spanish)
# This object has no __str__ so it returns repr
string = converters.to_bytes(UnicodeReturnsStr(), non_string='simplerepr')
self._check_repr_bytes(string, 'UnicodeReturnsStr')
# This object's __str__ returns unicode which to_bytes converts to utf8
tools.eq_(converters.to_bytes(UnicodeStrCrossed(), non_string='simplerepr'), self.utf8_spanish)
# This object's __repr__ returns unicode which to_bytes converts to utf8
tools.eq_(converters.to_bytes(ReprUnicode(), non_string='simplerepr'),
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
tools.eq_(converters.to_bytes(ReprUnicode(), non_string='repr'),
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
obj_repr = converters.to_bytes(object, non_string='simplerepr')
tools.eq_(obj_repr, "<type 'object'>")
tools.assert_true(isinstance(obj_repr, str))

View file

@ -4,7 +4,6 @@ import os
import copy
import tempfile
import unittest
from test import test_support
from kitchen.pycompat25.collections._defaultdict import defaultdict
@ -173,6 +172,7 @@ class TestDefaultDict(unittest.TestCase):
os.remove(tfn)
#from test import test_support
#def test_main():
# test_support.run_unittest(TestDefaultDict)
#

View file

@ -5,21 +5,19 @@ from nose import tools
import sys
import warnings
from kitchen import i18n
from kitchen.text import converters
from kitchen.text import utf8
class TestDeprecated(unittest.TestCase):
def setUp(self):
registry = sys._getframe(2).f_globals.get('__warningregistry__')
if registry:
registry.clear()
registry = sys._getframe(1).f_globals.get('__warningregistry__')
if registry:
registry.clear()
for module in sys.modules.values():
if hasattr(module, '__warningregistry__'):
del module.__warningregistry__
warnings.simplefilter('error', DeprecationWarning)
def tearDown(self):
warnings.simplefilter('default', DeprecationWarning)
warnings.simplefilter('ignore', DeprecationWarning)
def test_deprecated_functions(self):
'''Test that all deprecated functions raise DeprecationWarning'''
@ -45,3 +43,23 @@ class TestDeprecated(unittest.TestCase):
**{'non_string': 'simplerepr'})
tools.assert_raises(DeprecationWarning, converters.to_bytes, *[5],
**{'nonstring': 'simplerepr', 'non_string': 'simplerepr'})
class TestPendingDeprecationParameters(unittest.TestCase):
def setUp(self):
for module in sys.modules.values():
if hasattr(module, '__warningregistry__'):
del module.__warningregistry__
warnings.simplefilter('error', PendingDeprecationWarning)
def tearDown(self):
warnings.simplefilter('ignore', PendingDeprecationWarning)
def test_parameters(self):
# test that we warn when using the python2_api parameters
tools.assert_raises(PendingDeprecationWarning,
i18n.get_translation_object, 'test', **{'python2_api': True})
tools.assert_raises(PendingDeprecationWarning,
i18n.DummyTranslations, **{'python2_api': True})

820
kitchen2/tests/test_i18n.py Normal file
View file

@ -0,0 +1,820 @@
# -*- coding: utf-8 -*-
#
import unittest
from nose import tools
import os
import types
from kitchen import i18n
import base_classes
class TestI18N_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.utf8'
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_easy_gettext_setup(self):
'''Test that the easy_gettext_setup function works
'''
_, N_ = i18n.easy_gettext_setup('foo', localedirs=
['%s/data/locale/' % os.path.dirname(__file__)])
tools.assert_true(isinstance(_, types.MethodType))
tools.assert_true(isinstance(N_, types.MethodType))
tools.eq_(_.__name__, '_ugettext')
tools.eq_(N_.__name__, '_ungettext')
tools.eq_(_(self.utf8_spanish), self.u_spanish)
tools.eq_(_(self.u_spanish), self.u_spanish)
tools.eq_(N_(self.utf8_limao, self.utf8_limoes, 1), self.u_limao)
tools.eq_(N_(self.utf8_limao, self.utf8_limoes, 2), self.u_limoes)
tools.eq_(N_(self.u_limao, self.u_limoes, 1), self.u_limao)
tools.eq_(N_(self.u_limao, self.u_limoes, 2), self.u_limoes)
def test_easy_gettext_setup_non_unicode(self):
'''Test that the easy_gettext_setup function works with use_unicode=False
'''
b_, bN_ = i18n.easy_gettext_setup('foo', localedirs=
['%s/data/locale/' % os.path.dirname(__file__)],
use_unicode=False)
tools.assert_true(isinstance(b_, types.MethodType))
tools.assert_true(isinstance(bN_, types.MethodType))
tools.eq_(b_.__name__, '_lgettext')
tools.eq_(bN_.__name__, '_lngettext')
tools.eq_(b_(self.utf8_spanish), self.utf8_spanish)
tools.eq_(b_(self.u_spanish), self.utf8_spanish)
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_limao)
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_limoes)
tools.eq_(bN_(self.u_limao, self.u_limoes, 1), self.utf8_limao)
tools.eq_(bN_(self.u_limao, self.u_limoes, 2), self.utf8_limoes)
def test_get_translation_object(self):
'''Test that the get_translation_object function works
'''
translations = i18n.get_translation_object('foo', ['%s/data/locale/' % os.path.dirname(__file__)])
tools.eq_(translations.__class__, i18n.DummyTranslations)
tools.assert_raises(IOError, i18n.get_translation_object, 'foo', ['%s/data/locale/' % os.path.dirname(__file__)], fallback=False)
translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
tools.eq_(translations.__class__, i18n.NewGNUTranslations)
def test_get_translation_object_create_fallback(self):
'''Test get_translation_object creates fallbacks for additional catalogs'''
translations = i18n.get_translation_object('test',
['%s/data/locale' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)])
tools.eq_(translations.__class__, i18n.NewGNUTranslations)
tools.eq_(translations._fallback.__class__, i18n.NewGNUTranslations)
def test_get_translation_object_copy(self):
'''Test get_translation_object shallow copies the message catalog'''
translations = i18n.get_translation_object('test',
['%s/data/locale' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8')
translations.input_charset = 'utf-8'
translations2 = i18n.get_translation_object('test',
['%s/data/locale' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='latin-1')
translations2.input_charset = 'latin-1'
# Test that portions of the translation objects are the same and other
# portions are different (which is a space optimization so that the
# translation data isn't in memory multiple times)
tools.assert_not_equal(id(translations._fallback), id(translations2._fallback))
tools.assert_not_equal(id(translations.output_charset()), id(translations2.output_charset()))
tools.assert_not_equal(id(translations.input_charset), id(translations2.input_charset))
tools.eq_(id(translations._catalog), id(translations2._catalog))
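The identity assertions above pin down the shallow-copy optimization: each call to get_translation_object returns a distinct object, but the (potentially large) message catalog is shared rather than re-read. A sketch of the same behaviour with copy.copy (the Translations class here is hypothetical)::

    import copy

    class Translations(object):
        def __init__(self, catalog):
            self._catalog = catalog          # big msgid -> msgstr mapping
            self.input_charset = 'utf-8'

    t1 = Translations({u'kitchen sink': u'pia da cozinha'})
    t2 = copy.copy(t1)                       # shallow: attributes rebound, not duplicated
    t2.input_charset = 'latin-1'

    assert t1._catalog is t2._catalog        # one catalog in memory
    assert t1.input_charset != t2.input_charset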
def test_get_translation_object_optional_params(self):
'''Smoketest leaving out optional parameters'''
translations = i18n.get_translation_object('test')
tools.assert_true(translations.__class__ in (i18n.NewGNUTranslations, i18n.DummyTranslations))
def test_get_translation_object_python2_api_default(self):
'''Smoketest that python2_api default value yields the python2 functions'''
# Default
translations = i18n.get_translation_object('test',
['%s/data/locale' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8')
translations.input_charset = 'utf-8'
tools.eq_(translations.gettext.__name__, '_gettext')
tools.eq_(translations.lgettext.__name__, '_lgettext')
tools.eq_(translations.ugettext.__name__, '_ugettext')
tools.eq_(translations.ngettext.__name__, '_ngettext')
tools.eq_(translations.lngettext.__name__, '_lngettext')
tools.eq_(translations.ungettext.__name__, '_ungettext')
def test_get_translation_object_python2_api_true(self):
'''Smoketest that setting python2_api true yields the python2 functions'''
# Explicitly request the python2 API
translations = i18n.get_translation_object('test',
['%s/data/locale' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8',
python2_api=True)
translations.input_charset = 'utf-8'
tools.eq_(translations.gettext.__name__, '_gettext')
tools.eq_(translations.lgettext.__name__, '_lgettext')
tools.eq_(translations.ugettext.__name__, '_ugettext')
tools.eq_(translations.ngettext.__name__, '_ngettext')
tools.eq_(translations.lngettext.__name__, '_lngettext')
tools.eq_(translations.ungettext.__name__, '_ungettext')
def test_get_translation_object_python2_api_false(self):
'''Smoketest that setting python2_api false yields the python3 functions'''
# Explicitly request the python3 API
translations = i18n.get_translation_object('test',
['%s/data/locale' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8',
python2_api=False)
translations.input_charset = 'utf-8'
tools.eq_(translations.gettext.__name__, '_ugettext')
tools.eq_(translations.lgettext.__name__, '_lgettext')
tools.eq_(translations.ngettext.__name__, '_ungettext')
tools.eq_(translations.lngettext.__name__, '_lngettext')
tools.assert_raises(AttributeError, translations.ugettext, 'message')
tools.assert_raises(AttributeError, translations.ungettext, 'message1', 'message2')
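Taken together, these three smoketests document the python2_api switch: by default (and with python2_api=True) the object exposes the python2 names, while python2_api=False remaps gettext/ngettext to the unicode-returning implementations and removes ugettext/ungettext, mirroring python3's gettext API. Assuming a 'test' catalog can be found on the search path::

    from kitchen import i18n

    translations = i18n.get_translation_object('test', python2_api=False)
    _ = translations.gettext        # returns unicode under this API
    print(_('kitchen sink'))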
def test_dummy_translation(self):
'''Test that we can create a DummyTranslation object
'''
tools.assert_true(isinstance(i18n.DummyTranslations(), i18n.DummyTranslations))
# Note: Using nose's generator tests for this so we can't subclass
# unittest.TestCase
class TestDummyTranslations(base_classes.UnicodeTestData):
def __init__(self):
self.test_data = {'bytes': (( # First set is with default charset (utf8)
(self.u_ascii, self.b_ascii),
(self.u_spanish, self.utf8_spanish),
(self.u_japanese, self.utf8_japanese),
(self.b_ascii, self.b_ascii),
(self.utf8_spanish, self.utf8_spanish),
(self.latin1_spanish, self.utf8_mangled_spanish_latin1_as_utf8),
(self.utf8_japanese, self.utf8_japanese),
),
( # Second set is with output_charset of latin1 (ISO-8859-1)
(self.u_ascii, self.b_ascii),
(self.u_spanish, self.latin1_spanish),
(self.u_japanese, self.latin1_mangled_japanese_replace_as_latin1),
(self.b_ascii, self.b_ascii),
(self.utf8_spanish, self.utf8_spanish),
(self.latin1_spanish, self.latin1_spanish),
(self.utf8_japanese, self.utf8_japanese),
),
( # Third set is with output_charset of C
(self.u_ascii, self.b_ascii),
(self.u_spanish, self.ascii_mangled_spanish_as_ascii),
(self.u_japanese, self.ascii_mangled_japanese_replace_as_latin1),
(self.b_ascii, self.b_ascii),
(self.utf8_spanish, self.ascii_mangled_spanish_as_ascii),
(self.latin1_spanish, self.ascii_twice_mangled_spanish_latin1_as_utf8_as_ascii),
(self.utf8_japanese, self.ascii_mangled_japanese_replace_as_latin1),
),
),
'unicode': (( # First set is with the default charset (utf8)
(self.u_ascii, self.u_ascii),
(self.u_spanish, self.u_spanish),
(self.u_japanese, self.u_japanese),
(self.b_ascii, self.u_ascii),
(self.utf8_spanish, self.u_spanish),
(self.latin1_spanish, self.u_mangled_spanish_latin1_as_utf8), # String is mangled but no exception
(self.utf8_japanese, self.u_japanese),
),
( # Second set is with _charset of latin1 (ISO-8859-1)
(self.u_ascii, self.u_ascii),
(self.u_spanish, self.u_spanish),
(self.u_japanese, self.u_japanese),
(self.b_ascii, self.u_ascii),
(self.utf8_spanish, self.u_mangled_spanish_utf8_as_latin1), # String mangled but no exception
(self.latin1_spanish, self.u_spanish),
(self.utf8_japanese, self.u_mangled_japanese_utf8_as_latin1), # String mangled but no exception
),
( # Third set is with _charset of C
(self.u_ascii, self.u_ascii),
(self.u_spanish, self.u_spanish),
(self.u_japanese, self.u_japanese),
(self.b_ascii, self.u_ascii),
(self.utf8_spanish, self.u_mangled_spanish_utf8_as_ascii), # String mangled but no exception
(self.latin1_spanish, self.u_mangled_spanish_latin1_as_ascii), # String mangled but no exception
(self.utf8_japanese, self.u_mangled_japanese_utf8_as_ascii), # String mangled but no exception
),
)
}
def setUp(self):
self.translations = i18n.DummyTranslations()
def check_gettext(self, message, value, charset=None):
self.translations.set_output_charset(charset)
tools.eq_(self.translations.gettext(message), value,
msg='gettext(%s): trans: %s != val: %s (charset=%s)'
% (repr(message), repr(self.translations.gettext(message)),
repr(value), charset))
def check_lgettext(self, message, value, charset=None,
locale='en_US.UTF-8'):
os.environ['LC_ALL'] = locale
self.translations.set_output_charset(charset)
tools.eq_(self.translations.lgettext(message), value,
msg='lgettext(%s): trans: %s != val: %s (charset=%s, locale=%s)'
% (repr(message), repr(self.translations.lgettext(message)),
repr(value), charset, locale))
# Note: charset has a default value because nose isn't invoking setUp and
# tearDown each time check_* is run.
def check_ugettext(self, message, value, charset='utf-8'):
'''ugettext method with default values'''
self.translations.input_charset = charset
tools.eq_(self.translations.ugettext(message), value,
msg='ugettext(%s): trans: %s != val: %s (charset=%s)'
% (repr(message), repr(self.translations.ugettext(message)),
repr(value), charset))
def check_ngettext(self, message, value, charset=None):
self.translations.set_output_charset(charset)
tools.eq_(self.translations.ngettext(message, 'blank', 1), value)
tools.eq_(self.translations.ngettext('blank', message, 2), value)
tools.assert_not_equal(self.translations.ngettext(message, 'blank', 2), value)
tools.assert_not_equal(self.translations.ngettext('blank', message, 1), value)
def check_lngettext(self, message, value, charset=None, locale='en_US.UTF-8'):
os.environ['LC_ALL'] = locale
self.translations.set_output_charset(charset)
tools.eq_(self.translations.lngettext(message, 'blank', 1), value,
msg='lngettext(%s, "blank", 1): trans: %s != val: %s (charset=%s, locale=%s)'
% (repr(message), repr(self.translations.lngettext(message,
'blank', 1)), repr(value), charset, locale))
tools.eq_(self.translations.lngettext('blank', message, 2), value,
msg='lngettext("blank", %s, 2): trans: %s != val: %s (charset=%s, locale=%s)'
% (repr(message), repr(self.translations.lngettext('blank',
message, 2)), repr(value), charset, locale))
tools.assert_not_equal(self.translations.lngettext(message, 'blank', 2), value,
msg='lngettext(%s, "blank", 2): trans: %s, val: %s (charset=%s, locale=%s)'
% (repr(message), repr(self.translations.lngettext(message,
'blank', 2)), repr(value), charset, locale))
tools.assert_not_equal(self.translations.lngettext('blank', message, 1), value,
msg='lngettext("blank", %s, 1): trans: %s != val: %s (charset=%s, locale=%s)'
% (repr(message), repr(self.translations.lngettext('blank',
message, 1)), repr(value), charset, locale))
# Note: charset has a default value because nose isn't invoking setUp and
# tearDown each time check_* is run.
def check_ungettext(self, message, value, charset='utf-8'):
self.translations.input_charset = charset
tools.eq_(self.translations.ungettext(message, 'blank', 1), value)
tools.eq_(self.translations.ungettext('blank', message, 2), value)
tools.assert_not_equal(self.translations.ungettext(message, 'blank', 2), value)
tools.assert_not_equal(self.translations.ungettext('blank', message, 1), value)
def test_gettext(self):
'''gettext method with default values'''
for message, value in self.test_data['bytes'][0]:
yield self.check_gettext, message, value
def test_gettext_output_charset(self):
'''gettext method after output_charset is set'''
for message, value in self.test_data['bytes'][1]:
yield self.check_gettext, message, value, 'latin1'
def test_ngettext(self):
for message, value in self.test_data['bytes'][0]:
yield self.check_ngettext, message, value
def test_ngettext_output_charset(self):
for message, value in self.test_data['bytes'][1]:
yield self.check_ngettext, message, value, 'latin1'
def test_lgettext(self):
'''lgettext method with default values on a utf8 locale'''
for message, value in self.test_data['bytes'][0]:
yield self.check_lgettext, message, value
def test_lgettext_output_charset(self):
'''lgettext method after output_charset is set'''
for message, value in self.test_data['bytes'][1]:
yield self.check_lgettext, message, value, 'latin1'
def test_lgettext_output_charset_and_locale(self):
'''lgettext method after output_charset is set in C locale
output_charset should take precedence
'''
for message, value in self.test_data['bytes'][1]:
yield self.check_lgettext, message, value, 'latin1', 'C'
def test_lgettext_locale_C(self):
'''lgettext method in a C locale'''
for message, value in self.test_data['bytes'][2]:
yield self.check_lgettext, message, value, None, 'C'
def test_lngettext(self):
'''lngettext method with default values on a utf8 locale'''
for message, value in self.test_data['bytes'][0]:
yield self.check_lngettext, message, value
def test_lngettext_output_charset(self):
'''lngettext method after output_charset is set'''
for message, value in self.test_data['bytes'][1]:
yield self.check_lngettext, message, value, 'latin1'
def test_lngettext_output_charset_and_locale(self):
'''lngettext method after output_charset is set in C locale
output_charset should take precedence
'''
for message, value in self.test_data['bytes'][1]:
yield self.check_lngettext, message, value, 'latin1', 'C'
def test_lngettext_locale_C(self):
'''lngettext method in a C locale'''
for message, value in self.test_data['bytes'][2]:
yield self.check_lngettext, message, value, None, 'C'
def test_ugettext(self):
for message, value in self.test_data['unicode'][0]:
yield self.check_ugettext, message, value
def test_ugettext_charset_latin1(self):
for message, value in self.test_data['unicode'][1]:
yield self.check_ugettext, message, value, 'latin1'
def test_ugettext_charset_ascii(self):
for message, value in self.test_data['unicode'][2]:
yield self.check_ugettext, message, value, 'ascii'
def test_ungettext(self):
for message, value in self.test_data['unicode'][0]:
yield self.check_ungettext, message, value
def test_ungettext_charset_latin1(self):
for message, value in self.test_data['unicode'][1]:
yield self.check_ungettext, message, value, 'latin1'
def test_ungettext_charset_ascii(self):
for message, value in self.test_data['unicode'][2]:
yield self.check_ungettext, message, value, 'ascii'
def test_nonbasestring(self):
tools.eq_(self.translations.gettext(dict(hi='there')), self.b_empty_string)
tools.eq_(self.translations.ngettext(dict(hi='there'), dict(hi='two'), 1), self.b_empty_string)
tools.eq_(self.translations.lgettext(dict(hi='there')), self.b_empty_string)
tools.eq_(self.translations.lngettext(dict(hi='there'), dict(hi='two'), 1), self.b_empty_string)
tools.eq_(self.translations.ugettext(dict(hi='there')), self.u_empty_string)
tools.eq_(self.translations.ungettext(dict(hi='there'), dict(hi='two'), 1), self.u_empty_string)
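DummyTranslations thus degrades gracefully: a non-string message is translated to the appropriate empty string rather than raising::

    >>> from kitchen import i18n
    >>> translations = i18n.DummyTranslations()
    >>> translations.gettext(dict(hi='there'))
    ''
    >>> translations.ugettext(dict(hi='there'))
    u''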
class TestI18N_Latin1(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.iso88591'
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_easy_gettext_setup_non_unicode(self):
'''Test that the easy_gettext_setup function works with use_unicode=False in a latin-1 locale
'''
b_, bN_ = i18n.easy_gettext_setup('foo', localedirs=
['%s/data/locale/' % os.path.dirname(__file__)],
use_unicode=False)
tools.eq_(b_(self.utf8_spanish), self.utf8_spanish)
tools.eq_(b_(self.u_spanish), self.latin1_spanish)
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_limao)
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_limoes)
tools.eq_(bN_(self.u_limao, self.u_limoes, 1), self.latin1_limao)
tools.eq_(bN_(self.u_limao, self.u_limoes, 2), self.latin1_limoes)
class TestNewGNUTranslationsNoMatch(TestDummyTranslations):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.utf8'
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
class TestNewGNURealTranslations_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.utf8'
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_gettext(self):
_ = self.translations.gettext
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
# This is not translated to utf8_yes_in_fallback because this test is
# without the fallback message catalog
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
# This is not translated to utf8_yes_in_fallback because this test is
# without the fallback message catalog
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
def test_ngettext(self):
_ = self.translations.ngettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
def test_lgettext(self):
_ = self.translations.lgettext
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
# This is not translated to utf8_yes_in_fallback because this test is
# without the fallback message catalog
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
# This is not translated to utf8_yes_in_fallback because this test is
# without the fallback message catalog
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
def test_lngettext(self):
_ = self.translations.lngettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
def test_ugettext(self):
_ = self.translations.ugettext
tools.eq_(_(self.utf8_kitchen), self.u_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.u_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.u_ja_kuratomi)
# This is not translated to utf8_yes_in_fallback because this test is
# without the fallback message catalog
tools.eq_(_(self.utf8_in_fallback), self.u_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.u_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.u_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.u_kuratomi)
tools.eq_(_(self.u_kuratomi), self.u_ja_kuratomi)
# This is not translated to utf8_yes_in_fallback because this test is
# without the fallback message catalog
tools.eq_(_(self.u_in_fallback), self.u_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.u_not_in_catalog)
def test_ungettext(self):
_ = self.translations.ungettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.u_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.u_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.u_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.u_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.u_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.u_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.u_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.u_lemons)
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
class TestNewGNURealTranslations_Latin1(TestNewGNURealTranslations_UTF8):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.iso88591'
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_lgettext(self):
_ = self.translations.lgettext
tools.eq_(_(self.utf8_kitchen), self.latin1_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.latin1_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.latin1_ja_kuratomi)
# Neither of the following two tests encode to proper latin-1 because:
# any byte is valid in latin-1 so there's no way to know that what
# we're given in the string is really utf-8
#
# This is not translated to latin1_yes_in_fallback because this test
# is without the fallback message catalog
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.latin1_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.latin1_kuratomi)
tools.eq_(_(self.u_kuratomi), self.latin1_ja_kuratomi)
# This is not translated to latin1_yes_in_fallback because this test
# is without the fallback message catalog
tools.eq_(_(self.u_in_fallback), self.latin1_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.latin1_not_in_catalog)
def test_lngettext(self):
_ = self.translations.lngettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.latin1_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.latin1_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.latin1_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.latin1_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.latin1_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.latin1_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.latin1_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.latin1_lemons)
# This unfortunately does not encode to proper latin-1 because:
# any byte is valid in latin-1 so there's no way to know that what
# we're given in the string is really utf-8
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.latin1_not_in_catalog)
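The caveat in the comments above is worth spelling out: every byte sequence is valid latin-1, so a byte str that is really utf-8 decodes "successfully" and comes back as mojibake instead of raising::

    >>> utf8_bytes = u'café'.encode('utf-8')
    >>> utf8_bytes.decode('latin-1')
    u'caf\xc3\xa9'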
class TestFallbackNewGNUTranslationsNoMatch(TestDummyTranslations):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.utf8'
self.translations = i18n.get_translation_object('test',
['%s/data/locale/' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)])
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
class TestFallbackNewGNURealTranslations_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.utf8'
self.translations = i18n.get_translation_object('test',
['%s/data/locale/' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)])
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_gettext(self):
_ = self.translations.gettext
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
tools.eq_(_(self.utf8_in_fallback), self.utf8_yes_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
tools.eq_(_(self.u_in_fallback), self.utf8_yes_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
def test_ngettext(self):
_ = self.translations.ngettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
def test_lgettext(self):
_ = self.translations.lgettext
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
tools.eq_(_(self.utf8_in_fallback), self.utf8_yes_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
tools.eq_(_(self.u_in_fallback), self.utf8_yes_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
def test_lngettext(self):
_ = self.translations.lngettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
def test_ugettext(self):
_ = self.translations.ugettext
tools.eq_(_(self.utf8_kitchen), self.u_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.u_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.u_ja_kuratomi)
tools.eq_(_(self.utf8_in_fallback), self.u_yes_in_fallback)
tools.eq_(_(self.utf8_not_in_catalog), self.u_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.u_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.u_kuratomi)
tools.eq_(_(self.u_kuratomi), self.u_ja_kuratomi)
tools.eq_(_(self.u_in_fallback), self.u_yes_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.u_not_in_catalog)
def test_ungettext(self):
_ = self.translations.ungettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.u_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.u_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.u_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.u_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.u_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.u_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.u_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.u_lemons)
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
class TestFallbackNewGNURealTranslations_Latin1(TestFallbackNewGNURealTranslations_UTF8):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.iso88591'
self.translations = i18n.get_translation_object('test',
['%s/data/locale/' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)])
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_lgettext(self):
_ = self.translations.lgettext
tools.eq_(_(self.utf8_kitchen), self.latin1_pt_kitchen)
tools.eq_(_(self.utf8_ja_kuratomi), self.latin1_kuratomi)
tools.eq_(_(self.utf8_kuratomi), self.latin1_ja_kuratomi)
tools.eq_(_(self.utf8_in_fallback), self.latin1_yes_in_fallback)
# This unfortunately does not encode to proper latin-1 because:
# any byte is valid in latin-1 so there's no way to know that what
# we're given in the string is really utf-8
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
tools.eq_(_(self.u_kitchen), self.latin1_pt_kitchen)
tools.eq_(_(self.u_ja_kuratomi), self.latin1_kuratomi)
tools.eq_(_(self.u_kuratomi), self.latin1_ja_kuratomi)
tools.eq_(_(self.u_in_fallback), self.latin1_yes_in_fallback)
tools.eq_(_(self.u_not_in_catalog), self.latin1_not_in_catalog)
def test_lngettext(self):
_ = self.translations.lngettext
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.latin1_limao)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.latin1_lemon)
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.latin1_limao)
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.latin1_lemon)
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.latin1_limoes)
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.latin1_lemons)
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.latin1_limoes)
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.latin1_lemons)
# This unfortunately does not encode to proper latin-1 because:
# any byte is valid in latin-1 so there's no way to know that what
# we're given in the string is really utf-8
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.latin1_not_in_catalog)
class TestFallback(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.iso88591'
self.gtranslations = i18n.get_translation_object('test',
['%s/data/locale/' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)])
self.gtranslations.add_fallback(object())
self.dtranslations = i18n.get_translation_object('nonexistent',
['%s/data/locale/' % os.path.dirname(__file__),
'%s/data/locale-old' % os.path.dirname(__file__)])
self.dtranslations.add_fallback(object())
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
def test_invalid_fallback_no_raise(self):
'''Test when we have an invalid fallback that it does not raise.'''
tools.eq_(self.gtranslations.gettext(self.u_spanish), self.utf8_spanish)
tools.eq_(self.gtranslations.ugettext(self.u_spanish), self.u_spanish)
tools.eq_(self.gtranslations.lgettext(self.u_spanish), self.latin1_spanish)
tools.eq_(self.gtranslations.ngettext(self.u_spanish, 'cde', 1), self.utf8_spanish)
tools.eq_(self.gtranslations.ungettext(self.u_spanish, 'cde', 1), self.u_spanish)
tools.eq_(self.gtranslations.lngettext(self.u_spanish, 'cde', 1), self.latin1_spanish)
tools.eq_(self.dtranslations.gettext(self.u_spanish), self.utf8_spanish)
tools.eq_(self.dtranslations.ugettext(self.u_spanish), self.u_spanish)
tools.eq_(self.dtranslations.lgettext(self.u_spanish), self.latin1_spanish)
tools.eq_(self.dtranslations.ngettext(self.u_spanish, 'cde', 1), self.utf8_spanish)
tools.eq_(self.dtranslations.ungettext(self.u_spanish, 'cde', 1), self.u_spanish)
tools.eq_(self.dtranslations.lngettext(self.u_spanish, 'cde', 1), self.latin1_spanish)
class TestDefaultLocaleDir(unittest.TestCase, base_classes.UnicodeTestData):
def setUp(self):
self.old_LC_ALL = os.environ.get('LC_ALL', None)
os.environ['LC_ALL'] = 'pt_BR.utf8'
self.old_DEFAULT_LOCALEDIRS = i18n._DEFAULT_LOCALEDIR
i18n._DEFAULT_LOCALEDIR = '%s/data/locale/' % os.path.dirname(__file__)
self.translations = i18n.get_translation_object('test')
def tearDown(self):
if self.old_LC_ALL:
os.environ['LC_ALL'] = self.old_LC_ALL
else:
del(os.environ['LC_ALL'])
if self.old_DEFAULT_LOCALEDIRS:
i18n._DEFAULT_LOCALEDIR = self.old_DEFAULT_LOCALEDIRS
def test_gettext(self):
_ = self.translations.gettext
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
# Returns msgid because the string is in a fallback catalog which we
# haven't setup
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
# Returns msgid because the string is in a fallback catalog which we
# haven't setup
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)

View file

@ -5,7 +5,7 @@ from nose import tools
from kitchen import iterutils
class TestStrictDict(unittest.TestCase):
class TestIterutils(unittest.TestCase):
iterable_data = (
[0, 1, 2],
[],
@ -40,6 +40,9 @@ class TestStrictDict(unittest.TestCase):
tools.ok_(iterutils.isiterable('a', include_string=True) == True)
tools.ok_(iterutils.isiterable('a', include_string=False) == False)
tools.ok_(iterutils.isiterable('a') == False)
tools.ok_(iterutils.isiterable(u'a', include_string=True) == True)
tools.ok_(iterutils.isiterable(u'a', include_string=False) == False)
tools.ok_(iterutils.isiterable(u'a') == False)
def test_iterate(self):
iterutils.iterate(None)
@ -55,3 +58,5 @@ class TestStrictDict(unittest.TestCase):
# strings
tools.ok_(list(iterutils.iterate('abc')) == ['abc'])
tools.ok_(list(iterutils.iterate('abc', include_string=True)) == ['a', 'b', 'c'])
tools.ok_(list(iterutils.iterate(u'abc')) == [u'abc'])
tools.ok_(list(iterutils.iterate(u'abc', include_string=True)) == [u'a', u'b', u'c'])

View file

@ -1,6 +1,5 @@
import unittest
from nose.plugins.skip import SkipTest
from test import test_support
from kitchen.pycompat27.subprocess import _subprocess as subprocess
import sys
import StringIO
@ -45,9 +44,14 @@ def reap_children():
except:
break
if not hasattr(test_support, 'reap_children'):
# No reap_children in python-2.3
test_support.reap_children = reap_children
test_support = None
try:
from test import test_support
if not hasattr(test_support, 'reap_children'):
# No reap_children in python-2.3
test_support.reap_children = reap_children
except ImportError:
pass
# In a debug build, stuff like "[6580 refs]" is printed to stderr at
# shutdown time. That frustrates tests trying to check stderr produced
@ -79,7 +83,8 @@ class BaseTestCase(unittest.TestCase):
def setUp(self):
# Try to minimize the number of children we have so this test
# doesn't crash on some buildbots (Alphas in particular).
test_support.reap_children()
if test_support:
test_support.reap_children()
def tearDown(self):
for inst in subprocess._active:
@ -596,6 +601,9 @@ class ProcessTestCase(BaseTestCase):
"line1\nline2\rline3\r\nline4\r\nline5\nline6")
def test_no_leaking(self):
if not test_support:
raise SkipTest("No test_support module available.")
# Make sure we leak no resources
if not mswindows:
max_handles = 1026 # too much for most UNIX systems
@ -1123,6 +1131,8 @@ class POSIXProcessTestCase(BaseTestCase):
def test_wait_when_sigchild_ignored(self):
# NOTE: sigchild_ignore.py may not be an effective test on all OSes.
if not test_support:
raise SkipTest("No test_support module available.")
sigchild_ignore = test_support.findfile(os.path.join("subprocessdata",
"sigchild_ignore.py"))
p = subprocess.Popen([sys.executable, sigchild_ignore],

View file

@ -0,0 +1,161 @@
# -*- coding: utf-8 -*-
#
import unittest
from nose import tools
from kitchen.text.exceptions import ControlCharError
from kitchen.text import display
import base_classes
class TestDisplay(base_classes.UnicodeTestData, unittest.TestCase):
def test_internal_interval_bisearch(self):
'''Test that we can find things in an interval table'''
table = ((0, 3), (5, 7), (9, 10))
tools.assert_true(display._interval_bisearch(0, table))
tools.assert_true(display._interval_bisearch(1, table))
tools.assert_true(display._interval_bisearch(2, table))
tools.assert_true(display._interval_bisearch(3, table))
tools.assert_true(display._interval_bisearch(5, table))
tools.assert_true(display._interval_bisearch(6, table))
tools.assert_true(display._interval_bisearch(7, table))
tools.assert_true(display._interval_bisearch(9, table))
tools.assert_true(display._interval_bisearch(10, table))
tools.assert_false(display._interval_bisearch(-1, table))
tools.assert_false(display._interval_bisearch(4, table))
tools.assert_false(display._interval_bisearch(8, table))
tools.assert_false(display._interval_bisearch(11, table))
def test_internal_generate_combining_table(self):
'''Test that the combining table we generate is equal to or a subset of what's in the current table
If this assertion fails, it can mean one of two things:
1. The code is broken
2. The table we have is out of date.
'''
old_table = display._COMBINING
new_table = display._generate_combining_table()
for interval in new_table:
if interval[0] == interval[1]:
tools.assert_true(display._interval_bisearch(interval[0], old_table))
else:
for codepoint in xrange(interval[0], interval[1] + 1):
tools.assert_true(display._interval_bisearch(codepoint, old_table))
def test_internal_ucp_width(self):
'''Test that ucp_width returns proper width for characters'''
for codepoint in xrange(0, 0xFFFFF + 1):
if codepoint < 32 or (codepoint < 0xa0 and codepoint >= 0x7f):
# With strict on, we should raise an error
tools.assert_raises(ControlCharError, display._ucp_width, codepoint, 'strict')
if codepoint in (0x08, 0x1b, 0x7f, 0x94):
# Backspace, escape, delete, and the cancel control character each remove one column
tools.eq_(display._ucp_width(codepoint), -1)
else:
# Everything else returns 0
tools.eq_(display._ucp_width(codepoint), 0)
elif display._interval_bisearch(codepoint, display._COMBINING):
# Combining character
tools.eq_(display._ucp_width(codepoint), 0)
elif (codepoint >= 0x1100 and
(codepoint <= 0x115f or # Hangul Jamo init. consonants
codepoint == 0x2329 or codepoint == 0x232a or
(codepoint >= 0x2e80 and codepoint <= 0xa4cf and
codepoint != 0x303f) or # CJK ... Yi
(codepoint >= 0xac00 and codepoint <= 0xd7a3) or # Hangul Syllables
(codepoint >= 0xf900 and codepoint <= 0xfaff) or # CJK Compatibility Ideographs
(codepoint >= 0xfe10 and codepoint <= 0xfe19) or # Vertical forms
(codepoint >= 0xfe30 and codepoint <= 0xfe6f) or # CJK Compatibility Forms
(codepoint >= 0xff00 and codepoint <= 0xff60) or # Fullwidth Forms
(codepoint >= 0xffe0 and codepoint <= 0xffe6) or
(codepoint >= 0x20000 and codepoint <= 0x2fffd) or
(codepoint >= 0x30000 and codepoint <= 0x3fffd))):
tools.eq_(display._ucp_width(codepoint), 2)
else:
tools.eq_(display._ucp_width(codepoint), 1)
def test_textual_width(self):
'''Test that we find the proper number of columns that a unicode string will consume'''
tools.eq_(display.textual_width(self.u_japanese), 31)
tools.eq_(display.textual_width(self.u_spanish), 50)
tools.eq_(display.textual_width(self.u_mixed), 23)
def test_textual_width_chop(self):
'''textual_width_chop with unicode strings'''
tools.eq_(display.textual_width_chop(self.u_mixed, 1000), self.u_mixed)
tools.eq_(display.textual_width_chop(self.u_mixed, 23), self.u_mixed)
tools.eq_(display.textual_width_chop(self.u_mixed, 22), self.u_mixed[:-1])
tools.eq_(display.textual_width_chop(self.u_mixed, 19), self.u_mixed[:-4])
tools.eq_(display.textual_width_chop(self.u_mixed, 1), u'')
tools.eq_(display.textual_width_chop(self.u_mixed, 2), self.u_mixed[0])
tools.eq_(display.textual_width_chop(self.u_mixed, 3), self.u_mixed[:2])
tools.eq_(display.textual_width_chop(self.u_mixed, 4), self.u_mixed[:3])
tools.eq_(display.textual_width_chop(self.u_mixed, 5), self.u_mixed[:4])
tools.eq_(display.textual_width_chop(self.u_mixed, 6), self.u_mixed[:5])
tools.eq_(display.textual_width_chop(self.u_mixed, 7), self.u_mixed[:5])
tools.eq_(display.textual_width_chop(self.u_mixed, 8), self.u_mixed[:6])
tools.eq_(display.textual_width_chop(self.u_mixed, 9), self.u_mixed[:7])
tools.eq_(display.textual_width_chop(self.u_mixed, 10), self.u_mixed[:8])
tools.eq_(display.textual_width_chop(self.u_mixed, 11), self.u_mixed[:9])
tools.eq_(display.textual_width_chop(self.u_mixed, 12), self.u_mixed[:10])
tools.eq_(display.textual_width_chop(self.u_mixed, 13), self.u_mixed[:10])
tools.eq_(display.textual_width_chop(self.u_mixed, 14), self.u_mixed[:11])
tools.eq_(display.textual_width_chop(self.u_mixed, 15), self.u_mixed[:12])
tools.eq_(display.textual_width_chop(self.u_mixed, 16), self.u_mixed[:13])
tools.eq_(display.textual_width_chop(self.u_mixed, 17), self.u_mixed[:14])
tools.eq_(display.textual_width_chop(self.u_mixed, 18), self.u_mixed[:15])
tools.eq_(display.textual_width_chop(self.u_mixed, 19), self.u_mixed[:15])
tools.eq_(display.textual_width_chop(self.u_mixed, 20), self.u_mixed[:16])
tools.eq_(display.textual_width_chop(self.u_mixed, 21), self.u_mixed[:17])
def test_textual_width_fill(self):
'''Pad a unicode string'''
tools.eq_(display.textual_width_fill(self.u_mixed, 1), self.u_mixed)
tools.eq_(display.textual_width_fill(self.u_mixed, 25), self.u_mixed + u' ')
tools.eq_(display.textual_width_fill(self.u_mixed, 25, left=False), u' ' + self.u_mixed)
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18), self.u_mixed[:-4] + u' ')
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18, prefix=self.u_spanish, suffix=self.u_spanish), self.u_spanish + self.u_mixed[:-4] + self.u_spanish + u' ')
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18), self.u_mixed[:-4] + u' ')
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18, prefix=self.u_spanish, suffix=self.u_spanish), self.u_spanish + self.u_mixed[:-4] + self.u_spanish + u' ')
def test_internal_textual_width_le(self):
test_data = ''.join([self.u_mixed, self.u_spanish])
tw = display.textual_width(test_data)
tools.eq_(display._textual_width_le(68, self.u_mixed, self.u_spanish), (tw <= 68))
tools.eq_(display._textual_width_le(69, self.u_mixed, self.u_spanish), (tw <= 69))
tools.eq_(display._textual_width_le(137, self.u_mixed, self.u_spanish), (tw <= 137))
tools.eq_(display._textual_width_le(138, self.u_mixed, self.u_spanish), (tw <= 138))
tools.eq_(display._textual_width_le(78, self.u_mixed, self.u_spanish), (tw <= 78))
tools.eq_(display._textual_width_le(79, self.u_mixed, self.u_spanish), (tw <= 79))
def test_wrap(self):
'''Test that text wrapping works'''
tools.eq_(display.wrap(self.u_mixed), [self.u_mixed])
tools.eq_(display.wrap(self.u_paragraph), self.u_paragraph_out)
tools.eq_(display.wrap(self.utf8_paragraph), self.u_paragraph_out)
tools.eq_(display.wrap(self.u_mixed_para), self.u_mixed_para_out)
tools.eq_(display.wrap(self.u_mixed_para, width=57,
initial_indent=' ', subsequent_indent='----'),
self.u_mixed_para_57_initial_subsequent_out)
def test_fill(self):
tools.eq_(display.fill(self.u_paragraph), u'\n'.join(self.u_paragraph_out))
tools.eq_(display.fill(self.utf8_paragraph), u'\n'.join(self.u_paragraph_out))
tools.eq_(display.fill(self.u_mixed_para), u'\n'.join(self.u_mixed_para_out))
tools.eq_(display.fill(self.u_mixed_para, width=57,
initial_indent=' ', subsequent_indent='----'),
u'\n'.join(self.u_mixed_para_57_initial_subsequent_out))
def test_byte_string_textual_width_fill(self):
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 1), self.utf8_mixed)
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25), self.utf8_mixed + ' ')
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, left=False), ' ' + self.utf8_mixed)
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18), self.u_mixed[:-4].encode('utf8') + ' ')
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18, prefix=self.utf8_spanish, suffix=self.utf8_spanish), self.utf8_spanish + self.u_mixed[:-4].encode('utf8') + self.utf8_spanish + ' ')
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18), self.u_mixed[:-4].encode('utf8') + ' ')
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18, prefix=self.utf8_spanish, suffix=self.utf8_spanish), self.utf8_spanish + self.u_mixed[:-4].encode('utf8') + self.utf8_spanish + ' ')

View file

@ -135,3 +135,19 @@ class TestTextMisc(unittest.TestCase, base_classes.UnicodeTestData):
'''Test that we return False with non-encoded chars'''
tools.ok_(misc.byte_string_valid_encoding('\xff') == False)
tools.ok_(misc.byte_string_valid_encoding(self.euc_jp_japanese) == False)
class TestIsStringTypes(unittest.TestCase):
def test_isbasestring(self):
tools.assert_true(misc.isbasestring('abc'))
tools.assert_true(misc.isbasestring(u'abc'))
tools.assert_false(misc.isbasestring(5))
def test_isbytestring(self):
tools.assert_true(misc.isbytestring('abc'))
tools.assert_false(misc.isbytestring(u'abc'))
tools.assert_false(misc.isbytestring(5))
def test_isunicodestring(self):
tools.assert_false(misc.isunicodestring('abc'))
tools.assert_true(misc.isunicodestring(u'abc'))
tools.assert_false(misc.isunicodestring(5))

View file

@ -56,7 +56,7 @@ class TestUTF8(base_classes.UnicodeTestData, unittest.TestCase):
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 22) == (22, self.u_mixed[:-1]))
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 19) == (18, self.u_mixed[:-4]))
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 2) == (2, self.u_mixed[0]))
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 1) == (0, ''))
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 1) == (0, u''))
def test_utf8_width_fill(self):
'''Pad a utf8 string'''

View file

@ -1,6 +1,5 @@
# -*- coding: utf-8 -*-
#
import unittest
from nose import tools
from kitchen.versioning import version_tuple_to_string
@ -26,7 +25,7 @@ class TestVersionTuple(object):
}
def check_ver_tuple_to_str(self, v_tuple, v_str):
tools.ok_(version_tuple_to_string(v_tuple) == v_str)
tools.eq_(version_tuple_to_string(v_tuple), v_str)
def test_version_tuple_to_string(self):
'''Test that version_tuple_to_string outputs PEP-386 compliant strings

View file

@ -0,0 +1,6 @@
===================
Kitchen.collections
===================
.. automodule:: kitchen.collections.strictdict
:members:

View file

@ -0,0 +1,12 @@
==========
Exceptions
==========
Kitchen has a hierarchy of exceptions that should make it easy to catch many
errors emitted by kitchen itself.
.. automodule:: kitchen.exceptions
:members:
.. automodule:: kitchen.text.exceptions
:members:
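
For instance, assuming the :mod:`kitchen.text.exceptions` classes derive from
the common :exc:`~kitchen.exceptions.KitchenError` base (as the hierarchy
suggests), one handler can catch them all; a small sketch::

    from kitchen.exceptions import KitchenError
    from kitchen.text.exceptions import ControlCharError

    try:
        raise ControlCharError('control character detected')
    except KitchenError:
        pass   # any exception from kitchen's hierarchy lands here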

View file

@ -0,0 +1,38 @@
===================
Kitchen.i18n Module
===================
.. automodule:: kitchen.i18n
Functions
=========
:func:`easy_gettext_setup` should satisfy the needs of most users.
:func:`get_translation_object` is designed to ease the way for anyone that
needs more control.
.. autofunction:: easy_gettext_setup
.. autofunction:: get_translation_object
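
A short sketch of both entry points (the ``'test'`` domain and the locale
directory are placeholder values mirroring the unittests, not anything
kitchen ships)::

    from kitchen.i18n import easy_gettext_setup, get_translation_object

    # The simple path: _() and N_() bound to your message domain
    _, N_ = easy_gettext_setup('test', localedirs=['data/locale/'])

    # The flexible path: work with the translation object directly
    translations = get_translation_object('test', ['data/locale/'])
    print translations.ugettext('kitchen sink')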
Translation Objects
===================
The standard translation objects from the :mod:`gettext` module suffer from
several problems:
* They can throw :exc:`UnicodeError`
* They can't find translations for non-:term:`ASCII` byte :class:`str`
messages
* They may return either a :class:`unicode` string or a byte :class:`str` from the
same function even though the functions say they will only return
:class:`unicode` or only return byte :class:`str`.
:class:`DummyTranslations` and :class:`NewGNUTranslations` were written to fix
these issues.
.. autoclass:: kitchen.i18n.DummyTranslations
:members:
.. autoclass:: kitchen.i18n.NewGNUTranslations
:members:

View file

@ -0,0 +1,9 @@
========================
Kitchen.iterutils Module
========================
.. automodule:: kitchen.iterutils
.. autofunction:: kitchen.iterutils.isiterable
.. autofunction:: kitchen.iterutils.iterate
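
A quick sketch of how these behave, mirroring the unittests earlier in this
tree::

    from kitchen.iterutils import isiterable, iterate

    isiterable([0, 1, 2])                      # True
    isiterable('abc')                          # False by default
    isiterable('abc', include_string=True)     # True

    list(iterate([0, 1, 2]))                   # [0, 1, 2]
    list(iterate('abc'))                       # ['abc'] -- string kept whole
    list(iterate('abc', include_string=True))  # ['a', 'b', 'c']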

View file

@ -0,0 +1,24 @@
.. _KitchenAPI:
===========
Kitchen API
===========
Kitchen is structured as a collection of modules. In its current
configuration, Kitchen ships with the following modules. Other addon modules
that may drag in more dependencies can be found on the `project webpage`_
.. toctree::
:maxdepth: 2
api-i18n
api-text
api-collections
api-iterutils
api-versioning
api-pycompat24
api-pycompat25
api-pycompat27
api-exceptions
.. _`project webpage`: https://fedorahosted.org/kitchen

View file

@ -0,0 +1,34 @@
========================
Python 2.4 Compatibility
========================
-------------------
Sets for python-2.3
-------------------
.. automodule:: kitchen.pycompat24.sets
.. autofunction:: kitchen.pycompat24.sets.add_builtin_set
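
A short sketch of the intended use (only needed on pythons without
a builtin :class:`set`)::

    from kitchen.pycompat24 import sets
    sets.add_builtin_set()

    # set and frozenset are now available as builtins, even on python-2.3
    s = set(['a', 'b'])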
----------------------------------
Partial new style base64 interface
----------------------------------
.. automodule:: kitchen.pycompat24.base64
:members:
----------
Subprocess
----------
.. seealso::
:mod:`kitchen.pycompat27.subprocess`
Kitchen includes the python-2.7 version of subprocess which has a new
function, :func:`~kitchen.pycompat27.subprocess.check_output`. When
you import :mod:`pycompat24.subprocess` you will be getting the
python-2.7 version of subprocess rather than the 2.4 version (where
subprocess first appeared). This choice was made so that we can
concentrate our efforts on keeping the single version of subprocess up
to date rather than working on a 2.4 version that very few people
would need specifically.

View file

@ -0,0 +1,8 @@
========================
Python 2.5 Compatibility
========================
.. automodule:: kitchen.pycompat25
.. automodule:: kitchen.pycompat25.collections.defaultdict
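
A brief sketch of using the compatibility class (the exact import path is an
assumption based on the module layout above)::

    from kitchen.pycompat25.collections import defaultdict

    counts = defaultdict(int)
    for word in ['spam', 'eggs', 'spam']:
        counts[word] += 1
    # counts == {'spam': 2, 'eggs': 1}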

View file

@ -0,0 +1,35 @@
========================
Python 2.7 Compatibility
========================
.. module:: kitchen.pycompat27.subprocess
--------------------------
Subprocess from Python 2.7
--------------------------
The :mod:`subprocess` module included here is a direct import from
python-2.7's |stdlib|_. You can access it via::
>>> from kitchen.pycompat27 import subprocess
The motivation for including this module is that various API changing
improvements have been made to subprocess over time. The following is a list
of the known changes to :mod:`subprocess` with the python version they were
introduced in:
==================================== ===
New API Feature Ver
==================================== ===
:exc:`subprocess.CalledProcessError` 2.5
:func:`subprocess.check_call` 2.5
:func:`subprocess.check_output` 2.7
:meth:`subprocess.Popen.send_signal` 2.6
:meth:`subprocess.Popen.terminate` 2.6
:meth:`subprocess.Popen.kill` 2.6
==================================== ===
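
For example, :func:`check_output` can be used this way even on pythons whose
stdlib predates it (a minimal sketch; the command is arbitrary)::

    from kitchen.pycompat27 import subprocess

    output = subprocess.check_output(['echo', 'hello'])
    # output is the byte str 'hello\n'; a non-zero exit status raises
    # subprocess.CalledProcessError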
.. seealso::
The stdlib :mod:`subprocess` documentation
For complete documentation on how to use subprocess

View file

@ -0,0 +1,405 @@
-----------------------
Kitchen.text.converters
-----------------------
.. automodule:: kitchen.text.converters
Byte Strings and Unicode in Python2
===================================
Python2 has two string types, :class:`str` and :class:`unicode`.
:class:`unicode` represents an abstract sequence of text characters. It can
hold any character that is present in the unicode standard. :class:`str` can
hold any byte of data. The operating system and python work together to
display these bytes as characters in many cases but you should always keep in
mind that the information is really a sequence of bytes, not a sequence of
characters. In python2 these types are interchangeable much of the time.
They are one of the few pairs of types that automatically convert when
used in equality::
>>> # string is converted to unicode and then compared
>>> "I am a string" == u"I am a string"
True
>>> # Other types, like int, don't have this special treatment
>>> 5 == "5"
False
However, this automatic conversion tends to lull people into a false sense of
security. As long as you're dealing with :term:`ASCII` characters the
automatic conversion will save you from seeing any differences. Once you
start using characters that are not in :term:`ASCII`, you will start getting
:exc:`UnicodeError` and :exc:`UnicodeWarning` as the automatic conversions
between the types fail::
>>> "I am an ñ" == u"I am an ñ"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Why do these conversions fail? The reason is that the python2
:class:`unicode` type represents an abstract sequence of unicode text known as
:term:`code points`. :class:`str`, on the other hand, really represents
a sequence of bytes. Those bytes are converted by your operating system to
appear as characters on your screen using a particular encoding (usually
with a default defined by the operating system and customizable by the
individual user.) Although :term:`ASCII` characters are fairly standard in
what bytes represent each character, the bytes outside of the :term:`ASCII`
range are not. In general, each encoding will map a different character to
a particular byte. Newer encodings map individual characters to multiple
bytes (which the older encodings will instead treat as multiple characters).
In the face of these differences, python refuses to guess at an encoding and
instead issues a warning or exception and refuses to convert.
.. seealso::
:ref:`overcoming-frustration`
For a longer introduction on this subject.
Strategy for Explicit Conversion
================================
So what is the best method of dealing with this weltering babble of incoherent
encodings? The basic strategy is to explicitly turn everything into
:class:`unicode` when it first enters your program. Then, when you send it to
output, you can transform the unicode back into bytes. Doing this allows you
to control the encodings that are used and avoid getting tracebacks due to
:exc:`UnicodeError`. Using the functions defined in this module, that looks
something like this:
.. code-block:: pycon
:linenos:
>>> from kitchen.text.converters import to_unicode, to_bytes
>>> name = raw_input('Enter your name: ')
Enter your name: Toshio くらとみ
>>> name
'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
>>> type(name)
<type 'str'>
>>> unicode_name = to_unicode(name)
>>> type(unicode_name)
<type 'unicode'>
>>> unicode_name
u'Toshio \u304f\u3089\u3068\u307f'
>>> # Do a lot of other things before needing to save/output again:
>>> output = open('datafile', 'w')
>>> output.write(to_bytes(u'Name: %s\n' % unicode_name))
A few notes:
Looking at line 6, you'll notice that the input we took from the user was
a byte :class:`str`. In general, anytime we're getting a value from outside
of python (The filesystem, reading data from the network, interacting with an
external command, reading values from the environment) we are interacting with
something that will want to give us a byte :class:`str`. Some |stdlib|_
modules and third party libraries will automatically attempt to convert a byte
:class:`str` to :class:`unicode` strings for you. This is both a boon and
a curse. If the library can guess correctly about the encoding that the data
is in, it will return :class:`unicode` objects to you without you having to
convert. However, if it can't guess correctly, you may end up with one of
several problems:
:exc:`UnicodeError`
The library attempted to decode a byte :class:`str` into
a :class:`unicode` string, failed, and raised an exception.
Garbled data
If the library returns the data after decoding it with the wrong encoding,
the characters you see in the :class:`unicode` string won't be the ones that
you expect.
A byte :class:`str` instead of :class:`unicode` string
Some libraries will return a :class:`unicode` string when they're able to
decode the data and a byte :class:`str` when they can't. This is
generally the hardest problem to debug when it occurs. Avoid it in your
own code and try to avoid or open bugs against upstreams that do this. See
:ref:`DesigningUnicodeAwareAPIs` for strategies to do this properly.
On line 8, we convert from a byte :class:`str` to a :class:`unicode` string.
:func:`~kitchen.text.converters.to_unicode` does this for us. It has some
error handling and sane defaults that make this a nicer function to use than
calling :meth:`str.decode` directly:
* Instead of defaulting to the :term:`ASCII` encoding which fails with all
but the simple American English characters, it defaults to :term:`UTF-8`.
* Instead of raising an error if it cannot decode a value, it will replace
the value with the unicode "Replacement character" symbol (``�``).
* If you happen to call this method with something that is not a :class:`str`
or :class:`unicode`, it will return an empty :class:`unicode` string.
All three of these can be overridden using different keyword arguments to the
function. See the :func:`to_unicode` documentation for more information.
On line 15 we push the data back out to a file. Two things you should note here:
1. We deal with the strings as :class:`unicode` until the last instant. The
string format that we're using is :class:`unicode` and the variable also
holds :class:`unicode`. People sometimes get into trouble when they mix
a byte :class:`str` format with a variable that holds a :class:`unicode`
string (or vice versa) at this stage.
2. :func:`~kitchen.text.converters.to_bytes`, does the reverse of
:func:`to_unicode`. In this case, we're using the default values which
turn :class:`unicode` into a byte :class:`str` using :term:`UTF-8`. Any
errors are replaced with a ``�`` and sending nonstring objects yields an
empty byte :class:`str`. Just like :func:`to_unicode`, you can look at
the documentation for :func:`to_bytes` to find out how to override any of
these defaults.
When to use an alternate strategy
---------------------------------
The default strategy of decoding to :class:`unicode` strings when you take
data in and encoding to a byte :class:`str` when you send the data back out
works great for most problems but there are a few times when you shouldn't:
* The values aren't meant to be read as text
* The values need to be byte-for-byte when you send them back out -- for
instance if they are database keys or filenames.
* You are transferring the data between several libraries that all expect
byte :class:`str`.
In each of these instances, there is a reason to keep around the byte
:class:`str` version of a value. Here's a few hints to keep your sanity in
these situations:
1. Keep your :class:`unicode` and :class:`str` values separate. Just like the
pain caused when you have to use someone else's library that returns both
:class:`unicode` and :class:`str` you can cause yourself pain if you have
functions that can return both types or variables that could hold either
type of value.
2. Name your variables so that you can tell whether you're storing byte
:class:`str` or :class:`unicode` string. One of the first things you end
up having to do when debugging is determine what type of string you have in
a variable and what type of string you are expecting. Naming your
variables consistently so that you can tell which type they are supposed to
hold will save you from at least one of those steps.
3. When you get values initially, make sure that you're dealing with the type
of value that you expect as you save it. You can use :func:`isinstance`
or :func:`to_bytes` since :func:`to_bytes` doesn't do any modifications of
the string if it's already a :class:`str`. When using :func:`to_bytes`
for this purpose you might want to use::
try:
b_input = to_bytes(input_should_be_bytes_already, errors='strict', nonstring='strict')
except:
handle_errors_somehow()
The reason is that the default of :func:`to_bytes` will take characters
that are illegal in the chosen encoding and transform them to replacement
characters. Since the point of keeping this data as a byte :class:`str` is
to keep the exact same bytes when you send it outside of your code,
changing things to replacement characters should raise red flags that
something is wrong. Setting :attr:`errors` to ``strict`` will raise an
exception which gives you an opportunity to fail gracefully.
4. Sometimes you will want to print out the values that you have in your byte
:class:`str`. When you do this you will need to make sure that you
transform :class:`unicode` to :class:`str` before combining them. Also be
sure that any other function calls (including :mod:`gettext`) are going to
give you strings that are the same type. For instance::
print to_bytes(_('Username: %(user)s'), 'utf-8') % {'user': b_username}
Gotchas and how to avoid them
=============================
Even when you have a good conceptual understanding of how python2 treats
:class:`unicode` and :class:`str` there are still some things that can
surprise you. In most cases this is because, as noted earlier, python or one
of the python libraries you depend on is trying to convert a value
automatically and failing. Explicit conversion at the appropriate place
usually solves that.
str(obj)
--------
One common idiom for getting a simple string representation of an object is to use::
str(obj)
Unfortunately, this is not safe. Sometimes str(obj) will return
:class:`unicode`. Sometimes it will return a byte :class:`str`. Sometimes,
it will attempt to convert from a :class:`unicode` string to a byte
:class:`str`, fail, and throw a :exc:`UnicodeError`. To be safe from all of
these, first decide whether you need :class:`unicode` or :class:`str` to be
returned. Then use :func:`to_unicode` or :func:`to_bytes` to get the simple
representation like this::
u_representation = to_unicode(obj, nonstring='simplerepr')
b_representation = to_bytes(obj, nonstring='simplerepr')
print
-----
python has a builtin :func:`print` statement that outputs strings to the
terminal. This originated in a time when python only dealt with byte
:class:`str`. When :class:`unicode` strings came about, some enhancements
were made to the :func:`print` statement so that it could print those as well.
The enhancements make :func:`print` work most of the time. However, the times
when it doesn't work tend to make for cryptic debugging.
The basic issue is that :func:`print` has to figure out what encoding to use
when it prints a :class:`unicode` string to the terminal. When python is
attached to your terminal (i.e., you're running the interpreter or running
a script that prints to the screen) python is able to take the encoding value
from your locale settings :envvar:`LC_ALL` or :envvar:`LC_CTYPE` and print the
characters allowed by that encoding. On most modern Unix systems, the
encoding is :term:`utf-8` which means that you can print any :class:`unicode`
character without problem.
There are two common cases of things going wrong:
1. Someone has a locale set that does not accept all valid unicode characters.
For instance::
$ LC_ALL=C python
>>> print u'\ufffd'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
This often happens when a script that you've written and debugged from the
terminal is run from an automated environment like :program:`cron`. It
also occurs when you have written a script using a :term:`utf-8` aware
locale and released it for consumption by people all over the internet.
Inevitably, someone is running with a locale that can't handle all unicode
characters and you get a traceback reported.
2. You redirect output to a file. Python isn't using the values in
:envvar:`LC_ALL` unconditionally to decide what encoding to use. Instead
it is using the encoding set for the terminal you are printing to which is
set to accept different encodings by :envvar:`LC_ALL`. If you redirect
to a file, you are no longer printing to the terminal so :envvar:`LC_ALL`
won't have any effect. At this point, python will decide it can't find an
encoding and fallback to :term:`ASCII` which will likely lead to
:exc:`UnicodeError` being raised. You can see this in a short script::
#! /usr/bin/python -tt
print u'\ufffd'
And then look at the difference between running it normally and redirecting to a file:
.. code-block:: console
$ ./test.py
�
$ ./test.py > t
Traceback (most recent call last):
File "test.py", line 3, in <module>
print u'\ufffd'
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
The short answer to dealing with this is to always use bytes when writing
output. You can do this by explicitly converting to bytes like this::
from kitchen.text.converters import to_bytes
u_string = u'\ufffd'
print to_bytes(u_string)
or you can wrap stdout and stderr with a :class:`~codecs.StreamWriter`.
A :class:`~codecs.StreamWriter` is convenient in that you can assign it to
encode for :data:`sys.stdout` or :data:`sys.stderr` and then have output
automatically converted but it has the drawback of still being able to throw
:exc:`UnicodeError` if the writer can't encode all possible unicode
codepoints. Kitchen provides an alternate version which can be retrieved with
:func:`kitchen.text.converters.getwriter` which will not traceback in its
standard configuration.
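
A short sketch of wrapping :data:`sys.stdout` this way::

    import sys
    from kitchen.text.converters import getwriter

    UTF8Writer = getwriter('utf-8')
    sys.stdout = UTF8Writer(sys.stdout)
    print u'\ufffd'   # encoded to utf-8 bytes instead of raising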
.. _unicode-and-dict-keys:
Unicode, str, and dict keys
---------------------------
The :func:`hash` of the :term:`ASCII` characters is the same for
:class:`unicode` and byte :class:`str`. When you use them in :class:`dict`
keys, they evaluate to the same dictionary slot::
>>> u_string = u'a'
>>> b_string = 'a'
>>> hash(u_string), hash(b_string)
(12416037344, 12416037344)
>>> d = {}
>>> d[u_string] = 'unicode'
>>> d[b_string] = 'bytes'
>>> d
{u'a': 'bytes'}
When you deal with key values outside of :term:`ASCII`, :class:`unicode` and
byte :class:`str` evaluate unequally no matter what their character content or
hash value::
>>> u_string = u'ñ'
>>> b_string = u_string.encode('utf-8')
>>> print u_string
ñ
>>> print b_string
ñ
>>> d = {}
>>> d[u_string] = 'unicode'
>>> d[b_string] = 'bytes'
>>> d
{u'\xf1': 'unicode', '\xc3\xb1': 'bytes'}
>>> b_string2 = '\xf1'
>>> hash(u_string), hash(b_string2)
(30848092528, 30848092528)
>>> d = {}
>>> d[u_string] = 'unicode'
>>> d[b_string2] = 'bytes'
>>> d
{u'\xf1': 'unicode', '\xf1': 'bytes'}
How do you work with this one? Remember rule #1: Keep your :class:`unicode`
and byte :class:`str` values separate. That goes for keys in a dictionary
just like anything else.
* For any given dictionary, make sure that all your keys are either
:class:`unicode` or :class:`str`. **Do not mix the two.** If you're being
given both :class:`unicode` and :class:`str` but you don't need to preserve
separate keys for each, I recommend using :func:`to_unicode` or
:func:`to_bytes` to convert all keys to one type or the other like this::
>>> from kitchen.text.converters import to_unicode
>>> u_string = u'one'
>>> b_string = 'two'
>>> d = {}
>>> d[to_unicode(u_string)] = 1
>>> d[to_unicode(b_string)] = 2
>>> d
{u'two': 2, u'one': 1}
* These issues also apply to using dicts with tuple keys that contain
a mixture of :class:`unicode` and :class:`str`. Once again the best fix
is to standardise on either :class:`str` or :class:`unicode`.
* If you absolutely need to store values in a dictionary where the keys could
be either :class:`unicode` or :class:`str` you can use
:class:`~kitchen.collections.strictdict.StrictDict` which has separate
entries for all :class:`unicode` and byte :class:`str` and deals correctly
with any :class:`tuple` containing mixed :class:`unicode` and byte
:class:`str`; see the sketch just below this list.
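
A minimal sketch of :class:`~kitchen.collections.strictdict.StrictDict`
keeping the two key types from colliding (assuming the class is exported
from :mod:`kitchen.collections`)::

    from kitchen.collections import StrictDict

    d = StrictDict()
    d[u'a'] = 'unicode key'
    d['a'] = 'byte str key'
    len(d)   # 2 -- unlike a plain dict, the keys stay separate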
---------
Functions
---------
Unicode and byte str conversion
===============================
.. autofunction:: kitchen.text.converters.to_unicode
.. autofunction:: kitchen.text.converters.to_bytes
.. autofunction:: kitchen.text.converters.getwriter
.. autofunction:: kitchen.text.converters.to_str
.. autofunction:: kitchen.text.converters.to_utf8
Transformation to XML
=====================
.. autofunction:: kitchen.text.converters.unicode_to_xml
.. autofunction:: kitchen.text.converters.xml_to_unicode
.. autofunction:: kitchen.text.converters.byte_string_to_xml
.. autofunction:: kitchen.text.converters.xml_to_byte_string
.. autofunction:: kitchen.text.converters.bytes_to_xml
.. autofunction:: kitchen.text.converters.xml_to_bytes
.. autofunction:: kitchen.text.converters.guess_encoding_to_xml
.. autofunction:: kitchen.text.converters.to_xml
Working with exception messages
===============================
.. autodata:: kitchen.text.converters.EXCEPTION_CONVERTERS
.. autodata:: kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS
.. autofunction:: kitchen.text.converters.exception_to_unicode
.. autofunction:: kitchen.text.converters.exception_to_bytes

View file

@ -0,0 +1,33 @@
.. automodule:: kitchen.text.display
.. autofunction:: kitchen.text.display.textual_width
.. autofunction:: kitchen.text.display.textual_width_chop
.. autofunction:: kitchen.text.display.textual_width_fill
.. autofunction:: kitchen.text.display.wrap
.. autofunction:: kitchen.text.display.fill
.. autofunction:: kitchen.text.display.byte_string_textual_width_fill
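
A small sketch of the width functions (the sample string is an assumption;
fullwidth CJK characters occupy two columns)::

    from kitchen.text.display import textual_width, textual_width_fill

    textual_width(u'abc')        # 3
    textual_width(u'くらとみ')   # 8 -- two columns per character

    # Pad to 10 columns; by default the padding goes on the right
    textual_width_fill(u'くらとみ', 10)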
Internal Data
=============
There are a few internal functions and variables in this module. Code outside
of kitchen shouldn't use them but people coding on kitchen itself may find
them useful.
.. autodata:: kitchen.text.display._COMBINING
.. autofunction:: kitchen.text.display._generate_combining_table
.. autofunction:: kitchen.text.display._print_combining_table
.. autofunction:: kitchen.text.display._interval_bisearch
.. autofunction:: kitchen.text.display._ucp_width
.. autofunction:: kitchen.text.display._textual_width_le

View file

@ -0,0 +1,2 @@
.. automodule:: kitchen.text.misc
:members:

View file

@ -0,0 +1,3 @@
.. automodule:: kitchen.text.utf8
:members:
:deprecated:

View file

@ -0,0 +1,22 @@
=============================================
Kitchen.text: unicode and utf8 and xml oh my!
=============================================
The kitchen.text module contains functions that deal with text manipulation.
.. toctree::
api-text-converters
api-text-display
api-text-misc
api-text-utf8
:mod:`~kitchen.text.converters`
deals with converting text for different encodings and to and from XML
:mod:`~kitchen.text.display`
deals with issues with printing text to a screen
:mod:`~kitchen.text.misc`
is a catchall for text manipulation functions that don't seem to fit
elsewhere
:mod:`~kitchen.text.utf8`
contains deprecated functions to manipulate utf8 byte strings

View file

@ -0,0 +1,6 @@
===============================
Helpers for versioning software
===============================
.. automodule:: kitchen.versioning
:members:
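
A brief sketch of the tuple-of-tuples convention (the version numbers here
are only an example)::

    from kitchen.versioning import version_tuple_to_string

    __version_info__ = ((1, 2, 4),)
    __version__ = version_tuple_to_string(__version_info__)   # '1.2.4'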

220
kitchen3/docs/conf.py Normal file
View file

@ -0,0 +1,220 @@
# -*- coding: utf-8 -*-
#
# Kitchen documentation build configuration file, created by
# sphinx-quickstart on Sat May 22 00:51:26 2010.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
import kitchen.release
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.append(os.path.abspath('.'))
# -- General configuration -----------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.pngmath', 'sphinx.ext.ifconfig']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = kitchen.release.NAME
copyright = kitchen.release.COPYRIGHT
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.2'
# The full version, including alpha/beta/rc tags.
release = kitchen.__version__
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
language = 'en'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of documents that shouldn't be included in the build.
#unused_docs = []
# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = []
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
show_authors = True
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
highlight_language = 'python'
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Content template for the index page.
html_index = 'index.html'
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_use_modindex = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
html_use_opensearch = kitchen.release.DOWNLOAD_URL + 'docs/'
# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''
# Output file base name for HTML help builder.
htmlhelp_basename = 'kitchendoc'
# -- Options for LaTeX output --------------------------------------------------
# The paper size ('letter' or 'a4').
#latex_paper_size = 'letter'
# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
('index', 'kitchen.tex', 'kitchen Documentation',
'Toshio Kuratomi', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# Additional stuff for the LaTeX preamble.
#latex_preamble = ''
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_use_modindex = True
automodule_skip_lines = 4
autoclass_content = "class"
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'http://docs.python.org/': None,
'https://fedorahosted.org/releases/p/y/python-fedora/doc/': None,
'https://fedorahosted.org/releases/p/a/packagedb/doc/': None}
rst_epilog = '''
.. |projpage| replace:: project webpage
.. _projpage: %(url)s
.. |docpage| replace:: documentation page
.. _docpage: %(download)s/docs
.. |downldpage| replace:: download page
.. _downldpage: %(download)s
.. |stdlib| replace:: python standard library
.. _stdlib: http://docs.python.org/library
''' % {'url': kitchen.release.URL, 'download': kitchen.release.DOWNLOAD_URL}

View file

@ -0,0 +1,690 @@
.. _DesigningUnicodeAwareAPIs:
============================
Designing Unicode Aware APIs
============================
APIs that deal with byte :class:`str` and :class:`unicode` strings are
difficult to get right. Here are a few strategies with pros and cons of each.
.. contents::
-------------------------------------------------
Take either bytes or unicode, output only unicode
-------------------------------------------------
In this strategy, you allow the user to enter either :class:`unicode` strings
or byte :class:`str` but what you give back is always :class:`unicode`. This
strategy is easy for novice end users to start using immediately as they will
be able to feed either type of string into the function and get back a string
that they can use in other places.
However, it does lead to the novice writing code that functions correctly when
testing it with :term:`ASCII`-only data but fails when given data that contains
non-:term:`ASCII` characters. Worse, if your API is not designed to be
flexible, the consumer of your code won't be able to easily correct those
problems once they find them.
Here's a good API that uses this strategy::
from kitchen.text.converters import to_unicode
def truncate(msg, max_length, encoding='utf8', errors='replace'):
msg = to_unicode(msg, encoding, errors)
return msg[:max_length]
The call to :func:`truncate` starts with the essential parameters for
performing the task. It ends with two optional keyword arguments that define
the encoding to use to transform from a byte :class:`str` to :class:`unicode`
and the strategy to use if undecodable bytes are encountered. The defaults
may vary depending on the use cases you have in mind. When the output is
generally going to be printed for the user to see, ``errors='replace'`` is
a good default. If you are constructing keys to a database, raising an
exception (with ``errors='strict'``) may be a better default. In either case,
having both parameters allows the person using your API to choose how they
want to handle any problems. Having the values is also a clue to them that
a conversion from byte :class:`str` to :class:`unicode` string is going to
occur.
.. note::
If you're targeting python-3.1 and above, ``errors='surrogateescape'`` may
be a better default than ``errors='strict'``. You need to be mindful of
a few things when using ``surrogateescape`` though:
* ``surrogateescape`` will cause issues if a non-:term:`ASCII` compatible
encoding is used (for instance, UTF-16 and UTF-32.) That makes it
unhelpful in situations where a true general purpose method of encoding
must be found. :pep:`383` mentions that ``surrogateescape`` was
specifically designed with the limitations of translating using system
locales (where :term:`ASCII` compatibility is generally seen as
inescapable) so you should keep that in mind.
* If you use ``surrogateescape`` to decode from :class:`bytes`
to :class:`unicode` you will need to use an error handler other than
``strict`` to encode as the lone surrogate that this error handler
creates makes for invalid unicode that must be handled when encoding.
In Python-3.1.2 or earlier, a bug in the encoder error handlers means that
you can only use ``surrogateescape`` to encode; anything else will throw
an error.
Evaluate your usages of the variables in question to see what makes sense.
Here's a bad example of using this strategy::
from kitchen.text.converters import to_unicode
def truncate(msg, max_length):
msg = to_unicode(msg)
return msg[:max_length]
In this example, we don't have the optional keyword arguments for
:attr:`encoding` and :attr:`errors`. A user who uses this function is more
likely to miss the fact that a conversion from byte :class:`str` to
:class:`unicode` is going to occur. And once an error is reported, they will
have to look through their backtrace and think harder about where they want to
transform their data into :class:`unicode` strings instead of having the
opportunity to control how the conversion takes place in the function itself.
Note that the user does have the ability to make this work by making the
transformation to unicode themselves::
from kitchen.text.converters import to_unicode
msg = to_unicode(msg, encoding='euc_jp', errors='ignore')
new_msg = truncate(msg, 5)
--------------------------------------------------
Take either bytes or unicode, output the same type
--------------------------------------------------
This strategy is sometimes called polymorphic because the type of data that is
returned is dependent on the type of data that is received. The concept is
that when you are given a byte :class:`str` to process, you return a byte
:class:`str` in your output. When you are given :class:`unicode` strings to
process, you return :class:`unicode` strings in your output.
This can work well for end users as the ones that know about the difference
between the two string types will already have transformed the strings to
their desired type before giving it to this function. The ones that don't can
remain blissfully ignorant (at least, as far as your function is concerned) as
the function does not change the type.
In cases where the encoding of the byte :class:`str` is known or can be
discovered based on the input data this works well. If you can't figure out
the input encoding, however, this strategy can fail in any of the following
cases:
1. It needs to do an internal conversion between byte :class:`str` and
:class:`unicode` string.
2. It cannot return the same data as either a :class:`unicode` string or byte
:class:`str`.
3. You may need to deal with byte strings that are not byte-compatible with
:term:`ASCII`.
First, a couple examples of using this strategy in a good way::
def translate(msg, table):
replacements = table.keys()
new_msg = []
for index, char in enumerate(msg):
if char in replacements:
new_msg.append(table[char])
else:
new_msg.append(char)
return ''.join(new_msg)
In this example, all of the strings that we use (except the empty string which
is okay because it doesn't have any characters to encode) come from outside of
the function. Due to that, the user is responsible for making sure that the
:attr:`msg`, and the keys and values in :attr:`table` all match in terms of
type (:class:`unicode` vs :class:`str`) and encoding (you can do some error
checking to make sure the user gave everything in the same type but you
can't verify that the user used a consistent encoding). You do not need to make
changes to the string that require you to know the encoding or type of the
string; everything is a simple replacement of one element in the array of
characters in message with the character in table.
::
import json
from kitchen.text.converters import to_unicode, to_bytes
def first_field_from_json_data(json_string):
'''Return the first field in a json data structure.
The format of the json data is a simple list of strings.
'["one", "two", "three"]'
'''
if isinstance(json_string, unicode):
# On all python versions, json.loads() returns unicode if given
# a unicode string
return json.loads(json_string)[0]
# Byte str: figure out which encoding we're dealing with
if '\x00' not in json_string[:2]:
encoding = 'utf8'
elif '\x00\x00\x00' == json_string[:3]:
encoding = 'utf-32-be'
elif '\x00\x00\x00' == json_string[1:4]:
encoding = 'utf-32-le'
elif '\x00' == json_string[0] and '\x00' == json_string[2]:
encoding = 'utf-16-be'
else:
encoding = 'utf-16-le'
data = json.loads(unicode(json_string, encoding))
return data[0].encode(encoding)
In this example the function takes either a byte :class:`str` type or
a :class:`unicode` string that has a list in json format and returns the first
field from it as the type of the input string. The first section of code is
very straightforward; we receive a :class:`unicode` string, parse it with
a function, and then return the first field from our parsed data (which our
function returned to us as json data).
The second portion that deals with byte :class:`str` is not so
straightforward. Before we can parse the string we have to determine what
characters the bytes in the string map to. If we didn't do that, we wouldn't
be able to properly find which characters are present in the string. In order
to do that we have to figure out the encoding of the byte :class:`str`.
Luckily, the json specification states that all strings are unicode and
encoded with one of UTF-32BE, UTF-32LE, UTF-16BE, UTF-16LE, or :term:`UTF-8`. It further
defines the format such that the first two characters are always
:term:`ASCII`. Each of these has a different sequence of NULLs when they
encode an :term:`ASCII` character. We can use that to detect which encoding
was used to create the byte :class:`str`.
Finally, we return the byte :class:`str` by encoding the :class:`unicode` back
to a byte :class:`str`.
As you can see, in this example we have to convert from byte :class:`str` to
:class:`unicode` and back. But we know from the json specification that byte
:class:`str` has to be one of a limited number of encodings that we are able
to detect. That ability makes this strategy work.
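A quick check of the polymorphic behaviour (a sketch; the reprs shown
assume python2)::

    >>> first_field_from_json_data(u'["one", "two", "three"]')
    u'one'
    >>> first_field_from_json_data(u'["one", "two"]'.encode('utf-16-le'))
    'o\x00n\x00e\x00'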
Now for some examples of using this strategy in ways that fail::
import unicodedata
def first_char(msg):
'''Return the first character in a string'''
if not isinstance(msg, unicode):
try:
msg = unicode(msg, 'utf8')
except UnicodeError:
msg = unicode(msg, 'latin1')
msg = unicodedata.normalize('NFC', msg)
return msg[0]
If you look at that code and think that there's something fragile and prone to
breaking in the ``try: except:`` block you are correct in being suspicious.
This code will fail on multi-byte character sets that aren't :term:`UTF-8`. It
can also fail on data where the sequence of bytes is valid :term:`UTF-8` but
the bytes are actually of a different encoding. The reason this code fails
is that we don't know what encoding the bytes are in and the code must convert
from a byte :class:`str` to a :class:`unicode` string in order to function.
In order to make this code robust we must know the encoding of :attr:`msg`.
The only way to know that is to ask the user so the API must do that::
import unicodedata
def number_of_chars(msg, encoding='utf8', errors='strict'):
if not isinstance(msg, unicode):
msg = unicode(msg, encoding, errors)
msg = unicodedata.normalize('NFC', msg)
return len(msg)
Another example of failure::
import os
def listdir(directory):
files = os.listdir(directory)
if isinstance(directory, str):
return files
# files could contain both bytes and unicode
new_files = []
for filename in files:
if not isinstance(filename, unicode):
# What to do here?
continue
new_files.append(filename)
return new_files
This function illustrates the second failure mode. Here, not all of the
possible values can be represented as :class:`unicode` without knowing more
about the encoding of each of the filenames involved. Since each filename
could have a different encoding there are a few different options to pursue. We
could make this function always return byte :class:`str` since that can
accurately represent anything that could be returned. If we want to return
:class:`unicode` we need to at least allow the user to specify what to do in
case of an error decoding the bytes to :class:`unicode`. We can also let the
user specify the encoding to use for doing the decoding but that won't help in
all cases since not all files will be in the same encoding (or even
necessarily in any encoding)::
import locale
import os
def listdir(directory, encoding=locale.getpreferredencoding(), errors='strict'):
# Note: In python-3.1+, surrogateescape may be a better default
files = os.listdir(directory)
if isinstance(directory, str):
return files
new_files = []
for filename in files:
if not isinstance(filename, unicode):
filename = unicode(filename, encoding=encoding, errors=errors)
new_files.append(filename)
return new_files
Note that although we use :attr:`errors` in this example as what to pass to
the codec that decodes to :class:`unicode` we could also have an
:attr:`errors` argument that decides other things to do like skip a filename
entirely, return a placeholder (``Nondisplayable filename``), or raise an
exception.
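For illustration, here's a minimal sketch of that idea. The ``skip`` and
``placeholder`` policy names are hypothetical, not an established kitchen
API::

    import locale
    import os

    def listdir_u(directory, encoding=locale.getpreferredencoding(),
            errors='strict'):
        '''Hypothetical sketch: errors here is a policy, not a codec handler'''
        new_files = []
        for filename in os.listdir(directory):
            if not isinstance(filename, unicode):
                try:
                    filename = unicode(filename, encoding)
                except UnicodeDecodeError:
                    if errors == 'skip':
                        # Silently omit the undecodable filename
                        continue
                    elif errors == 'placeholder':
                        # Substitute a marker for the undecodable filename
                        filename = u'Nondisplayable filename'
                    else:
                        raise
            new_files.append(filename)
        return new_files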
This leaves us with one last failure to describe::
def first_field(csv_string):
'''Return the first field in a comma separated values string.'''
try:
return csv_string[:csv_string.index(',')]
except ValueError:
return csv_string
This code looks simple enough. The hidden error here is that we are searching
for a comma character in a byte :class:`str` but not all encodings will use
the same sequence of bytes to represent the comma. If you use an encoding
that's not :term:`ASCII` compatible on the byte level, then the literal comma
``','`` in the above code will match inappropriate bytes. Some examples of
how it can fail:
* Will find the byte representing an :term:`ASCII` comma in another character
* Will find the comma but leave trailing garbage bytes on the end of the
string
* Will not match the character that represents the comma in this encoding
There are two ways to solve this. You can either take the encoding value from
the user or you can take the separator value from the user. Of the two,
taking the encoding is the better option for two reasons:
1. Taking a separator argument doesn't clearly document for the API user that
the reason they must give it is to properly match the encoding of the
:attr:`csv_string`. They're just as likely to think that it's simply a way
to specify an alternate character (like ":" or "|") for the separator.
2. It's possible for a variable width encoding to reuse the same byte sequence
for different characters in multiple sequences.
.. note::
:term:`UTF-8` is resistant to this as any character's sequence of
bytes will never be a subset of another character's sequence of bytes.
With that in mind, here's how to improve the API::
def first_field(csv_string, encoding='utf-8', errors='replace'):
if not isinstance(csv_string, unicode):
u_string = unicode(csv_string, encoding, errors)
is_unicode = False
else:
u_string = csv_string
is_unicode = True
try:
field = u_string[:u_string.index(u',')]
except ValueError:
return csv_string
if not is_unicode:
field = field.encode(encoding, errors)
return field
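A quick sanity check of the improved API (a sketch; the reprs assume
python2 and use escapes for the non-:term:`ASCII` characters in ``café``
and ``crème``)::

    >>> first_field(u'caf\xe9,cr\xe8me')
    u'caf\xe9'
    >>> first_field('caf\xc3\xa9,cr\xc3\xa8me')
    'caf\xc3\xa9'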
.. note::
If you decide you'll never encounter a variable width encoding that reuses
byte sequences you can use this code instead::
def first_field(csv_string, encoding='utf-8'):
try:
return csv_string[:csv_string.index(','.encode(encoding))]
except ValueError:
return csv_string
------------------
Separate functions
------------------
Sometimes you want to be able to take either byte :class:`str` or
:class:`unicode` strings, perform similar operations on either one and then
return data in the same format as was given. Probably the easiest way to do
that is to have separate functions for each and adopt a naming convention to
show that one is for working with byte :class:`str` and the other is for
working with :class:`unicode` strings::
def translate_b(msg, table):
'''Replace values in str with other byte values like unicode.translate'''
if not isinstance(msg, str):
raise TypeError('msg must be of type str')
str_table = [chr(s) for s in xrange(0,256)]
delete_chars = []
for chr_val in (k for k in table.keys() if isinstance(k, int)):
if chr_val > 255:
raise ValueError('Keys in table must not exceed 255')
if table[chr_val] is None:
delete_chars.append(chr(chr_val))
elif isinstance(table[chr_val], int):
if table[chr_val] > 255:
raise TypeError('table values cannot be more than 255 or less than 0')
str_table[chr_val] = chr(table[chr_val])
else:
if not isinstance(table[chr_val], str):
raise TypeError('character mapping must return integer, None or str')
str_table[chr_val] = table[chr_val]
str_table = ''.join(str_table)
delete_chars = ''.join(delete_chars)
return msg.translate(str_table, delete_chars)
def translate(msg, table):
'''Replace values in a unicode string with other values'''
if not isinstance(msg, unicode):
raise TypeError('msg must be of type unicode')
return msg.translate(table)
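Usage of the two functions looks the same apart from the string types
involved (a quick python2 sketch)::

    >>> translate_b('abc', {ord('a'): ord('z')})
    'zbc'
    >>> translate(u'abc', {ord(u'a'): u'z'})
    u'zbc'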
There are several things that we have to do in this API:
* Because the function names might not be enough of a clue for the user
about which string types are expected, we have to check that the
types are correct.
* We keep the behaviour of the two functions as close to the same as possible,
just with byte :class:`str` and :class:`unicode` strings substituted for
each other.
-----------------------------------------------------------------
Deciding whether to take str or unicode when no value is returned
-----------------------------------------------------------------
Not all functions have a return value. Sometimes a function is there to
interact with something external to python, for instance, writing a file out
to disk or a method exists to update the internal state of a data structure.
One of the main questions with these APIs is whether to take byte
:class:`str`, :class:`unicode` string, or both. The answer depends on your
use case but I'll give some examples here.
Writing to external data
========================
When your information is going to an external data source like writing to
a file you need to decide whether to take in :class:`unicode` strings or byte
:class:`str`. Remember that most external data sources are not going to be
dealing with unicode directly. Instead, they're going to be dealing with
a sequence of bytes that may be interpreted as unicode. With that in mind,
you either need to have the user give you a byte :class:`str` or convert to
a byte :class:`str` inside the function.
Next you need to think about the type of data that you're receiving. If it's
textual data, (for instance, this is a chat client and the user is typing
messages that they expect to be read by another person) it probably makes sense to
take in :class:`unicode` strings and do the conversion inside your function.
On the other hand, if this is a lower level function that's passing data into
a network socket, it probably should be taking byte :class:`str` instead.
Just as noted in the API notes above, you should specify an :attr:`encoding`
and :attr:`errors` argument if you need to transform from :class:`unicode`
string to byte :class:`str` and you are unable to guess the encoding from the
data itself.
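As a minimal sketch of that advice, a function writing textual data might
look like this (``write_message`` is a hypothetical name)::

    from kitchen.text.converters import to_bytes

    def write_message(stream, msg, encoding='utf8', errors='replace'):
        # Textual data is converted to bytes at the I/O boundary
        stream.write(to_bytes(msg, encoding=encoding, errors=errors))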
Updating data structures
========================
Sometimes your API is just going to update a data structure and not
immediately output that data anywhere. Just as when writing external data,
you should think about both what your function is going to do with the data
eventually and what the caller of your function is thinking that they're
giving you. Most of the time, you'll want to take :class:`unicode` strings
and enter them into the data structure as :class:`unicode` when the data is
textual in nature. You'll want to take byte :class:`str` and enter them into
the data structure as byte :class:`str` when the data is not text. Use
a naming convention so the user knows what's expected.
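A small sketch of such a naming convention (the class and method names are
hypothetical)::

    class MessageStore(object):
        def __init__(self):
            self._entries = []

        def add_text(self, msg):
            # Textual data: must be a unicode string
            if not isinstance(msg, unicode):
                raise TypeError('msg must be of type unicode')
            self._entries.append(msg)

        def add_bytes(self, msg):
            # Non-textual data: must be a byte str
            if not isinstance(msg, str):
                raise TypeError('msg must be of type str')
            self._entries.append(msg)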
-------------
APIs to Avoid
-------------
There are a few APIs that are just wrong. If you catch yourself making an API
that does one of these things, change it before anyone sees your code.
Returning unicode unless a conversion fails
===========================================
This type of API usually deals with byte :class:`str` at some point and
converts it to :class:`unicode` because it's usually thought to be text.
However, there are times when the bytes fail to convert to a :class:`unicode`
string. When that happens, this API returns the raw byte :class:`str` instead
of a :class:`unicode` string. One example of this is present in the |stdlib|_:
python2's :func:`os.listdir`::
>>> import os
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> os.mkdir('/tmp/mine')
>>> os.chdir('/tmp/mine')
>>> open('nonsense_char_\xff', 'w').close()
>>> open('all_ascii', 'w').close()
>>> os.listdir(u'.')
[u'all_ascii', 'nonsense_char_\xff']
The problem with APIs like this is that they cause failures that are hard to
debug because they don't happen where the variables are set. For instance,
let's say you take the filenames from :func:`os.listdir` and give it to this
function::
def normalize_filename(filename):
'''Change spaces and dashes into underscores'''
return filename.translate({ord(u' '): u'_', ord(u'-'): u'_'})
When you test this, you use filenames that all are decodable in your preferred
encoding and everything seems to work. But when this code is run on a machine
that has filenames in multiple encodings the filenames returned by
:func:`os.listdir` suddenly include byte :class:`str`. And byte :class:`str`
has a different :func:`string.translate` function that takes different values.
So the code raises an exception where it's not immediately obvious that
:func:`os.listdir` is at fault.
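A more defensive version of the function decodes at the boundary so that
:func:`unicode.translate` is always the method being called (a sketch using
kitchen's converters)::

    from kitchen.text.converters import to_unicode

    def normalize_filename(filename, encoding='utf8', errors='replace'):
        '''Change spaces and dashes into underscores'''
        # Decode first so we always operate on a unicode string
        filename = to_unicode(filename, encoding, errors)
        return filename.translate({ord(u' '): u'_', ord(u'-'): u'_'})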
Ignoring values with no chance of recovery
==========================================
An early version of python3 attempted to fix the :func:`os.listdir` problem
pointed out in the last section by returning all values that were decodable to
:class:`unicode` and omitting the filenames that were not. This led to the
following output::
>>> import os
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> os.mkdir('/tmp/mine')
>>> os.chdir('/tmp/mine')
>>> open(b'nonsense_char_\xff', 'w').close()
>>> open('all_ascii', 'w').close()
>>> os.listdir('.')
['all_ascii']
The issue with this type of code is that it is silently doing something
surprising. The caller expects to get a full list of files back from
:func:`os.listdir`. Instead, it silently ignores some of the files, returning
only a subset. This leads to code that doesn't do what is expected, a problem
that may go unnoticed until the code is in production and someone notices that
something important is being missed.
Raising a UnicodeException with no chance of recovery
=====================================================
Believe it or not, a few libraries exist that make it impossible to deal
with unicode text without raising a :exc:`UnicodeError`. What seems to occur
in these libraries is that the library has functions that expect to receive
a :class:`unicode` string. However, internally, those functions call other
functions that expect to receive a byte :class:`str`. The programmer of the
API was smart enough to convert from a :class:`unicode` string to a byte
:class:`str` but they did not give the user the chance to specify the
encodings to use or how to deal with errors. This results in exceptions when
the user passes in a byte :class:`str` because the initial function wants
a :class:`unicode` string and exceptions when the user passes in
a :class:`unicode` string because the function can't convert the string to
bytes in the encoding that it has selected.
Do not put the user in the position of not being able to use your API without
raising a :exc:`UnicodeError` with certain values. If you can only safely
take :class:`unicode` strings, document that byte :class:`str` is not allowed
and vice versa. If you have to convert internally, make sure to give the
caller of your function parameters to control the encoding and how to treat
errors that may occur during the encoding/decoding process. If your code will
raise a :exc:`UnicodeError` with non-:term:`ASCII` values no matter what, you
should probably rethink your API.
-----------------
Knowing your data
-----------------
If you've read all the way down to this section without skipping you've seen
several admonitions about the type of data you are processing affecting the
viability of the various API choices.
Here are a few things to consider about your data:
Do you need to operate on both bytes and unicode?
=================================================
Much of the data in libraries, programs, and the general environment outside
of python is written where strings are sequences of bytes. So when we
interact with data that comes from outside of python or data that is about to
leave python it may make sense to only operate on the data as a byte
:class:`str`. There are two times when this may make sense:
1. The user is intended to hand the data to the function and then the function
takes care of sending the data outside of python (to the filesystem, over
the network, etc).
2. The data is not representable as text. For instance, writing a binary
file format.
Even when your code is operating in this area you still need to think a little
more about your data. For instance, it might make sense for the person using
your API to pass in :class:`unicode` strings and let the function convert that
into the byte :class:`str` that it then sends over the wire.
There are also times when it might make sense to operate only on
:class:`unicode` strings. :class:`unicode` represents text so anytime that
you are working on textual data that isn't going to leave python it has the
potential to be a :class:`unicode`-only API. However, there are two things that
you should consider when designing a :class:`unicode`-only API:
1. As your API gains popularity, people are going to use your API in places
that you may not have thought of. Corner cases in these other places may
mean that processing bytes is desirable.
2. In python2, byte :class:`str` and :class:`unicode` are often used
interchangeably with each other. That means that people programming against
your API may have received :class:`str` from some other API and it would be
most convenient for their code if your API accepted it.
.. note::
In python3, the separation between the text type and the byte type
is clearer. So in python3, there's less need to have all APIs take
both unicode and bytes.
Can you restrict the encodings?
===============================
If you determine that you have to deal with byte :class:`str` you should
realize that not all encodings are created equal. Each has different
properties that may make it possible to provide a simpler API provided that
you can reasonably tell the users of your API that they cannot use certain
classes of encodings.
As one example, if you are required to find a comma (``,``) in a byte
:class:`str` you have different choices based on what encodings are allowed.
If you can reasonably restrict your API users to only giving :term:`ASCII
compatible` encodings you can do this simply by searching for the literal
comma character because that character will be represented by the same byte
sequence in all :term:`ASCII compatible` encodings.
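The difference is easy to see in python2 (a sketch; the index values assume
the encodings shown)::

    >>> u'a,b'.encode('utf8').index(',')
    1
    >>> u'a,b'.encode('utf-16-be').index(',')
    3

In the :term:`UTF-8` case the match is the real comma. In the ``UTF-16-be``
case the match lands in the middle of the comma's two-byte sequence, so
slicing at that index would split a character in half.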
The following are some classes of encodings to be aware of as you decide how
generic your code needs to be.
Single byte encodings
---------------------
Single byte encodings can only represent 256 total characters. They encode
each character's :term:`code point` as the equivalent number in a single
byte.
Most single byte encodings are :term:`ASCII compatible`. :term:`ASCII
compatible` encodings are the most likely to be usable without changes to code
so this is good news. A notable exception to this is the `EBCDIC
<http://en.wikipedia.org/wiki/Extended_Binary_Coded_Decimal_Interchange_Code>`_
family of encodings.
Multibyte encodings
-------------------
Multibyte encodings use more than one byte to encode some characters.
Fixed width
~~~~~~~~~~~
Fixed width encodings have a set number of bytes to represent all of the
characters in the character set. ``UTF-32`` is an example of a fixed width
encoding that uses four bytes per character and can express every unicode
character. There are a number of problems with writing APIs that need to
operate on fixed width, multibyte characters. To go back to our earlier
example of finding a comma in a string, we have to realize that even in
``UTF-32`` where the :term:`code point` for :term:`ASCII` characters is the
same as in :term:`ASCII`, the byte sequence for them is different. So you
cannot search for the literal byte character as it may pick up false
positives and may break a byte sequence in an odd place.
Variable Width
~~~~~~~~~~~~~~
ASCII compatible
""""""""""""""""
:term:`UTF-8` and the `EUC <http://en.wikipedia.org/wiki/Extended_Unix_Code>`_
family of encodings are examples of :term:`ASCII compatible` multi-byte
encodings. They achieve this by adhering to two principles:
* All of the :term:`ASCII` characters are represented by the byte that they
are in the :term:`ASCII` encoding.
* None of the :term:`ASCII` byte sequences are reused in any other byte
sequence for a different character.
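A quick python2 demonstration of both principles with :term:`UTF-8`::

    >>> u'a'.encode('utf8') == 'a'
    True
    >>> u'\u3042'.encode('utf8')
    '\xe3\x81\x82'

The :term:`ASCII` character is unchanged, and every byte in the multi-byte
sequence has its high bit set, so no :term:`ASCII` byte is ever reused.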
Escaped
"""""""
Some multibyte encodings work by using only bytes from the :term:`ASCII`
encoding but when a particular sequence of those bytes is found, they are
interpreted as meaning something other than their :term:`ASCII` values.
``UTF-7`` is one such encoding that can encode all of the unicode
:term:`code points`. For instance, here are some Japanese characters encoded as
``UTF-7``::
>>> a = u'\u304f\u3089\u3068\u307f'
>>> print a
くらとみ
>>> print a.encode('utf-7')
+ME8wiTBoMH8-
These encodings can be used when you need to encode unicode data that may
contain non-:term:`ASCII` characters for inclusion in an :term:`ASCII` only
transport medium or file.
However, they are not :term:`ASCII compatible` in the sense that we used
earlier as the bytes that represent an :term:`ASCII` character are being reused
as part of other characters. If you were to search for a literal plus sign in
this encoded string, you would run across many false positives, for instance.
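For instance, in python2::

    >>> u'\u304f\u3089\u3068\u307f'.find(u'+')
    -1
    >>> u'\u304f\u3089\u3068\u307f'.encode('utf-7').index('+')
    0

The text contains no plus sign at all, yet a byte-level search of the
``UTF-7`` data finds one immediately.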
Other
"""""
There are many other popular variable width encodings, for instance ``UTF-16``
and ``shift-JIS``. Many of these are not :term:`ASCII compatible` so you
cannot search for a literal :term:`ASCII` character without danger of false
positives or false negatives.

107
kitchen3/docs/glossary.rst Normal file
View file

@@ -0,0 +1,107 @@
========
Glossary
========
.. glossary::
"Everything but the kitchen sink"
An English idiom meaning to include nearly everything that you can
think of.
API version
Version that is meant for computer consumption. This version is
parsable and comparable by computers. It contains information about
a library's API so that computer software can decide whether it works
with the software.
ASCII
A character encoding that maps numbers to characters essential to
American English. It maps 128 characters using 7 bits.
.. seealso:: http://en.wikipedia.org/wiki/ASCII
ASCII compatible
An encoding in which the particular byte that maps to a character in
the :term:`ASCII` character set is only used to map to that character.
This excludes EBCDIC-based encodings and many multi-byte fixed and
variable width encodings since they reuse the bytes that make up the
:term:`ASCII` encoding for other purposes. :term:`UTF-8` is notable
as a variable width encoding that is :term:`ASCII` compatible.
.. seealso::
http://en.wikipedia.org/wiki/Variable-width_encoding
For another explanation of various ways bytes are mapped to
characters in a possibly incompatible manner.
code points
:term:`code point`
code point
A number that maps to a particular abstract character. Code points
make it so that we have a number pointing to a character without
worrying about implementation details of how those numbers are stored
for the computer to read. Encodings define how the code points map to
particular sequences of bytes on disk and in memory.
control characters
:term:`control character`
control character
The set of characters in unicode that are used, not to display glyphs
on the screen, but to tell the display program to do something.
.. seealso:: http://en.wikipedia.org/wiki/Control_character
grapheme
characters or pieces of characters that you might write on a page to
make words, sentences, or other pieces of text.
.. seealso:: http://en.wikipedia.org/wiki/Grapheme
I18N
I18N is an abbreviation for internationalization. It's often used to
signify the need to translate words, number and date formats, and
other pieces of data in a computer program so that it will work well
for people who speak a language other than your own.
message catalogs
:term:`message catalog`
message catalog
Message catalogs contain translations for user-visible strings that
are present in your code. Normally, you need to mark the strings to
be translated by wrapping them in one of several :mod:`gettext`
functions. The function serves two purposes:
1. It allows automated tools to find which strings are supposed to be
extracted for translation.
2. The functions perform the translation when the program is running.
.. seealso::
`babel's documentation
<http://babel.edgewall.org/wiki/Documentation/messages.html>`_
for one method of extracting message catalogs from source
code.
Murphy's Law
"Anything that can go wrong, will go wrong."
.. seealso:: http://en.wikipedia.org/wiki/Murphy%27s_Law
release version
Version that is meant for human consumption. This version is easy for
a human to look at to decide how a particular version relates to other
versions of the software.
textual width
The amount of horizontal space a character takes up on a monospaced
screen. The units are number of character cells or columns that it
takes the place of.
UTF-8
A character encoding that maps all unicode :term:`code points` to a sequence
of bytes. It is compatible with :term:`ASCII`. It uses a variable
number of bytes to encode all of unicode. ASCII characters take one
byte. Characters from other parts of unicode take two to four bytes.
It is widespread as an encoding on the internet and in Linux.

359
kitchen3/docs/hacking.rst Normal file
View file

@@ -0,0 +1,359 @@
=======================================
Conventions for contributing to kitchen
=======================================
-----
Style
-----
* Strive to be :pep:`8` compliant
* Run :command:`pylint` over the code and try to resolve most of its nitpicking
------------------------
Python 2.4 compatibility
------------------------
At the moment, we're supporting python-2.4 and above. Understand that there are
a lot of python features that we cannot use because of this.
Sometimes modules in the |stdlib|_ can be added to kitchen so that they're
available. When we do that we need to be careful of several things:
1. Keep the module in sync with the version in the python-2.x trunk. Use
:file:`maintainers/sync-copied-files.py` for this.
2. Sync the unittests as well as the module.
3. Be aware that not all modules are written to remain compatible with
Python-2.4 and might use python language features that were not present
then (generator expressions, relative imports, decorators, with, try: with
both except: and finally:, etc) These are not good candidates for
importing into kitchen as they require more work to keep synced.
---------
Unittests
---------
* At least smoketest your code (make sure a function will return expected
values for one set of inputs).
* Note that even 100% coverage is not a guarantee of working code! Good tests
will realize that you need to also give multiple inputs that test the code
paths of called functions that are outside of your code. Example::
def to_unicode(msg, encoding='utf8', errors='replace'):
return unicode(msg, encoding, errors)
# Smoketest only. This will give 100% coverage for your code (it
# tests all of the code inside of to_unicode) but it leaves a lot of
# room for errors as it doesn't test all combinations of arguments
# that are then passed to the unicode() function.
tools.ok_(to_unicode('abc') == u'abc')
# Better -- tests now cover non-ascii characters and that error conditions
# occur properly. There are a lot of other permutations that can be
# added along these same lines.
tools.ok_(to_unicode('café', 'utf8', 'replace') == u'café')
tools.assert_raises(UnicodeError, to_unicode, u'cafè ñunru'.encode('latin1'), 'ascii', 'strict')
* We're using nose for unittesting. Rather than depend on unittest2
functionality, use the functions that nose provides.
* Remember to maintain python-2.4 compatibility even in unittests.
----------------------------
Docstrings and documentation
----------------------------
We use sphinx to build our documentation. We use the sphinx autodoc extension
to pull docstrings out of the modules for API documentation. This means that
docstrings for subpackages and modules should follow a certain pattern. The
general structure is:
* Introductory material about a module in the module's top level docstring.
* Introductory material should begin with a level two title: an overbar and
underbar of '-'.
* docstrings for every function.
* The first line is a short summary of what the function does
* This is followed by a blank line
* The next lines are a `field list
<http://sphinx.pocoo.org/markup/desc.html#info-field-lists>`_ giving
information about the function's signature. We use the keywords:
``arg``, ``kwarg``, ``raises``, ``returns``, and sometimes ``rtype``. Use
these to describe all arguments, keyword arguments, exceptions raised,
and return values.
* Parameters that are ``kwarg`` should specify what their default
behaviour is.
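Putting that together, a docstring following this pattern might look like
this (a sketch modeled on kitchen's own converters)::

    def to_unicode(msg, encoding='utf8', errors='replace'):
        '''Convert a byte str to a unicode string

        :arg msg: byte str to convert
        :kwarg encoding: encoding of the byte str. Defaults to utf8
        :kwarg errors: error handling scheme. Defaults to replace
        :raises TypeError: if msg is not a string type
        :returns: the converted string
        :rtype: unicode
        '''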
.. _kitchen-versioning:
------------------
Kitchen versioning
------------------
Currently the kitchen library is in early stages of development. While we're
in this state, the main kitchen library uses the following pattern for version
information:
* Versions look like this::
__version_info__ = ((0, 1, 2),)
__version__ = '0.1.2'
* The Major version number remains at 0 until we decide to make the first 1.0
release of kitchen. At that point, we're declaring that we have some
confidence that we won't need to break backwards compatibility for a while.
* The Minor version increments for any backwards incompatible API changes.
When this is updated, we reset micro to zero.
* The Micro version increments for any other changes (backwards compatible API
changes, pure bugfixes, etc).
.. note::
Versioning is only updated for releases that generate sdists and new
uploads to the download directory. Usually we update the version
information for the library just before release. By contrast, we update
kitchen :ref:`subpackage-versioning` when an API change is made. When in
doubt, look at the version information in the last release.
----
I18N
----
All strings that are used as feedback for users need to be translated.
:mod:`kitchen` sets up several functions for this. :func:`_` is used for
marking things that are shown to users via print, GUIs, or other "standard"
methods. Strings for exceptions are marked with :func:`b_`. This function
returns a byte :class:`str` which is needed for use with exceptions::
from kitchen import _, b_
def print_message(msg, username):
print _('%(user)s, your message of the day is: %(message)s') % {
'message': msg, 'user': username}
raise Exception(b_('Test message'))
This serves several purposes:
* It marks the strings to be extracted by an xgettext-like program.
* :func:`_` is a function that will substitute available translations at
runtime.
.. note::
By using the ``%()s with dict`` style of string formatting, we make this
string friendly to translators that may need to reorder the variables when
they're translating the string.
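For example, compare positional formatting with the dict style (a sketch;
the variable names are hypothetical)::

    # Positional: translators cannot reorder the substitutions
    print _('%s sent %s a message') % (sender, recipient)

    # Named: a translation is free to say, in effect,
    # 'A message was sent to %(recipient)s by %(sender)s'
    print _('%(sender)s sent %(recipient)s a message') % {
        'sender': sender, 'recipient': recipient}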
`paver <http://www.blueskyonmars.com/projects/paver/>`_ and `babel
<http://babel.edgewall.org/>`_ are used to extract the strings.
-----------
API updates
-----------
Kitchen strives to have a long deprecation cycle so that people have time to
switch away from any APIs that we decide to discard. Discarded APIs should
raise a :exc:`DeprecationWarning` and clearly state in the warning message and
the docstring how to convert old code to use the new interface. An example of
deprecating a function::
import warnings
from kitchen import _
from kitchen.text.converters import to_bytes, to_unicode
from kitchen.text.new_module import new_function
def old_function(param):
'''**Deprecated**
This function is deprecated. Use
:func:`kitchen.text.new_module.new_function` instead. If you want
unicode strings as output, switch to::
>>> from kitchen.text.new_module import new_function
>>> output = new_function(param)
If you want byte strings, use::
>>> from kitchen.text.new_module import new_function
>>> from kitchen.text.converters import to_bytes
>>> output = to_bytes(new_function(param))
'''
warnings.warn(_('kitchen.text.old_function is deprecated. Use'
' kitchen.text.new_module.new_function instead'),
DeprecationWarning, stacklevel=2)
as_unicode = isinstance(param, unicode)
message = new_function(to_unicode(param))
if not as_unicode:
message = to_bytes(message)
return message
If a particular API change is very intrusive, it may be better to create a new
version of the subpackage and ship both the old version and the new version.
---------
NEWS file
---------
Update the :file:`NEWS` file when you make a change that will be visible to
the users. This is not a ChangeLog file so we don't need to list absolutely
everything but it should give the user an idea of how this version differs
from prior versions. API changes should be listed here explicitly. Bugfixes
can be more general::
-----
0.2.0
-----
* Relicense to LGPLv2+
* Add kitchen.text.format module with the following functions:
textual_width, textual_width_chop.
* Rename the kitchen.text.utils module to kitchen.text.misc. use of the
old names is deprecated but still available.
* bugfixes applied to kitchen.pycompat24.defaultdict that fixes some
tracebacks
-------------------
Kitchen subpackages
-------------------
Kitchen itself is a namespace. The kitchen sdist (tarball) provides certain
useful subpackages.
.. seealso::
`Kitchen addon packages`_
For information about subpackages not distributed in the kitchen sdist
that install into the kitchen namespace.
.. _subpackage-versioning:
Versioning
==========
Each subpackage should have its own version information which is independent
of the other kitchen subpackages and the main kitchen library version. This is
used so that code that depends on kitchen APIs can check the version
information. The standard way to do this is to put something like this in the
subpackage's :file:`__init__.py`::
from kitchen.versioning import version_tuple_to_string
__version_info__ = ((1, 0, 0),)
__version__ = version_tuple_to_string(__version_info__)
:attr:`__version_info__` is documented in :mod:`kitchen.versioning`. The
values of the first tuple should describe API changes to the module. There
are at least three numbers present in the tuple: (Major, minor, micro). The
major version number is for backwards incompatible changes (For
instance, removing a function, or adding a new mandatory argument to
a function). Whenever one of these occurs, you should increment the major
number and reset minor and micro to zero. The second number is the minor
version. Anytime new but backwards compatible changes are introduced this
number should be incremented and the micro version number reset to zero. The
micro version should be incremented when a change is made that does not change
the API at all. This is a common case for bugfixes, for instance.
Version information beyond the first three parts of the first tuple may be
useful for versioning but semantically have similar meaning to the micro
version.
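For example (hypothetical version bumps)::

    # Backwards incompatible API change:
    #   ((1, 2, 3),) -> ((2, 0, 0),)
    # New, backwards compatible API:
    #   ((1, 2, 3),) -> ((1, 3, 0),)
    # Bugfix with no API change:
    #   ((1, 2, 3),) -> ((1, 2, 4),)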
.. note::
We update the :attr:`__version_info__` tuple when the API is updated.
This way there's less chance of forgetting to update the API version when
a new release is made. However, we try to only increment the version
numbers a single step for any release. So if kitchen-0.1.0 has
kitchen.text.__version__ == '1.0.1', kitchen-0.1.1 should have
kitchen.text.__version__ == '1.0.2' or '1.1.0' or '2.0.0'.
Criteria for subpackages in kitchen
===================================
Subpackages within kitchen should meet these criteria:
* Generally useful or needed for other pieces of kitchen.
* No mandatory requirements outside of the |stdlib|_.
* Optional requirements from outside the |stdlib|_ are allowed. Things with
mandatory requirements are better placed in `kitchen addon packages`_
* Somewhat API stable -- this is not a hard requirement. We can change the
kitchen API. However, it is better not to as people may come to depend on
it.
.. seealso::
`API Updates`_
----------------------
Kitchen addon packages
----------------------
Addon packages are very similar to subpackages integrated into the kitchen
sdist. This section just lists some of the differences to watch out for.
setup.py
========
Your :file:`setup.py` should contain entries like this::
# It's suggested to use a dotted name like this so the package is easily
# findable on pypi:
setup(name='kitchen.config',
# Include kitchen in the keywords, again, for searching on pypi
keywords=['kitchen', 'configuration'],
# This package lives in the directory kitchen/config
packages=['kitchen.config'],
# [...]
)
Package directory layout
========================
Create a :file:`kitchen` directory in the toplevel. Place the addon
subpackage in there. For example::
./ <== toplevel with README, setup.py, NEWS, etc
kitchen/
kitchen/__init__.py
kitchen/config/ <== subpackage directory
kitchen/config/__init__.py
Fake kitchen module
===================
The :file:`__init__.py` in the :file:`kitchen` directory is special. It
won't be installed. It just needs to pull in the kitchen from the system so
that you are able to test your module. You should be able to use this
boilerplate::
# Fake module. This is not installed; it's just made to import the real
# kitchen modules for testing this module
import pkgutil
# Extend the __path__ with everything in the real kitchen module
__path__ = pkgutil.extend_path(__path__, __name__)
.. note::
:mod:`kitchen` needs to be findable by python for this to work. Installed
in the :file:`site-packages` directory or adding it to the
:envvar:`PYTHONPATH` will work.
Your unittests should now be able to find both your submodule and the main
kitchen module.
Versioning
==========
It is recommended that addon packages version similarly to
:ref:`subpackage-versioning`. The :data:`__version_info__` and
:data:`__version__` strings can be changed independently of the version
exposed by setup.py so that you have both an API version
(:data:`__version_info__`) and release version that's easier for people to
parse. However, you aren't required to do this and you could follow
a different methodology if you want (for instance, :ref:`kitchen-versioning`).

140
kitchen3/docs/index.rst Normal file
View file

@@ -0,0 +1,140 @@
================================
Kitchen, everything but the sink
================================
:Author: Toshio Kuratomi
:Date: 19 March 2011
:Version: 1.0.x
We've all done it. In the process of writing a brand new application we've
discovered that we need a little bit of code that we've invented before.
Perhaps it's something to handle unicode text. Perhaps it's something to make
a bit of python-2.5 code run on python-2.4. Whatever it is, it ends up being
a tiny bit of code that seems too small to worry about pushing into its own
module so it sits there, a part of your current project, waiting to be cut and
pasted into your next project. And the next. And the next. And since that
little bittybit of code proved so useful to you, it's highly likely that it
proved useful to someone else as well. Useful enough that they've written it
and copy and pasted it over and over into each of their new projects.
Well, no longer! Kitchen aims to pull these small snippets of code into a few
python modules which you can import and use within your project. No more copy
and paste! Now you can let someone else maintain and release these small
snippets so that you can get on with your life.
This package forms the core of Kitchen. It contains some useful modules for
using newer |stdlib|_ modules on older python versions, text manipulation,
:pep:`386` versioning, and initializing :mod:`gettext`. With this package we're
trying to provide a few useful features that don't have too many dependencies
outside of the |stdlib|_. We'll be releasing other modules that drop into the
kitchen namespace to add other features (possibly with larger deps) as time
goes on.
------------
Requirements
------------
We've tried to keep the core kitchen module's requirements lightweight. At the
moment kitchen only requires
:python: 2.4 or later
.. warning:: Kitchen-1.1.0 was the last release to support python-2.3.x.
Soft Requirements
=================
If found, these libraries will be used to make the implementation of some part
of kitchen better in some way. If they are not present, the API that they
enable will still exist but may function in a different manner.
`chardet <http://pypi.python.org/pypi/chardet>`_
Used in :func:`~kitchen.text.misc.guess_encoding` and
:func:`~kitchen.text.converters.guess_encoding_to_xml` to help guess
encoding of byte strings being converted. If not present, unknown
encodings will be converted as if they were ``latin1``
---------------------------
Other Recommended Libraries
---------------------------
These libraries implement commonly used functionality that everyone seems to
invent. Rather than reinvent their wheel, I simply list the things that they
do well for now. Perhaps if people can't find them normally, I'll add them as
requirements in :file:`setup.py` or link them into kitchen's namespace. For
now, I just mention them here:
`bunch <http://pypi.python.org/pypi/bunch/>`_
Bunch is a dictionary that you can access using attribute lookup as well
as bracket notation. Setting it apart from most homebrewed implementations
is the :func:`bunchify` function which will descend nested structures of
lists and dicts, transforming the dicts into Bunches.
`hashlib <http://code.krypto.org/python/hashlib/>`_
Python 2.5 and forward have a :mod:`hashlib` library that provides secure
hash functions to python. If you're developing for python2.4 though, you
can install the standalone hashlib library and have access to the same
functions.
`iterutils <http://pypi.python.org/pypi/iterutils/>`_
The python documentation for :mod:`itertools` has some examples
of other nice iterable functions that can be built from the
:mod:`itertools` functions. This third-party module creates those recipes
as a module.
`ordereddict <http://pypi.python.org/pypi/ordereddict/>`_
Python 2.7 and forward have a :mod:`~collections.OrderedDict` that
provides a :class:`dict` whose items are ordered (and indexable) as well
as named.
`unittest2 <http://pypi.python.org/pypi/unittest2>`_
Python 2.7 has an updated :mod:`unittest` library with new functions not
present in the |stdlib|_ for Python 2.6 or less. If you want to use those
new functions but need your testing framework to be compatible with older
Python the unittest2 library provides the update as an external module.
`nose <http://somethingaboutorange.com/mrl/projects/nose/>`_
If you want to use a test discovery tool instead of the unittest
framework, nosetests provides a simple to use way to do that.
-------
License
-------
This python module is distributed under the terms of the
`GNU Lesser General Public License Version 2 or later
<http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html>`_.
.. note:: Some parts of this module are licensed under terms less restrictive
than the LGPLv2+. If you separate these files from the work as a whole
you are allowed to use them under the less restrictive licenses. The
following is a list of the files that are known:
`Python 2 license <http://www.python.org/download/releases/2.4/license/>`_
:file:`_subprocess.py`, :file:`test_subprocess.py`,
:file:`defaultdict.py`, :file:`test_defaultdict.py`,
:file:`_base64.py`, and :file:`test_base64.py`
--------
Contents
--------
.. toctree::
:maxdepth: 2
tutorial
api-overview
porting-guide-0.3
hacking
glossary
------------------
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
-------------
Project Pages
-------------
More information about the project can be found on the |projpage|_
The latest published version of this documentation can be found on the |docpage|_

View file

@@ -0,0 +1,209 @@
===================
1.0.0 Porting Guide
===================
The 0.1 through 1.0.0 releases focused on bringing in functions from yum and
python-fedora. This porting guide tells how to port from those APIs to their
kitchen replacements.
-------------
python-fedora
-------------
=================================== ===================
python-fedora kitchen replacement
----------------------------------- -------------------
:func:`fedora.iterutils.isiterable` :func:`kitchen.iterutils.isiterable` [#f1]_
:func:`fedora.textutils.to_unicode` :func:`kitchen.text.converters.to_unicode`
:func:`fedora.textutils.to_bytes` :func:`kitchen.text.converters.to_bytes`
=================================== ===================
.. [#f1] :func:`~kitchen.iterutils.isiterable` has changed slightly in
kitchen. The :attr:`include_string` attribute has switched its default value
from :data:`True` to :data:`False`. So you need to change code like::
>>> # Old code
>>> isiterable('abcdef')
True
>>> # New code
>>> isiterable('abcdef', include_string=True)
True
---
yum
---
================================= ===================
yum kitchen replacement
--------------------------------- -------------------
:func:`yum.i18n.dummy_wrapper` :meth:`kitchen.i18n.DummyTranslations.ugettext` [#y1]_
:func:`yum.i18n.dummyP_wrapper` :meth:`kitchen.i18n.DummyTranslations.ungettext` [#y1]_
:func:`yum.i18n.utf8_width` :func:`kitchen.text.display.textual_width`
:func:`yum.i18n.utf8_width_chop` :func:`kitchen.text.display.textual_width_chop`
and :func:`kitchen.text.display.textual_width` [#y2]_ [#y4]_
:func:`yum.i18n.utf8_valid` :func:`kitchen.text.misc.byte_string_valid_encoding`
:func:`yum.i18n.utf8_text_wrap` :func:`kitchen.text.display.wrap` [#y3]_
:func:`yum.i18n.utf8_text_fill` :func:`kitchen.text.display.fill` [#y3]_
:func:`yum.i18n.to_unicode` :func:`kitchen.text.converters.to_unicode` [#y5]_
:func:`yum.i18n.to_unicode_maybe` :func:`kitchen.text.converters.to_unicode` [#y5]_
:func:`yum.i18n.to_utf8` :func:`kitchen.text.converters.to_bytes` [#y5]_
:func:`yum.i18n.to_str` :func:`kitchen.text.converters.to_unicode`
or :func:`kitchen.text.converters.to_bytes` [#y6]_
:func:`yum.i18n.str_eq` :func:`kitchen.text.misc.str_eq`
:func:`yum.misc.to_xml` :func:`kitchen.text.converters.unicode_to_xml`
or :func:`kitchen.text.converters.byte_string_to_xml` [#y7]_
:func:`yum.i18n._` See: :ref:`yum-i18n-init`
:func:`yum.i18n.P_` See: :ref:`yum-i18n-init`
:func:`yum.i18n.exception2msg` :func:`kitchen.text.converters.exception_to_unicode`
or :func:`kitchen.text.converters.exception_to_bytes` [#y8]_
================================= ===================
.. [#y1] These yum methods provided fallback support for :mod:`gettext`
functions in case either ``gaftonmode`` was set or :mod:`gettext` failed
to return an object. In kitchen, we can use the
:class:`kitchen.i18n.DummyTranslations` object to fulfill that role.
Please see :ref:`yum-i18n-init` for more suggestions on how to do this.
.. [#y2] The yum version of these functions returned a byte :class:`str`. The
kitchen version listed here returns a :class:`unicode` string. If you
need a byte :class:`str` simply call
:func:`kitchen.text.converters.to_bytes` on the result.
.. [#y3] The yum version of these functions would return either a byte
:class:`str` or a :class:`unicode` string depending on what the input
value was. The kitchen version always returns :class:`unicode` strings.
.. [#y4] :func:`yum.i18n.utf8_width_chop` performed two functions. It
returned the piece of the message that fit in a specified width and the
width of that message. In kitchen, you need to call two functions, one
for each action::
>>> # Old way
>>> utf8_width_chop(msg, 5)
(5, 'く ku')
>>> # New way
>>> from kitchen.text.display import textual_width, textual_width_chop
>>> (textual_width(msg), textual_width_chop(msg, 5))
(5, u'く ku')
.. [#y5] If the yum version of :func:`~yum.i18n.to_unicode` or
:func:`~yum.i18n.to_utf8` is given an object that is not a string, it
returns the object itself. :func:`kitchen.text.converters.to_unicode` and
:func:`kitchen.text.converters.to_bytes` default to returning the
``simplerepr`` of the object instead. If you want the yum behaviour, set
the :attr:`nonstring` parameter to ``passthru``::
>>> from kitchen.text.converters import to_unicode
>>> to_unicode(5)
u'5'
>>> to_unicode(5, nonstring='passthru')
5
.. [#y6] :func:`yum.i18n.to_str` could return either a byte :class:`str` or
a :class:`unicode` string. In kitchen you can get the same effect but you
get to choose whether you want a byte :class:`str` or a :class:`unicode`
string. Use :func:`~kitchen.text.converters.to_bytes` for :class:`str`
and :func:`~kitchen.text.converters.to_unicode` for :class:`unicode`.
.. [#y7] :func:`yum.misc.to_xml` was buggy as written. I think the intention
was for you to be able to pass a byte :class:`str` or :class:`unicode`
string in and get out a byte :class:`str` that was valid to use in an xml
file. The two kitchen functions
:func:`~kitchen.text.converters.byte_string_to_xml` and
:func:`~kitchen.text.converters.unicode_to_xml` do that for each string
type.
.. [#y8] When porting :func:`yum.i18n.exception2msg` to use kitchen, you
should setup two wrapper functions to aid in your port. They'll look like
this:
.. code-block:: python
from kitchen.text.converters import EXCEPTION_CONVERTERS, \
BYTE_EXCEPTION_CONVERTERS, exception_to_unicode, \
exception_to_bytes
def exception2umsg(e):
'''Return a unicode representation of an exception'''
c = [lambda e: e.value]
c.extend(EXCEPTION_CONVERTERS)
return exception_to_unicode(e, converters=c)
def exception2bmsg(e):
'''Return a utf8 encoded str representation of an exception'''
c = [lambda e: e.value]
c.extend(BYTE_EXCEPTION_CONVERTERS)
return exception_to_bytes(e, converters=c)
The reason to define this wrapper is that many of the exceptions in yum
put the message in the :attr:`value` attribute of the :exc:`Exception`
instead of adding it to the :attr:`args` attribute. So the default
:data:`~kitchen.text.converters.EXCEPTION_CONVERTERS` don't know where to
find the message. The wrapper tells kitchen to check the :attr:`value`
attribute for the message. The reason to define two wrappers may be less
obvious. :func:`yum.i18n.exception2msg` can return a :class:`unicode`
string or a byte :class:`str` depending on a combination of what
attributes are present on the :exc:`Exception` and what locale the
function is being run in. By contrast,
:func:`kitchen.text.converters.exception_to_unicode` only returns
:class:`unicode` strings and
:func:`kitchen.text.converters.exception_to_bytes` only returns byte
:class:`str`. This is much safer as it keeps code that can only handle
:class:`unicode` or only handle byte :class:`str` correctly from getting
the wrong type when an input changes but it means you need to examine the
calling code when porting from :func:`yum.i18n.exception2msg` and use the
appropriate wrapper.
.. _yum-i18n-init:
Initializing Yum i18n
=====================
Previously, yum had several pieces of code to initialize i18n. From the
toplevel of :file:`yum/i18n.py`::
try:
'''
Setup the yum translation domain and make _() and P_() translation wrappers
available.
using ugettext to make sure translated strings are in Unicode.
'''
import gettext
t = gettext.translation('yum', fallback=True)
_ = t.ugettext
P_ = t.ungettext
except:
'''
Something went wrong so we make a dummy _() wrapper there is just
returning the same text
'''
_ = dummy_wrapper
P_ = dummyP_wrapper
With kitchen, this can be changed to this::
from kitchen.i18n import easy_gettext_setup, DummyTranslations
try:
_, P_ = easy_gettext_setup('yum')
except:
translations = DummyTranslations()
_ = translations.ugettext
P_ = translations.ungettext
.. note:: In :ref:`overcoming-frustration`, it is mentioned that for some
things (like exception messages), using the byte :class:`str` oriented
functions is more appropriate. If this is desired, the setup portion is
only a second call to :func:`kitchen.i18n.easy_gettext_setup`::
b_, bP_ = easy_gettext_setup('yum', use_unicode=False)
The second place where i18n is setup is in :meth:`yum.YumBase._getConfig` in
:file:`yum/__init__.py` if ``gaftonmode`` is in effect::
if startupconf.gaftonmode:
global _
_ = yum.i18n.dummy_wrapper
This can be changed to::
if startupconf.gaftonmode:
global _
_ = DummyTranslations().ugettext

View file

@@ -0,0 +1,19 @@
================================
Using kitchen to write good code
================================
Kitchen's functions won't automatically make you a better programmer. You
have to learn when and how to use them as well. This section of the
documentation is intended to show you some of the ways that you can apply
kitchen's functions to problems that may have arisen in your life. The goal
of this section is to give you enough information to understand what the
kitchen API can do for you and where in the :ref:`KitchenAPI` docs to look
for something that can help you with your next issue. Along the way,
you might pick up the knack for identifying issues with your code before you
publish it. And that *will* make you a better coder.
.. toctree::
    :maxdepth: 2

    unicode-frustrations
    designing-unicode-apis

View file

@ -0,0 +1,504 @@
.. _overcoming-frustration:
==========================================================
Overcoming frustration: Correctly using unicode in python2
==========================================================
In python-2.x, there are two types that deal with text.
1. :class:`str` is for strings of bytes. These are very similar in nature to
how strings are handled in C.
2. :class:`unicode` is for strings of unicode :term:`code points`.
.. note::
**Just what the dickens is "Unicode"?**
One mistake that people encountering this issue for the first time make is
confusing the :class:`unicode` type and the encodings of unicode stored in
the :class:`str` type. In python, the :class:`unicode` type stores an
abstract sequence of :term:`code points`. Each :term:`code point`
represents a :term:`grapheme`. By contrast, byte :class:`str` stores
a sequence of bytes which can then be mapped to a sequence of :term:`code
points`. Each unicode encoding (:term:`UTF-8`, UTF-7, UTF-16, UTF-32,
etc) maps different sequences of bytes to the unicode :term:`code points`.
What does that mean to you as a programmer? When you're dealing with text
manipulations (finding the number of characters in a string or cutting
a string on word boundaries) you should be dealing with :class:`unicode`
strings as they abstract characters in a manner that's appropriate for
thinking of them as a sequence of letters that you will see on a page.
When dealing with I/O, reading to and from the disk, printing to
a terminal, sending something over a network link, etc, you should be dealing
with byte :class:`str` as those devices are going to need to deal with
concrete implementations of what bytes represent your abstract characters.
In the python2 world many APIs use these two classes interchangeably but there
are several important APIs where only one or the other will do the right
thing. When you give the wrong type of string to an API that wants the other
type, you may end up with an exception being raised (:exc:`UnicodeDecodeError`
or :exc:`UnicodeEncodeError`). However, these exceptions aren't always raised
because python implicitly converts between types... *sometimes*.
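A quick interactive session shows both behaviours:

.. code-block:: pycon

    >>> u'Hello ' + 'world'        # all ASCII: implicit conversion succeeds
    u'Hello world'
    >>> u'Hello ' + 'caf\xc3\xa9'  # non-ASCII bytes: implicit conversion fails
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)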
-----------------------------------
Frustration #1: Inconsistent Errors
-----------------------------------
Although converting when possible seems like the right thing to do, it's
actually the first source of frustration. A programmer can test out their
program with a string like: ``The quick brown fox jumped over the lazy dog``
and not encounter any issues. But when they release their software into the
wild, someone enters the string: ``I sat down for coffee at the café`` and
suddenly an exception is thrown. The reason? The mechanism that converts
between the two types is only able to deal with :term:`ASCII` characters.
Once you throw non-:term:`ASCII` characters into your strings, you have to
start dealing with the conversion manually.
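Dealing with the conversion manually means choosing an encoding and calling
:meth:`str.decode` or :meth:`unicode.encode` yourself:

.. code-block:: pycon

    >>> b = 'caf\xc3\xa9'        # utf-8 encoded bytes
    >>> u = b.decode('utf-8')    # explicit conversion to unicode
    >>> u
    u'caf\xe9'
    >>> u.encode('utf-8') == b   # and back again
    True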
So, if I manually convert everything to either byte :class:`str` or
:class:`unicode` strings, will I be okay? The answer is.... *sometimes*.
---------------------------------
Frustration #2: Inconsistent APIs
---------------------------------
The problem you run into when converting everything to byte :class:`str` or
:class:`unicode` strings is that you'll be using someone else's API quite
often (this includes the APIs in the |stdlib|_) and find that the API will only
accept byte :class:`str` or only accept :class:`unicode` strings. Or worse,
that the code will accept either when you're dealing with strings that consist
solely of :term:`ASCII` but throw an error when you give it a string that's
got non-:term:`ASCII` characters. When you encounter these APIs you first
need to identify which type will work better and then you have to convert your
values to the correct type for that code. Thus the programmer that wants to
proactively fix all unicode errors in their code needs to do two things:
1. You must keep track of what type your sequences of text are. Does
``my_sentence`` contain :class:`unicode` or :class:`str`? If you don't
know that then you're going to be in for a world of hurt.
2. Anytime you call a function you need to evaluate whether that function will
do the right thing with :class:`str` or :class:`unicode` values. Sending
the wrong value here will lead to a :exc:`UnicodeError` being thrown when
the string contains non-:term:`ASCII` characters.
.. note::
There is one mitigating factor here. The python community has been
standardizing on using :class:`unicode` in all its APIs. Although there
are some APIs that you need to send byte :class:`str` to in order to be
safe (including things as ubiquitous as :func:`print` as we'll see in the
next section), it's getting easier and easier to use :class:`unicode`
strings with most APIs.
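One way to satisfy point #2 is to coerce values to the type a picky API
wants just before calling it.  Here's a minimal sketch (``ensure_unicode``
is our own made-up helper, not a stdlib function; kitchen ships a more
robust version of the same idea as
:func:`kitchen.text.converters.to_unicode`)::

    def ensure_unicode(value, encoding='utf-8'):
        # Byte str gets decoded; unicode passes through untouched
        if isinstance(value, str):
            return value.decode(encoding, 'replace')
        return value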
------------------------------------------------
Frustration #3: Inconsistent treatment of output
------------------------------------------------
Alright, since the python community is moving to using :class:`unicode`
strings everywhere, we might as well convert everything to :class:`unicode`
strings and use that by default, right? Sounds good most of the time but
there's at least one huge caveat to be aware of. Anytime you output text to
the terminal or to a file, the text has to be converted into a byte
:class:`str`. Python will try to implicitly convert from :class:`unicode` to
byte :class:`str`... but it will throw an exception if the bytes are
non-:term:`ASCII`::
    >>> string = unicode(raw_input(), 'utf8')
    café
    >>> log = open('/var/tmp/debug.log', 'w')
    >>> log.write(string)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
Okay, this is simple enough to solve: Just convert to a byte :class:`str` and
we're all set::
    >>> string = unicode(raw_input(), 'utf8')
    café
    >>> string_for_output = string.encode('utf8', 'replace')
    >>> log = open('/var/tmp/debug.log', 'w')
    >>> log.write(string_for_output)
    >>>
So that was simple, right? Well... there's one gotcha that makes things a bit
harder to debug sometimes. When you attempt to write non-:term:`ASCII`
:class:`unicode` strings to a file-like object you get a traceback every time.
But what happens when you use :func:`print`? The terminal is a file-like object
so it should raise an exception right? The answer to that is....
*sometimes*:
.. code-block:: pycon

    $ python
    >>> print u'café'
    café
No exception. Okay, we're fine then?
We are until someone does one of the following:
* Runs the script in a different locale:

  .. code-block:: pycon

      $ LC_ALL=C python
      >>> # Note: if you're using a good terminal program when running in the C locale
      >>> # The terminal program will prevent you from entering non-ASCII characters
      >>> # python will still recognize them if you use the codepoint instead:
      >>> print u'caf\xe9'
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
* Redirects output to a file:

  .. code-block:: pycon

      $ cat test.py
      #!/usr/bin/python -tt
      # -*- coding: utf-8 -*-
      print u'café'
      $ ./test.py >t
      Traceback (most recent call last):
        File "./test.py", line 4, in <module>
          print u'café'
      UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
Okay, the locale thing is a pain but understandable: the C locale doesn't
understand any characters outside of :term:`ASCII` so naturally attempting to
display those won't work. Now why does redirecting to a file cause problems?
It's because :func:`print` in python2 is treated specially. Whereas the other
file-like objects in python always convert to :term:`ASCII` unless you set
them up differently, using :func:`print` to output to the terminal will use
the user's locale to convert before sending the output to the terminal. When
:func:`print` is not outputting to the terminal (being redirected to a file,
for instance), :func:`print` decides that it doesn't know what locale to use
for that file and so it tries to convert to :term:`ASCII` instead.
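You can watch :func:`print` make this decision by checking
``sys.stdout.encoding`` (shown here on a Linux system with a utf-8 locale;
the exact values depend on your setup):

.. code-block:: pycon

    $ python -c 'import sys; print sys.stdout.encoding'
    UTF-8
    $ python -c 'import sys; print sys.stdout.encoding' | cat
    None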
So what does this mean for you, as a programmer? Unless you have the luxury
of controlling how your users use your code, you should always, always, always
convert to a byte :class:`str` before outputting strings to the terminal or to
a file. Python even provides you with a facility to do just this. If you
know that every :class:`unicode` string you send to a particular file-like
object (for instance, :data:`~sys.stdout`) should be converted to a particular
encoding you can use a :class:`codecs.StreamWriter` object to convert from
a :class:`unicode` string into a byte :class:`str`. In particular,
:func:`codecs.getwriter` will return a :class:`~codecs.StreamWriter` class
that will help you to wrap a file-like object for output. Using our
:func:`print` example:
.. code-block:: pycon

    $ cat test.py
    #!/usr/bin/python -tt
    # -*- coding: utf-8 -*-
    import codecs
    import sys

    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout)
    print u'café'
    $ ./test.py >t
    $ cat t
    café
-----------------------------------------
Frustrations #4 and #5 -- The other shoes
-----------------------------------------
In English, there's a saying "waiting for the other shoe to drop". It means
that when one event (usually bad) happens, you come to expect another event
(usually worse) to come after. In this case we have two other shoes.
Frustration #4: Now it doesn't take byte strings?!
==================================================
If you wrap :data:`sys.stdout` using :func:`codecs.getwriter` and think you
are now safe to print any variable without checking its type, I am afraid
I must inform you that you're not paying enough attention to :term:`Murphy's
Law`. The :class:`~codecs.StreamWriter` that :func:`codecs.getwriter`
provides will take :class:`unicode` strings and transform them into byte
:class:`str` before they get to :data:`sys.stdout`. The problem is if you
give it something that's already a byte :class:`str` it tries to transform
that as well. To do that it tries to turn the byte :class:`str` you give it
into :class:`unicode` and then transform that back into a byte :class:`str`...
and since it uses the :term:`ASCII` codec to perform those conversions,
chances are that it'll blow up when making them::
    >>> import codecs
    >>> import sys
    >>> UTF8Writer = codecs.getwriter('utf8')
    >>> sys.stdout = UTF8Writer(sys.stdout)
    >>> print 'café'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/codecs.py", line 351, in write
        data, consumed = self.encode(object, self.errors)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
To work around this, kitchen provides an alternate version of
:func:`codecs.getwriter` that can deal with both byte :class:`str` and
:class:`unicode` strings. Use :func:`kitchen.text.converters.getwriter` in
place of the :mod:`codecs` version like this::
    >>> import sys
    >>> from kitchen.text.converters import getwriter
    >>> UTF8Writer = getwriter('utf8')
    >>> sys.stdout = UTF8Writer(sys.stdout)
    >>> print u'café'
    café
    >>> print 'café'
    café
-------------------------------------------
Frustration #5: Inconsistent APIs Part deux
-------------------------------------------
Sometimes you do everything right in your code but other people's code fails
you. With unicode issues this happens more often than we want. A glaring
example of this is when you get values back from a function that aren't
consistently :class:`unicode` strings or byte :class:`str`.
An example from the |stdlib|_ is :mod:`gettext`. The :mod:`gettext` functions
are used to help translate messages that you display to users in the users'
native languages. Since most languages contain letters outside of the
:term:`ASCII` range, the values that are returned contain unicode characters.
:mod:`gettext` provides you with :meth:`~gettext.GNUTranslations.ugettext` and
:meth:`~gettext.GNUTranslations.ungettext` to return these translations as
:class:`unicode` strings and :meth:`~gettext.GNUTranslations.gettext`,
:meth:`~gettext.GNUTranslations.ngettext`,
:meth:`~gettext.GNUTranslations.lgettext`, and
:meth:`~gettext.GNUTranslations.lngettext` to return them as encoded byte
:class:`str`. Unfortunately, even though they're documented to return only
one type of string or the other, the implementation has corner cases where the
wrong type can be returned.
This means that even if you separate your :class:`unicode` string and byte
:class:`str` correctly before you pass your strings to a :mod:`gettext`
function, afterwards, you might have to check that you have the right sort of
string type again.
.. note::
:mod:`kitchen.i18n` provides alternate gettext translation objects that
return only byte :class:`str` or only :class:`unicode` strings.
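For example, a sketch using kitchen's translation objects (``example`` is
a placeholder text domain)::

    from kitchen.i18n import get_translation_object

    translations = get_translation_object('example')
    _ = translations.ugettext    # always returns unicode strings
    b_ = translations.lgettext   # always returns byte str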
---------------
A few solutions
---------------
Now that we've identified the issues, can we define a comprehensive strategy
for dealing with them?
Convert text at the border
==========================
If you get some piece of text from a library, read from a file, etc, turn it
into a :class:`unicode` string immediately. Since python is moving in the
direction of :class:`unicode` strings everywhere it's going to be easier to
work with :class:`unicode` strings within your code.
If your code is heavily involved with using things that are bytes, you can do
the opposite and convert all text into byte :class:`str` at the border and
only convert to :class:`unicode` when you need it for passing to another
library or performing string operations on it.
In either case, the important thing is to pick a default type for strings and
stick with it throughout your code.  When you mix the types, it becomes much
easier to mistakenly hand a string to a function that can only operate on the
other type.
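A sketch of the border pattern (``load_motd`` and the filename are made up
for illustration)::

    from kitchen.text.converters import to_unicode

    def load_motd(filename='/etc/motd'):
        '''Read a file and convert at the border: callers only see unicode'''
        f = open(filename, 'r')
        try:
            return to_unicode(f.read(), encoding='utf-8')
        finally:
            f.close()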
.. note:: In python3, the abstract unicode type becomes much more prominent.
The type named ``str`` is the equivalent of python2's :class:`unicode` and
python3's ``bytes`` type replaces python2's :class:`str`. Most APIs deal
in the unicode type of string with just some pieces that are low level
dealing with bytes. The implicit conversions between bytes and unicode
are removed and whenever you want to make the conversion you need to do so
explicitly.
When the data needs to be treated as bytes (or unicode) use a naming convention
===============================================================================
Sometimes you're converting nearly all of your data to :class:`unicode`
strings but you have one or two values where you have to keep byte
:class:`str` around. This is often the case when you need to use the value
verbatim with some external resource. For instance, filenames or key values
in a database. When you do this, use a naming convention for the data you're
working with so you (and others reading your code later) don't get confused
about what's being stored in the value.
If you need both a textual string to present to the user and a byte value for
an exact match, consider keeping both versions around. You can either use two
variables for this or a :class:`dict` whose key is the byte value.
.. note:: You can use the naming convention used in kitchen as a guide for
implementing your own naming convention. It prefixes byte :class:`str`
variables of unknown encoding with ``b_`` and byte :class:`str` of known
encoding with the encoding name like: ``utf8_``. If the default was to
handle :class:`str` and only keep a few :class:`unicode` values, those
variables would be prefixed with ``u_``.
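For instance, a small sketch that keeps both versions around (the path is
made up)::

    import os

    b_filename = '/srv/caf\xc3\xa9/data.txt'  # exact bytes for os.* calls
    u_filename = b_filename.decode('utf-8', 'replace')  # for display only

    if os.access(b_filename, os.F_OK):
        print (u'Found: %s' % u_filename).encode('utf-8')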
When outputting data, convert back into bytes
=============================================
When you go to send your data back outside of your program (to the filesystem,
over the network, displaying to the user, etc) turn the data back into a byte
:class:`str`. How you do this will depend on the expected output format of
the data. For displaying to the user, you can use the user's default encoding
using :func:`locale.getpreferredencoding`.  When writing to a file, your best
bet is to pick a single encoding and stick with it.
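For instance, preparing a message for the user's terminal might look like
this (a sketch using kitchen's converter)::

    import locale
    from kitchen.text.converters import to_bytes

    encoding = locale.getpreferredencoding()
    b_message = to_bytes(u'caf\xe9', encoding=encoding, errors='replace')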
.. warning::
When using the encoding that the user has set (for instance, using
:func:`locale.getpreferredencoding`), remember that they may have their
encoding set to something that can't display every single unicode
character. That means when you convert from :class:`unicode` to a byte
:class:`str` you need to decide what should happen if the byte value is
not valid in the user's encoding. For purposes of displaying messages to
the user, it's usually okay to use the ``replace`` encoding error handler
to replace the invalid characters with a question mark or other symbol
meaning the character couldn't be displayed.
You can use :func:`kitchen.text.converters.getwriter` to do this automatically
for :data:`sys.stdout`.  When creating exception messages, be sure to convert
to bytes manually.
When writing unittests, include non-ASCII values and both unicode and str types
================================================================================
Unless you know that a specific portion of your code will only deal with
:term:`ASCII`, be sure to include non-:term:`ASCII` values in your unittests.
Including a few characters from several different scripts is highly advised as
well because some code may have special cased accented roman characters but
not know how to handle characters used in Asian alphabets.
Similarly, unless you know that that portion of your code will only be given
:class:`unicode` strings or only byte :class:`str` be sure to try variables
of both types in your unittests. When doing this, make sure that the
variables are also non-:term:`ASCII` as python's implicit conversion will mask
problems with pure :term:`ASCII` data. In many cases, it makes sense to check
what happens if byte :class:`str` and :class:`unicode` strings that won't
decode in the present locale are given.
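A minimal sketch of such a test (``format_title`` stands in for one of your
own functions)::

    # -*- coding: utf-8 -*-
    import unittest

    def format_title(text):
        '''Stand-in for a real function under test'''
        return text.strip()

    class FormatTitleTest(unittest.TestCase):
        def test_non_ascii_unicode(self):
            self.assertEqual(format_title(u' caf\xe9 '), u'caf\xe9')

        def test_non_ascii_bytes(self):
            # the same word as a utf-8 encoded byte str
            self.assertEqual(format_title(' caf\xc3\xa9 '), 'caf\xc3\xa9')

    if __name__ == '__main__':
        unittest.main()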
Be vigilant about spotting poor APIs
====================================
Make sure that the libraries you use return only :class:`unicode` strings or
byte :class:`str`. Unittests can help you spot issues here by running many
variations of data through your functions and checking that you're still
getting the types of string that you expect.
Example: Putting this all together with kitchen
===============================================
The kitchen library provides a wide array of functions to help you deal with
byte :class:`str` and :class:`unicode` strings in your program. Here's
a short example that uses many kitchen functions to do its work::
    #!/usr/bin/python -tt
    # -*- coding: utf-8 -*-

    import locale
    import os
    import sys
    import unicodedata

    from kitchen.text.converters import getwriter, to_bytes, to_unicode
    from kitchen.i18n import get_translation_object

    if __name__ == '__main__':
        # Setup gettext driven translations but use the kitchen functions so
        # we don't have the mismatched bytes-unicode issues.
        translations = get_translation_object('example')
        # We use _() for marking strings that we operate on as unicode
        # This is pretty much everything
        _ = translations.ugettext
        # And b_() for marking strings that we operate on as bytes.
        # This is limited to exceptions
        b_ = translations.lgettext

        # Setup stdout
        encoding = locale.getpreferredencoding()
        Writer = getwriter(encoding)
        sys.stdout = Writer(sys.stdout)

        # Load data.  Format is filename\0description
        # description should be utf-8 but filename can be any legal filename
        # on the filesystem
        # Sample datafile.txt:
        #   /etc/shells\x00Shells available on caf\xc3\xa9.lan
        #   /var/tmp/file\xff\x00File with non-utf8 data in the filename
        #
        # And to create /var/tmp/file\xff (under bash or zsh) do:
        #   echo 'Some data' > /var/tmp/file$'\377'
        datafile = open('datafile.txt', 'r')
        data = {}
        for line in datafile:
            # We're going to keep filename as bytes because we will need the
            # exact bytes to access files on a POSIX operating system.
            # description, we'll immediately transform into unicode type.
            b_filename, description = line.split('\0', 1)

            # to_unicode defaults to decoding output from utf-8 and replacing
            # any problematic bytes with the unicode replacement character
            # We accept mangling of the description here knowing that our file
            # format is supposed to use utf-8 in that field and that the
            # description will only be displayed to the user, not used as
            # a key value.
            description = to_unicode(description, 'utf-8').strip()
            data[b_filename] = description
        datafile.close()

        # We're going to add a pair of extra fields onto our data to show the
        # length of the description and the filesize.  We put those between
        # the filename and description because we haven't checked that the
        # description is free of NULLs.
        datafile = open('newdatafile.txt', 'w')

        # Name filename with a b_ prefix to denote byte string of unknown encoding
        for b_filename in data:
            # Since we have the byte representation of filename, we can read any
            # filename
            if os.access(b_filename, os.F_OK):
                size = os.path.getsize(b_filename)
            else:
                size = 0

            # Because the description is unicode type, we know the number of
            # characters corresponds to the length of the normalized unicode
            # string.
            length = len(unicodedata.normalize('NFC', data[b_filename]))

            # Print a summary to the screen
            # Note that we do not let implicit type conversion from str to
            # unicode transform b_filename into a unicode string.  That might
            # fail as python would use the ASCII codec.  Instead we use
            # to_unicode() to explicitly transform in a way that we know will
            # not traceback.
            print _(u'filename: %s') % to_unicode(b_filename)
            print _(u'file size: %s') % size
            print _(u'desc length: %s') % length
            print _(u'description: %s') % data[b_filename]

            # First combine the unicode portion
            line = u'%s\0%s\0%s' % (size, length, data[b_filename])
            # Since the filenames are bytes, turn everything else to bytes
            # before combining.  Turning into unicode first would be wrong as
            # the bytes in b_filename might not convert
            b_line = '%s\0%s\n' % (b_filename, to_bytes(line))

            # Just to demonstrate that getwriter will pass bytes through fine
            print b_('Wrote: %s') % b_line
            datafile.write(b_line)
        datafile.close()

        # And just to show how to properly deal with an exception.
        # Note two things about this:
        # 1) We use the b_() function to translate the string.  This returns
        #    a byte string instead of a unicode string
        # 2) We're using the b_() function returned by kitchen.  If we had
        #    used the one from gettext we would need to convert the message
        #    to a byte str first
        message = u'Demonstrate the proper way to raise exceptions.  Sincerely, \u3068\u3057\u304a'
        raise Exception(b_(message))
.. seealso:: :mod:`kitchen.text.converters`
