Import kitchen_1.2.4.orig.tar.gz
parent dfb12f36e6
commit 2faeac3a1a
154 changed files with 11665 additions and 2152 deletions
6
.gitignore
vendored
Normal file
|
@ -0,0 +1,6 @@
|
|||
*.pyc
|
||||
MANIFEST
|
||||
dist
|
||||
*.egg*
|
||||
*.pdf
|
||||
build
|
12
.travis.yml
Normal file
|
@ -0,0 +1,12 @@
|
|||
language: python
|
||||
python:
|
||||
- "2.6"
|
||||
- "2.7"
|
||||
- "3.4"
|
||||
install: python setup.py develop
|
||||
script: ./runtests.sh
|
||||
notifications:
|
||||
irc:
|
||||
- "irc.freenode.net#threebean"
|
||||
on_success: never
|
||||
on_failure: always
|
7
.tx/config
Normal file
|
@ -0,0 +1,7 @@
|
|||
[main]
|
||||
host = https://www.transifex.com
|
||||
|
||||
[kitchen.kitchenpot]
|
||||
file_filter = po/<lang>.po
|
||||
source_file = po/kitchen.pot
|
||||
source_lang = en
|
|
@ -3,8 +3,9 @@ Some notes on hacking on kitchen
|
|||
================================
|
||||
|
||||
:Author: Toshio Kuratomi
|
||||
:Date: 2 Jan 2012
|
||||
:Version: 1.1.x
|
||||
:Maintainer: Ralph Bean
|
||||
:Date: 2 Dec 2014
|
||||
:Version: 1.2.x
|
||||
|
||||
For coding and kitchen, see the style guide in the documentation.
|
||||
|
||||
|
@ -40,20 +41,20 @@ be found in the `transifex user's guide`_.
|
|||
.. _`transifex user's guide`: http://help.transifex.net/user-guide/translating.html
|
||||
|
||||
To generate the POT file (located in the po/ subdirectory), use pybabel to
|
||||
extract the messages. Tun the following from the top level directory::
|
||||
extract the messages. Run the following from the top level directory::
|
||||
|
||||
pybabel extract -o po/kitchen.pot kitchen -kb_ -kbN_
|
||||
pybabel extract -o po/kitchen.pot kitchen2 kitchen3
|
||||
|
||||
Then commit this pot file and upload to transifex::
|
||||
|
||||
tx push -s
|
||||
bzr commit -m 'Extract new strings from the source files' po/kitchen.pot
|
||||
bzr push
|
||||
git commit -m 'Extract new strings from the source files' po/kitchen.pot
|
||||
git push
|
||||
|
||||
To pull messages from transifex prior to making a release, do::
|
||||
|
||||
tx pull -a
|
||||
bzr commit -m 'Merge new translations from transifex' po/*.po
|
||||
git commit -m 'Merge new translations from transifex' po/*.po
|
||||
|
||||
If you see a status message from transifex like this::
|
||||
Pulling new translations for resource kitchen.kitchenpot (source: po/kitchen.pot)
|
||||
|
@ -62,8 +63,8 @@ If you see a status message from transifex like this::
|
|||
it means that transifex has created a brand new po file for you. You need to
|
||||
add the new file to source control and commit it like this::
|
||||
|
||||
bzr add po/fr.po
|
||||
bzr commit -m 'New French translation' po/fr.po
|
||||
git add po/fr.po
|
||||
git commit -m 'New French translation' po/fr.po
|
||||
|
||||
|
||||
TODO: Add information about announcing string freeze. Using transifex's add
|
||||
|
@ -130,7 +131,8 @@ Unittest
|
|||
|
||||
Kitchen has a large set of unittests. All of them should pass before release.
|
||||
You can run the unittests with the following command::
|
||||
nosetests --with-coverage --cover-package kitchen
|
||||
|
||||
./runtests.sh
|
||||
|
||||
This will run all the unittests under the tests directory and also generate
|
||||
some statistics about which lines of code were not accessed when kitchen ran.
|
||||
|
@ -144,48 +146,70 @@ some statistics about which lines of code were not accessed when kitchen ran.
|
|||
a look at :file:`test_i18n.py` and :file:`test_converters.py` to see tests
|
||||
that attempt to cover enough input values to detect problems.
|
||||
|
||||
Since kitchen is currently supported on python-2.3.1+, it is desirable to test
|
||||
kitchen on at least one python major version from python-2.3 through
|
||||
python-2.7. We currently have access to a buildbot that has access to
|
||||
python-2.4, python-2.6, and python-2.7. You can view it at
|
||||
http://ci.csh.rit.edu:8080/view/Kitchen/ . The buildbot checks the devel
|
||||
repository hourly and if new checkins have occurred, it attempts to rebuild.
|
||||
If you need access to invoke builds on the buildbot more regularly than that,
|
||||
contact Toshio to get access.
|
||||
Since kitchen is currently supported on python2 and python3, it is desirable to
|
||||
run tests against as many python versions as possible. We currently have a
|
||||
jenkins instance in the Fedora Infrastructure private cloud with a job set up
|
||||
for kitchen at http://jenkins.cloud.fedoraproject.org/job/kitchen/
|
||||
|
||||
We were unable to get python-2.3 working in the buildbot so I manually run the
|
||||
unittests on a CentOS-4 virtual machine (with python-2.3). I currently don't
|
||||
test on python-2.5 but I'd be happy to take bug reports or get a new committer
|
||||
that was interested in that platform.
|
||||
It is not currently running tests against python-2.{3,4,5,6}. If you are
|
||||
interested in getting those builds running automatically, please speak up in
|
||||
the #fedora-apps channel on freenode.
|
||||
|
||||
Creating the release
|
||||
====================
|
||||
|
||||
|
||||
Then commit this pot file and upload to transifex:
|
||||
|
||||
1. Make sure that any feature branches you want have been merged.
|
||||
2. Pull in new translations and verify they are valid::
|
||||
|
||||
2. Make a fresh branch for your release::
|
||||
|
||||
git flow release start $VERSION
|
||||
|
||||
3. Extract strings for translation and push them to transifex::
|
||||
|
||||
pybabel extract -o po/kitchen.pot kitchen2 kitchen3
|
||||
tx push -s
|
||||
git commit -m 'Extract new strings from the source files' po/kitchen.pot
|
||||
git push
|
||||
|
||||
4. Wait for translations. In the meantime...
|
||||
5. Update the version in ``kitchen/__init__.py`` and ``NEWS.rst``.
|
||||
6. When they're all ready, pull in new translations and verify they are valid::
|
||||
|
||||
tx pull -a
|
||||
# If msgfmt is installed, this will check that the catalogs are valid
|
||||
./releaseutils.py
|
||||
bzr commit -m 'Merge new translations from transifex.net'
|
||||
3. Update the version in kitchen/__init__.py and NEWS.
|
||||
4. Make a fresh clone of the repository::
|
||||
cd $PATH_TO_MY_SHARED_REPO
|
||||
bzr branch bzr://bzr.fedorahosted.org/bzr/kitchen/devel release
|
||||
5. Make the source tarball in that directory::
|
||||
cd release
|
||||
git commit -m 'Merge new translations from transifex.net'
|
||||
git push
|
||||
|
||||
7. Create a pull-request so someone else from #fedora-apps can review::
|
||||
|
||||
hub pull-request -b master
|
||||
|
||||
8. Once someone has given it a +1, then make a source tarball::
|
||||
|
||||
python setup.py sdist
|
||||
6. Make sure that the source tarball contains all of the files we want in the release::
|
||||
cd ..
|
||||
tar -xzvf release/dist/kitchen*tar.gz
|
||||
diff -uNr devel kitchen-$RELEASE_VERSION
|
||||
7. Upload the docs to pypi::
|
||||
cd release
|
||||
|
||||
9. Upload the docs to pypi::
|
||||
|
||||
mkdir -p build/sphinx/html
|
||||
sphinx-build kitchen2/docs/ build/sphinx/html
|
||||
python setup.py upload_docs
|
||||
8. Upload the tarball to pypi::
|
||||
|
||||
10. Upload the tarball to pypi::
|
||||
|
||||
python setup.py sdist upload --sign
|
||||
9. Upload the tarball to fedorahosted::
|
||||
scp dist/kitchen*tar.gz fedorahosted.org:/srv/web/releases/k/i/kitchen/
|
||||
10. Tag the release::
|
||||
cd ../devel
|
||||
bzr tag $RELEASE_VERSION
|
||||
bzr push
|
||||
|
||||
11. Upload the tarball to fedorahosted::
|
||||
|
||||
scp dist/kitchen*tar.gz* fedorahosted.org:/srv/web/releases/k/i/kitchen/
|
||||
|
||||
12. Tag and bag it::
|
||||
|
||||
git flow release finish -m $VERSION -u $YOUR_GPG_KEY_ID $VERSION
|
||||
git push origin develop:develop
|
||||
git push origin master:master
|
||||
git push origin --tags
|
||||
# Your pull-request should automatically close. Double-check this, though.
|
11
MANIFEST.in
Normal file
|
@ -0,0 +1,11 @@
|
|||
include COPYING COPYING.LESSER
|
||||
include *.rst
|
||||
include releaseutils.py
|
||||
recursive-include tests *.py *.po *.pot *.mo
|
||||
recursive-include docs *
|
||||
include po/*.pot
|
||||
include po/*.po
|
||||
include locale/*/*/*.mo
|
||||
recursive-include kitchen2 *.py *.po *.mo *.pot
|
||||
recursive-include kitchen3 *.py *.po *.mo *.pot
|
||||
include runtests.sh
|
|
@ -2,9 +2,72 @@
|
|||
NEWS
|
||||
====
|
||||
|
||||
:Authors: Toshio Kuratomi
|
||||
:Date: 14 Feb 2012
|
||||
:Version: 1.1.1
|
||||
:Author: Toshio Kuratomi
|
||||
:Maintainer: Ralph Bean
|
||||
:Date: 13 Nov 2015
|
||||
:Version: 1.2.x
|
||||
|
||||
-----
|
||||
1.2.4
|
||||
-----
|
||||
|
||||
* Further compat fixes for python-3.5
|
||||
|
||||
-----
|
||||
1.2.3
|
||||
-----
|
||||
|
||||
* Compatibility with python-3.5
|
||||
|
||||
-----
|
||||
1.2.2
|
||||
-----
|
||||
|
||||
* Compatibility with python-3.4
|
||||
* Compatibility with pep470
|
||||
|
||||
-----
|
||||
1.2.1
|
||||
-----
|
||||
|
||||
* Fix release-related problems with the 1.2.0 tarball.
|
||||
- Include locale data for the test suite.
|
||||
- Include NEWS.rst and README.rst.
|
||||
- Include runtests.sh.
|
||||
- Adjust trove classifiers to indicate python3 support.
|
||||
|
||||
-----
|
||||
1.2.0
|
||||
-----
|
||||
|
||||
* kitchen gained support for python3. The tarball release now includes a
|
||||
``kitchen2/`` and a ``kitchen3/`` directory containing copies of the source
|
||||
code modified to work against each of the two major python versions. When
|
||||
installing with ``pip`` or ``setup.py``, the appropriate version should be
|
||||
selected and installed.
|
||||
* The canonical upstream repository location moved to git and github. See
|
||||
https://github.com/fedora-infra/kitchen
|
||||
* Added kitchen.text.misc.isbasestring(), kitchen.text.misc.isbytestring(),
|
||||
and kitchen.text.misc.isunicodestring(). These are mainly useful for code
|
||||
being ported to python3 as python3 lacks a basestring type and has two types
|
||||
for byte strings. Code that has to run on both python2 and python3 or
|
||||
wants to provide similar byte vs unicode semantics may find these functions
|
||||
to be a good abstraction.
|
||||
* Add a python2_api parameter to various i18n functions: NullTranslations
|
||||
constructor, NewGNUTranslations constructor, and get_translation_object.
|
||||
When set to True (the default), the python2 api for gettext objects is used.
|
||||
When set to False, the python3 api is used. This option is intended to aid
|
||||
in porting from python2 to python3.
|
||||
* Exception messages are no longer translated. The idea is that exceptions
|
||||
should be easily searched for via a web search.
|
||||
* Fix a bug in unicode_to_xml() where xmlcharrefs created when a unicode
|
||||
string is turned into a byte string with an encoding that doesn't have
|
||||
all of the needed characters had their ampersands ("&") escaped.
|
||||
* Fix a bug in NewGNUTranslations.lngettext() if a fallback gettext object is
|
||||
used and the message is not in any catalog.
|
||||
* Speedups to process_control_chars() that are directly reflected in
|
||||
unicode_to_xml() and byte_string_to_xml()
|
||||
* Remove C1 Control Codes in to_xml() as well as C0 Control Codes
|
||||
|
||||
-----
|
||||
1.1.1
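Review note on the 1.2.0 entry above: the three string predicates can be pictured with the sketch below. It is a minimal illustration, not the code in kitchen.text.misc. The names BYTE_TYPES and TEXT_TYPES are hypothetical, and treating bytearray as the second python3 byte string type is an assumption.

```python
import sys

# Hypothetical stand-ins, NOT kitchen's implementation.
# Assumption: on python3 "byte string" means bytes or bytearray,
# on python2 it means str.
if sys.version_info[0] >= 3:
    BYTE_TYPES = (bytes, bytearray)
    TEXT_TYPES = (str,)
else:
    BYTE_TYPES = (str,)
    TEXT_TYPES = (unicode,)  # noqa: F821 (only evaluated on python2)


def isbytestring(obj):
    """Return True if obj is one of the byte string types."""
    return isinstance(obj, BYTE_TYPES)


def isunicodestring(obj):
    """Return True if obj is a text (unicode) string."""
    return isinstance(obj, TEXT_TYPES)


def isbasestring(obj):
    """Return True if obj is any kind of string, byte or text."""
    return isbytestring(obj) or isunicodestring(obj)
```

Code that previously wrote isinstance(obj, basestring) can call a predicate like isbasestring(obj) instead, which is the substitution made throughout the i18n and iterutils hunks of this commit.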
|
39
PKG-INFO
|
@ -1,39 +0,0 @@
|
|||
Metadata-Version: 1.0
|
||||
Name: kitchen
|
||||
Version: 1.1.1
|
||||
Summary: Kitchen contains a cornucopia of useful code
|
||||
Home-page: https://fedorahosted.org/kitchen
|
||||
Author: Toshio Kuratomi
|
||||
Author-email: toshio@fedoraproject.org
|
||||
License: LGPLv2+
|
||||
Download-URL: https://fedorahosted.org/releases/k/i/kitchen
|
||||
Description:
|
||||
We've all done it. In the process of writing a brand new application we've
|
||||
discovered that we need a little bit of code that we've invented before.
|
||||
Perhaps it's something to handle unicode text. Perhaps it's something to make
|
||||
a bit of python-2.5 code run on python-2.3. Whatever it is, it ends up being
|
||||
a tiny bit of code that seems too small to worry about pushing into its own
|
||||
module so it sits there, a part of your current project, waiting to be cut and
|
||||
pasted into your next project. And the next. And the next. And since that
|
||||
little bittybit of code proved so useful to you, it's highly likely that it
|
||||
proved useful to someone else as well. Useful enough that they've written it
|
||||
and copy and pasted it over and over into each of their new projects.
|
||||
|
||||
Well, no longer! Kitchen aims to pull these small snippets of code into a few
|
||||
python modules which you can import and use within your project. No more copy
|
||||
and paste! Now you can let someone else maintain and release these small
|
||||
snippets so that you can get on with your life.
|
||||
|
||||
Keywords: Useful Small Code Snippets
|
||||
Platform: UNKNOWN
|
||||
Classifier: Development Status :: 4 - Beta
|
||||
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
|
||||
Classifier: Operating System :: OS Independent
|
||||
Classifier: Programming Language :: Python :: 2.3
|
||||
Classifier: Programming Language :: Python :: 2.4
|
||||
Classifier: Programming Language :: Python :: 2.5
|
||||
Classifier: Programming Language :: Python :: 2.6
|
||||
Classifier: Programming Language :: Python :: 2.7
|
||||
Classifier: Topic :: Software Development :: Internationalization
|
||||
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
||||
Classifier: Topic :: Text Processing :: General
|
|
@ -3,8 +3,9 @@ Kitchen.core Module
|
|||
===================
|
||||
|
||||
:Author: Toshio Kuratomi
|
||||
:Date: 2 Jan 2012
|
||||
:Version: 1.1.x
|
||||
:Maintainer: Ralph Bean
|
||||
:Date: 13 Nov 2015
|
||||
:Version: 1.2.x
|
||||
|
||||
The Kitchen module provides a python API for all sorts of little useful
|
||||
snippets of code that everybody ends up writing for their projects but never
|
||||
|
@ -38,12 +39,15 @@ Requirements
|
|||
|
||||
kitchen.core requires
|
||||
|
||||
:python: 2.3.1 or later
|
||||
:python: 2.4 or later
|
||||
|
||||
Since version 1.2.0, this package has distributed both python2 and python3
|
||||
compatible versions of the source.
|
||||
|
||||
Soft Requirements
|
||||
=================
|
||||
|
||||
If found, these libraries will be used to make the implementation of soemthing
|
||||
If found, these libraries will be used to make the implementation of something
|
||||
better in some way. If they are not present, the API that they enable will
|
||||
still exist but may function in a different manner.
|
||||
|
||||
|
@ -78,4 +82,5 @@ Testing
|
|||
=======
|
||||
|
||||
You can run the unittests with this command::
|
||||
nosetests --with-coverage --cover-package kitchen
|
||||
|
||||
./runtests.sh
|
|
@ -10,10 +10,10 @@ Style
|
|||
* Run :command:`pylint` over the code and try to resolve most of its nitpicking
|
||||
|
||||
------------------------
|
||||
Python 2.3 compatibility
|
||||
Python 2.4 compatibility
|
||||
------------------------
|
||||
|
||||
At the moment, we're supporting python-2.3 and above. Understand that there's
|
||||
At the moment, we're supporting python-2.4 and above. Understand that there's
|
||||
a lot of python features that we cannot use because of this.
|
||||
|
||||
Sometimes modules in the |stdlib|_ can be added to kitchen so that they're
|
||||
|
@ -23,7 +23,7 @@ available. When we do that we need to be careful of several things:
|
|||
:file:`maintainers/sync-copied-files.py` for this.
|
||||
2. Sync the unittests as well as the module.
|
||||
3. Be aware that not all modules are written to remain compatible with
|
||||
Python-2.3 and might use python language features that were not present
|
||||
Python-2.4 and might use python language features that were not present
|
||||
then (generator expressions, relative imports, decorators, with, try: with
|
||||
both except: and finally:, etc) These are not good candidates for
|
||||
importing into kitchen as they require more work to keep synced.
|
||||
|
@ -56,7 +56,7 @@ Unittests
|
|||
|
||||
* We're using nose for unittesting. Rather than depend on unittest2
|
||||
functionality, use the functions that nose provides.
|
||||
* Remember to maintain python-2.3 compatibility even in unittests.
|
||||
* Remember to maintain python-2.4 compatibility even in unittests.
|
||||
|
||||
----------------------------
|
||||
Docstrings and documentation
|
|
@ -9,7 +9,7 @@ Kitchen, everything but the sink
|
|||
We've all done it. In the process of writing a brand new application we've
|
||||
discovered that we need a little bit of code that we've invented before.
|
||||
Perhaps it's something to handle unicode text. Perhaps it's something to make
|
||||
a bit of python-2.5 code run on python-2.3. Whatever it is, it ends up being
|
||||
a bit of python-2.5 code run on python-2.4. Whatever it is, it ends up being
|
||||
a tiny bit of code that seems too small to worry about pushing into its own
|
||||
module so it sits there, a part of your current project, waiting to be cut and
|
||||
pasted into your next project. And the next. And the next. And since that
|
||||
|
@ -37,11 +37,9 @@ Requirements
|
|||
We've tried to keep the core kitchen module's requirements lightweight. At the
|
||||
moment kitchen only requires
|
||||
|
||||
:python: 2.3.1 or later
|
||||
:python: 2.4 or later
|
||||
|
||||
.. warning:: Kitchen-1.1.0 is likely to be the last release that supports
|
||||
python-2.3.x. Future releases will target python-2.4 as the minimum
|
||||
required version.
|
||||
.. warning:: Kitchen-1.1.0 was the last release that supported python-2.3.x
|
||||
|
||||
Soft Requirements
|
||||
=================
|
||||
|
@ -73,9 +71,9 @@ now, I just mention them here:
|
|||
lists and dicts, transforming the dicts to Bunch's.
|
||||
`hashlib <http://code.krypto.org/python/hashlib/>`_
|
||||
Python 2.5 and forward have a :mod:`hashlib` library that provides secure
|
||||
hash functions to python. If you're developing for python2.3 or
|
||||
python2.4, though, you can install the standalone hashlib library and have
|
||||
access to the same functions.
|
||||
hash functions to python. If you're developing for python2.4 though, you
|
||||
can install the standalone hashlib library and have access to the same
|
||||
functions.
|
||||
`iterutils <http://pypi.python.org/pypi/iterutils/>`_
|
||||
The python documentation for :mod:`itertools` has some examples
|
||||
of other nice iterable functions that can be built from the
|
|
@ -35,7 +35,7 @@ from kitchen import versioning
|
|||
(b_, bN_) = i18n.easy_gettext_setup('kitchen.core', use_unicode=False)
|
||||
#pylint: enable-msg=C0103
|
||||
|
||||
__version_info__ = ((1, 1, 1),)
|
||||
__version_info__ = ((1, 2, 4),)
|
||||
__version__ = versioning.version_tuple_to_string(__version_info__)
|
||||
|
||||
__all__ = ('exceptions', 'release',)
|
|
@ -1,6 +1,6 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Copyright (c) 2010-2011 Red Hat, Inc
|
||||
# Copyright (c) 2010-2012 Red Hat, Inc
|
||||
# Copyright (c) 2009 Milos Komarcevic
|
||||
# Copyright (c) 2008 Tim Lauridsen
|
||||
#
|
||||
|
@ -89,7 +89,7 @@ See the documentation of :func:`easy_gettext_setup` and
|
|||
|
||||
from kitchen.versioning import version_tuple_to_string
|
||||
|
||||
__version_info__ = ((2, 1, 1),)
|
||||
__version_info__ = ((2, 2, 0),)
|
||||
__version__ = version_tuple_to_string(__version_info__)
|
||||
|
||||
import copy
|
||||
|
@ -99,6 +99,7 @@ import itertools
|
|||
import locale
|
||||
import os
|
||||
import sys
|
||||
import warnings
|
||||
|
||||
# We use the _default_localedir definition in get_translation_object
|
||||
try:
|
||||
|
@ -107,7 +108,7 @@ except ImportError:
|
|||
_DEFAULT_LOCALEDIR = os.path.join(sys.prefix, 'share', 'locale')
|
||||
|
||||
from kitchen.text.converters import to_bytes, to_unicode
|
||||
from kitchen.text.misc import byte_string_valid_encoding
|
||||
from kitchen.text.misc import byte_string_valid_encoding, isbasestring
|
||||
|
||||
# We cache parts of the translation objects just like stdlib's gettext so that
|
||||
# we don't reparse the message files and keep them in memory separately if the
|
||||
|
@ -199,9 +200,12 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
:func:`locale.getpreferredencoding`.
|
||||
* Make setting :attr:`input_charset` and :attr:`output_charset` also
|
||||
set those attributes on any fallback translation objects.
|
||||
|
||||
.. versionchanged:: kitchen-1.2.0 ; API kitchen.i18n 2.2.0
|
||||
Add python2_api parameter to __init__()
|
||||
'''
|
||||
#pylint: disable-msg=C0103,C0111
|
||||
def __init__(self, fp=None):
|
||||
def __init__(self, fp=None, python2_api=True):
|
||||
gettext.NullTranslations.__init__(self, fp)
|
||||
|
||||
# Python 2.3 compat
|
||||
|
@ -212,6 +216,46 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
# 'utf-8' is only a default here. Users can override.
|
||||
self._input_charset = 'utf-8'
|
||||
|
||||
# Decide whether to mimic the python2 or python3 api
|
||||
self.python2_api = python2_api
|
||||
|
||||
def _set_api(self):
|
||||
if self._python2_api:
|
||||
warnings.warn('Kitchen.i18n provides gettext objects that'
|
||||
' implement either the python2 or python3 gettext api.'
|
||||
' You are currently using the python2 api. Consider'
|
||||
' switching to the python3 api by setting'
|
||||
' python2_api=False when creating the gettext object',
|
||||
PendingDeprecationWarning, stacklevel=2)
|
||||
self.gettext = self._gettext
|
||||
self.lgettext = self._lgettext
|
||||
self.ugettext = self._ugettext
|
||||
self.ngettext = self._ngettext
|
||||
self.lngettext = self._lngettext
|
||||
self.ungettext = self._ungettext
|
||||
else:
|
||||
self.gettext = self._ugettext
|
||||
self.lgettext = self._lgettext
|
||||
self.ngettext = self._ungettext
|
||||
self.lngettext = self._lngettext
|
||||
self.ugettext = self._removed_method_factory('ugettext')
|
||||
self.ungettext = self._removed_method_factory('ungettext')
|
||||
|
||||
def _removed_method_factory(self, name):
|
||||
def _removed_method(*args, **kwargs):
|
||||
raise AttributeError("'%s' object has no attribute '%s'" %
|
||||
(self.__class__.__name__, name))
|
||||
return _removed_method
|
||||
|
||||
def _set_python2_api(self, value):
|
||||
self._python2_api = value
|
||||
self._set_api()
|
||||
|
||||
def _get_python2_api(self):
|
||||
return self._python2_api
|
||||
|
||||
python2_api = property(_get_python2_api, _set_python2_api)
|
||||
|
||||
def _set_input_charset(self, charset):
|
||||
if self._fallback:
|
||||
try:
|
||||
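Review note on the hunk above: the python2_api property rebinds the public method names on the instance each time it is assigned. The sketch below reduces that pattern to two methods. ApiSwitcher, _bytes_impl and _text_impl are hypothetical names chosen for illustration, and the encode call stands in for the real translation logic.

```python
import warnings


class ApiSwitcher(object):
    """Hypothetical miniature of the pattern; not kitchen's class."""

    def __init__(self, python2_api=True):
        # Assigning through the property runs _set_api().
        self.python2_api = python2_api

    def _bytes_impl(self, message):
        # Stand-in for the byte string flavour of gettext.
        return message.encode('utf-8')

    def _text_impl(self, message):
        # Stand-in for the unicode flavour of gettext.
        return message

    def _removed_method_factory(self, name):
        def _removed_method(*args, **kwargs):
            raise AttributeError("'%s' object has no attribute '%s'"
                                 % (self.__class__.__name__, name))
        return _removed_method

    def _set_api(self):
        if self._python2_api:
            warnings.warn('python2 api in use; consider python2_api=False',
                          PendingDeprecationWarning, stacklevel=2)
            self.gettext = self._bytes_impl
            self.ugettext = self._text_impl
        else:
            self.gettext = self._text_impl
            self.ugettext = self._removed_method_factory('ugettext')

    def _get_python2_api(self):
        return self._python2_api

    def _set_python2_api(self, value):
        self._python2_api = value
        self._set_api()

    python2_api = property(_get_python2_api, _set_python2_api)
```

One consequence worth knowing, at least in this sketch: with python2_api=False the name ugettext is still bound on the instance, so hasattr() returns True and the AttributeError only appears when the method is called.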
|
@ -276,7 +320,7 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
# Make sure that we're returning a str of the desired encoding
|
||||
return to_bytes(msg, encoding=output_encoding)
|
||||
|
||||
def gettext(self, message):
|
||||
def _gettext(self, message):
|
||||
# First use any fallback gettext objects. Since DummyTranslations
|
||||
# doesn't do any translation on its own, this is a good first step.
|
||||
if self._fallback:
|
||||
|
@ -292,7 +336,7 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
|
||||
return self._reencode_if_necessary(message, output_encoding)
|
||||
|
||||
def ngettext(self, msgid1, msgid2, n):
|
||||
def _ngettext(self, msgid1, msgid2, n):
|
||||
# Default
|
||||
if n == 1:
|
||||
message = msgid1
|
||||
|
@ -313,7 +357,7 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
|
||||
return self._reencode_if_necessary(message, output_encoding)
|
||||
|
||||
def lgettext(self, message):
|
||||
def _lgettext(self, message):
|
||||
if self._fallback:
|
||||
try:
|
||||
message = self._fallback.lgettext(message)
|
||||
|
@ -329,7 +373,7 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
|
||||
return self._reencode_if_necessary(message, output_encoding)
|
||||
|
||||
def lngettext(self, msgid1, msgid2, n):
|
||||
def _lngettext(self, msgid1, msgid2, n):
|
||||
# Default
|
||||
if n == 1:
|
||||
message = msgid1
|
||||
|
@ -351,8 +395,8 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
|
||||
return self._reencode_if_necessary(message, output_encoding)
|
||||
|
||||
def ugettext(self, message):
|
||||
if not isinstance(message, basestring):
|
||||
def _ugettext(self, message):
|
||||
if not isbasestring(message):
|
||||
return u''
|
||||
if self._fallback:
|
||||
msg = to_unicode(message, encoding=self.input_charset)
|
||||
|
@ -365,7 +409,7 @@ class DummyTranslations(object, gettext.NullTranslations):
|
|||
# Make sure we're returning unicode
|
||||
return to_unicode(message, encoding=self.input_charset)
|
||||
|
||||
def ungettext(self, msgid1, msgid2, n):
|
||||
def _ungettext(self, msgid1, msgid2, n):
|
||||
# Default
|
||||
if n == 1:
|
||||
message = msgid1
|
||||
|
@ -474,8 +518,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
def _parse(self, fp):
|
||||
gettext.GNUTranslations._parse(self, fp)
|
||||
|
||||
def gettext(self, message):
|
||||
if not isinstance(message, basestring):
|
||||
def _gettext(self, message):
|
||||
if not isbasestring(message):
|
||||
return ''
|
||||
tmsg = message
|
||||
u_message = to_unicode(message, encoding=self.input_charset)
|
||||
|
@ -495,13 +539,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
|
||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||
|
||||
def ngettext(self, msgid1, msgid2, n):
|
||||
def _ngettext(self, msgid1, msgid2, n):
|
||||
if n == 1:
|
||||
tmsg = msgid1
|
||||
else:
|
||||
tmsg = msgid2
|
||||
|
||||
if not isinstance(msgid1, basestring):
|
||||
if not isbasestring(msgid1):
|
||||
return ''
|
||||
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
||||
try:
|
||||
|
@ -521,8 +565,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
|
||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||
|
||||
def lgettext(self, message):
|
||||
if not isinstance(message, basestring):
|
||||
def _lgettext(self, message):
|
||||
if not isbasestring(message):
|
||||
return ''
|
||||
tmsg = message
|
||||
u_message = to_unicode(message, encoding=self.input_charset)
|
||||
|
@ -542,13 +586,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
|
||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||
|
||||
def lngettext(self, msgid1, msgid2, n):
|
||||
def _lngettext(self, msgid1, msgid2, n):
|
||||
if n == 1:
|
||||
tmsg = msgid1
|
||||
else:
|
||||
tmsg = msgid2
|
||||
|
||||
if not isinstance(msgid1, basestring):
|
||||
if not isbasestring(msgid1):
|
||||
return ''
|
||||
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
||||
try:
|
||||
|
@ -557,7 +601,7 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
except KeyError:
|
||||
if self._fallback:
|
||||
try:
|
||||
tmsg = self._fallback.ngettext(msgid1, msgid2, n)
|
||||
tmsg = self._fallback.lngettext(msgid1, msgid2, n)
|
||||
except (AttributeError, UnicodeError):
|
||||
# Ignore UnicodeErrors: We'll do our own encoding next
|
||||
pass
|
||||
|
@ -569,8 +613,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||
|
||||
|
||||
def ugettext(self, message):
|
||||
if not isinstance(message, basestring):
|
||||
def _ugettext(self, message):
|
||||
if not isbasestring(message):
|
||||
return u''
|
||||
message = to_unicode(message, encoding=self.input_charset)
|
||||
try:
|
||||
|
@ -586,13 +630,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
# Make sure that we're returning unicode
|
||||
return to_unicode(message, encoding=self.input_charset)
|
||||
|
||||
def ungettext(self, msgid1, msgid2, n):
|
||||
def _ungettext(self, msgid1, msgid2, n):
|
||||
if n == 1:
|
||||
tmsg = msgid1
|
||||
else:
|
||||
tmsg = msgid2
|
||||
|
||||
if not isinstance(msgid1, basestring):
|
||||
if not isbasestring(msgid1):
|
||||
return u''
|
||||
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
||||
try:
|
||||
|
@ -612,7 +656,7 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
|||
|
||||
|
||||
def get_translation_object(domain, localedirs=tuple(), languages=None,
|
||||
class_=None, fallback=True, codeset=None):
|
||||
class_=None, fallback=True, codeset=None, python2_api=True):
|
||||
'''Get a translation object bound to the :term:`message catalogs`
|
||||
|
||||
:arg domain: Name of the message domain. This should be a unique name
|
||||
|
@ -650,6 +694,15 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
|||
:class:`str` objects. This is equivalent to calling
|
||||
:meth:`~gettext.GNUTranslations.output_charset` on the Translations
|
||||
object that is returned from this function.
|
||||
:kwarg python2_api: When :data:`True` (default), return Translation objects
|
||||
that use the python2 gettext api
|
||||
(:meth:`~gettext.GNUTranslations.gettext` and
|
||||
:meth:`~gettext.GNUTranslations.lgettext` return byte
|
||||
:class:`str`. :meth:`~gettext.GNUTranslations.ugettext` exists and
|
||||
returns :class:`unicode` strings). When :data:`False`, return
|
||||
Translation objects that use the python3 gettext api (gettext returns
|
||||
:class:`unicode` strings and lgettext returns byte :class:`str`.
|
||||
ugettext does not exist.)
|
||||
:return: Translation object to get :mod:`gettext` methods from
|
||||
|
||||
If you need more flexibility than :func:`easy_gettext_setup`, use this
|
||||
|
@ -730,7 +783,16 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
|||
than simply cycling through until we find a directory that exists.
|
||||
The new code is based heavily on the |stdlib|_
|
||||
:func:`gettext.translation` function.
|
||||
.. versionchanged:: kitchen-1.2.0 ; API kitchen.i18n 2.2.0
|
||||
Add python2_api parameter
|
||||
'''
|
||||
if python2_api:
|
||||
warnings.warn('get_translation_object returns gettext objects'
|
||||
' that implement either the python2 or python3 gettext api.'
|
||||
' You are currently using the python2 api. Consider'
|
||||
' switching to the python3 api by setting python2_api=False'
|
||||
' when you call the function.',
|
||||
PendingDeprecationWarning, stacklevel=2)
|
||||
if not class_:
|
||||
class_ = NewGNUTranslations
|
||||
|
||||
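Review note on the warning added above: PendingDeprecationWarning is hidden by Python's default warning filters, so callers will not normally see it. The sketch below shows one way to surface it in a test. get_object is a hypothetical stand-in that warns the same way; it is not kitchen's function.

```python
import warnings


def get_object(python2_api=True):
    """Hypothetical stand-in that warns like the hunk above."""
    if python2_api:
        warnings.warn('You are currently using the python2 api.',
                      PendingDeprecationWarning, stacklevel=2)
    return object()


with warnings.catch_warnings(record=True) as caught:
    # Record every warning, including categories hidden by default.
    warnings.simplefilter('always')
    get_object()                   # python2 api: one warning recorded
    get_object(python2_api=False)  # python3 api: nothing recorded

categories = [w.category for w in caught]
```

Running a test suite with python -W error::PendingDeprecationWarning is another option, which turns the warning into an exception at the call site.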
|
@ -739,7 +801,7 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
|||
mofiles.extend(gettext.find(domain, localedir, languages, all=1))
|
||||
if not mofiles:
|
||||
if fallback:
|
||||
return DummyTranslations()
|
||||
return DummyTranslations(python2_api=python2_api)
|
||||
raise IOError(ENOENT, 'No translation file found for domain', domain)
|
||||
|
||||
# Accumulate a translation with fallbacks to all the other mofiles
|
||||
|
@ -750,14 +812,22 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
|||
if not translation:
|
||||
mofile_fh = open(full_path, 'rb')
|
||||
try:
|
||||
try:
|
||||
translation = _translations.setdefault(full_path,
|
||||
class_(mofile_fh, python2_api=python2_api))
|
||||
except TypeError:
|
||||
# Only our translation classes have the python2_api
|
||||
# parameter
|
||||
translation = _translations.setdefault(full_path,
|
||||
class_(mofile_fh))
|
||||
|
||||
finally:
|
||||
mofile_fh.close()
|
||||
|
||||
# Shallow copy the object so that the fallbacks and output charset can
|
||||
# differ but the data we read from the mofile is shared.
|
||||
translation = copy.copy(translation)
|
||||
translation.python2_api = python2_api
|
||||
if codeset:
|
||||
translation.set_output_charset(codeset)
|
||||
if not stacked_translations:
|
||||
|
@ -818,9 +888,9 @@ def easy_gettext_setup(domain, localedirs=tuple(), use_unicode=True):
|
|||
Changed :func:`~kitchen.i18n.easy_gettext_setup` to return the lgettext
|
||||
functions instead of gettext functions when use_unicode=False.
|
||||
'''
|
||||
translations = get_translation_object(domain, localedirs=localedirs)
|
||||
translations = get_translation_object(domain, localedirs=localedirs, python2_api=False)
|
||||
if use_unicode:
|
||||
return(translations.ugettext, translations.ungettext)
|
||||
return(translations.gettext, translations.ngettext)
|
||||
return(translations.lgettext, translations.lngettext)
|
||||
|
||||
__all__ = ('DummyTranslations', 'NewGNUTranslations', 'easy_gettext_setup',
|
|
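The `get_translation_object` hunk above tries to construct the translation class with a `python2_api` keyword and falls back to a plain call when that raises `TypeError`. A minimal sketch of that pattern follows. `PlainTranslations`, `ApiAwareTranslations`, and `build_translation` are hypothetical stand-ins invented for illustration; they are not part of kitchen.

```python
import io


class PlainTranslations(object):
    """Hypothetical stand-in for a class that only accepts a file handle."""

    def __init__(self, fp):
        self.data = fp.read()


class ApiAwareTranslations(PlainTranslations):
    """Hypothetical stand-in for a class that understands python2_api."""

    def __init__(self, fp, python2_api=True):
        PlainTranslations.__init__(self, fp)
        self.python2_api = python2_api


def build_translation(class_, fp, python2_api=True):
    """Pass python2_api when the class accepts it, otherwise omit it."""
    try:
        return class_(fp, python2_api=python2_api)
    except TypeError:
        # The class does not take the python2_api keyword, so retry
        # with the file handle alone.
        return class_(fp)


plain = build_translation(PlainTranslations, io.BytesIO(b'catalog'))
aware = build_translation(ApiAwareTranslations, io.BytesIO(b'catalog'),
                          python2_api=False)
```

One caveat with this design: a `TypeError` raised for an unrelated reason inside the constructor also triggers the fallback call.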
@ -1,6 +1,6 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Copyright (c) 2010 Red Hat, Inc
|
||||
# Copyright (c) 2012 Red Hat, Inc
|
||||
#
|
||||
# kitchen is free software; you can redistribute it and/or modify it under the
|
||||
# terms of the GNU Lesser General Public License as published by the Free
|
||||
|
@ -34,6 +34,8 @@ from kitchen.versioning import version_tuple_to_string
|
|||
__version_info__ = ((0, 0, 1),)
|
||||
__version__ = version_tuple_to_string(__version_info__)
|
||||
|
||||
from kitchen.text.misc import isbasestring
|
||||
|
||||
def isiterable(obj, include_string=False):
|
||||
'''Check whether an object is an iterable
|
||||
|
||||
|
@ -46,7 +48,7 @@ def isiterable(obj, include_string=False):
|
|||
:returns: :data:`True` if :attr:`obj` is iterable, otherwise
|
||||
:data:`False`.
|
||||
'''
|
||||
if include_string or not isinstance(obj, basestring):
|
||||
if include_string or not isbasestring(obj):
|
||||
try:
|
||||
iter(obj)
|
||||
except TypeError:
|
|
@ -78,8 +78,6 @@ the defaultdict class provided by python-2.5 and above.
|
|||
|
||||
import types
|
||||
|
||||
from kitchen import b_
|
||||
|
||||
# :C0103, W0613: We're implementing the python-2.5 defaultdict API so
|
||||
# we have to use the same names as python.
|
||||
# :C0111: We point people at the stdlib API docs for defaultdict rather than
|
||||
|
@ -90,7 +88,7 @@ class defaultdict(dict):
|
|||
def __init__(self, default_factory=None, *args, **kwargs):
|
||||
if (default_factory is not None and
|
||||
not hasattr(default_factory, '__call__')):
|
||||
raise TypeError(b_('First argument must be callable'))
|
||||
raise TypeError('First argument must be callable')
|
||||
dict.__init__(self, *args, **kwargs)
|
||||
self.default_factory = default_factory
|
||||
|
|
@ -26,9 +26,9 @@ snippets so that you can get on with your life.
|
|||
''')
|
||||
AUTHOR = 'Toshio Kuratomi, Seth Vidal, others'
|
||||
EMAIL = 'toshio@fedoraproject.org'
|
||||
COPYRIGHT = '2011 Red Hat, Inc. and others'
|
||||
COPYRIGHT = '2012 Red Hat, Inc. and others'
|
||||
URL = 'https://fedorahosted.org/kitchen'
|
||||
DOWNLOAD_URL = 'https://fedorahosted.org/releases/k/i/kitchen'
|
||||
DOWNLOAD_URL = 'https://pypi.python.org/pypi/kitchen'
|
||||
LICENSE = 'LGPLv2+'
|
||||
|
||||
__all__ = ('NAME', 'VERSION', 'DESCRIPTION', 'LONG_DESCRIPTION', 'AUTHOR',
|
|
@ -11,7 +11,7 @@ and displaying text on the screen.
|
|||
|
||||
from kitchen.versioning import version_tuple_to_string
|
||||
|
||||
__version_info__ = ((2, 1, 1),)
|
||||
__version_info__ = ((2, 2, 0),)
|
||||
__version__ = version_tuple_to_string(__version_info__)
|
||||
|
||||
__all__ = ('converters', 'exceptions', 'misc',)
|
|
@ -1,6 +1,6 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Copyright (c) 2011 Red Hat, Inc.
|
||||
# Copyright (c) 2012 Red Hat, Inc.
|
||||
#
|
||||
# kitchen is free software; you can redistribute it and/or
|
||||
# modify it under the terms of the GNU Lesser General Public
|
||||
|
@ -50,15 +50,12 @@ import codecs
|
|||
import warnings
|
||||
import xml.sax.saxutils
|
||||
|
||||
# We need to access b_() for localizing our strings but we'll end up with
|
||||
# a circular import if we import it directly.
|
||||
import kitchen as k
|
||||
from kitchen.pycompat24 import sets
|
||||
sets.add_builtin_set()
|
||||
|
||||
from kitchen.text.exceptions import ControlCharError, XmlEncodeError
|
||||
from kitchen.text.misc import guess_encoding, html_entities_unescape, \
|
||||
process_control_chars
|
||||
isbytestring, isunicodestring, process_control_chars
|
||||
|
||||
#: Aliases for the utf-8 codec
|
||||
_UTF8_ALIASES = frozenset(('utf-8', 'UTF-8', 'utf8', 'UTF8', 'utf_8', 'UTF_8',
|
||||
|
@ -127,6 +124,8 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
|
|||
Deprecated :attr:`non_string` in favor of :attr:`nonstring` parameter and changed
|
||||
default value to ``simplerepr``
|
||||
'''
|
||||
# Could use isbasestring/isunicode here but we want this code to be as
|
||||
# fast as possible
|
||||
if isinstance(obj, basestring):
|
||||
if isinstance(obj, unicode):
|
||||
return obj
|
||||
|
@ -137,8 +136,8 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
|
|||
return obj.decode(encoding, errors)
|
||||
|
||||
if non_string:
|
||||
warnings.warn(k.b_('non_string is a deprecated parameter of'
|
||||
' to_unicode(). Use nonstring instead'), DeprecationWarning,
|
||||
warnings.warn('non_string is a deprecated parameter of'
|
||||
' to_unicode(). Use nonstring instead', DeprecationWarning,
|
||||
stacklevel=2)
|
||||
if not nonstring:
|
||||
nonstring = non_string
|
||||
|
@ -162,21 +161,21 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
|
|||
simple = obj.__str__()
|
||||
except (UnicodeError, AttributeError):
|
||||
simple = u''
|
||||
if not isinstance(simple, unicode):
|
||||
if isbytestring(simple):
|
||||
return unicode(simple, encoding, errors)
|
||||
return simple
|
||||
elif nonstring in ('repr', 'strict'):
|
||||
obj_repr = repr(obj)
|
||||
if not isinstance(obj_repr, unicode):
|
||||
if isbytestring(obj_repr):
|
||||
obj_repr = unicode(obj_repr, encoding, errors)
|
||||
if nonstring == 'repr':
|
||||
return obj_repr
|
||||
raise TypeError(k.b_('to_unicode was given "%(obj)s" which is neither'
|
||||
' a byte string (str) or a unicode string') %
|
||||
raise TypeError('to_unicode was given "%(obj)s" which is neither'
|
||||
' a byte string (str) or a unicode string' %
|
||||
{'obj': obj_repr.encode(encoding, 'replace')})
|
||||
|
||||
raise TypeError(k.b_('nonstring value, %(param)s, is not set to a valid'
|
||||
' action') % {'param': nonstring})
|
||||
raise TypeError('nonstring value, %(param)s, is not set to a valid'
|
||||
' action' % {'param': nonstring})
|
||||
|
||||
def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
|
||||
non_string=None):
|
||||
|
@ -247,13 +246,15 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
|
|||
Deprecated :attr:`non_string` in favor of :attr:`nonstring` parameter
|
||||
and changed default value to ``simplerepr``
|
||||
'''
|
||||
# Could use isbasestring, isbytestring here but we want this to be as fast
|
||||
# as possible
|
||||
if isinstance(obj, basestring):
|
||||
if isinstance(obj, str):
|
||||
return obj
|
||||
return obj.encode(encoding, errors)
|
||||
if non_string:
|
||||
warnings.warn(k.b_('non_string is a deprecated parameter of'
|
||||
' to_bytes(). Use nonstring instead'), DeprecationWarning,
|
||||
warnings.warn('non_string is a deprecated parameter of'
|
||||
' to_bytes(). Use nonstring instead', DeprecationWarning,
|
||||
stacklevel=2)
|
||||
if not nonstring:
|
||||
nonstring = non_string
|
||||
|
@ -277,7 +278,7 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
|
|||
simple = obj.__unicode__()
|
||||
except (AttributeError, UnicodeError):
|
||||
simple = ''
|
||||
if isinstance(simple, unicode):
|
||||
if isunicodestring(simple):
|
||||
simple = simple.encode(encoding, 'replace')
|
||||
return simple
|
||||
elif nonstring in ('repr', 'strict'):
|
||||
|
@ -285,17 +286,17 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
|
|||
obj_repr = obj.__repr__()
|
||||
except (AttributeError, UnicodeError):
|
||||
obj_repr = ''
|
||||
if isinstance(obj_repr, unicode):
|
||||
if isunicodestring(obj_repr):
|
||||
obj_repr = obj_repr.encode(encoding, errors)
|
||||
else:
|
||||
obj_repr = str(obj_repr)
|
||||
if nonstring == 'repr':
|
||||
return obj_repr
|
||||
raise TypeError(k.b_('to_bytes was given "%(obj)s" which is neither'
|
||||
' a unicode string or a byte string (str)') % {'obj': obj_repr})
|
||||
raise TypeError('to_bytes was given "%(obj)s" which is neither'
|
||||
' a unicode string or a byte string (str)' % {'obj': obj_repr})
|
||||
|
||||
raise TypeError(k.b_('nonstring value, %(param)s, is not set to a valid'
|
||||
' action') % {'param': nonstring})
|
||||
raise TypeError('nonstring value, %(param)s, is not set to a valid'
|
||||
' action' % {'param': nonstring})
|
||||
|
||||
def getwriter(encoding):
|
||||
'''Return a :class:`codecs.StreamWriter` that resists tracing back.
|
||||
|
@ -375,9 +376,9 @@ def to_utf8(obj, errors='replace', non_string='passthru'):
|
|||
|
||||
to_bytes(obj, encoding='utf-8', non_string='passthru')
|
||||
'''
|
||||
warnings.warn(k.b_('kitchen.text.converters.to_utf8 is deprecated. Use'
|
||||
warnings.warn('kitchen.text.converters.to_utf8 is deprecated. Use'
|
||||
' kitchen.text.converters.to_bytes(obj, encoding="utf-8",'
|
||||
' nonstring="passthru" instead.'), DeprecationWarning, stacklevel=2)
|
||||
' nonstring="passthru" instead.', DeprecationWarning, stacklevel=2)
|
||||
return to_bytes(obj, encoding='utf-8', errors=errors,
|
||||
nonstring=non_string)
|
||||
|
||||
|
@ -400,9 +401,8 @@ def to_str(obj):
|
|||
|
||||
to_bytes(obj, nonstring='simplerepr')
|
||||
'''
|
||||
warnings.warn(k.b_('to_str is deprecated. Use to_unicode or to_bytes'
|
||||
' instead. See the to_str docstring for'
|
||||
' porting information.'),
|
||||
warnings.warn('to_str is deprecated. Use to_unicode or to_bytes'
|
||||
' instead. See the to_str docstring for porting information.',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
return to_bytes(obj, nonstring='simplerepr')
|
||||
|
||||
|
@ -682,22 +682,23 @@ def unicode_to_xml(string, encoding='utf-8', attrib=False,
|
|||
try:
|
||||
process_control_chars(string, strategy=control_chars)
|
||||
except TypeError:
|
||||
raise XmlEncodeError(k.b_('unicode_to_xml must have a unicode type as'
|
||||
raise XmlEncodeError('unicode_to_xml must have a unicode type as'
|
||||
' the first argument. Use bytes_string_to_xml for byte'
|
||||
' strings.'))
|
||||
' strings.')
|
||||
except ValueError:
|
||||
raise ValueError(k.b_('The control_chars argument to unicode_to_xml'
|
||||
' must be one of ignore, replace, or strict'))
|
||||
raise ValueError('The control_chars argument to unicode_to_xml'
|
||||
' must be one of ignore, replace, or strict')
|
||||
except ControlCharError, exc:
|
||||
raise XmlEncodeError(exc.args[0])
|
||||
|
||||
string = string.encode(encoding, 'xmlcharrefreplace')
|
||||
|
||||
# Escape characters that have special meaning in xml
|
||||
if attrib:
|
||||
string = xml.sax.saxutils.escape(string, entities={'"':"&quot;"})
|
||||
else:
|
||||
string = xml.sax.saxutils.escape(string)
|
||||
|
||||
string = string.encode(encoding, 'xmlcharrefreplace')
|
||||
|
||||
return string
|
||||
|
||||
def xml_to_unicode(byte_string, encoding='utf-8', errors='replace'):
|
||||
|
@ -782,10 +783,10 @@ def byte_string_to_xml(byte_string, input_encoding='utf-8', errors='replace',
|
|||
:func:`unicode_to_xml`
|
||||
for other ideas on using this function
|
||||
'''
|
||||
if not isinstance(byte_string, str):
|
||||
raise XmlEncodeError(k.b_('byte_string_to_xml can only take a byte'
|
||||
if not isbytestring(byte_string):
|
||||
raise XmlEncodeError('byte_string_to_xml can only take a byte'
|
||||
' string as its first argument. Use unicode_to_xml for'
|
||||
' unicode strings'))
|
||||
' unicode strings')
|
||||
|
||||
# Decode the string into unicode
|
||||
u_string = unicode(byte_string, input_encoding, errors)
|
||||
|
@ -892,7 +893,7 @@ def guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
|
|||
|
||||
'''
|
||||
# Unicode strings can just be run through unicode_to_xml()
|
||||
if isinstance(string, unicode):
|
||||
if isunicodestring(string):
|
||||
return unicode_to_xml(string, encoding=output_encoding,
|
||||
attrib=attrib, control_chars=control_chars)
|
||||
|
||||
|
@ -907,8 +908,8 @@ def guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
|
|||
def to_xml(string, encoding='utf-8', attrib=False, control_chars='ignore'):
|
||||
'''*Deprecated*: Use :func:`guess_encoding_to_xml` instead
|
||||
'''
|
||||
warnings.warn(k.b_('kitchen.text.converters.to_xml is deprecated. Use'
|
||||
' kitchen.text.converters.guess_encoding_to_xml instead.'),
|
||||
warnings.warn('kitchen.text.converters.to_xml is deprecated. Use'
|
||||
' kitchen.text.converters.guess_encoding_to_xml instead.',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
return guess_encoding_to_xml(string, output_encoding=encoding,
|
||||
attrib=attrib, control_chars=control_chars)
|
|
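The `unicode_to_xml` hunk moves the `encode(encoding, 'xmlcharrefreplace')` call so that it runs after `xml.sax.saxutils.escape` rather than before it. A minimal Python 3 sketch of the new ordering follows. The helper name `escape_then_encode` is an assumption made for illustration and the control character handling of the real function is left out.

```python
import xml.sax.saxutils


def escape_then_encode(string, encoding='utf-8', attrib=False):
    """Escape XML special characters on the text, then encode to bytes."""
    if attrib:
        # Attribute values also need double quotes escaped.
        string = xml.sax.saxutils.escape(string, entities={'"': '&quot;'})
    else:
        string = xml.sax.saxutils.escape(string)
    # Characters the target encoding cannot represent become numeric
    # character references instead of raising UnicodeEncodeError.
    return string.encode(encoding, 'xmlcharrefreplace')


as_attribute = escape_then_encode(u'<caf\xe9 "x">', encoding='ascii',
                                  attrib=True)
as_text = escape_then_encode(u'a & b')
```

Escaping first means the escape step always sees text; the ampersands introduced by `xmlcharrefreplace` are added afterwards and so are not escaped a second time.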
@ -1,6 +1,6 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Copyright (c) 2010 Red Hat, Inc.
|
||||
# Copyright (c) 2013 Red Hat, Inc.
|
||||
# Copyright (c) 2010 Ville Skyttä
|
||||
# Copyright (c) 2009 Tim Lauridsen
|
||||
# Copyright (c) 2007 Marcus Kuhn
|
||||
|
@ -39,7 +39,6 @@ have the same width so we need helper functions for displaying them.
|
|||
import itertools
|
||||
import unicodedata
|
||||
|
||||
from kitchen import b_
|
||||
from kitchen.text.converters import to_unicode, to_bytes
|
||||
from kitchen.text.exceptions import ControlCharError
|
||||
|
||||
|
@ -101,7 +100,7 @@ def _interval_bisearch(value, table):
|
|||
return False
|
||||
|
||||
while maximum >= minimum:
|
||||
mid = (minimum + maximum) / 2
|
||||
mid = divmod(minimum + maximum, 2)[0]
|
||||
if value > table[mid][1]:
|
||||
minimum = mid + 1
|
||||
elif value < table[mid][0]:
|
||||
|
@ -115,62 +114,64 @@ _COMBINING = (
|
|||
(0x300, 0x36f), (0x483, 0x489), (0x591, 0x5bd),
|
||||
(0x5bf, 0x5bf), (0x5c1, 0x5c2), (0x5c4, 0x5c5),
|
||||
(0x5c7, 0x5c7), (0x600, 0x603), (0x610, 0x61a),
|
||||
(0x64b, 0x65e), (0x670, 0x670), (0x6d6, 0x6e4),
|
||||
(0x64b, 0x65f), (0x670, 0x670), (0x6d6, 0x6e4),
|
||||
(0x6e7, 0x6e8), (0x6ea, 0x6ed), (0x70f, 0x70f),
|
||||
(0x711, 0x711), (0x730, 0x74a), (0x7a6, 0x7b0),
|
||||
(0x7eb, 0x7f3), (0x816, 0x819), (0x81b, 0x823),
|
||||
(0x825, 0x827), (0x829, 0x82d), (0x901, 0x902),
|
||||
(0x93c, 0x93c), (0x941, 0x948), (0x94d, 0x94d),
|
||||
(0x951, 0x954), (0x962, 0x963), (0x981, 0x981),
|
||||
(0x9bc, 0x9bc), (0x9c1, 0x9c4), (0x9cd, 0x9cd),
|
||||
(0x9e2, 0x9e3), (0xa01, 0xa02), (0xa3c, 0xa3c),
|
||||
(0xa41, 0xa42), (0xa47, 0xa48), (0xa4b, 0xa4d),
|
||||
(0xa70, 0xa71), (0xa81, 0xa82), (0xabc, 0xabc),
|
||||
(0xac1, 0xac5), (0xac7, 0xac8), (0xacd, 0xacd),
|
||||
(0xae2, 0xae3), (0xb01, 0xb01), (0xb3c, 0xb3c),
|
||||
(0xb3f, 0xb3f), (0xb41, 0xb43), (0xb4d, 0xb4d),
|
||||
(0xb56, 0xb56), (0xb82, 0xb82), (0xbc0, 0xbc0),
|
||||
(0xbcd, 0xbcd), (0xc3e, 0xc40), (0xc46, 0xc48),
|
||||
(0xc4a, 0xc4d), (0xc55, 0xc56), (0xcbc, 0xcbc),
|
||||
(0xcbf, 0xcbf), (0xcc6, 0xcc6), (0xccc, 0xccd),
|
||||
(0xce2, 0xce3), (0xd41, 0xd43), (0xd4d, 0xd4d),
|
||||
(0xdca, 0xdca), (0xdd2, 0xdd4), (0xdd6, 0xdd6),
|
||||
(0xe31, 0xe31), (0xe34, 0xe3a), (0xe47, 0xe4e),
|
||||
(0xeb1, 0xeb1), (0xeb4, 0xeb9), (0xebb, 0xebc),
|
||||
(0xec8, 0xecd), (0xf18, 0xf19), (0xf35, 0xf35),
|
||||
(0xf37, 0xf37), (0xf39, 0xf39), (0xf71, 0xf7e),
|
||||
(0xf80, 0xf84), (0xf86, 0xf87), (0xf90, 0xf97),
|
||||
(0xf99, 0xfbc), (0xfc6, 0xfc6), (0x102d, 0x1030),
|
||||
(0x1032, 0x1032), (0x1036, 0x1037), (0x1039, 0x103a),
|
||||
(0x1058, 0x1059), (0x108d, 0x108d), (0x1160, 0x11ff),
|
||||
(0x135f, 0x135f), (0x1712, 0x1714), (0x1732, 0x1734),
|
||||
(0x1752, 0x1753), (0x1772, 0x1773), (0x17b4, 0x17b5),
|
||||
(0x17b7, 0x17bd), (0x17c6, 0x17c6), (0x17c9, 0x17d3),
|
||||
(0x17dd, 0x17dd), (0x180b, 0x180d), (0x18a9, 0x18a9),
|
||||
(0x1920, 0x1922), (0x1927, 0x1928), (0x1932, 0x1932),
|
||||
(0x1939, 0x193b), (0x1a17, 0x1a18), (0x1a60, 0x1a60),
|
||||
(0x1a75, 0x1a7c), (0x1a7f, 0x1a7f), (0x1b00, 0x1b03),
|
||||
(0x1b34, 0x1b34), (0x1b36, 0x1b3a), (0x1b3c, 0x1b3c),
|
||||
(0x1b42, 0x1b42), (0x1b44, 0x1b44), (0x1b6b, 0x1b73),
|
||||
(0x1baa, 0x1baa), (0x1c37, 0x1c37), (0x1cd0, 0x1cd2),
|
||||
(0x825, 0x827), (0x829, 0x82d), (0x859, 0x85b),
|
||||
(0x901, 0x902), (0x93c, 0x93c), (0x941, 0x948),
|
||||
(0x94d, 0x94d), (0x951, 0x954), (0x962, 0x963),
|
||||
(0x981, 0x981), (0x9bc, 0x9bc), (0x9c1, 0x9c4),
|
||||
(0x9cd, 0x9cd), (0x9e2, 0x9e3), (0xa01, 0xa02),
|
||||
(0xa3c, 0xa3c), (0xa41, 0xa42), (0xa47, 0xa48),
|
||||
(0xa4b, 0xa4d), (0xa70, 0xa71), (0xa81, 0xa82),
|
||||
(0xabc, 0xabc), (0xac1, 0xac5), (0xac7, 0xac8),
|
||||
(0xacd, 0xacd), (0xae2, 0xae3), (0xb01, 0xb01),
|
||||
(0xb3c, 0xb3c), (0xb3f, 0xb3f), (0xb41, 0xb43),
|
||||
(0xb4d, 0xb4d), (0xb56, 0xb56), (0xb82, 0xb82),
|
||||
(0xbc0, 0xbc0), (0xbcd, 0xbcd), (0xc3e, 0xc40),
|
||||
(0xc46, 0xc48), (0xc4a, 0xc4d), (0xc55, 0xc56),
|
||||
(0xcbc, 0xcbc), (0xcbf, 0xcbf), (0xcc6, 0xcc6),
|
||||
(0xccc, 0xccd), (0xce2, 0xce3), (0xd41, 0xd43),
|
||||
(0xd4d, 0xd4d), (0xdca, 0xdca), (0xdd2, 0xdd4),
|
||||
(0xdd6, 0xdd6), (0xe31, 0xe31), (0xe34, 0xe3a),
|
||||
(0xe47, 0xe4e), (0xeb1, 0xeb1), (0xeb4, 0xeb9),
|
||||
(0xebb, 0xebc), (0xec8, 0xecd), (0xf18, 0xf19),
|
||||
(0xf35, 0xf35), (0xf37, 0xf37), (0xf39, 0xf39),
|
||||
(0xf71, 0xf7e), (0xf80, 0xf84), (0xf86, 0xf87),
|
||||
(0xf90, 0xf97), (0xf99, 0xfbc), (0xfc6, 0xfc6),
|
||||
(0x102d, 0x1030), (0x1032, 0x1032), (0x1036, 0x1037),
|
||||
(0x1039, 0x103a), (0x1058, 0x1059), (0x108d, 0x108d),
|
||||
(0x1160, 0x11ff), (0x135d, 0x135f), (0x1712, 0x1714),
|
||||
(0x1732, 0x1734), (0x1752, 0x1753), (0x1772, 0x1773),
|
||||
(0x17b4, 0x17b5), (0x17b7, 0x17bd), (0x17c6, 0x17c6),
|
||||
(0x17c9, 0x17d3), (0x17dd, 0x17dd), (0x180b, 0x180d),
|
||||
(0x18a9, 0x18a9), (0x1920, 0x1922), (0x1927, 0x1928),
|
||||
(0x1932, 0x1932), (0x1939, 0x193b), (0x1a17, 0x1a18),
|
||||
(0x1a60, 0x1a60), (0x1a75, 0x1a7c), (0x1a7f, 0x1a7f),
|
||||
(0x1b00, 0x1b03), (0x1b34, 0x1b34), (0x1b36, 0x1b3a),
|
||||
(0x1b3c, 0x1b3c), (0x1b42, 0x1b42), (0x1b44, 0x1b44),
|
||||
(0x1b6b, 0x1b73), (0x1baa, 0x1baa), (0x1be6, 0x1be6),
|
||||
(0x1bf2, 0x1bf3), (0x1c37, 0x1c37), (0x1cd0, 0x1cd2),
|
||||
(0x1cd4, 0x1ce0), (0x1ce2, 0x1ce8), (0x1ced, 0x1ced),
|
||||
(0x1dc0, 0x1de6), (0x1dfd, 0x1dff), (0x200b, 0x200f),
|
||||
(0x1dc0, 0x1de6), (0x1dfc, 0x1dff), (0x200b, 0x200f),
|
||||
(0x202a, 0x202e), (0x2060, 0x2063), (0x206a, 0x206f),
|
||||
(0x20d0, 0x20f0), (0x2cef, 0x2cf1), (0x2de0, 0x2dff),
|
||||
(0x302a, 0x302f), (0x3099, 0x309a), (0xa66f, 0xa66f),
|
||||
(0xa67c, 0xa67d), (0xa6f0, 0xa6f1), (0xa806, 0xa806),
|
||||
(0xa80b, 0xa80b), (0xa825, 0xa826), (0xa8c4, 0xa8c4),
|
||||
(0xa8e0, 0xa8f1), (0xa92b, 0xa92d), (0xa953, 0xa953),
|
||||
(0xa9b3, 0xa9b3), (0xa9c0, 0xa9c0), (0xaab0, 0xaab0),
|
||||
(0xaab2, 0xaab4), (0xaab7, 0xaab8), (0xaabe, 0xaabf),
|
||||
(0xaac1, 0xaac1), (0xabed, 0xabed), (0xfb1e, 0xfb1e),
|
||||
(0xfe00, 0xfe0f), (0xfe20, 0xfe26), (0xfeff, 0xfeff),
|
||||
(0xfff9, 0xfffb), (0x101fd, 0x101fd), (0x10a01, 0x10a03),
|
||||
(0x10a05, 0x10a06), (0x10a0c, 0x10a0f), (0x10a38, 0x10a3a),
|
||||
(0x10a3f, 0x10a3f), (0x110b9, 0x110ba), (0x1d165, 0x1d169),
|
||||
(0x1d16d, 0x1d182), (0x1d185, 0x1d18b), (0x1d1aa, 0x1d1ad),
|
||||
(0x1d242, 0x1d244), (0xe0001, 0xe0001), (0xe0020, 0xe007f),
|
||||
(0xe0100, 0xe01ef), )
|
||||
(0x20d0, 0x20f0), (0x2cef, 0x2cf1), (0x2d7f, 0x2d7f),
|
||||
(0x2de0, 0x2dff), (0x302a, 0x302f), (0x3099, 0x309a),
|
||||
(0xa66f, 0xa66f), (0xa67c, 0xa67d), (0xa6f0, 0xa6f1),
|
||||
(0xa806, 0xa806), (0xa80b, 0xa80b), (0xa825, 0xa826),
|
||||
(0xa8c4, 0xa8c4), (0xa8e0, 0xa8f1), (0xa92b, 0xa92d),
|
||||
(0xa953, 0xa953), (0xa9b3, 0xa9b3), (0xa9c0, 0xa9c0),
|
||||
(0xaab0, 0xaab0), (0xaab2, 0xaab4), (0xaab7, 0xaab8),
|
||||
(0xaabe, 0xaabf), (0xaac1, 0xaac1), (0xabed, 0xabed),
|
||||
(0xfb1e, 0xfb1e), (0xfe00, 0xfe0f), (0xfe20, 0xfe26),
|
||||
(0xfeff, 0xfeff), (0xfff9, 0xfffb), (0x101fd, 0x101fd),
|
||||
(0x10a01, 0x10a03), (0x10a05, 0x10a06), (0x10a0c, 0x10a0f),
|
||||
(0x10a38, 0x10a3a), (0x10a3f, 0x10a3f), (0x11046, 0x11046),
|
||||
(0x110b9, 0x110ba), (0x1d165, 0x1d169), (0x1d16d, 0x1d182),
|
||||
(0x1d185, 0x1d18b), (0x1d1aa, 0x1d1ad), (0x1d242, 0x1d244),
|
||||
(0xe0001, 0xe0001), (0xe0020, 0xe007f), (0xe0100, 0xe01ef), )
|
||||
|
||||
'''
|
||||
Internal table, provided by this module to list :term:`code points` which
|
||||
combine with other characters and therefore should have no :term:`textual
|
||||
|
@ -184,8 +185,8 @@ a combining character.
|
|||
:func:`~kitchen.text.display._generate_combining_table`
|
||||
for how this table is generated
|
||||
|
||||
This table was last regenerated on python-2.7.0 with
|
||||
:data:`unicodedata.unidata_version` 5.1.0
|
||||
This table was last regenerated on python-3.2.3 with
|
||||
:data:`unicodedata.unidata_version` 6.0.0
|
||||
'''
|
||||
|
||||
# New function from Toshio Kuratomi (LGPLv2+)
|
||||
|
@ -341,8 +342,8 @@ def _ucp_width(ucs, control_chars='guess'):
|
|||
if ucs < 32 or (ucs < 0xa0 and ucs >= 0x7f):
|
||||
# Control character detected
|
||||
if control_chars == 'strict':
|
||||
raise ControlCharError(b_('_ucp_width does not understand how to'
|
||||
' assign a width value to control characters.'))
|
||||
raise ControlCharError('_ucp_width does not understand how to'
|
||||
' assign a width value to control characters.')
|
||||
if ucs in (0x08, 0x07F, 0x94):
|
||||
# Backspace, delete, and clear delete remove a single character
|
||||
return -1
|
||||
|
@ -519,7 +520,7 @@ def textual_width_chop(msg, chop, encoding='utf-8', errors='replace'):
|
|||
# if current width is high,
|
||||
if width > chop:
|
||||
# calculate new midpoint
|
||||
mid = minimum + (eos - minimum) / 2
|
||||
mid = minimum + (eos - minimum) // 2
|
||||
if mid == eos:
|
||||
break
|
||||
if (eos - chop) < (eos - mid):
|
||||
|
@ -537,7 +538,7 @@ def textual_width_chop(msg, chop, encoding='utf-8', errors='replace'):
|
|||
# short-circuit above means that we never use this branch.
|
||||
|
||||
# calculate new midpoint
|
||||
mid = eos + (maximum - eos) / 2
|
||||
mid = eos + (maximum - eos) // 2
|
||||
if mid == eos:
|
||||
break
|
||||
if (chop - eos) < (mid - eos):
|
|
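Several hunks in this file replace `/` with `divmod(...)[0]` or `//` when computing a midpoint. Under Python 3, `/` returns a float, which cannot be used as a sequence index, so the bisection needs floor division. The sketch below is a self-contained version of the interval search written with `//`. The function name and the three-entry table are illustrative only; the table is a short excerpt of the ranges shown above.

```python
def interval_bisearch(value, table):
    """Return True if value falls inside any (start, end) pair in table.

    table must be sorted and its intervals must not overlap.
    """
    minimum = 0
    maximum = len(table) - 1
    if value < table[minimum][0] or value > table[maximum][1]:
        return False

    while maximum >= minimum:
        # Floor division keeps mid an integer on both python2 and python3.
        mid = (minimum + maximum) // 2
        if value > table[mid][1]:
            minimum = mid + 1
        elif value < table[mid][0]:
            maximum = mid - 1
        else:
            return True
    return False


SAMPLE_TABLE = ((0x300, 0x36f), (0x483, 0x489), (0x591, 0x5bd))
```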
@ -1,5 +1,5 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
# Copyright (c) 2011 Red Hat, Inc
|
||||
# Copyright (c) 2012 Red Hat, Inc
|
||||
# Copyright (c) 2010 Seth Vidal
|
||||
#
|
||||
# kitchen is free software; you can redistribute it and/or
|
||||
|
@ -27,6 +27,12 @@ Miscellaneous functions for manipulating text
|
|||
---------------------------------------------
|
||||
|
||||
Collection of text functions that don't fit in another category.
|
||||
|
||||
.. versionchanged:: kitchen 1.2.0, API: kitchen.text 2.2.0
|
||||
Added :func:`~kitchen.text.misc.isbasestring`,
|
||||
:func:`~kitchen.text.misc.isbytestring`, and
|
||||
:func:`~kitchen.text.misc.isunicodestring` to help tell which string type
|
||||
is which on python2 and python3
|
||||
'''
|
||||
import htmlentitydefs
|
||||
import itertools
|
||||
|
@ -37,9 +43,6 @@ try:
|
|||
except ImportError:
|
||||
chardet = None
|
||||
|
||||
# We need to access b_() for localizing our strings but we'll end up with
|
||||
# a circular import if we import it directly.
|
||||
import kitchen as k
|
||||
from kitchen.pycompat24 import sets
|
||||
from kitchen.text.exceptions import ControlCharError
|
||||
|
||||
|
@ -49,13 +52,64 @@ sets.add_builtin_set()
|
|||
# byte strings we're guessing about as latin1
|
||||
_CHARDET_THRESHHOLD = 0.6
|
||||
|
||||
# ASCII control codes that are illegal in xml 1.0
|
||||
_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32))
|
||||
# ASCII control codes (the c0 codes) that are illegal in xml 1.0
|
||||
# Also unicode control codes (the C1 codes): also illegal in xml
|
||||
_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32) + range(128, 160))
|
||||
_CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES))
|
||||
_IGNORE_TABLE = dict(zip(_CONTROL_CODES, [None] * len(_CONTROL_CODES)))
|
||||
_REPLACE_TABLE = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
|
||||
|
||||
# _ENTITY_RE
|
||||
_ENTITY_RE = re.compile(r'(?s)<[^>]*>|&#?\w+;')
|
||||
|
||||
def isbasestring(obj):
|
||||
'''Determine if obj is a byte :class:`str` or :class:`unicode` string
|
||||
|
||||
In python2 this is equivalent to isinstance(obj, basestring). In python3
|
||||
it checks whether the object is an instance of str, bytes, or bytearray.
|
||||
This is an aid to porting code that needed to test whether an object was
|
||||
derived from basestring in python2 (commonly used in unicode-bytes
|
||||
conversion functions)
|
||||
|
||||
:arg obj: Object to test
|
||||
:returns: True if the object is a :class:`basestring`. Otherwise False.
|
||||
|
||||
.. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
|
||||
'''
|
||||
if isinstance(obj, basestring):
|
||||
return True
|
||||
return False
|
||||
|
||||
def isbytestring(obj):
|
||||
'''Determine if obj is a byte :class:`str`
|
||||
|
||||
In python2 this is equivalent to isinstance(obj, str). In python3 it
|
||||
checks whether the object is an instance of bytes or bytearray.
|
||||
|
||||
:arg obj: Object to test
|
||||
:returns: True if the object is a byte :class:`str`. Otherwise, False.
|
||||
|
||||
.. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
|
||||
'''
|
||||
if isinstance(obj, str):
|
||||
return True
|
||||
return False
|
||||
|
||||
def isunicodestring(obj):
|
||||
'''Determine if obj is a :class:`unicode` string
|
||||
|
||||
In python2 this is equivalent to isinstance(obj, unicode). In python3 it
|
||||
checks whether the object is an instance of :class:`str`.
|
||||
|
||||
:arg obj: Object to test
|
||||
:returns: True if the object is a :class:`unicode` string. Otherwise, False.
|
||||
|
||||
.. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
|
||||
'''
|
||||
if isinstance(obj, unicode):
|
||||
return True
|
||||
return False
|
||||
|
||||
def guess_encoding(byte_string, disable_chardet=False):
|
||||
'''Try to guess the encoding of a byte :class:`str`
|
||||
|
||||
|
@ -79,8 +133,8 @@ def guess_encoding(byte_string, disable_chardet=False):
|
|||
to every byte, decoding from ``latin-1`` to :class:`unicode` will not
|
||||
cause :exc:`UnicodeErrors` although the output might be mangled.
|
||||
'''
|
||||
if not isinstance(byte_string, str):
|
||||
raise TypeError(k.b_('byte_string must be a byte string (str)'))
|
||||
if not isbytestring(byte_string):
|
||||
raise TypeError('first argument must be a byte string (str)')
|
||||
input_encoding = 'utf-8'
|
||||
try:
|
||||
unicode(byte_string, input_encoding, 'strict')
|
||||
|
@ -98,7 +152,7 @@ def guess_encoding(byte_string, disable_chardet=False):
|
|||
return input_encoding
|
||||
|
||||
def str_eq(str1, str2, encoding='utf-8', errors='replace'):
|
||||
'''Compare two stringsi, converting to byte :class:`str` if one is
|
||||
'''Compare two strings, converting to byte :class:`str` if one is
|
||||
:class:`unicode`
|
||||
|
||||
:arg str1: First string to compare
|
||||
|
@ -135,7 +189,7 @@ def str_eq(str1, str2, encoding='utf-8', errors='replace'):
|
|||
except UnicodeError:
|
||||
pass
|
||||
|
||||
if isinstance(str1, unicode):
|
||||
if isunicodestring(str1):
|
||||
str1 = str1.encode(encoding, errors)
|
||||
else:
|
||||
str2 = str2.encode(encoding, errors)
|
||||
|
@ -166,26 +220,30 @@ def process_control_chars(string, strategy='replace'):
|
|||
:attr:`string`
|
||||
:returns: :class:`unicode` string with no :term:`control characters` in
|
||||
it.
|
||||
'''
|
||||
if not isinstance(string, unicode):
|
||||
raise TypeError(k.b_('process_control_char must have a unicode type as'
|
||||
' the first argument.'))
|
||||
if strategy == 'ignore':
|
||||
control_table = dict(zip(_CONTROL_CODES, [None] * len(_CONTROL_CODES)))
|
||||
elif strategy == 'replace':
|
||||
control_table = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
|
||||
elif strategy == 'strict':
|
||||
control_table = None
|
||||
# Test that there are no control codes present
|
||||
data = frozenset(string)
|
||||
if [c for c in _CONTROL_CHARS if c in data]:
|
||||
raise ControlCharError(k.b_('ASCII control code present in string'
|
||||
' input'))
|
||||
else:
|
||||
raise ValueError(k.b_('The strategy argument to process_control_chars'
|
||||
' must be one of ignore, replace, or strict'))
|
||||
|
||||
if control_table:
|
||||
.. versionchanged:: kitchen 1.2.0, API: kitchen.text 2.2.0
|
||||
Strip out the C1 control characters in addition to the C0 control
|
||||
characters.
|
||||
'''
|
||||
if not isunicodestring(string):
|
||||
raise TypeError('process_control_char must have a unicode type as'
|
||||
' the first argument.')
|
||||
if strategy not in ('replace', 'ignore', 'strict'):
|
||||
raise ValueError('The strategy argument to process_control_chars'
|
||||
' must be one of ignore, replace, or strict')
|
||||
|
||||
# Most strings don't have control chars and translating carries
|
||||
# a higher cost than testing whether the chars are in the string
|
||||
# So only translate if necessary
|
||||
if not _CONTROL_CHARS.isdisjoint(string):
|
||||
if strategy == 'replace':
|
||||
control_table = _REPLACE_TABLE
|
||||
elif strategy == 'ignore':
|
||||
control_table = _IGNORE_TABLE
|
||||
else:
|
||||
# strategy can only equal 'strict'
|
||||
raise ControlCharError('ASCII control code present in string'
|
||||
' input')
|
||||
string = string.translate(control_table)
|
||||
|
||||
return string
|
||||
|
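The rewritten `process_control_chars` builds its translation tables once at import time and only calls `translate` when the string actually contains a control character. A Python 3 sketch of that approach follows, assuming `chr` and `dict.fromkeys` in place of the python2 `unichr` and `zip` calls. `ControlCharError` is redefined here as a stand-in for `kitchen.text.exceptions.ControlCharError` so that the block runs on its own.

```python
class ControlCharError(Exception):
    """Stand-in for kitchen.text.exceptions.ControlCharError."""


# C0 codes that are illegal in xml 1.0, plus the C1 codes (128-159).
_CONTROL_CODES = frozenset(
    list(range(0, 8)) + [11, 12] + list(range(14, 32))
    + list(range(128, 160)))
_CONTROL_CHARS = frozenset(chr(code) for code in _CONTROL_CODES)
_IGNORE_TABLE = dict.fromkeys(_CONTROL_CODES)        # map to None: delete
_REPLACE_TABLE = dict.fromkeys(_CONTROL_CODES, u'?')  # map to '?'


def process_control_chars(string, strategy='replace'):
    if not isinstance(string, str):
        raise TypeError('first argument must be a unicode string')
    if strategy not in ('replace', 'ignore', 'strict'):
        raise ValueError('strategy must be one of ignore, replace,'
                         ' or strict')

    # Testing for membership is cheaper than translating, so only
    # translate when a control character is present.
    if not _CONTROL_CHARS.isdisjoint(string):
        if strategy == 'replace':
            control_table = _REPLACE_TABLE
        elif strategy == 'ignore':
            control_table = _IGNORE_TABLE
        else:
            raise ControlCharError('control code present in string input')
        string = string.translate(control_table)

    return string
```

Tab, newline, and carriage return (codes 9, 10, and 13) are not in the table, so they pass through unchanged.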
@ -237,9 +295,9 @@ def html_entities_unescape(string):
|
|||
return unicode(entity, "iso-8859-1")
|
||||
return string # leave as is
|
||||
|
||||
if not isinstance(string, unicode):
|
||||
raise TypeError(k.b_('html_entities_unescape must have a unicode type'
|
||||
' for its first argument'))
|
||||
if not isunicodestring(string):
|
||||
raise TypeError('html_entities_unescape must have a unicode type'
|
||||
' for its first argument')
|
||||
return re.sub(_ENTITY_RE, fixup, string)
|
||||
|
||||
def byte_string_valid_xml(byte_string, encoding='utf-8'):
|
||||
|
@ -264,7 +322,7 @@ def byte_string_valid_xml(byte_string, encoding='utf-8'):
|
|||
processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
|
||||
output_xml(processed_array)
|
||||
'''
|
||||
if not isinstance(byte_string, str):
|
||||
if not isbytestring(byte_string):
|
||||
# Not a byte string
|
||||
return False
|
||||
|
||||
|
@ -309,5 +367,5 @@ def byte_string_valid_encoding(byte_string, encoding='utf-8'):
|
|||
return True
|
||||
|
||||
__all__ = ('byte_string_valid_encoding', 'byte_string_valid_xml',
|
||||
'guess_encoding', 'html_entities_unescape', 'process_control_chars',
|
||||
'str_eq')
|
||||
'guess_encoding', 'html_entities_unescape', 'isbasestring',
|
||||
'isbytestring', 'isunicodestring', 'process_control_chars', 'str_eq')
|
|
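The python2 implementations of `isbasestring`, `isbytestring`, and `isunicodestring` in this file test against `basestring`, `str`, and `unicode`. Their docstrings also describe the python3 behaviour, but the python3 code is not part of this diff. The sketch below is an assumption derived from those docstrings only and may differ from the shipped python3 module.

```python
def isbasestring(obj):
    """True for text or byte strings (python3 reading of the docstring)."""
    return isinstance(obj, (str, bytes, bytearray))


def isbytestring(obj):
    """True for bytes or bytearray objects."""
    return isinstance(obj, (bytes, bytearray))


def isunicodestring(obj):
    """True for python3 str objects, which hold unicode text."""
    return isinstance(obj, str)


checks = [(isbasestring(value), isbytestring(value), isunicodestring(value))
          for value in (u'text', b'bytes', bytearray(b'ba'), 42)]
```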
@ -1,6 +1,6 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Copyright (c) 2011 Red Hat, Inc.
|
||||
# Copyright (c) 2012 Red Hat, Inc.
|
||||
# Copyright (c) 2010 Ville Skyttä
|
||||
# Copyright (c) 2009 Tim Lauridsen
|
||||
# Copyright (c) 2007 Marcus Kuhn
|
||||
|
@ -50,9 +50,8 @@ Functions for operating on byte :class:`str` encoded as :term:`UTF-8`
|
|||
'''
|
||||
import warnings
|
||||
|
||||
from kitchen import b_
|
||||
from kitchen.text.converters import to_unicode, to_bytes
|
||||
from kitchen.text.misc import byte_string_valid_encoding
|
||||
from kitchen.text.misc import byte_string_valid_encoding, isunicodestring
|
||||
from kitchen.text.display import _textual_width_le, \
|
||||
byte_string_textual_width_fill, fill, textual_width, \
|
||||
textual_width_chop, wrap
|
||||
|
@ -66,8 +65,8 @@ def utf8_valid(msg):
|
|||
|
||||
Use :func:`kitchen.text.misc.byte_string_valid_encoding` instead.
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8.utf8_valid is deprecated. Use'
|
||||
' kitchen.text.misc.byte_string_valid_encoding(msg) instead'),
|
||||
warnings.warn('kitchen.text.utf8.utf8_valid is deprecated. Use'
|
||||
' kitchen.text.misc.byte_string_valid_encoding(msg) instead',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
return byte_string_valid_encoding(msg)
|
||||
|
||||
|
@@ -76,8 +75,8 @@ def utf8_width(msg):
|
|||
|
||||
Use :func:`kitchen.text.display.textual_width` instead.
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8.utf8_width is deprecated. Use'
|
||||
' kitchen.text.display.textual_width(msg) instead'),
|
||||
warnings.warn('kitchen.text.utf8.utf8_width is deprecated. Use'
|
||||
' kitchen.text.display.textual_width(msg) instead',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
return textual_width(msg)
|
||||
|
||||
|
@@ -98,14 +97,14 @@ def utf8_width_chop(msg, chop=None):
|
|||
>>> (textual_width(msg), to_bytes(textual_width_chop(msg, 5)))
|
||||
(5, 'く ku')
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8.utf8_width_chop is deprecated. Use'
|
||||
' kitchen.text.display.textual_width_chop instead'), DeprecationWarning,
|
||||
warnings.warn('kitchen.text.utf8.utf8_width_chop is deprecated. Use'
|
||||
' kitchen.text.display.textual_width_chop instead', DeprecationWarning,
|
||||
stacklevel=2)
|
||||
|
||||
if chop is None:
|
||||
return textual_width(msg), msg
|
||||
|
||||
as_bytes = not isinstance(msg, unicode)
|
||||
as_bytes = not isunicodestring(msg)
|
||||
|
||||
chopped_msg = textual_width_chop(msg, chop)
|
||||
if as_bytes:
|
||||
|
@@ -117,8 +116,8 @@ def utf8_width_fill(msg, fill, chop=None, left=True, prefix='', suffix=''):
|
|||
|
||||
Use :func:`~kitchen.text.display.byte_string_textual_width_fill` instead
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8.utf8_width_fill is deprecated. Use'
|
||||
' kitchen.text.display.byte_string_textual_width_fill instead'),
|
||||
warnings.warn('kitchen.text.utf8.utf8_width_fill is deprecated. Use'
|
||||
' kitchen.text.display.byte_string_textual_width_fill instead',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
|
||||
return byte_string_textual_width_fill(msg, fill, chop=chop, left=left,
|
||||
|
@@ -130,11 +129,11 @@ def utf8_text_wrap(text, width=70, initial_indent='', subsequent_indent=''):
|
|||
|
||||
Use :func:`kitchen.text.display.wrap` instead
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8.utf8_text_wrap is deprecated. Use'
|
||||
' kitchen.text.display.wrap instead'),
|
||||
warnings.warn('kitchen.text.utf8.utf8_text_wrap is deprecated. Use'
|
||||
' kitchen.text.display.wrap instead',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
|
||||
as_bytes = not isinstance(text, unicode)
|
||||
as_bytes = not isunicodestring(text)
|
||||
|
||||
text = to_unicode(text)
|
||||
lines = wrap(text, width=width, initial_indent=initial_indent,
|
||||
|
@@ -150,8 +149,8 @@ def utf8_text_fill(text, *args, **kwargs):
|
|||
|
||||
Use :func:`kitchen.text.display.fill` instead.
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8.utf8_text_fill is deprecated. Use'
|
||||
' kitchen.text.display.fill instead'),
|
||||
warnings.warn('kitchen.text.utf8.utf8_text_fill is deprecated. Use'
|
||||
' kitchen.text.display.fill instead',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
# This assumes that all args. are utf8.
|
||||
return fill(text, *args, **kwargs)
|
||||
|
@@ -160,8 +159,8 @@ def _utf8_width_le(width, *args):
|
|||
'''**Deprecated** Convert the arguments to unicode and use
|
||||
:func:`kitchen.text.display._textual_width_le` instead.
|
||||
'''
|
||||
warnings.warn(b_('kitchen.text.utf8._utf8_width_le is deprecated. Use'
|
||||
' kitchen.text.display._textual_width_le instead'),
|
||||
warnings.warn('kitchen.text.utf8._utf8_width_le is deprecated. Use'
|
||||
' kitchen.text.display._textual_width_le instead',
|
||||
DeprecationWarning, stacklevel=2)
|
||||
# This assumes that all args. are utf8.
|
||||
return _textual_width_le(width, to_unicode(''.join(args)))
|
|
@@ -89,10 +89,10 @@ def version_tuple_to_string(version_info):
|
|||
if isinstance(values[0], int):
|
||||
ver_components.append('.'.join(itertools.imap(str, values)))
|
||||
else:
|
||||
modifier = values[0]
|
||||
if isinstance(values[0], unicode):
|
||||
modifier = values[0].encode('ascii')
|
||||
else:
|
||||
modifier = values[0]
|
||||
|
||||
if modifier in ('a', 'b', 'c', 'rc'):
|
||||
ver_components.append('%s%s' % (modifier,
|
||||
'.'.join(itertools.imap(str, values[1:])) or '0'))
|
|
@@ -9,6 +9,8 @@ from kitchen.text.converters import to_bytes
|
|||
from kitchen.text import misc
|
||||
|
||||
class UnicodeTestData(object):
|
||||
u_empty_string = u''
|
||||
b_empty_string = ''
|
||||
# This should encode fine -- sanity check
|
||||
u_ascii = u'the quick brown fox jumped over the lazy dog'
|
||||
b_ascii = 'the quick brown fox jumped over the lazy dog'
|
||||
|
@@ -16,7 +18,7 @@ class UnicodeTestData(object):
|
|||
# First challenge -- what happens with latin-1 characters
|
||||
u_spanish = u'El veloz murciélago saltó sobre el perro perezoso.'
|
||||
# utf8 and latin1 both support these chars so no mangling
|
||||
utf8_spanish = u_spanish.encode('utf8')
|
||||
utf8_spanish = u_spanish.encode('utf-8')
|
||||
latin1_spanish = u_spanish.encode('latin1')
|
||||
|
||||
# ASCII does not have the accented characters so it mangles
|
||||
|
@@ -62,7 +64,8 @@ class UnicodeTestData(object):
|
|||
u_entity_escape = u'Test: <"&"> – ' + unicode(u_japanese.encode('ascii', 'xmlcharrefreplace'), 'ascii') + u'é'
|
||||
utf8_entity_escape = 'Test: <"&"> – 速い茶色のキツネが怠惰な犬に\'増é'
|
||||
utf8_attrib_escape = 'Test: <"&"> – 速い茶色のキツネが怠惰な犬に\'増é'
|
||||
ascii_entity_escape = (u'Test: <"&"> – ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace').replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;')
|
||||
ascii_entity_escape = ('Test: <"&"> '.replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;')) + (u'– ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace')
|
||||
ascii_attrib_escape = ('Test: <"&"> '.replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')) + (u'– ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace')
|
||||
|
||||
b_byte_chars = ' '.join(map(chr, range(0, 256)))
|
||||
b_byte_encoded = 'ACABIAIgAyAEIAUgBiAHIAggCSAKIAsgDCANIA4gDyAQIBEgEiATIBQgFSAWIBcgGCAZIBogGyAcIB0gHiAfICAgISAiICMgJCAlICYgJyAoICkgKiArICwgLSAuIC8gMCAxIDIgMyA0IDUgNiA3IDggOSA6IDsgPCA9ID4gPyBAIEEgQiBDIEQgRSBGIEcgSCBJIEogSyBMIE0gTiBPIFAgUSBSIFMgVCBVIFYgVyBYIFkgWiBbIFwgXSBeIF8gYCBhIGIgYyBkIGUgZiBnIGggaSBqIGsgbCBtIG4gbyBwIHEgciBzIHQgdSB2IHcgeCB5IHogeyB8IH0gfiB/IIAggSCCIIMghCCFIIYghyCIIIkgiiCLIIwgjSCOII8gkCCRIJIgkyCUIJUgliCXIJggmSCaIJsgnCCdIJ4gnyCgIKEgoiCjIKQgpSCmIKcgqCCpIKogqyCsIK0griCvILAgsSCyILMgtCC1ILYgtyC4ILkguiC7ILwgvSC+IL8gwCDBIMIgwyDEIMUgxiDHIMggySDKIMsgzCDNIM4gzyDQINEg0iDTINQg1SDWINcg2CDZINog2yDcIN0g3iDfIOAg4SDiIOMg5CDlIOYg5yDoIOkg6iDrIOwg7SDuIO8g8CDxIPIg8yD0IPUg9iD3IPgg+SD6IPsg/CD9IP4g/w=='
|
||||
|
@@ -127,3 +130,48 @@ u' * A powerful unrepr mode for storing basic datatypes']
|
|||
u_ascii_no_ctrl = u''.join([c for c in u_ascii_chars if ord(c) not in misc._CONTROL_CODES])
|
||||
u_ascii_ctrl_replace = u_ascii_chars.translate(dict([(c, u'?') for c in misc._CONTROL_CODES]))
|
||||
utf8_ascii_chars = u_ascii_chars.encode('utf8')
|
||||
|
||||
# These are present in the test catalog as msgids or values
|
||||
u_lemon = u'1 lemon'
|
||||
utf8_lemon = u_lemon.encode('utf-8')
|
||||
latin1_lemon = u_lemon.encode('latin-1')
|
||||
|
||||
u_lemons = u'4 lemons'
|
||||
utf8_lemons = u_lemons.encode('utf-8')
|
||||
latin1_lemons = u_lemons.encode('latin-1')
|
||||
|
||||
u_limao = u'一 limão'
|
||||
utf8_limao = u_limao.encode('utf-8')
|
||||
latin1_limao = u_limao.encode('latin-1', 'replace')
|
||||
|
||||
u_limoes = u'四 limões'
|
||||
utf8_limoes = u_limoes.encode('utf-8')
|
||||
latin1_limoes = u_limoes.encode('latin-1', 'replace')
|
||||
|
||||
u_not_in_catalog = u'café not matched in catalogs'
|
||||
utf8_not_in_catalog = u_not_in_catalog.encode('utf-8')
|
||||
latin1_not_in_catalog = u_not_in_catalog.encode('latin-1')
|
||||
|
||||
u_kitchen = u'kitchen sink'
|
||||
utf8_kitchen = u_kitchen.encode('utf-8')
|
||||
latin1_kitchen = u_kitchen.encode('latin-1')
|
||||
|
||||
u_pt_kitchen = u'pia da cozinha'
|
||||
utf8_pt_kitchen = u_pt_kitchen.encode('utf-8')
|
||||
latin1_pt_kitchen = u_pt_kitchen.encode('latin-1')
|
||||
|
||||
u_kuratomi = u'Kuratomi'
|
||||
utf8_kuratomi = u_kuratomi.encode('utf-8')
|
||||
latin1_kuratomi = u_kuratomi.encode('latin-1')
|
||||
|
||||
u_ja_kuratomi = u'くらとみ'
|
||||
utf8_ja_kuratomi = u_ja_kuratomi.encode('utf-8')
|
||||
latin1_ja_kuratomi = u_ja_kuratomi.encode('latin-1', 'replace')
|
||||
|
||||
u_in_fallback = u'Only café in fallback'
|
||||
utf8_in_fallback = u_in_fallback.encode('utf-8')
|
||||
latin1_in_fallback = u_in_fallback.encode('latin-1')
|
||||
|
||||
u_yes_in_fallback = u'Yes, only café in fallback'
|
||||
utf8_yes_in_fallback = u_yes_in_fallback.encode('utf-8')
|
||||
latin1_yes_in_fallback = u_yes_in_fallback.encode('latin-1')
|
|
@@ -29,6 +29,9 @@ class Test__all__(object):
|
|||
('kitchen', 'i18n', 'to_unicode'),
|
||||
('kitchen', 'i18n', 'ENOENT'),
|
||||
('kitchen', 'i18n', 'byte_string_valid_encoding'),
|
||||
('kitchen', 'i18n', 'isbasestring'),
|
||||
('kitchen', 'i18n', 'partial'),
|
||||
('kitchen', 'iterutils', 'isbasestring'),
|
||||
('kitchen', 'iterutils', 'version_tuple_to_string'),
|
||||
('kitchen', 'pycompat24', 'version_tuple_to_string'),
|
||||
('kitchen', 'pycompat25', 'version_tuple_to_string'),
|
||||
|
@@ -44,6 +47,8 @@ class Test__all__(object):
|
|||
('kitchen.text', 'converters', 'ControlCharError'),
|
||||
('kitchen.text', 'converters', 'guess_encoding'),
|
||||
('kitchen.text', 'converters', 'html_entities_unescape'),
|
||||
('kitchen.text', 'converters', 'isbytestring'),
|
||||
('kitchen.text', 'converters', 'isunicodestring'),
|
||||
('kitchen.text', 'converters', 'process_control_chars'),
|
||||
('kitchen.text', 'converters', 'XmlEncodeError'),
|
||||
('kitchen.text', 'misc', 'b_'),
|
||||
|
@@ -57,6 +62,7 @@ class Test__all__(object):
|
|||
('kitchen.text', 'utf8', 'byte_string_textual_width_fill'),
|
||||
('kitchen.text', 'utf8', 'byte_string_valid_encoding'),
|
||||
('kitchen.text', 'utf8', 'fill'),
|
||||
('kitchen.text', 'utf8', 'isunicodestring'),
|
||||
('kitchen.text', 'utf8', 'textual_width'),
|
||||
('kitchen.text', 'utf8', 'textual_width_chop'),
|
||||
('kitchen.text', 'utf8', 'to_bytes'),
|
|
@@ -1,5 +1,4 @@
|
|||
import unittest
|
||||
from test import test_support
|
||||
from kitchen.pycompat24.base64 import _base64 as base64
|
||||
|
||||
|
||||
|
@@ -183,6 +182,7 @@ class BaseXYTestCase(unittest.TestCase):
|
|||
|
||||
|
||||
|
||||
#from test import test_support
|
||||
#def test_main():
|
||||
# test_support.run_unittest(__name__)
|
||||
#
|
|
@@ -13,16 +13,16 @@ def test_strict_dict_get_set():
|
|||
d = collections.StrictDict()
|
||||
d[u'a'] = 1
|
||||
d['a'] = 2
|
||||
tools.ok_(d[u'a'] != d['a'])
|
||||
tools.ok_(len(d) == 2)
|
||||
tools.assert_not_equal(d[u'a'], d['a'])
|
||||
tools.eq_(len(d), 2)
|
||||
|
||||
d[u'\xf1'] = 1
|
||||
d['\xf1'] = 2
|
||||
d[u'\xf1'.encode('utf8')] = 3
|
||||
tools.ok_(d[u'\xf1'] == 1)
|
||||
tools.ok_(d['\xf1'] == 2)
|
||||
tools.ok_(d[u'\xf1'.encode('utf8')] == 3)
|
||||
tools.ok_(len(d) == 5)
|
||||
d[u'\xf1'.encode('utf-8')] = 3
|
||||
tools.eq_(d[u'\xf1'], 1)
|
||||
tools.eq_(d['\xf1'], 2)
|
||||
tools.eq_(d[u'\xf1'.encode('utf-8')], 3)
|
||||
tools.eq_(len(d), 5)
|
||||
|
||||
class TestStrictDict(unittest.TestCase):
|
||||
def setUp(self):
|
||||
|
@@ -32,15 +32,14 @@ class TestStrictDict(unittest.TestCase):
|
|||
self.d[u'\xf1'] = 1
|
||||
self.d['\xf1'] = 2
|
||||
self.d[u'\xf1'.encode('utf8')] = 3
|
||||
self.keys = [u'a', 'a', u'\xf1', '\xf1', u'\xf1'.encode('utf8')]
|
||||
self.keys = [u'a', 'a', u'\xf1', '\xf1', u'\xf1'.encode('utf-8')]
|
||||
|
||||
def tearDown(self):
|
||||
del(self.d)
|
||||
|
||||
def _compare_lists(self, list1, list2, debug=False):
|
||||
'''We have a mixture of bytes and unicode and need python2.3 compat
|
||||
|
||||
So we have to compare these lists manually and inefficiently
|
||||
'''We have a mixture of bytes and unicode so we have to compare these
|
||||
lists manually and inefficiently
|
||||
'''
|
||||
def _compare_lists_helper(compare_to, dupes, idx, length):
|
||||
if i not in compare_to:
|
||||
|
@@ -57,11 +56,11 @@ class TestStrictDict(unittest.TestCase):
|
|||
|
||||
list1_u = [l for l in list1 if isinstance(l, unicode)]
|
||||
list1_b = [l for l in list1 if isinstance(l, str)]
|
||||
list1_o = [l for l in list1 if not (isinstance(l, unicode) or isinstance(l, str))]
|
||||
list1_o = [l for l in list1 if not (isinstance(l, (unicode, bytes)))]
|
||||
|
||||
list2_u = [l for l in list2 if isinstance(l, unicode)]
|
||||
list2_b = [l for l in list2 if isinstance(l, str)]
|
||||
list2_o = [l for l in list2 if not (isinstance(l, unicode) or isinstance(l, str))]
|
||||
list2_o = [l for l in list2 if not (isinstance(l, (unicode, bytes)))]
|
||||
|
||||
for i in list1:
|
||||
if isinstance(i, unicode):
|
||||
|
@@ -109,34 +108,38 @@ class TestStrictDict(unittest.TestCase):
|
|||
|
||||
def test_strict_dict_len(self):
|
||||
'''StrictDict len'''
|
||||
tools.ok_(len(self.d) == 5)
|
||||
tools.eq_(len(self.d), 5)
|
||||
|
||||
def test_strict_dict_del(self):
|
||||
'''StrictDict del'''
|
||||
tools.ok_(len(self.d) == 5)
|
||||
tools.eq_(len(self.d), 5)
|
||||
del(self.d[u'\xf1'])
|
||||
tools.assert_raises(KeyError, self.d.__getitem__, u'\xf1')
|
||||
tools.ok_(len(self.d) == 4)
|
||||
tools.eq_(len(self.d), 4)
|
||||
|
||||
def test_strict_dict_iter(self):
|
||||
'''StrictDict iteration'''
|
||||
keys = []
|
||||
for k in self.d:
|
||||
keys.append(k)
|
||||
tools.ok_(self._compare_lists(keys, self.keys))
|
||||
tools.ok_(self._compare_lists(keys, self.keys),
|
||||
msg='keys != self.keys: %s != %s' % (keys, self.keys))
|
||||
|
||||
keys = []
|
||||
for k in self.d.iterkeys():
|
||||
keys.append(k)
|
||||
tools.ok_(self._compare_lists(keys, self.keys))
|
||||
tools.ok_(self._compare_lists(keys, self.keys),
|
||||
msg='keys != self.keys: %s != %s' % (keys, self.keys))
|
||||
|
||||
keys = [k for k in self.d]
|
||||
tools.ok_(self._compare_lists(keys, self.keys))
|
||||
tools.ok_(self._compare_lists(keys, self.keys),
|
||||
msg='keys != self.keys: %s != %s' % (keys, self.keys))
|
||||
|
||||
keys = []
|
||||
for k in self.d.keys():
|
||||
keys.append(k)
|
||||
tools.ok_(self._compare_lists(keys, self.keys))
|
||||
tools.ok_(self._compare_lists(keys, self.keys),
|
||||
msg='keys != self.keys: %s != %s' % (keys, self.keys))
|
||||
|
||||
def test_strict_dict_contains(self):
|
||||
'''StrictDict contains function'''
|
415
kitchen2/tests/test_converters.py
Normal file
|
@@ -0,0 +1,415 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
|
||||
import unittest
|
||||
from nose import tools
|
||||
from nose.plugins.skip import SkipTest
|
||||
|
||||
import sys
|
||||
import StringIO
|
||||
import warnings
|
||||
|
||||
try:
|
||||
import chardet
|
||||
except ImportError:
|
||||
chardet = None
|
||||
|
||||
from kitchen.text import converters
|
||||
from kitchen.text.exceptions import XmlEncodeError
|
||||
|
||||
import base_classes
|
||||
|
||||
class UnicodeNoStr(object):
|
||||
def __unicode__(self):
|
||||
return u'El veloz murciélago saltó sobre el perro perezoso.'
|
||||
|
||||
class StrNoUnicode(object):
|
||||
def __str__(self):
|
||||
return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')
|
||||
|
||||
class StrReturnsUnicode(object):
|
||||
def __str__(self):
|
||||
return u'El veloz murciélago saltó sobre el perro perezoso.'
|
||||
|
||||
class UnicodeReturnsStr(object):
|
||||
def __unicode__(self):
|
||||
return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')
|
||||
|
||||
class UnicodeStrCrossed(object):
|
||||
def __unicode__(self):
|
||||
return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')
|
||||
|
||||
def __str__(self):
|
||||
return u'El veloz murciélago saltó sobre el perro perezoso.'
|
||||
|
||||
class ReprUnicode(object):
|
||||
def __repr__(self):
|
||||
return u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'
|
||||
|
||||
class TestConverters(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def test_to_unicode(self):
|
||||
'''Test to_unicode when the user gives good values'''
|
||||
tools.eq_(converters.to_unicode(self.u_japanese, encoding='latin1'), self.u_japanese)
|
||||
|
||||
tools.eq_(converters.to_unicode(self.utf8_spanish), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(self.utf8_japanese), self.u_japanese)
|
||||
|
||||
tools.eq_(converters.to_unicode(self.latin1_spanish, encoding='latin1'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(self.euc_jp_japanese, encoding='euc_jp'), self.u_japanese)
|
||||
|
||||
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'nonstring': 'foo'})
|
||||
|
||||
def test_to_unicode_errors(self):
|
||||
tools.eq_(converters.to_unicode(self.latin1_spanish), self.u_mangled_spanish_latin1_as_utf8)
|
||||
tools.eq_(converters.to_unicode(self.latin1_spanish, errors='ignore'), self.u_spanish_ignore)
|
||||
tools.assert_raises(UnicodeDecodeError, converters.to_unicode,
|
||||
*[self.latin1_spanish], **{'errors': 'strict'})
|
||||
|
||||
def test_to_unicode_nonstring(self):
|
||||
tools.eq_(converters.to_unicode(5), u'5')
|
||||
tools.eq_(converters.to_unicode(5, nonstring='empty'), u'')
|
||||
tools.eq_(converters.to_unicode(5, nonstring='passthru'), 5)
|
||||
tools.eq_(converters.to_unicode(5, nonstring='simplerepr'), u'5')
|
||||
tools.eq_(converters.to_unicode(5, nonstring='repr'), u'5')
|
||||
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'nonstring': 'strict'})
|
||||
|
||||
obj_repr = converters.to_unicode(object, nonstring='simplerepr')
|
||||
tools.eq_(obj_repr, u"<type 'object'>")
|
||||
tools.assert_true(isinstance(obj_repr, unicode))
|
||||
|
||||
def test_to_unicode_nonstring_with_objects_that_have__unicode__and__str__(self):
|
||||
'''Test that to_unicode handles objects that have __unicode__ and __str__ methods'''
|
||||
if sys.version_info < (3, 0):
|
||||
# None of these apply on python3 because python3 does not use __unicode__
|
||||
# and it enforces __str__ returning str
|
||||
tools.eq_(converters.to_unicode(UnicodeNoStr(), nonstring='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(StrNoUnicode(), nonstring='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(UnicodeReturnsStr(), nonstring='simplerepr'), self.u_spanish)
|
||||
|
||||
tools.eq_(converters.to_unicode(StrReturnsUnicode(), nonstring='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(UnicodeStrCrossed(), nonstring='simplerepr'), self.u_spanish)
|
||||
|
||||
def test_to_bytes(self):
|
||||
'''Test to_bytes when the user gives good values'''
|
||||
tools.eq_(converters.to_bytes(self.utf8_japanese, encoding='latin1'), self.utf8_japanese)
|
||||
|
||||
tools.eq_(converters.to_bytes(self.u_spanish), self.utf8_spanish)
|
||||
tools.eq_(converters.to_bytes(self.u_japanese), self.utf8_japanese)
|
||||
|
||||
tools.eq_(converters.to_bytes(self.u_spanish, encoding='latin1'), self.latin1_spanish)
|
||||
tools.eq_(converters.to_bytes(self.u_japanese, encoding='euc_jp'), self.euc_jp_japanese)
|
||||
|
||||
def test_to_bytes_errors(self):
|
||||
tools.eq_(converters.to_bytes(self.u_mixed, encoding='latin1'),
|
||||
self.latin1_mixed_replace)
|
||||
tools.eq_(converters.to_bytes(self.u_mixed, encoding='latin',
|
||||
errors='ignore'), self.latin1_mixed_ignore)
|
||||
tools.assert_raises(UnicodeEncodeError, converters.to_bytes,
|
||||
*[self.u_mixed], **{'errors': 'strict', 'encoding': 'latin1'})
|
||||
|
||||
def _check_repr_bytes(self, repr_string, obj_name):
|
||||
tools.assert_true(isinstance(repr_string, str))
|
||||
match = self.repr_re.match(repr_string)
|
||||
tools.assert_not_equal(match, None)
|
||||
tools.eq_(match.groups()[0], obj_name)
|
||||
|
||||
def test_to_bytes_nonstring(self):
|
||||
tools.eq_(converters.to_bytes(5), '5')
|
||||
tools.eq_(converters.to_bytes(5, nonstring='empty'), '')
|
||||
tools.eq_(converters.to_bytes(5, nonstring='passthru'), 5)
|
||||
tools.eq_(converters.to_bytes(5, nonstring='simplerepr'), '5')
|
||||
tools.eq_(converters.to_bytes(5, nonstring='repr'), '5')
|
||||
|
||||
# Raise a TypeError if the msg is nonstring and we're set to strict
|
||||
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'nonstring': 'strict'})
|
||||
# Raise a TypeError if given an invalid nonstring arg
|
||||
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'nonstring': 'INVALID'})
|
||||
|
||||
obj_repr = converters.to_bytes(object, nonstring='simplerepr')
|
||||
tools.eq_(obj_repr, "<type 'object'>")
|
||||
tools.assert_true(isinstance(obj_repr, str))
|
||||
|
||||
def test_to_bytes_nonstring_with_objects_that_have__unicode__and__str__(self):
|
||||
if sys.version_info < (3, 0):
|
||||
# This object's __str__ returns a utf8 encoded object
|
||||
tools.eq_(converters.to_bytes(StrNoUnicode(), nonstring='simplerepr'), self.utf8_spanish)
|
||||
# No __str__ method so this returns repr
|
||||
string = converters.to_bytes(UnicodeNoStr(), nonstring='simplerepr')
|
||||
self._check_repr_bytes(string, 'UnicodeNoStr')
|
||||
|
||||
# This object's __str__ returns unicode which to_bytes converts to utf8
|
||||
tools.eq_(converters.to_bytes(StrReturnsUnicode(), nonstring='simplerepr'), self.utf8_spanish)
|
||||
# Unless we explicitly ask for something different
|
||||
tools.eq_(converters.to_bytes(StrReturnsUnicode(),
|
||||
nonstring='simplerepr', encoding='latin1'), self.latin1_spanish)
|
||||
|
||||
# This object has no __str__ so it returns repr
|
||||
string = converters.to_bytes(UnicodeReturnsStr(), nonstring='simplerepr')
|
||||
self._check_repr_bytes(string, 'UnicodeReturnsStr')
|
||||
|
||||
# This object's __str__ returns unicode which to_bytes converts to utf8
|
||||
tools.eq_(converters.to_bytes(UnicodeStrCrossed(), nonstring='simplerepr'), self.utf8_spanish)
|
||||
|
||||
# This object's __repr__ returns unicode which to_bytes converts to utf8
|
||||
tools.eq_(converters.to_bytes(ReprUnicode(), nonstring='simplerepr'),
|
||||
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
|
||||
tools.eq_(converters.to_bytes(ReprUnicode(), nonstring='repr'),
|
||||
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
|
||||
|
||||
def test_unicode_to_xml(self):
|
||||
tools.eq_(converters.unicode_to_xml(None), '')
|
||||
tools.assert_raises(XmlEncodeError, converters.unicode_to_xml, *['byte string'])
|
||||
tools.assert_raises(ValueError, converters.unicode_to_xml, *[u'string'], **{'control_chars': 'foo'})
|
||||
tools.assert_raises(XmlEncodeError, converters.unicode_to_xml,
|
||||
*[u'string\u0002'], **{'control_chars': 'strict'})
|
||||
tools.eq_(converters.unicode_to_xml(self.u_entity), self.utf8_entity_escape)
|
||||
tools.eq_(converters.unicode_to_xml(self.u_entity, attrib=True), self.utf8_attrib_escape)
|
||||
tools.eq_(converters.unicode_to_xml(self.u_entity, encoding='ascii'), self.ascii_entity_escape)
|
||||
tools.eq_(converters.unicode_to_xml(self.u_entity, encoding='ascii', attrib=True), self.ascii_attrib_escape)
|
||||
|
||||
def test_xml_to_unicode(self):
|
||||
tools.eq_(converters.xml_to_unicode(self.utf8_entity_escape, 'utf8', 'replace'), self.u_entity)
|
||||
tools.eq_(converters.xml_to_unicode(self.utf8_attrib_escape, 'utf8', 'replace'), self.u_entity)
|
||||
tools.eq_(converters.xml_to_unicode(self.ascii_entity_escape, 'ascii', 'replace'), self.u_entity)
|
||||
tools.eq_(converters.xml_to_unicode(self.ascii_attrib_escape, 'ascii', 'replace'), self.u_entity)
|
||||
|
||||
def test_xml_to_byte_string(self):
|
||||
tools.eq_(converters.xml_to_byte_string(self.utf8_entity_escape, 'utf8', 'replace'), self.u_entity.encode('utf8'))
|
||||
tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape, 'utf8', 'replace'), self.u_entity.encode('utf8'))
|
||||
tools.eq_(converters.xml_to_byte_string(self.ascii_entity_escape, 'ascii', 'replace'), self.u_entity.encode('utf8'))
|
||||
tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape, 'ascii', 'replace'), self.u_entity.encode('utf8'))
|
||||
|
||||
tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape,
|
||||
output_encoding='euc_jp', errors='replace'),
|
||||
self.u_entity.encode('euc_jp', 'replace'))
|
||||
tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape,
|
||||
output_encoding='latin1', errors='replace'),
|
||||
self.u_entity.encode('latin1', 'replace'))
|
||||
tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape,
|
||||
output_encoding='euc_jp', errors='replace'),
|
||||
self.u_entity.encode('euc_jp', 'replace'))
|
||||
tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape,
|
||||
output_encoding='latin1', errors='replace'),
|
||||
self.u_entity.encode('latin1', 'replace'))
|
||||
|
||||
def test_byte_string_to_xml(self):
|
||||
tools.assert_raises(XmlEncodeError, converters.byte_string_to_xml, *[u'test'])
|
||||
tools.eq_(converters.byte_string_to_xml(self.utf8_entity), self.utf8_entity_escape)
|
||||
tools.eq_(converters.byte_string_to_xml(self.utf8_entity, attrib=True), self.utf8_attrib_escape)
|
||||
|
||||
def test_bytes_to_xml(self):
|
||||
tools.eq_(converters.bytes_to_xml(self.b_byte_chars), self.b_byte_encoded)
|
||||
|
||||
def test_xml_to_bytes(self):
|
||||
tools.eq_(converters.xml_to_bytes(self.b_byte_encoded), self.b_byte_chars)
|
||||
|
||||
def test_guess_encoding_to_xml(self):
|
||||
tools.eq_(converters.guess_encoding_to_xml(self.u_entity), self.utf8_entity_escape)
|
||||
tools.eq_(converters.guess_encoding_to_xml(self.utf8_spanish), self.utf8_spanish)
|
||||
tools.eq_(converters.guess_encoding_to_xml(self.latin1_spanish), self.utf8_spanish)
|
||||
tools.eq_(converters.guess_encoding_to_xml(self.utf8_japanese), self.utf8_japanese)
|
||||
|
||||
def test_guess_encoding_to_xml_euc_japanese(self):
|
||||
if chardet:
|
||||
tools.eq_(converters.guess_encoding_to_xml(self.euc_jp_japanese),
|
||||
self.utf8_japanese)
|
||||
else:
|
||||
raise SkipTest('chardet not installed, euc_japanese won\'t be detected')
|
||||
|
||||
def test_guess_encoding_to_xml_euc_japanese_mangled(self):
|
||||
if chardet:
|
||||
raise SkipTest('chardet installed, euc_japanese won\'t be mangled')
|
||||
else:
|
||||
tools.eq_(converters.guess_encoding_to_xml(self.euc_jp_japanese),
|
||||
self.utf8_mangled_euc_jp_as_latin1)
|
||||
|
||||
class TestGetWriter(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.io = StringIO.StringIO()
|
||||
|
||||
def test_utf8_writer(self):
|
||||
writer = converters.getwriter('utf-8')
|
||||
io = writer(self.io)
|
||||
io.write(self.u_japanese + u'\n')
|
||||
io.seek(0)
|
||||
result = io.read().strip()
|
||||
tools.eq_(result, self.utf8_japanese)
|
||||
|
||||
io.seek(0)
|
||||
io.truncate(0)
|
||||
io.write(self.euc_jp_japanese + '\n')
|
||||
io.seek(0)
|
||||
result = io.read().strip()
|
||||
tools.eq_(result, self.euc_jp_japanese)
|
||||
|
||||
io.seek(0)
|
||||
io.truncate(0)
|
||||
io.write(self.utf8_japanese + '\n')
|
||||
io.seek(0)
|
||||
result = io.read().strip()
|
||||
tools.eq_(result, self.utf8_japanese)
|
||||
|
||||
def test_error_handlers(self):
|
||||
'''Test setting alternate error handlers'''
|
||||
writer = converters.getwriter('latin1')
|
||||
io = writer(self.io, errors='strict')
|
||||
tools.assert_raises(UnicodeEncodeError, io.write, self.u_japanese)
|
||||
|
||||
|
||||
class TestExceptionConverters(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.exceptions = {}
|
||||
tests = {'u_jpn': self.u_japanese,
|
||||
'u_spanish': self.u_spanish,
|
||||
'utf8_jpn': self.utf8_japanese,
|
||||
'utf8_spanish': self.utf8_spanish,
|
||||
'euc_jpn': self.euc_jp_japanese,
|
||||
'latin1_spanish': self.latin1_spanish}
|
||||
for test in tests.iteritems():
|
||||
try:
|
||||
raise Exception(test[1])
|
||||
except Exception, self.exceptions[test[0]]:
|
||||
pass
|
||||
|
||||
def test_exception_to_unicode_with_unicode(self):
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['u_jpn']), self.u_japanese)
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['u_spanish']), self.u_spanish)
|
||||
|
||||
def test_exception_to_unicode_with_bytes(self):
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['utf8_jpn']), self.u_japanese)
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['utf8_spanish']), self.u_spanish)
|
||||
# Mangled latin1/utf8 conversion but no tracebacks
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish']), self.u_mangled_spanish_latin1_as_utf8)
|
||||
# Mangled euc_jp/utf8 conversion but no tracebacks
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn']), self.u_mangled_euc_jp_as_utf8)
|
||||
|
||||
def test_exception_to_unicode_custom(self):
|
||||
# If given custom functions, then we should not mangle
|
||||
c = [lambda e: converters.to_unicode(e.args[0], encoding='euc_jp'),
|
||||
lambda e: converters.to_unicode(e, encoding='euc_jp')]
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn'],
|
||||
converters=c), self.u_japanese)
|
||||
c.extend(converters.EXCEPTION_CONVERTERS)
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn'],
|
||||
converters=c), self.u_japanese)
|
||||
|
||||
c = [lambda e: converters.to_unicode(e.args[0], encoding='latin1'),
|
||||
lambda e: converters.to_unicode(e, encoding='latin1')]
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish'],
|
||||
converters=c), self.u_spanish)
|
||||
c.extend(converters.EXCEPTION_CONVERTERS)
|
||||
tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish'],
|
||||
converters=c), self.u_spanish)
|
||||
|
||||
def test_exception_to_bytes_with_unicode(self):
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['u_jpn']), self.utf8_japanese)
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['u_spanish']), self.utf8_spanish)
|
||||
|
||||
def test_exception_to_bytes_with_bytes(self):
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['utf8_jpn']), self.utf8_japanese)
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['utf8_spanish']), self.utf8_spanish)
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish']), self.latin1_spanish)
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn']), self.euc_jp_japanese)
|
||||
|
||||
def test_exception_to_bytes_custom(self):
|
||||
# If given custom functions, then we should not mangle
|
||||
c = [lambda e: converters.to_bytes(e.args[0], encoding='euc_jp'),
|
||||
lambda e: converters.to_bytes(e, encoding='euc_jp')]
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn'],
|
||||
converters=c), self.euc_jp_japanese)
|
||||
c.extend(converters.EXCEPTION_CONVERTERS)
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn'],
|
||||
converters=c), self.euc_jp_japanese)
|
||||
|
||||
c = [lambda e: converters.to_bytes(e.args[0], encoding='latin1'),
|
||||
lambda e: converters.to_bytes(e, encoding='latin1')]
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish'],
|
||||
converters=c), self.latin1_spanish)
|
||||
c.extend(converters.EXCEPTION_CONVERTERS)
|
||||
tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish'],
|
||||
converters=c), self.latin1_spanish)
|
||||
|
||||
|
||||
class TestDeprecatedConverters(TestConverters):
|
||||
def setUp(self):
|
||||
warnings.simplefilter('ignore', DeprecationWarning)
|
||||
|
||||
def tearDown(self):
|
||||
warnings.simplefilter('default', DeprecationWarning)
|
||||
|
||||
def test_to_xml(self):
|
||||
tools.eq_(converters.to_xml(self.u_entity), self.utf8_entity_escape)
|
||||
tools.eq_(converters.to_xml(self.utf8_spanish), self.utf8_spanish)
|
||||
tools.eq_(converters.to_xml(self.latin1_spanish), self.utf8_spanish)
|
||||
tools.eq_(converters.to_xml(self.utf8_japanese), self.utf8_japanese)
|
||||
|
||||
def test_to_utf8(self):
|
||||
tools.eq_(converters.to_utf8(self.u_japanese), self.utf8_japanese)
|
||||
tools.eq_(converters.to_utf8(self.utf8_spanish), self.utf8_spanish)
|
||||
|
||||
def test_to_str(self):
|
||||
tools.eq_(converters.to_str(self.u_japanese), self.utf8_japanese)
|
||||
tools.eq_(converters.to_str(self.utf8_spanish), self.utf8_spanish)
|
||||
tools.eq_(converters.to_str(object), "<type 'object'>")
|
||||
|
||||
def test_non_string(self):
|
||||
'''Test deprecated non_string parameter'''
|
||||
# unicode
|
||||
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'non_string': 'foo'})
|
||||
tools.eq_(converters.to_unicode(5, non_string='empty'), u'')
|
||||
tools.eq_(converters.to_unicode(5, non_string='passthru'), 5)
|
||||
tools.eq_(converters.to_unicode(5, non_string='simplerepr'), u'5')
|
||||
tools.eq_(converters.to_unicode(5, non_string='repr'), u'5')
|
||||
tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'non_string': 'strict'})
|
||||
|
||||
tools.eq_(converters.to_unicode(UnicodeNoStr(), non_string='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(StrNoUnicode(), non_string='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(StrReturnsUnicode(), non_string='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(UnicodeReturnsStr(), non_string='simplerepr'), self.u_spanish)
|
||||
tools.eq_(converters.to_unicode(UnicodeStrCrossed(), non_string='simplerepr'), self.u_spanish)
|
||||
|
||||
obj_repr = converters.to_unicode(object, non_string='simplerepr')
|
||||
tools.eq_(obj_repr, u"<type 'object'>")
|
||||
tools.assert_true(isinstance(obj_repr, unicode))
|
||||
|
||||
# Bytes
|
||||
tools.eq_(converters.to_bytes(5), '5')
|
||||
tools.eq_(converters.to_bytes(5, non_string='empty'), '')
|
||||
tools.eq_(converters.to_bytes(5, non_string='passthru'), 5)
|
||||
tools.eq_(converters.to_bytes(5, non_string='simplerepr'), '5')
|
||||
tools.eq_(converters.to_bytes(5, non_string='repr'), '5')
|
||||
|
||||
# Raise a TypeError if the msg is non_string and we're set to strict
|
||||
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'non_string': 'strict'})
|
||||
# Raise a TypeError if given an invalid non_string arg
|
||||
tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'non_string': 'INVALID'})
|
||||
|
||||
# No __str__ method so this returns repr
|
||||
string = converters.to_bytes(UnicodeNoStr(), non_string='simplerepr')
|
||||
self._check_repr_bytes(string, 'UnicodeNoStr')
|
||||
|
||||
# This object's __str__ returns a utf8 encoded object
|
||||
tools.eq_(converters.to_bytes(StrNoUnicode(), non_string='simplerepr'), self.utf8_spanish)
|
||||
|
||||
# This object's __str__ returns unicode which to_bytes converts to utf8
|
||||
tools.eq_(converters.to_bytes(StrReturnsUnicode(), non_string='simplerepr'), self.utf8_spanish)
|
||||
# Unless we explicitly ask for something different
|
||||
tools.eq_(converters.to_bytes(StrReturnsUnicode(),
|
||||
non_string='simplerepr', encoding='latin1'), self.latin1_spanish)
|
||||
|
||||
# This object has no __str__ so it returns repr
|
||||
string = converters.to_bytes(UnicodeReturnsStr(), non_string='simplerepr')
|
||||
self._check_repr_bytes(string, 'UnicodeReturnsStr')
|
||||
|
||||
# This object's __str__ returns unicode which to_bytes converts to utf8
|
||||
tools.eq_(converters.to_bytes(UnicodeStrCrossed(), non_string='simplerepr'), self.utf8_spanish)
|
||||
|
||||
# This object's __repr__ returns unicode which to_bytes converts to utf8
|
||||
tools.eq_(converters.to_bytes(ReprUnicode(), non_string='simplerepr'),
|
||||
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
|
||||
tools.eq_(converters.to_bytes(ReprUnicode(), non_string='repr'),
|
||||
u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
|
||||
|
||||
obj_repr = converters.to_bytes(object, non_string='simplerepr')
|
||||
tools.eq_(obj_repr, "<type 'object'>")
|
||||
tools.assert_true(isinstance(obj_repr, str))
|
|
@ -4,7 +4,6 @@ import os
|
|||
import copy
|
||||
import tempfile
|
||||
import unittest
|
||||
from test import test_support
|
||||
|
||||
from kitchen.pycompat25.collections._defaultdict import defaultdict
|
||||
|
||||
|
@ -173,6 +172,7 @@ class TestDefaultDict(unittest.TestCase):
|
|||
os.remove(tfn)
|
||||
|
||||
|
||||
#from test import test_support
|
||||
#def test_main():
|
||||
# test_support.run_unittest(TestDefaultDict)
|
||||
#
|
|
@ -5,21 +5,19 @@ from nose import tools
|
|||
|
||||
import sys
|
||||
import warnings
|
||||
from kitchen import i18n
|
||||
from kitchen.text import converters
|
||||
from kitchen.text import utf8
|
||||
|
||||
class TestDeprecated(unittest.TestCase):
|
||||
def setUp(self):
|
||||
registry = sys._getframe(2).f_globals.get('__warningregistry__')
|
||||
if registry:
|
||||
registry.clear()
|
||||
registry = sys._getframe(1).f_globals.get('__warningregistry__')
|
||||
if registry:
|
||||
registry.clear()
|
||||
for module in sys.modules.values():
|
||||
if hasattr(module, '__warningregistry__'):
|
||||
del module.__warningregistry__
|
||||
warnings.simplefilter('error', DeprecationWarning)
|
||||
|
||||
def tearDown(self):
|
||||
warnings.simplefilter('default', DeprecationWarning)
|
||||
warnings.simplefilter('ignore', DeprecationWarning)
|
||||
|
||||
def test_deprecated_functions(self):
|
||||
'''Test that all deprecated functions raise DeprecationWarning'''
|
||||
|
@ -45,3 +43,23 @@ class TestDeprecated(unittest.TestCase):
|
|||
**{'non_string': 'simplerepr'})
|
||||
tools.assert_raises(DeprecationWarning, converters.to_bytes, *[5],
|
||||
**{'nonstring': 'simplerepr', 'non_string': 'simplerepr'})
|
||||
|
||||
|
||||
class TestPendingDeprecationParameters(unittest.TestCase):
|
||||
def setUp(self):
|
||||
for module in sys.modules.values():
|
||||
if hasattr(module, '__warningregistry__'):
|
||||
del module.__warningregistry__
|
||||
warnings.simplefilter('error', PendingDeprecationWarning)
|
||||
|
||||
def tearDown(self):
|
||||
warnings.simplefilter('ignore', PendingDeprecationWarning)
|
||||
|
||||
def test_parameters(self):
|
||||
# test that we warn when using the python2_api parameters
|
||||
tools.assert_raises(PendingDeprecationWarning,
|
||||
i18n.get_translation_object, 'test', **{'python2_api': True})
|
||||
tools.assert_raises(PendingDeprecationWarning,
|
||||
i18n.DummyTranslations, **{'python2_api': True})
|
||||
|
||||
|
820
kitchen2/tests/test_i18n.py
Normal file
|
@ -0,0 +1,820 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
import unittest
|
||||
from nose import tools
|
||||
|
||||
import os
|
||||
import types
|
||||
|
||||
from kitchen import i18n
|
||||
|
||||
import base_classes
|
||||
|
||||
class TestI18N_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_easy_gettext_setup(self):
|
||||
'''Test that the easy_gettext_setup function works
|
||||
'''
|
||||
_, N_ = i18n.easy_gettext_setup('foo', localedirs=
|
||||
['%s/data/locale/' % os.path.dirname(__file__)])
|
||||
tools.assert_true(isinstance(_, types.MethodType))
|
||||
tools.assert_true(isinstance(N_, types.MethodType))
|
||||
tools.eq_(_.__name__, '_ugettext')
|
||||
tools.eq_(N_.__name__, '_ungettext')
|
||||
|
||||
tools.eq_(_(self.utf8_spanish), self.u_spanish)
|
||||
tools.eq_(_(self.u_spanish), self.u_spanish)
|
||||
tools.eq_(N_(self.utf8_limao, self.utf8_limoes, 1), self.u_limao)
|
||||
tools.eq_(N_(self.utf8_limao, self.utf8_limoes, 2), self.u_limoes)
|
||||
tools.eq_(N_(self.u_limao, self.u_limoes, 1), self.u_limao)
|
||||
tools.eq_(N_(self.u_limao, self.u_limoes, 2), self.u_limoes)
|
||||
|
||||
def test_easy_gettext_setup_non_unicode(self):
|
||||
'''Test that the easy_gettext_setup function works with use_unicode=False
|
||||
'''
|
||||
b_, bN_ = i18n.easy_gettext_setup('foo', localedirs=
|
||||
['%s/data/locale/' % os.path.dirname(__file__)],
|
||||
use_unicode=False)
|
||||
tools.assert_true(isinstance(b_, types.MethodType))
|
||||
tools.assert_true(isinstance(bN_, types.MethodType))
|
||||
tools.eq_(b_.__name__, '_lgettext')
|
||||
tools.eq_(bN_.__name__, '_lngettext')
|
||||
|
||||
tools.eq_(b_(self.utf8_spanish), self.utf8_spanish)
|
||||
tools.eq_(b_(self.u_spanish), self.utf8_spanish)
|
||||
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_limao)
|
||||
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_limoes)
|
||||
tools.eq_(bN_(self.u_limao, self.u_limoes, 1), self.utf8_limao)
|
||||
tools.eq_(bN_(self.u_limao, self.u_limoes, 2), self.utf8_limoes)
|
||||
|
||||
def test_get_translation_object(self):
|
||||
'''Test that the get_translation_object function works
|
||||
'''
|
||||
translations = i18n.get_translation_object('foo', ['%s/data/locale/' % os.path.dirname(__file__)])
|
||||
tools.eq_(translations.__class__, i18n.DummyTranslations)
|
||||
tools.assert_raises(IOError, i18n.get_translation_object, 'foo', ['%s/data/locale/' % os.path.dirname(__file__)], fallback=False)
|
||||
|
||||
translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
|
||||
tools.eq_(translations.__class__, i18n.NewGNUTranslations)
|
||||
|
||||
def test_get_translation_object_create_fallback(self):
|
||||
'''Test get_translation_object creates fallbacks for additional catalogs'''
|
||||
translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)])
|
||||
tools.eq_(translations.__class__, i18n.NewGNUTranslations)
|
||||
tools.eq_(translations._fallback.__class__, i18n.NewGNUTranslations)
|
||||
|
||||
def test_get_translation_object_copy(self):
|
||||
'''Test get_translation_object shallow copies the message catalog'''
|
||||
translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8')
|
||||
translations.input_charset = 'utf-8'
|
||||
translations2 = i18n.get_translation_object('test',
|
||||
['%s/data/locale' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='latin-1')
|
||||
translations2.input_charset = 'latin-1'
|
||||
|
||||
# Test that portions of the translation objects are the same and other
|
||||
# portions are different (which is a space optimization so that the
|
||||
# translation data isn't in memory multiple times)
|
||||
tools.assert_not_equal(id(translations._fallback), id(translations2._fallback))
|
||||
tools.assert_not_equal(id(translations.output_charset()), id(translations2.output_charset()))
|
||||
tools.assert_not_equal(id(translations.input_charset), id(translations2.input_charset))
|
||||
tools.assert_not_equal(id(translations.input_charset), id(translations2.input_charset))
|
||||
tools.eq_(id(translations._catalog), id(translations2._catalog))
|
||||
|
||||
def test_get_translation_object_optional_params(self):
|
||||
'''Smoketest leaving out optional parameters'''
|
||||
translations = i18n.get_translation_object('test')
|
||||
tools.assert_true(translations.__class__ in (i18n.NewGNUTranslations, i18n.DummyTranslations))
|
||||
|
||||
def test_get_translation_object_python2_api_default(self):
|
||||
'''Smoketest that python2_api default value yields the python2 functions'''
|
||||
# Default
|
||||
translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8')
|
||||
translations.input_charset = 'utf-8'
|
||||
tools.eq_(translations.gettext.__name__, '_gettext')
|
||||
tools.eq_(translations.lgettext.__name__, '_lgettext')
|
||||
tools.eq_(translations.ugettext.__name__, '_ugettext')
|
||||
tools.eq_(translations.ngettext.__name__, '_ngettext')
|
||||
tools.eq_(translations.lngettext.__name__, '_lngettext')
|
||||
tools.eq_(translations.ungettext.__name__, '_ungettext')
|
||||
|
||||
def test_get_translation_object_python2_api_true(self):
|
||||
'''Smoketest that setting python2_api true yields the python2 functions'''
|
||||
# Explicitly request the python2 API
|
||||
translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8',
|
||||
python2_api=True)
|
||||
translations.input_charset = 'utf-8'
|
||||
tools.eq_(translations.gettext.__name__, '_gettext')
|
||||
tools.eq_(translations.lgettext.__name__, '_lgettext')
|
||||
tools.eq_(translations.ugettext.__name__, '_ugettext')
|
||||
tools.eq_(translations.ngettext.__name__, '_ngettext')
|
||||
tools.eq_(translations.lngettext.__name__, '_lngettext')
|
||||
tools.eq_(translations.ungettext.__name__, '_ungettext')
|
||||
|
||||
def test_get_translation_object_python2_api_false(self):
|
||||
'''Smoketest that setting python2_api false yields the python3 functions'''
|
||||
# Explicitly request the python3 API
|
||||
translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8',
|
||||
python2_api=False)
|
||||
translations.input_charset = 'utf-8'
|
||||
tools.eq_(translations.gettext.__name__, '_ugettext')
|
||||
tools.eq_(translations.lgettext.__name__, '_lgettext')
|
||||
tools.eq_(translations.ngettext.__name__, '_ungettext')
|
||||
tools.eq_(translations.lngettext.__name__, '_lngettext')
|
||||
|
||||
tools.assert_raises(AttributeError, translations.ugettext, 'message')
|
||||
tools.assert_raises(AttributeError, translations.ungettext, 'message1', 'message2')
|
||||
|
||||
def test_dummy_translation(self):
|
||||
'''Test that we can create a DummyTranslation object
|
||||
'''
|
||||
tools.assert_true(isinstance(i18n.DummyTranslations(), i18n.DummyTranslations))
|
||||
|
||||
# Note: Using nose's generator tests for this so we can't subclass
|
||||
# unittest.TestCase
|
||||
class TestDummyTranslations(base_classes.UnicodeTestData):
|
||||
def __init__(self):
|
||||
self.test_data = {'bytes': (( # First set is with default charset (utf8)
|
||||
(self.u_ascii, self.b_ascii),
|
||||
(self.u_spanish, self.utf8_spanish),
|
||||
(self.u_japanese, self.utf8_japanese),
|
||||
(self.b_ascii, self.b_ascii),
|
||||
(self.utf8_spanish, self.utf8_spanish),
|
||||
(self.latin1_spanish, self.utf8_mangled_spanish_latin1_as_utf8),
|
||||
(self.utf8_japanese, self.utf8_japanese),
|
||||
),
|
||||
( # Second set is with output_charset of latin1 (ISO-8859-1)
|
||||
(self.u_ascii, self.b_ascii),
|
||||
(self.u_spanish, self.latin1_spanish),
|
||||
(self.u_japanese, self.latin1_mangled_japanese_replace_as_latin1),
|
||||
(self.b_ascii, self.b_ascii),
|
||||
(self.utf8_spanish, self.utf8_spanish),
|
||||
(self.latin1_spanish, self.latin1_spanish),
|
||||
(self.utf8_japanese, self.utf8_japanese),
|
||||
),
|
||||
( # Third set is with output_charset of C
|
||||
(self.u_ascii, self.b_ascii),
|
||||
(self.u_spanish, self.ascii_mangled_spanish_as_ascii),
|
||||
(self.u_japanese, self.ascii_mangled_japanese_replace_as_latin1),
|
||||
(self.b_ascii, self.b_ascii),
|
||||
(self.utf8_spanish, self.ascii_mangled_spanish_as_ascii),
|
||||
(self.latin1_spanish, self.ascii_twice_mangled_spanish_latin1_as_utf8_as_ascii),
|
||||
(self.utf8_japanese, self.ascii_mangled_japanese_replace_as_latin1),
|
||||
),
|
||||
),
|
||||
'unicode': (( # First set is with the default charset (utf8)
|
||||
(self.u_ascii, self.u_ascii),
|
||||
(self.u_spanish, self.u_spanish),
|
||||
(self.u_japanese, self.u_japanese),
|
||||
(self.b_ascii, self.u_ascii),
|
||||
(self.utf8_spanish, self.u_spanish),
|
||||
(self.latin1_spanish, self.u_mangled_spanish_latin1_as_utf8), # String is mangled but no exception
|
||||
(self.utf8_japanese, self.u_japanese),
|
||||
),
|
||||
( # Second set is with input_charset of latin1 (ISO-8859-1)
|
||||
(self.u_ascii, self.u_ascii),
|
||||
(self.u_spanish, self.u_spanish),
|
||||
(self.u_japanese, self.u_japanese),
|
||||
(self.b_ascii, self.u_ascii),
|
||||
(self.utf8_spanish, self.u_mangled_spanish_utf8_as_latin1), # String mangled but no exception
|
||||
(self.latin1_spanish, self.u_spanish),
|
||||
(self.utf8_japanese, self.u_mangled_japanese_utf8_as_latin1), # String mangled but no exception
|
||||
),
|
||||
( # Third set is with input_charset of ascii
|
||||
(self.u_ascii, self.u_ascii),
|
||||
(self.u_spanish, self.u_spanish),
|
||||
(self.u_japanese, self.u_japanese),
|
||||
(self.b_ascii, self.u_ascii),
|
||||
(self.utf8_spanish, self.u_mangled_spanish_utf8_as_ascii), # String mangled but no exception
|
||||
(self.latin1_spanish, self.u_mangled_spanish_latin1_as_ascii), # String mangled but no exception
|
||||
(self.utf8_japanese, self.u_mangled_japanese_utf8_as_ascii), # String mangled but no exception
|
||||
),
|
||||
)
|
||||
}
|
||||
|
||||
def setUp(self):
|
||||
self.translations = i18n.DummyTranslations()
|
||||
|
||||
def check_gettext(self, message, value, charset=None):
|
||||
self.translations.set_output_charset(charset)
|
||||
tools.eq_(self.translations.gettext(message), value,
|
||||
msg='gettext(%s): trans: %s != val: %s (charset=%s)'
|
||||
% (repr(message), repr(self.translations.gettext(message)),
|
||||
repr(value), charset))
|
||||
|
||||
def check_lgettext(self, message, value, charset=None,
|
||||
locale='en_US.UTF-8'):
|
||||
os.environ['LC_ALL'] = locale
|
||||
self.translations.set_output_charset(charset)
|
||||
tools.eq_(self.translations.lgettext(message), value,
|
||||
msg='lgettext(%s): trans: %s != val: %s (charset=%s, locale=%s)'
|
||||
% (repr(message), repr(self.translations.lgettext(message)),
|
||||
repr(value), charset, locale))
|
||||
|
||||
# Note: charset has a default value because nose isn't invoking setUp and
|
||||
# tearDown each time check_* is run.
|
||||
def check_ugettext(self, message, value, charset='utf-8'):
|
||||
'''ugettext method with default values'''
|
||||
self.translations.input_charset = charset
|
||||
tools.eq_(self.translations.ugettext(message), value,
|
||||
msg='ugettext(%s): trans: %s != val: %s (charset=%s)'
|
||||
% (repr(message), repr(self.translations.ugettext(message)),
|
||||
repr(value), charset))
|
||||
|
||||
def check_ngettext(self, message, value, charset=None):
|
||||
self.translations.set_output_charset(charset)
|
||||
tools.eq_(self.translations.ngettext(message, 'blank', 1), value)
|
||||
tools.eq_(self.translations.ngettext('blank', message, 2), value)
|
||||
tools.assert_not_equal(self.translations.ngettext(message, 'blank', 2), value)
|
||||
tools.assert_not_equal(self.translations.ngettext('blank', message, 1), value)
|
||||
|
||||
def check_lngettext(self, message, value, charset=None, locale='en_US.UTF-8'):
|
||||
os.environ['LC_ALL'] = locale
|
||||
self.translations.set_output_charset(charset)
|
||||
tools.eq_(self.translations.lngettext(message, 'blank', 1), value,
|
||||
msg='lngettext(%s, "blank", 1): trans: %s != val: %s (charset=%s, locale=%s)'
|
||||
% (repr(message), repr(self.translations.lngettext(message,
|
||||
'blank', 1)), repr(value), charset, locale))
|
||||
tools.eq_(self.translations.lngettext('blank', message, 2), value,
|
||||
msg='lngettext("blank", %s, 2): trans: %s != val: %s (charset=%s, locale=%s)'
|
||||
% (repr(message), repr(self.translations.lngettext('blank',
|
||||
message, 2)), repr(value), charset, locale))
|
||||
tools.assert_not_equal(self.translations.lngettext(message, 'blank', 2), value,
|
||||
msg='lngettext(%s, "blank", 2): trans: %s, val: %s (charset=%s, locale=%s)'
|
||||
% (repr(message), repr(self.translations.lngettext(message,
|
||||
'blank', 2)), repr(value), charset, locale))
|
||||
tools.assert_not_equal(self.translations.lngettext('blank', message, 1), value,
|
||||
msg='lngettext("blank", %s, 1): trans: %s != val: %s (charset=%s, locale=%s)'
|
||||
% (repr(message), repr(self.translations.lngettext('blank',
|
||||
message, 1)), repr(value), charset, locale))
|
||||
|
||||
# Note: charset has a default value because nose isn't invoking setUp and
|
||||
# tearDown each time check_* is run.
|
||||
def check_ungettext(self, message, value, charset='utf-8'):
|
||||
self.translations.input_charset = charset
|
||||
tools.eq_(self.translations.ungettext(message, 'blank', 1), value)
|
||||
tools.eq_(self.translations.ungettext('blank', message, 2), value)
|
||||
tools.assert_not_equal(self.translations.ungettext(message, 'blank', 2), value)
|
||||
tools.assert_not_equal(self.translations.ungettext('blank', message, 1), value)
|
||||
|
||||
def test_gettext(self):
|
||||
'''gettext method with default values'''
|
||||
for message, value in self.test_data['bytes'][0]:
|
||||
yield self.check_gettext, message, value
|
||||
|
||||
def test_gettext_output_charset(self):
|
||||
'''gettext method after output_charset is set'''
|
||||
for message, value in self.test_data['bytes'][1]:
|
||||
yield self.check_gettext, message, value, 'latin1'
|
||||
|
||||
def test_ngettext(self):
|
||||
for message, value in self.test_data['bytes'][0]:
|
||||
yield self.check_ngettext, message, value
|
||||
|
||||
def test_ngettext_output_charset(self):
|
||||
for message, value in self.test_data['bytes'][1]:
|
||||
yield self.check_ngettext, message, value, 'latin1'
|
||||
|
||||
def test_lgettext(self):
|
||||
'''lgettext method with default values on a utf8 locale'''
|
||||
for message, value in self.test_data['bytes'][0]:
|
||||
yield self.check_lgettext, message, value
|
||||
|
||||
def test_lgettext_output_charset(self):
|
||||
'''lgettext method after output_charset is set'''
|
||||
for message, value in self.test_data['bytes'][1]:
|
||||
yield self.check_lgettext, message, value, 'latin1'
|
||||
|
||||
def test_lgettext_output_charset_and_locale(self):
|
||||
'''lgettext method after output_charset is set in C locale
|
||||
|
||||
output_charset should take precedence
|
||||
'''
|
||||
for message, value in self.test_data['bytes'][1]:
|
||||
yield self.check_lgettext, message, value, 'latin1', 'C'
|
||||
|
||||
def test_lgettext_locale_C(self):
|
||||
'''lgettext method in a C locale'''
|
||||
for message, value in self.test_data['bytes'][2]:
|
||||
yield self.check_lgettext, message, value, None, 'C'
|
||||
|
||||
def test_lngettext(self):
|
||||
'''lngettext method with default values on a utf8 locale'''
|
||||
for message, value in self.test_data['bytes'][0]:
|
||||
yield self.check_lngettext, message, value
|
||||
|
||||
def test_lngettext_output_charset(self):
|
||||
'''lngettext method after output_charset is set'''
|
||||
for message, value in self.test_data['bytes'][1]:
|
||||
yield self.check_lngettext, message, value, 'latin1'
|
||||
|
||||
def test_lngettext_output_charset_and_locale(self):
|
||||
'''lngettext method after output_charset is set in C locale
|
||||
|
||||
output_charset should take precedence
|
||||
'''
|
||||
for message, value in self.test_data['bytes'][1]:
|
||||
yield self.check_lngettext, message, value, 'latin1', 'C'
|
||||
|
||||
def test_lngettext_locale_C(self):
|
||||
'''lngettext method in a C locale'''
|
||||
for message, value in self.test_data['bytes'][2]:
|
||||
yield self.check_lngettext, message, value, None, 'C'
|
||||
|
||||
def test_ugettext(self):
|
||||
for message, value in self.test_data['unicode'][0]:
|
||||
yield self.check_ugettext, message, value
|
||||
|
||||
def test_ugettext_charset_latin1(self):
|
||||
for message, value in self.test_data['unicode'][1]:
|
||||
yield self.check_ugettext, message, value, 'latin1'
|
||||
|
||||
def test_ugettext_charset_ascii(self):
|
||||
for message, value in self.test_data['unicode'][2]:
|
||||
yield self.check_ugettext, message, value, 'ascii'
|
||||
|
||||
def test_ungettext(self):
|
||||
for message, value in self.test_data['unicode'][0]:
|
||||
yield self.check_ungettext, message, value
|
||||
|
||||
def test_ungettext_charset_latin1(self):
|
||||
for message, value in self.test_data['unicode'][1]:
|
||||
yield self.check_ungettext, message, value, 'latin1'
|
||||
|
||||
def test_ungettext_charset_ascii(self):
|
||||
for message, value in self.test_data['unicode'][2]:
|
||||
yield self.check_ungettext, message, value, 'ascii'
|
||||
|
||||
def test_nonbasestring(self):
|
||||
tools.eq_(self.translations.gettext(dict(hi='there')), self.b_empty_string)
|
||||
tools.eq_(self.translations.ngettext(dict(hi='there'), dict(hi='two'), 1), self.b_empty_string)
|
||||
tools.eq_(self.translations.lgettext(dict(hi='there')), self.b_empty_string)
|
||||
tools.eq_(self.translations.lngettext(dict(hi='there'), dict(hi='two'), 1), self.b_empty_string)
|
||||
tools.eq_(self.translations.ugettext(dict(hi='there')), self.u_empty_string)
|
||||
tools.eq_(self.translations.ungettext(dict(hi='there'), dict(hi='two'), 1), self.u_empty_string)
|
||||
|
||||
|
||||
class TestI18N_Latin1(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.iso88591'
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_easy_gettext_setup_non_unicode(self):
|
||||
'''Test that the easy_gettext_setup function works with use_unicode=False in a latin1 locale
|
||||
'''
|
||||
b_, bN_ = i18n.easy_gettext_setup('foo', localedirs=
|
||||
['%s/data/locale/' % os.path.dirname(__file__)],
|
||||
use_unicode=False)
|
||||
|
||||
tools.eq_(b_(self.utf8_spanish), self.utf8_spanish)
|
||||
tools.eq_(b_(self.u_spanish), self.latin1_spanish)
|
||||
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_limao)
|
||||
tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_limoes)
|
||||
tools.eq_(bN_(self.u_limao, self.u_limoes, 1), self.latin1_limao)
|
||||
tools.eq_(bN_(self.u_limao, self.u_limoes, 2), self.latin1_limoes)
|
||||
|
||||
|
||||
class TestNewGNUTranslationsNoMatch(TestDummyTranslations):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
|
||||
class TestNewGNURealTranslations_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_gettext(self):
|
||||
_ = self.translations.gettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||
# This is not translated to utf8_yes_in_fallback because this test is
|
||||
# without the fallback message catalog
|
||||
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||
# This is not translated to utf8_yes_in_fallback because this test is
|
||||
# without the fallback message catalog
|
||||
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
def test_ngettext(self):
|
||||
_ = self.translations.ngettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
|
||||
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
|
||||
|
||||
def test_lgettext(self):
|
||||
_ = self.translations.lgettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||
# This is not translated to utf8_yes_in_fallback because this test is
|
||||
# without the fallback message catalog
|
||||
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||
# This is not translated to utf8_yes_in_fallback because this test is
|
||||
# without the fallback message catalog
|
||||
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
def test_lngettext(self):
|
||||
_ = self.translations.lngettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
|
||||
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
|
||||
|
||||
def test_ugettext(self):
|
||||
_ = self.translations.ugettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.u_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.u_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.u_ja_kuratomi)
|
||||
# This is not translated to utf8_yes_in_fallback because this test is
|
||||
# without the fallback message catalog
|
||||
tools.eq_(_(self.utf8_in_fallback), self.u_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.u_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.u_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.u_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.u_ja_kuratomi)
|
||||
# This is not translated to utf8_yes_in_fallback because this test is
|
||||
# without the fallback message catalog
|
||||
tools.eq_(_(self.u_in_fallback), self.u_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.u_not_in_catalog)
|
||||
|
||||
def test_ungettext(self):
|
||||
_ = self.translations.ungettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.u_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.u_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.u_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.u_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.u_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.u_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.u_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.u_lemons)
|
||||
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
|
||||
|
||||
|
||||
class TestNewGNURealTranslations_Latin1(TestNewGNURealTranslations_UTF8):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.iso88591'
|
||||
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_lgettext(self):
|
||||
_ = self.translations.lgettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.latin1_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.latin1_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.latin1_ja_kuratomi)
|
||||
# Neither of the following two tests encode to proper latin-1 because:
|
||||
# any byte is valid in latin-1 so there's no way to know that what
|
||||
# we're given in the string is really utf-8
|
||||
#
|
||||
# This is not translated to latin1_yes_in_fallback because this test
|
||||
# is without the fallback message catalog
|
||||
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.latin1_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.latin1_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.latin1_ja_kuratomi)
|
||||
# This is not translated to latin1_yes_in_fallback because this test
|
||||
# is without the fallback message catalog
|
||||
tools.eq_(_(self.u_in_fallback), self.latin1_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.latin1_not_in_catalog)
|
||||
|
||||
def test_lngettext(self):
|
||||
_ = self.translations.lngettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.latin1_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.latin1_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.latin1_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.latin1_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.latin1_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.latin1_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.latin1_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.latin1_lemons)
|
||||
|
||||
# This unfortunately does not encode to proper latin-1 because:
|
||||
# any byte is valid in latin-1 so there's no way to know that what
|
||||
# we're given in the string is really utf-8
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.latin1_not_in_catalog)
|
||||
|
||||
|
||||
class TestFallbackNewGNUTranslationsNoMatch(TestDummyTranslations):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||
self.translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale/' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)])
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
|
||||
class TestFallbackNewGNURealTranslations_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||
self.translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale/' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)])
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_gettext(self):
|
||||
_ = self.translations.gettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||
tools.eq_(_(self.utf8_in_fallback), self.utf8_yes_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||
tools.eq_(_(self.u_in_fallback), self.utf8_yes_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
def test_ngettext(self):
|
||||
_ = self.translations.ngettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
|
||||
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
|
||||
def test_lgettext(self):
|
||||
_ = self.translations.lgettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||
tools.eq_(_(self.utf8_in_fallback), self.utf8_yes_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||
tools.eq_(_(self.u_in_fallback), self.utf8_yes_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
def test_lngettext(self):
|
||||
_ = self.translations.lngettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
|
||||
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
|
||||
def test_ugettext(self):
|
||||
_ = self.translations.ugettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.u_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.u_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.u_ja_kuratomi)
|
||||
tools.eq_(_(self.utf8_in_fallback), self.u_yes_in_fallback)
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.u_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.u_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.u_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.u_ja_kuratomi)
|
||||
tools.eq_(_(self.u_in_fallback), self.u_yes_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.u_not_in_catalog)
|
||||
|
||||
def test_ungettext(self):
|
||||
_ = self.translations.ungettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.u_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.u_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.u_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.u_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.u_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.u_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.u_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.u_lemons)
|
||||
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
|
||||
|
||||
|
||||
class TestFallbackNewGNURealTranslations_Latin1(TestFallbackNewGNURealTranslations_UTF8):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.iso88591'
|
||||
self.translations = i18n.get_translation_object('test',
|
||||
['%s/data/locale/' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)])
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_lgettext(self):
|
||||
_ = self.translations.lgettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.latin1_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.latin1_kuratomi)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.latin1_ja_kuratomi)
|
||||
tools.eq_(_(self.utf8_in_fallback), self.latin1_yes_in_fallback)
|
||||
# This unfortunately does not encode to proper latin-1 because:
|
||||
# any byte is valid in latin-1 so there's no way to know that what
|
||||
# we're given in the string is really utf-8
|
||||
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.latin1_pt_kitchen)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.latin1_kuratomi)
|
||||
tools.eq_(_(self.u_kuratomi), self.latin1_ja_kuratomi)
|
||||
tools.eq_(_(self.u_in_fallback), self.latin1_yes_in_fallback)
|
||||
tools.eq_(_(self.u_not_in_catalog), self.latin1_not_in_catalog)
|
||||
|
||||
def test_lngettext(self):
|
||||
_ = self.translations.lngettext
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.latin1_limao)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.latin1_lemon)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.latin1_limao)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.latin1_lemon)
|
||||
|
||||
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.latin1_limoes)
|
||||
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.latin1_lemons)
|
||||
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.latin1_limoes)
|
||||
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.latin1_lemons)
|
||||
|
||||
# This unfortunately does not encode to proper latin-1 because:
|
||||
# any byte is valid in latin-1 so there's no way to know that what
|
||||
# we're given in the string is really utf-8
|
||||
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.latin1_not_in_catalog)
|
||||
|
||||
|
||||
class TestFallback(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.iso88591'
|
||||
self.gtranslations = i18n.get_translation_object('test',
|
||||
['%s/data/locale/' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)])
|
||||
self.gtranslations.add_fallback(object())
|
||||
self.dtranslations = i18n.get_translation_object('nonexistent',
|
||||
['%s/data/locale/' % os.path.dirname(__file__),
|
||||
'%s/data/locale-old' % os.path.dirname(__file__)])
|
||||
self.dtranslations.add_fallback(object())
|
||||
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
|
||||
def test_invalid_fallback_no_raise(self):
|
||||
'''Test when we have an invalid fallback that it does not raise.'''
|
||||
tools.eq_(self.gtranslations.gettext(self.u_spanish), self.utf8_spanish)
|
||||
tools.eq_(self.gtranslations.ugettext(self.u_spanish), self.u_spanish)
|
||||
tools.eq_(self.gtranslations.lgettext(self.u_spanish), self.latin1_spanish)
|
||||
|
||||
tools.eq_(self.gtranslations.ngettext(self.u_spanish, 'cde', 1), self.utf8_spanish)
|
||||
tools.eq_(self.gtranslations.ungettext(self.u_spanish, 'cde', 1), self.u_spanish)
|
||||
tools.eq_(self.gtranslations.lngettext(self.u_spanish, 'cde', 1), self.latin1_spanish)
|
||||
|
||||
tools.eq_(self.dtranslations.gettext(self.u_spanish), self.utf8_spanish)
|
||||
tools.eq_(self.dtranslations.ugettext(self.u_spanish), self.u_spanish)
|
||||
tools.eq_(self.dtranslations.lgettext(self.u_spanish), self.latin1_spanish)
|
||||
|
||||
tools.eq_(self.dtranslations.ngettext(self.u_spanish, 'cde', 1), self.utf8_spanish)
|
||||
tools.eq_(self.dtranslations.ungettext(self.u_spanish, 'cde', 1), self.u_spanish)
|
||||
tools.eq_(self.dtranslations.lngettext(self.u_spanish, 'cde', 1), self.latin1_spanish)
|
||||
|
||||
|
||||
class TestDefaultLocaleDir(unittest.TestCase, base_classes.UnicodeTestData):
|
||||
def setUp(self):
|
||||
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||
self.old_DEFAULT_LOCALEDIRS = i18n._DEFAULT_LOCALEDIR
|
||||
i18n._DEFAULT_LOCALEDIR = '%s/data/locale/' % os.path.dirname(__file__)
|
||||
self.translations = i18n.get_translation_object('test')
|
||||
|
||||
def tearDown(self):
|
||||
if self.old_LC_ALL:
|
||||
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||
else:
|
||||
del(os.environ['LC_ALL'])
|
||||
if self.old_DEFAULT_LOCALEDIRS:
|
||||
i18n._DEFAULT_LOCALEDIR = self.old_DEFAULT_LOCALEDIRS
|
||||
|
||||
def test_gettext(self):
|
||||
_ = self.translations.gettext
|
||||
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||
# Returns msgid because the string is in a fallback catalog which we
|
||||
# haven't set up
|
||||
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
|
||||
|
||||
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||
# Returns msgid because the string is in a fallback catalog which we
|
||||
# haven't set up
|
||||
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
|
||||
|
||||
|
|
@@ -5,7 +5,7 @@ from nose import tools
|
|||
|
||||
from kitchen import iterutils
|
||||
|
||||
class TestStrictDict(unittest.TestCase):
|
||||
class TestIterutils(unittest.TestCase):
|
||||
iterable_data = (
|
||||
[0, 1, 2],
|
||||
[],
|
||||
|
@@ -40,6 +40,9 @@ class TestStrictDict(unittest.TestCase):
|
|||
tools.ok_(iterutils.isiterable('a', include_string=True) == True)
|
||||
tools.ok_(iterutils.isiterable('a', include_string=False) == False)
|
||||
tools.ok_(iterutils.isiterable('a') == False)
|
||||
tools.ok_(iterutils.isiterable(u'a', include_string=True) == True)
|
||||
tools.ok_(iterutils.isiterable(u'a', include_string=False) == False)
|
||||
tools.ok_(iterutils.isiterable(u'a') == False)
|
||||
|
||||
def test_iterate(self):
|
||||
iterutils.iterate(None)
|
||||
|
@@ -55,3 +58,5 @@ class TestStrictDict(unittest.TestCase):
|
|||
# strings
|
||||
tools.ok_(list(iterutils.iterate('abc')) == ['abc'])
|
||||
tools.ok_(list(iterutils.iterate('abc', include_string=True)) == ['a', 'b', 'c'])
|
||||
tools.ok_(list(iterutils.iterate(u'abc')) == [u'abc'])
|
||||
tools.ok_(list(iterutils.iterate(u'abc', include_string=True)) == [u'a', u'b', u'c'])
|
|
@@ -1,6 +1,5 @@
|
|||
import unittest
|
||||
from nose.plugins.skip import SkipTest
|
||||
from test import test_support
|
||||
from kitchen.pycompat27.subprocess import _subprocess as subprocess
|
||||
import sys
|
||||
import StringIO
|
||||
|
@@ -45,9 +44,14 @@ def reap_children():
|
|||
except:
|
||||
break
|
||||
|
||||
if not hasattr(test_support, 'reap_children'):
|
||||
test_support = None
|
||||
try:
|
||||
from test import test_support
|
||||
if not hasattr(test_support, 'reap_children'):
|
||||
# No reap_children in python-2.3
|
||||
test_support.reap_children = reap_children
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
# In a debug build, stuff like "[6580 refs]" is printed to stderr at
|
||||
# shutdown time. That frustrates tests trying to check stderr produced
|
||||
|
@@ -79,6 +83,7 @@ class BaseTestCase(unittest.TestCase):
|
|||
def setUp(self):
|
||||
# Try to minimize the number of children we have so this test
|
||||
# doesn't crash on some buildbots (Alphas in particular).
|
||||
if test_support:
|
||||
test_support.reap_children()
|
||||
|
||||
def tearDown(self):
|
||||
|
@@ -596,6 +601,9 @@ class ProcessTestCase(BaseTestCase):
|
|||
"line1\nline2\rline3\r\nline4\r\nline5\nline6")
|
||||
|
||||
def test_no_leaking(self):
|
||||
if not test_support:
|
||||
raise SkipTest("No test_support module available.")
|
||||
|
||||
# Make sure we leak no resources
|
||||
if not mswindows:
|
||||
max_handles = 1026 # too much for most UNIX systems
|
||||
|
@@ -1123,6 +1131,8 @@ class POSIXProcessTestCase(BaseTestCase):
|
|||
|
||||
def test_wait_when_sigchild_ignored(self):
|
||||
# NOTE: sigchild_ignore.py may not be an effective test on all OSes.
|
||||
if not test_support:
|
||||
raise SkipTest("No test_support module available.")
|
||||
sigchild_ignore = test_support.findfile(os.path.join("subprocessdata",
|
||||
"sigchild_ignore.py"))
|
||||
p = subprocess.Popen([sys.executable, sigchild_ignore],
|
161
kitchen2/tests/test_text_display.py
Normal file
|
@@ -0,0 +1,161 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
import unittest
|
||||
from nose import tools
|
||||
|
||||
from kitchen.text.exceptions import ControlCharError
|
||||
|
||||
from kitchen.text import display
|
||||
|
||||
import base_classes
|
||||
|
||||
class TestDisplay(base_classes.UnicodeTestData, unittest.TestCase):
|
||||
|
||||
def test_internal_interval_bisearch(self):
|
||||
'''Test that we can find things in an interval table'''
|
||||
table = ((0, 3), (5,7), (9, 10))
|
||||
tools.assert_true(display._interval_bisearch(0, table))
|
||||
tools.assert_true(display._interval_bisearch(1, table))
|
||||
tools.assert_true(display._interval_bisearch(2, table))
|
||||
tools.assert_true(display._interval_bisearch(3, table))
|
||||
tools.assert_true(display._interval_bisearch(5, table))
|
||||
tools.assert_true(display._interval_bisearch(6, table))
|
||||
tools.assert_true(display._interval_bisearch(7, table))
|
||||
tools.assert_true(display._interval_bisearch(9, table))
|
||||
tools.assert_true(display._interval_bisearch(10, table))
|
||||
tools.assert_false(display._interval_bisearch(-1, table))
|
||||
tools.assert_false(display._interval_bisearch(4, table))
|
||||
tools.assert_false(display._interval_bisearch(8, table))
|
||||
tools.assert_false(display._interval_bisearch(11, table))
|
||||
|
||||
def test_internal_generate_combining_table(self):
|
||||
'''Test that the combining table we generate is equal to or a subset of what's in the current table
|
||||
|
||||
If this assertion fails, it can mean one of two things:
|
||||
|
||||
1. The code is broken
|
||||
2. The table we have is out of date.
|
||||
'''
|
||||
old_table = display._COMBINING
|
||||
new_table = display._generate_combining_table()
|
||||
for interval in new_table:
|
||||
if interval[0] == interval[1]:
|
||||
tools.assert_true(display._interval_bisearch(interval[0], old_table))
|
||||
else:
|
||||
for codepoint in xrange(interval[0], interval[1] + 1):
|
||||
tools.assert_true(display._interval_bisearch(codepoint, old_table))
|
||||
|
||||
def test_internal_ucp_width(self):
|
||||
'''Test that ucp_width returns proper width for characters'''
|
||||
for codepoint in xrange(0, 0xFFFFF + 1):
|
||||
if codepoint < 32 or (codepoint < 0xa0 and codepoint >= 0x7f):
|
||||
# With strict on, we should raise an error
|
||||
tools.assert_raises(ControlCharError, display._ucp_width, codepoint, 'strict')
|
||||
|
||||
if codepoint in (0x08, 0x1b, 0x7f, 0x94):
|
||||
# Backspace, delete, and clear delete each remove one char
|
||||
tools.eq_(display._ucp_width(codepoint), -1)
|
||||
else:
|
||||
# Everything else returns 0
|
||||
tools.eq_(display._ucp_width(codepoint), 0)
|
||||
elif display._interval_bisearch(codepoint, display._COMBINING):
|
||||
# Combining character
|
||||
tools.eq_(display._ucp_width(codepoint), 0)
|
||||
elif (codepoint >= 0x1100 and
|
||||
(codepoint <= 0x115f or # Hangul Jamo init. consonants
|
||||
codepoint == 0x2329 or codepoint == 0x232a or
|
||||
(codepoint >= 0x2e80 and codepoint <= 0xa4cf and
|
||||
codepoint != 0x303f) or # CJK ... Yi
|
||||
(codepoint >= 0xac00 and codepoint <= 0xd7a3) or # Hangul Syllables
|
||||
(codepoint >= 0xf900 and codepoint <= 0xfaff) or # CJK Compatibility Ideographs
|
||||
(codepoint >= 0xfe10 and codepoint <= 0xfe19) or # Vertical forms
|
||||
(codepoint >= 0xfe30 and codepoint <= 0xfe6f) or # CJK Compatibility Forms
|
||||
(codepoint >= 0xff00 and codepoint <= 0xff60) or # Fullwidth Forms
|
||||
(codepoint >= 0xffe0 and codepoint <= 0xffe6) or
|
||||
(codepoint >= 0x20000 and codepoint <= 0x2fffd) or
|
||||
(codepoint >= 0x30000 and codepoint <= 0x3fffd))):
|
||||
tools.eq_(display._ucp_width(codepoint), 2)
|
||||
else:
|
||||
tools.eq_(display._ucp_width(codepoint), 1)
|
||||
|
||||
def test_textual_width(self):
|
||||
'''Test that we find the proper number of spaces that a utf8 string will consume'''
|
||||
tools.eq_(display.textual_width(self.u_japanese), 31)
|
||||
tools.eq_(display.textual_width(self.u_spanish), 50)
|
||||
tools.eq_(display.textual_width(self.u_mixed), 23)
|
||||
|
||||
def test_textual_width_chop(self):
|
||||
'''textual_width_chop with unicode strings'''
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 1000), self.u_mixed)
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 23), self.u_mixed)
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 22), self.u_mixed[:-1])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 19), self.u_mixed[:-4])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 1), u'')
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 2), self.u_mixed[0])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 3), self.u_mixed[:2])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 4), self.u_mixed[:3])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 5), self.u_mixed[:4])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 6), self.u_mixed[:5])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 7), self.u_mixed[:5])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 8), self.u_mixed[:6])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 9), self.u_mixed[:7])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 10), self.u_mixed[:8])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 11), self.u_mixed[:9])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 12), self.u_mixed[:10])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 13), self.u_mixed[:10])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 14), self.u_mixed[:11])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 15), self.u_mixed[:12])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 16), self.u_mixed[:13])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 17), self.u_mixed[:14])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 18), self.u_mixed[:15])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 19), self.u_mixed[:15])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 20), self.u_mixed[:16])
|
||||
tools.eq_(display.textual_width_chop(self.u_mixed, 21), self.u_mixed[:17])
|
||||
|
||||
def test_textual_width_fill(self):
|
||||
'''Pad a unicode string'''
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 1), self.u_mixed)
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 25), self.u_mixed + u' ')
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 25, left=False), u' ' + self.u_mixed)
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18), self.u_mixed[:-4] + u' ')
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18, prefix=self.u_spanish, suffix=self.u_spanish), self.u_spanish + self.u_mixed[:-4] + self.u_spanish + u' ')
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18), self.u_mixed[:-4] + u' ')
|
||||
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18, prefix=self.u_spanish, suffix=self.u_spanish), self.u_spanish + self.u_mixed[:-4] + self.u_spanish + u' ')
|
||||
|
||||
def test_internal_textual_width_le(self):
|
||||
test_data = ''.join([self.u_mixed, self.u_spanish])
|
||||
tw = display.textual_width(test_data)
|
||||
tools.eq_(display._textual_width_le(68, self.u_mixed, self.u_spanish), (tw <= 68))
|
||||
tools.eq_(display._textual_width_le(69, self.u_mixed, self.u_spanish), (tw <= 69))
|
||||
tools.eq_(display._textual_width_le(137, self.u_mixed, self.u_spanish), (tw <= 137))
|
||||
tools.eq_(display._textual_width_le(138, self.u_mixed, self.u_spanish), (tw <= 138))
|
||||
tools.eq_(display._textual_width_le(78, self.u_mixed, self.u_spanish), (tw <= 78))
|
||||
tools.eq_(display._textual_width_le(79, self.u_mixed, self.u_spanish), (tw <= 79))
|
||||
|
||||
def test_wrap(self):
|
||||
'''Test that text wrapping works'''
|
||||
tools.eq_(display.wrap(self.u_mixed), [self.u_mixed])
|
||||
tools.eq_(display.wrap(self.u_paragraph), self.u_paragraph_out)
|
||||
tools.eq_(display.wrap(self.utf8_paragraph), self.u_paragraph_out)
|
||||
tools.eq_(display.wrap(self.u_mixed_para), self.u_mixed_para_out)
|
||||
tools.eq_(display.wrap(self.u_mixed_para, width=57,
|
||||
initial_indent=' ', subsequent_indent='----'),
|
||||
self.u_mixed_para_57_initial_subsequent_out)
|
||||
|
||||
def test_fill(self):
|
||||
tools.eq_(display.fill(self.u_paragraph), u'\n'.join(self.u_paragraph_out))
|
||||
tools.eq_(display.fill(self.utf8_paragraph), u'\n'.join(self.u_paragraph_out))
|
||||
tools.eq_(display.fill(self.u_mixed_para), u'\n'.join(self.u_mixed_para_out))
|
||||
tools.eq_(display.fill(self.u_mixed_para, width=57,
|
||||
initial_indent=' ', subsequent_indent='----'),
|
||||
u'\n'.join(self.u_mixed_para_57_initial_subsequent_out))
|
||||
|
||||
def test_byte_string_textual_width_fill(self):
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 1), self.utf8_mixed)
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25), self.utf8_mixed + ' ')
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, left=False), ' ' + self.utf8_mixed)
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18), self.u_mixed[:-4].encode('utf8') + ' ')
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18, prefix=self.utf8_spanish, suffix=self.utf8_spanish), self.utf8_spanish + self.u_mixed[:-4].encode('utf8') + self.utf8_spanish + ' ')
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18), self.u_mixed[:-4].encode('utf8') + ' ')
|
||||
tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18, prefix=self.utf8_spanish, suffix=self.utf8_spanish), self.utf8_spanish + self.u_mixed[:-4].encode('utf8') + self.utf8_spanish + ' ')
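The interval-table lookup exercised by `test_internal_interval_bisearch` in this file can be sketched as a plain binary search. This is a minimal illustration in Python 3, not kitchen's own `_interval_bisearch`: the name `interval_bisearch_sketch` is hypothetical, and the sketch assumes the table is sorted and holds non-overlapping, inclusive `(low, high)` pairs.

```python
def interval_bisearch_sketch(value, table):
    '''Return True if value lies inside any inclusive (start, end) interval.

    Assumes table is sorted by start and the intervals do not overlap.
    '''
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end = table[mid]
        if value < start:
            # value can only be in an interval to the left of mid
            high = mid - 1
        elif value > end:
            # value can only be in an interval to the right of mid
            low = mid + 1
        else:
            return True
    return False


# Same table shape as the one used in the test above.
table = ((0, 3), (5, 7), (9, 10))
hits = [n for n in range(-1, 12) if interval_bisearch_sketch(n, table)]
print(hits)
```

The values that fall in the gaps between intervals (-1, 4, 8 and 11 for this table) are the ones the test expects to be rejected.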
|
||||
|
|
@@ -135,3 +135,19 @@ class TestTextMisc(unittest.TestCase, base_classes.UnicodeTestData):
|
|||
'''Test that we return False with non-encoded chars'''
|
||||
tools.ok_(misc.byte_string_valid_encoding('\xff') == False)
|
||||
tools.ok_(misc.byte_string_valid_encoding(self.euc_jp_japanese) == False)
|
||||
|
||||
class TestIsStringTypes(unittest.TestCase):
|
||||
def test_isbasestring(self):
|
||||
tools.assert_true(misc.isbasestring('abc'))
|
||||
tools.assert_true(misc.isbasestring(u'abc'))
|
||||
tools.assert_false(misc.isbasestring(5))
|
||||
|
||||
def test_isbytestring(self):
|
||||
tools.assert_true(misc.isbytestring('abc'))
|
||||
tools.assert_false(misc.isbytestring(u'abc'))
|
||||
tools.assert_false(misc.isbytestring(5))
|
||||
|
||||
def test_isunicodestring(self):
|
||||
tools.assert_false(misc.isunicodestring('abc'))
|
||||
tools.assert_true(misc.isunicodestring(u'abc'))
|
||||
tools.assert_false(misc.isunicodestring(5))
|
|
@@ -56,7 +56,7 @@ class TestUTF8(base_classes.UnicodeTestData, unittest.TestCase):
|
|||
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 22) == (22, self.u_mixed[:-1]))
|
||||
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 19) == (18, self.u_mixed[:-4]))
|
||||
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 2) == (2, self.u_mixed[0]))
|
||||
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 1) == (0, ''))
|
||||
tools.ok_(utf8.utf8_width_chop(self.u_mixed, 1) == (0, u''))
|
||||
|
||||
def test_utf8_width_fill(self):
|
||||
'''Pad a utf8 string'''
|
|
@@ -1,6 +1,5 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
import unittest
|
||||
from nose import tools
|
||||
|
||||
from kitchen.versioning import version_tuple_to_string
|
||||
|
@@ -26,7 +25,7 @@ class TestVersionTuple(object):
|
|||
}
|
||||
|
||||
def check_ver_tuple_to_str(self, v_tuple, v_str):
|
||||
tools.ok_(version_tuple_to_string(v_tuple) == v_str)
|
||||
tools.eq_(version_tuple_to_string(v_tuple), v_str)
|
||||
|
||||
def test_version_tuple_to_string(self):
|
||||
'''Test that version_tuple_to_string outputs PEP-386 compliant strings
|
6
kitchen3/docs/api-collections.rst
Normal file
|
@@ -0,0 +1,6 @@
|
|||
===================
|
||||
Kitchen.collections
|
||||
===================
|
||||
|
||||
.. automodule:: kitchen.collections.strictdict
|
||||
:members:
|
12
kitchen3/docs/api-exceptions.rst
Normal file
|
@ -0,0 +1,12 @@
|
|||
==========
|
||||
Exceptions
|
||||
==========
|
||||
|
||||
Kitchen has a hierarchy of exceptions that should make it easy to catch many
|
||||
errors emitted by kitchen itself.
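To illustrate why a shared base class makes errors easy to catch, here is a small self-contained sketch. The class names below are hypothetical stand-ins defined only for this example; they are not kitchen's actual exception classes, whose real names are listed by the module documentation that follows.

```python
class KitchenErrorSketch(Exception):
    '''Hypothetical stand-in for a library-wide base exception.'''


class TextErrorSketch(KitchenErrorSketch):
    '''Hypothetical stand-in for errors raised by text-handling code.'''


class ControlCharErrorSketch(TextErrorSketch):
    '''Hypothetical stand-in for a control-character error.'''


def checked_width(codepoint):
    '''Toy function: reject control characters, report width 1 otherwise.'''
    if codepoint < 32:
        raise ControlCharErrorSketch('control character: %#x' % codepoint)
    return 1


try:
    checked_width(0x07)
except KitchenErrorSketch as exc:
    # Catching the base class also catches every more specific subclass.
    caught = type(exc).__name__
    print('caught', caught)
```

A caller that only cares whether the library failed can catch the base class, while a caller that needs finer control can catch one of the subclasses instead.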
|
||||
|
||||
.. automodule:: kitchen.exceptions
|
||||
:members:
|
||||
|
||||
.. automodule:: kitchen.text.exceptions
|
||||
:members:
|
38
kitchen3/docs/api-i18n.rst
Normal file
|
@ -0,0 +1,38 @@
|
|||
===================
|
||||
Kitchen.i18n Module
|
||||
===================
|
||||
|
||||
.. automodule:: kitchen.i18n
|
||||
|
||||
Functions
|
||||
=========
|
||||
|
||||
:func:`easy_gettext_setup` should satisfy the needs of most users.
|
||||
:func:`get_translation_object` is designed to ease the way for anyone who
|
||||
needs more control.
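As a rough illustration of the kind of lookup these helpers wrap, the sketch below uses only the standard library's :mod:`gettext` module rather than kitchen itself. The domain name ``example-domain`` and the empty temporary locale directory are assumptions made for the demo. With ``fallback=True``, a missing catalog yields a ``NullTranslations`` object that returns the message unchanged instead of raising.

```python
import gettext
import tempfile

# Hypothetical domain; the temporary directory holds no catalogs, so the
# lookup is guaranteed to miss and the fallback object is returned.
with tempfile.TemporaryDirectory() as localedir:
    translations = gettext.translation(
        'example-domain', localedir=localedir,
        languages=['pt_BR'], fallback=True)

_ = translations.gettext
message = _('kitchen sink')
# With no catalog, ngettext picks the singular for n == 1, else the plural.
plural = translations.ngettext('lemon', 'lemons', 2)
print(message, plural)
```

Without ``fallback=True`` the same call would raise an error when no catalog is found, which is the kind of surprise a convenience wrapper can hide from most callers.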
|
||||
|
||||
.. autofunction:: easy_gettext_setup
|
||||
|
||||
.. autofunction:: get_translation_object
|
||||
|
||||
Translation Objects
|
||||
===================
|
||||
|
||||
The standard translation objects from the :mod:`gettext` module suffer from
|
||||
several problems:
|
||||
|
||||
* They can throw :exc:`UnicodeError`
|
||||
* They can't find translations for non-:term:`ASCII` byte :class:`str`
|
||||
messages
|
||||
* They may return either :class:`unicode` string or byte :class:`str` from the
|
||||
same function even though the functions say they will only return
|
||||
:class:`unicode` or only return byte :class:`str`.
|
||||
|
||||
:class:`DummyTranslations` and :class:`NewGNUTranslations` were written to fix
|
||||
these issues.
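
As a rough illustration of the "never raise, fall back to the original message" idea only, the sketch below uses the standard library's :mod:`gettext` under Python 3 rather than kitchen's own classes. The domain name and locale directory are invented for the example and are assumed not to exist.

```python
import gettext

# Hypothetical domain and directory: neither exists, so no catalog is found.
translations = gettext.translation(
    "no_such_domain", localedir="/nonexistent/locale", fallback=True
)

# With fallback=True the stdlib returns a NullTranslations object, which
# hands the message back unchanged instead of raising an error.
message = translations.gettext("Hello")
print(message)
```

This only demonstrates the stdlib fallback object; the kitchen classes documented below are the ones written to address the problems listed above.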
|
||||
|
||||
.. autoclass:: kitchen.i18n.DummyTranslations
|
||||
:members:
|
||||
|
||||
.. autoclass:: kitchen.i18n.NewGNUTranslations
|
||||
:members:
|
9
kitchen3/docs/api-iterutils.rst
Normal file
|
@ -0,0 +1,9 @@
|
|||
|
||||
========================
|
||||
Kitchen.iterutils Module
|
||||
========================
|
||||
|
||||
.. automodule:: kitchen.iterutils
|
||||
|
||||
.. autofunction:: kitchen.iterutils.isiterable
|
||||
.. autofunction:: kitchen.iterutils.iterate
|
24
kitchen3/docs/api-overview.rst
Normal file
|
@ -0,0 +1,24 @@
|
|||
.. _KitchenAPI:
|
||||
|
||||
===========
|
||||
Kitchen API
|
||||
===========
|
||||
|
||||
Kitchen is structured as a collection of modules. In its current
|
||||
configuration, Kitchen ships with the following modules. Other addon modules
|
||||
that may drag in more dependencies can be found on the `project webpage`_.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
api-i18n
|
||||
api-text
|
||||
api-collections
|
||||
api-iterutils
|
||||
api-versioning
|
||||
api-pycompat24
|
||||
api-pycompat25
|
||||
api-pycompat27
|
||||
api-exceptions
|
||||
|
||||
.. _`project webpage`: https://fedorahosted.org/kitchen
|
34
kitchen3/docs/api-pycompat24.rst
Normal file
|
@ -0,0 +1,34 @@
|
|||
========================
|
||||
Python 2.4 Compatibility
|
||||
========================
|
||||
|
||||
|
||||
-------------------
|
||||
Sets for python-2.3
|
||||
-------------------
|
||||
|
||||
.. automodule:: kitchen.pycompat24.sets
|
||||
.. autofunction:: kitchen.pycompat24.sets.add_builtin_set
|
||||
|
||||
----------------------------------
|
||||
Partial new style base64 interface
|
||||
----------------------------------
|
||||
|
||||
.. automodule:: kitchen.pycompat24.base64
|
||||
:members:
|
||||
|
||||
----------
|
||||
Subprocess
|
||||
----------
|
||||
|
||||
.. seealso::
|
||||
|
||||
:mod:`kitchen.pycompat27.subprocess`
|
||||
Kitchen includes the python-2.7 version of subprocess which has a new
|
||||
function, :func:`~kitchen.pycompat27.subprocess.check_output`. When
|
||||
you import :mod:`pycompat24.subprocess` you will be getting the
|
||||
python-2.7 version of subprocess rather than the 2.4 version (where
|
||||
subprocess first appeared). This choice was made so that we can
|
||||
concentrate our efforts on keeping the single version of subprocess up
|
||||
to date rather than working on a 2.4 version that very few people
|
||||
would need specifically.
|
8
kitchen3/docs/api-pycompat25.rst
Normal file
|
@ -0,0 +1,8 @@
|
|||
========================
|
||||
Python 2.5 Compatibility
|
||||
========================
|
||||
|
||||
.. automodule:: kitchen.pycompat25
|
||||
|
||||
.. automodule:: kitchen.pycompat25.collections.defaultdict
|
||||
|
35
kitchen3/docs/api-pycompat27.rst
Normal file
|
@ -0,0 +1,35 @@
|
|||
========================
|
||||
Python 2.7 Compatibility
|
||||
========================
|
||||
|
||||
.. module:: kitchen.pycompat27.subprocess
|
||||
|
||||
--------------------------
|
||||
Subprocess from Python 2.7
|
||||
--------------------------
|
||||
|
||||
The :mod:`subprocess` module included here is a direct import from
|
||||
python-2.7's |stdlib|_. You can access it via::
|
||||
|
||||
>>> from kitchen.pycompat27 import subprocess
|
||||
|
||||
The motivation for including this module is that various API-changing
|
||||
improvements have been made to subprocess over time. The following is a list
|
||||
of the known changes to :mod:`subprocess` with the python version they were
|
||||
introduced in:
|
||||
|
||||
==================================== ===
|
||||
New API Feature Ver
|
||||
==================================== ===
|
||||
:exc:`subprocess.CalledProcessError` 2.5
|
||||
:func:`subprocess.check_call` 2.5
|
||||
:func:`subprocess.check_output` 2.7
|
||||
:meth:`subprocess.Popen.send_signal` 2.6
|
||||
:meth:`subprocess.Popen.terminate` 2.6
|
||||
:meth:`subprocess.Popen.kill` 2.6
|
||||
==================================== ===
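
For readers who have not used the newer calls in the table, here is a minimal sketch using the standard library's :mod:`subprocess` on a current interpreter. On an older Python the same names would be imported from ``kitchen.pycompat27`` as shown above; the child commands are made up for the example.

```python
import subprocess
import sys

# check_output (new in 2.7) returns the child's standard output as bytes.
output = subprocess.check_output([sys.executable, "-c", "print('hello')"])

# check_call (new in 2.5) raises CalledProcessError on a non-zero exit status.
try:
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(3)"])
    exit_status = 0
except subprocess.CalledProcessError as exc:
    exit_status = exc.returncode

print(output.strip(), exit_status)
```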
|
||||
|
||||
.. seealso::
|
||||
|
||||
The stdlib :mod:`subprocess` documentation
|
||||
For complete documentation on how to use subprocess
|
405
kitchen3/docs/api-text-converters.rst
Normal file
|
@ -0,0 +1,405 @@
|
|||
-----------------------
|
||||
Kitchen.text.converters
|
||||
-----------------------
|
||||
|
||||
.. automodule:: kitchen.text.converters
|
||||
|
||||
Byte Strings and Unicode in Python2
|
||||
===================================
|
||||
|
||||
Python2 has two string types, :class:`str` and :class:`unicode`.
|
||||
:class:`unicode` represents an abstract sequence of text characters. It can
|
||||
hold any character that is present in the unicode standard. :class:`str` can
|
||||
hold any byte of data. The operating system and python work together to
|
||||
display these bytes as characters in many cases but you should always keep in
|
||||
mind that the information is really a sequence of bytes, not a sequence of
|
||||
characters. In python2 these types are interchangeable much of the
|
||||
time. They are one of the few pairs of types that automatically convert when
|
||||
used in equality::
|
||||
|
||||
>>> # string is converted to unicode and then compared
|
||||
>>> "I am a string" == u"I am a string"
|
||||
True
|
||||
>>> # Other types, like int, don't have this special treatment
|
||||
>>> 5 == "5"
|
||||
False
|
||||
|
||||
However, this automatic conversion tends to lull people into a false sense of
|
||||
security. As long as you're dealing with :term:`ASCII` characters the
|
||||
automatic conversion will save you from seeing any differences. Once you
|
||||
start using characters that are not in :term:`ASCII`, you will start getting
|
||||
:exc:`UnicodeError` and :exc:`UnicodeWarning` as the automatic conversions
|
||||
between the types fail::
|
||||
|
||||
>>> "I am an ñ" == u"I am an ñ"
|
||||
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
|
||||
False
|
||||
|
||||
Why do these conversions fail? The reason is that the python2
|
||||
:class:`unicode` type represents an abstract sequence of unicode text known as
|
||||
:term:`code points`. :class:`str`, on the other hand, really represents
|
||||
a sequence of bytes. Those bytes are converted by your operating system to
|
||||
appear as characters on your screen using a particular encoding (usually
|
||||
with a default defined by the operating system and customizable by the
|
||||
individual user.) Although :term:`ASCII` characters are fairly standard in
|
||||
what bytes represent each character, the bytes outside of the :term:`ASCII`
|
||||
range are not. In general, each encoding will map a different character to
|
||||
a particular byte. Newer encodings map individual characters to multiple
|
||||
bytes (which the older encodings will instead treat as multiple characters).
|
||||
In the face of these differences, python refuses to guess at an encoding and
|
||||
instead issues a warning or exception and refuses to convert.
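
The point about encodings disagreeing can be seen with a small sketch. It is written for Python 3, where ``bytes`` plays the role of the python2 byte :class:`str` and ``str`` the role of :class:`unicode`; the variable names are illustrative only.

```python
# The same two bytes, interpreted under three different encodings.
raw = "\u00f1".encode("utf-8")      # n-with-tilde stored as UTF-8 bytes

as_utf8 = raw.decode("utf-8")       # a newer encoding: two bytes, one character
as_latin1 = raw.decode("latin-1")   # an older encoding: two bytes, two characters

try:
    raw.decode("ascii")             # ASCII has no meaning for these bytes
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(repr(raw), len(as_utf8), len(as_latin1), ascii_failed)
```

Nothing in the bytes themselves says which interpretation is the intended one, which is why an explicit encoding is needed.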
|
||||
|
||||
.. seealso::
|
||||
:ref:`overcoming-frustration`
|
||||
For a longer introduction on this subject.
|
||||
|
||||
Strategy for Explicit Conversion
|
||||
================================
|
||||
|
||||
So what is the best method of dealing with this weltering babble of incoherent
|
||||
encodings? The basic strategy is to explicitly turn everything into
|
||||
:class:`unicode` when it first enters your program. Then, when you send it to
|
||||
output, you can transform the unicode back into bytes. Doing this allows you
|
||||
to control the encodings that are used and avoid getting tracebacks due to
|
||||
:exc:`UnicodeError`. Using the functions defined in this module, that looks
|
||||
something like this:
|
||||
|
||||
.. code-block:: pycon
|
||||
:linenos:
|
||||
|
||||
>>> from kitchen.text.converters import to_unicode, to_bytes
|
||||
>>> name = raw_input('Enter your name: ')
|
||||
Enter your name: Toshio くらとみ
|
||||
>>> name
|
||||
'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
|
||||
>>> type(name)
|
||||
<type 'str'>
|
||||
>>> unicode_name = to_unicode(name)
|
||||
>>> type(unicode_name)
|
||||
<type 'unicode'>
|
||||
>>> unicode_name
|
||||
u'Toshio \u304f\u3089\u3068\u307f'
|
||||
>>> # Do a lot of other things before needing to save/output again:
|
||||
>>> output = open('datafile', 'w')
|
||||
>>> output.write(to_bytes(u'Name: %s\n' % unicode_name))
|
||||
|
||||
A few notes:
|
||||
|
||||
Looking at line 6, you'll notice that the input we took from the user was
|
||||
a byte :class:`str`. In general, anytime we're getting a value from outside
|
||||
of python (The filesystem, reading data from the network, interacting with an
|
||||
external command, reading values from the environment) we are interacting with
|
||||
something that will want to give us a byte :class:`str`. Some |stdlib|_
|
||||
modules and third party libraries will automatically attempt to convert a byte
|
||||
:class:`str` to :class:`unicode` strings for you. This is both a boon and
|
||||
a curse. If the library can guess correctly about the encoding that the data
|
||||
is in, it will return :class:`unicode` objects to you without you having to
|
||||
convert. However, if it can't guess correctly, you may end up with one of
|
||||
several problems:
|
||||
|
||||
:exc:`UnicodeError`
|
||||
The library attempted to decode a byte :class:`str` into
|
||||
a :class:`unicode` string, failed, and raised an exception.
|
||||
Garbled data
|
||||
If the library returns the data after decoding it with the wrong encoding,
|
||||
the characters you see in the :class:`unicode` string won't be the ones that
|
||||
you expect.
|
||||
A byte :class:`str` instead of :class:`unicode` string
|
||||
Some libraries will return a :class:`unicode` string when they're able to
|
||||
decode the data and a byte :class:`str` when they can't. This is
|
||||
generally the hardest problem to debug when it occurs. Avoid it in your
|
||||
own code and try to avoid or open bugs against upstreams that do this. See
|
||||
:ref:`DesigningUnicodeAwareAPIs` for strategies to do this properly.
|
||||
|
||||
On line 8, we convert from a byte :class:`str` to a :class:`unicode` string.
|
||||
:func:`~kitchen.text.converters.to_unicode` does this for us. It has some
|
||||
error handling and sane defaults that make this a nicer function to use than
|
||||
calling :meth:`str.decode` directly:
|
||||
|
||||
* Instead of defaulting to the :term:`ASCII` encoding which fails with all
|
||||
but the simple American English characters, it defaults to :term:`UTF-8`.
|
||||
* Instead of raising an error if it cannot decode a value, it will replace
|
||||
the value with the unicode "Replacement character" symbol (U+FFFD, ``�``).
|
||||
* If you happen to call this method with something that is not a :class:`str`
|
||||
or :class:`unicode`, it will return an empty :class:`unicode` string.
|
||||
|
||||
All three of these can be overridden using different keyword arguments to the
|
||||
function. See the :func:`to_unicode` documentation for more information.
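
To make the three defaults concrete, here is a minimal sketch of a helper with the same behaviour as described in the list above. It is not kitchen's implementation: ``decode_leniently`` is a hypothetical name, and the code targets Python 3, where ``bytes`` stands in for the byte :class:`str` and ``str`` for :class:`unicode`.

```python
def decode_leniently(value, encoding="utf-8", errors="replace"):
    """Hypothetical helper mirroring the three defaults described above."""
    if isinstance(value, str):
        # Already text: nothing to decode.
        return value
    if isinstance(value, bytes):
        # Default to UTF-8 and substitute U+FFFD for undecodable bytes.
        return value.decode(encoding, errors)
    # Anything that is not a string becomes an empty text string.
    return ""


print(decode_leniently(b"caf\xc3\xa9"))        # valid UTF-8
print(repr(decode_leniently(b"caf\xe9")))      # invalid UTF-8, replaced
print(repr(decode_leniently(42)))              # nonstring
```

Each default is an ordinary keyword argument or branch here, which is roughly how one would expect overriding them to work.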
|
||||
|
||||
On line 15 we push the data back out to a file. Two things you should note here:
|
||||
|
||||
1. We deal with the strings as :class:`unicode` until the last instant. The
|
||||
string format that we're using is :class:`unicode` and the variable also
|
||||
holds :class:`unicode`. People sometimes get into trouble when they mix
|
||||
a byte :class:`str` format with a variable that holds a :class:`unicode`
|
||||
string (or vice versa) at this stage.
|
||||
2. :func:`~kitchen.text.converters.to_bytes` does the reverse of
|
||||
:func:`to_unicode`. In this case, we're using the default values which
|
||||
turn :class:`unicode` into a byte :class:`str` using :term:`UTF-8`. Any
|
||||
errors are replaced with a ``�`` and sending nonstring objects yields empty
|
||||
:class:`unicode` strings. Just like :func:`to_unicode`, you can look at
|
||||
the documentation for :func:`to_bytes` to find out how to override any of
|
||||
these defaults.
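
The two notes above amount to "decode once on the way in, encode once on the way out". A minimal Python 3 sketch of that round trip, using only built-in ``bytes.decode`` and ``str.encode`` rather than the kitchen functions, and an in-memory buffer as a stand-in for the data file:

```python
import io

# Bytes as they might arrive from outside the program (UTF-8 assumed).
raw_name = b"Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf"

# Decode at the boundary; from here on everything is text.
name = raw_name.decode("utf-8")
line = "Name: %s\n" % name          # text format combined with a text value

# Encode only at the last instant, when the data leaves the program.
sink = io.BytesIO()                 # stands in for a file opened in binary mode
sink.write(line.encode("utf-8"))

print(sink.getvalue())
```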
|
||||
|
||||
When to use an alternate strategy
|
||||
---------------------------------
|
||||
|
||||
The default strategy of decoding to :class:`unicode` strings when you take
|
||||
data in and encoding to a byte :class:`str` when you send the data back out
|
||||
works great for most problems but there are a few times when you shouldn't:
|
||||
|
||||
* The values aren't meant to be read as text
|
||||
* The values need to be byte-for-byte when you send them back out -- for
|
||||
instance if they are database keys or filenames.
|
||||
* You are transferring the data between several libraries that all expect
|
||||
byte :class:`str`.
|
||||
|
||||
In each of these instances, there is a reason to keep around the byte
|
||||
:class:`str` version of a value. Here are a few hints to keep your sanity in
|
||||
these situations:
|
||||
|
||||
1. Keep your :class:`unicode` and :class:`str` values separate. Just like the
|
||||
pain caused when you have to use someone else's library that returns both
|
||||
:class:`unicode` and :class:`str` you can cause yourself pain if you have
|
||||
functions that can return both types or variables that could hold either
|
||||
type of value.
|
||||
2. Name your variables so that you can tell whether you're storing byte
|
||||
:class:`str` or :class:`unicode` string. One of the first things you end
|
||||
up having to do when debugging is determine what type of string you have in
|
||||
a variable and what type of string you are expecting. Naming your
|
||||
variables consistently so that you can tell which type they are supposed to
|
||||
hold will save you from at least one of those steps.
|
||||
3. When you get values initially, make sure that you're dealing with the type
|
||||
of value that you expect as you save it. You can use :func:`isinstance`
|
||||
or :func:`to_bytes` since :func:`to_bytes` doesn't do any modifications of
|
||||
the string if it's already a :class:`str`. When using :func:`to_bytes`
|
||||
for this purpose you might want to use::
|
||||
|
||||
try:
|
||||
    b_input = to_bytes(input_should_be_bytes_already, errors='strict', nonstring='strict')
|
||||
except:
|
||||
    handle_errors_somehow()
|
||||
|
||||
The reason is that the default of :func:`to_bytes` will take characters
|
||||
that are illegal in the chosen encoding and transform them to replacement
|
||||
characters. Since the point of keeping this data as a byte :class:`str` is
|
||||
to keep the exact same bytes when you send it outside of your code,
|
||||
changing things to replacement characters should be raising red flags that
|
||||
something is wrong. Setting :attr:`errors` to ``strict`` will raise an
|
||||
exception which gives you an opportunity to fail gracefully.
|
||||
4. Sometimes you will want to print out the values that you have in your byte
|
||||
:class:`str`. When you do this you will need to make sure that you
|
||||
transform :class:`unicode` to :class:`str` before combining them. Also be
|
||||
sure that any other function calls (including :mod:`gettext`) are going to
|
||||
give you strings that are the same type. For instance::
|
||||
|
||||
print to_bytes(_('Username: %(user)s'), 'utf-8') % {'user': b_username}
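
The strict check from hint 3 can be sketched without kitchen as follows. This is a Python 3 approximation under stated assumptions: ``ensure_bytes_strict`` is a hypothetical helper, and raising :exc:`TypeError` for nonstrings is a choice made for this example.

```python
def ensure_bytes_strict(value, encoding="utf-8"):
    """Return value as bytes, refusing anything that cannot be kept exact."""
    if isinstance(value, bytes):
        # Already bytes: pass through untouched so the data stays byte-for-byte.
        return value
    if isinstance(value, str):
        # 'strict' raises UnicodeEncodeError instead of substituting characters.
        return value.encode(encoding, "strict")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


key = ensure_bytes_strict(b"\xff\x00key")

try:
    ensure_bytes_strict("\u00f1", encoding="ascii")
    failure = None
except UnicodeEncodeError as exc:
    failure = exc
```

Failing loudly here gives the caller a chance to report the bad value rather than silently storing altered bytes.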
|
||||
|
||||
Gotchas and how to avoid them
|
||||
=============================
|
||||
|
||||
Even when you have a good conceptual understanding of how python2 treats
|
||||
:class:`unicode` and :class:`str` there are still some things that can
|
||||
surprise you. In most cases this is because, as noted earlier, python or one
|
||||
of the python libraries you depend on is trying to convert a value
|
||||
automatically and failing. Explicit conversion at the appropriate place
|
||||
usually solves that.
|
||||
|
||||
str(obj)
|
||||
--------
|
||||
|
||||
One common idiom for getting a simple, string representation of an object is to use::
|
||||
|
||||
str(obj)
|
||||
|
||||
Unfortunately, this is not safe. Sometimes str(obj) will return
|
||||
:class:`unicode`. Sometimes it will return a byte :class:`str`. Sometimes,
|
||||
it will attempt to convert from a :class:`unicode` string to a byte
|
||||
:class:`str`, fail, and throw a :exc:`UnicodeError`. To be safe from all of
|
||||
these, first decide whether you need :class:`unicode` or :class:`str` to be
|
||||
returned. Then use :func:`to_unicode` or :func:`to_bytes` to get the simple
|
||||
representation like this::
|
||||
|
||||
u_representation = to_unicode(obj, nonstring='simplerepr')
|
||||
b_representation = to_bytes(obj, nonstring='simplerepr')
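
A rough Python 3 analogue of "decide which type you want, then convert explicitly" is sketched below. ``text_repr`` is a hypothetical helper, not part of kitchen. In Python 3 the surprise is different: calling ``str()`` on a ``bytes`` object yields its repr (for example ``"b'abc'"``) rather than decoded text, so an explicit branch is still useful.

```python
def text_repr(obj, encoding="utf-8"):
    """Return a text representation of obj, decoding bytes explicitly."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        # Decode rather than relying on str(), which would give "b'...'".
        return obj.decode(encoding, "replace")
    # Any other object: fall back to its ordinary string conversion.
    return str(obj)


print(text_repr(b"caf\xc3\xa9"), text_repr(5), text_repr("plain"))
```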
|
||||
|
||||
print
|
||||
-----
|
||||
|
||||
python has a builtin :func:`print` statement that outputs strings to the
|
||||
terminal. This originated in a time when python only dealt with byte
|
||||
:class:`str`. When :class:`unicode` strings came about, some enhancements
|
||||
were made to the :func:`print` statement so that it could print those as well.
|
||||
The enhancements make :func:`print` work most of the time. However, the times
|
||||
when it doesn't work tend to make for cryptic debugging.
|
||||
|
||||
The basic issue is that :func:`print` has to figure out what encoding to use
|
||||
when it prints a :class:`unicode` string to the terminal. When python is
|
||||
attached to your terminal (i.e., you're running the interpreter or running
|
||||
a script that prints to the screen) python is able to take the encoding value
|
||||
from your locale settings :envvar:`LC_ALL` or :envvar:`LC_CTYPE` and print the
|
||||
characters allowed by that encoding. On most modern Unix systems, the
|
||||
encoding is :term:`utf-8` which means that you can print any :class:`unicode`
|
||||
character without problem.
|
||||
|
||||
There are two common cases of things going wrong:
|
||||
|
||||
1. Someone has a locale set that does not accept all valid unicode characters.
|
||||
For instance::
|
||||
|
||||
$ LC_ALL=C python
|
||||
>>> print u'\ufffd'
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
|
||||
|
||||
This often happens when a script that you've written and debugged from the
|
||||
terminal is run from an automated environment like :program:`cron`. It
|
||||
also occurs when you have written a script using a :term:`utf-8` aware
|
||||
locale and released it for consumption by people all over the internet.
|
||||
Inevitably, someone is running with a locale that can't handle all unicode
|
||||
characters and you get a traceback reported.
|
||||
2. You redirect output to a file. Python isn't using the values in
|
||||
:envvar:`LC_ALL` unconditionally to decide what encoding to use. Instead
|
||||
it is using the encoding set for the terminal you are printing to which is
|
||||
set to accept different encodings by :envvar:`LC_ALL`. If you redirect
|
||||
to a file, you are no longer printing to the terminal so :envvar:`LC_ALL`
|
||||
won't have any effect. At this point, python will decide it can't find an
|
||||
encoding and fall back to :term:`ASCII` which will likely lead to
|
||||
:exc:`UnicodeError` being raised. You can see this in a short script::
|
||||
|
||||
#! /usr/bin/python -tt
|
||||
print u'\ufffd'
|
||||
|
||||
And then look at the difference between running it normally and redirecting to a file:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ ./test.py
|
||||
�
|
||||
$ ./test.py > t
|
||||
Traceback (most recent call last):
|
||||
File "test.py", line 3, in <module>
|
||||
print u'\ufffd'
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
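
The difference between the two runs comes down to which encoding the output stream ends up with. The Python 3 sketch below imitates that with in-memory streams instead of a real terminal or redirect; ``try_write`` is a hypothetical helper and the two encodings are picked to mirror the UTF-8 terminal and the ASCII fallback described above.

```python
import io


def try_write(text, encoding):
    """Write text to an in-memory stream; return the bytes, or None on failure."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()


utf8_result = try_write("\ufffd", "utf-8")    # like printing to a UTF-8 terminal
ascii_result = try_write("\ufffd", "ascii")   # like the ASCII fallback
print(utf8_result, ascii_result)
```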
|
||||
|
||||
The short answer to dealing with this is to always use bytes when writing
|
||||
output. You can do this by explicitly converting to bytes like this::
|
||||
|
||||
from kitchen.text.converters import to_bytes
|
||||
u_string = u'\ufffd'
|
||||
print to_bytes(u_string)
|
||||
|
||||
or you can wrap stdout and stderr with a :class:`~codecs.StreamWriter`.
|
||||
A :class:`~codecs.StreamWriter` is convenient in that you can assign it to
|
||||
encode for :data:`sys.stdout` or :data:`sys.stderr` and then have output
|
||||
automatically converted but it has the drawback of still being able to throw
|
||||
:exc:`UnicodeError` if the writer can't encode all possible unicode
|
||||
codepoints. Kitchen provides an alternate version which can be retrieved with
|
||||
:func:`kitchen.text.converters.getwriter` which will not traceback in its
|
||||
standard configuration.
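
As a sketch of the wrapping approach, the following uses the standard library's ``codecs.getwriter`` (not kitchen's ``getwriter``) under Python 3, with an in-memory buffer in place of :data:`sys.stdout`. Passing ``errors="replace"`` is one way to keep a plain :class:`~codecs.StreamWriter` from raising; whether that matches kitchen's standard configuration is an assumption here.

```python
import codecs
import io

raw = io.BytesIO()                      # stands in for sys.stdout's byte stream

# A StreamWriter that encodes to ASCII and substitutes '?' when it cannot.
writer = codecs.getwriter("ascii")(raw, errors="replace")
writer.write("caf\u00e9 \ufffd")

# The default ('strict') writer raises on the same input.
strict_writer = codecs.getwriter("ascii")(io.BytesIO())
try:
    strict_writer.write("\ufffd")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

print(raw.getvalue(), strict_raised)
```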
|
||||
|
||||
.. _unicode-and-dict-keys:
|
||||
|
||||
Unicode, str, and dict keys
|
||||
---------------------------
|
||||
|
||||
The :func:`hash` of the :term:`ASCII` characters is the same for
|
||||
:class:`unicode` and byte :class:`str`. When you use them in :class:`dict`
|
||||
keys, they evaluate to the same dictionary slot::
|
||||
|
||||
>>> u_string = u'a'
|
||||
>>> b_string = 'a'
|
||||
>>> hash(u_string), hash(b_string)
|
||||
(12416037344, 12416037344)
|
||||
>>> d = {}
|
||||
>>> d[u_string] = 'unicode'
|
||||
>>> d[b_string] = 'bytes'
|
||||
>>> d
|
||||
{u'a': 'bytes'}
|
||||
|
||||
When you deal with key values outside of :term:`ASCII`, :class:`unicode` and
|
||||
byte :class:`str` evaluate unequally no matter what their character content or
|
||||
hash value::
|
||||
|
||||
>>> u_string = u'ñ'
|
||||
>>> b_string = u_string.encode('utf-8')
|
||||
>>> print u_string
|
||||
ñ
|
||||
>>> print b_string
|
||||
ñ
|
||||
>>> d = {}
|
||||
>>> d[u_string] = 'unicode'
|
||||
>>> d[b_string] = 'bytes'
|
||||
>>> d
|
||||
{u'\xf1': 'unicode', '\xc3\xb1': 'bytes'}
|
||||
>>> b_string2 = '\xf1'
|
||||
>>> hash(u_string), hash(b_string2)
|
||||
(30848092528, 30848092528)
|
||||
>>> d = {}
|
||||
>>> d[u_string] = 'unicode'
|
||||
>>> d[b_string2] = 'bytes'
|
||||
>>> d
|
||||
{u'\xf1': 'unicode', '\xf1': 'bytes'}
|
||||
|
||||
How do you work with this one? Remember rule #1: Keep your :class:`unicode`
|
||||
and byte :class:`str` values separate. That goes for keys in a dictionary
|
||||
just like anything else.
|
||||
|
||||
* For any given dictionary, make sure that all your keys are either
|
||||
:class:`unicode` or :class:`str`. **Do not mix the two.** If you're being
|
||||
given both :class:`unicode` and :class:`str` but you don't need to preserve
|
||||
separate keys for each, I recommend using :func:`to_unicode` or
|
||||
:func:`to_bytes` to convert all keys to one type or the other like this::
|
||||
|
||||
>>> from kitchen.text.converters import to_unicode
|
||||
>>> u_string = u'one'
|
||||
>>> b_string = 'two'
|
||||
>>> d = {}
|
||||
>>> d[to_unicode(u_string)] = 1
|
||||
>>> d[to_unicode(b_string)] = 2
|
||||
>>> d
|
||||
{u'two': 2, u'one': 1}
|
||||
|
||||
* These issues also apply to using dicts with tuple keys that contain
|
||||
a mixture of :class:`unicode` and :class:`str`. Once again the best fix
|
||||
is to standardise on either :class:`str` or :class:`unicode`.
|
||||
|
||||
* If you absolutely need to store values in a dictionary where the keys could
|
||||
be either :class:`unicode` or :class:`str` you can use
|
||||
:class:`~kitchen.collections.strictdict.StrictDict` which has separate
|
||||
entries for all :class:`unicode` and byte :class:`str` and deals correctly
|
||||
with any :class:`tuple` containing mixed :class:`unicode` and byte
|
||||
:class:`str`.
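
The "pick one key type and convert everything to it" advice can be sketched as a small helper. This is a Python 3 illustration with a hypothetical ``normalize_keys`` function; note that in Python 3 ``b'a'`` and ``'a'`` never compare equal, so mixed keys always produce separate entries until they are normalized.

```python
def normalize_keys(mapping, encoding="utf-8"):
    """Return a new dict whose bytes keys have been decoded to text."""
    normalized = {}
    for key, value in mapping.items():
        if isinstance(key, bytes):
            key = key.decode(encoding)
        # If a bytes key and a text key collide after decoding, the later
        # entry in iteration order wins.
        normalized[key] = value
    return normalized


mixed = {"one": 1, b"two": 2}
print(normalize_keys(mixed))
```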
|
||||
|
||||
---------
|
||||
Functions
|
||||
---------
|
||||
|
||||
Unicode and byte str conversion
|
||||
===============================
|
||||
|
||||
.. autofunction:: kitchen.text.converters.to_unicode
|
||||
.. autofunction:: kitchen.text.converters.to_bytes
|
||||
.. autofunction:: kitchen.text.converters.getwriter
|
||||
.. autofunction:: kitchen.text.converters.to_str
|
||||
.. autofunction:: kitchen.text.converters.to_utf8
|
||||
|
||||
Transformation to XML
|
||||
=====================
|
||||
|
||||
.. autofunction:: kitchen.text.converters.unicode_to_xml
|
||||
.. autofunction:: kitchen.text.converters.xml_to_unicode
|
||||
.. autofunction:: kitchen.text.converters.byte_string_to_xml
|
||||
.. autofunction:: kitchen.text.converters.xml_to_byte_string
|
||||
.. autofunction:: kitchen.text.converters.bytes_to_xml
|
||||
.. autofunction:: kitchen.text.converters.xml_to_bytes
|
||||
.. autofunction:: kitchen.text.converters.guess_encoding_to_xml
|
||||
.. autofunction:: kitchen.text.converters.to_xml
|
||||
|
||||
Working with exception messages
|
||||
===============================
|
||||
|
||||
.. autodata:: kitchen.text.converters.EXCEPTION_CONVERTERS
|
||||
.. autodata:: kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS
|
||||
.. autofunction:: kitchen.text.converters.exception_to_unicode
|
||||
.. autofunction:: kitchen.text.converters.exception_to_bytes
|
33
kitchen3/docs/api-text-display.rst
Normal file
|
@ -0,0 +1,33 @@
|
|||
.. automodule:: kitchen.text.display
|
||||
|
||||
.. autofunction:: kitchen.text.display.textual_width
|
||||
|
||||
.. autofunction:: kitchen.text.display.textual_width_chop
|
||||
|
||||
.. autofunction:: kitchen.text.display.textual_width_fill
|
||||
|
||||
.. autofunction:: kitchen.text.display.wrap
|
||||
|
||||
.. autofunction:: kitchen.text.display.fill
|
||||
|
||||
.. autofunction:: kitchen.text.display.byte_string_textual_width_fill
|
||||
|
||||
Internal Data
|
||||
=============
|
||||
|
||||
There are a few internal functions and variables in this module. Code outside
|
||||
of kitchen shouldn't use them but people coding on kitchen itself may find
|
||||
them useful.
|
||||
|
||||
.. autodata:: kitchen.text.display._COMBINING
|
||||
|
||||
.. autofunction:: kitchen.text.display._generate_combining_table
|
||||
|
||||
.. autofunction:: kitchen.text.display._print_combining_table
|
||||
|
||||
.. autofunction:: kitchen.text.display._interval_bisearch
|
||||
|
||||
.. autofunction:: kitchen.text.display._ucp_width
|
||||
|
||||
.. autofunction:: kitchen.text.display._textual_width_le
|
||||
|
2
kitchen3/docs/api-text-misc.rst
Normal file
|
@ -0,0 +1,2 @@
|
|||
.. automodule:: kitchen.text.misc
|
||||
:members:
|
3
kitchen3/docs/api-text-utf8.rst
Normal file
|
@ -0,0 +1,3 @@
|
|||
.. automodule:: kitchen.text.utf8
|
||||
:members:
|
||||
:deprecated:
|
22
kitchen3/docs/api-text.rst
Normal file
|
@ -0,0 +1,22 @@
|
|||
=============================================
|
||||
Kitchen.text: unicode and utf8 and xml oh my!
|
||||
=============================================
|
||||
|
||||
The kitchen.text module contains functions that deal with text manipulation.
|
||||
|
||||
.. toctree::
|
||||
|
||||
api-text-converters
|
||||
api-text-display
|
||||
api-text-misc
|
||||
api-text-utf8
|
||||
|
||||
:mod:`~kitchen.text.converters`
|
||||
deals with converting text for different encodings and to and from XML
|
||||
:mod:`~kitchen.text.display`
|
||||
deals with issues with printing text to a screen
|
||||
:mod:`~kitchen.text.misc`
|
||||
is a catchall for text manipulation functions that don't seem to fit
|
||||
elsewhere
|
||||
:mod:`~kitchen.text.utf8`
|
||||
contains deprecated functions to manipulate utf8 byte strings
|
6
kitchen3/docs/api-versioning.rst
Normal file
|
@ -0,0 +1,6 @@
|
|||
===============================
|
||||
Helpers for versioning software
|
||||
===============================
|
||||
|
||||
.. automodule:: kitchen.versioning
|
||||
:members:
|
220
kitchen3/docs/conf.py
Normal file
|
@ -0,0 +1,220 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Kitchen documentation build configuration file, created by
|
||||
# sphinx-quickstart on Sat May 22 00:51:26 2010.
|
||||
#
|
||||
# This file is execfile()d with the current directory set to its containing dir.
|
||||
#
|
||||
# Note that not all possible configuration values are present in this
|
||||
# autogenerated file.
|
||||
#
|
||||
# All configuration values have a default; values that are commented out
|
||||
# serve to show the default.
|
||||
|
||||
import sys, os
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
|
||||
import kitchen.release
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#sys.path.append(os.path.abspath('.'))
|
||||
|
||||
# -- General configuration -----------------------------------------------------
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be extensions
|
||||
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
|
||||
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.pngmath', 'sphinx.ext.ifconfig']
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
||||
# The suffix of source filenames.
|
||||
source_suffix = '.rst'
|
||||
|
||||
# The encoding of source files.
|
||||
#source_encoding = 'utf-8'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# General information about the project.
|
||||
project = kitchen.release.NAME
|
||||
copyright = kitchen.release.COPYRIGHT
|
||||
|
||||
# The version info for the project you're documenting, acts as replacement for
|
||||
# |version| and |release|, also used in various other places throughout the
|
||||
# built documents.
|
||||
#
|
||||
# The short X.Y version.
|
||||
version = '0.2'
|
||||
# The full version, including alpha/beta/rc tags.
|
||||
release = kitchen.__version__
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
language = 'en'
|
||||
|
||||
# There are two options for replacing |today|: either, you set today to some
|
||||
# non-false value, then it is used:
|
||||
#today = ''
|
||||
# Else, today_fmt is used as the format for a strftime call.
|
||||
#today_fmt = '%B %d, %Y'
|
||||
|
||||
# List of documents that shouldn't be included in the build.
|
||||
#unused_docs = []
|
||||
|
||||
# List of directories, relative to source directory, that shouldn't be searched
|
||||
# for source files.
|
||||
exclude_trees = []
|
||||
|
||||
# The reST default role (used for this markup: `text`) to use for all documents.
|
||||
#default_role = None
|
||||
|
||||
# If true, '()' will be appended to :func: etc. cross-reference text.
|
||||
add_function_parentheses = True
|
||||
|
||||
# If true, the current module name will be prepended to all description
|
||||
# unit titles (such as .. function::).
|
||||
#add_module_names = True
|
||||
|
||||
# If true, sectionauthor and moduleauthor directives will be shown in the
|
||||
# output. They are ignored by default.
|
||||
show_authors = True
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = 'sphinx'
|
||||
|
||||
# A list of ignored prefixes for module index sorting.
|
||||
#modindex_common_prefix = []
|
||||
|
||||
highlight_language = 'python'
|
||||
|
||||
# -- Options for HTML output ---------------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. Major themes that come with
|
||||
# Sphinx are currently 'default' and 'sphinxdoc'.
|
||||
html_theme = 'default'
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
#html_theme_options = {}
|
||||
|
||||
# Add any paths that contain custom themes here, relative to this directory.
|
||||
#html_theme_path = []
|
||||
|
||||
# The name for this set of Sphinx documents. If None, it defaults to
|
||||
# "<project> v<release> documentation".
|
||||
#html_title = None
|
||||
|
||||
# A shorter title for the navigation bar. Default is the same as html_title.
|
||||
#html_short_title = None
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top
|
||||
# of the sidebar.
|
||||
#html_logo = None
|
||||
|
||||
# The name of an image file (within the static path) to use as favicon of the
|
||||
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
|
||||
# pixels large.
|
||||
#html_favicon = None
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
html_static_path = ['_static']
|
||||
|
||||
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
|
||||
# using the given strftime format.
|
||||
#html_last_updated_fmt = '%b %d, %Y'
|
||||
|
||||
# If true, SmartyPants will be used to convert quotes and dashes to
|
||||
# typographically correct entities.
|
||||
#html_use_smartypants = True
|
||||
|
||||
# Content template for the index page.
|
||||
html_index = 'index.html'
|
||||
|
||||
# Custom sidebar templates, maps document names to template names.
|
||||
#html_sidebars = {}
|
||||
|
||||
# Additional templates that should be rendered to pages, maps page names to
|
||||
# template names.
|
||||
#html_additional_pages = {}
|
||||
|
||||
# If false, no module index is generated.
|
||||
#html_use_modindex = True
|
||||
|
||||
# If false, no index is generated.
|
||||
#html_use_index = True
|
||||
|
||||
# If true, the index is split into individual pages for each letter.
|
||||
#html_split_index = False
|
||||
|
||||
# If true, links to the reST sources are added to the pages.
|
||||
#html_show_sourcelink = True
|
||||
|
||||
# If true, an OpenSearch description file will be output, and all pages will
|
||||
# contain a <link> tag referring to it. The value of this option must be the
|
||||
# base URL from which the finished HTML is served.
|
||||
html_use_opensearch = kitchen.release.DOWNLOAD_URL + 'docs/'
|
||||
|
||||
# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
|
||||
#html_file_suffix = ''
|
||||
|
||||
# Output file base name for HTML help builder.
|
||||
htmlhelp_basename = 'kitchendoc'
|
||||
|
||||
|
||||
# -- Options for LaTeX output --------------------------------------------------
|
||||
|
||||
# The paper size ('letter' or 'a4').
|
||||
#latex_paper_size = 'letter'
|
||||
|
||||
# The font size ('10pt', '11pt' or '12pt').
|
||||
#latex_font_size = '10pt'
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples
|
||||
# (source start file, target name, title, author, documentclass [howto/manual]).
|
||||
latex_documents = [
|
||||
('index', 'kitchen.tex', 'kitchen Documentation',
|
||||
'Toshio Kuratomi', 'manual'),
|
||||
]
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top of
|
||||
# the title page.
|
||||
#latex_logo = None
|
||||
|
||||
# For "manual" documents, if this is true, then toplevel headings are parts,
|
||||
# not chapters.
|
||||
#latex_use_parts = False
|
||||
|
||||
# Additional stuff for the LaTeX preamble.
|
||||
#latex_preamble = ''
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
#latex_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
#latex_use_modindex = True
|
||||
|
||||
automodule_skip_lines = 4
|
||||
autoclass_content = "class"
|
||||
|
||||
# Example configuration for intersphinx: refer to the Python standard library.
|
||||
intersphinx_mapping = {'http://docs.python.org/': None,
|
||||
'https://fedorahosted.org/releases/p/y/python-fedora/doc/': None,
|
||||
'https://fedorahosted.org/releases/p/a/packagedb/doc/': None}
|
||||
|
||||
rst_epilog = '''
|
||||
.. |projpage| replace:: project webpage
|
||||
.. _projpage: %(url)s
|
||||
.. |docpage| replace:: documentation page
|
||||
.. _docpage: %(download)s/docs
|
||||
.. |downldpage| replace:: download page
|
||||
.. _downldpage: %(download)s
|
||||
.. |stdlib| replace:: python standard library
|
||||
.. _stdlib: http://docs.python.org/library
|
||||
''' % {'url': kitchen.release.URL, 'download': kitchen.release.DOWNLOAD_URL}
|
690
kitchen3/docs/designing-unicode-apis.rst
Normal file
|
@ -0,0 +1,690 @@
|
|||
.. _DesigningUnicodeAwareAPIs:
|
||||
|
||||
============================
|
||||
Designing Unicode Aware APIs
|
||||
============================
|
||||
|
||||
APIs that deal with byte :class:`str` and :class:`unicode` strings are
|
||||
difficult to get right. Here are a few strategies with pros and cons of each.
|
||||
|
||||
.. contents::
|
||||
|
||||
-------------------------------------------------
|
||||
Take either bytes or unicode, output only unicode
|
||||
-------------------------------------------------
|
||||
|
||||
In this strategy, you allow the user to enter either :class:`unicode` strings
|
||||
or byte :class:`str` but what you give back is always :class:`unicode`. This
|
||||
strategy is easy for novice end users to start using immediately as they will
|
||||
be able to feed either type of string into the function and get back a string
|
||||
that they can use in other places.
|
||||
|
||||
However, it does lead to the novice writing code that functions correctly when
|
||||
testing it with :term:`ASCII`-only data but fails when given data that contains
|
||||
non-:term:`ASCII` characters. Worse, if your API is not designed to be
|
||||
flexible, the consumer of your code won't be able to easily correct those
|
||||
problems once they find them.
|
||||
|
||||
Here's a good API that uses this strategy::
|
||||
|
||||
from kitchen.text.converters import to_unicode
|
||||
|
||||
def truncate(msg, max_length, encoding='utf8', errors='replace'):
|
||||
msg = to_unicode(msg, encoding, errors)
|
||||
return msg[:max_length]
|
||||
|
||||
The call to :func:`truncate` starts with the essential parameters for
|
||||
performing the task. It ends with two optional keyword arguments that define
|
||||
the encoding to use to transform from a byte :class:`str` to :class:`unicode`
|
||||
and the strategy to use if undecodable bytes are encountered. The defaults
|
||||
may vary depending on the use cases you have in mind. When the output is
|
||||
generally going to be printed for the user to see, ``errors='replace'`` is
|
||||
a good default. If you are constructing keys to a database, raising an
|
||||
exception (with ``errors='strict'``) may be a better default. In either case,
|
||||
having both parameters allows the person using your API to choose how they
|
||||
want to handle any problems. Having the values is also a clue to them that
|
||||
a conversion from byte :class:`str` to :class:`unicode` string is going to
|
||||
occur.
|
||||
|
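The effect of the two keyword arguments can be seen in a minimal sketch. It is written for Python 3 using only the standard library, which is an assumption on my part: ``bytes`` stands in for byte :class:`str`, ``str`` stands in for :class:`unicode`, and ``bytes.decode`` stands in for :func:`to_unicode`.

```python
def truncate(msg, max_length, encoding='utf-8', errors='replace'):
    """Return at most max_length characters of msg, always as text (str)."""
    if isinstance(msg, bytes):
        # The caller decides how undecodable bytes are handled.
        msg = msg.decode(encoding, errors)
    return msg[:max_length]


# Latin-1 bytes for "cafe au lait" with an e-acute; 0xe9 is not valid UTF-8 here.
latin1_bytes = 'caf\xe9 au lait'.encode('latin-1')

# Display-oriented default: the bad byte becomes U+FFFD.
shown = truncate(latin1_bytes, 4)

# Key-oriented choice: refuse to guess.
try:
    truncate(latin1_bytes, 4, errors='strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Naming the real encoding yields the intended text.
correct = truncate(latin1_bytes, 4, encoding='latin-1')
```

The same input gives a replacement character, an exception, or the intended text depending only on the arguments the caller passes, which is the flexibility this strategy relies on.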
||||
.. note::
|
||||
|
||||
If you're targeting python-3.1 and above, ``errors='surrogateescape'`` may
|
||||
be a better default than ``errors='strict'``. You need to be mindful of
|
||||
a few things when using ``surrogateescape`` though:
|
||||
|
||||
* ``surrogateescape`` will cause issues if a non-:term:`ASCII` compatible
|
||||
encoding is used (for instance, UTF-16 or UTF-32). That makes it
|
||||
unhelpful in situations where a true general purpose method of encoding
|
||||
must be found. :pep:`383` mentions that ``surrogateescape`` was
|
||||
specifically designed with the limitations of translating using system
|
||||
locales (where :term:`ASCII` compatibility is generally seen as
|
||||
inescapable) so you should keep that in mind.
|
||||
* If you use ``surrogateescape`` to decode from :class:`bytes`
|
||||
to :class:`unicode`, you will need to use an error handler other than
|
||||
``strict`` when you encode again. The lone surrogates that this error
|
||||
handler creates are invalid unicode and must be handled when encoding.
|
||||
In Python-3.1.2 or earlier, a bug in the encoder error handlers means that
|
||||
you can only use ``surrogateescape`` to encode; anything else will throw
|
||||
an error.
|
||||
|
||||
Evaluate your usages of the variables in question to see what makes sense.
|
||||
|
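As a small illustration of the second point in the note, the following Python 3 snippet (standard library only; the byte string is just an example value) shows the round trip through ``surrogateescape`` and what happens when a different handler is used to encode:

```python
raw = b'nonsense_char_\xff'   # 0xff can never appear in valid UTF-8

text = raw.decode('utf-8', 'surrogateescape')
# The undecodable byte 0xff is carried along as the lone surrogate U+DCFF.
has_surrogate = text.endswith('\udcff')

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode('utf-8', 'surrogateescape')

# The strict handler rejects the lone surrogate.
try:
    text.encode('utf-8', 'strict')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'replace' succeeds, at the cost of losing the original byte.
lossy = text.encode('utf-8', 'replace')
```

Text decoded this way should therefore be treated as "possibly carrying escaped bytes" at every place where it is encoded again.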
||||
Here's a bad example of using this strategy::
|
||||
|
||||
from kitchen.text.converters import to_unicode
|
||||
|
||||
def truncate(msg, max_length):
|
||||
msg = to_unicode(msg)
|
||||
return msg[:max_length]
|
||||
|
||||
In this example, we don't have the optional keyword arguments for
|
||||
:attr:`encoding` and :attr:`errors`. A user who uses this function is more
|
||||
likely to miss the fact that a conversion from byte :class:`str` to
|
||||
:class:`unicode` is going to occur. And once an error is reported, they will
|
||||
have to look through their backtrace and think harder about where they want to
|
||||
transform their data into :class:`unicode` strings instead of having the
|
||||
opportunity to control how the conversion takes place in the function itself.
|
||||
Note that the user does have the ability to make this work by making the
|
||||
transformation to unicode themselves::
|
||||
|
||||
from kitchen.text.converters import to_unicode
|
||||
|
||||
msg = to_unicode(msg, encoding='euc_jp', errors='ignore')
|
||||
new_msg = truncate(msg, 5)
|
||||
|
||||
--------------------------------------------------
|
||||
Take either bytes or unicode, output the same type
|
||||
--------------------------------------------------
|
||||
|
||||
This strategy is sometimes called polymorphic because the type of data that is
|
||||
returned is dependent on the type of data that is received. The concept is
|
||||
that when you are given a byte :class:`str` to process, you return a byte
|
||||
:class:`str` in your output. When you are given :class:`unicode` strings to
|
||||
process, you return :class:`unicode` strings in your output.
|
||||
|
||||
This can work well for end users as the ones that know about the difference
|
||||
between the two string types will already have transformed the strings to
|
||||
their desired type before giving it to this function. The ones that don't can
|
||||
remain blissfully ignorant (at least, as far as your function is concerned) as
|
||||
the function does not change the type.
|
||||
|
||||
In cases where the encoding of the byte :class:`str` is known or can be
|
||||
discovered based on the input data this works well. If you can't figure out
|
||||
the input encoding, however, this strategy can fail in any of the following
|
||||
cases:
|
||||
|
||||
1. It needs to do an internal conversion between byte :class:`str` and
|
||||
:class:`unicode` string.
|
||||
2. It cannot return the same data as either a :class:`unicode` string or byte
|
||||
:class:`str`.
|
||||
3. You may need to deal with byte strings that are not byte-compatible with
|
||||
:term:`ASCII`
|
||||
|
||||
First, a couple of examples of using this strategy in a good way::
|
||||
|
||||
def translate(msg, table):
|
||||
replacements = table.keys()
|
||||
new_msg = []
|
||||
for index, char in enumerate(msg):
|
||||
if char in replacements:
|
||||
new_msg.append(table[char])
|
||||
else:
|
||||
new_msg.append(char)
|
||||
|
||||
return ''.join(new_msg)
|
||||
|
||||
In this example, all of the strings that we use (except the empty string which
|
||||
is okay because it doesn't have any characters to encode) come from outside of
|
||||
the function. Due to that, the user is responsible for making sure that the
|
||||
:attr:`msg`, and the keys and values in :attr:`table` all match in terms of
|
||||
type (:class:`unicode` vs :class:`str`) and encoding (You can do some error
|
||||
checking to make sure the user gave all the same type but you can't do the
|
||||
same for the user giving different encodings). You do not need to make
|
||||
changes to the string that require you to know the encoding or type of the
|
||||
string; everything is a simple replacement of one element in the array of
|
||||
characters in message with the character in table.
|
||||
|
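The type check mentioned in the parenthesis could look like the sketch below. It is a Python 3 adaptation (``str`` and ``bytes`` instead of :class:`unicode` and byte :class:`str`) and the exact error messages are my own choice. Note that it can verify types only; it has no way to tell whether two byte strings share an encoding.

```python
def translate(msg, table):
    """Replace single elements of msg using table.

    str in, str out; bytes in, bytes out.
    """
    kind = type(msg)
    if kind not in (str, bytes):
        raise TypeError('msg must be str or bytes')
    for key, value in table.items():
        if not isinstance(key, kind) or not isinstance(value, kind):
            raise TypeError('table keys and values must match the type of msg')
    # Slicing (rather than indexing) yields length-1 str or bytes objects,
    # so the same loop works for both types.
    pieces = [msg[i:i + 1] for i in range(len(msg))]
    return kind().join(table.get(piece, piece) for piece in pieces)
```

Calling ``translate('a-b c', {'-': '_', ' ': '_'})`` returns text, and calling it with a ``bytes`` message and a ``bytes`` table returns bytes; mixing the two raises :exc:`TypeError` before any work is done.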
||||
::
|
||||
|
||||
import json
|
||||
from kitchen.text.converters import to_unicode, to_bytes
|
||||
|
||||
def first_field_from_json_data(json_string):
|
||||
'''Return the first field in a json data structure.
|
||||
|
||||
The format of the json data is a simple list of strings.
|
||||
'["one", "two", "three"]'
|
||||
'''
|
||||
if isinstance(json_string, unicode):
|
||||
# On all python versions, json.loads() returns unicode if given
|
||||
# a unicode string
|
||||
return json.loads(json_string)[0]
|
||||
|
||||
# Byte str: figure out which encoding we're dealing with
|
||||
if '\x00' not in json_string[:2]:
|
||||
encoding = 'utf8'
|
||||
elif '\x00\x00\x00' == json_string[:3]:
|
||||
encoding = 'utf-32-be'
|
||||
elif '\x00\x00\x00' == json_string[1:4]:
|
||||
encoding = 'utf-32-le'
|
||||
elif '\x00' == json_string[0] and '\x00' == json_string[2]:
|
||||
encoding = 'utf-16-be'
|
||||
else:
|
||||
encoding = 'utf-16-le'
|
||||
|
||||
data = json.loads(unicode(json_string, encoding))
|
||||
return data[0].encode(encoding)
|
||||
|
||||
In this example the function takes either a byte :class:`str` type or
|
||||
a :class:`unicode` string that has a list in json format and returns the first
|
||||
field from it as the type of the input string. The first section of code is
|
||||
very straightforward; we receive a :class:`unicode` string, parse it with
|
||||
a function, and then return the first field from our parsed data (which our
|
||||
function returned to us as json data).
|
||||
|
||||
The second portion that deals with byte :class:`str` is not so
|
||||
straightforward. Before we can parse the string we have to determine what
|
||||
characters the bytes in the string map to. If we didn't do that, we wouldn't
|
||||
be able to properly find which characters are present in the string. In order
|
||||
to do that we have to figure out the encoding of the byte :class:`str`.
|
||||
Luckily, the json specification states that all strings are unicode and
|
||||
encoded with one of UTF32be, UTF32le, UTF16be, UTF16le, or :term:`UTF-8`. It further
|
||||
defines the format such that the first two characters are always
|
||||
:term:`ASCII`. Each of these has a different sequence of NULLs when they
|
||||
encode an :term:`ASCII` character. We can use that to detect which encoding
|
||||
was used to create the byte :class:`str`.
|
||||
|
||||
Finally, we return the byte :class:`str` by encoding the :class:`unicode` back
|
||||
to a byte :class:`str`.
|
||||
|
||||
As you can see, in this example we have to convert from byte :class:`str` to
|
||||
:class:`unicode` and back. But we know from the json specification that byte
|
||||
:class:`str` has to be one of a limited number of encodings that we are able
|
||||
to detect. That ability makes this strategy work.
|
||||
|
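For readers on Python 3, the detection step can be sketched as follows. This is an adaptation rather than the code above: ``detect_json_encoding`` is a helper name introduced here for illustration, and the sketch assumes the data carries no byte order mark.

```python
import json


def detect_json_encoding(data):
    """Guess the encoding of JSON bytes from the NUL pattern of the
    first two characters, which are always ASCII."""
    if b'\x00' not in data[:2]:
        return 'utf-8'
    if data[:3] == b'\x00\x00\x00':
        return 'utf-32-be'
    if data[1:4] == b'\x00\x00\x00':
        return 'utf-32-le'
    if data[0:1] == b'\x00' and data[2:3] == b'\x00':
        return 'utf-16-be'
    return 'utf-16-le'


def first_field_from_json_data(json_string):
    """Return the first field, as the same type that was passed in."""
    if isinstance(json_string, str):
        return json.loads(json_string)[0]
    encoding = detect_json_encoding(json_string)
    data = json.loads(json_string.decode(encoding))
    # Re-encode with the detected encoding so bytes in means bytes out.
    return data[0].encode(encoding)
```

Encoding ``'["one", "two"]'`` with each of the five encodings and passing the result to ``detect_json_encoding`` gives back the name of the encoding that was used.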
||||
Now for some examples of using this strategy in ways that fail::
|
||||
|
||||
import unicodedata
|
||||
def first_char(msg):
|
||||
'''Return the first character in a string'''
|
||||
if not isinstance(msg, unicode):
|
||||
try:
|
||||
msg = unicode(msg, 'utf8')
|
||||
except UnicodeError:
|
||||
msg = unicode(msg, 'latin1')
|
||||
msg = unicodedata.normalize('NFC', msg)
|
||||
return msg[0]
|
||||
|
||||
If you look at that code and think that there's something fragile and prone to
|
||||
breaking in the ``try: except:`` block you are correct in being suspicious.
|
||||
This code will fail on multi-byte character sets that aren't :term:`UTF-8`. It
|
||||
can also fail on data where the sequence of bytes is valid :term:`UTF-8` but
|
||||
the bytes are actually of a different encoding. The reason this code fails
|
||||
is that we don't know what encoding the bytes are in and the code must convert
|
||||
from a byte :class:`str` to a :class:`unicode` string in order to function.
|
||||
|
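Both failure modes are easy to reproduce. The sketch below uses Python 3 and a helper named ``guess_decode`` (a name made up for this illustration) that mirrors the ``try: except:`` guess:

```python
def guess_decode(data):
    """The fragile guess: try UTF-8, fall back to Latin-1."""
    try:
        return data.decode('utf-8')
    except UnicodeError:
        return data.decode('latin-1')


# Failure 1: a multi-byte encoding other than UTF-8.
euc_jp_bytes = '\u3042'.encode('euc_jp')      # HIRAGANA LETTER A, one character
wrong_text = guess_decode(euc_jp_bytes)       # two unrelated Latin-1 characters

# Failure 2: bytes meant as Latin-1 that happen to be valid UTF-8.
latin1_bytes = '\xc3\xa9'.encode('latin-1')   # two characters in Latin-1
misread_text = guess_decode(latin1_bytes)     # silently read as one character
```

Neither case raises an exception, so the caller receives wrong text with no sign that anything went amiss.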
||||
In order to make this code robust we must know the encoding of :attr:`msg`.
|
||||
The only way to know that is to ask the user so the API must do that::
|
||||
|
||||
import unicodedata
|
||||
def first_char(msg, encoding='utf8', errors='strict'):
|
||||
if not isinstance(msg, unicode):
|
||||
msg = unicode(msg, encoding, errors)
|
||||
msg = unicodedata.normalize('NFC', msg)
|
||||
return msg[0]
|
||||
|
||||
Another example of failure::
|
||||
|
||||
import os
|
||||
def listdir(directory):
|
||||
files = os.listdir(directory)
|
||||
if isinstance(directory, str):
|
||||
return files
|
||||
# files could contain both bytes and unicode
|
||||
new_files = []
|
||||
for filename in files:
|
||||
if not isinstance(filename, unicode):
|
||||
# What to do here?
|
||||
continue
|
||||
new_files.append(filename)
|
||||
return new_files
|
||||
|
||||
This function illustrates the second failure mode. Here, not all of the
|
||||
possible values can be represented as :class:`unicode` without knowing more
|
||||
about the encoding of each of the filenames involved. Since each filename
|
||||
could have a different encoding, there are a few different options to pursue. We
|
||||
could make this function always return byte :class:`str` since that can
|
||||
accurately represent anything that could be returned. If we want to return
|
||||
:class:`unicode` we need to at least allow the user to specify what to do in
|
||||
case of an error decoding the bytes to :class:`unicode`. We can also let the
|
||||
user specify the encoding to use for doing the decoding but that won't help in
|
||||
all cases since not all files will be in the same encoding (or even
|
||||
necessarily in any encoding)::
|
||||
|
||||
import locale
|
||||
import os
|
||||
def listdir(directory, encoding=locale.getpreferredencoding(), errors='strict'):
|
||||
# Note: In python-3.1+, surrogateescape may be a better default
|
||||
files = os.listdir(directory)
|
||||
if isinstance(directory, str):
|
||||
return files
|
||||
new_files = []
|
||||
for filename in files:
|
||||
if not isinstance(filename, unicode):
|
||||
filename = unicode(filename, encoding=encoding, errors=errors)
|
||||
new_files.append(filename)
|
||||
return new_files
|
||||
|
||||
Note that although we use :attr:`errors` in this example as what to pass to
|
||||
the codec that decodes to :class:`unicode` we could also have an
|
||||
:attr:`errors` argument that decides other things to do like skip a filename
|
||||
entirely, return a placeholder (``Nondisplayable filename``), or raise an
|
||||
exception.
|
||||
|
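One possible shape for such a policy argument is sketched below in Python 3. The function name ``decode_filenames``, the argument name ``on_error`` and its three values are hypothetical and chosen only for illustration; they are not part of kitchen or of the standard library.

```python
PLACEHOLDER = 'Nondisplayable filename'


def decode_filenames(names, encoding='utf-8', on_error='raise'):
    """Decode byte filenames to text.

    on_error selects the policy for undecodable names:
    'raise' propagates UnicodeDecodeError, 'skip' drops the name,
    'placeholder' substitutes a fixed marker string.
    """
    if on_error not in ('raise', 'skip', 'placeholder'):
        raise ValueError('unknown on_error policy: %r' % (on_error,))
    decoded = []
    for name in names:
        if isinstance(name, bytes):
            try:
                name = name.decode(encoding)
            except UnicodeDecodeError:
                if on_error == 'raise':
                    raise
                if on_error == 'skip':
                    continue
                name = PLACEHOLDER
        decoded.append(name)
    return decoded
```

Skipping is acceptable here only because the caller asked for it explicitly; the default still raises, so no filename disappears without the caller having chosen that.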
||||
This leaves us with one last failure to describe::
|
||||
|
||||
def first_field(csv_string):
|
||||
'''Return the first field in a comma separated values string.'''
|
||||
try:
|
||||
return csv_string[:csv_string.index(',')]
|
||||
except ValueError:
|
||||
return csv_string
|
||||
|
||||
This code looks simple enough. The hidden error here is that we are searching
|
||||
for a comma character in a byte :class:`str` but not all encodings will use
|
||||
the same sequence of bytes to represent the comma. If you use an encoding
|
||||
that's not :term:`ASCII` compatible on the byte level, then the literal comma
|
||||
``','`` in the above code will match inappropriate bytes. Some examples of
|
||||
how it can fail:
|
||||
|
||||
* Will find the byte representing an :term:`ASCII` comma in another character
|
||||
* Will find the comma but leave trailing garbage bytes on the end of the
|
||||
string
|
||||
* Will not match the character that represents the comma in this encoding
|
||||
|
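Each of the three failures can be reproduced with a byte-level search. The following Python 3 sketch uses UTF-16 and the EBCDIC code page ``cp500`` as examples of encodings that are not :term:`ASCII` compatible; ``naive_first_field`` is a name introduced here for the byte-only version of the function.

```python
def naive_first_field(csv_bytes):
    """Byte-level search for an ASCII comma, as in the failing example."""
    try:
        return csv_bytes[:csv_bytes.index(b',')]
    except ValueError:
        return csv_bytes


# The ASCII comma byte appears inside another character:
# U+012C encodes to the bytes 2C 01 in UTF-16-LE.
inside_other_char = naive_first_field('\u012cx,y'.encode('utf-16-le'))

# The comma is found, but one byte of its two-byte code unit is left behind.
trailing_garbage = naive_first_field('a,b'.encode('utf-16-be'))

# The comma is never matched: cp500 encodes ',' as 0x6B, not 0x2C.
no_match = naive_first_field('a,b'.encode('cp500'))
```

The first call returns an empty field, the second returns an odd number of bytes that is no longer valid UTF-16, and the third returns the whole input unchanged.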
||||
There are two ways to solve this. You can either take the encoding value from
|
||||
the user or you can take the separator value from the user. Of the two,
|
||||
taking the encoding is the better option for two reasons:
|
||||
|
||||
1. Taking a separator argument doesn't clearly document for the API user that
|
||||
the reason they must give it is to properly match the encoding of the
|
||||
:attr:`csv_string`. They're just as likely to think that it's simply a way
|
||||
to specify an alternate character (like ":" or "|") for the separator.
|
||||
2. It's possible for a variable width encoding to reuse the same byte sequence
|
||||
for different characters in multiple sequences.
|
||||
|
||||
.. note::
|
||||
|
||||
:term:`UTF-8` is resistant to this as any character's sequence of
|
||||
bytes will never be a subset of another character's sequence of bytes.
|
||||
|
||||
With that in mind, here's how to improve the API::
|
||||
|
||||
def first_field(csv_string, encoding='utf-8', errors='replace'):
|
||||
if not isinstance(csv_string, unicode):
|
||||
u_string = unicode(csv_string, encoding, errors)
|
||||
is_unicode = False
|
||||
else:
|
||||
u_string = csv_string
|
||||
is_unicode = True
|
||||
|
||||
try:
|
||||
field = u_string[:u_string.index(u',')]
|
||||
except ValueError:
|
||||
return csv_string
|
||||
|
||||
if not is_unicode:
|
||||
field = field.encode(encoding, errors)
|
||||
return field
|
||||
|
||||
.. note::
|
||||
|
||||
If you decide you'll never encounter a variable width encoding that reuses
|
||||
byte sequences you can use this code instead::
|
||||
|
||||
def first_field(csv_string, encoding='utf-8'):
|
||||
try:
|
||||
return csv_string[:csv_string.index(','.encode(encoding))]
|
||||
except ValueError:
|
||||
return csv_string
|
||||
|
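For comparison, here is a sketch of the encoding-aware version on Python 3, where ``str`` takes the place of :class:`unicode` and ``bytes`` the place of byte :class:`str`. It is an adaptation under that assumption, not a drop-in replacement for the code above.

```python
def first_field(csv_string, encoding='utf-8', errors='replace'):
    """Return the first comma separated field, as the same type as the input."""
    is_text = isinstance(csv_string, str)
    text = csv_string if is_text else csv_string.decode(encoding, errors)

    # Search in text so that ',' means the comma character in any encoding.
    head, separator, _ = text.partition(',')
    if not separator:
        # No comma: hand back the caller's data untouched.
        return csv_string

    return head if is_text else head.encode(encoding, errors)
```

Because the search happens on decoded text, a UTF-16 input whose first character contains the byte ``0x2C`` is split at the real comma instead of in the middle of that character.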
||||
------------------
|
||||
Separate functions
|
||||
------------------
|
||||
|
||||
Sometimes you want to be able to take either byte :class:`str` or
|
||||
:class:`unicode` strings, perform similar operations on either one and then
|
||||
return data in the same format as was given. Probably the easiest way to do
|
||||
that is to have separate functions for each and adopt a naming convention to
|
||||
show that one is for working with byte :class:`str` and the other is for
|
||||
working with :class:`unicode` strings::
|
||||
|
||||
def translate_b(msg, table):
|
||||
'''Replace values in str with other byte values like unicode.translate'''
|
||||
if not isinstance(msg, str):
|
||||
raise TypeError('msg must be of type str')
|
||||
str_table = [chr(s) for s in xrange(0,256)]
|
||||
delete_chars = []
|
||||
for chr_val in (k for k in table.keys() if isinstance(k, int)):
|
||||
if chr_val > 255:
|
||||
raise ValueError('Keys in table must not exceed 255')
|
||||
if table[chr_val] is None:
|
||||
delete_chars.append(chr(chr_val))
|
||||
elif isinstance(table[chr_val], int):
|
||||
if not 0 <= table[chr_val] <= 255:
|
||||
raise ValueError('table values cannot be more than 255 or less than 0')
|
||||
str_table[chr_val] = chr(table[chr_val])
|
||||
else:
|
||||
if not isinstance(table[chr_val], str):
|
||||
raise TypeError('character mapping must return integer, None or str')
|
||||
str_table[chr_val] = table[chr_val]
|
||||
str_table = ''.join(str_table)
|
||||
delete_chars = ''.join(delete_chars)
|
||||
return msg.translate(str_table, delete_chars)
|
||||
|
||||
def translate(msg, table):
|
||||
'''Replace values in a unicode string with other values'''
|
||||
if not isinstance(msg, unicode):
|
||||
raise TypeError('msg must be of type unicode')
|
||||
return msg.translate(table)
|
||||
|
||||
There are several things that we have to do in this API:
|
||||
|
||||
* Because the function names might not be enough of a clue to the user of the
|
||||
functions of the value types that are expected, we have to check that the
|
||||
types are correct.
|
||||
|
||||
* We keep the behaviour of the two functions as close to the same as possible,
|
||||
just with byte :class:`str` and :class:`unicode` strings substituted for
|
||||
each other.
|
||||
|
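On Python 3 the same pair of functions can be sketched with ``bytes.translate`` and ``str.translate``. This version is a simplification made for illustration: unlike the code above, it accepts only integer or ``None`` values in the table for the byte variant.

```python
def translate_b(msg, table):
    """Byte counterpart of str.translate for single-byte mappings."""
    if not isinstance(msg, bytes):
        raise TypeError('msg must be of type bytes')
    byte_table = bytearray(range(256))   # start from the identity mapping
    delete = bytearray()
    for key, value in table.items():
        if not 0 <= key <= 255:
            raise ValueError('keys in table must be in range(256)')
        if value is None:
            delete.append(key)
        elif isinstance(value, int):
            if not 0 <= value <= 255:
                raise ValueError('table values must be in range(256)')
            byte_table[key] = value
        else:
            raise TypeError('table values must be int or None')
    return msg.translate(bytes(byte_table), bytes(delete))


def translate(msg, table):
    """Replace values in a text string with other values."""
    if not isinstance(msg, str):
        raise TypeError('msg must be of type str')
    return msg.translate(table)
```

With ``table = {ord('-'): ord('_'), ord(' '): None}`` both functions turn ``a-b c`` into ``a_bc``, each in its own type, and each rejects the other type up front.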
||||
|
||||
-----------------------------------------------------------------
|
||||
Deciding whether to take str or unicode when no value is returned
|
||||
-----------------------------------------------------------------
|
||||
|
||||
Not all functions have a return value. Sometimes a function is there to
|
||||
interact with something external to python, for instance, writing a file out
|
||||
to disk or a method exists to update the internal state of a data structure.
|
||||
One of the main questions with these APIs is whether to take byte
|
||||
:class:`str`, :class:`unicode` string, or both. The answer depends on your
|
||||
use case but I'll give some examples here.
|
||||
|
||||
Writing to external data
|
||||
========================
|
||||
|
||||
When your information is going to an external data source like writing to
|
||||
a file you need to decide whether to take in :class:`unicode` strings or byte
|
||||
:class:`str`. Remember that most external data sources are not going to be
|
||||
dealing with unicode directly. Instead, they're going to be dealing with
|
||||
a sequence of bytes that may be interpreted as unicode. With that in mind,
|
||||
you either need to have the user give you a byte :class:`str` or convert to
|
||||
a byte :class:`str` inside the function.
|
||||
|
||||
Next you need to think about the type of data that you're receiving. If it's
|
||||
textual data, (for instance, this is a chat client and the user is typing
|
||||
messages that they expect to be read by another person) it probably makes sense to
|
||||
take in :class:`unicode` strings and do the conversion inside your function.
|
||||
On the other hand, if this is a lower level function that's passing data into
|
||||
a network socket, it probably should be taking byte :class:`str` instead.
|
||||
|
||||
Just as noted in the API notes above, you should specify an :attr:`encoding`
|
||||
and :attr:`errors` argument if you need to transform from :class:`unicode`
|
||||
string to byte :class:`str` and you are unable to guess the encoding from the
|
||||
data itself.
|
||||
|
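The split between a text-level and a byte-level function might look like this in Python 3. The names ``send_chat_message`` and ``write_raw`` are hypothetical, and ``io.BytesIO`` stands in for a real file or socket.

```python
import io


def write_raw(stream, payload):
    """Byte-level API: takes bytes and passes them through untouched."""
    if not isinstance(payload, bytes):
        raise TypeError('payload must be bytes')
    stream.write(payload)


def send_chat_message(stream, message, encoding='utf-8', errors='strict'):
    """Text-level API: takes str and converts to bytes at the boundary."""
    if not isinstance(message, str):
        raise TypeError('message must be text (str)')
    write_raw(stream, message.encode(encoding, errors))


wire = io.BytesIO()
send_chat_message(wire, 'caf\xe9\n')                       # UTF-8 by default
send_chat_message(wire, 'caf\xe9\n', encoding='latin-1')   # caller's choice
```

The conversion happens in exactly one place, and the caller can see from the signature that an encoding step is involved.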
||||
Updating data structures
|
||||
========================
|
||||
|
||||
Sometimes your API is just going to update a data structure and not
|
||||
immediately output that data anywhere. Just as when writing external data,
|
||||
you should think about both what your function is going to do with the data
|
||||
eventually and what the caller of your function is thinking that they're
|
||||
giving you. Most of the time, you'll want to take :class:`unicode` strings
|
||||
and enter them into the data structure as :class:`unicode` when the data is
|
||||
textual in nature. You'll want to take byte :class:`str` and enter them into
|
||||
the data structure as byte :class:`str` when the data is not text. Use
|
||||
a naming convention so the user knows what's expected.
|
||||
|
||||
-------------
|
||||
APIs to Avoid
|
||||
-------------
|
||||
|
||||
There are a few APIs that are just wrong. If you catch yourself making an API
|
||||
that does one of these things, change it before anyone sees your code.
|
||||
|
||||
Returning unicode unless a conversion fails
|
||||
===========================================
|
||||
|
||||
This type of API usually deals with byte :class:`str` at some point and
|
||||
converts it to :class:`unicode` because it's usually thought to be text.
|
||||
However, there are times when the bytes fail to convert to a :class:`unicode`
|
||||
string. When that happens, this API returns the raw byte :class:`str` instead
|
||||
of a :class:`unicode` string. One example of this is present in the |stdlib|_:
|
||||
python2's :func:`os.listdir`::
|
||||
|
||||
>>> import os
|
||||
>>> import locale
|
||||
>>> locale.getpreferredencoding()
|
||||
'UTF-8'
|
||||
>>> os.mkdir('/tmp/mine')
|
||||
>>> os.chdir('/tmp/mine')
|
||||
>>> open('nonsense_char_\xff', 'w').close()
|
||||
>>> open('all_ascii', 'w').close()
|
||||
>>> os.listdir(u'.')
|
||||
[u'all_ascii', 'nonsense_char_\xff']
|
||||
|
||||
The problem with APIs like this is that they cause failures that are hard to
|
||||
debug because they don't happen where the variables are set. For instance,
|
||||
let's say you take the filenames from :func:`os.listdir` and give it to this
|
||||
function::
|
||||
|
||||
def normalize_filename(filename):
|
||||
'''Change spaces and dashes into underscores'''
|
||||
return filename.translate({ord(u' '):u'_', ord(u'-'):u'_'})
|
||||
|
||||
When you test this, you use filenames that all are decodable in your preferred
|
||||
encoding and everything seems to work. But when this code is run on a machine
|
||||
that has filenames in multiple encodings the filenames returned by
|
||||
:func:`os.listdir` suddenly include byte :class:`str`. And byte :class:`str`
|
||||
has a different :func:`string.translate` function that takes different values.
|
||||
So the code raises an exception where it's not immediately obvious that
|
||||
:func:`os.listdir` is at fault.
|
||||
|
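The delayed failure can be imitated on Python 3 without touching the filesystem. In this sketch ``mixed_listing`` is a hand-written stand-in for what a mixed-type API would return:

```python
def normalize_filename(filename):
    """Change spaces and dashes into underscores (text only)."""
    return filename.translate({ord(' '): '_', ord('-'): '_'})


# What an API that falls back to raw bytes might hand back.
mixed_listing = ['all ascii', b'nonsense-char-\xff']

results = []
for name in mixed_listing:
    try:
        results.append(normalize_filename(name))
    except TypeError as exc:
        # bytes.translate() wants a 256-byte table, not a dict, so the
        # failure shows up here, far from the call that produced the bytes.
        results.append(type(exc).__name__)
```

The text filename is normalized as expected, while the byte filename raises :exc:`TypeError` inside ``normalize_filename``, which is not where the mixed types originated.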
||||
Ignoring values with no chance of recovery
|
||||
==========================================
|
||||
|
||||
An early version of python3 attempted to fix the :func:`os.listdir` problem
|
||||
pointed out in the last section by returning all values that were decodable to
|
||||
:class:`unicode` and omitting the filenames that were not. This led to the
|
||||
following output::
|
||||
|
||||
>>> import os
|
||||
>>> import locale
|
||||
>>> locale.getpreferredencoding()
|
||||
'UTF-8'
|
||||
>>> os.mkdir('/tmp/mine')
|
||||
>>> os.chdir('/tmp/mine')
|
||||
>>> open(b'nonsense_char_\xff', 'w').close()
|
||||
>>> open('all_ascii', 'w').close()
|
||||
>>> os.listdir('.')
|
||||
['all_ascii']
|
||||
|
||||
The issue with this type of code is that it is silently doing something
|
||||
surprising. The caller expects to get a full list of files back from
|
||||
:func:`os.listdir`. Instead, it silently ignores some of the files, returning
|
||||
only a subset. This leads to code that doesn't do what is expected, and the
|
||||
problem may go unnoticed until the code is in production and someone notices
|
||||
that something important is being missed.
|
||||
|
||||
Raising a UnicodeException with no chance of recovery
|
||||
=====================================================
|
||||
|
||||
Believe it or not, a few libraries exist that make it impossible to deal
|
||||
with unicode text without raising a :exc:`UnicodeError`. What seems to occur
|
||||
in these libraries is that the library has functions that expect to receive
|
||||
a :class:`unicode` string. However, internally, those functions call other
|
||||
functions that expect to receive a byte :class:`str`. The programmer of the
|
||||
API was smart enough to convert from a :class:`unicode` string to a byte
|
||||
:class:`str` but they did not give the user the chance to specify the
|
||||
encodings to use or how to deal with errors. This results in exceptions when
|
||||
the user passes in a byte :class:`str` because the initial function wants
|
||||
a :class:`unicode` string and exceptions when the user passes in
|
||||
a :class:`unicode` string because the function can't convert the string to
|
||||
bytes in the encoding that it has selected.
|
||||
|
||||
Do not put the user in the position of not being able to use your API without
|
||||
raising a :exc:`UnicodeError` with certain values. If you can only safely
|
||||
take :class:`unicode` strings, document that byte :class:`str` is not allowed
|
||||
and vice versa. If you have to convert internally, make sure to give the
|
||||
caller of your function parameters to control the encoding and how to treat
|
||||
errors that may occur during the encoding/decoding process. If your code will
|
||||
raise a :exc:`UnicodeError` with non-:term:`ASCII` values no matter what, you
|
||||
should probably rethink your API.
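As a rough illustration of that advice, a function that must convert internally can expose the encoding and the error handler to its caller. This is a Python 3 sketch; ``encode_for_wire`` is a hypothetical name chosen for the example:

```python
def encode_for_wire(text, encoding='utf-8', errors='strict'):
    """Convert text to bytes, letting the caller pick encoding and errors."""
    if not isinstance(text, str):
        # Documented restriction: this hypothetical API only takes text.
        raise TypeError('encode_for_wire() requires text, not %s'
                        % type(text).__name__)
    return text.encode(encoding, errors)
```

A caller who knows the transport is ASCII-only can pass ``errors='replace'`` and never see a :exc:`UnicodeError`, while the strict default remains available for callers who want to be told about bad data.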
|
||||
|
||||
-----------------
|
||||
Knowing your data
|
||||
-----------------
|
||||
|
||||
If you've read all the way down to this section without skipping you've seen
|
||||
several admonitions about the type of data you are processing affecting the
|
||||
viability of the various API choices.
|
||||
|
||||
Here are a few things to consider in your data:
|
||||
|
||||
Do you need to operate on both bytes and unicode?
|
||||
=================================================
|
||||
|
||||
Much of the data in libraries, programs, and the general environment outside
|
||||
of python is written so that strings are sequences of bytes. So when we
|
||||
interact with data that comes from outside of python or data that is about to
|
||||
leave python it may make sense to only operate on the data as a byte
|
||||
:class:`str`. There are two times when this may make sense:
|
||||
|
||||
1. The user is intended to hand the data to the function and then the function
|
||||
takes care of sending the data outside of python (to the filesystem, over
|
||||
the network, etc).
|
||||
2. The data is not representable as text. For instance, writing a binary
|
||||
file format.
|
||||
|
||||
Even when your code is operating in this area you still need to think a little
|
||||
more about your data. For instance, it might make sense for the person using
|
||||
your API to pass in :class:`unicode` strings and let the function convert that
|
||||
into the byte :class:`str` that it then sends over the wire.
|
||||
|
||||
There are also times when it might make sense to operate only on
|
||||
:class:`unicode` strings. :class:`unicode` represents text so anytime that
|
||||
you are working on textual data that isn't going to leave python it has the
|
||||
potential to be a :class:`unicode`-only API. However, there are two things that
|
||||
you should consider when designing a :class:`unicode`-only API:
|
||||
|
||||
1. As your API gains popularity, people are going to use your API in places
|
||||
that you may not have thought of. Corner cases in these other places may
|
||||
mean that processing bytes is desirable.
|
||||
2. In python2, byte :class:`str` and :class:`unicode` are often used
|
||||
interchangeably with each other. That means that people programming against
|
||||
your API may have received :class:`str` from some other API and it would be
|
||||
most convenient for their code if your API accepted it.
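One common compromise, sketched below for Python 3, is to accept both types at the edge of the API and normalize to text before doing any work. The names ``to_text`` and ``shout`` are made up for this example and only resemble the converters kitchen ships:

```python
def to_text(value, encoding='utf-8', errors='replace'):
    """Return ``value`` as text, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    if isinstance(value, str):
        return value
    raise TypeError('expected bytes or text, got %s' % type(value).__name__)


def shout(message, encoding='utf-8', errors='replace'):
    """Text-only logic that still accepts bytes from its callers."""
    return to_text(message, encoding, errors).upper()
```

Everything past the first line of ``shout`` can assume text, so the dual-type handling stays in one place.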
|
||||
|
||||
.. note::
|
||||
|
||||
In python3, the separation between the text type and the byte type
|
||||
is clearer. So in python3, there's less need to have all APIs take
|
||||
both unicode and bytes.
|
||||
|
||||
Can you restrict the encodings?
|
||||
===============================
|
||||
If you determine that you have to deal with byte :class:`str` you should
|
||||
realize that not all encodings are created equal. Each has different
|
||||
properties that may make it possible to provide a simpler API provided that
|
||||
you can reasonably tell the users of your API that they cannot use certain
|
||||
classes of encodings.
|
||||
|
||||
As one example, if you are required to find a comma (``,``) in a byte
|
||||
:class:`str` you have different choices based on what encodings are allowed.
|
||||
If you can reasonably restrict your API users to only giving :term:`ASCII
|
||||
compatible` encodings you can do this simply by searching for the literal
|
||||
comma character because that character will be represented by the same byte
|
||||
sequence in all :term:`ASCII compatible` encodings.
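The comma example can be sketched in Python 3 as follows. ``split_on_comma`` is a hypothetical helper, and the three encodings are simply ones known to be ASCII compatible:

```python
def split_on_comma(raw):
    """Split a byte string on the ASCII comma byte (0x2C)."""
    return raw.split(b',')


text = 'caf\u00e9,na\u00efve'
for encoding in ('utf-8', 'latin-1', 'iso8859-15'):
    fields = split_on_comma(text.encode(encoding))
    # Each field decodes cleanly because the comma byte never occurs inside
    # another character in an ASCII compatible encoding.
    assert [f.decode(encoding) for f in fields] == ['caf\u00e9', 'na\u00efve']
```

The function never needs to know which of the encodings it was handed.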
|
||||
|
||||
The following are some classes of encodings to be aware of as you decide how
|
||||
generic your code needs to be.
|
||||
|
||||
Single byte encodings
|
||||
---------------------
|
||||
|
||||
Single byte encodings can represent at most 256 characters. They encode
|
||||
the :term:`code points` for a character to the equivalent number in a single
|
||||
byte.
|
||||
|
||||
Most single byte encodings are :term:`ASCII compatible`. :term:`ASCII
|
||||
compatible` encodings are the most likely to be usable without changes to code
|
||||
so this is good news. A notable exception to this is the `EBCDIC
|
||||
<http://en.wikipedia.org/wiki/Extended_Binary_Coded_Decimal_Interchange_Code>`_
|
||||
family of encodings.
|
||||
|
||||
Multibyte encodings
|
||||
-------------------
|
||||
|
||||
Multibyte encodings use more than one byte to encode some characters.
|
||||
|
||||
Fixed width
|
||||
~~~~~~~~~~~
|
||||
|
||||
Fixed width encodings have a set number of bytes to represent all of the
|
||||
characters in the character set. ``UTF-32`` is an example of a fixed width
|
||||
encoding that uses four bytes per character and can express every unicode
|
||||
character. There are a number of problems with writing APIs that need to
|
||||
operate on fixed width, multibyte characters. To go back to our earlier
|
||||
example of finding a comma in a string, we have to realize that even in
|
||||
``UTF-32`` where the :term:`code point` for :term:`ASCII` characters is the
|
||||
same as in :term:`ASCII`, the byte sequence for them is different. So you
|
||||
cannot search for the literal byte character as it may pick up false
|
||||
positives and may break a byte sequence in an odd place.
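A small Python 3 sketch of the problem, using little-endian ``UTF-32`` without a byte order mark. The helper ``find_char_utf32`` is hypothetical; U+2C00 was picked only because its encoding happens to contain the comma byte:

```python
def find_char_utf32(raw, char, encoding='utf-32-le'):
    """Return the character index of ``char`` in UTF-32 bytes, or -1."""
    needle = char.encode(encoding)
    # Compare whole four-byte units so a match never straddles characters.
    for offset in range(0, len(raw), 4):
        if raw[offset:offset + 4] == needle:
            return offset // 4
    return -1


# U+2C00 contains the byte 0x2C, the same byte as the ASCII comma.
raw = '\u2c00a,'.encode('utf-32-le')
naive_offset = raw.find(b',')            # byte 1, in the middle of U+2C00
real_index = find_char_utf32(raw, ',')   # character 2, the actual comma
```

The naive search reports a position inside the first character, while the aligned search finds the real comma.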
|
||||
|
||||
Variable Width
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
ASCII compatible
|
||||
""""""""""""""""
|
||||
|
||||
:term:`UTF-8` and the `EUC <http://en.wikipedia.org/wiki/Extended_Unix_Code>`_
|
||||
family of encodings are examples of :term:`ASCII compatible` multi-byte
|
||||
encodings. They achieve this by adhering to two principles:
|
||||
|
||||
* All of the :term:`ASCII` characters are represented by the byte that they
|
||||
are in the :term:`ASCII` encoding.
|
||||
* None of the :term:`ASCII` byte sequences are reused in any other byte
|
||||
sequence for a different character.
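Both principles can be checked by looking at the bytes an encoding produces. In this Python 3 sketch, ``uses_ascii_bytes`` is a hypothetical helper that reports whether any byte of an encoded character falls in the :term:`ASCII` range:

```python
def uses_ascii_bytes(char, encoding):
    """Return True if encoding ``char`` produces any byte below 0x80."""
    return any(byte < 0x80 for byte in char.encode(encoding))


# ASCII characters keep their single ASCII byte ...
ascii_kept = 'a'.encode('utf-8') == b'a' and 'a'.encode('euc-jp') == b'a'
# ... and a non-ASCII character is built only from bytes >= 0x80.
utf8_clean = not uses_ascii_bytes('\u304f', 'utf-8')
eucjp_clean = not uses_ascii_bytes('\u304f', 'euc-jp')
```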
|
||||
|
||||
Escaped
|
||||
"""""""
|
||||
|
||||
Some multibyte encodings work by using only bytes from the :term:`ASCII`
|
||||
encoding but when a particular sequence of those bytes is found, they are
|
||||
interpreted as meaning something other than their :term:`ASCII` values.
|
||||
``UTF-7`` is one such encoding that can encode all of the unicode
|
||||
:term:`code points`. For instance, here are some Japanese characters encoded as
|
||||
``UTF-7``::
|
||||
|
||||
>>> a = u'\u304f\u3089\u3068\u307f'
|
||||
>>> print a
|
||||
くらとみ
|
||||
>>> print a.encode('utf-7')
|
||||
+ME8wiTBoMH8-
|
||||
|
||||
These encodings can be used when you need to encode unicode data that may
|
||||
contain non-:term:`ASCII` characters for inclusion in an :term:`ASCII` only
|
||||
transport medium or file.
|
||||
|
||||
However, they are not :term:`ASCII compatible` in the sense that we used
|
||||
earlier as the bytes that represent an :term:`ASCII` character are being reused
|
||||
as part of other characters. If you were to search for a literal plus sign in
|
||||
this encoded string, you would run across many false positives, for instance.
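The false positive is easy to reproduce. This Python 3 sketch reuses the Japanese text from the example above and relies only on the standard ``utf-7`` codec:

```python
text = '\u304f\u3089\u3068\u307f'   # the same Japanese text as above
encoded = text.encode('utf-7')

plus_in_text = '+' in text          # False: no plus sign was written
plus_in_bytes = b'+' in encoded     # True: '+' opens the escaped run

# A real plus sign is itself escaped, so it never appears as a bare '+'.
literal_plus = 'a+b'.encode('utf-7')
```

Here ``literal_plus`` is ``b'a+-b'``, so even a genuine plus sign cannot be located by searching for the single byte.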
|
||||
|
||||
Other
|
||||
"""""
|
||||
|
||||
There are many other popular variable width encodings, for instance ``UTF-16``
|
||||
and ``shift-JIS``. Many of these are not :term:`ASCII compatible` so you
|
||||
cannot search for a literal :term:`ASCII` character without danger of false
|
||||
positives or false negatives.
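Both failure modes can be shown in a few lines of Python 3. The characters were chosen for illustration: U+30BD is a katakana letter whose ``shift-JIS`` encoding ends with the byte used for the :term:`ASCII` backslash:

```python
# False positive: searching Shift-JIS bytes for a backslash finds one
# inside a katakana character that contains no backslash at all.
false_positive = b'\\' in '\u30bd'.encode('shift_jis')

# False negative: in UTF-16 every ASCII character is followed by a zero
# byte, so the ASCII byte sequence b',b' is not found in the text 'a,b'.
false_negative = b',b' not in 'a,b'.encode('utf-16-le')
```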
|
107
kitchen3/docs/glossary.rst
Normal file
|
@ -0,0 +1,107 @@
|
|||
========
|
||||
Glossary
|
||||
========
|
||||
|
||||
.. glossary::
|
||||
|
||||
"Everything but the kitchen sink"
|
||||
An English idiom meaning to include nearly everything that you can
|
||||
think of.
|
||||
|
||||
API version
|
||||
Version that is meant for computer consumption. This version is
|
||||
parsable and comparable by computers. It contains information about
|
||||
a library's API so that computer software can decide whether it works
|
||||
with the software.
|
||||
|
||||
ASCII
|
||||
A character encoding that maps numbers to characters essential to
|
||||
American English. It maps 128 characters using 7 bits.
|
||||
|
||||
.. seealso:: http://en.wikipedia.org/wiki/ASCII
|
||||
|
||||
ASCII compatible
|
||||
An encoding in which the particular byte that maps to a character in
|
||||
the :term:`ASCII` character set is only used to map to that character.
|
||||
This excludes EBCDIC based encodings and many multi-byte fixed and
|
||||
variable width encodings since they reuse the bytes that make up the
|
||||
:term:`ASCII` encoding for other purposes. :term:`UTF-8` is notable
|
||||
as a variable width encoding that is :term:`ASCII` compatible.
|
||||
|
||||
.. seealso::
|
||||
|
||||
http://en.wikipedia.org/wiki/Variable-width_encoding
|
||||
For another explanation of various ways bytes are mapped to
|
||||
characters in a possibly incompatible manner.
|
||||
|
||||
code points
|
||||
:term:`code point`
|
||||
|
||||
code point
|
||||
A number that maps to a particular abstract character. Code points
|
||||
make it so that we have a number pointing to a character without
|
||||
worrying about implementation details of how those numbers are stored
|
||||
for the computer to read. Encodings define how the code points map to
|
||||
particular sequences of bytes on disk and in memory.
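To make the distinction concrete, this Python 3 sketch takes a single code point and shows that different encodings store it as different bytes:

```python
char = '\u00e9'                       # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(char)                # the abstract number, 0xE9

as_latin1 = char.encode('latin-1')    # one byte
as_utf8 = char.encode('utf-8')        # two bytes
as_utf16 = char.encode('utf-16-le')   # two different bytes
```

The code point stays the same in all three cases; only the byte sequences differ.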
|
||||
|
||||
control characters
|
||||
:term:`control character`
|
||||
|
||||
control character
|
||||
The set of characters in unicode that are used, not to display glyphs
|
||||
on the screen, but to tell the displaying program to do something.
|
||||
|
||||
.. seealso:: http://en.wikipedia.org/wiki/Control_character
|
||||
|
||||
grapheme
|
||||
characters or pieces of characters that you might write on a page to
|
||||
make words, sentences, or other pieces of text.
|
||||
|
||||
.. seealso:: http://en.wikipedia.org/wiki/Grapheme
|
||||
|
||||
I18N
|
||||
I18N is an abbreviation for internationalization. It's often used to
|
||||
signify the need to translate words, number and date formats, and
|
||||
other pieces of data in a computer program so that it will work well
|
||||
for people who speak a language other than your own.
|
||||
|
||||
message catalogs
|
||||
:term:`message catalog`
|
||||
|
||||
message catalog
|
||||
Message catalogs contain translations for user-visible strings that
|
||||
are present in your code. Normally, you need to mark the strings to
|
||||
be translated by wrapping them in one of several :mod:`gettext`
|
||||
functions. The function serves two purposes:
|
||||
|
||||
1. It allows automated tools to find which strings are supposed to be
|
||||
extracted for translation.
|
||||
2. The functions perform the translation when the program is running.
|
||||
|
||||
.. seealso::
|
||||
`babel's documentation
|
||||
<http://babel.edgewall.org/wiki/Documentation/messages.html>`_
|
||||
for one method of extracting message catalogs from source
|
||||
code.
|
||||
|
||||
Murphy's Law
|
||||
"Anything that can go wrong, will go wrong."
|
||||
|
||||
.. seealso:: http://en.wikipedia.org/wiki/Murphy%27s_Law
|
||||
|
||||
release version
|
||||
Version that is meant for human consumption. This version is easy for
|
||||
a human to look at to decide how a particular version relates to other
|
||||
versions of the software.
|
||||
|
||||
textual width
|
||||
The amount of horizontal space a character takes up on a monospaced
|
||||
screen. The units are number of character cells or columns that it
|
||||
takes the place of.
|
||||
|
||||
UTF-8
|
||||
A character encoding that maps all unicode :term:`code points` to a sequence
|
||||
of bytes. It is compatible with :term:`ASCII`. It uses a variable
|
||||
number of bytes to encode all of unicode. ASCII characters take one
|
||||
byte. Characters from other parts of unicode take two to four bytes.
|
||||
It is widespread as an encoding on the internet and in Linux.
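The variable width is easy to observe. In this Python 3 sketch, ``utf8_length`` is a hypothetical helper and the sample characters were picked to cover each length:

```python
def utf8_length(char):
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(char.encode('utf-8'))


# ASCII letter, accented Latin letter, hiragana letter, emoji
lengths = [utf8_length(c) for c in ('a', '\u00e9', '\u304f', '\U0001f600')]
```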
|
359
kitchen3/docs/hacking.rst
Normal file
|
@ -0,0 +1,359 @@
|
|||
=======================================
|
||||
Conventions for contributing to kitchen
|
||||
=======================================
|
||||
|
||||
-----
|
||||
Style
|
||||
-----
|
||||
|
||||
* Strive to be :pep:`8` compliant
|
||||
* Run :command:`pylint` over the code and try to resolve most of its nitpicking
|
||||
|
||||
------------------------
|
||||
Python 2.4 compatibility
|
||||
------------------------
|
||||
|
||||
At the moment, we're supporting python-2.4 and above. Understand that there's
|
||||
a lot of python features that we cannot use because of this.
|
||||
|
||||
Sometimes modules in the |stdlib|_ can be added to kitchen so that they're
|
||||
available. When we do that we need to be careful of several things:
|
||||
|
||||
1. Keep the module in sync with the version in the python-2.x trunk. Use
|
||||
:file:`maintainers/sync-copied-files.py` for this.
|
||||
2. Sync the unittests as well as the module.
|
||||
3. Be aware that not all modules are written to remain compatible with
|
||||
Python-2.4 and might use python language features that were not present
|
||||
then (relative imports, with, try: with
|
||||
both except: and finally:, etc). These are not good candidates for
|
||||
importing into kitchen as they require more work to keep synced.
|
||||
|
||||
---------
|
||||
Unittests
|
||||
---------
|
||||
|
||||
* At least smoketest your code (make sure a function will return expected
|
||||
values for one set of inputs).
|
||||
* Note that even 100% coverage is not a guarantee of working code! Good tests
|
||||
will realize that you need to also give multiple inputs that test the code
|
||||
paths of called functions that are outside of your code. Example::
|
||||
|
||||
def to_unicode(msg, encoding='utf8', errors='replace'):
    return unicode(msg, encoding, errors)
|
||||
|
||||
# Smoketest only. This will give 100% coverage for your code (it
|
||||
# tests all of the code inside of to_unicode) but it leaves a lot of
|
||||
# room for errors as it doesn't test all combinations of arguments
|
||||
# that are then passed to the unicode() function.
|
||||
|
||||
tools.ok_(to_unicode('abc') == u'abc')
|
||||
|
||||
# Better -- tests now cover non-ascii characters and that error conditions
|
||||
# occur properly. There's a lot of other permutations that can be
|
||||
# added along these same lines.
|
||||
tools.ok_(to_unicode(u'café'.encode('utf8'), 'utf8', 'replace') == u'café')
|
||||
tools.assert_raises(UnicodeError, to_unicode, u'cafè ñunru'.encode('latin1'), 'utf8', 'strict')
|
||||
|
||||
* We're using nose for unittesting. Rather than depend on unittest2
|
||||
functionality, use the functions that nose provides.
|
||||
* Remember to maintain python-2.4 compatibility even in unittests.
|
||||
|
||||
----------------------------
|
||||
Docstrings and documentation
|
||||
----------------------------
|
||||
|
||||
We use sphinx to build our documentation. We use the sphinx autodoc extension
|
||||
to pull docstrings out of the modules for API documentation. This means that
|
||||
docstrings for subpackages and modules should follow a certain pattern. The
|
||||
general structure is:
|
||||
|
||||
* Introductory material about a module in the module's top level docstring.
|
||||
|
||||
* Introductory material should begin with a level two title: an overbar and
|
||||
underbar of '-'.
|
||||
|
||||
* docstrings for every function.
|
||||
|
||||
* The first line is a short summary of what the function does
|
||||
* This is followed by a blank line
|
||||
* The next lines are a `field list
|
||||
<http://sphinx.pocoo.org/markup/desc.html#info-field-lists>`_ giving
|
||||
information about the function's signature. We use the keywords:
|
||||
``arg``, ``kwarg``, ``raises``, ``returns``, and sometimes ``rtype``. Use
|
||||
these to describe all arguments, keyword arguments, exceptions raised,
|
||||
and return values.
|
||||
|
||||
* Parameters that are ``kwarg`` should specify what their default
|
||||
behaviour is.
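A docstring following this pattern might look like the sketch below. The function ``chop`` is invented for the example and is not a kitchen API; only the field names come from the list above:

```python
def chop(msg, width, suffix='...'):
    '''Shorten a string to at most ``width`` characters

    :arg msg: text to shorten
    :arg width: maximum number of characters to return
    :kwarg suffix: text appended when ``msg`` is cut.  Defaults to ``'...'``
    :raises ValueError: if ``width`` is shorter than ``suffix``
    :returns: ``msg`` unchanged if it fits, otherwise a shortened copy
    :rtype: str
    '''
    if len(msg) <= width:
        return msg
    if width < len(suffix):
        raise ValueError('width must be at least len(suffix)')
    return msg[:width - len(suffix)] + suffix
```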
|
||||
|
||||
.. _kitchen-versioning:
|
||||
|
||||
------------------
|
||||
Kitchen versioning
|
||||
------------------
|
||||
|
||||
Currently the kitchen library is in early stages of development. While we're
|
||||
in this state, the main kitchen library uses the following pattern for version
|
||||
information:
|
||||
|
||||
* Versions look like this::
|
||||
__version_info__ = ((0, 1, 2),)
|
||||
__version__ = '0.1.2'
|
||||
|
||||
* The Major version number remains at 0 until we decide to make the first 1.0
|
||||
release of kitchen. At that point, we're declaring that we have some
|
||||
confidence that we won't need to break backwards compatibility for a while.
|
||||
* The Minor version increments for any backwards incompatible API changes.
|
||||
When this is updated, we reset micro to zero.
|
||||
* The Micro version increments for any other changes (backwards compatible API
|
||||
changes, pure bugfixes, etc).
|
||||
|
||||
.. note::
|
||||
|
||||
Versioning is only updated for releases that generate sdists and new
|
||||
uploads to the download directory. Usually we update the version
|
||||
information for the library just before release. By contrast, we update
|
||||
kitchen :ref:`subpackage-versioning` when an API change is made. When in
|
||||
doubt, look at the version information in the last release.
|
||||
|
||||
----
|
||||
I18N
|
||||
----
|
||||
|
||||
All strings that are used as feedback for users need to be translated.
|
||||
:mod:`kitchen` sets up several functions for this. :func:`_` is used for
|
||||
marking things that are shown to users via print, GUIs, or other "standard"
|
||||
methods. Strings for exceptions are marked with :func:`b_`. This function
|
||||
returns a byte :class:`str` which is needed for use with exceptions::
|
||||
|
||||
from kitchen import _, b_
|
||||
|
||||
def print_message(msg, username):
    print _('%(user)s, your message of the day is: %(message)s') % {
        'message': msg, 'user': username}
|
||||
|
||||
raise Exception(b_('Test message'))
|
||||
|
||||
This serves several purposes:
|
||||
|
||||
* It marks the strings to be extracted by an xgettext-like program.
|
||||
* :func:`_` is a function that will substitute available translations at
|
||||
runtime.
|
||||
|
||||
.. note::
|
||||
|
||||
By using the ``%()s with dict`` style of string formatting, we make this
|
||||
string friendly to translators that may need to reorder the variables when
|
||||
they're translating the string.
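The benefit of named placeholders can be seen without any translation machinery. In this sketch the second format string stands in for a hypothetical translation that needs the variables in the opposite order:

```python
params = {'user': 'toshio', 'message': 'Have a nice day'}

english = '%(user)s, your message of the day is: %(message)s'
# Hypothetical translation that needs the message before the user name.
reordered = 'The message "%(message)s" is for %(user)s'

english_output = english % params
reordered_output = reordered % params
```

The calling code passes the same dict in both cases. With positional ``%s`` placeholders the second string would receive its arguments in the wrong order.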
|
||||
|
||||
`paver <http://www.blueskyonmars.com/projects/paver/>`_ and `babel
|
||||
<http://babel.edgewall.org/>`_ are used to extract the strings.
|
||||
|
||||
-----------
|
||||
API updates
|
||||
-----------
|
||||
|
||||
Kitchen strives to have a long deprecation cycle so that people have time to
|
||||
switch away from any APIs that we decide to discard. Discarded APIs should
|
||||
raise a :exc:`DeprecationWarning` and clearly state in the warning message and
|
||||
the docstring how to convert old code to use the new interface. An example of
|
||||
deprecating a function::
|
||||
|
||||
import warnings
|
||||
|
||||
from kitchen import _
|
||||
from kitchen.text.converters import to_bytes, to_unicode
|
||||
from kitchen.text.new_module import new_function
|
||||
|
||||
def old_function(param):
    '''**Deprecated**

    This function is deprecated. Use
    :func:`kitchen.text.new_module.new_function` instead. If you want
    unicode strings as output, switch to::

        >>> from kitchen.text.new_module import new_function
        >>> output = new_function(param)

    If you want byte strings, use::

        >>> from kitchen.text.new_module import new_function
        >>> from kitchen.text.converters import to_bytes
        >>> output = to_bytes(new_function(param))
    '''
    warnings.warn(_('kitchen.text.old_function is deprecated. Use'
        ' kitchen.text.new_module.new_function instead'),
        DeprecationWarning, stacklevel=2)

    as_unicode = isinstance(param, unicode)
    message = new_function(to_unicode(param))
    if not as_unicode:
        message = to_bytes(message)
    return message
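It can also be worth checking in a unittest that the warning is really emitted. The following is a standalone Python 3 sketch (it uses ``with``, so it would not run on python-2.4); both function names are stand-ins invented for the example:

```python
import warnings


def old_function(param):
    """Stand-in for a deprecated function (hypothetical)."""
    warnings.warn('old_function is deprecated.  Use new_function instead',
                  DeprecationWarning, stacklevel=2)
    return param


def call_and_collect_warnings(func, *args):
    """Call ``func`` and return its result plus any warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')   # do not let filters hide it
        result = func(*args)
    return result, caught
```

A test can then assert that exactly one :exc:`DeprecationWarning` was recorded and that the return value is unchanged.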
|
||||
|
||||
If a particular API change is very intrusive, it may be better to create a new
|
||||
version of the subpackage and ship both the old version and the new version.
|
||||
|
||||
---------
|
||||
NEWS file
|
||||
---------
|
||||
|
||||
Update the :file:`NEWS` file when you make a change that will be visible to
|
||||
the users. This is not a ChangeLog file so we don't need to list absolutely
|
||||
everything but it should give the user an idea of how this version differs
|
||||
from prior versions. API changes should be listed here explicitly. Bugfixes
|
||||
can be more general::
|
||||
|
||||
-----
|
||||
0.2.0
|
||||
-----
|
||||
* Relicense to LGPLv2+
|
||||
* Add kitchen.text.format module with the following functions:
|
||||
textual_width, textual_width_chop.
|
||||
* Rename the kitchen.text.utils module to kitchen.text.misc. use of the
|
||||
old names is deprecated but still available.
|
||||
* bugfixes applied to kitchen.pycompat24.defaultdict that fixes some
|
||||
tracebacks
|
||||
|
||||
-------------------
|
||||
Kitchen subpackages
|
||||
-------------------
|
||||
|
||||
Kitchen itself is a namespace. The kitchen sdist (tarball) provides certain
|
||||
useful subpackages.
|
||||
|
||||
.. seealso::
|
||||
|
||||
`Kitchen addon packages`_
|
||||
For information about subpackages not distributed in the kitchen sdist
|
||||
that install into the kitchen namespace.
|
||||
|
||||
.. _subpackage-versioning:
|
||||
|
||||
Versioning
|
||||
==========
|
||||
|
||||
Each subpackage should have its own version information which is independent
|
||||
of the other kitchen subpackages and the main kitchen library version. This is
|
||||
used so that code that depends on kitchen APIs can check the version
|
||||
information. The standard way to do this is to put something like this in the
|
||||
subpackage's :file:`__init__.py`::
|
||||
|
||||
from kitchen.versioning import version_tuple_to_string
|
||||
|
||||
__version_info__ = ((1, 0, 0),)
|
||||
__version__ = version_tuple_to_string(__version_info__)
|
||||
|
||||
:attr:`__version_info__` is documented in :mod:`kitchen.versioning`. The
|
||||
values of the first tuple should describe API changes to the module. There
|
||||
are at least three numbers present in the tuple: (Major, minor, micro). The
|
||||
major version number is for backwards incompatible changes (for
|
||||
instance, removing a function, or adding a new mandatory argument to
|
||||
a function). Whenever one of these occurs, you should increment the major
|
||||
number and reset minor and micro to zero. The second number is the minor
|
||||
version. Anytime new but backwards compatible changes are introduced this
|
||||
number should be incremented and the micro version number reset to zero. The
|
||||
micro version should be incremented when a change is made that does not change
|
||||
the API at all. This is a common case for bugfixes, for instance.
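The rules above can be summarized as a small function. This is only a sketch of the policy as described here; ``bump_version`` and its ``change`` labels are hypothetical and not part of :mod:`kitchen.versioning`:

```python
def bump_version(version, change):
    """Return the next (major, minor, micro) tuple for a kind of change.

    ``change`` is 'incompatible', 'compatible', or 'bugfix'.
    """
    major, minor, micro = version[:3]
    if change == 'incompatible':
        # e.g. removing a function or adding a mandatory argument
        return (major + 1, 0, 0)
    if change == 'compatible':
        # new but backwards compatible API
        return (major, minor + 1, 0)
    if change == 'bugfix':
        # no API change at all
        return (major, minor, micro + 1)
    raise ValueError('unknown kind of change: %r' % (change,))
```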
|
||||
|
||||
Version information beyond the first three parts of the first tuple may be
|
||||
useful for versioning but semantically has similar meaning to the micro
|
||||
version.
|
||||
|
||||
.. note::
|
||||
|
||||
We update the :attr:`__version_info__` tuple when the API is updated.
|
||||
This way there's less chance of forgetting to update the API version when
|
||||
a new release is made. However, we try to only increment the version
|
||||
numbers a single step for any release. So if kitchen-0.1.0 has
|
||||
kitchen.text.__version__ == '1.0.1', kitchen-0.1.1 should have
|
||||
kitchen.text.__version__ == '1.0.2' or '1.1.0' or '2.0.0'.
|
||||
|
||||
Criteria for subpackages in kitchen
|
||||
===================================
|
||||
|
||||
Subpackages within kitchen should meet these criteria:
|
||||
|
||||
* Generally useful or needed for other pieces of kitchen.
|
||||
|
||||
* No mandatory requirements outside of the |stdlib|_.
|
||||
|
||||
* Optional requirements from outside the |stdlib|_ are allowed. Things with
|
||||
mandatory requirements are better placed in `kitchen addon packages`_.
|
||||
|
||||
* Somewhat API stable -- this is not a hard requirement. We can change the
|
||||
kitchen api. However, it is better not to as people may come to depend on
|
||||
it.
|
||||
|
||||
.. seealso::
|
||||
|
||||
`API Updates`_
|
||||
|
||||
----------------------
|
||||
Kitchen addon packages
|
||||
----------------------
|
||||
|
||||
Addon packages are very similar to subpackages integrated into the kitchen
|
||||
sdist. This section just lists some of the differences to watch out for.
|
||||
|
||||
setup.py
|
||||
========
|
||||
|
||||
Your :file:`setup.py` should contain entries like this::
|
||||
|
||||
# It's suggested to use a dotted name like this so the package is easily
|
||||
# findable on pypi:
|
||||
setup(name='kitchen.config',
|
||||
# Include kitchen in the keywords, again, for searching on pypi
|
||||
keywords=['kitchen', 'configuration'],
|
||||
# This package lives in the directory kitchen/config
|
||||
packages=['kitchen.config'],
|
||||
# [...]
|
||||
)
|
||||
|
||||
Package directory layout
|
||||
========================
|
||||
|
||||
Create a :file:`kitchen` directory in the toplevel. Place the addon
|
||||
subpackage in there. For example::
|
||||
|
||||
./ <== toplevel with README, setup.py, NEWS, etc
|
||||
kitchen/
|
||||
kitchen/__init__.py
|
||||
kitchen/config/ <== subpackage directory
|
||||
kitchen/config/__init__.py
|
||||
|
||||
Fake kitchen module
|
||||
===================
|
||||
|
||||
The :file:`__init__.py` in the :file:`kitchen` directory is special. It
|
||||
won't be installed. It just needs to pull in the kitchen from the system so
|
||||
that you are able to test your module. You should be able to use this
|
||||
boilerplate::
|
||||
|
||||
# Fake module. This is not installed. It's just made to import the real
|
||||
# kitchen modules for testing this module
|
||||
import pkgutil
|
||||
|
||||
# Extend the __path__ with everything in the real kitchen module
|
||||
__path__ = pkgutil.extend_path(__path__, __name__)
|
||||
|
||||
.. note::
|
||||
|
||||
:mod:`kitchen` needs to be findable by python for this to work. Installing it
|
||||
in the :file:`site-packages` directory or adding it to the
|
||||
:envvar:`PYTHONPATH` will work.
|
||||
|
||||
Your unittests should now be able to find both your submodule and the main
|
||||
kitchen module.
|
||||
|
||||
Versioning
|
||||
==========
|
||||
|
||||
It is recommended that addon packages version similarly to
|
||||
:ref:`subpackage-versioning`. The :data:`__version_info__` and
|
||||
:data:`__version__` strings can be changed independently of the version
|
||||
exposed by setup.py so that you have both an API version
|
||||
(:data:`__version_info__`) and release version that's easier for people to
|
||||
parse. However, you aren't required to do this and you could follow
|
||||
a different methodology if you want (for instance, :ref:`kitchen-versioning`).
|
140
kitchen3/docs/index.rst
Normal file
|
@ -0,0 +1,140 @@
|
|||
================================
|
||||
Kitchen, everything but the sink
|
||||
================================
|
||||
|
||||
:Author: Toshio Kuratomi
|
||||
:Date: 19 March 2011
|
||||
:Version: 1.0.x
|
||||
|
||||
We've all done it. In the process of writing a brand new application we've
|
||||
discovered that we need a little bit of code that we've invented before.
|
||||
Perhaps it's something to handle unicode text. Perhaps it's something to make
|
||||
a bit of python-2.5 code run on python-2.4. Whatever it is, it ends up being
|
||||
a tiny bit of code that seems too small to worry about pushing into its own
|
||||
module so it sits there, a part of your current project, waiting to be cut and
|
||||
pasted into your next project. And the next. And the next. And since that
|
||||
little bittybit of code proved so useful to you, it's highly likely that it
|
||||
proved useful to someone else as well. Useful enough that they've written it
|
||||
and copy and pasted it over and over into each of their new projects.
|
||||
|
||||
Well, no longer! Kitchen aims to pull these small snippets of code into a few
|
||||
python modules which you can import and use within your project. No more copy
|
||||
and paste! Now you can let someone else maintain and release these small
|
||||
snippets so that you can get on with your life.
|
||||
|
||||
This package forms the core of Kitchen. It contains some useful modules for
|
||||
using newer |stdlib|_ modules on older python versions, text manipulation,
|
||||
:pep:`386` versioning, and initializing :mod:`gettext`. With this package we're
|
||||
trying to provide a few useful features that don't have too many dependencies
|
||||
outside of the |stdlib|_. We'll be releasing other modules that drop into the
|
||||
kitchen namespace to add other features (possibly with larger deps) as time
|
||||
goes on.
|
||||
|
||||
------------
|
||||
Requirements
|
||||
------------
|
||||
|
||||
We've tried to keep the core kitchen module's requirements lightweight. At the
|
||||
moment kitchen only requires
|
||||
|
||||
:python: 2.4 or later
|
||||
|
||||
.. warning:: Kitchen-1.1.0 was the last release to support python-2.3.x.
|
||||
|
||||
Soft Requirements
|
||||
=================
|
||||
|
||||
If found, these libraries will be used to make the implementation of some part
|
||||
of kitchen better in some way. If they are not present, the API that they
|
||||
enable will still exist but may function in a different manner.
|
||||
|
||||
`chardet <http://pypi.python.org/pypi/chardet>`_
|
||||
Used in :func:`~kitchen.text.misc.guess_encoding` and
|
||||
:func:`~kitchen.text.converters.guess_encoding_to_xml` to help guess
|
||||
encoding of byte strings being converted. If not present, unknown
|
||||
encodings will be converted as if they were ``latin1``.
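The general shape of such a soft requirement is sketched below. This is not kitchen's implementation; ``guess_encoding_sketch`` is a hypothetical name, and the only external call assumed is ``chardet.detect``:

```python
try:
    import chardet          # optional: better guesses when installed
except ImportError:
    chardet = None


def guess_encoding_sketch(byte_string, fallback='latin1'):
    """Guess the encoding of ``byte_string``; use ``fallback`` if unsure."""
    if chardet is not None:
        guess = chardet.detect(byte_string).get('encoding')
        if guess:
            return guess
    # Without chardet (or without a confident guess) treat it as latin1.
    return fallback
```

The function exists and returns an encoding name either way. Only the quality of the guess depends on whether the optional library is present.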
|
||||
|
||||
---------------------------
|
||||
Other Recommended Libraries
|
||||
---------------------------
|
||||
|
||||
These libraries implement commonly used functionality that everyone seems to
|
||||
invent. Rather than reinvent their wheel, I simply list the things that they
|
||||
do well for now. Perhaps if people can't find them normally, I'll add them as
|
||||
requirements in :file:`setup.py` or link them into kitchen's namespace. For
|
||||
now, I just mention them here:
|
||||
|
||||
`bunch <http://pypi.python.org/pypi/bunch/>`_
|
||||
Bunch is a dictionary that you can use attribute lookup as well as bracket
|
||||
notation to access. Setting it apart from most homebrewed implementations
|
||||
is the :func:`bunchify` function which will descend nested structures of
|
||||
lists and dicts, transforming the dicts to Bunch's.
|
||||
`hashlib <http://code.krypto.org/python/hashlib/>`_
|
||||
Python 2.5 and forward have a :mod:`hashlib` library that provides secure
|
||||
hash functions to python. If you're developing for python2.4 though, you
|
||||
can install the standalone hashlib library and have access to the same
|
||||
functions.
|
||||
`iterutils <http://pypi.python.org/pypi/iterutils/>`_
|
||||
The python documentation for :mod:`itertools` has some examples
|
||||
of other nice iterable functions that can be built from the
|
||||
:mod:`itertools` functions. This third-party module creates those recipes
|
||||
as a module.
|
||||
`ordereddict <http://pypi.python.org/pypi/ordereddict/>`_
|
||||
Python 2.7 and forward have a :class:`~collections.OrderedDict` that
|
||||
provides a :class:`dict` whose items are ordered (and indexable) as well
|
||||
as named.
|
||||
`unittest2 <http://pypi.python.org/pypi/unittest2>`_
|
||||
Python 2.7 has an updated :mod:`unittest` library with new functions not
|
||||
present in the |stdlib|_ for Python 2.6 or less. If you want to use those
|
||||
new functions but need your testing framework to be compatible with older
|
||||
Python versions, the unittest2 library provides the update as an external module.
|
||||
`nose <http://somethingaboutorange.com/mrl/projects/nose/>`_
|
||||
If you want to use a test discovery tool instead of the unittest
|
||||
framework, nosetests provides a simple-to-use way to do that.
|
||||
|
||||
-------
|
||||
License
|
||||
-------
|
||||
|
||||
This python module is distributed under the terms of the
|
||||
`GNU Lesser General Public License Version 2 or later
|
||||
<http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html>`_.
|
||||
|
||||
.. note:: Some parts of this module are licensed under terms less restrictive
|
||||
than the LGPLv2+. If you separate these files from the work as a whole
|
||||
you are allowed to use them under the less restrictive licenses. The
|
||||
following is a list of the files that are known:
|
||||
|
||||
`Python 2 license <http://www.python.org/download/releases/2.4/license/>`_
|
||||
:file:`_subprocess.py`, :file:`test_subprocess.py`,
|
||||
:file:`defaultdict.py`, :file:`test_defaultdict.py`,
|
||||
:file:`_base64.py`, and :file:`test_base64.py`
|
||||
|
||||
--------
|
||||
Contents
|
||||
--------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
tutorial
|
||||
api-overview
|
||||
porting-guide-0.3
|
||||
hacking
|
||||
glossary
|
||||
|
||||
------------------
|
||||
Indices and tables
|
||||
------------------
|
||||
|
||||
* :ref:`genindex`
|
||||
* :ref:`modindex`
|
||||
* :ref:`search`
|
||||
|
||||
-------------
|
||||
Project Pages
|
||||
-------------
|
||||
|
||||
More information about the project can be found on the |projpage|_
|
||||
|
||||
The latest published version of this documentation can be found on the |docpage|_
|
209
kitchen3/docs/porting-guide-0.3.rst
Normal file
|
@ -0,0 +1,209 @@
|
|||
===================
|
||||
1.0.0 Porting Guide
|
||||
===================
|
||||
|
||||
The 0.1 through 1.0.0 releases focused on bringing in functions from yum and
|
||||
python-fedora. This porting guide tells how to port from those APIs to their
|
||||
kitchen replacements.
|
||||
|
||||
-------------
|
||||
python-fedora
|
||||
-------------
|
||||
|
||||
=================================== ===================
|
||||
python-fedora kitchen replacement
|
||||
----------------------------------- -------------------
|
||||
:func:`fedora.iterutils.isiterable` :func:`kitchen.iterutils.isiterable` [#f1]_
|
||||
:func:`fedora.textutils.to_unicode` :func:`kitchen.text.converters.to_unicode`
|
||||
:func:`fedora.textutils.to_bytes` :func:`kitchen.text.converters.to_bytes`
|
||||
=================================== ===================
|
||||
|
||||
.. [#f1] :func:`~kitchen.iterutils.isiterable` has changed slightly in
|
||||
kitchen. The :attr:`include_string` attribute has switched its default value
|
||||
from :data:`True` to :data:`False`. So you need to change code like::
|
||||
|
||||
>>> # Old code
|
||||
>>> isiterable('abcdef')
|
||||
True
|
||||
>>> # New code
|
||||
>>> isiterable('abcdef', include_string=True)
|
||||
True
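The changed default can be illustrated with a stand-in. The function below is a minimal sketch of the behaviour described in this footnote, not kitchen's implementation; the name ``isiterable_sketch`` is hypothetical, and it is written for Python 3, where ``str`` is the text type and ``bytes`` is the byte string type:

```python
def isiterable_sketch(obj, include_string=False):
    """Report whether obj can be iterated, treating strings specially."""
    # Strings are iterable, but callers usually want them handled as scalars,
    # so they only count when include_string is explicitly set.
    if isinstance(obj, (str, bytes)):
        return include_string
    try:
        iter(obj)
    except TypeError:
        return False
    return True
```

Code that relied on the old default therefore has to pass ``include_string=True`` explicitly.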
|
||||
|
||||
---
|
||||
yum
|
||||
---
|
||||
|
||||
================================= ===================
|
||||
yum kitchen replacement
|
||||
--------------------------------- -------------------
|
||||
:func:`yum.i18n.dummy_wrapper` :meth:`kitchen.i18n.DummyTranslations.ugettext` [#y1]_
|
||||
:func:`yum.i18n.dummyP_wrapper` :meth:`kitchen.i18n.DummyTranslations.ungettext` [#y1]_
|
||||
:func:`yum.i18n.utf8_width` :func:`kitchen.text.display.textual_width`
|
||||
:func:`yum.i18n.utf8_width_chop` :func:`kitchen.text.display.textual_width_chop`
|
||||
and :func:`kitchen.text.display.textual_width` [#y2]_ [#y4]_
|
||||
:func:`yum.i18n.utf8_valid` :func:`kitchen.text.misc.byte_string_valid_encoding`
|
||||
:func:`yum.i18n.utf8_text_wrap` :func:`kitchen.text.display.wrap` [#y3]_
|
||||
:func:`yum.i18n.utf8_text_fill` :func:`kitchen.text.display.fill` [#y3]_
|
||||
:func:`yum.i18n.to_unicode` :func:`kitchen.text.converters.to_unicode` [#y5]_
|
||||
:func:`yum.i18n.to_unicode_maybe` :func:`kitchen.text.converters.to_unicode` [#y5]_
|
||||
:func:`yum.i18n.to_utf8` :func:`kitchen.text.converters.to_bytes` [#y5]_
|
||||
:func:`yum.i18n.to_str` :func:`kitchen.text.converters.to_unicode`
|
||||
or :func:`kitchen.text.converters.to_bytes` [#y6]_
|
||||
:func:`yum.i18n.str_eq` :func:`kitchen.text.misc.str_eq`
|
||||
:func:`yum.misc.to_xml` :func:`kitchen.text.converters.unicode_to_xml`
|
||||
or :func:`kitchen.text.converters.byte_string_to_xml` [#y7]_
|
||||
:func:`yum.i18n._` See: :ref:`yum-i18n-init`
|
||||
:func:`yum.i18n.P_` See: :ref:`yum-i18n-init`
|
||||
:func:`yum.i18n.exception2msg` :func:`kitchen.text.converters.exception_to_unicode`
|
||||
or :func:`kitchen.text.converters.exception_to_bytes` [#y8]_
|
||||
================================= ===================
|
||||
|
||||
.. [#y1] These yum methods provided fallback support for :mod:`gettext`
|
||||
functions in case either ``gaftonmode`` was set or :mod:`gettext` failed
|
||||
to return an object. In kitchen, we can use the
|
||||
:class:`kitchen.i18n.DummyTranslations` object to fulfill that role.
|
||||
Please see :ref:`yum-i18n-init` for more suggestions on how to do this.
|
||||
|
||||
.. [#y2] The yum version of these functions returned a byte :class:`str`. The
|
||||
kitchen version listed here returns a :class:`unicode` string. If you
|
||||
need a byte :class:`str` simply call
|
||||
:func:`kitchen.text.converters.to_bytes` on the result.
|
||||
|
||||
.. [#y3] The yum version of these functions would return either a byte
|
||||
:class:`str` or a :class:`unicode` string depending on what the input
|
||||
value was. The kitchen version always returns :class:`unicode` strings.
|
||||
|
||||
.. [#y4] :func:`yum.i18n.utf8_width_chop` performed two functions. It
|
||||
returned the piece of the message that fit in a specified width and the
|
||||
width of that message. In kitchen, you need to call two functions, one
|
||||
for each action::
|
||||
|
||||
>>> # Old way
|
||||
>>> utf8_width_chop(msg, 5)
|
||||
(5, 'く ku')
|
||||
>>> # New way
|
||||
>>> from kitchen.text.display import textual_width, textual_width_chop
|
||||
>>> chopped = textual_width_chop(msg, 5)
>>> (textual_width(chopped), chopped)
|
||||
(5, u'く ku')
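To make the split between "how wide is this" and "what fits in this width" concrete, here is a rough sketch built on :mod:`unicodedata`. It is an approximation under stated assumptions (wide and fullwidth East Asian characters take two columns, combining characters take none), not kitchen's algorithm, and the names ``display_width`` and ``chop_to_width`` are hypothetical:

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        # Wide ('W') and fullwidth ('F') characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width

def chop_to_width(text, limit):
    """Return the longest prefix of text that fits in limit columns."""
    kept = []
    used = 0
    for ch in text:
        needed = display_width(ch)
        if used + needed > limit:
            break
        kept.append(ch)
        used += needed
    return ''.join(kept)
```

As with the kitchen functions, the chopped text and its width come from two separate calls.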
|
||||
|
||||
.. [#y5] If the yum version of :func:`~yum.i18n.to_unicode` or
|
||||
:func:`~yum.i18n.to_utf8` is given an object that is not a string, it
|
||||
returns the object itself. :func:`kitchen.text.converters.to_unicode` and
|
||||
:func:`kitchen.text.converters.to_bytes` default to returning the
|
||||
``simplerepr`` of the object instead. If you want the yum behaviour, set
|
||||
the :attr:`nonstring` parameter to ``passthru``::
|
||||
|
||||
>>> from kitchen.text.converters import to_unicode
|
||||
>>> to_unicode(5)
|
||||
u'5'
|
||||
>>> to_unicode(5, nonstring='passthru')
|
||||
5
|
||||
|
||||
.. [#y6] :func:`yum.i18n.to_str` could return either a byte :class:`str` or
a :class:`unicode` string. In kitchen you can get the same effect but you
|
||||
get to choose whether you want a byte :class:`str` or a :class:`unicode`
|
||||
string. Use :func:`~kitchen.text.converters.to_bytes` for :class:`str`
|
||||
and :func:`~kitchen.text.converters.to_unicode` for :class:`unicode`.
|
||||
|
||||
.. [#y7] :func:`yum.misc.to_xml` was buggy as written. I think the intention
|
||||
was for you to be able to pass a byte :class:`str` or :class:`unicode`
|
||||
string in and get out a byte :class:`str` that was valid to use in an xml
|
||||
file. The two kitchen functions
|
||||
:func:`~kitchen.text.converters.byte_string_to_xml` and
|
||||
:func:`~kitchen.text.converters.unicode_to_xml` do that for each string
|
||||
type.
|
||||
|
||||
.. [#y8] When porting :func:`yum.i18n.exception2msg` to use kitchen, you
|
||||
should set up two wrapper functions to aid in your port. They'll look like
|
||||
this:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
    from kitchen.text.converters import EXCEPTION_CONVERTERS, \
        BYTE_EXCEPTION_CONVERTERS, exception_to_unicode, \
        exception_to_bytes

    def exception2umsg(e):
        '''Return a unicode representation of an exception'''
        c = [lambda e: e.value]
        c.extend(EXCEPTION_CONVERTERS)
        return exception_to_unicode(e, converters=c)

    def exception2bmsg(e):
        '''Return a utf8 encoded str representation of an exception'''
        c = [lambda e: e.value]
        c.extend(BYTE_EXCEPTION_CONVERTERS)
        return exception_to_bytes(e, converters=c)
|
||||
|
||||
The reason to define this wrapper is that many of the exceptions in yum
|
||||
put the message in the :attr:`value` attribute of the :exc:`Exception`
|
||||
instead of adding it to the :attr:`args` attribute. So the default
|
||||
:data:`~kitchen.text.converters.EXCEPTION_CONVERTERS` don't know where to
|
||||
find the message. The wrapper tells kitchen to check the :attr:`value`
|
||||
attribute for the message. The reason to define two wrappers may be less
|
||||
obvious. :func:`yum.i18n.exception2msg` can return a :class:`unicode`
|
||||
string or a byte :class:`str` depending on a combination of what
|
||||
attributes are present on the :exc:`Exception` and what locale the
|
||||
function is being run in. By contrast,
|
||||
:func:`kitchen.text.converters.exception_to_unicode` only returns
|
||||
:class:`unicode` strings and
|
||||
:func:`kitchen.text.converters.exception_to_bytes` only returns byte
|
||||
:class:`str`. This is much safer as it keeps code that can only handle
|
||||
:class:`unicode` or only handle byte :class:`str` correctly from getting
|
||||
the wrong type when an input changes but it means you need to examine the
|
||||
calling code when porting from :func:`yum.i18n.exception2msg` and use the
|
||||
appropriate wrapper.
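The idea of trying a list of converters in order can be sketched without kitchen. Everything here is hypothetical (the ``ValueOnlyError`` class, the ``first_message`` helper, and the two converter lists) and only mimics the mechanism described above; kitchen's real converter handling may differ in its details:

```python
class ValueOnlyError(Exception):
    """Hypothetical exception that stores its message in .value, not .args."""
    def __init__(self, value=None):
        Exception.__init__(self)
        self.value = value

def first_message(exc, converters):
    """Return the result of the first converter that succeeds."""
    for convert in converters:
        try:
            return convert(exc)
        except (AttributeError, IndexError):
            continue
    return ''

# Stand-ins for a default converter list and one that checks .value first.
default_converters = [lambda e: e.args[0], str]
value_first_converters = [lambda e: e.value] + default_converters

err = ValueOnlyError('repository not found')
```

With the default list the message is lost because :attr:`args` is empty; putting the :attr:`value` lookup first recovers it while leaving ordinary exceptions unaffected.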
|
||||
|
||||
.. _yum-i18n-init:
|
||||
|
||||
Initializing Yum i18n
|
||||
=====================
|
||||
|
||||
Previously, yum had several pieces of code to initialize i18n. From the
|
||||
toplevel of :file:`yum/i18n.py`::
|
||||
|
||||
    try:
        '''
        Setup the yum translation domain and make _() and P_() translation wrappers
        available.
        using ugettext to make sure translated strings are in Unicode.
        '''
        import gettext
        t = gettext.translation('yum', fallback=True)
        _ = t.ugettext
        P_ = t.ungettext
    except:
        '''
        Something went wrong so we make a dummy _() wrapper there is just
        returning the same text
        '''
        _ = dummy_wrapper
        P_ = dummyP_wrapper
|
||||
|
||||
With kitchen, this can be changed to this::
|
||||
|
||||
    from kitchen.i18n import easy_gettext_setup, DummyTranslations
    try:
        _, P_ = easy_gettext_setup('yum')
    except:
        translations = DummyTranslations()
        _ = translations.ugettext
        P_ = translations.ungettext
|
||||
|
||||
.. note:: In :ref:`overcoming-frustration`, it is mentioned that for some
|
||||
things (like exception messages), using the byte :class:`str` oriented
|
||||
functions is more appropriate. If this is desired, the setup portion is
|
||||
only a second call to :func:`kitchen.i18n.easy_gettext_setup`::
|
||||
|
||||
b_, bP_ = easy_gettext_setup('yum', use_unicode=False)
|
||||
|
||||
The second place where i18n is setup is in :meth:`yum.YumBase._getConfig` in
|
||||
:file:`yum/__init__.py` if ``gaftonmode`` is in effect::
|
||||
|
||||
    if startupconf.gaftonmode:
        global _
        _ = yum.i18n.dummy_wrapper
|
||||
|
||||
This can be changed to::
|
||||
|
||||
    if startupconf.gaftonmode:
        global _
        _ = DummyTranslations().ugettext
|
19
kitchen3/docs/tutorial.rst
Normal file
|
@ -0,0 +1,19 @@
|
|||
================================
|
||||
Using kitchen to write good code
|
||||
================================
|
||||
|
||||
Kitchen's functions won't automatically make you a better programmer. You
|
||||
have to learn when and how to use them as well. This section of the
|
||||
documentation is intended to show you some of the ways that you can apply
|
||||
kitchen's functions to problems that may have arisen in your life. The goal
|
||||
of this section is to give you enough information to understand what the
|
||||
kitchen API can do for you and where in the :ref:`KitchenAPI` docs to look
|
||||
for something that can help you with your next issue. Along the way,
|
||||
you might pick up the knack for identifying issues with your code before you
|
||||
publish it. And that *will* make you a better coder.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
unicode-frustrations
|
||||
designing-unicode-apis
|
504
kitchen3/docs/unicode-frustrations.rst
Normal file
|
@ -0,0 +1,504 @@
|
|||
.. _overcoming-frustration:
|
||||
|
||||
==========================================================
|
||||
Overcoming frustration: Correctly using unicode in python2
|
||||
==========================================================
|
||||
|
||||
In python-2.x, there are two types that deal with text.
|
||||
|
||||
1. :class:`str` is for strings of bytes. These are very similar in nature to
|
||||
how strings are handled in C.
|
||||
2. :class:`unicode` is for strings of unicode :term:`code points`.
|
||||
|
||||
.. note::
|
||||
|
||||
**Just what the dickens is "Unicode"?**
|
||||
|
||||
One mistake that people encountering this issue for the first time make is
|
||||
confusing the :class:`unicode` type and the encodings of unicode stored in
|
||||
the :class:`str` type. In python, the :class:`unicode` type stores an
|
||||
abstract sequence of :term:`code points`. Each :term:`code point`
|
||||
represents a :term:`grapheme`. By contrast, byte :class:`str` stores
|
||||
a sequence of bytes which can then be mapped to a sequence of :term:`code
|
||||
points`. Each unicode encoding (:term:`UTF-8`, UTF-7, UTF-16, UTF-32,
|
||||
etc) maps different sequences of bytes to the unicode :term:`code points`.
|
||||
|
||||
What does that mean to you as a programmer? When you're dealing with text
|
||||
manipulations (finding the number of characters in a string or cutting
|
||||
a string on word boundaries) you should be dealing with :class:`unicode`
|
||||
strings as they abstract characters in a manner that's appropriate for
|
||||
thinking of them as a sequence of letters that you will see on a page.
|
||||
When dealing with I/O, reading from and writing to the disk, printing to
|
||||
a terminal, sending something over a network link, etc, you should be dealing
|
||||
with byte :class:`str` as those devices are going to need to deal with
|
||||
concrete implementations of what bytes represent your abstract characters.
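The difference between code points and their encoded bytes is easy to see in a few lines. The sketch below is written for Python 3, where ``str`` is the type this document calls :class:`unicode` and ``bytes`` is what it calls byte :class:`str`; the variable names are only illustrative:

```python
text = 'café'                             # four code points
utf8_bytes = text.encode('utf-8')         # é becomes two bytes
utf16_bytes = text.encode('utf-16-le')    # every character becomes two bytes
latin1_bytes = text.encode('latin-1')     # é becomes one byte

# The byte sequences differ, but each decodes back to the same text.
round_trips = [
    utf8_bytes.decode('utf-8'),
    utf16_bytes.decode('utf-16-le'),
    latin1_bytes.decode('latin-1'),
]
```

Counting characters only gives the expected answer on the text value; the byte lengths depend on the encoding.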
|
||||
|
||||
In the python2 world many APIs use these two classes interchangeably but there
|
||||
are several important APIs where only one or the other will do the right
|
||||
thing. When you give the wrong type of string to an API that wants the other
|
||||
type, you may end up with an exception being raised (:exc:`UnicodeDecodeError`
|
||||
or :exc:`UnicodeEncodeError`). However, these exceptions aren't always raised
|
||||
because python implicitly converts between types... *sometimes*.
|
||||
|
||||
-----------------------------------
|
||||
Frustration #1: Inconsistent Errors
|
||||
-----------------------------------
|
||||
|
||||
Although converting when possible seems like the right thing to do, it's
|
||||
actually the first source of frustration. A programmer can test out their
|
||||
program with a string like: ``The quick brown fox jumped over the lazy dog``
|
||||
and not encounter any issues. But when they release their software into the
|
||||
wild, someone enters the string: ``I sat down for coffee at the café`` and
|
||||
suddenly an exception is thrown. The reason? The mechanism that converts
|
||||
between the two types is only able to deal with :term:`ASCII` characters.
|
||||
Once you throw non-:term:`ASCII` characters into your strings, you have to
|
||||
start dealing with the conversion manually.
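Python 3 no longer converts implicitly, but the failure can be reproduced by doing explicitly what python2 did behind the scenes, which is to decode with the :term:`ASCII` codec. A small sketch, with a hypothetical helper name:

```python
def implicit_style_decode(byte_value):
    """Decode the way python2's implicit conversion does: ASCII only."""
    return byte_value.decode('ascii')

# The test string every programmer tries first works fine.
ok = implicit_style_decode(b'The quick brown fox jumped over the lazy dog')

# The string a real user enters does not.
try:
    implicit_style_decode('I sat down for coffee at the café'.encode('utf-8'))
    failed = False
except UnicodeDecodeError:
    failed = True
```

The same code path succeeds or fails depending only on the data, which is why this class of bug tends to slip through testing.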
|
||||
|
||||
So, if I manually convert everything to either byte :class:`str` or
|
||||
:class:`unicode` strings, will I be okay? The answer is.... *sometimes*.
|
||||
|
||||
---------------------------------
|
||||
Frustration #2: Inconsistent APIs
|
||||
---------------------------------
|
||||
|
||||
The problem you run into when converting everything to byte :class:`str` or
|
||||
:class:`unicode` strings is that you'll be using someone else's API quite
|
||||
often (this includes the APIs in the |stdlib|_) and find that the API will only
|
||||
accept byte :class:`str` or only accept :class:`unicode` strings. Or worse,
|
||||
that the code will accept either when you're dealing with strings that consist
|
||||
solely of :term:`ASCII` but throw an error when you give it a string that's
|
||||
got non-:term:`ASCII` characters. When you encounter these APIs you first
|
||||
need to identify which type will work better and then you have to convert your
|
||||
values to the correct type for that code. Thus the programmer that wants to
|
||||
proactively fix all unicode errors in their code needs to do two things:
|
||||
|
||||
1. You must keep track of what type your sequences of text are. Does
|
||||
``my_sentence`` contain :class:`unicode` or :class:`str`? If you don't
|
||||
know that then you're going to be in for a world of hurt.
|
||||
2. Anytime you call a function you need to evaluate whether that function will
|
||||
do the right thing with :class:`str` or :class:`unicode` values. Sending
|
||||
the wrong value here will lead to a :exc:`UnicodeError` being thrown when
|
||||
the string contains non-:term:`ASCII` characters.
|
||||
|
||||
.. note::
|
||||
|
||||
There is one mitigating factor here. The python community has been
|
||||
standardizing on using :class:`unicode` in all its APIs. Although there
|
||||
are some APIs that you need to send byte :class:`str` to in order to be
|
||||
safe, (including things as ubiquitous as :func:`print` as we'll see in the
|
||||
next section), it's getting easier and easier to use :class:`unicode`
|
||||
strings with most APIs.
|
||||
|
||||
------------------------------------------------
|
||||
Frustration #3: Inconsistent treatment of output
|
||||
------------------------------------------------
|
||||
|
||||
Alright, since the python community is moving to using :class:`unicode`
|
||||
strings everywhere, we might as well convert everything to :class:`unicode`
|
||||
strings and use that by default, right? Sounds good most of the time but
|
||||
there's at least one huge caveat to be aware of. Anytime you output text to
|
||||
the terminal or to a file, the text has to be converted into a byte
|
||||
:class:`str`. Python will try to implicitly convert from :class:`unicode` to
|
||||
byte :class:`str`... but it will throw an exception if the text contains
non-:term:`ASCII` characters::
|
||||
|
||||
>>> string = unicode(raw_input(), 'utf8')
|
||||
café
|
||||
>>> log = open('/var/tmp/debug.log', 'w')
|
||||
>>> log.write(string)
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
|
||||
|
||||
Okay, this is simple enough to solve: Just convert to a byte :class:`str` and
|
||||
we're all set::
|
||||
|
||||
>>> string = unicode(raw_input(), 'utf8')
|
||||
café
|
||||
>>> string_for_output = string.encode('utf8', 'replace')
|
||||
>>> log = open('/var/tmp/debug.log', 'w')
|
||||
>>> log.write(string_for_output)
|
||||
>>>
|
||||
|
||||
So that was simple, right? Well... there's one gotcha that makes things a bit
|
||||
harder to debug sometimes. When you attempt to write non-:term:`ASCII`
|
||||
:class:`unicode` strings to a file-like object you get a traceback every time.
|
||||
But what happens when you use :func:`print`? The terminal is a file-like object
|
||||
so it should raise an exception right? The answer to that is....
|
||||
*sometimes*:
|
||||
|
||||
.. code-block:: pycon
|
||||
|
||||
$ python
|
||||
>>> print u'café'
|
||||
café
|
||||
|
||||
No exception. Okay, we're fine then?
|
||||
|
||||
We are until someone does one of the following:
|
||||
|
||||
* Runs the script in a different locale:
|
||||
|
||||
.. code-block:: pycon
|
||||
|
||||
$ LC_ALL=C python
|
||||
>>> # Note: if you're using a good terminal program when running in the C locale
|
||||
>>> # The terminal program will prevent you from entering non-ASCII characters
|
||||
>>> # python will still recognize them if you use the codepoint instead:
|
||||
>>> print u'caf\xe9'
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
|
||||
|
||||
* Redirects output to a file:
|
||||
|
||||
.. code-block:: pycon
|
||||
|
||||
$ cat test.py
|
||||
#!/usr/bin/python -tt
|
||||
# -*- coding: utf-8 -*-
|
||||
print u'café'
|
||||
$ ./test.py >t
|
||||
Traceback (most recent call last):
|
||||
File "./test.py", line 4, in <module>
|
||||
print u'café'
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
|
||||
|
||||
Okay, the locale thing is a pain but understandable: the C locale doesn't
|
||||
understand any characters outside of :term:`ASCII` so naturally attempting to
|
||||
display those won't work. Now why does redirecting to a file cause problems?
|
||||
It's because :func:`print` in python2 is treated specially. Whereas the other
|
||||
file-like objects in python always convert to :term:`ASCII` unless you set
|
||||
them up differently, using :func:`print` to output to the terminal will use
|
||||
the user's locale to convert before sending the output to the terminal. When
|
||||
:func:`print` is not outputting to the terminal (being redirected to a file,
|
||||
for instance), :func:`print` decides that it doesn't know what locale to use
|
||||
for that file and so it tries to convert to :term:`ASCII` instead.
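One way to avoid depending on that decision is to choose the encoding yourself whenever the stream does not advertise one. The following is a minimal sketch, assuming that a stream without a usable ``encoding`` attribute should get a fixed fallback; ``pick_output_encoding`` and ``FakeTerminal`` are hypothetical names:

```python
import io

def pick_output_encoding(stream, fallback='utf-8'):
    """Return the encoding a stream advertises, or a fallback if it has none."""
    return getattr(stream, 'encoding', None) or fallback

class FakeTerminal:
    """Hypothetical stand-in for a terminal whose locale is latin-1."""
    encoding = 'latin-1'

terminal_encoding = pick_output_encoding(FakeTerminal())
redirected_encoding = pick_output_encoding(io.BytesIO())
```

The result can then be passed to ``encode`` before writing, so the output no longer changes behaviour when it is redirected.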
|
||||
|
||||
So what does this mean for you, as a programmer? Unless you have the luxury
|
||||
of controlling how your users use your code, you should always, always, always
|
||||
convert to a byte :class:`str` before outputting strings to the terminal or to
|
||||
a file. Python even provides you with a facility to do just this. If you
|
||||
know that every :class:`unicode` string you send to a particular file-like
|
||||
object (for instance, :data:`~sys.stdout`) should be converted to a particular
|
||||
encoding you can use a :class:`codecs.StreamWriter` object to convert from
|
||||
a :class:`unicode` string into a byte :class:`str`. In particular,
|
||||
:func:`codecs.getwriter` will return a :class:`~codecs.StreamWriter` class
|
||||
that will help you to wrap a file-like object for output. Using our
|
||||
:func:`print` example:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
$ cat test.py
|
||||
#!/usr/bin/python -tt
|
||||
# -*- coding: utf-8 -*-
|
||||
import codecs
|
||||
import sys
|
||||
|
||||
UTF8Writer = codecs.getwriter('utf8')
|
||||
sys.stdout = UTF8Writer(sys.stdout)
|
||||
print u'café'
|
||||
$ ./test.py >t
|
||||
$ cat t
|
||||
café
|
||||
|
||||
-----------------------------------------
|
||||
Frustrations #4 and #5 -- The other shoes
|
||||
-----------------------------------------
|
||||
|
||||
In English, there's a saying "waiting for the other shoe to drop". It means
|
||||
that when one event (usually bad) happens, you come to expect another event
|
||||
(usually worse) to come after. In this case we have two other shoes.
|
||||
|
||||
|
||||
Frustration #4: Now it doesn't take byte strings?!
|
||||
==================================================
|
||||
|
||||
If you wrap :data:`sys.stdout` using :func:`codecs.getwriter` and think you
|
||||
are now safe to print any variable without checking its type I am afraid
|
||||
I must inform you that you're not paying enough attention to :term:`Murphy's
|
||||
Law`. The :class:`~codecs.StreamWriter` that :func:`codecs.getwriter`
|
||||
provides will take :class:`unicode` strings and transform them into byte
|
||||
:class:`str` before they get to :data:`sys.stdout`. The problem is if you
|
||||
give it something that's already a byte :class:`str` it tries to transform
|
||||
that as well. To do that it tries to turn the byte :class:`str` you give it
|
||||
into :class:`unicode` and then transform that back into a byte :class:`str`...
|
||||
and since it uses the :term:`ASCII` codec to perform those conversions,
|
||||
chances are that it'll blow up when making them::
|
||||
|
||||
>>> import codecs
|
||||
>>> import sys
|
||||
>>> UTF8Writer = codecs.getwriter('utf8')
|
||||
>>> sys.stdout = UTF8Writer(sys.stdout)
|
||||
>>> print 'café'
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
File "/usr/lib64/python2.6/codecs.py", line 351, in write
|
||||
data, consumed = self.encode(object, self.errors)
|
||||
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
|
||||
|
||||
To work around this, kitchen provides an alternate version of
|
||||
:func:`codecs.getwriter` that can deal with both byte :class:`str` and
|
||||
:class:`unicode` strings. Use :func:`kitchen.text.converters.getwriter` in
|
||||
place of the :mod:`codecs` version like this::
|
||||
|
||||
>>> import sys
|
||||
>>> from kitchen.text.converters import getwriter
|
||||
>>> UTF8Writer = getwriter('utf8')
|
||||
>>> sys.stdout = UTF8Writer(sys.stdout)
|
||||
>>> print u'café'
|
||||
café
|
||||
>>> print 'café'
|
||||
café
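The general shape of such a writer is simple enough to sketch. This is not kitchen's implementation; ``TolerantWriter`` is a hypothetical class, written for Python 3, that assumes byte strings handed to it are already encoded and should be passed through untouched:

```python
import io

class TolerantWriter:
    """Hypothetical wrapper that accepts both text and byte strings."""

    def __init__(self, stream, encoding='utf-8', errors='replace'):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def write(self, data):
        # Text is encoded; byte strings are assumed to be encoded already
        # and are written through unchanged.
        if not isinstance(data, bytes):
            data = data.encode(self.encoding, self.errors)
        self.stream.write(data)

buffer = io.BytesIO()
writer = TolerantWriter(buffer)
writer.write('café\n')                  # text
writer.write('café\n'.encode('utf-8'))  # bytes
```

Both calls end up writing the same bytes, so callers do not have to check the type of every value first.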
|
||||
|
||||
Frustration #5: Inconsistent APIs Part deux
===========================================
|
||||
Sometimes you do everything right in your code but other people's code fails
|
||||
you. With unicode issues this happens more often than we want. A glaring
|
||||
example of this is when you get values back from a function that aren't
|
||||
consistently :class:`unicode` string or byte :class:`str`.
|
||||
|
||||
An example from the |stdlib|_ is :mod:`gettext`. The :mod:`gettext` functions
|
||||
are used to help translate messages that you display to users in the users'
|
||||
native languages. Since most languages contain letters outside of the
|
||||
:term:`ASCII` range, the values that are returned contain unicode characters.
|
||||
:mod:`gettext` provides you with :meth:`~gettext.GNUTranslations.ugettext` and
|
||||
:meth:`~gettext.GNUTranslations.ungettext` to return these translations as
|
||||
:class:`unicode` strings and :meth:`~gettext.GNUTranslations.gettext`,
|
||||
:meth:`~gettext.GNUTranslations.ngettext`,
|
||||
:meth:`~gettext.GNUTranslations.lgettext`, and
|
||||
:meth:`~gettext.GNUTranslations.lngettext` to return them as encoded byte
|
||||
:class:`str`. Unfortunately, even though they're documented to return only
|
||||
one type of string or the other, the implementation has corner cases where the
|
||||
wrong type can be returned.
|
||||
|
||||
This means that even if you separate your :class:`unicode` string and byte
|
||||
:class:`str` correctly before you pass your strings to a :mod:`gettext`
|
||||
function, afterwards, you might have to check that you have the right sort of
|
||||
string type again.
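A defensive check of this kind might look like the sketch below. The helper names ``ensure_text`` and ``ensure_bytes`` are hypothetical and the code targets Python 3 type names; in kitchen, :func:`kitchen.text.converters.to_unicode` and :func:`kitchen.text.converters.to_bytes` fill this role:

```python
def ensure_text(value, encoding='utf-8', errors='replace'):
    """Return value as text, decoding it only if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value

def ensure_bytes(value, encoding='utf-8', errors='replace'):
    """Return value as bytes, encoding it only if it arrived as text."""
    if isinstance(value, str):
        return value.encode(encoding, errors)
    return value
```

Wrapping the return value of an unreliable API in one of these calls guarantees a single type regardless of which one the API happened to produce.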
|
||||
|
||||
.. note::
|
||||
|
||||
:mod:`kitchen.i18n` provides alternate gettext translation objects that
|
||||
return only byte :class:`str` or only :class:`unicode` string.
|
||||
|
||||
---------------
|
||||
A few solutions
|
||||
---------------
|
||||
|
||||
Now that we've identified the issues, can we define a comprehensive strategy
|
||||
for dealing with them?
|
||||
|
||||
Convert text at the border
|
||||
==========================
|
||||
|
||||
If you get some piece of text from a library, read from a file, etc, turn it
|
||||
into a :class:`unicode` string immediately. Since python is moving in the
|
||||
direction of :class:`unicode` strings everywhere it's going to be easier to
|
||||
work with :class:`unicode` strings within your code.
|
||||
|
||||
If your code is heavily involved with using things that are bytes, you can do
|
||||
the opposite and convert all text into byte :class:`str` at the border and
|
||||
only convert to :class:`unicode` when you need it for passing to another
|
||||
library or performing string operations on it.
|
||||
|
||||
In either case, the important thing is to pick a default type for strings and
|
||||
stick with it throughout your code. When you mix the types it becomes much
|
||||
easier to operate on a string with a function that can only use the other type
|
||||
by mistake.
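As a sketch of what converting at the border looks like in practice, the hypothetical function below decodes as soon as bytes arrive, works only with text internally, and encodes again just before returning. It is written for Python 3 and assumes the input is UTF-8:

```python
import io

def count_characters(byte_stream, encoding='utf-8'):
    # Border in: decode as soon as the bytes arrive.
    text = byte_stream.read().decode(encoding).strip()
    # Inside: work only with text, so len() counts characters, not bytes.
    summary = '%s has %d characters' % (text, len(text))
    # Border out: encode just before the data leaves.
    return summary.encode(encoding)

incoming = io.BytesIO('café\n'.encode('utf-8'))
outgoing = count_characters(incoming)
```

Everything between the two borders deals with one string type only, which is the point of picking a default and sticking with it.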
|
||||
|
||||
.. note:: In python3, the abstract unicode type becomes much more prominent.
|
||||
The type named ``str`` is the equivalent of python2's :class:`unicode` and
|
||||
python3's ``bytes`` type replaces python2's :class:`str`. Most APIs deal
|
||||
in the unicode type of string with just some pieces that are low level
|
||||
dealing with bytes. The implicit conversions between bytes and unicode
|
||||
are removed and whenever you want to make the conversion you need to do so
|
||||
explicitly.
|
||||
|
||||
When the data needs to be treated as bytes (or unicode) use a naming convention
|
||||
===============================================================================
|
||||
|
||||
Sometimes you're converting nearly all of your data to :class:`unicode`
|
||||
strings but you have one or two values where you have to keep byte
|
||||
:class:`str` around. This is often the case when you need to use the value
|
||||
verbatim with some external resource. For instance, filenames or key values
|
||||
in a database. When you do this, use a naming convention for the data you're
|
||||
working with so you (and others reading your code later) don't get confused
|
||||
about what's being stored in the value.
|
||||
|
||||
If you need both a textual string to present to the user and a byte value for
|
||||
an exact match, consider keeping both versions around. You can either use two
|
||||
variables for this or a :class:`dict` whose key is the byte value.
|
||||
|
||||
.. note:: You can use the naming convention used in kitchen as a guide for
|
||||
implementing your own naming convention. It prefixes byte :class:`str`
|
||||
variables of unknown encoding with ``b_`` and byte :class:`str` of known
|
||||
encoding with the encoding name like: ``utf8_``. If the default was to
|
||||
handle :class:`str` and only keep a few :class:`unicode` values, those
|
||||
variables would be prefixed with ``u_``.
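Here is a small sketch of that convention, combined with the idea of keeping a display version next to the exact byte value. The filenames and variable names are invented for illustration and the code uses Python 3 type names:

```python
# b_: bytes of unknown encoding, kept verbatim for exact matching.
b_filenames = [b'caf\xc3\xa9.txt', b'caf\xe9.txt']

# Map each exact byte value to text that is only meant for display.
display_names = {}
for b_filename in b_filenames:
    display_names[b_filename] = b_filename.decode('utf-8', 'replace')

# utf8_: bytes whose encoding is known.
utf8_title = 'Menú'.encode('utf-8')
```

The second filename is not valid UTF-8, so its display form contains a replacement character, but the byte key still matches the file on disk exactly.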
|
||||
|
||||
When outputting data, convert back into bytes
|
||||
=============================================
|
||||
|
||||
When you go to send your data back outside of your program (to the filesystem,
|
||||
over the network, displaying to the user, etc) turn the data back into a byte
|
||||
:class:`str`. How you do this will depend on the expected output format of
|
||||
the data. For displaying to the user, you can use the user's default encoding
|
||||
using :func:`locale.getpreferredencoding`. For entering into a file, your best
|
||||
bet is to pick a single encoding and stick with it.
|
||||
|
||||
.. warning::
|
||||
|
||||
When using the encoding that the user has set (for instance, using
|
||||
:func:`locale.getpreferredencoding`, remember that they may have their
|
||||
encoding set to something that can't display every single unicode
|
||||
character. That means when you convert from :class:`unicode` to a byte
|
||||
:class:`str` you need to decide what should happen if the byte value is
|
||||
not valid in the user's encoding. For purposes of displaying messages to
|
||||
the user, it's usually okay to use the ``replace`` encoding error handler
|
||||
to replace the invalid characters with a question mark or other symbol
|
||||
meaning the character couldn't be displayed.
|
||||
|
||||
You can use :func:`kitchen.text.converters.getwriter` to do this automatically
|
||||
for :data:`sys.stdout`. When creating exception messages be sure to convert
|
||||
to bytes manually.
|
||||
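The effect of the ``replace`` error handler can be sketched with the standard library alone. The helper name ``encode_for_display`` is an assumption made for this example and is not part of kitchen; it simply wraps the built-in ``encode`` method.

```python
# -*- coding: utf-8 -*-
import locale


def encode_for_display(text, encoding=None):
    """Encode text for output, substituting '?' for any character that
    the target encoding cannot represent.

    Hypothetical helper for illustration only.
    """
    if encoding is None:
        # Fall back to whatever encoding the user has configured.
        encoding = locale.getpreferredencoding()
    return text.encode(encoding, 'replace')


message = u'caf\u00e9 \u3068\u3057\u304a'

# An ASCII-only locale cannot represent any of the non-ASCII characters,
# so each one becomes a question mark instead of raising an error.
ascii_bytes = encode_for_display(message, 'ascii')

# UTF-8 can represent every character, so nothing is lost.
utf8_bytes = encode_for_display(message, 'utf-8')
```

With the ASCII encoding the result is ``caf? ???``, which is lossy but acceptable for a message that is only displayed. The same handler would be a poor choice for data that has to be read back exactly.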
|
||||
When writing unittests, include non-ASCII values and both unicode and str type
==============================================================================

Unless you know that a specific portion of your code will only deal with
:term:`ASCII`, be sure to include non-:term:`ASCII` values in your unittests.
Including a few characters from several different scripts is highly advised as
well because some code may have special-cased accented Roman characters but
not know how to handle characters used in Asian scripts.

Similarly, unless you know that a given portion of your code will only be
given :class:`unicode` strings or only byte :class:`str`, be sure to try
variables of both types in your unittests. When doing this, make sure that the
variables are also non-:term:`ASCII` as python's implicit conversion will mask
problems with pure :term:`ASCII` data. In many cases, it makes sense to check
what happens if byte :class:`str` and :class:`unicode` strings that won't
decode in the present locale are given.
|
||||
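A test case along these lines might look like the sketch below. The function under test, ``to_text``, is a hypothetical stand-in written for this example (it is not kitchen's ``to_unicode``); the point is the shape of the test data, which mixes scripts, mixes both string types, and includes bytes that will not decode.

```python
# -*- coding: utf-8 -*-
import unittest


def to_text(value, encoding='utf-8'):
    """Hypothetical function under test: return text for bytes or text."""
    if isinstance(value, bytes):
        return value.decode(encoding, 'replace')
    return value


class ToTextTest(unittest.TestCase):
    # Non-ASCII samples from more than one script.
    u_accented = u'caf\u00e9'
    u_japanese = u'\u3068\u3057\u304a'
    # Bytes that are not valid UTF-8.
    b_undecodable = b'file\xff'

    def test_text_passes_through(self):
        for u_value in (self.u_accented, self.u_japanese):
            self.assertEqual(to_text(u_value), u_value)

    def test_utf8_bytes_are_decoded(self):
        for u_value in (self.u_accented, self.u_japanese):
            self.assertEqual(to_text(u_value.encode('utf-8')), u_value)

    def test_undecodable_bytes_do_not_raise(self):
        # The invalid byte is replaced rather than causing a traceback.
        self.assertEqual(to_text(self.b_undecodable), u'file\ufffd')
```

Had the samples been pure ASCII, all three tests would pass even for a function that mishandles encodings, which is why the non-ASCII values matter.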
|
||||
Be vigilant about spotting poor APIs
====================================

Make sure that the libraries you use consistently return only :class:`unicode`
strings or only byte :class:`str`, rather than a mixture of the two. Unittests
can help you spot issues here by running many variations of data through your
functions and checking that you're still getting the types of string that you
expect.
|
||||
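One way to check this in a test is sketched below. Both ``mixed_upper`` (a deliberately poor API) and ``find_type_mismatches`` (the checker) are hypothetical names invented for this example.

```python
# -*- coding: utf-8 -*-

def mixed_upper(value):
    """Hypothetical poor API: the return type depends on the input type."""
    return value.upper()


def find_type_mismatches(func, inputs, expected_type):
    """Return the inputs for which func does not return expected_type."""
    return [value for value in inputs
            if not isinstance(func(value), expected_type)]


# type(u'') is the text type on both Python 2 and Python 3.
text_type = type(u'')

# The same non-ASCII value as text and as UTF-8 encoded bytes.
samples = [u'caf\u00e9', u'caf\u00e9'.encode('utf-8')]

# Any entry in this list is an input that made the function return
# something other than text.
mismatches = find_type_mismatches(mixed_upper, samples, text_type)
```

Here ``mismatches`` contains the byte sample, which shows that callers of ``mixed_upper`` cannot rely on getting text back.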
|
||||
Example: Putting this all together with kitchen
===============================================

The kitchen library provides a wide array of functions to help you deal with
byte :class:`str` and :class:`unicode` strings in your program. Here's
a short example that uses many kitchen functions to do its work::

    #!/usr/bin/python -tt
    # -*- coding: utf-8 -*-
    import locale
    import os
    import sys
    import unicodedata

    from kitchen.text.converters import getwriter, to_bytes, to_unicode
    from kitchen.i18n import get_translation_object

    if __name__ == '__main__':
        # Setup gettext driven translations but use the kitchen functions so
        # we don't have the mismatched bytes-unicode issues.
        translations = get_translation_object('example')
        # We use _() for marking strings that we operate on as unicode
        # This is pretty much everything
        _ = translations.ugettext
        # And b_() for marking strings that we operate on as bytes.
        # This is limited to exceptions
        b_ = translations.lgettext

        # Setup stdout
        encoding = locale.getpreferredencoding()
        Writer = getwriter(encoding)
        sys.stdout = Writer(sys.stdout)

        # Load data. Format is filename\0description
        # description should be utf-8 but filename can be any legal filename
        # on the filesystem
        # Sample datafile.txt:
        #   /etc/shells\x00Shells available on caf\xc3\xa9.lan
        #   /var/tmp/file\xff\x00File with non-utf8 data in the filename
        #
        # And to create /var/tmp/file\xff (under bash or zsh) do:
        #   echo 'Some data' > /var/tmp/file$'\377'
        datafile = open('datafile.txt', 'r')
        data = {}
        for line in datafile:
            # We're going to keep filename as bytes because we will need the
            # exact bytes to access files on a POSIX operating system.
            # description, we'll immediately transform into unicode type.
            b_filename, description = line.split('\0', 1)

            # to_unicode defaults to decoding output from utf-8 and replacing
            # any problematic bytes with the unicode replacement character
            # We accept mangling of the description here knowing that our file
            # format is supposed to use utf-8 in that field and that the
            # description will only be displayed to the user, not used as
            # a key value.
            description = to_unicode(description, 'utf-8').strip()
            data[b_filename] = description
        datafile.close()

        # We're going to add a pair of extra fields onto our data to show the
        # length of the description and the filesize. We put those between
        # the filename and description because we haven't checked that the
        # description is free of NULLs.
        datafile = open('newdatafile.txt', 'w')

        # Name filename with a b_ prefix to denote byte string of unknown encoding
        for b_filename in data:
            # Since we have the byte representation of filename, we can read any
            # filename
            if os.access(b_filename, os.F_OK):
                size = os.path.getsize(b_filename)
            else:
                size = 0
            # Because the description is unicode type, we know the number of
            # characters corresponds to the length of the normalized unicode
            # string. Look the description up by filename; the ``description``
            # variable left over from the loading loop only holds the last entry.
            length = len(unicodedata.normalize('NFC', data[b_filename]))

            # Print a summary to the screen
            # Note that we do not let implicit type conversion from str to
            # unicode transform b_filename into a unicode string. That might
            # fail as python would use the ASCII codec. Instead we use
            # to_unicode() to explicitly transform in a way that we know will
            # not traceback.
            print _(u'filename: %s') % to_unicode(b_filename)
            print _(u'file size: %s') % size
            print _(u'desc length: %s') % length
            print _(u'description: %s') % data[b_filename]

            # First combine the unicode portion
            line = u'%s\0%s\0%s' % (size, length, data[b_filename])
            # Since the filenames are bytes, turn everything else to bytes before combining
            # Turning into unicode first would be wrong as the bytes in b_filename
            # might not convert
            b_line = '%s\0%s\n' % (b_filename, to_bytes(line))

            # Just to demonstrate that getwriter will pass bytes through fine
            print b_('Wrote: %s') % b_line
            datafile.write(b_line)
        datafile.close()

        # And just to show how to properly deal with an exception.
        # Note two things about this:
        # 1) We use the b_() function to translate the string. This returns a
        #    byte string instead of a unicode string
        # 2) We're using the b_() function returned by kitchen. If we had
        #    used the one from gettext we would need to convert the message to
        #    a byte str first
        message = u'Demonstrate the proper way to raise exceptions. Sincerely, \u3068\u3057\u304a'
        raise Exception(b_(message))

.. seealso:: :mod:`kitchen.text.converters`