Import kitchen_1.2.4.orig.tar.gz

parent dfb12f36e6
commit 2faeac3a1a

154 changed files with 11665 additions and 2152 deletions
.gitignore (vendored; new file, +6)

@@ -0,0 +1,6 @@
+*.pyc
+MANIFEST
+dist
+*.egg*
+*.pdf
+build
.travis.yml (new file, +12)

@@ -0,0 +1,12 @@
+language: python
+python:
+  - "2.6"
+  - "2.7"
+  - "3.4"
+install: python setup.py develop
+script: ./runtests.sh
+notifications:
+  irc:
+    - "irc.freenode.net#threebean"
+  on_success: never
+  on_failure: always
.tx/config (new file, +7)

@@ -0,0 +1,7 @@
+[main]
+host = https://www.transifex.com
+
+[kitchen.kitchenpot]
+file_filter = po/<lang>.po
+source_file = po/kitchen.pot
+source_lang = en
@@ -3,8 +3,9 @@ Some notes on hacking on kitchen
 ================================
 
 :Author: Toshio Kuratomi
-:Date: 2 Jan 2012
-:Version: 1.1.x
+:Maintainer: Ralph Bean
+:Date: 2 Dec 2014
+:Version: 1.2.x
 
 For coding and kitchen, see the style guide in the documentation.
 
@@ -40,20 +41,20 @@ be found in the `transifex user's guide`_.
 .. `transifex user's guide`:: http://help.transifex.net/user-guide/translating.html
 
 To generate the POT file (located in the po/ subdirectory), use pybabel to
-extract the messages.  Tun the following from the top level directory::
+extract the messages.  Run the following from the top level directory::
 
-    pybabel extract -o po/kitchen.pot kitchen -kb_ -kbN_
+    pybabel extract -o po/kitchen.pot kitchen2 kitchen3
 
 Then commit this pot file and upload to transifex::
 
     tx push -s
-    bzr commit -m 'Extract new strings from the source files' po/kitchen.pot
-    bzr push
+    git commit -m 'Extract new strings from the source files' po/kitchen.pot
+    git push
 
 To pull messages from transifex prior to making a release, do::
 
     tx pull -a
-    bzr commit -m 'Merge new translations from transifex' po/*.po
+    git commit -m 'Merge new translations from transifex' po/*.po
 
 If you see a status message from transifex like this::
 
    Pulling new translations for resource kitchen.kitchenpot (source: po/kitchen.pot)
 
@@ -62,8 +63,8 @@ If you see a status message from transifex like this::
 it means that transifex has created a brand new po file for you.  You need to
 add the new file to source control and commit it like this::
 
-    bzr add po/fr.po
-    bzr commit -m 'New French translation' po/fr.po
+    git add po/fr.po
+    git commit -m 'New French translation' po/fr.po
 
 
 TODO: Add information about announcing string freeze.  Using transifex's add
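The old extraction command's `-kb_ -kbN_` keywords tell pybabel which marker functions to scan for. A minimal sketch of how such markers behave at runtime, using only the stdlib `gettext` module (the plain `_`/`N_` names here are the conventional gettext markers, used for illustration; kitchen's own helpers come from `easy_gettext_setup`):

```python
import gettext

# With no catalog installed, NullTranslations returns msgids unchanged.
# The extractor only cares that the literal appears inside a marker call.
_ = gettext.NullTranslations().gettext

def N_(message):
    # no-op marker: flags a string for extraction; translation happens later
    return message

greeting = _('Hello, world')
deferred = N_('%(count)d files processed')
```

Once real catalogs are compiled and installed, the same `_()` calls return the translated strings without any change to the calling code.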
@@ -130,7 +131,8 @@ Unittest
 
 Kitchen has a large set of unittests.  All of them should pass before release.
 You can run the unittests with the following command::
-    nosetests --with-coverage --cover-package kitchen
+
+    ./runtests.sh
 
 This will run all the unittests under the tests directory and also generate
 some statistics about which lines of code were not accessed when kitchen ran.
@@ -144,48 +146,70 @@ some statistics about which lines of code were not accessed when kitchen ran.
 a look at :file:`test_i18n.py` and :file:`test_converters.py` to see tests
 that attempt to cover enough input values to detect problems.
 
-Since kitchen is currently supported on python-2.3.1+, it is desirable to test
-kitchen on at least one python major version from python-2.3 through
-python-2.7.  We currently have access to a buildbot that has access to
-python-2.4, python-2.6, and python-2.7.  You can view it at
-http://ci.csh.rit.edu:8080/view/Kitchen/ .  The buildbot checks the devel
-repository hourly and if new checkins have occurred, it attempts to rebuild.
-If you need access to invoke builds on the buildbot more regularly than that,
-contact Toshio to get access.
-
-We were unable to get python-2.3 working in the buildbot so I manually run the
-unittests on a CentOS-4 virtual machine (with python-2.3).  I currently don't
-test on python-2.5 but I'd be happy to take bug reports or get a new committer
-that was interested in that platform.
+Since kitchen is currently supported on python2 and python3, it is desirable to
+run tests against as many python versions as possible.  We currently have a
+jenkins instance in the Fedora Infrastructure private cloud with a job set up
+for kitchen at http://jenkins.cloud.fedoraproject.org/job/kitchen/
+
+It is not currently running tests against python-2.{3,4,5,6}.  If you are
+interested in getting those builds running automatically, please speak up in
+the #fedora-apps channel on freenode.
 
 Creating the release
 ====================
 
 1. Make sure that any feature branches you want have been merged.
-2. Pull in new translations and verify they are valid::
+2. Make a fresh branch for your release::
+
+    git flow release start $VERSION
+
+3. Extract strings for translation and push them to transifex::
+
+    pybabel extract -o po/kitchen.pot kitchen2 kitchen3
+    tx push -s
+    git commit -m 'Extract new strings from the source files' po/kitchen.pot
+    git push
+
+4. Wait for translations.  In the meantime...
+5. Update the version in ``kitchen/__init__.py`` and ``NEWS.rst``.
+6. When they're all ready, pull in new translations and verify they are valid::
+
     tx pull -a
     # If msgfmt is installed, this will check that the catalogs are valid
     ./releaseutils.py
-    bzr commit -m 'Merge new translations from transifex.net'
-3. Update the version in kitchen/__init__.py and NEWS.
-4. Make a fresh clone of the repository::
-    cd $PATH_TO_MY_SHARED_REPO
-    bzr branch bzr://bzr.fedorahosted.org/bzr/kitchen/devel release
-5. Make the source tarball in that directory::
-    cd release
+    git commit -m 'Merge new translations from transifex.net'
+    git push
+
+7. Create a pull-request so someone else from #fedora-apps can review::
+
+    hub pull-request -b master
+
+8. Once someone has given it a +1, then make a source tarball::
+
     python setup.py sdist
-6. Make sure that the source tarball contains all of the files we want in the release::
-    cd ..
-    tar -xzvf release/dist/kitchen*tar.gz
-    diff -uNr devel kitchen-$RELEASE_VERSION
-7. Upload the docs to pypi::
-    cd release
+
+9. Upload the docs to pypi::
+
+    mkdir -p build/sphinx/html
+    sphinx-build kitchen2/docs/ build/sphinx/html
     python setup.py upload_docs
-8. Upload the tarball to pypi::
-    python setup.py sdist upload --sign
-9. Upload the tarball to fedorahosted::
-    scp dist/kitchen*tar.gz fedorahosted.org:/srv/web/releases/k/i/kitchen/
-10. Tag the release::
-    cd ../devel
-    bzr tag $RELEASE_VERSION
-    bzr push
+
+10. Upload the tarball to pypi::
+
+    python setup.py sdist upload --sign
+
+11. Upload the tarball to fedorahosted::
+
+    scp dist/kitchen*tar.gz* fedorahosted.org:/srv/web/releases/k/i/kitchen/
+
+12. Tag and bag it::
+
+    git flow release finish -m $VERSION -u $YOUR_GPG_KEY_ID $VERSION
+    git push origin develop:develop
+    git push origin master:master
+    git push origin --tags
+    # Your pull-request should automatically close.  Double-check this, though.
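The pull step's comment notes that releaseutils.py validates the pulled catalogs when msgfmt is available. A hypothetical sketch of that kind of check (the `check_catalogs` name and the `po/` layout are assumptions for illustration, not releaseutils.py's actual code):

```python
import glob
import subprocess

def check_catalogs(po_dir='po'):
    """Run msgfmt --check over each catalog; return the files that fail."""
    bad = []
    for po_file in sorted(glob.glob('%s/*.po' % po_dir)):
        # msgfmt exits non-zero on syntax errors, bad plural forms, etc.
        result = subprocess.call(
            ['msgfmt', '--check', '-o', '/dev/null', po_file])
        if result != 0:
            bad.append(po_file)
    return bad
```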
MANIFEST.in (new file, +11)

@@ -0,0 +1,11 @@
+include COPYING COPYING.LESSER
+include *.rst
+include releaseutils.py
+recursive-include tests *.py *.po *.pot *.mo
+recursive-include docs *
+include po/*.pot
+include po/*.po
+include locale/*/*/*.mo
+recursive-include kitchen2 *.py *.po *.mo *.pot
+recursive-include kitchen3 *.py *.po *.mo *.pot
+include runtests.sh
@@ -2,9 +2,72 @@
 NEWS
 ====
 
-:Authors: Toshio Kuratomi
-:Date: 14 Feb 2012
-:Version: 1.1.1
+:Author: Toshio Kuratomi
+:Maintainer: Ralph Bean
+:Date: 13 Nov 2015
+:Version: 1.2.x
 
+-----
+1.2.4
+-----
+
+* Further compat fixes for python-3.5
+
+-----
+1.2.3
+-----
+
+* Compatibility with python-3.5
+
+-----
+1.2.2
+-----
+
+* Compatibility with python-3.4
+* Compatibility with pep470
+
+-----
+1.2.1
+-----
+
+* Fix release-related problems with the 1.2.0 tarball.
+  - Include locale data for the test suite.
+  - Include NEWS.rst and README.rst.
+  - Include runtests.sh.
+  - Adjust trove classifiers to indicate python3 support.
+
+-----
+1.2.0
+-----
+
+* kitchen gained support for python3.  The tarball release now includes a
+  ``kitchen2/`` and a ``kitchen3/`` directory containing copies of the source
+  code modified to work against each of the two major python versions.  When
+  installing with ``pip`` or ``setup.py``, the appropriate version should be
+  selected and installed.
+* The canonical upstream repository location moved to git and github.  See
+  https://github.com/fedora-infra/kitchen
+* Added kitchen.text.misc.isbasestring(), kitchen.text.misc.isbytestring(),
+  and kitchen.text.misc.isunicodestring().  These are mainly useful for code
+  being ported to python3 as python3 lacks a basestring type and has two types
+  for byte strings.  Code that has to run on both python2 and python3 or
+  wants to provide similar byte vs unicode semantics may find these functions
+  to be a good abstraction.
+* Add a python2_api parameter to various i18n functions: NullTranslations
+  constructor, NewGNUTranslations constructor, and get_translation_object.
+  When set to True (the default), the python2 api for gettext objects is used.
+  When set to False, the python3 api is used.  This option is intended to aid
+  in porting from python2 to python3.
+* Exception messages are no longer translated.  The idea is that exceptions
+  should be easily searched for via a web search.
+* Fix a bug in unicode_to_xml() where xmlcharrefs created when a unicode
+  string is turned into a byte string with an encoding that doesn't have
+  all of the needed characters had their ampersands ("&") escaped.
+* Fix a bug in NewGNUTranslations.lngettext() if a fallback gettext object is
+  used and the message is not in any catalog.
+* Speedups to process_control_chars() that are directly reflected in
+  unicode_to_xml() and byte_string_to_xml()
+* Remove C1 Control Codes in to_xml() as well as C0 Control Codes
 
 -----
 1.1.1
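The three type-checking helpers added in 1.2.0 can be approximated like this (an illustrative sketch, not kitchen.text.misc's actual implementation):

```python
import sys

if sys.version_info[0] >= 3:
    # python3 has no basestring; str and bytes are the two string types
    def isbasestring(obj):
        return isinstance(obj, (str, bytes))

    def isbytestring(obj):
        return isinstance(obj, bytes)

    def isunicodestring(obj):
        return isinstance(obj, str)
else:
    # on python2 these reduce to the familiar built-in type checks
    def isbasestring(obj):
        return isinstance(obj, basestring)  # noqa: F821

    def isbytestring(obj):
        return isinstance(obj, str)

    def isunicodestring(obj):
        return isinstance(obj, unicode)  # noqa: F821
```

Code that would write `isinstance(x, basestring)` on python2 can call `isbasestring(x)` instead and keep working unchanged on python3.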
PKG-INFO (deleted file, 39 deletions)

@@ -1,39 +0,0 @@
-Metadata-Version: 1.0
-Name: kitchen
-Version: 1.1.1
-Summary: Kitchen contains a cornucopia of useful code
-Home-page: https://fedorahosted.org/kitchen
-Author: Toshio Kuratomi
-Author-email: toshio@fedoraproject.org
-License: LGPLv2+
-Download-URL: https://fedorahosted.org/releases/k/i/kitchen
-Description:
-        We've all done it.  In the process of writing a brand new application we've
-        discovered that we need a little bit of code that we've invented before.
-        Perhaps it's something to handle unicode text.  Perhaps it's something to make
-        a bit of python-2.5 code run on python-2.3.  Whatever it is, it ends up being
-        a tiny bit of code that seems too small to worry about pushing into its own
-        module so it sits there, a part of your current project, waiting to be cut and
-        pasted into your next project.  And the next.  And the next.  And since that
-        little bittybit of code proved so useful to you, it's highly likely that it
-        proved useful to someone else as well.  Useful enough that they've written it
-        and copy and pasted it over and over into each of their new projects.
-
-        Well, no longer!  Kitchen aims to pull these small snippets of code into a few
-        python modules which you can import and use within your project.  No more copy
-        and paste!  Now you can let someone else maintain and release these small
-        snippets so that you can get on with your life.
-
-Keywords: Useful Small Code Snippets
-Platform: UNKNOWN
-Classifier: Development Status :: 4 - Beta
-Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
-Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Python :: 2.3
-Classifier: Programming Language :: Python :: 2.4
-Classifier: Programming Language :: Python :: 2.5
-Classifier: Programming Language :: Python :: 2.6
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Topic :: Software Development :: Internationalization
-Classifier: Topic :: Software Development :: Libraries :: Python Modules
-Classifier: Topic :: Text Processing :: General
@@ -3,8 +3,9 @@ Kitchen.core Module
 ===================
 
 :Author: Toshio Kuratomi
-:Date: 2 Jan 2012
-:Version: 1.1.x
+:Maintainer: Ralph Bean
+:Date: 13 Nov 2015
+:Version: 1.2.x
 
 The Kitchen module provides a python API for all sorts of little useful
 snippets of code that everybody ends up writing for their projects but never
@@ -38,12 +39,15 @@ Requirements
 
 kitchen.core requires
 
-:python: 2.3.1 or later
+:python: 2.4 or later
+
+Since version 1.2.0, this package has distributed both python2 and python3
+compatible versions of the source.
 
 Soft Requirements
 =================
 
-If found, these libraries will be used to make the implementation of soemthing
+If found, these libraries will be used to make the implementation of something
 better in some way.  If they are not present, the API that they enable will
 still exist but may function in a different manner.
 
@@ -78,4 +82,5 @@ Testing
 =======
 
 You can run the unittests with this command::
-    nosetests --with-coverage --cover-package kitchen
+
+    ./runtests.sh
@@ -10,10 +10,10 @@ Style
 * Run `:command:`pylint` ` over the code and try to resolve most of its nitpicking
 
 ------------------------
-Python 2.3 compatibility
+Python 2.4 compatibility
 ------------------------
 
-At the moment, we're supporting python-2.3 and above.  Understand that there's
+At the moment, we're supporting python-2.4 and above.  Understand that there's
 a lot of python features that we cannot use because of this.
 
 Sometimes modules in the |stdlib|_ can be added to kitchen so that they're
@@ -23,7 +23,7 @@ available.  When we do that we need to be careful of several things:
    :file:`maintainers/sync-copied-files.py` for this.
 2. Sync the unittests as well as the module.
 3. Be aware that not all modules are written to remain compatible with
-   Python-2.3 and might use python language features that were not present
+   Python-2.4 and might use python language features that were not present
    then (generator expressions, relative imports, decorators, with, try: with
    both except: and finally:, etc)  These are not good candidates for
    importing into kitchen as they require more work to keep synced.
@@ -56,7 +56,7 @@ Unittests
 
 * We're using nose for unittesting.  Rather than depend on unittest2
   functionality, use the functions that nose provides.
-* Remember to maintain python-2.3 compatibility even in unittests.
+* Remember to maintain python-2.4 compatibility even in unittests.
 
 ----------------------------
 Docstrings and documentation
@@ -9,7 +9,7 @@ Kitchen, everything but the sink
 We've all done it.  In the process of writing a brand new application we've
 discovered that we need a little bit of code that we've invented before.
 Perhaps it's something to handle unicode text.  Perhaps it's something to make
-a bit of python-2.5 code run on python-2.3.  Whatever it is, it ends up being
+a bit of python-2.5 code run on python-2.4.  Whatever it is, it ends up being
 a tiny bit of code that seems too small to worry about pushing into its own
 module so it sits there, a part of your current project, waiting to be cut and
 pasted into your next project.  And the next.  And the next.  And since that
@@ -37,11 +37,9 @@ Requirements
 We've tried to keep the core kitchen module's requirements lightweight.  At the
 moment kitchen only requires
 
-:python: 2.3.1 or later
+:python: 2.4 or later
 
-.. warning:: Kitchen-1.1.0 is likely to be the last release that supports
-    python-2.3.x.  Future releases will target python-2.4 as the minimum
-    required version.
+.. warning:: Kitchen-1.1.0 was the last release that supported python-2.3.x
 
 Soft Requirements
 =================
@@ -73,9 +71,9 @@ now, I just mention them here:
     lists and dicts, transforming the dicts to Bunch's.
 `hashlib <http://code.krypto.org/python/hashlib/>`_
     Python 2.5 and forward have a :mod:`hashlib` library that provides secure
-    hash functions to python.  If you're developing for python2.3 or
-    python2.4, though, you can install the standalone hashlib library and have
-    access to the same functions.
+    hash functions to python.  If you're developing for python2.4 though, you
+    can install the standalone hashlib library and have access to the same
+    functions.
 `iterutils <http://pypi.python.org/pypi/iterutils/>`_
     The python documentation for :mod:`itertools` has some examples
     of other nice iterable functions that can be built from the
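The hashlib soft requirement above is about interface availability, not behavior: the stdlib module (python >= 2.5) and the standalone backport expose the same call pattern. A quick sketch of that API:

```python
import hashlib

# Same call pattern whether hashlib comes from the stdlib or the
# standalone backport mentioned above.
digest = hashlib.sha256(b'kitchen').hexdigest()
# hexdigest() yields a 64-character lowercase hex string for sha256
```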
@@ -6,12 +6,12 @@
 # modify it under the terms of the GNU Lesser General Public
 # License as published by the Free Software Foundation; either
 # version 2.1 of the License, or (at your option) any later version.
 #
 # kitchen is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 # Lesser General Public License for more details.
 #
 # You should have received a copy of the GNU Lesser General Public
 # License along with kitchen; if not, see <http://www.gnu.org/licenses/>
 #
@@ -35,7 +35,7 @@ from kitchen import versioning
 (b_, bN_) = i18n.easy_gettext_setup('kitchen.core', use_unicode=False)
 #pylint: enable-msg=C0103
 
-__version_info__ = ((1, 1, 1),)
+__version_info__ = ((1, 2, 4),)
 __version__ = versioning.version_tuple_to_string(__version_info__)
 
 __all__ = ('exceptions', 'release',)
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 #
-# Copyright (c) 2010-2011 Red Hat, Inc
+# Copyright (c) 2010-2012 Red Hat, Inc
 # Copyright (c) 2009 Milos Komarcevic
 # Copyright (c) 2008 Tim Lauridsen
 #
@@ -89,7 +89,7 @@ See the documentation of :func:`easy_gettext_setup` and
 
 from kitchen.versioning import version_tuple_to_string
 
-__version_info__ = ((2, 1, 1),)
+__version_info__ = ((2, 2, 0),)
 __version__ = version_tuple_to_string(__version_info__)
 
 import copy
@@ -99,6 +99,7 @@ import itertools
 import locale
 import os
 import sys
+import warnings
 
 # We use the _default_localedir definition in get_translation_object
 try:
@@ -107,7 +108,7 @@ except ImportError:
     _DEFAULT_LOCALEDIR = os.path.join(sys.prefix, 'share', 'locale')
 
 from kitchen.text.converters import to_bytes, to_unicode
-from kitchen.text.misc import byte_string_valid_encoding
+from kitchen.text.misc import byte_string_valid_encoding, isbasestring
 
 # We cache parts of the translation objects just like stdlib's gettext so that
 # we don't reparse the message files and keep them in memory separately if the
@@ -199,9 +200,12 @@ class DummyTranslations(object, gettext.NullTranslations):
       :func:`locale.getpreferredencoding`.
     * Make setting :attr:`input_charset` and :attr:`output_charset` also
       set those attributes on any fallback translation objects.
+
+    .. versionchanged:: kitchen-1.2.0 ; API kitchen.i18n 2.2.0
+        Add python2_api parameter to __init__()
     '''
     #pylint: disable-msg=C0103,C0111
-    def __init__(self, fp=None):
+    def __init__(self, fp=None, python2_api=True):
         gettext.NullTranslations.__init__(self, fp)
 
         # Python 2.3 compat
@@ -212,6 +216,46 @@ class DummyTranslations(object, gettext.NullTranslations):
         # 'utf-8' is only a default here.  Users can override.
         self._input_charset = 'utf-8'
 
+        # Decide whether to mimic the python2 or python3 api
+        self.python2_api = python2_api
+
+    def _set_api(self):
+        if self._python2_api:
+            warnings.warn('Kitchen.i18n provides gettext objects that'
+                    ' implement either the python2 or python3 gettext api.'
+                    ' You are currently using the python2 api.  Consider'
+                    ' switching to the python3 api by setting'
+                    ' python2_api=False when creating the gettext object',
+                    PendingDeprecationWarning, stacklevel=2)
+            self.gettext = self._gettext
+            self.lgettext = self._lgettext
+            self.ugettext = self._ugettext
+            self.ngettext = self._ngettext
+            self.lngettext = self._lngettext
+            self.ungettext = self._ungettext
+        else:
+            self.gettext = self._ugettext
+            self.lgettext = self._lgettext
+            self.ngettext = self._ungettext
+            self.lngettext = self._lngettext
+            self.ugettext = self._removed_method_factory('ugettext')
+            self.ungettext = self._removed_method_factory('ungettext')
+
+    def _removed_method_factory(self, name):
+        def _removed_method(*args, **kwargs):
+            raise AttributeError("'%s' object has no attribute '%s'" %
+                    (self.__class__.__name__, name))
+        return _removed_method
+
+    def _set_python2_api(self, value):
+        self._python2_api = value
+        self._set_api()
+
+    def _get_python2_api(self):
+        return self._python2_api
+
+    python2_api = property(_get_python2_api, _set_python2_api)
+
     def _set_input_charset(self, charset):
         if self._fallback:
             try:
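The runtime API switch added above works by rebinding the public gettext method names to private implementations whenever the `python2_api` property is set. A self-contained sketch of that pattern (a toy class for illustration, not kitchen's actual DummyTranslations; only `ugettext`/`gettext` are modeled):

```python
class DualApiTranslations(object):
    def __init__(self, python2_api=True):
        # assigning through the property triggers _set_api()
        self.python2_api = python2_api

    def _ugettext(self, message):
        # stand-in for the real translation lookup: always return text
        if isinstance(message, bytes):
            return message.decode('utf-8')
        return message

    def _removed_method_factory(self, name):
        def _removed_method(*args, **kwargs):
            raise AttributeError("'%s' object has no attribute '%s'" %
                                 (self.__class__.__name__, name))
        return _removed_method

    def _set_api(self):
        if self._python2_api:
            # python2 api: ugettext is the method that returns unicode
            self.ugettext = self._ugettext
        else:
            # python3 api: gettext itself returns text, ugettext goes away
            self.gettext = self._ugettext
            self.ugettext = self._removed_method_factory('ugettext')

    def _get_python2_api(self):
        return self._python2_api

    def _set_python2_api(self, value):
        self._python2_api = value
        self._set_api()

    python2_api = property(_get_python2_api, _set_python2_api)
```

Because `python2_api` is a data descriptor, toggling it after construction rebinds the instance methods again; that is also the hook the real class uses to emit its `PendingDeprecationWarning` when the python2 api is selected.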
@@ -276,7 +320,7 @@ class DummyTranslations(object, gettext.NullTranslations):
         # Make sure that we're returning a str of the desired encoding
         return to_bytes(msg, encoding=output_encoding)
 
-    def gettext(self, message):
+    def _gettext(self, message):
         # First use any fallback gettext objects.  Since DummyTranslations
         # doesn't do any translation on its own, this is a good first step.
         if self._fallback:
@@ -292,7 +336,7 @@ class DummyTranslations(object, gettext.NullTranslations):
 
         return self._reencode_if_necessary(message, output_encoding)
 
-    def ngettext(self, msgid1, msgid2, n):
+    def _ngettext(self, msgid1, msgid2, n):
         # Default
         if n == 1:
             message = msgid1
@@ -313,7 +357,7 @@ class DummyTranslations(object, gettext.NullTranslations):
 
         return self._reencode_if_necessary(message, output_encoding)
 
-    def lgettext(self, message):
+    def _lgettext(self, message):
         if self._fallback:
             try:
                 message = self._fallback.lgettext(message)
@@ -329,7 +373,7 @@ class DummyTranslations(object, gettext.NullTranslations):
 
         return self._reencode_if_necessary(message, output_encoding)
 
-    def lngettext(self, msgid1, msgid2, n):
+    def _lngettext(self, msgid1, msgid2, n):
         # Default
         if n == 1:
             message = msgid1
@@ -351,8 +395,8 @@ class DummyTranslations(object, gettext.NullTranslations):
 
         return self._reencode_if_necessary(message, output_encoding)
 
-    def ugettext(self, message):
-        if not isinstance(message, basestring):
+    def _ugettext(self, message):
+        if not isbasestring(message):
             return u''
         if self._fallback:
             msg = to_unicode(message, encoding=self.input_charset)
@@ -365,7 +409,7 @@ class DummyTranslations(object, gettext.NullTranslations):
         # Make sure we're returning unicode
         return to_unicode(message, encoding=self.input_charset)
 
-    def ungettext(self, msgid1, msgid2, n):
+    def _ungettext(self, msgid1, msgid2, n):
         # Default
         if n == 1:
             message = msgid1
@@ -474,8 +518,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
     def _parse(self, fp):
         gettext.GNUTranslations._parse(self, fp)
 
     def gettext(self, message):
|
def _gettext(self, message):
|
||||||
if not isinstance(message, basestring):
|
if not isbasestring(message):
|
||||||
return ''
|
return ''
|
||||||
tmsg = message
|
tmsg = message
|
||||||
u_message = to_unicode(message, encoding=self.input_charset)
|
u_message = to_unicode(message, encoding=self.input_charset)
|
||||||
|
@ -495,13 +539,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
|
|
||||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||||
|
|
||||||
def ngettext(self, msgid1, msgid2, n):
|
def _ngettext(self, msgid1, msgid2, n):
|
||||||
if n == 1:
|
if n == 1:
|
||||||
tmsg = msgid1
|
tmsg = msgid1
|
||||||
else:
|
else:
|
||||||
tmsg = msgid2
|
tmsg = msgid2
|
||||||
|
|
||||||
if not isinstance(msgid1, basestring):
|
if not isbasestring(msgid1):
|
||||||
return ''
|
return ''
|
||||||
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
||||||
try:
|
try:
|
||||||
|
@ -521,8 +565,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
|
|
||||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||||
|
|
||||||
def lgettext(self, message):
|
def _lgettext(self, message):
|
||||||
if not isinstance(message, basestring):
|
if not isbasestring(message):
|
||||||
return ''
|
return ''
|
||||||
tmsg = message
|
tmsg = message
|
||||||
u_message = to_unicode(message, encoding=self.input_charset)
|
u_message = to_unicode(message, encoding=self.input_charset)
|
||||||
|
@ -542,13 +586,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
|
|
||||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||||
|
|
||||||
def lngettext(self, msgid1, msgid2, n):
|
def _lngettext(self, msgid1, msgid2, n):
|
||||||
if n == 1:
|
if n == 1:
|
||||||
tmsg = msgid1
|
tmsg = msgid1
|
||||||
else:
|
else:
|
||||||
tmsg = msgid2
|
tmsg = msgid2
|
||||||
|
|
||||||
if not isinstance(msgid1, basestring):
|
if not isbasestring(msgid1):
|
||||||
return ''
|
return ''
|
||||||
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
||||||
try:
|
try:
|
||||||
|
@ -557,7 +601,7 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
except KeyError:
|
except KeyError:
|
||||||
if self._fallback:
|
if self._fallback:
|
||||||
try:
|
try:
|
||||||
tmsg = self._fallback.ngettext(msgid1, msgid2, n)
|
tmsg = self._fallback.lngettext(msgid1, msgid2, n)
|
||||||
except (AttributeError, UnicodeError):
|
except (AttributeError, UnicodeError):
|
||||||
# Ignore UnicodeErrors: We'll do our own encoding next
|
# Ignore UnicodeErrors: We'll do our own encoding next
|
||||||
pass
|
pass
|
||||||
|
@ -569,8 +613,8 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
return self._reencode_if_necessary(tmsg, output_encoding)
|
return self._reencode_if_necessary(tmsg, output_encoding)
|
||||||
|
|
||||||
|
|
||||||
def ugettext(self, message):
|
def _ugettext(self, message):
|
||||||
if not isinstance(message, basestring):
|
if not isbasestring(message):
|
||||||
return u''
|
return u''
|
||||||
message = to_unicode(message, encoding=self.input_charset)
|
message = to_unicode(message, encoding=self.input_charset)
|
||||||
try:
|
try:
|
||||||
|
@ -586,13 +630,13 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
# Make sure that we're returning unicode
|
# Make sure that we're returning unicode
|
||||||
return to_unicode(message, encoding=self.input_charset)
|
return to_unicode(message, encoding=self.input_charset)
|
||||||
|
|
||||||
def ungettext(self, msgid1, msgid2, n):
|
def _ungettext(self, msgid1, msgid2, n):
|
||||||
if n == 1:
|
if n == 1:
|
||||||
tmsg = msgid1
|
tmsg = msgid1
|
||||||
else:
|
else:
|
||||||
tmsg = msgid2
|
tmsg = msgid2
|
||||||
|
|
||||||
if not isinstance(msgid1, basestring):
|
if not isbasestring(msgid1):
|
||||||
return u''
|
return u''
|
||||||
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
u_msgid1 = to_unicode(msgid1, encoding=self.input_charset)
|
||||||
try:
|
try:
|
||||||
|
@ -612,7 +656,7 @@ class NewGNUTranslations(DummyTranslations, gettext.GNUTranslations):
|
||||||
|
|
||||||
|
|
||||||
def get_translation_object(domain, localedirs=tuple(), languages=None,
|
def get_translation_object(domain, localedirs=tuple(), languages=None,
|
||||||
class_=None, fallback=True, codeset=None):
|
class_=None, fallback=True, codeset=None, python2_api=True):
|
||||||
'''Get a translation object bound to the :term:`message catalogs`
|
'''Get a translation object bound to the :term:`message catalogs`
|
||||||
|
|
||||||
:arg domain: Name of the message domain. This should be a unique name
|
:arg domain: Name of the message domain. This should be a unique name
|
||||||
|
@ -650,6 +694,15 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
||||||
:class:`str` objects. This is equivalent to calling
|
:class:`str` objects. This is equivalent to calling
|
||||||
:meth:`~gettext.GNUTranslations.output_charset` on the Translations
|
:meth:`~gettext.GNUTranslations.output_charset` on the Translations
|
||||||
object that is returned from this function.
|
object that is returned from this function.
|
||||||
|
:kwarg python2_api: When data:`True` (default), return Translation objects
|
||||||
|
that use the python2 gettext api
|
||||||
|
(:meth:`~gettext.GNUTranslations.gettext` and
|
||||||
|
:meth:`~gettext.GNUTranslations.lgettext` return byte
|
||||||
|
:class:`str`. :meth:`~gettext.GNUTranslations.ugettext` exists and
|
||||||
|
returns :class:`unicode` strings). When :data:`False`, return
|
||||||
|
Translation objects that use the python3 gettext api (gettext returns
|
||||||
|
:class:`unicode` strings and lgettext returns byte :class:`str`.
|
||||||
|
ugettext does not exist.)
|
||||||
:return: Translation object to get :mod:`gettext` methods from
|
:return: Translation object to get :mod:`gettext` methods from
|
||||||
|
|
||||||
If you need more flexibility than :func:`easy_gettext_setup`, use this
|
If you need more flexibility than :func:`easy_gettext_setup`, use this
|
||||||
|
@ -730,7 +783,16 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
||||||
than simply cycling through until we find a directory that exists.
|
than simply cycling through until we find a directory that exists.
|
||||||
The new code is based heavily on the |stdlib|_
|
The new code is based heavily on the |stdlib|_
|
||||||
:func:`gettext.translation` function.
|
:func:`gettext.translation` function.
|
||||||
|
.. versionchanged:: kitchen-1.2.0 ; API kitchen.i18n 2.2.0
|
||||||
|
Add python2_api parameter
|
||||||
'''
|
'''
|
||||||
|
if python2_api:
|
||||||
|
warnings.warn('get_translation_object returns gettext objects'
|
||||||
|
' that implement either the python2 or python3 gettext api.'
|
||||||
|
' You are currently using the python2 api. Consider'
|
||||||
|
' switching to the python3 api by setting python2_api=False'
|
||||||
|
' when you call the function.',
|
||||||
|
PendingDeprecationWarning, stacklevel=2)
|
||||||
if not class_:
|
if not class_:
|
||||||
class_ = NewGNUTranslations
|
class_ = NewGNUTranslations
|
||||||
|
|
||||||
|
@ -739,7 +801,7 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
||||||
mofiles.extend(gettext.find(domain, localedir, languages, all=1))
|
mofiles.extend(gettext.find(domain, localedir, languages, all=1))
|
||||||
if not mofiles:
|
if not mofiles:
|
||||||
if fallback:
|
if fallback:
|
||||||
return DummyTranslations()
|
return DummyTranslations(python2_api=python2_api)
|
||||||
raise IOError(ENOENT, 'No translation file found for domain', domain)
|
raise IOError(ENOENT, 'No translation file found for domain', domain)
|
||||||
|
|
||||||
# Accumulate a translation with fallbacks to all the other mofiles
|
# Accumulate a translation with fallbacks to all the other mofiles
|
||||||
|
@ -750,14 +812,22 @@ def get_translation_object(domain, localedirs=tuple(), languages=None,
|
||||||
if not translation:
|
if not translation:
|
||||||
mofile_fh = open(full_path, 'rb')
|
mofile_fh = open(full_path, 'rb')
|
||||||
try:
|
try:
|
||||||
translation = _translations.setdefault(full_path,
|
try:
|
||||||
class_(mofile_fh))
|
translation = _translations.setdefault(full_path,
|
||||||
|
class_(mofile_fh, python2_api=python2_api))
|
||||||
|
except TypeError:
|
||||||
|
# Only our translation classes have the python2_api
|
||||||
|
# parameter
|
||||||
|
translation = _translations.setdefault(full_path,
|
||||||
|
class_(mofile_fh))
|
||||||
|
|
||||||
finally:
|
finally:
|
||||||
mofile_fh.close()
|
mofile_fh.close()
|
||||||
|
|
||||||
# Shallow copy the object so that the fallbacks and output charset can
|
# Shallow copy the object so that the fallbacks and output charset can
|
||||||
# differ but the data we read from the mofile is shared.
|
# differ but the data we read from the mofile is shared.
|
||||||
translation = copy.copy(translation)
|
translation = copy.copy(translation)
|
||||||
|
translation.python2_api = python2_api
|
||||||
if codeset:
|
if codeset:
|
||||||
translation.set_output_charset(codeset)
|
translation.set_output_charset(codeset)
|
||||||
if not stacked_translations:
|
if not stacked_translations:
|
||||||
|
@ -818,9 +888,9 @@ def easy_gettext_setup(domain, localedirs=tuple(), use_unicode=True):
|
||||||
Changed :func:`~kitchen.i18n.easy_gettext_setup` to return the lgettext
|
Changed :func:`~kitchen.i18n.easy_gettext_setup` to return the lgettext
|
||||||
functions instead of gettext functions when use_unicode=False.
|
functions instead of gettext functions when use_unicode=False.
|
||||||
'''
|
'''
|
||||||
translations = get_translation_object(domain, localedirs=localedirs)
|
translations = get_translation_object(domain, localedirs=localedirs, python2_api=False)
|
||||||
if use_unicode:
|
if use_unicode:
|
||||||
return(translations.ugettext, translations.ungettext)
|
return(translations.gettext, translations.ngettext)
|
||||||
return(translations.lgettext, translations.lngettext)
|
return(translations.lgettext, translations.lngettext)
|
||||||
|
|
||||||
__all__ = ('DummyTranslations', 'NewGNUTranslations', 'easy_gettext_setup',
|
__all__ = ('DummyTranslations', 'NewGNUTranslations', 'easy_gettext_setup',
|
|
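For context on the api split that the new `python2_api` flag bridges: under Python 3, the stdlib's translation objects already behave like the `python2_api=False` mode that `easy_gettext_setup` now requests. A minimal stdlib-only sketch (this is not kitchen's own code):

```python
import gettext

# Python 3 stdlib translation objects use the "python3 api" that
# python2_api=False selects: gettext() returns text (str) and there
# is no separate ugettext() method.
translations = gettext.NullTranslations()
print(translations.gettext('hello'))       # hello
print(hasattr(translations, 'ugettext'))   # False on Python 3
```

Keeping both behaviors selectable on one object lets python2 callers migrate to the python3-style method names gradually instead of all at once.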
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 #
-# Copyright (c) 2010 Red Hat, Inc
+# Copyright (c) 2012 Red Hat, Inc
 #
 # kitchen is free software; you can redistribute it and/or modify it under the
 # terms of the GNU Lesser General Public License as published by the Free
@@ -34,6 +34,8 @@ from kitchen.versioning import version_tuple_to_string
 __version_info__ = ((0, 0, 1),)
 __version__ = version_tuple_to_string(__version_info__)
 
+from kitchen.text.misc import isbasestring
+
 def isiterable(obj, include_string=False):
     '''Check whether an object is an iterable
 
@@ -46,7 +48,7 @@ def isiterable(obj, include_string=False):
     :returns: :data:`True` if :attr:`obj` is iterable, otherwise
         :data:`False`.
     '''
-    if include_string or not isinstance(obj, basestring):
+    if include_string or not isbasestring(obj):
         try:
             iter(obj)
         except TypeError:
@@ -78,8 +78,6 @@ the defaultdict class provided by python-2.5 and above.
 
 import types
 
-from kitchen import b_
-
 # :C0103, W0613: We're implementing the python-2.5 defaultdict API so
 # we have to use the same names as python.
 # :C0111: We point people at the stdlib API docs for defaultdict rather than
@@ -89,8 +87,8 @@ from kitchen import b_
 class defaultdict(dict):
     def __init__(self, default_factory=None, *args, **kwargs):
         if (default_factory is not None and
                 not hasattr(default_factory, '__call__')):
-            raise TypeError(b_('First argument must be callable'))
+            raise TypeError('First argument must be callable')
         dict.__init__(self, *args, **kwargs)
         self.default_factory = default_factory
@@ -26,9 +26,9 @@ snippets so that you can get on with your life.
 ''')
 AUTHOR = 'Toshio Kuratomi, Seth Vidal, others'
 EMAIL = 'toshio@fedoraproject.org'
-COPYRIGHT = '2011 Red Hat, Inc. and others'
+COPYRIGHT = '2012 Red Hat, Inc. and others'
 URL = 'https://fedorahosted.org/kitchen'
-DOWNLOAD_URL = 'https://fedorahosted.org/releases/k/i/kitchen'
+DOWNLOAD_URL = 'https://pypi.python.org/pypi/kitchen'
 LICENSE = 'LGPLv2+'
 
 __all__ = ('NAME', 'VERSION', 'DESCRIPTION', 'LONG_DESCRIPTION', 'AUTHOR',
@@ -11,7 +11,7 @@ and displaying text on the screen.
 
 from kitchen.versioning import version_tuple_to_string
 
-__version_info__ = ((2, 1, 1),)
+__version_info__ = ((2, 2, 0),)
 __version__ = version_tuple_to_string(__version_info__)
 
 __all__ = ('converters', 'exceptions', 'misc',)
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 #
-# Copyright (c) 2011 Red Hat, Inc.
+# Copyright (c) 2012 Red Hat, Inc.
 #
 # kitchen is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
@@ -50,15 +50,12 @@ import codecs
 import warnings
 import xml.sax.saxutils
 
-# We need to access b_() for localizing our strings but we'll end up with
-# a circular import if we import it directly.
-import kitchen as k
-
 from kitchen.pycompat24 import sets
 sets.add_builtin_set()
 
 from kitchen.text.exceptions import ControlCharError, XmlEncodeError
 from kitchen.text.misc import guess_encoding, html_entities_unescape, \
-        process_control_chars
+        isbytestring, isunicodestring, process_control_chars
 
 #: Aliases for the utf-8 codec
 _UTF8_ALIASES = frozenset(('utf-8', 'UTF-8', 'utf8', 'UTF8', 'utf_8', 'UTF_8',
@@ -127,6 +124,8 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
     Deprecated :attr:`non_string` in favor of :attr:`nonstring` parameter and changed
         default value to ``simplerepr``
     '''
+    # Could use isbasestring/isunicode here but we want this code to be as
+    # fast as possible
     if isinstance(obj, basestring):
         if isinstance(obj, unicode):
             return obj
@@ -137,8 +136,8 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
         return obj.decode(encoding, errors)
 
     if non_string:
-        warnings.warn(k.b_('non_string is a deprecated parameter of'
-            ' to_unicode().  Use nonstring instead'), DeprecationWarning,
+        warnings.warn('non_string is a deprecated parameter of'
+            ' to_unicode().  Use nonstring instead', DeprecationWarning,
             stacklevel=2)
         if not nonstring:
             nonstring = non_string
@@ -162,21 +161,21 @@ def to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
             simple = obj.__str__()
         except (UnicodeError, AttributeError):
             simple = u''
-        if not isinstance(simple, unicode):
+        if isbytestring(simple):
             return unicode(simple, encoding, errors)
         return simple
     elif nonstring in ('repr', 'strict'):
         obj_repr = repr(obj)
-        if not isinstance(obj_repr, unicode):
+        if isbytestring(obj_repr):
             obj_repr = unicode(obj_repr, encoding, errors)
         if nonstring == 'repr':
             return obj_repr
-        raise TypeError(k.b_('to_unicode was given "%(obj)s" which is neither'
-            ' a byte string (str) or a unicode string') %
+        raise TypeError('to_unicode was given "%(obj)s" which is neither'
+            ' a byte string (str) or a unicode string' %
             {'obj': obj_repr.encode(encoding, 'replace')})
 
-    raise TypeError(k.b_('nonstring value, %(param)s, is not set to a valid'
-        ' action') % {'param': nonstring})
+    raise TypeError('nonstring value, %(param)s, is not set to a valid'
+        ' action' % {'param': nonstring})
 
 def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
         non_string=None):
@@ -247,13 +246,15 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
         Deprecated :attr:`non_string` in favor of :attr:`nonstring` parameter
         and changed default value to ``simplerepr``
     '''
+    # Could use isbasestring, isbytestring here but we want this to be as fast
+    # as possible
     if isinstance(obj, basestring):
         if isinstance(obj, str):
             return obj
         return obj.encode(encoding, errors)
     if non_string:
-        warnings.warn(k.b_('non_string is a deprecated parameter of'
-            ' to_bytes().  Use nonstring instead'), DeprecationWarning,
+        warnings.warn('non_string is a deprecated parameter of'
+            ' to_bytes().  Use nonstring instead', DeprecationWarning,
             stacklevel=2)
         if not nonstring:
             nonstring = non_string
@@ -277,7 +278,7 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
             simple = obj.__unicode__()
         except (AttributeError, UnicodeError):
             simple = ''
-        if isinstance(simple, unicode):
+        if isunicodestring(simple):
             simple = simple.encode(encoding, 'replace')
         return simple
     elif nonstring in ('repr', 'strict'):
@@ -285,17 +286,17 @@ def to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
             obj_repr = obj.__repr__()
         except (AttributeError, UnicodeError):
             obj_repr = ''
-        if isinstance(obj_repr, unicode):
+        if isunicodestring(obj_repr):
             obj_repr = obj_repr.encode(encoding, errors)
         else:
             obj_repr = str(obj_repr)
         if nonstring == 'repr':
             return obj_repr
-        raise TypeError(k.b_('to_bytes was given "%(obj)s" which is neither'
-            ' a unicode string or a byte string (str)') % {'obj': obj_repr})
+        raise TypeError('to_bytes was given "%(obj)s" which is neither'
+            ' a unicode string or a byte string (str)' % {'obj': obj_repr})
 
-    raise TypeError(k.b_('nonstring value, %(param)s, is not set to a valid'
-        ' action') % {'param': nonstring})
+    raise TypeError('nonstring value, %(param)s, is not set to a valid'
+        ' action' % {'param': nonstring})
 
 def getwriter(encoding):
     '''Return a :class:`codecs.StreamWriter` that resists tracing back.
@@ -375,9 +376,9 @@ def to_utf8(obj, errors='replace', non_string='passthru'):
 
         to_bytes(obj, encoding='utf-8', non_string='passthru')
     '''
-    warnings.warn(k.b_('kitchen.text.converters.to_utf8 is deprecated.  Use'
+    warnings.warn('kitchen.text.converters.to_utf8 is deprecated.  Use'
         ' kitchen.text.converters.to_bytes(obj, encoding="utf-8",'
-        ' nonstring="passthru" instead.'), DeprecationWarning, stacklevel=2)
+        ' nonstring="passthru" instead.', DeprecationWarning, stacklevel=2)
     return to_bytes(obj, encoding='utf-8', errors=errors,
         nonstring=non_string)
 
@@ -400,9 +401,8 @@ def to_str(obj):
 
         to_bytes(obj, nonstring='simplerepr')
     '''
-    warnings.warn(k.b_('to_str is deprecated.  Use to_unicode or to_bytes'
-        ' instead.  See the to_str docstring for'
-        ' porting information.'),
+    warnings.warn('to_str is deprecated.  Use to_unicode or to_bytes'
+        ' instead.  See the to_str docstring for porting information.',
         DeprecationWarning, stacklevel=2)
     return to_bytes(obj, nonstring='simplerepr')
 
@@ -682,22 +682,23 @@ def unicode_to_xml(string, encoding='utf-8', attrib=False,
     try:
         process_control_chars(string, strategy=control_chars)
     except TypeError:
-        raise XmlEncodeError(k.b_('unicode_to_xml must have a unicode type as'
+        raise XmlEncodeError('unicode_to_xml must have a unicode type as'
             ' the first argument.  Use bytes_string_to_xml for byte'
-            ' strings.'))
+            ' strings.')
     except ValueError:
-        raise ValueError(k.b_('The control_chars argument to unicode_to_xml'
-            ' must be one of ignore, replace, or strict'))
+        raise ValueError('The control_chars argument to unicode_to_xml'
+            ' must be one of ignore, replace, or strict')
     except ControlCharError, exc:
         raise XmlEncodeError(exc.args[0])
 
-    string = string.encode(encoding, 'xmlcharrefreplace')
-
     # Escape characters that have special meaning in xml
     if attrib:
         string = xml.sax.saxutils.escape(string, entities={'"':"&quot;"})
     else:
         string = xml.sax.saxutils.escape(string)
 
+    string = string.encode(encoding, 'xmlcharrefreplace')
+
     return string
 
 def xml_to_unicode(byte_string, encoding='utf-8', errors='replace'):
@@ -782,10 +783,10 @@ def byte_string_to_xml(byte_string, input_encoding='utf-8', errors='replace',
         :func:`unicode_to_xml`
         for other ideas on using this function
     '''
-    if not isinstance(byte_string, str):
-        raise XmlEncodeError(k.b_('byte_string_to_xml can only take a byte'
+    if not isbytestring(byte_string):
+        raise XmlEncodeError('byte_string_to_xml can only take a byte'
             ' string as its first argument.  Use unicode_to_xml for'
-            ' unicode strings'))
+            ' unicode strings')
 
     # Decode the string into unicode
     u_string = unicode(byte_string, input_encoding, errors)
@@ -892,7 +893,7 @@ def guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
 
     '''
     # Unicode strings can just be run through unicode_to_xml()
-    if isinstance(string, unicode):
+    if isunicodestring(string):
         return unicode_to_xml(string, encoding=output_encoding,
             attrib=attrib, control_chars=control_chars)
 
@@ -907,8 +908,8 @@ def guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
 def to_xml(string, encoding='utf-8', attrib=False, control_chars='ignore'):
     '''*Deprecated*: Use :func:`guess_encoding_to_xml` instead
     '''
-    warnings.warn(k.b_('kitchen.text.converters.to_xml is deprecated.  Use'
-        ' kitchen.text.converters.guess_encoding_to_xml instead.'),
+    warnings.warn('kitchen.text.converters.to_xml is deprecated.  Use'
+        ' kitchen.text.converters.guess_encoding_to_xml instead.',
         DeprecationWarning, stacklevel=2)
     return guess_encoding_to_xml(string, output_encoding=encoding,
         attrib=attrib, control_chars=control_chars)
@ -1,6 +1,6 @@
|
||||||
# -*- coding: utf-8 -*-
|
# -*- coding: utf-8 -*-
|
||||||
#
|
#
|
||||||
# Copyright (c) 2010 Red Hat, Inc.
|
# Copyright (c) 2013 Red Hat, Inc.
|
||||||
# Copyright (c) 2010 Ville Skyttä
|
# Copyright (c) 2010 Ville Skyttä
|
||||||
# Copyright (c) 2009 Tim Lauridsen
|
# Copyright (c) 2009 Tim Lauridsen
|
||||||
# Copyright (c) 2007 Marcus Kuhn
|
# Copyright (c) 2007 Marcus Kuhn
|
||||||
|
@ -39,7 +39,6 @@ have the same width so we need helper functions for displaying them.
|
||||||
import itertools
|
import itertools
|
||||||
import unicodedata
|
import unicodedata
|
||||||
|
|
||||||
from kitchen import b_
|
|
||||||
from kitchen.text.converters import to_unicode, to_bytes
|
from kitchen.text.converters import to_unicode, to_bytes
|
||||||
from kitchen.text.exceptions import ControlCharError
|
from kitchen.text.exceptions import ControlCharError
|
||||||
|
|
||||||
|
@ -101,7 +100,7 @@ def _interval_bisearch(value, table):
|
||||||
return False
|
return False
|
||||||
|
|
||||||
while maximum >= minimum:
|
while maximum >= minimum:
|
||||||
mid = (minimum + maximum) / 2
|
mid = divmod(minimum + maximum, 2)[0]
|
||||||
if value > table[mid][1]:
|
if value > table[mid][1]:
|
||||||
minimum = mid + 1
|
minimum = mid + 1
|
||||||
elif value < table[mid][0]:
|
elif value < table[mid][0]:
|
||||||
|
@@ -115,62 +114,64 @@ _COMBINING = (
     (0x300, 0x36f), (0x483, 0x489), (0x591, 0x5bd),
     (0x5bf, 0x5bf), (0x5c1, 0x5c2), (0x5c4, 0x5c5),
     (0x5c7, 0x5c7), (0x600, 0x603), (0x610, 0x61a),
-    (0x64b, 0x65e), (0x670, 0x670), (0x6d6, 0x6e4),
+    (0x64b, 0x65f), (0x670, 0x670), (0x6d6, 0x6e4),
     (0x6e7, 0x6e8), (0x6ea, 0x6ed), (0x70f, 0x70f),
     (0x711, 0x711), (0x730, 0x74a), (0x7a6, 0x7b0),
     (0x7eb, 0x7f3), (0x816, 0x819), (0x81b, 0x823),
-    (0x825, 0x827), (0x829, 0x82d), (0x901, 0x902),
+    (0x825, 0x827), (0x829, 0x82d), (0x859, 0x85b),
-    (0x93c, 0x93c), (0x941, 0x948), (0x94d, 0x94d),
+    (0x901, 0x902), (0x93c, 0x93c), (0x941, 0x948),
-    (0x951, 0x954), (0x962, 0x963), (0x981, 0x981),
+    (0x94d, 0x94d), (0x951, 0x954), (0x962, 0x963),
-    (0x9bc, 0x9bc), (0x9c1, 0x9c4), (0x9cd, 0x9cd),
+    (0x981, 0x981), (0x9bc, 0x9bc), (0x9c1, 0x9c4),
-    (0x9e2, 0x9e3), (0xa01, 0xa02), (0xa3c, 0xa3c),
+    (0x9cd, 0x9cd), (0x9e2, 0x9e3), (0xa01, 0xa02),
-    (0xa41, 0xa42), (0xa47, 0xa48), (0xa4b, 0xa4d),
+    (0xa3c, 0xa3c), (0xa41, 0xa42), (0xa47, 0xa48),
-    (0xa70, 0xa71), (0xa81, 0xa82), (0xabc, 0xabc),
+    (0xa4b, 0xa4d), (0xa70, 0xa71), (0xa81, 0xa82),
-    (0xac1, 0xac5), (0xac7, 0xac8), (0xacd, 0xacd),
+    (0xabc, 0xabc), (0xac1, 0xac5), (0xac7, 0xac8),
-    (0xae2, 0xae3), (0xb01, 0xb01), (0xb3c, 0xb3c),
+    (0xacd, 0xacd), (0xae2, 0xae3), (0xb01, 0xb01),
-    (0xb3f, 0xb3f), (0xb41, 0xb43), (0xb4d, 0xb4d),
+    (0xb3c, 0xb3c), (0xb3f, 0xb3f), (0xb41, 0xb43),
-    (0xb56, 0xb56), (0xb82, 0xb82), (0xbc0, 0xbc0),
+    (0xb4d, 0xb4d), (0xb56, 0xb56), (0xb82, 0xb82),
-    (0xbcd, 0xbcd), (0xc3e, 0xc40), (0xc46, 0xc48),
+    (0xbc0, 0xbc0), (0xbcd, 0xbcd), (0xc3e, 0xc40),
-    (0xc4a, 0xc4d), (0xc55, 0xc56), (0xcbc, 0xcbc),
+    (0xc46, 0xc48), (0xc4a, 0xc4d), (0xc55, 0xc56),
-    (0xcbf, 0xcbf), (0xcc6, 0xcc6), (0xccc, 0xccd),
+    (0xcbc, 0xcbc), (0xcbf, 0xcbf), (0xcc6, 0xcc6),
-    (0xce2, 0xce3), (0xd41, 0xd43), (0xd4d, 0xd4d),
+    (0xccc, 0xccd), (0xce2, 0xce3), (0xd41, 0xd43),
-    (0xdca, 0xdca), (0xdd2, 0xdd4), (0xdd6, 0xdd6),
+    (0xd4d, 0xd4d), (0xdca, 0xdca), (0xdd2, 0xdd4),
-    (0xe31, 0xe31), (0xe34, 0xe3a), (0xe47, 0xe4e),
+    (0xdd6, 0xdd6), (0xe31, 0xe31), (0xe34, 0xe3a),
-    (0xeb1, 0xeb1), (0xeb4, 0xeb9), (0xebb, 0xebc),
+    (0xe47, 0xe4e), (0xeb1, 0xeb1), (0xeb4, 0xeb9),
-    (0xec8, 0xecd), (0xf18, 0xf19), (0xf35, 0xf35),
+    (0xebb, 0xebc), (0xec8, 0xecd), (0xf18, 0xf19),
-    (0xf37, 0xf37), (0xf39, 0xf39), (0xf71, 0xf7e),
+    (0xf35, 0xf35), (0xf37, 0xf37), (0xf39, 0xf39),
-    (0xf80, 0xf84), (0xf86, 0xf87), (0xf90, 0xf97),
+    (0xf71, 0xf7e), (0xf80, 0xf84), (0xf86, 0xf87),
-    (0xf99, 0xfbc), (0xfc6, 0xfc6), (0x102d, 0x1030),
+    (0xf90, 0xf97), (0xf99, 0xfbc), (0xfc6, 0xfc6),
-    (0x1032, 0x1032), (0x1036, 0x1037), (0x1039, 0x103a),
+    (0x102d, 0x1030), (0x1032, 0x1032), (0x1036, 0x1037),
-    (0x1058, 0x1059), (0x108d, 0x108d), (0x1160, 0x11ff),
+    (0x1039, 0x103a), (0x1058, 0x1059), (0x108d, 0x108d),
-    (0x135f, 0x135f), (0x1712, 0x1714), (0x1732, 0x1734),
+    (0x1160, 0x11ff), (0x135d, 0x135f), (0x1712, 0x1714),
-    (0x1752, 0x1753), (0x1772, 0x1773), (0x17b4, 0x17b5),
+    (0x1732, 0x1734), (0x1752, 0x1753), (0x1772, 0x1773),
-    (0x17b7, 0x17bd), (0x17c6, 0x17c6), (0x17c9, 0x17d3),
+    (0x17b4, 0x17b5), (0x17b7, 0x17bd), (0x17c6, 0x17c6),
-    (0x17dd, 0x17dd), (0x180b, 0x180d), (0x18a9, 0x18a9),
+    (0x17c9, 0x17d3), (0x17dd, 0x17dd), (0x180b, 0x180d),
-    (0x1920, 0x1922), (0x1927, 0x1928), (0x1932, 0x1932),
+    (0x18a9, 0x18a9), (0x1920, 0x1922), (0x1927, 0x1928),
-    (0x1939, 0x193b), (0x1a17, 0x1a18), (0x1a60, 0x1a60),
+    (0x1932, 0x1932), (0x1939, 0x193b), (0x1a17, 0x1a18),
-    (0x1a75, 0x1a7c), (0x1a7f, 0x1a7f), (0x1b00, 0x1b03),
+    (0x1a60, 0x1a60), (0x1a75, 0x1a7c), (0x1a7f, 0x1a7f),
-    (0x1b34, 0x1b34), (0x1b36, 0x1b3a), (0x1b3c, 0x1b3c),
+    (0x1b00, 0x1b03), (0x1b34, 0x1b34), (0x1b36, 0x1b3a),
-    (0x1b42, 0x1b42), (0x1b44, 0x1b44), (0x1b6b, 0x1b73),
+    (0x1b3c, 0x1b3c), (0x1b42, 0x1b42), (0x1b44, 0x1b44),
-    (0x1baa, 0x1baa), (0x1c37, 0x1c37), (0x1cd0, 0x1cd2),
+    (0x1b6b, 0x1b73), (0x1baa, 0x1baa), (0x1be6, 0x1be6),
+    (0x1bf2, 0x1bf3), (0x1c37, 0x1c37), (0x1cd0, 0x1cd2),
     (0x1cd4, 0x1ce0), (0x1ce2, 0x1ce8), (0x1ced, 0x1ced),
-    (0x1dc0, 0x1de6), (0x1dfd, 0x1dff), (0x200b, 0x200f),
+    (0x1dc0, 0x1de6), (0x1dfc, 0x1dff), (0x200b, 0x200f),
     (0x202a, 0x202e), (0x2060, 0x2063), (0x206a, 0x206f),
-    (0x20d0, 0x20f0), (0x2cef, 0x2cf1), (0x2de0, 0x2dff),
+    (0x20d0, 0x20f0), (0x2cef, 0x2cf1), (0x2d7f, 0x2d7f),
-    (0x302a, 0x302f), (0x3099, 0x309a), (0xa66f, 0xa66f),
+    (0x2de0, 0x2dff), (0x302a, 0x302f), (0x3099, 0x309a),
-    (0xa67c, 0xa67d), (0xa6f0, 0xa6f1), (0xa806, 0xa806),
+    (0xa66f, 0xa66f), (0xa67c, 0xa67d), (0xa6f0, 0xa6f1),
-    (0xa80b, 0xa80b), (0xa825, 0xa826), (0xa8c4, 0xa8c4),
+    (0xa806, 0xa806), (0xa80b, 0xa80b), (0xa825, 0xa826),
-    (0xa8e0, 0xa8f1), (0xa92b, 0xa92d), (0xa953, 0xa953),
+    (0xa8c4, 0xa8c4), (0xa8e0, 0xa8f1), (0xa92b, 0xa92d),
-    (0xa9b3, 0xa9b3), (0xa9c0, 0xa9c0), (0xaab0, 0xaab0),
+    (0xa953, 0xa953), (0xa9b3, 0xa9b3), (0xa9c0, 0xa9c0),
-    (0xaab2, 0xaab4), (0xaab7, 0xaab8), (0xaabe, 0xaabf),
+    (0xaab0, 0xaab0), (0xaab2, 0xaab4), (0xaab7, 0xaab8),
-    (0xaac1, 0xaac1), (0xabed, 0xabed), (0xfb1e, 0xfb1e),
+    (0xaabe, 0xaabf), (0xaac1, 0xaac1), (0xabed, 0xabed),
-    (0xfe00, 0xfe0f), (0xfe20, 0xfe26), (0xfeff, 0xfeff),
+    (0xfb1e, 0xfb1e), (0xfe00, 0xfe0f), (0xfe20, 0xfe26),
-    (0xfff9, 0xfffb), (0x101fd, 0x101fd), (0x10a01, 0x10a03),
+    (0xfeff, 0xfeff), (0xfff9, 0xfffb), (0x101fd, 0x101fd),
-    (0x10a05, 0x10a06), (0x10a0c, 0x10a0f), (0x10a38, 0x10a3a),
+    (0x10a01, 0x10a03), (0x10a05, 0x10a06), (0x10a0c, 0x10a0f),
-    (0x10a3f, 0x10a3f), (0x110b9, 0x110ba), (0x1d165, 0x1d169),
+    (0x10a38, 0x10a3a), (0x10a3f, 0x10a3f), (0x11046, 0x11046),
-    (0x1d16d, 0x1d182), (0x1d185, 0x1d18b), (0x1d1aa, 0x1d1ad),
+    (0x110b9, 0x110ba), (0x1d165, 0x1d169), (0x1d16d, 0x1d182),
-    (0x1d242, 0x1d244), (0xe0001, 0xe0001), (0xe0020, 0xe007f),
+    (0x1d185, 0x1d18b), (0x1d1aa, 0x1d1ad), (0x1d242, 0x1d244),
-    (0xe0100, 0xe01ef), )
+    (0xe0001, 0xe0001), (0xe0020, 0xe007f), (0xe0100, 0xe01ef), )
 
 '''
 Internal table, provided by this module to list :term:`code points` which
 combine with other characters and therefore should have no :term:`textual
@@ -184,8 +185,8 @@ a combining character.
     :func:`~kitchen.text.display._generate_combining_table`
         for how this table is generated
 
-This table was last regenerated on python-2.7.0 with
-:data:`unicodedata.unidata_version` 5.1.0
+This table was last regenerated on python-3.2.3 with
+:data:`unicodedata.unidata_version` 6.0.0
 '''
 
 # New function from Toshio Kuratomi (LGPLv2+)
@@ -341,8 +342,8 @@ def _ucp_width(ucs, control_chars='guess'):
     if ucs < 32 or (ucs < 0xa0 and ucs >= 0x7f):
         # Control character detected
        if control_chars == 'strict':
-            raise ControlCharError(b_('_ucp_width does not understand how to'
-                ' assign a width value to control characters.'))
+            raise ControlCharError('_ucp_width does not understand how to'
+                ' assign a width value to control characters.')
         if ucs in (0x08, 0x07F, 0x94):
             # Backspace, delete, and clear delete remove a single character
             return -1
@@ -519,7 +520,7 @@ def textual_width_chop(msg, chop, encoding='utf-8', errors='replace'):
         # if current width is high,
         if width > chop:
             # calculate new midpoint
-            mid = minimum + (eos - minimum) / 2
+            mid = minimum + (eos - minimum) // 2
             if mid == eos:
                 break
             if (eos - chop) < (eos - mid):
@@ -537,7 +538,7 @@ def textual_width_chop(msg, chop, encoding='utf-8', errors='replace'):
             # short-circuit above means that we never use this branch.
 
             # calculate new midpoint
-            mid = eos + (maximum - eos) / 2
+            mid = eos + (maximum - eos) // 2
             if mid == eos:
                 break
             if (chop - eos) < (mid - eos):
@@ -1,5 +1,5 @@
 # -*- coding: utf-8 -*-
-# Copyright (c) 2011 Red Hat, Inc
+# Copyright (c) 2012 Red Hat, Inc
 # Copyright (c) 2010 Seth Vidal
 #
 # kitchen is free software; you can redistribute it and/or
@@ -27,6 +27,12 @@ Miscellaneous functions for manipulating text
 ---------------------------------------------
 
 Collection of text functions that don't fit in another category.
+
+.. versionchanged:: kitchen 1.2.0, API: kitchen.text 2.2.0
+    Added :func:`~kitchen.text.misc.isbasestring`,
+    :func:`~kitchen.text.misc.isbytestring`, and
+    :func:`~kitchen.text.misc.isunicodestring` to help tell which string type
+    is which on python2 and python3
 '''
 import htmlentitydefs
 import itertools
@@ -37,9 +43,6 @@ try:
 except ImportError:
     chardet = None
 
-# We need to access b_() for localizing our strings but we'll end up with
-# a circular import if we import it directly.
-import kitchen as k
 from kitchen.pycompat24 import sets
 from kitchen.text.exceptions import ControlCharError
 
@@ -49,13 +52,64 @@ sets.add_builtin_set()
 # byte strings we're guessing about as latin1
 _CHARDET_THRESHHOLD = 0.6
 
-# ASCII control codes that are illegal in xml 1.0
-_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32))
+# ASCII control codes (the c0 codes) that are illegal in xml 1.0
+# Also unicode control codes (the C1 codes): also illegal in xml
+_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32) + range(128, 160))
 _CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES))
+_IGNORE_TABLE = dict(zip(_CONTROL_CODES, [None] * len(_CONTROL_CODES)))
+_REPLACE_TABLE = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
 
 # _ENTITY_RE
 _ENTITY_RE = re.compile(r'(?s)<[^>]*>|&#?\w+;')
 
+def isbasestring(obj):
+    '''Determine if obj is a byte :class:`str` or :class:`unicode` string
+
+    In python2 this is eqiuvalent to isinstance(obj, basestring). In python3
+    it checks whether the object is an instance of str, bytes, or bytearray.
+    This is an aid to porting code that needed to test whether an object was
+    derived from basestring in python2 (commonly used in unicode-bytes
+    conversion functions)
+
+    :arg obj: Object to test
+    :returns: True if the object is a :class:`basestring`. Otherwise False.
+
+    .. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
+    '''
+    if isinstance(obj, basestring):
+        return True
+    return False
+
+def isbytestring(obj):
+    '''Determine if obj is a byte :class:`str`
+
+    In python2 this is equivalent to isinstance(obj, str). In python3 it
+    checks whether the object is an instance of bytes or bytearray.
+
+    :arg obj: Object to test
+    :returns: True if the object is a byte :class:`str`. Otherwise, False.
+
+    .. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
+    '''
+    if isinstance(obj, str):
+        return True
+    return False
+
+def isunicodestring(obj):
+    '''Determine if obj is a :class:`unicode` string
+
+    In python2 this is equivalent to isinstance(obj, unicode). In python3 it
+    checks whether the object is an instance of :class:`str`.
+
+    :arg obj: Object to test
+    :returns: True if the object is a :class:`unicode` string. Otherwise, False.
+
+    .. versionadded:: Kitchen: 1.2.0, API kitchen.text 2.2.0
+    '''
+    if isinstance(obj, unicode):
+        return True
+    return False
+
 def guess_encoding(byte_string, disable_chardet=False):
     '''Try to guess the encoding of a byte :class:`str`
 
@@ -79,8 +133,8 @@ def guess_encoding(byte_string, disable_chardet=False):
     to every byte, decoding from ``latin-1`` to :class:`unicode` will not
     cause :exc:`UnicodeErrors` although the output might be mangled.
     '''
-    if not isinstance(byte_string, str):
-        raise TypeError(k.b_('byte_string must be a byte string (str)'))
+    if not isbytestring(byte_string):
+        raise TypeError('first argument must be a byte string (str)')
     input_encoding = 'utf-8'
     try:
         unicode(byte_string, input_encoding, 'strict')
@@ -98,7 +152,7 @@ def guess_encoding(byte_string, disable_chardet=False):
     return input_encoding
 
 def str_eq(str1, str2, encoding='utf-8', errors='replace'):
-    '''Compare two stringsi, converting to byte :class:`str` if one is
+    '''Compare two strings, converting to byte :class:`str` if one is
     :class:`unicode`
 
     :arg str1: First string to compare
@@ -135,7 +189,7 @@ def str_eq(str1, str2, encoding='utf-8', errors='replace'):
     except UnicodeError:
         pass
 
-    if isinstance(str1, unicode):
+    if isunicodestring(str1):
         str1 = str1.encode(encoding, errors)
     else:
         str2 = str2.encode(encoding, errors)
@@ -166,26 +220,30 @@ def process_control_chars(string, strategy='replace'):
         :attr:`string`
     :returns: :class:`unicode` string with no :term:`control characters` in
         it.
-    '''
-    if not isinstance(string, unicode):
-        raise TypeError(k.b_('process_control_char must have a unicode type as'
-            ' the first argument.'))
-    if strategy == 'ignore':
-        control_table = dict(zip(_CONTROL_CODES, [None] * len(_CONTROL_CODES)))
-    elif strategy == 'replace':
-        control_table = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
-    elif strategy == 'strict':
-        control_table = None
-        # Test that there are no control codes present
-        data = frozenset(string)
-        if [c for c in _CONTROL_CHARS if c in data]:
-            raise ControlCharError(k.b_('ASCII control code present in string'
-                ' input'))
-    else:
-        raise ValueError(k.b_('The strategy argument to process_control_chars'
-            ' must be one of ignore, replace, or strict'))
-
-    if control_table:
+
+    .. versionchanged:: kitchen 1.2.0, API: kitchen.text 2.2.0
+        Strip out the C1 control characters in addition to the C0 control
+        characters.
+    '''
+    if not isunicodestring(string):
+        raise TypeError('process_control_char must have a unicode type as'
+            ' the first argument.')
+    if strategy not in ('replace', 'ignore', 'strict'):
+        raise ValueError('The strategy argument to process_control_chars'
+            ' must be one of ignore, replace, or strict')
+
+    # Most strings don't have control chars and translating carries
+    # a higher cost than testing whether the chars are in the string
+    # So only translate if necessary
+    if not _CONTROL_CHARS.isdisjoint(string):
+        if strategy == 'replace':
+            control_table = _REPLACE_TABLE
+        elif strategy == 'ignore':
+            control_table = _IGNORE_TABLE
+        else:
+            # strategy can only equal 'strict'
+            raise ControlCharError('ASCII control code present in string'
+                ' input')
         string = string.translate(control_table)
 
     return string
@@ -237,9 +295,9 @@ def html_entities_unescape(string):
                 return unicode(entity, "iso-8859-1")
         return string # leave as is
 
-    if not isinstance(string, unicode):
-        raise TypeError(k.b_('html_entities_unescape must have a unicode type'
-            ' for its first argument'))
+    if not isunicodestring(string):
+        raise TypeError('html_entities_unescape must have a unicode type'
+            ' for its first argument')
     return re.sub(_ENTITY_RE, fixup, string)
 
 def byte_string_valid_xml(byte_string, encoding='utf-8'):
@@ -264,7 +322,7 @@ def byte_string_valid_xml(byte_string, encoding='utf-8'):
             processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
         output_xml(processed_array)
     '''
-    if not isinstance(byte_string, str):
+    if not isbytestring(byte_string):
         # Not a byte string
         return False
 
@@ -309,5 +367,5 @@ def byte_string_valid_encoding(byte_string, encoding='utf-8'):
     return True
 
 __all__ = ('byte_string_valid_encoding', 'byte_string_valid_xml',
-        'guess_encoding', 'html_entities_unescape', 'process_control_chars',
-        'str_eq')
+        'guess_encoding', 'html_entities_unescape', 'isbasestring',
+        'isbytestring', 'isunicodestring', 'process_control_chars', 'str_eq')
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 #
-# Copyright (c) 2011 Red Hat, Inc.
+# Copyright (c) 2012 Red Hat, Inc.
 # Copyright (c) 2010 Ville Skyttä
 # Copyright (c) 2009 Tim Lauridsen
 # Copyright (c) 2007 Marcus Kuhn
@@ -50,9 +50,8 @@ Functions for operating on byte :class:`str` encoded as :term:`UTF-8`
 '''
 import warnings
 
-from kitchen import b_
 from kitchen.text.converters import to_unicode, to_bytes
-from kitchen.text.misc import byte_string_valid_encoding
+from kitchen.text.misc import byte_string_valid_encoding, isunicodestring
 from kitchen.text.display import _textual_width_le, \
         byte_string_textual_width_fill, fill, textual_width, \
         textual_width_chop, wrap
@@ -66,8 +65,8 @@ def utf8_valid(msg):
 
     Use :func:`kitchen.text.misc.byte_string_valid_encoding` instead.
     '''
-    warnings.warn(b_('kitchen.text.utf8.utf8_valid is deprecated. Use'
-        ' kitchen.text.misc.byte_string_valid_encoding(msg) instead'),
+    warnings.warn('kitchen.text.utf8.utf8_valid is deprecated. Use'
+        ' kitchen.text.misc.byte_string_valid_encoding(msg) instead',
         DeprecationWarning, stacklevel=2)
     return byte_string_valid_encoding(msg)
 
@@ -76,8 +75,8 @@ def utf8_width(msg):
 
     Use :func:`kitchen.text.display.textual_width` instead.
     '''
-    warnings.warn(b_('kitchen.text.utf8.utf8_width is deprecated. Use'
-        ' kitchen.text.display.textual_width(msg) instead'),
+    warnings.warn('kitchen.text.utf8.utf8_width is deprecated. Use'
+        ' kitchen.text.display.textual_width(msg) instead',
         DeprecationWarning, stacklevel=2)
     return textual_width(msg)
 
@@ -98,14 +97,14 @@ def utf8_width_chop(msg, chop=None):
     >>> (textual_width(msg), to_bytes(textual_width_chop(msg, 5)))
     (5, 'く ku')
     '''
-    warnings.warn(b_('kitchen.text.utf8.utf8_width_chop is deprecated. Use'
-        ' kitchen.text.display.textual_width_chop instead'), DeprecationWarning,
+    warnings.warn('kitchen.text.utf8.utf8_width_chop is deprecated. Use'
+        ' kitchen.text.display.textual_width_chop instead', DeprecationWarning,
         stacklevel=2)
 
     if chop == None:
         return textual_width(msg), msg
 
-    as_bytes = not isinstance(msg, unicode)
+    as_bytes = not isunicodestring(msg)
 
     chopped_msg = textual_width_chop(msg, chop)
     if as_bytes:
@@ -117,8 +116,8 @@ def utf8_width_fill(msg, fill, chop=None, left=True, prefix='', suffix=''):
 
     Use :func:`~kitchen.text.display.byte_string_textual_width_fill` instead
     '''
-    warnings.warn(b_('kitchen.text.utf8.utf8_width_fill is deprecated. Use'
-        ' kitchen.text.display.byte_string_textual_width_fill instead'),
+    warnings.warn('kitchen.text.utf8.utf8_width_fill is deprecated. Use'
+        ' kitchen.text.display.byte_string_textual_width_fill instead',
         DeprecationWarning, stacklevel=2)
 
     return byte_string_textual_width_fill(msg, fill, chop=chop, left=left,
@@ -130,11 +129,11 @@ def utf8_text_wrap(text, width=70, initial_indent='', subsequent_indent=''):
 
     Use :func:`kitchen.text.display.wrap` instead
     '''
-    warnings.warn(b_('kitchen.text.utf8.utf8_text_wrap is deprecated. Use'
-        ' kitchen.text.display.wrap instead'),
+    warnings.warn('kitchen.text.utf8.utf8_text_wrap is deprecated. Use'
+        ' kitchen.text.display.wrap instead',
         DeprecationWarning, stacklevel=2)
 
-    as_bytes = not isinstance(text, unicode)
+    as_bytes = not isunicodestring(text)
 
     text = to_unicode(text)
     lines = wrap(text, width=width, initial_indent=initial_indent,
@@ -150,8 +149,8 @@ def utf8_text_fill(text, *args, **kwargs):
 
     Use :func:`kitchen.text.display.fill` instead.
     '''
-    warnings.warn(b_('kitchen.text.utf8.utf8_text_fill is deprecated. Use'
-        ' kitchen.text.display.fill instead'),
+    warnings.warn('kitchen.text.utf8.utf8_text_fill is deprecated. Use'
+        ' kitchen.text.display.fill instead',
         DeprecationWarning, stacklevel=2)
     # This assumes that all args. are utf8.
     return fill(text, *args, **kwargs)
@@ -160,8 +159,8 @@ def _utf8_width_le(width, *args):
     '''**Deprecated** Convert the arguments to unicode and use
     :func:`kitchen.text.display._textual_width_le` instead.
     '''
-    warnings.warn(b_('kitchen.text.utf8._utf8_width_le is deprecated. Use'
-        ' kitchen.text.display._textual_width_le instead'),
+    warnings.warn('kitchen.text.utf8._utf8_width_le is deprecated. Use'
+        ' kitchen.text.display._textual_width_le instead',
         DeprecationWarning, stacklevel=2)
     # This assumes that all args. are utf8.
     return _textual_width_le(width, to_unicode(''.join(args)))
@@ -89,10 +89,10 @@ def version_tuple_to_string(version_info):
         if isinstance(values[0], int):
             ver_components.append('.'.join(itertools.imap(str, values)))
         else:
+            modifier = values[0]
             if isinstance(values[0], unicode):
                 modifier = values[0].encode('ascii')
-            else:
-                modifier = values[0]
             if modifier in ('a', 'b', 'c', 'rc'):
                 ver_components.append('%s%s' % (modifier,
                     '.'.join(itertools.imap(str, values[1:])) or '0'))
@@ -9,6 +9,8 @@ from kitchen.text.converters import to_bytes
 from kitchen.text import misc
 
 class UnicodeTestData(object):
+    u_empty_string = u''
+    b_empty_string = ''
     # This should encode fine -- sanity check
     u_ascii = u'the quick brown fox jumped over the lazy dog'
     b_ascii = 'the quick brown fox jumped over the lazy dog'
@@ -16,7 +18,7 @@ class UnicodeTestData(object):
     # First challenge -- what happens with latin-1 characters
     u_spanish = u'El veloz murciélago saltó sobre el perro perezoso.'
     # utf8 and latin1 both support these chars so no mangling
-    utf8_spanish = u_spanish.encode('utf8')
+    utf8_spanish = u_spanish.encode('utf-8')
     latin1_spanish = u_spanish.encode('latin1')
 
     # ASCII does not have the accented characters so it mangles
@@ -62,7 +64,8 @@ class UnicodeTestData(object):
     u_entity_escape = u'Test: <"&"> – ' + unicode(u_japanese.encode('ascii', 'xmlcharrefreplace'), 'ascii') + u'é'
     utf8_entity_escape = 'Test: <"&"> – 速い茶色のキツネが怠惰な犬に\'増é'
     utf8_attrib_escape = 'Test: <"&"> – 速い茶色のキツネが怠惰な犬に\'増é'
-    ascii_entity_escape = (u'Test: <"&"> – ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace').replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;')
+    ascii_entity_escape = ('Test: <"&"> '.replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;')) + (u'– ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace')
+    ascii_attrib_escape = ('Test: <"&"> '.replace('&', '&amp;',1).replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')) + (u'– ' + u_japanese + u'é').encode('ascii', 'xmlcharrefreplace')
 
     b_byte_chars = ' '.join(map(chr, range(0, 256)))
     b_byte_encoded = 'ACABIAIgAyAEIAUgBiAHIAggCSAKIAsgDCANIA4gDyAQIBEgEiATIBQgFSAWIBcgGCAZIBogGyAcIB0gHiAfICAgISAiICMgJCAlICYgJyAoICkgKiArICwgLSAuIC8gMCAxIDIgMyA0IDUgNiA3IDggOSA6IDsgPCA9ID4gPyBAIEEgQiBDIEQgRSBGIEcgSCBJIEogSyBMIE0gTiBPIFAgUSBSIFMgVCBVIFYgVyBYIFkgWiBbIFwgXSBeIF8gYCBhIGIgYyBkIGUgZiBnIGggaSBqIGsgbCBtIG4gbyBwIHEgciBzIHQgdSB2IHcgeCB5IHogeyB8IH0gfiB/IIAggSCCIIMghCCFIIYghyCIIIkgiiCLIIwgjSCOII8gkCCRIJIgkyCUIJUgliCXIJggmSCaIJsgnCCdIJ4gnyCgIKEgoiCjIKQgpSCmIKcgqCCpIKogqyCsIK0griCvILAgsSCyILMgtCC1ILYgtyC4ILkguiC7ILwgvSC+IL8gwCDBIMIgwyDEIMUgxiDHIMggySDKIMsgzCDNIM4gzyDQINEg0iDTINQg1SDWINcg2CDZINog2yDcIN0g3iDfIOAg4SDiIOMg5CDlIOYg5yDoIOkg6iDrIOwg7SDuIO8g8CDxIPIg8yD0IPUg9iD3IPgg+SD6IPsg/CD9IP4g/w=='
@@ -127,3 +130,48 @@ u' * A powerful unrepr mode for storing basic datatypes']
 u_ascii_no_ctrl = u''.join([c for c in u_ascii_chars if ord(c) not in misc._CONTROL_CODES])
 u_ascii_ctrl_replace = u_ascii_chars.translate(dict([(c, u'?') for c in misc._CONTROL_CODES]))
 utf8_ascii_chars = u_ascii_chars.encode('utf8')
+
+# These are present in the test catalog as msgids or values
+u_lemon = u'1 lemon'
+utf8_lemon = u_lemon.encode('utf-8')
+latin1_lemon = u_lemon.encode('latin-1')
+
+u_lemons = u'4 lemons'
+utf8_lemons = u_lemons.encode('utf-8')
+latin1_lemons = u_lemons.encode('latin-1')
+
+u_limao = u'一 limão'
+utf8_limao = u_limao.encode('utf-8')
+latin1_limao = u_limao.encode('latin-1', 'replace')
+
+u_limoes = u'四 limões'
+utf8_limoes = u_limoes.encode('utf-8')
+latin1_limoes = u_limoes.encode('latin-1', 'replace')
+
+u_not_in_catalog = u'café not matched in catalogs'
+utf8_not_in_catalog = u_not_in_catalog.encode('utf-8')
+latin1_not_in_catalog = u_not_in_catalog.encode('latin-1')
+
+u_kitchen = u'kitchen sink'
+utf8_kitchen = u_kitchen.encode('utf-8')
+latin1_kitchen = u_kitchen.encode('latin-1')
+
+u_pt_kitchen = u'pia da cozinha'
+utf8_pt_kitchen = u_pt_kitchen.encode('utf-8')
+latin1_pt_kitchen = u_pt_kitchen.encode('latin-1')
+
+u_kuratomi = u'Kuratomi'
+utf8_kuratomi = u_kuratomi.encode('utf-8')
+latin1_kuratomi = u_kuratomi.encode('latin-1')
+
+u_ja_kuratomi = u'くらとみ'
+utf8_ja_kuratomi = u_ja_kuratomi.encode('utf-8')
+latin1_ja_kuratomi = u_ja_kuratomi.encode('latin-1', 'replace')
+
+u_in_fallback = u'Only café in fallback'
+utf8_in_fallback = u_in_fallback.encode('utf-8')
+latin1_in_fallback = u_in_fallback.encode('latin-1')
+
+u_yes_in_fallback = u'Yes, only café in fallback'
+utf8_yes_in_fallback = u_yes_in_fallback.encode('utf-8')
+latin1_yes_in_fallback = u_yes_in_fallback.encode('latin-1')
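Several of the catalog strings above pass `'replace'` when encoding to latin-1. A short illustration of why (names mirror the test data; this is a sketch, not kitchen code): a CJK character such as '一' has no latin-1 mapping and is replaced with '?', while an accented 'ã' maps cleanly to byte 0xe3.

```python
# '一' cannot be represented in latin-1, so errors='replace'
# substitutes '?'; 'ã' encodes to the single byte 0xe3.
u_limao = '一 limão'
latin1_limao = u_limao.encode('latin-1', 'replace')
print(latin1_limao)  # b'? lim\xe3o'
```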
@@ -29,6 +29,9 @@ class Test__all__(object):
         ('kitchen', 'i18n', 'to_unicode'),
         ('kitchen', 'i18n', 'ENOENT'),
         ('kitchen', 'i18n', 'byte_string_valid_encoding'),
+        ('kitchen', 'i18n', 'isbasestring'),
+        ('kitchen', 'i18n', 'partial'),
+        ('kitchen', 'iterutils', 'isbasestring'),
         ('kitchen', 'iterutils', 'version_tuple_to_string'),
         ('kitchen', 'pycompat24', 'version_tuple_to_string'),
         ('kitchen', 'pycompat25', 'version_tuple_to_string'),
@@ -44,6 +47,8 @@ class Test__all__(object):
         ('kitchen.text', 'converters', 'ControlCharError'),
         ('kitchen.text', 'converters', 'guess_encoding'),
         ('kitchen.text', 'converters', 'html_entities_unescape'),
+        ('kitchen.text', 'converters', 'isbytestring'),
+        ('kitchen.text', 'converters', 'isunicodestring'),
         ('kitchen.text', 'converters', 'process_control_chars'),
         ('kitchen.text', 'converters', 'XmlEncodeError'),
         ('kitchen.text', 'misc', 'b_'),
@@ -57,6 +62,7 @@ class Test__all__(object):
         ('kitchen.text', 'utf8', 'byte_string_textual_width_fill'),
         ('kitchen.text', 'utf8', 'byte_string_valid_encoding'),
         ('kitchen.text', 'utf8', 'fill'),
+        ('kitchen.text', 'utf8', 'isunicodestring'),
         ('kitchen.text', 'utf8', 'textual_width'),
         ('kitchen.text', 'utf8', 'textual_width_chop'),
         ('kitchen.text', 'utf8', 'to_bytes'),
@@ -1,5 +1,4 @@
 import unittest
-from test import test_support
 from kitchen.pycompat24.base64 import _base64 as base64
@@ -183,6 +182,7 @@ class BaseXYTestCase(unittest.TestCase):
+#from test import test_support
 #def test_main():
 #    test_support.run_unittest(__name__)
 #
@@ -13,16 +13,16 @@ def test_strict_dict_get_set():
     d = collections.StrictDict()
     d[u'a'] = 1
     d['a'] = 2
-    tools.ok_(d[u'a'] != d['a'])
-    tools.ok_(len(d) == 2)
+    tools.assert_not_equal(d[u'a'], d['a'])
+    tools.eq_(len(d), 2)

     d[u'\xf1'] = 1
     d['\xf1'] = 2
-    d[u'\xf1'.encode('utf8')] = 3
-    tools.ok_(d[u'\xf1'] == 1)
-    tools.ok_(d['\xf1'] == 2)
-    tools.ok_(d[u'\xf1'.encode('utf8')] == 3)
-    tools.ok_(len(d) == 5)
+    d[u'\xf1'.encode('utf-8')] = 3
+    tools.eq_(d[u'\xf1'], 1)
+    tools.eq_(d['\xf1'], 2)
+    tools.eq_(d[u'\xf1'.encode('utf-8')], 3)
+    tools.eq_(len(d), 5)

 class TestStrictDict(unittest.TestCase):
     def setUp(self):
@@ -32,15 +32,14 @@ class TestStrictDict(unittest.TestCase):
         self.d[u'\xf1'] = 1
         self.d['\xf1'] = 2
         self.d[u'\xf1'.encode('utf8')] = 3
-        self.keys = [u'a', 'a', u'\xf1', '\xf1', u'\xf1'.encode('utf8')]
+        self.keys = [u'a', 'a', u'\xf1', '\xf1', u'\xf1'.encode('utf-8')]

     def tearDown(self):
         del(self.d)

     def _compare_lists(self, list1, list2, debug=False):
-        '''We have a mixture of bytes and unicode and need python2.3 compat
-
-        So we have to compare these lists manually and inefficiently
+        '''We have a mixture of bytes and unicode so we have to compare these
+        lists manually and inefficiently
         '''
         def _compare_lists_helper(compare_to, dupes, idx, length):
             if i not in compare_to:
@@ -57,11 +56,11 @@ class TestStrictDict(unittest.TestCase):
         list1_u = [l for l in list1 if isinstance(l, unicode)]
         list1_b = [l for l in list1 if isinstance(l, str)]
-        list1_o = [l for l in list1 if not (isinstance(l, unicode) or isinstance(l, str))]
+        list1_o = [l for l in list1 if not (isinstance(l, (unicode, bytes)))]

         list2_u = [l for l in list2 if isinstance(l, unicode)]
         list2_b = [l for l in list2 if isinstance(l, str)]
-        list2_o = [l for l in list2 if not (isinstance(l, unicode) or isinstance(l, str))]
+        list2_o = [l for l in list2 if not (isinstance(l, (unicode, bytes)))]

         for i in list1:
             if isinstance(i, unicode):
@@ -109,34 +108,38 @@ class TestStrictDict(unittest.TestCase):
     def test_strict_dict_len(self):
         '''StrictDict len'''
-        tools.ok_(len(self.d) == 5)
+        tools.eq_(len(self.d), 5)

     def test_strict_dict_del(self):
         '''StrictDict del'''
-        tools.ok_(len(self.d) == 5)
+        tools.eq_(len(self.d), 5)
         del(self.d[u'\xf1'])
         tools.assert_raises(KeyError, self.d.__getitem__, u'\xf1')
-        tools.ok_(len(self.d) == 4)
+        tools.eq_(len(self.d), 4)

     def test_strict_dict_iter(self):
         '''StrictDict iteration'''
         keys = []
         for k in self.d:
             keys.append(k)
-        tools.ok_(self._compare_lists(keys, self.keys))
+        tools.ok_(self._compare_lists(keys, self.keys),
+                msg='keys != self.key: %s != %s' % (keys, self.keys))

         keys = []
         for k in self.d.iterkeys():
             keys.append(k)
-        tools.ok_(self._compare_lists(keys, self.keys))
+        tools.ok_(self._compare_lists(keys, self.keys),
+                msg='keys != self.key: %s != %s' % (keys, self.keys))

         keys = [k for k in self.d]
-        tools.ok_(self._compare_lists(keys, self.keys))
+        tools.ok_(self._compare_lists(keys, self.keys),
+                msg='keys != self.key: %s != %s' % (keys, self.keys))

         keys = []
         for k in self.d.keys():
             keys.append(k)
-        tools.ok_(self._compare_lists(keys, self.keys))
+        tools.ok_(self._compare_lists(keys, self.keys),
+                msg='keys != self.key: %s != %s' % (keys, self.keys))

     def test_strict_dict_contains(self):
         '''StrictDict contains function'''
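The property StrictDict guards in these tests is that text and byte-string keys stay distinct entries. A Python 3 sketch of the same invariant with a plain dict (which already keeps them separate, since `str` and `bytes` never compare equal):

```python
# A text key and its utf-8 encoded byte form are two different keys.
d = {}
d['\xf1'] = 1                    # the text key 'ñ'
d['\xf1'.encode('utf-8')] = 2    # the byte key b'\xc3\xb1'
print(len(d))  # 2
```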
415
kitchen2/tests/test_converters.py
Normal file
@@ -0,0 +1,415 @@
# -*- coding: utf-8 -*-
#

import unittest
from nose import tools
from nose.plugins.skip import SkipTest

import sys
import StringIO
import warnings

try:
    import chardet
except:
    chardet = None

from kitchen.text import converters
from kitchen.text.exceptions import XmlEncodeError

import base_classes


class UnicodeNoStr(object):
    def __unicode__(self):
        return u'El veloz murciélago saltó sobre el perro perezoso.'


class StrNoUnicode(object):
    def __str__(self):
        return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')


class StrReturnsUnicode(object):
    def __str__(self):
        return u'El veloz murciélago saltó sobre el perro perezoso.'


class UnicodeReturnsStr(object):
    def __unicode__(self):
        return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')


class UnicodeStrCrossed(object):
    def __unicode__(self):
        return u'El veloz murciélago saltó sobre el perro perezoso.'.encode('utf8')

    def __str__(self):
        return u'El veloz murciélago saltó sobre el perro perezoso.'


class ReprUnicode(object):
    def __repr__(self):
        return u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'


class TestConverters(unittest.TestCase, base_classes.UnicodeTestData):
    def test_to_unicode(self):
        '''Test to_unicode when the user gives good values'''
        tools.eq_(converters.to_unicode(self.u_japanese, encoding='latin1'), self.u_japanese)

        tools.eq_(converters.to_unicode(self.utf8_spanish), self.u_spanish)
        tools.eq_(converters.to_unicode(self.utf8_japanese), self.u_japanese)

        tools.eq_(converters.to_unicode(self.latin1_spanish, encoding='latin1'), self.u_spanish)
        tools.eq_(converters.to_unicode(self.euc_jp_japanese, encoding='euc_jp'), self.u_japanese)

        tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'nonstring': 'foo'})

    def test_to_unicode_errors(self):
        tools.eq_(converters.to_unicode(self.latin1_spanish), self.u_mangled_spanish_latin1_as_utf8)
        tools.eq_(converters.to_unicode(self.latin1_spanish, errors='ignore'), self.u_spanish_ignore)
        tools.assert_raises(UnicodeDecodeError, converters.to_unicode,
                *[self.latin1_spanish], **{'errors': 'strict'})
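A sketch (stdlib only, not kitchen itself) of where "mangled" comparison values like `u_mangled_spanish_latin1_as_utf8` come from: latin-1 bytes decoded as utf-8 with `errors='replace'` turn each undecodable byte into U+FFFD instead of raising.

```python
# The latin-1 byte 0xe9 ('é') is not valid utf-8 here, so decoding
# with errors='replace' yields a replacement character.
latin1_bytes = 'murciélago'.encode('latin-1')
mangled = latin1_bytes.decode('utf-8', 'replace')
print(mangled)  # 'murci\ufffdlago'
```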
    def test_to_unicode_nonstring(self):
        tools.eq_(converters.to_unicode(5), u'5')
        tools.eq_(converters.to_unicode(5, nonstring='empty'), u'')
        tools.eq_(converters.to_unicode(5, nonstring='passthru'), 5)
        tools.eq_(converters.to_unicode(5, nonstring='simplerepr'), u'5')
        tools.eq_(converters.to_unicode(5, nonstring='repr'), u'5')
        tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'nonstring': 'strict'})

        obj_repr = converters.to_unicode(object, nonstring='simplerepr')
        tools.eq_(obj_repr, u"<type 'object'>")
        tools.assert_true(isinstance(obj_repr, unicode))

    def test_to_unicode_nonstring_with_objects_that_have__unicode__and__str__(self):
        '''Test that to_unicode handles objects that have __unicode__ and __str__ methods'''
        if sys.version_info < (3, 0):
            # None of these apply on python3 because python3 does not use __unicode__
            # and it enforces __str__ returning str
            tools.eq_(converters.to_unicode(UnicodeNoStr(), nonstring='simplerepr'), self.u_spanish)
            tools.eq_(converters.to_unicode(StrNoUnicode(), nonstring='simplerepr'), self.u_spanish)
            tools.eq_(converters.to_unicode(UnicodeReturnsStr(), nonstring='simplerepr'), self.u_spanish)

        tools.eq_(converters.to_unicode(StrReturnsUnicode(), nonstring='simplerepr'), self.u_spanish)
        tools.eq_(converters.to_unicode(UnicodeStrCrossed(), nonstring='simplerepr'), self.u_spanish)
    def test_to_bytes(self):
        '''Test to_bytes when the user gives good values'''
        tools.eq_(converters.to_bytes(self.utf8_japanese, encoding='latin1'), self.utf8_japanese)

        tools.eq_(converters.to_bytes(self.u_spanish), self.utf8_spanish)
        tools.eq_(converters.to_bytes(self.u_japanese), self.utf8_japanese)

        tools.eq_(converters.to_bytes(self.u_spanish, encoding='latin1'), self.latin1_spanish)
        tools.eq_(converters.to_bytes(self.u_japanese, encoding='euc_jp'), self.euc_jp_japanese)

    def test_to_bytes_errors(self):
        tools.eq_(converters.to_bytes(self.u_mixed, encoding='latin1'),
                self.latin1_mixed_replace)
        tools.eq_(converters.to_bytes(self.u_mixed, encoding='latin',
                errors='ignore'), self.latin1_mixed_ignore)
        tools.assert_raises(UnicodeEncodeError, converters.to_bytes,
                *[self.u_mixed], **{'errors': 'strict', 'encoding': 'latin1'})

    def _check_repr_bytes(self, repr_string, obj_name):
        tools.assert_true(isinstance(repr_string, str))
        match = self.repr_re.match(repr_string)
        tools.assert_not_equal(match, None)
        tools.eq_(match.groups()[0], obj_name)

    def test_to_bytes_nonstring(self):
        tools.eq_(converters.to_bytes(5), '5')
        tools.eq_(converters.to_bytes(5, nonstring='empty'), '')
        tools.eq_(converters.to_bytes(5, nonstring='passthru'), 5)
        tools.eq_(converters.to_bytes(5, nonstring='simplerepr'), '5')
        tools.eq_(converters.to_bytes(5, nonstring='repr'), '5')

        # Raise a TypeError if the msg is nonstring and we're set to strict
        tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'nonstring': 'strict'})
        # Raise a TypeError if given an invalid nonstring arg
        tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'nonstring': 'INVALID'})

        obj_repr = converters.to_bytes(object, nonstring='simplerepr')
        tools.eq_(obj_repr, "<type 'object'>")
        tools.assert_true(isinstance(obj_repr, str))

    def test_to_bytes_nonstring_with_objects_that_have__unicode__and__str__(self):
        if sys.version_info < (3, 0):
            # This object's __str__ returns a utf8 encoded object
            tools.eq_(converters.to_bytes(StrNoUnicode(), nonstring='simplerepr'), self.utf8_spanish)
            # No __str__ method so this returns repr
            string = converters.to_bytes(UnicodeNoStr(), nonstring='simplerepr')
            self._check_repr_bytes(string, 'UnicodeNoStr')

        # This object's __str__ returns unicode which to_bytes converts to utf8
        tools.eq_(converters.to_bytes(StrReturnsUnicode(), nonstring='simplerepr'), self.utf8_spanish)
        # Unless we explicitly ask for something different
        tools.eq_(converters.to_bytes(StrReturnsUnicode(),
                nonstring='simplerepr', encoding='latin1'), self.latin1_spanish)

        # This object has no __str__ so it returns repr
        string = converters.to_bytes(UnicodeReturnsStr(), nonstring='simplerepr')
        self._check_repr_bytes(string, 'UnicodeReturnsStr')

        # This object's __str__ returns unicode which to_bytes converts to utf8
        tools.eq_(converters.to_bytes(UnicodeStrCrossed(), nonstring='simplerepr'), self.utf8_spanish)

        # This object's __repr__ returns unicode which to_bytes converts to utf8
        tools.eq_(converters.to_bytes(ReprUnicode(), nonstring='simplerepr'),
                u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
        tools.eq_(converters.to_bytes(ReprUnicode(), nonstring='repr'),
                u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))

    def test_unicode_to_xml(self):
        tools.eq_(converters.unicode_to_xml(None), '')
        tools.assert_raises(XmlEncodeError, converters.unicode_to_xml, *['byte string'])
        tools.assert_raises(ValueError, converters.unicode_to_xml, *[u'string'], **{'control_chars': 'foo'})
        tools.assert_raises(XmlEncodeError, converters.unicode_to_xml,
                *[u'string\u0002'], **{'control_chars': 'strict'})
        tools.eq_(converters.unicode_to_xml(self.u_entity), self.utf8_entity_escape)
        tools.eq_(converters.unicode_to_xml(self.u_entity, attrib=True), self.utf8_attrib_escape)
        tools.eq_(converters.unicode_to_xml(self.u_entity, encoding='ascii'), self.ascii_entity_escape)
        tools.eq_(converters.unicode_to_xml(self.u_entity, encoding='ascii', attrib=True), self.ascii_attrib_escape)

    def test_xml_to_unicode(self):
        tools.eq_(converters.xml_to_unicode(self.utf8_entity_escape, 'utf8', 'replace'), self.u_entity)
        tools.eq_(converters.xml_to_unicode(self.utf8_attrib_escape, 'utf8', 'replace'), self.u_entity)
        tools.eq_(converters.xml_to_unicode(self.ascii_entity_escape, 'ascii', 'replace'), self.u_entity)
        tools.eq_(converters.xml_to_unicode(self.ascii_attrib_escape, 'ascii', 'replace'), self.u_entity)

    def test_xml_to_byte_string(self):
        tools.eq_(converters.xml_to_byte_string(self.utf8_entity_escape, 'utf8', 'replace'), self.u_entity.encode('utf8'))
        tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape, 'utf8', 'replace'), self.u_entity.encode('utf8'))
        tools.eq_(converters.xml_to_byte_string(self.ascii_entity_escape, 'ascii', 'replace'), self.u_entity.encode('utf8'))
        tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape, 'ascii', 'replace'), self.u_entity.encode('utf8'))

        tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape,
                output_encoding='euc_jp', errors='replace'),
                self.u_entity.encode('euc_jp', 'replace'))
        tools.eq_(converters.xml_to_byte_string(self.utf8_attrib_escape,
                output_encoding='latin1', errors='replace'),
                self.u_entity.encode('latin1', 'replace'))
        tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape,
                output_encoding='euc_jp', errors='replace'),
                self.u_entity.encode('euc_jp', 'replace'))
        tools.eq_(converters.xml_to_byte_string(self.ascii_attrib_escape,
                output_encoding='latin1', errors='replace'),
                self.u_entity.encode('latin1', 'replace'))

    def test_byte_string_to_xml(self):
        tools.assert_raises(XmlEncodeError, converters.byte_string_to_xml, *[u'test'])
        tools.eq_(converters.byte_string_to_xml(self.utf8_entity), self.utf8_entity_escape)
        tools.eq_(converters.byte_string_to_xml(self.utf8_entity, attrib=True), self.utf8_attrib_escape)

    def test_bytes_to_xml(self):
        tools.eq_(converters.bytes_to_xml(self.b_byte_chars), self.b_byte_encoded)

    def test_xml_to_bytes(self):
        tools.eq_(converters.xml_to_bytes(self.b_byte_encoded), self.b_byte_chars)
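The `bytes_to_xml`/`xml_to_bytes` round-trip being tested rests on base64: arbitrary binary data is encoded into a pure-ASCII form that can ride safely inside XML text. A simplified stdlib sketch of that transport (the test data above uses a space-joined variant of this payload):

```python
# Round-trip every possible byte value through base64.
import base64

payload = bytes(range(256))
encoded = base64.b64encode(payload)   # ASCII-only, XML-safe
decoded = base64.b64decode(encoded)
assert decoded == payload
```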
    def test_guess_encoding_to_xml(self):
        tools.eq_(converters.guess_encoding_to_xml(self.u_entity), self.utf8_entity_escape)
        tools.eq_(converters.guess_encoding_to_xml(self.utf8_spanish), self.utf8_spanish)
        tools.eq_(converters.guess_encoding_to_xml(self.latin1_spanish), self.utf8_spanish)
        tools.eq_(converters.guess_encoding_to_xml(self.utf8_japanese), self.utf8_japanese)

    def test_guess_encoding_to_xml_euc_japanese(self):
        if chardet:
            tools.eq_(converters.guess_encoding_to_xml(self.euc_jp_japanese),
                    self.utf8_japanese)
        else:
            raise SkipTest('chardet not installed, euc_japanese won\'t be detected')

    def test_guess_encoding_to_xml_euc_japanese_mangled(self):
        if chardet:
            raise SkipTest('chardet installed, euc_japanese won\'t be mangled')
        else:
            tools.eq_(converters.guess_encoding_to_xml(self.euc_jp_japanese),
                    self.utf8_mangled_euc_jp_as_latin1)

class TestGetWriter(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.io = StringIO.StringIO()

    def test_utf8_writer(self):
        writer = converters.getwriter('utf-8')
        io = writer(self.io)
        io.write(self.u_japanese + u'\n')
        io.seek(0)
        result = io.read().strip()
        tools.eq_(result, self.utf8_japanese)

        io.seek(0)
        io.truncate(0)
        io.write(self.euc_jp_japanese + '\n')
        io.seek(0)
        result = io.read().strip()
        tools.eq_(result, self.euc_jp_japanese)

        io.seek(0)
        io.truncate(0)
        io.write(self.utf8_japanese + '\n')
        io.seek(0)
        result = io.read().strip()
        tools.eq_(result, self.utf8_japanese)
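A stdlib analogue of the writer under test: `codecs.getwriter` wraps a byte stream so that text written to it is encoded on the way through (kitchen's `converters.getwriter` additionally lets already-encoded byte strings pass through untouched, which is what the second and third writes above exercise).

```python
# Wrap a byte stream with a utf-8 encoding writer.
import codecs
import io

stream = io.BytesIO()
writer = codecs.getwriter('utf-8')(stream)
writer.write('日本語\n')
print(stream.getvalue())  # b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\n'
```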
    def test_error_handlers(self):
        '''Test setting alternate error handlers'''
        writer = converters.getwriter('latin1')
        io = writer(self.io, errors='strict')
        tools.assert_raises(UnicodeEncodeError, io.write, self.u_japanese)


class TestExceptionConverters(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.exceptions = {}
        tests = {'u_jpn': self.u_japanese,
                'u_spanish': self.u_spanish,
                'utf8_jpn': self.utf8_japanese,
                'utf8_spanish': self.utf8_spanish,
                'euc_jpn': self.euc_jp_japanese,
                'latin1_spanish': self.latin1_spanish}
        for test in tests.iteritems():
            try:
                raise Exception(test[1])
            except Exception, self.exceptions[test[0]]:
                pass

    def test_exception_to_unicode_with_unicode(self):
        tools.eq_(converters.exception_to_unicode(self.exceptions['u_jpn']), self.u_japanese)
        tools.eq_(converters.exception_to_unicode(self.exceptions['u_spanish']), self.u_spanish)

    def test_exception_to_unicode_with_bytes(self):
        tools.eq_(converters.exception_to_unicode(self.exceptions['utf8_jpn']), self.u_japanese)
        tools.eq_(converters.exception_to_unicode(self.exceptions['utf8_spanish']), self.u_spanish)
        # Mangled latin1/utf8 conversion but no tracebacks
        tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish']), self.u_mangled_spanish_latin1_as_utf8)
        # Mangled euc_jp/utf8 conversion but no tracebacks
        tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn']), self.u_mangled_euc_jp_as_utf8)

    def test_exception_to_unicode_custom(self):
        # If given custom functions, then we should not mangle
        c = [lambda e: converters.to_unicode(e.args[0], encoding='euc_jp'),
                lambda e: converters.to_unicode(e, encoding='euc_jp')]
        tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn'],
                converters=c), self.u_japanese)
        c.extend(converters.EXCEPTION_CONVERTERS)
        tools.eq_(converters.exception_to_unicode(self.exceptions['euc_jpn'],
                converters=c), self.u_japanese)

        c = [lambda e: converters.to_unicode(e.args[0], encoding='latin1'),
                lambda e: converters.to_unicode(e, encoding='latin1')]
        tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish'],
                converters=c), self.u_spanish)
        c.extend(converters.EXCEPTION_CONVERTERS)
        tools.eq_(converters.exception_to_unicode(self.exceptions['latin1_spanish'],
                converters=c), self.u_spanish)

    def test_exception_to_bytes_with_unicode(self):
        tools.eq_(converters.exception_to_bytes(self.exceptions['u_jpn']), self.utf8_japanese)
        tools.eq_(converters.exception_to_bytes(self.exceptions['u_spanish']), self.utf8_spanish)

    def test_exception_to_bytes_with_bytes(self):
        tools.eq_(converters.exception_to_bytes(self.exceptions['utf8_jpn']), self.utf8_japanese)
        tools.eq_(converters.exception_to_bytes(self.exceptions['utf8_spanish']), self.utf8_spanish)
        tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish']), self.latin1_spanish)
        tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn']), self.euc_jp_japanese)

    def test_exception_to_bytes_custom(self):
        # If given custom functions, then we should not mangle
        c = [lambda e: converters.to_bytes(e.args[0], encoding='euc_jp'),
                lambda e: converters.to_bytes(e, encoding='euc_jp')]
        tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn'],
                converters=c), self.euc_jp_japanese)
        c.extend(converters.EXCEPTION_CONVERTERS)
        tools.eq_(converters.exception_to_bytes(self.exceptions['euc_jpn'],
                converters=c), self.euc_jp_japanese)

        c = [lambda e: converters.to_bytes(e.args[0], encoding='latin1'),
                lambda e: converters.to_bytes(e, encoding='latin1')]
        tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish'],
                converters=c), self.latin1_spanish)
        c.extend(converters.EXCEPTION_CONVERTERS)
        tools.eq_(converters.exception_to_bytes(self.exceptions['latin1_spanish'],
                converters=c), self.latin1_spanish)


class TestDeprecatedConverters(TestConverters):
    def setUp(self):
        warnings.simplefilter('ignore', DeprecationWarning)

    def tearDown(self):
        warnings.simplefilter('default', DeprecationWarning)

    def test_to_xml(self):
        tools.eq_(converters.to_xml(self.u_entity), self.utf8_entity_escape)
        tools.eq_(converters.to_xml(self.utf8_spanish), self.utf8_spanish)
        tools.eq_(converters.to_xml(self.latin1_spanish), self.utf8_spanish)
        tools.eq_(converters.to_xml(self.utf8_japanese), self.utf8_japanese)

    def test_to_utf8(self):
        tools.eq_(converters.to_utf8(self.u_japanese), self.utf8_japanese)
        tools.eq_(converters.to_utf8(self.utf8_spanish), self.utf8_spanish)

    def test_to_str(self):
        tools.eq_(converters.to_str(self.u_japanese), self.utf8_japanese)
        tools.eq_(converters.to_str(self.utf8_spanish), self.utf8_spanish)
        tools.eq_(converters.to_str(object), "<type 'object'>")

    def test_non_string(self):
        '''Test deprecated non_string parameter'''
        # unicode
        tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'non_string': 'foo'})
        tools.eq_(converters.to_unicode(5, non_string='empty'), u'')
        tools.eq_(converters.to_unicode(5, non_string='passthru'), 5)
        tools.eq_(converters.to_unicode(5, non_string='simplerepr'), u'5')
        tools.eq_(converters.to_unicode(5, non_string='repr'), u'5')
        tools.assert_raises(TypeError, converters.to_unicode, *[5], **{'non_string': 'strict'})

        tools.eq_(converters.to_unicode(UnicodeNoStr(), non_string='simplerepr'), self.u_spanish)
        tools.eq_(converters.to_unicode(StrNoUnicode(), non_string='simplerepr'), self.u_spanish)
        tools.eq_(converters.to_unicode(StrReturnsUnicode(), non_string='simplerepr'), self.u_spanish)
        tools.eq_(converters.to_unicode(UnicodeReturnsStr(), non_string='simplerepr'), self.u_spanish)
        tools.eq_(converters.to_unicode(UnicodeStrCrossed(), non_string='simplerepr'), self.u_spanish)

        obj_repr = converters.to_unicode(object, non_string='simplerepr')
        tools.eq_(obj_repr, u"<type 'object'>")
        tools.assert_true(isinstance(obj_repr, unicode))

        # Bytes
        tools.eq_(converters.to_bytes(5), '5')
        tools.eq_(converters.to_bytes(5, non_string='empty'), '')
        tools.eq_(converters.to_bytes(5, non_string='passthru'), 5)
        tools.eq_(converters.to_bytes(5, non_string='simplerepr'), '5')
        tools.eq_(converters.to_bytes(5, non_string='repr'), '5')

        # Raise a TypeError if the msg is non_string and we're set to strict
        tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'non_string': 'strict'})
        # Raise a TypeError if given an invalid non_string arg
        tools.assert_raises(TypeError, converters.to_bytes, *[5], **{'non_string': 'INVALID'})

        # No __str__ method so this returns repr
        string = converters.to_bytes(UnicodeNoStr(), non_string='simplerepr')
        self._check_repr_bytes(string, 'UnicodeNoStr')

        # This object's __str__ returns a utf8 encoded object
        tools.eq_(converters.to_bytes(StrNoUnicode(), non_string='simplerepr'), self.utf8_spanish)

        # This object's __str__ returns unicode which to_bytes converts to utf8
        tools.eq_(converters.to_bytes(StrReturnsUnicode(), non_string='simplerepr'), self.utf8_spanish)
        # Unless we explicitly ask for something different
        tools.eq_(converters.to_bytes(StrReturnsUnicode(),
                non_string='simplerepr', encoding='latin1'), self.latin1_spanish)

        # This object has no __str__ so it returns repr
        string = converters.to_bytes(UnicodeReturnsStr(), non_string='simplerepr')
        self._check_repr_bytes(string, 'UnicodeReturnsStr')

        # This object's __str__ returns unicode which to_bytes converts to utf8
        tools.eq_(converters.to_bytes(UnicodeStrCrossed(), non_string='simplerepr'), self.utf8_spanish)

        # This object's __repr__ returns unicode which to_bytes converts to utf8
        tools.eq_(converters.to_bytes(ReprUnicode(), non_string='simplerepr'),
                u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))
        tools.eq_(converters.to_bytes(ReprUnicode(), non_string='repr'),
                u'ReprUnicode(El veloz murciélago saltó sobre el perro perezoso.)'.encode('utf8'))

        obj_repr = converters.to_bytes(object, non_string='simplerepr')
        tools.eq_(obj_repr, "<type 'object'>")
        tools.assert_true(isinstance(obj_repr, str))
@@ -4,7 +4,6 @@ import os
 import copy
 import tempfile
 import unittest
-from test import test_support

 from kitchen.pycompat25.collections._defaultdict import defaultdict
@@ -173,6 +172,7 @@ class TestDefaultDict(unittest.TestCase):
        os.remove(tfn)


+#from test import test_support
 #def test_main():
 #    test_support.run_unittest(TestDefaultDict)
 #
@@ -5,21 +5,19 @@ from nose import tools

 import sys
 import warnings
+from kitchen import i18n
 from kitchen.text import converters
 from kitchen.text import utf8

 class TestDeprecated(unittest.TestCase):
     def setUp(self):
-        registry = sys._getframe(2).f_globals.get('__warningregistry__')
-        if registry:
-            registry.clear()
-        registry = sys._getframe(1).f_globals.get('__warningregistry__')
-        if registry:
-            registry.clear()
+        for module in sys.modules.values():
+            if hasattr(module, '__warningregistry__'):
+                del module.__warningregistry__
         warnings.simplefilter('error', DeprecationWarning)

     def tearDown(self):
-        warnings.simplefilter('default', DeprecationWarning)
+        warnings.simplefilter('ignore', DeprecationWarning)

     def test_deprecated_functions(self):
         '''Test that all deprecated functions raise DeprecationWarning'''
@@ -45,3 +43,23 @@ class TestDeprecated(unittest.TestCase):
             **{'non_string': 'simplerepr'})
         tools.assert_raises(DeprecationWarning, converters.to_bytes, *[5],
             **{'nonstring': 'simplerepr', 'non_string': 'simplerepr'})
+
+
+class TestPendingDeprecationParameters(unittest.TestCase):
+    def setUp(self):
+        for module in sys.modules.values():
+            if hasattr(module, '__warningregistry__'):
+                del module.__warningregistry__
+        warnings.simplefilter('error', PendingDeprecationWarning)
+
+    def tearDown(self):
+        warnings.simplefilter('ignore', PendingDeprecationWarning)
+
+    def test_parameters(self):
+        # test that we warn when using the python2_api parameters
+        tools.assert_raises(PendingDeprecationWarning,
+            i18n.get_translation_object, 'test', **{'python2_api': True})
+        tools.assert_raises(PendingDeprecationWarning,
+            i18n.DummyTranslations, **{'python2_api': True})
820
kitchen2/tests/test_i18n.py
Normal file
@@ -0,0 +1,820 @@
# -*- coding: utf-8 -*-
#
import unittest
from nose import tools

import os
import types

from kitchen import i18n

import base_classes

class TestI18N_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.utf8'

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

    def test_easy_gettext_setup(self):
        '''Test that the easy_gettext_setup function works
        '''
        _, N_ = i18n.easy_gettext_setup('foo', localedirs=
                ['%s/data/locale/' % os.path.dirname(__file__)])
        tools.assert_true(isinstance(_, types.MethodType))
        tools.assert_true(isinstance(N_, types.MethodType))
        tools.eq_(_.__name__, '_ugettext')
        tools.eq_(N_.__name__, '_ungettext')

        tools.eq_(_(self.utf8_spanish), self.u_spanish)
        tools.eq_(_(self.u_spanish), self.u_spanish)
        tools.eq_(N_(self.utf8_limao, self.utf8_limoes, 1), self.u_limao)
        tools.eq_(N_(self.utf8_limao, self.utf8_limoes, 2), self.u_limoes)
        tools.eq_(N_(self.u_limao, self.u_limoes, 1), self.u_limao)
        tools.eq_(N_(self.u_limao, self.u_limoes, 2), self.u_limoes)

    def test_easy_gettext_setup_non_unicode(self):
        '''Test that the easy_gettext_setup function works
        '''
        b_, bN_ = i18n.easy_gettext_setup('foo', localedirs=
                ['%s/data/locale/' % os.path.dirname(__file__)],
                use_unicode=False)
        tools.assert_true(isinstance(b_, types.MethodType))
        tools.assert_true(isinstance(bN_, types.MethodType))
        tools.eq_(b_.__name__, '_lgettext')
        tools.eq_(bN_.__name__, '_lngettext')

        tools.eq_(b_(self.utf8_spanish), self.utf8_spanish)
        tools.eq_(b_(self.u_spanish), self.utf8_spanish)
        tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_limao)
        tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_limoes)
        tools.eq_(bN_(self.u_limao, self.u_limoes, 1), self.utf8_limao)
        tools.eq_(bN_(self.u_limao, self.u_limoes, 2), self.utf8_limoes)
    def test_get_translation_object(self):
        '''Test that the get_translation_object function works
        '''
        translations = i18n.get_translation_object('foo', ['%s/data/locale/' % os.path.dirname(__file__)])
        tools.eq_(translations.__class__, i18n.DummyTranslations)
        tools.assert_raises(IOError, i18n.get_translation_object, 'foo', ['%s/data/locale/' % os.path.dirname(__file__)], fallback=False)

        translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
        tools.eq_(translations.__class__, i18n.NewGNUTranslations)

    def test_get_translation_object_create_fallback(self):
        '''Test get_translation_object creates fallbacks for additional catalogs'''
        translations = i18n.get_translation_object('test',
                ['%s/data/locale' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)])
        tools.eq_(translations.__class__, i18n.NewGNUTranslations)
        tools.eq_(translations._fallback.__class__, i18n.NewGNUTranslations)

    def test_get_translation_object_copy(self):
        '''Test get_translation_object shallow copies the message catalog'''
        translations = i18n.get_translation_object('test',
                ['%s/data/locale' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8')
        translations.input_charset = 'utf-8'
        translations2 = i18n.get_translation_object('test',
                ['%s/data/locale' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)], codeset='latin-1')
        translations2.input_charset = 'latin-1'

        # Test that portions of the translation objects are the same and other
        # portions are different (which is a space optimization so that the
        # translation data isn't in memory multiple times)
        tools.assert_not_equal(id(translations._fallback), id(translations2._fallback))
        tools.assert_not_equal(id(translations.output_charset()), id(translations2.output_charset()))
        tools.assert_not_equal(id(translations.input_charset), id(translations2.input_charset))
        tools.assert_not_equal(id(translations.input_charset), id(translations2.input_charset))
        tools.eq_(id(translations._catalog), id(translations2._catalog))

    def test_get_translation_object_optional_params(self):
        '''Smoketest leaving out optional parameters'''
        translations = i18n.get_translation_object('test')
        tools.assert_true(translations.__class__ in (i18n.NewGNUTranslations, i18n.DummyTranslations))

    def test_get_translation_object_python2_api_default(self):
        '''Smoketest that python2_api default value yields the python2 functions'''
        # Default
        translations = i18n.get_translation_object('test',
                ['%s/data/locale' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8')
        translations.input_charset = 'utf-8'
        tools.eq_(translations.gettext.__name__, '_gettext')
        tools.eq_(translations.lgettext.__name__, '_lgettext')
        tools.eq_(translations.ugettext.__name__, '_ugettext')
        tools.eq_(translations.ngettext.__name__, '_ngettext')
        tools.eq_(translations.lngettext.__name__, '_lngettext')
        tools.eq_(translations.ungettext.__name__, '_ungettext')

    def test_get_translation_object_python2_api_true(self):
        '''Smoketest that setting python2_api true yields the python2 functions'''
        # Default
        translations = i18n.get_translation_object('test',
                ['%s/data/locale' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8',
                python2_api=True)
        translations.input_charset = 'utf-8'
        tools.eq_(translations.gettext.__name__, '_gettext')
        tools.eq_(translations.lgettext.__name__, '_lgettext')
        tools.eq_(translations.ugettext.__name__, '_ugettext')
        tools.eq_(translations.ngettext.__name__, '_ngettext')
        tools.eq_(translations.lngettext.__name__, '_lngettext')
        tools.eq_(translations.ungettext.__name__, '_ungettext')

    def test_get_translation_object_python2_api_false(self):
        '''Smoketest that setting python2_api false yields the python3 functions'''
        # Default
        translations = i18n.get_translation_object('test',
                ['%s/data/locale' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)], codeset='utf-8',
                python2_api=False)
        translations.input_charset = 'utf-8'
        tools.eq_(translations.gettext.__name__, '_ugettext')
        tools.eq_(translations.lgettext.__name__, '_lgettext')
        tools.eq_(translations.ngettext.__name__, '_ungettext')
        tools.eq_(translations.lngettext.__name__, '_lngettext')

        tools.assert_raises(AttributeError, translations.ugettext, 'message')
        tools.assert_raises(AttributeError, translations.ungettext, 'message1', 'message2')

    def test_dummy_translation(self):
        '''Test that we can create a DummyTranslation object
        '''
        tools.assert_true(isinstance(i18n.DummyTranslations(), i18n.DummyTranslations))
# Note: Using nose's generator tests for this so we can't subclass
# unittest.TestCase
class TestDummyTranslations(base_classes.UnicodeTestData):
    def __init__(self):
        self.test_data = {'bytes': (( # First set is with default charset (utf8)
                (self.u_ascii, self.b_ascii),
                (self.u_spanish, self.utf8_spanish),
                (self.u_japanese, self.utf8_japanese),
                (self.b_ascii, self.b_ascii),
                (self.utf8_spanish, self.utf8_spanish),
                (self.latin1_spanish, self.utf8_mangled_spanish_latin1_as_utf8),
                (self.utf8_japanese, self.utf8_japanese),
                ),
                ( # Second set is with output_charset of latin1 (ISO-8859-1)
                (self.u_ascii, self.b_ascii),
                (self.u_spanish, self.latin1_spanish),
                (self.u_japanese, self.latin1_mangled_japanese_replace_as_latin1),
                (self.b_ascii, self.b_ascii),
                (self.utf8_spanish, self.utf8_spanish),
                (self.latin1_spanish, self.latin1_spanish),
                (self.utf8_japanese, self.utf8_japanese),
                ),
                ( # Third set is with output_charset of C
                (self.u_ascii, self.b_ascii),
                (self.u_spanish, self.ascii_mangled_spanish_as_ascii),
                (self.u_japanese, self.ascii_mangled_japanese_replace_as_latin1),
                (self.b_ascii, self.b_ascii),
                (self.utf8_spanish, self.ascii_mangled_spanish_as_ascii),
                (self.latin1_spanish, self.ascii_twice_mangled_spanish_latin1_as_utf8_as_ascii),
                (self.utf8_japanese, self.ascii_mangled_japanese_replace_as_latin1),
                ),
                ),
                'unicode': (( # First set is with the default charset (utf8)
                (self.u_ascii, self.u_ascii),
                (self.u_spanish, self.u_spanish),
                (self.u_japanese, self.u_japanese),
                (self.b_ascii, self.u_ascii),
                (self.utf8_spanish, self.u_spanish),
                (self.latin1_spanish, self.u_mangled_spanish_latin1_as_utf8), # String is mangled but no exception
                (self.utf8_japanese, self.u_japanese),
                ),
                ( # Second set is with _charset of latin1 (ISO-8859-1)
                (self.u_ascii, self.u_ascii),
                (self.u_spanish, self.u_spanish),
                (self.u_japanese, self.u_japanese),
                (self.b_ascii, self.u_ascii),
                (self.utf8_spanish, self.u_mangled_spanish_utf8_as_latin1), # String mangled but no exception
                (self.latin1_spanish, self.u_spanish),
                (self.utf8_japanese, self.u_mangled_japanese_utf8_as_latin1), # String mangled but no exception
                ),
                ( # Third set is with _charset of C
                (self.u_ascii, self.u_ascii),
                (self.u_spanish, self.u_spanish),
                (self.u_japanese, self.u_japanese),
                (self.b_ascii, self.u_ascii),
                (self.utf8_spanish, self.u_mangled_spanish_utf8_as_ascii), # String mangled but no exception
                (self.latin1_spanish, self.u_mangled_spanish_latin1_as_ascii), # String mangled but no exception
                (self.utf8_japanese, self.u_mangled_japanese_utf8_as_ascii), # String mangled but no exception
                ),
                )
        }
    def setUp(self):
        self.translations = i18n.DummyTranslations()

    def check_gettext(self, message, value, charset=None):
        self.translations.set_output_charset(charset)
        tools.eq_(self.translations.gettext(message), value,
                msg='gettext(%s): trans: %s != val: %s (charset=%s)'
                % (repr(message), repr(self.translations.gettext(message)),
                    repr(value), charset))

    def check_lgettext(self, message, value, charset=None,
            locale='en_US.UTF-8'):
        os.environ['LC_ALL'] = locale
        self.translations.set_output_charset(charset)
        tools.eq_(self.translations.lgettext(message), value,
                msg='lgettext(%s): trans: %s != val: %s (charset=%s, locale=%s)'
                % (repr(message), repr(self.translations.lgettext(message)),
                    repr(value), charset, locale))

    # Note: charset has a default value because nose isn't invoking setUp and
    # tearDown each time check_* is run.
    def check_ugettext(self, message, value, charset='utf-8'):
        '''ugettext method with default values'''
        self.translations.input_charset = charset
        tools.eq_(self.translations.ugettext(message), value,
                msg='ugettext(%s): trans: %s != val: %s (charset=%s)'
                % (repr(message), repr(self.translations.ugettext(message)),
                    repr(value), charset))

    def check_ngettext(self, message, value, charset=None):
        self.translations.set_output_charset(charset)
        tools.eq_(self.translations.ngettext(message, 'blank', 1), value)
        tools.eq_(self.translations.ngettext('blank', message, 2), value)
        tools.assert_not_equal(self.translations.ngettext(message, 'blank', 2), value)
        tools.assert_not_equal(self.translations.ngettext('blank', message, 1), value)

    def check_lngettext(self, message, value, charset=None, locale='en_US.UTF-8'):
        os.environ['LC_ALL'] = locale
        self.translations.set_output_charset(charset)
        tools.eq_(self.translations.lngettext(message, 'blank', 1), value,
                msg='lngettext(%s, "blank", 1): trans: %s != val: %s (charset=%s, locale=%s)'
                % (repr(message), repr(self.translations.lngettext(message,
                    'blank', 1)), repr(value), charset, locale))
        tools.eq_(self.translations.lngettext('blank', message, 2), value,
                msg='lngettext("blank", %s, 2): trans: %s != val: %s (charset=%s, locale=%s)'
                % (repr(message), repr(self.translations.lngettext('blank',
                    message, 2)), repr(value), charset, locale))
        tools.assert_not_equal(self.translations.lngettext(message, 'blank', 2), value,
                msg='lngettext(%s, "blank", 2): trans: %s, val: %s (charset=%s, locale=%s)'
                % (repr(message), repr(self.translations.lngettext(message,
                    'blank', 2)), repr(value), charset, locale))
        tools.assert_not_equal(self.translations.lngettext('blank', message, 1), value,
                msg='lngettext("blank", %s, 1): trans: %s != val: %s (charset=%s, locale=%s)'
                % (repr(message), repr(self.translations.lngettext('blank',
                    message, 1)), repr(value), charset, locale))

    # Note: charset has a default value because nose isn't invoking setUp and
    # tearDown each time check_* is run.
    def check_ungettext(self, message, value, charset='utf-8'):
        self.translations.input_charset = charset
        tools.eq_(self.translations.ungettext(message, 'blank', 1), value)
        tools.eq_(self.translations.ungettext('blank', message, 2), value)
        tools.assert_not_equal(self.translations.ungettext(message, 'blank', 2), value)
        tools.assert_not_equal(self.translations.ungettext('blank', message, 1), value)
    def test_gettext(self):
        '''gettext method with default values'''
        for message, value in self.test_data['bytes'][0]:
            yield self.check_gettext, message, value

    def test_gettext_output_charset(self):
        '''gettext method after output_charset is set'''
        for message, value in self.test_data['bytes'][1]:
            yield self.check_gettext, message, value, 'latin1'

    def test_ngettext(self):
        for message, value in self.test_data['bytes'][0]:
            yield self.check_ngettext, message, value

    def test_ngettext_output_charset(self):
        for message, value in self.test_data['bytes'][1]:
            yield self.check_ngettext, message, value, 'latin1'

    def test_lgettext(self):
        '''lgettext method with default values on a utf8 locale'''
        for message, value in self.test_data['bytes'][0]:
            yield self.check_lgettext, message, value

    def test_lgettext_output_charset(self):
        '''lgettext method after output_charset is set'''
        for message, value in self.test_data['bytes'][1]:
            yield self.check_lgettext, message, value, 'latin1'

    def test_lgettext_output_charset_and_locale(self):
        '''lgettext method after output_charset is set in C locale

        output_charset should take precedence
        '''
        for message, value in self.test_data['bytes'][1]:
            yield self.check_lgettext, message, value, 'latin1', 'C'

    def test_lgettext_locale_C(self):
        '''lgettext method in a C locale'''
        for message, value in self.test_data['bytes'][2]:
            yield self.check_lgettext, message, value, None, 'C'

    def test_lngettext(self):
        '''lngettext method with default values on a utf8 locale'''
        for message, value in self.test_data['bytes'][0]:
            yield self.check_lngettext, message, value

    def test_lngettext_output_charset(self):
        '''lngettext method after output_charset is set'''
        for message, value in self.test_data['bytes'][1]:
            yield self.check_lngettext, message, value, 'latin1'

    def test_lngettext_output_charset_and_locale(self):
        '''lngettext method after output_charset is set in C locale

        output_charset should take precedence
        '''
        for message, value in self.test_data['bytes'][1]:
            yield self.check_lngettext, message, value, 'latin1', 'C'

    def test_lngettext_locale_C(self):
        '''lngettext method in a C locale'''
        for message, value in self.test_data['bytes'][2]:
            yield self.check_lngettext, message, value, None, 'C'

    def test_ugettext(self):
        for message, value in self.test_data['unicode'][0]:
            yield self.check_ugettext, message, value

    def test_ugettext_charset_latin1(self):
        for message, value in self.test_data['unicode'][1]:
            yield self.check_ugettext, message, value, 'latin1'

    def test_ugettext_charset_ascii(self):
        for message, value in self.test_data['unicode'][2]:
            yield self.check_ugettext, message, value, 'ascii'

    def test_ungettext(self):
        for message, value in self.test_data['unicode'][0]:
            yield self.check_ungettext, message, value

    def test_ungettext_charset_latin1(self):
        for message, value in self.test_data['unicode'][1]:
            yield self.check_ungettext, message, value, 'latin1'

    def test_ungettext_charset_ascii(self):
        for message, value in self.test_data['unicode'][2]:
            yield self.check_ungettext, message, value, 'ascii'

    def test_nonbasestring(self):
        tools.eq_(self.translations.gettext(dict(hi='there')), self.b_empty_string)
        tools.eq_(self.translations.ngettext(dict(hi='there'), dict(hi='two'), 1), self.b_empty_string)
        tools.eq_(self.translations.lgettext(dict(hi='there')), self.b_empty_string)
        tools.eq_(self.translations.lngettext(dict(hi='there'), dict(hi='two'), 1), self.b_empty_string)
        tools.eq_(self.translations.ugettext(dict(hi='there')), self.u_empty_string)
        tools.eq_(self.translations.ungettext(dict(hi='there'), dict(hi='two'), 1), self.u_empty_string)
class TestI18N_Latin1(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.iso88591'

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

    def test_easy_gettext_setup_non_unicode(self):
        '''Test that the easy_gettext_setup function works
        '''
        b_, bN_ = i18n.easy_gettext_setup('foo', localedirs=
                ['%s/data/locale/' % os.path.dirname(__file__)],
                use_unicode=False)

        tools.eq_(b_(self.utf8_spanish), self.utf8_spanish)
        tools.eq_(b_(self.u_spanish), self.latin1_spanish)
        tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_limao)
        tools.eq_(bN_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_limoes)
        tools.eq_(bN_(self.u_limao, self.u_limoes, 1), self.latin1_limao)
        tools.eq_(bN_(self.u_limao, self.u_limoes, 2), self.latin1_limoes)


class TestNewGNUTranslationsNoMatch(TestDummyTranslations):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.utf8'
        self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])
class TestNewGNURealTranslations_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
|
||||||
|
def setUp(self):
|
||||||
|
self.old_LC_ALL = os.environ.get('LC_ALL', None)
|
||||||
|
os.environ['LC_ALL'] = 'pt_BR.utf8'
|
||||||
|
self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])
|
||||||
|
|
||||||
|
def tearDown(self):
|
||||||
|
if self.old_LC_ALL:
|
||||||
|
os.environ['LC_ALL'] = self.old_LC_ALL
|
||||||
|
else:
|
||||||
|
del(os.environ['LC_ALL'])
|
||||||
|
|
||||||
|
def test_gettext(self):
|
||||||
|
_ = self.translations.gettext
|
||||||
|
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||||
|
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||||
|
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||||
|
# This is not translated to utf8_yes_in_fallback because this test is
|
||||||
|
# without the fallback message catalog
|
||||||
|
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
|
||||||
|
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||||
|
|
||||||
|
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||||
|
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||||
|
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||||
|
# This is not translated to utf8_yes_in_fallback because this test is
|
||||||
|
# without the fallback message catalog
|
||||||
|
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
|
||||||
|
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
|
||||||
|
|
||||||
|
def test_ngettext(self):
|
||||||
|
_ = self.translations.ngettext
|
||||||
|
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
|
||||||
|
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
|
||||||
|
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
|
||||||
|
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
|
||||||
|
|
||||||
|
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
|
||||||
|
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
|
||||||
|
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
|
||||||
|
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
|
||||||
|
|
||||||
|
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||||
|
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||||
|
|
||||||
|
|
||||||
|
def test_lgettext(self):
|
||||||
|
_ = self.translations.lgettext
|
||||||
|
tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
|
||||||
|
tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
|
||||||
|
tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
|
||||||
|
# This is not translated to utf8_yes_in_fallback because this test is
|
||||||
|
# without the fallback message catalog
|
||||||
|
tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
|
||||||
|
tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)
|
||||||
|
|
||||||
|
tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
|
||||||
|
tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
|
||||||
|
tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
|
||||||
|
# This is not translated to utf8_yes_in_fallback because this test is
|
||||||
|
# without the fallback message catalog
|
||||||
|
tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)
|
||||||
|
tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)
|
||||||
|
|
||||||
|
def test_lngettext(self):
|
||||||
|
_ = self.translations.lngettext
|
||||||
|
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
|
||||||
|
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
|
||||||
|
tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
|
||||||
|
tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)
|
||||||
|
|
||||||
|
tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
|
||||||
|
tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
|
||||||
|
tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
|
||||||
|
tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)
|
||||||
|
|
||||||
|
tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||||
|
tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
|
||||||
|
|
||||||
|
|
||||||
|
    def test_ugettext(self):
        _ = self.translations.ugettext
        tools.eq_(_(self.utf8_kitchen), self.u_pt_kitchen)
        tools.eq_(_(self.utf8_ja_kuratomi), self.u_kuratomi)
        tools.eq_(_(self.utf8_kuratomi), self.u_ja_kuratomi)
        # This is not translated to utf8_yes_in_fallback because this test is
        # without the fallback message catalog
        tools.eq_(_(self.utf8_in_fallback), self.u_in_fallback)
        tools.eq_(_(self.utf8_not_in_catalog), self.u_not_in_catalog)

        tools.eq_(_(self.u_kitchen), self.u_pt_kitchen)
        tools.eq_(_(self.u_ja_kuratomi), self.u_kuratomi)
        tools.eq_(_(self.u_kuratomi), self.u_ja_kuratomi)
        # This is not translated to utf8_yes_in_fallback because this test is
        # without the fallback message catalog
        tools.eq_(_(self.u_in_fallback), self.u_in_fallback)
        tools.eq_(_(self.u_not_in_catalog), self.u_not_in_catalog)

    def test_ungettext(self):
        _ = self.translations.ungettext
        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.u_limao)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.u_lemon)
        tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.u_limao)
        tools.eq_(_(self.u_limao, self.u_limoes, 1), self.u_lemon)

        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.u_limoes)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.u_lemons)
        tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.u_limoes)
        tools.eq_(_(self.u_limao, self.u_limoes, 2), self.u_lemons)

        tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
        tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)

class TestNewGNURealTranslations_Latin1(TestNewGNURealTranslations_UTF8):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.iso88591'
        self.translations = i18n.get_translation_object('test', ['%s/data/locale/' % os.path.dirname(__file__)])

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

    def test_lgettext(self):
        _ = self.translations.lgettext
        tools.eq_(_(self.utf8_kitchen), self.latin1_pt_kitchen)
        tools.eq_(_(self.utf8_ja_kuratomi), self.latin1_kuratomi)
        tools.eq_(_(self.utf8_kuratomi), self.latin1_ja_kuratomi)
        # Neither of the following two tests encode to proper latin-1 because:
        # any byte is valid in latin-1 so there's no way to know that what
        # we're given in the string is really utf-8
        #
        # This is not translated to latin1_yes_in_fallback because this test
        # is without the fallback message catalog
        tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)
        tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)

        tools.eq_(_(self.u_kitchen), self.latin1_pt_kitchen)
        tools.eq_(_(self.u_ja_kuratomi), self.latin1_kuratomi)
        tools.eq_(_(self.u_kuratomi), self.latin1_ja_kuratomi)
        # This is not translated to latin1_yes_in_fallback because this test
        # is without the fallback message catalog
        tools.eq_(_(self.u_in_fallback), self.latin1_in_fallback)
        tools.eq_(_(self.u_not_in_catalog), self.latin1_not_in_catalog)

    def test_lngettext(self):
        _ = self.translations.lngettext
        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.latin1_limao)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.latin1_lemon)
        tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.latin1_limao)
        tools.eq_(_(self.u_limao, self.u_limoes, 1), self.latin1_lemon)

        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.latin1_limoes)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.latin1_lemons)
        tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.latin1_limoes)
        tools.eq_(_(self.u_limao, self.u_limoes, 2), self.latin1_lemons)

        # This unfortunately does not encode to proper latin-1 because:
        # any byte is valid in latin-1 so there's no way to know that what
        # we're given in the string is really utf-8
        tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
        tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.latin1_not_in_catalog)

class TestFallbackNewGNUTranslationsNoMatch(TestDummyTranslations):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.utf8'
        self.translations = i18n.get_translation_object('test',
                ['%s/data/locale/' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)])

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

class TestFallbackNewGNURealTranslations_UTF8(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.utf8'
        self.translations = i18n.get_translation_object('test',
                ['%s/data/locale/' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)])

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

    def test_gettext(self):
        _ = self.translations.gettext
        tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
        tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
        tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
        tools.eq_(_(self.utf8_in_fallback), self.utf8_yes_in_fallback)
        tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)

        tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
        tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
        tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
        tools.eq_(_(self.u_in_fallback), self.utf8_yes_in_fallback)
        tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)

    def test_ngettext(self):
        _ = self.translations.ngettext
        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
        tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
        tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)

        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
        tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
        tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)

        tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
        tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)

    def test_lgettext(self):
        _ = self.translations.lgettext
        tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
        tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
        tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
        tools.eq_(_(self.utf8_in_fallback), self.utf8_yes_in_fallback)
        tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)

        tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
        tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
        tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
        tools.eq_(_(self.u_in_fallback), self.utf8_yes_in_fallback)
        tools.eq_(_(self.u_not_in_catalog), self.utf8_not_in_catalog)

    def test_lngettext(self):
        _ = self.translations.lngettext
        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.utf8_limao)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.utf8_lemon)
        tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.utf8_limao)
        tools.eq_(_(self.u_limao, self.u_limoes, 1), self.utf8_lemon)

        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.utf8_limoes)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.utf8_lemons)
        tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.utf8_limoes)
        tools.eq_(_(self.u_limao, self.u_limoes, 2), self.utf8_lemons)

        tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
        tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)

    def test_ugettext(self):
        _ = self.translations.ugettext
        tools.eq_(_(self.utf8_kitchen), self.u_pt_kitchen)
        tools.eq_(_(self.utf8_ja_kuratomi), self.u_kuratomi)
        tools.eq_(_(self.utf8_kuratomi), self.u_ja_kuratomi)
        tools.eq_(_(self.utf8_in_fallback), self.u_yes_in_fallback)
        tools.eq_(_(self.utf8_not_in_catalog), self.u_not_in_catalog)

        tools.eq_(_(self.u_kitchen), self.u_pt_kitchen)
        tools.eq_(_(self.u_ja_kuratomi), self.u_kuratomi)
        tools.eq_(_(self.u_kuratomi), self.u_ja_kuratomi)
        tools.eq_(_(self.u_in_fallback), self.u_yes_in_fallback)
        tools.eq_(_(self.u_not_in_catalog), self.u_not_in_catalog)

    def test_ungettext(self):
        _ = self.translations.ungettext
        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.u_limao)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.u_lemon)
        tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.u_limao)
        tools.eq_(_(self.u_limao, self.u_limoes, 1), self.u_lemon)

        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.u_limoes)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.u_lemons)
        tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.u_limoes)
        tools.eq_(_(self.u_limao, self.u_limoes, 2), self.u_lemons)

        tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)
        tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.u_not_in_catalog)

class TestFallbackNewGNURealTranslations_Latin1(TestFallbackNewGNURealTranslations_UTF8):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.iso88591'
        self.translations = i18n.get_translation_object('test',
                ['%s/data/locale/' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)])

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

    def test_lgettext(self):
        _ = self.translations.lgettext
        tools.eq_(_(self.utf8_kitchen), self.latin1_pt_kitchen)
        tools.eq_(_(self.utf8_ja_kuratomi), self.latin1_kuratomi)
        tools.eq_(_(self.utf8_kuratomi), self.latin1_ja_kuratomi)
        tools.eq_(_(self.utf8_in_fallback), self.latin1_yes_in_fallback)
        # This unfortunately does not encode to proper latin-1 because:
        # any byte is valid in latin-1 so there's no way to know that what
        # we're given in the string is really utf-8
        tools.eq_(_(self.utf8_not_in_catalog), self.utf8_not_in_catalog)

        tools.eq_(_(self.u_kitchen), self.latin1_pt_kitchen)
        tools.eq_(_(self.u_ja_kuratomi), self.latin1_kuratomi)
        tools.eq_(_(self.u_kuratomi), self.latin1_ja_kuratomi)
        tools.eq_(_(self.u_in_fallback), self.latin1_yes_in_fallback)
        tools.eq_(_(self.u_not_in_catalog), self.latin1_not_in_catalog)

    def test_lngettext(self):
        _ = self.translations.lngettext
        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 1), self.latin1_limao)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 1), self.latin1_lemon)
        tools.eq_(_(self.u_lemon, self.u_lemons, 1), self.latin1_limao)
        tools.eq_(_(self.u_limao, self.u_limoes, 1), self.latin1_lemon)

        tools.eq_(_(self.utf8_lemon, self.utf8_lemons, 2), self.latin1_limoes)
        tools.eq_(_(self.utf8_limao, self.utf8_limoes, 2), self.latin1_lemons)
        tools.eq_(_(self.u_lemon, self.u_lemons, 2), self.latin1_limoes)
        tools.eq_(_(self.u_limao, self.u_limoes, 2), self.latin1_lemons)

        # This unfortunately does not encode to proper latin-1 because:
        # any byte is valid in latin-1 so there's no way to know that what
        # we're given in the string is really utf-8
        tools.eq_(_(self.utf8_not_in_catalog, 'throwaway', 1), self.utf8_not_in_catalog)
        tools.eq_(_(self.u_not_in_catalog, 'throwaway', 1), self.latin1_not_in_catalog)

class TestFallback(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.iso88591'
        self.gtranslations = i18n.get_translation_object('test',
                ['%s/data/locale/' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)])
        self.gtranslations.add_fallback(object())
        self.dtranslations = i18n.get_translation_object('nonexistent',
                ['%s/data/locale/' % os.path.dirname(__file__),
                 '%s/data/locale-old' % os.path.dirname(__file__)])
        self.dtranslations.add_fallback(object())

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])

    def test_invalid_fallback_no_raise(self):
        '''Test that an invalid fallback object does not cause a raise'''
        tools.eq_(self.gtranslations.gettext(self.u_spanish), self.utf8_spanish)
        tools.eq_(self.gtranslations.ugettext(self.u_spanish), self.u_spanish)
        tools.eq_(self.gtranslations.lgettext(self.u_spanish), self.latin1_spanish)

        tools.eq_(self.gtranslations.ngettext(self.u_spanish, 'cde', 1), self.utf8_spanish)
        tools.eq_(self.gtranslations.ungettext(self.u_spanish, 'cde', 1), self.u_spanish)
        tools.eq_(self.gtranslations.lngettext(self.u_spanish, 'cde', 1), self.latin1_spanish)

        tools.eq_(self.dtranslations.gettext(self.u_spanish), self.utf8_spanish)
        tools.eq_(self.dtranslations.ugettext(self.u_spanish), self.u_spanish)
        tools.eq_(self.dtranslations.lgettext(self.u_spanish), self.latin1_spanish)

        tools.eq_(self.dtranslations.ngettext(self.u_spanish, 'cde', 1), self.utf8_spanish)
        tools.eq_(self.dtranslations.ungettext(self.u_spanish, 'cde', 1), self.u_spanish)
        tools.eq_(self.dtranslations.lngettext(self.u_spanish, 'cde', 1), self.latin1_spanish)

class TestDefaultLocaleDir(unittest.TestCase, base_classes.UnicodeTestData):
    def setUp(self):
        self.old_LC_ALL = os.environ.get('LC_ALL', None)
        os.environ['LC_ALL'] = 'pt_BR.utf8'
        self.old_DEFAULT_LOCALEDIRS = i18n._DEFAULT_LOCALEDIR
        i18n._DEFAULT_LOCALEDIR = '%s/data/locale/' % os.path.dirname(__file__)
        self.translations = i18n.get_translation_object('test')

    def tearDown(self):
        if self.old_LC_ALL:
            os.environ['LC_ALL'] = self.old_LC_ALL
        else:
            del(os.environ['LC_ALL'])
        if self.old_DEFAULT_LOCALEDIRS:
            i18n._DEFAULT_LOCALEDIR = self.old_DEFAULT_LOCALEDIRS

    def test_gettext(self):
        _ = self.translations.gettext
        tools.eq_(_(self.utf8_kitchen), self.utf8_pt_kitchen)
        tools.eq_(_(self.utf8_kuratomi), self.utf8_ja_kuratomi)
        tools.eq_(_(self.utf8_ja_kuratomi), self.utf8_kuratomi)
        # Returns msgid because the string is in a fallback catalog which we
        # haven't setup
        tools.eq_(_(self.utf8_in_fallback), self.utf8_in_fallback)

        tools.eq_(_(self.u_kitchen), self.utf8_pt_kitchen)
        tools.eq_(_(self.u_kuratomi), self.utf8_ja_kuratomi)
        tools.eq_(_(self.u_ja_kuratomi), self.utf8_kuratomi)
        # Returns msgid because the string is in a fallback catalog which we
        # haven't setup
        tools.eq_(_(self.u_in_fallback), self.utf8_in_fallback)

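As an aside, the "any byte is valid in latin-1" behaviour that the comments in the latin-1 test classes rely on can be demonstrated in a few lines. This is an illustrative sketch with hypothetical helper names, not part of the imported tarball:

```python
# Latin-1 maps every byte 0x00-0xff to a codepoint, so decoding never
# fails; utf-8, by contrast, rejects malformed byte sequences.  This is
# why lgettext cannot detect that a msgid it was handed is really utf-8.

def decodes_as_latin1(data):
    # Always True: latin-1 accepts any byte sequence.
    try:
        data.decode('latin-1')
        return True
    except UnicodeDecodeError:
        return False

def decodes_as_utf8(data):
    # False for bytes that are not well-formed utf-8.
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False
```

Because `decodes_as_latin1` is True for any input, a latin-1 catalog has no way to flag utf-8 msgids, which is why the tests above expect the utf-8 bytes back unchanged.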
@@ -5,7 +5,7 @@ from nose import tools
 
 from kitchen import iterutils
 
-class TestStrictDict(unittest.TestCase):
+class TestIterutils(unittest.TestCase):
     iterable_data = (
         [0, 1, 2],
         [],
@@ -40,6 +40,9 @@ class TestStrictDict(unittest.TestCase):
         tools.ok_(iterutils.isiterable('a', include_string=True) == True)
         tools.ok_(iterutils.isiterable('a', include_string=False) == False)
         tools.ok_(iterutils.isiterable('a') == False)
+        tools.ok_(iterutils.isiterable(u'a', include_string=True) == True)
+        tools.ok_(iterutils.isiterable(u'a', include_string=False) == False)
+        tools.ok_(iterutils.isiterable(u'a') == False)
 
     def test_iterate(self):
         iterutils.iterate(None)
@@ -55,3 +58,5 @@ class TestStrictDict(unittest.TestCase):
         # strings
         tools.ok_(list(iterutils.iterate('abc')) == ['abc'])
         tools.ok_(list(iterutils.iterate('abc', include_string=True)) == ['a', 'b', 'c'])
+        tools.ok_(list(iterutils.iterate(u'abc')) == [u'abc'])
+        tools.ok_(list(iterutils.iterate(u'abc', include_string=True)) == [u'a', u'b', u'c'])
@@ -1,6 +1,5 @@
 import unittest
 from nose.plugins.skip import SkipTest
-from test import test_support
 from kitchen.pycompat27.subprocess import _subprocess as subprocess
 import sys
 import StringIO
@@ -45,9 +44,14 @@ def reap_children():
     except:
         break
 
-if not hasattr(test_support, 'reap_children'):
-    # No reap_children in python-2.3
-    test_support.reap_children = reap_children
+test_support = None
+try:
+    from test import test_support
+    if not hasattr(test_support, 'reap_children'):
+        # No reap_children in python-2.3
+        test_support.reap_children = reap_children
+except ImportError:
+    pass
 
 # In a debug build, stuff like "[6580 refs]" is printed to stderr at
 # shutdown time.  That frustrates tests trying to check stderr produced
@@ -79,7 +83,8 @@ class BaseTestCase(unittest.TestCase):
     def setUp(self):
         # Try to minimize the number of children we have so this test
         # doesn't crash on some buildbots (Alphas in particular).
-        test_support.reap_children()
+        if test_support:
+            test_support.reap_children()
 
     def tearDown(self):
         for inst in subprocess._active:
@@ -596,6 +601,9 @@ class ProcessTestCase(BaseTestCase):
                          "line1\nline2\rline3\r\nline4\r\nline5\nline6")
 
     def test_no_leaking(self):
+        if not test_support:
+            raise SkipTest("No test_support module available.")
+
         # Make sure we leak no resources
         if not mswindows:
             max_handles = 1026 # too much for most UNIX systems
@@ -1032,7 +1040,7 @@ class POSIXProcessTestCase(BaseTestCase):
                              stdin=stdin,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE).communicate()
             err = re.sub(r"\[\d+ refs\]\r?\n?$", "", err).strip()
             self.assertEqual((out, err), ('apple', 'orange'))
         finally:
             for b, a in zip(newfds, fds):
@@ -1123,6 +1131,8 @@ class POSIXProcessTestCase(BaseTestCase):
 
     def test_wait_when_sigchild_ignored(self):
         # NOTE: sigchild_ignore.py may not be an effective test on all OSes.
+        if not test_support:
+            raise SkipTest("No test_support module available.")
         sigchild_ignore = test_support.findfile(os.path.join("subprocessdata",
                                                 "sigchild_ignore.py"))
         p = subprocess.Popen([sys.executable, sigchild_ignore],
161
kitchen2/tests/test_text_display.py
Normal file
|
@ -0,0 +1,161 @@
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
#
|
||||||
|
import unittest
|
||||||
|
from nose import tools
|
||||||
|
|
||||||
|
from kitchen.text.exceptions import ControlCharError
|
||||||
|
|
||||||
|
from kitchen.text import display
|
||||||
|
|
||||||
|
import base_classes
|
||||||
|
|
||||||
|
class TestDisplay(base_classes.UnicodeTestData, unittest.TestCase):
|
||||||
|
|
||||||
|
def test_internal_interval_bisearch(self):
|
||||||
|
'''Test that we can find things in an interval table'''
|
||||||
|
table = ((0, 3), (5,7), (9, 10))
|
||||||
|
tools.assert_true(display._interval_bisearch(0, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(1, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(2, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(3, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(5, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(6, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(7, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(9, table))
|
||||||
|
tools.assert_true(display._interval_bisearch(10, table))
|
||||||
|
tools.assert_false(display._interval_bisearch(-1, table))
|
||||||
|
tools.assert_false(display._interval_bisearch(4, table))
|
||||||
|
tools.assert_false(display._interval_bisearch(8, table))
|
||||||
|
tools.assert_false(display._interval_bisearch(11, table))
|
||||||
|
|
||||||
|
def test_internal_generate_combining_table(self):
|
||||||
|
'''Test that the combining table we generate is equal to or a subseet of what's in the current table
|
||||||
|
|
||||||
|
If we assert it can mean one of two things:
|
||||||
|
|
||||||
|
1. The code is broken
|
||||||
|
2. The table we have is out of date.
|
||||||
|
'''
|
||||||
|
old_table = display._COMBINING
|
||||||
|
new_table = display._generate_combining_table()
|
||||||
|
for interval in new_table:
|
||||||
|
if interval[0] == interval[1]:
|
||||||
|
tools.assert_true(display._interval_bisearch(interval[0], old_table))
|
||||||
|
else:
|
||||||
|
for codepoint in xrange(interval[0], interval[1] + 1):
|
||||||
|
tools.assert_true(display._interval_bisearch(interval[0], old_table))
|
||||||
|
|
||||||
|
def test_internal_ucp_width(self):
|
||||||
|
'''Test that ucp_width returns proper width for characters'''
|
||||||
|
for codepoint in xrange(0, 0xFFFFF + 1):
|
||||||
|
if codepoint < 32 or (codepoint < 0xa0 and codepoint >= 0x7f):
|
||||||
|
# With strict on, we should raise an error
|
||||||
|
tools.assert_raises(ControlCharError, display._ucp_width, codepoint, 'strict')
|
||||||
|
|
||||||
|
if codepoint in (0x08, 0x1b, 0x7f, 0x94):
|
||||||
|
# Backspace, delete, clear delete remove one char
|
||||||
|
tools.eq_(display._ucp_width(codepoint), -1)
|
||||||
|
else:
|
||||||
|
# Everything else returns 0
|
||||||
|
tools.eq_(display._ucp_width(codepoint), 0)
|
||||||
|
elif display._interval_bisearch(codepoint, display._COMBINING):
|
||||||
|
# Combining character
|
||||||
|
tools.eq_(display._ucp_width(codepoint), 0)
|
||||||
|
elif (codepoint >= 0x1100 and
|
||||||
|
(codepoint <= 0x115f or # Hangul Jamo init. consonants
|
||||||
|
codepoint == 0x2329 or codepoint == 0x232a or
|
||||||
|
(codepoint >= 0x2e80 and codepoint <= 0xa4cf and
|
||||||
|
codepoint != 0x303f) or # CJK ... Yi
|
||||||
|
(codepoint >= 0xac00 and codepoint <= 0xd7a3) or # Hangul Syllables
|
||||||
|
(codepoint >= 0xf900 and codepoint <= 0xfaff) or # CJK Compatibility Ideographs
|
||||||
|
(codepoint >= 0xfe10 and codepoint <= 0xfe19) or # Vertical forms
|
||||||
|
(codepoint >= 0xfe30 and codepoint <= 0xfe6f) or # CJK Compatibility Forms
|
||||||
|
(codepoint >= 0xff00 and codepoint <= 0xff60) or # Fullwidth Forms
|
||||||
|
(codepoint >= 0xffe0 and codepoint <= 0xffe6) or
|
||||||
|
(codepoint >= 0x20000 and codepoint <= 0x2fffd) or
|
||||||
|
(codepoint >= 0x30000 and codepoint <= 0x3fffd))):
|
||||||
|
tools.eq_(display._ucp_width(codepoint), 2)
|
||||||
|
else:
|
||||||
|
tools.eq_(display._ucp_width(codepoint), 1)
|
||||||
|
|
||||||
|
def test_textual_width(self):
|
||||||
|
'''Test that we find the proper number of spaces that a utf8 string will consume'''
|
||||||
|
tools.eq_(display.textual_width(self.u_japanese), 31)
|
||||||
|
tools.eq_(display.textual_width(self.u_spanish), 50)
|
||||||
|
tools.eq_(display.textual_width(self.u_mixed), 23)
|
||||||
|
|
||||||
|
def test_textual_width_chop(self):
|
||||||
|
'''utf8_width_chop with byte strings'''
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 1000), self.u_mixed)
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 23), self.u_mixed)
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 22), self.u_mixed[:-1])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 19), self.u_mixed[:-4])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 1), u'')
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 2), self.u_mixed[0])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 3), self.u_mixed[:2])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 4), self.u_mixed[:3])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 5), self.u_mixed[:4])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 6), self.u_mixed[:5])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 7), self.u_mixed[:5])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 8), self.u_mixed[:6])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 9), self.u_mixed[:7])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 10), self.u_mixed[:8])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 11), self.u_mixed[:9])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 12), self.u_mixed[:10])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 13), self.u_mixed[:10])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 14), self.u_mixed[:11])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 15), self.u_mixed[:12])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 16), self.u_mixed[:13])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 17), self.u_mixed[:14])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 18), self.u_mixed[:15])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 19), self.u_mixed[:15])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 20), self.u_mixed[:16])
|
||||||
|
tools.eq_(display.textual_width_chop(self.u_mixed, 21), self.u_mixed[:17])
|
||||||
|
|
||||||
|
def test_textual_width_fill(self):
|
||||||
|
'''Pad a utf8 string'''
|
||||||
|
tools.eq_(display.textual_width_fill(self.u_mixed, 1), self.u_mixed)
|
||||||
|
tools.eq_(display.textual_width_fill(self.u_mixed, 25), self.u_mixed + u' ')
|
||||||
|
tools.eq_(display.textual_width_fill(self.u_mixed, 25, left=False), u' ' + self.u_mixed)
|
||||||
|
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18), self.u_mixed[:-4] + u' ')
|
||||||
|
tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18, prefix=self.u_spanish, suffix=self.u_spanish), self.u_spanish + self.u_mixed[:-4] + self.u_spanish + u' ')
|
||||||
|
        tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18), self.u_mixed[:-4] + u' ')
        tools.eq_(display.textual_width_fill(self.u_mixed, 25, chop=18, prefix=self.u_spanish, suffix=self.u_spanish), self.u_spanish + self.u_mixed[:-4] + self.u_spanish + u' ')

    def test_internal_textual_width_le(self):
        test_data = ''.join([self.u_mixed, self.u_spanish])
        tw = display.textual_width(test_data)
        tools.eq_(display._textual_width_le(68, self.u_mixed, self.u_spanish), (tw <= 68))
        tools.eq_(display._textual_width_le(69, self.u_mixed, self.u_spanish), (tw <= 69))
        tools.eq_(display._textual_width_le(137, self.u_mixed, self.u_spanish), (tw <= 137))
        tools.eq_(display._textual_width_le(138, self.u_mixed, self.u_spanish), (tw <= 138))
        tools.eq_(display._textual_width_le(78, self.u_mixed, self.u_spanish), (tw <= 78))
        tools.eq_(display._textual_width_le(79, self.u_mixed, self.u_spanish), (tw <= 79))

    def test_wrap(self):
        '''Test that text wrapping works'''
        tools.eq_(display.wrap(self.u_mixed), [self.u_mixed])
        tools.eq_(display.wrap(self.u_paragraph), self.u_paragraph_out)
        tools.eq_(display.wrap(self.utf8_paragraph), self.u_paragraph_out)
        tools.eq_(display.wrap(self.u_mixed_para), self.u_mixed_para_out)
        tools.eq_(display.wrap(self.u_mixed_para, width=57,
            initial_indent=' ', subsequent_indent='----'),
            self.u_mixed_para_57_initial_subsequent_out)

    def test_fill(self):
        tools.eq_(display.fill(self.u_paragraph), u'\n'.join(self.u_paragraph_out))
        tools.eq_(display.fill(self.utf8_paragraph), u'\n'.join(self.u_paragraph_out))
        tools.eq_(display.fill(self.u_mixed_para), u'\n'.join(self.u_mixed_para_out))
        tools.eq_(display.fill(self.u_mixed_para, width=57,
            initial_indent=' ', subsequent_indent='----'),
            u'\n'.join(self.u_mixed_para_57_initial_subsequent_out))

    def test_byte_string_textual_width_fill(self):
        tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 1), self.utf8_mixed)
        tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25), self.utf8_mixed + ' ')
        tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, left=False), ' ' + self.utf8_mixed)
        tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18), self.u_mixed[:-4].encode('utf8') + ' ')
        tools.eq_(display.byte_string_textual_width_fill(self.utf8_mixed, 25, chop=18, prefix=self.utf8_spanish, suffix=self.utf8_spanish), self.utf8_spanish + self.u_mixed[:-4].encode('utf8') + self.utf8_spanish + ' ')
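
The width behaviour exercised by these tests can be sketched with the stdlib alone. This is a simplified, hypothetical stand-in for `kitchen.text.display.textual_width`, shown only to illustrate why mixed CJK/Latin strings have a textual width larger than their character count; unlike kitchen's version it ignores combining and zero-width characters:

```python
import unicodedata

def textual_width(text):
    # Count 2 terminal columns for East Asian Wide ('W') and Fullwidth ('F')
    # characters, and 1 column for everything else.  Simplified sketch only:
    # kitchen's real implementation also handles combining/zero-width chars.
    return sum(2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1
               for char in text)

print(textual_width(u'abc'))      # ASCII: one column per character
print(textual_width(u'くらとみ'))  # hiragana: two columns per character
```

With this definition `u'くらとみ'` occupies eight columns even though it is only four characters long, which is exactly the distinction the `textual_width_fill` and `_textual_width_le` assertions above depend on.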

@@ -135,3 +135,19 @@ class TestTextMisc(unittest.TestCase, base_classes.UnicodeTestData):
         '''Test that we return False with non-encoded chars'''
         tools.ok_(misc.byte_string_valid_encoding('\xff') == False)
         tools.ok_(misc.byte_string_valid_encoding(self.euc_jp_japanese) == False)
+
+class TestIsStringTypes(unittest.TestCase):
+    def test_isbasestring(self):
+        tools.assert_true(misc.isbasestring('abc'))
+        tools.assert_true(misc.isbasestring(u'abc'))
+        tools.assert_false(misc.isbasestring(5))
+
+    def test_isbytestring(self):
+        tools.assert_true(misc.isbytestring('abc'))
+        tools.assert_false(misc.isbytestring(u'abc'))
+        tools.assert_false(misc.isbytestring(5))
+
+    def test_isunicodestring(self):
+        tools.assert_false(misc.isunicodestring('abc'))
+        tools.assert_true(misc.isunicodestring(u'abc'))
+        tools.assert_false(misc.isunicodestring(5))

@@ -56,7 +56,7 @@ class TestUTF8(base_classes.UnicodeTestData, unittest.TestCase):
         tools.ok_(utf8.utf8_width_chop(self.u_mixed, 22) == (22, self.u_mixed[:-1]))
         tools.ok_(utf8.utf8_width_chop(self.u_mixed, 19) == (18, self.u_mixed[:-4]))
         tools.ok_(utf8.utf8_width_chop(self.u_mixed, 2) == (2, self.u_mixed[0]))
-        tools.ok_(utf8.utf8_width_chop(self.u_mixed, 1) == (0, ''))
+        tools.ok_(utf8.utf8_width_chop(self.u_mixed, 1) == (0, u''))

     def test_utf8_width_fill(self):
         '''Pad a utf8 string'''

@@ -1,6 +1,5 @@
 # -*- coding: utf-8 -*-
 #
-import unittest
 from nose import tools

 from kitchen.versioning import version_tuple_to_string
@@ -26,7 +25,7 @@ class TestVersionTuple(object):
     }

     def check_ver_tuple_to_str(self, v_tuple, v_str):
-        tools.ok_(version_tuple_to_string(v_tuple) == v_str)
+        tools.eq_(version_tuple_to_string(v_tuple), v_str)

     def test_version_tuple_to_string(self):
         '''Test that version_tuple_to_string outputs PEP-386 compliant strings

6
kitchen3/docs/api-collections.rst
Normal file

@@ -0,0 +1,6 @@
===================
Kitchen.collections
===================

.. automodule:: kitchen.collections.strictdict
    :members:

12
kitchen3/docs/api-exceptions.rst
Normal file

@@ -0,0 +1,12 @@
==========
Exceptions
==========

Kitchen has a hierarchy of exceptions that should make it easy to catch many
errors emitted by kitchen itself.

.. automodule:: kitchen.exceptions
    :members:

.. automodule:: kitchen.text.exceptions
    :members:

38
kitchen3/docs/api-i18n.rst
Normal file

@@ -0,0 +1,38 @@
===================
Kitchen.i18n Module
===================

.. automodule:: kitchen.i18n

Functions
=========

:func:`easy_gettext_setup` should satisfy the needs of most users.
:func:`get_translation_object` is designed to ease the way for anyone that
needs more control.

.. autofunction:: easy_gettext_setup

.. autofunction:: get_translation_object

Translation Objects
===================

The standard translation objects from the :mod:`gettext` module suffer from
several problems:

* They can throw :exc:`UnicodeError`
* They can't find translations for non-:term:`ASCII` byte :class:`str`
  messages
* They may return either :class:`unicode` string or byte :class:`str` from the
  same function even though the functions say they will only return
  :class:`unicode` or only return byte :class:`str`.

:class:`DummyTranslations` and :class:`NewGNUTranslations` were written to fix
these issues.

.. autoclass:: kitchen.i18n.DummyTranslations
    :members:

.. autoclass:: kitchen.i18n.NewGNUTranslations
    :members:

9
kitchen3/docs/api-iterutils.rst
Normal file

@@ -0,0 +1,9 @@

========================
Kitchen.iterutils Module
========================

.. automodule:: kitchen.iterutils

.. autofunction:: kitchen.iterutils.isiterable
.. autofunction:: kitchen.iterutils.iterate

24
kitchen3/docs/api-overview.rst
Normal file

@@ -0,0 +1,24 @@
.. _KitchenAPI:

===========
Kitchen API
===========

Kitchen is structured as a collection of modules. In its current
configuration, Kitchen ships with the following modules. Other addon modules
that may drag in more dependencies can be found on the `project webpage`_.

.. toctree::
    :maxdepth: 2

    api-i18n
    api-text
    api-collections
    api-iterutils
    api-versioning
    api-pycompat24
    api-pycompat25
    api-pycompat27
    api-exceptions

.. _`project webpage`: https://fedorahosted.org/kitchen

34
kitchen3/docs/api-pycompat24.rst
Normal file

@@ -0,0 +1,34 @@
========================
Python 2.4 Compatibility
========================

-------------------
Sets for python-2.3
-------------------

.. automodule:: kitchen.pycompat24.sets
.. autofunction:: kitchen.pycompat24.sets.add_builtin_set

----------------------------------
Partial new style base64 interface
----------------------------------

.. automodule:: kitchen.pycompat24.base64
    :members:

----------
Subprocess
----------

.. seealso::

    :mod:`kitchen.pycompat27.subprocess`
        Kitchen includes the python-2.7 version of subprocess which has a new
        function, :func:`~kitchen.pycompat27.subprocess.check_output`. When
        you import :mod:`pycompat24.subprocess` you will be getting the
        python-2.7 version of subprocess rather than the 2.4 version (where
        subprocess first appeared). This choice was made so that we can
        concentrate our efforts on keeping the single version of subprocess up
        to date rather than working on a 2.4 version that very few people
        would need specifically.

8
kitchen3/docs/api-pycompat25.rst
Normal file

@@ -0,0 +1,8 @@
========================
Python 2.5 Compatibility
========================

.. automodule:: kitchen.pycompat25

.. automodule:: kitchen.pycompat25.collections.defaultdict

35
kitchen3/docs/api-pycompat27.rst
Normal file

@@ -0,0 +1,35 @@
========================
Python 2.7 Compatibility
========================

.. module:: kitchen.pycompat27.subprocess

--------------------------
Subprocess from Python 2.7
--------------------------

The :mod:`subprocess` module included here is a direct import from
python-2.7's |stdlib|_. You can access it via::

    >>> from kitchen.pycompat27 import subprocess

The motivation for including this module is that various API changing
improvements have been made to subprocess over time. The following is a list
of the known changes to :mod:`subprocess` with the python version they were
introduced in:

==================================== ===
New API Feature                      Ver
==================================== ===
:exc:`subprocess.CalledProcessError` 2.5
:func:`subprocess.check_call`        2.5
:func:`subprocess.check_output`      2.7
:meth:`subprocess.Popen.send_signal` 2.6
:meth:`subprocess.Popen.terminate`   2.6
:meth:`subprocess.Popen.kill`        2.6
==================================== ===

.. seealso::

    The stdlib :mod:`subprocess` documentation
        For complete documentation on how to use subprocess
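
Since the bundled module is a direct copy of the 2.7 stdlib, its headline addition can be demonstrated with the stdlib :mod:`subprocess` itself. A small illustration of ``check_output``, the 2.7 feature from the table above: it returns the command's stdout as bytes and raises ``CalledProcessError`` when the command exits non-zero:

```python
import subprocess
import sys

# check_output runs the command, captures stdout, and returns it as bytes.
out = subprocess.check_output([sys.executable, '-c', 'print("hello")'])
print(out)

# A non-zero exit status raises CalledProcessError instead of returning.
try:
    subprocess.check_output([sys.executable, '-c', 'raise SystemExit(2)'])
    rc = 0
except subprocess.CalledProcessError as err:
    rc = err.returncode
print(rc)
```

On python-2.4 through 2.6 this function does not exist in the stdlib, which is exactly the gap the bundled copy fills.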

405
kitchen3/docs/api-text-converters.rst
Normal file

@@ -0,0 +1,405 @@
-----------------------
Kitchen.text.converters
-----------------------

.. automodule:: kitchen.text.converters

Byte Strings and Unicode in Python2
===================================

Python2 has two string types, :class:`str` and :class:`unicode`.
:class:`unicode` represents an abstract sequence of text characters. It can
hold any character that is present in the unicode standard. :class:`str` can
hold any byte of data. The operating system and python work together to
display these bytes as characters in many cases but you should always keep in
mind that the information is really a sequence of bytes, not a sequence of
characters. In python2 these types are interchangeable a large amount of the
time. They are one of the few pairs of types that automatically convert when
used in equality::

    >>> # string is converted to unicode and then compared
    >>> "I am a string" == u"I am a string"
    True
    >>> # Other types, like int, don't have this special treatment
    >>> 5 == "5"
    False

However, this automatic conversion tends to lull people into a false sense of
security. As long as you're dealing with :term:`ASCII` characters the
automatic conversion will save you from seeing any differences. Once you
start using characters that are not in :term:`ASCII`, you will start getting
:exc:`UnicodeError` and :exc:`UnicodeWarning` as the automatic conversions
between the types fail::

    >>> "I am an ñ" == u"I am an ñ"
    __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
    False

Why do these conversions fail? The reason is that the python2
:class:`unicode` type represents an abstract sequence of unicode text known as
:term:`code points`. :class:`str`, on the other hand, really represents
a sequence of bytes. Those bytes are converted by your operating system to
appear as characters on your screen using a particular encoding (usually
with a default defined by the operating system and customizable by the
individual user.) Although :term:`ASCII` characters are fairly standard in
what bytes represent each character, the bytes outside of the :term:`ASCII`
range are not. In general, each encoding will map a different character to
a particular byte. Newer encodings map individual characters to multiple
bytes (which the older encodings will instead treat as multiple characters).
In the face of these differences, python refuses to guess at an encoding and
instead issues a warning or exception and refuses to convert.

.. seealso::
    :ref:`overcoming-frustration`
        For a longer introduction on this subject.

Strategy for Explicit Conversion
================================

So what is the best method of dealing with this weltering babble of incoherent
encodings? The basic strategy is to explicitly turn everything into
:class:`unicode` when it first enters your program. Then, when you send it to
output, you can transform the unicode back into bytes. Doing this allows you
to control the encodings that are used and avoid getting tracebacks due to
:exc:`UnicodeError`. Using the functions defined in this module, that looks
something like this:

.. code-block:: pycon
    :linenos:

    >>> from kitchen.text.converters import to_unicode, to_bytes
    >>> name = raw_input('Enter your name: ')
    Enter your name: Toshio くらとみ
    >>> name
    'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
    >>> type(name)
    <type 'str'>
    >>> unicode_name = to_unicode(name)
    >>> type(unicode_name)
    <type 'unicode'>
    >>> unicode_name
    u'Toshio \u304f\u3089\u3068\u307f'
    >>> # Do a lot of other things before needing to save/output again:
    >>> output = open('datafile', 'w')
    >>> output.write(to_bytes(u'Name: %s\n' % unicode_name))

A few notes:

Looking at line 6, you'll notice that the input we took from the user was
a byte :class:`str`. In general, anytime we're getting a value from outside
of python (The filesystem, reading data from the network, interacting with an
external command, reading values from the environment) we are interacting with
something that will want to give us a byte :class:`str`. Some |stdlib|_
modules and third party libraries will automatically attempt to convert a byte
:class:`str` to :class:`unicode` strings for you. This is both a boon and
a curse. If the library can guess correctly about the encoding that the data
is in, it will return :class:`unicode` objects to you without you having to
convert. However, if it can't guess correctly, you may end up with one of
several problems:

:exc:`UnicodeError`
    The library attempted to decode a byte :class:`str` into
    a :class:`unicode` string, failed, and raises an exception.
Garbled data
    If the library returns the data after decoding it with the wrong encoding,
    the characters you see in the :class:`unicode` string won't be the ones
    that you expect.
A byte :class:`str` instead of :class:`unicode` string
    Some libraries will return a :class:`unicode` string when they're able to
    decode the data and a byte :class:`str` when they can't. This is
    generally the hardest problem to debug when it occurs. Avoid it in your
    own code and try to avoid or open bugs against upstreams that do this. See
    :ref:`DesigningUnicodeAwareAPIs` for strategies to do this properly.

On line 8, we convert from a byte :class:`str` to a :class:`unicode` string.
:func:`~kitchen.text.converters.to_unicode` does this for us. It has some
error handling and sane defaults that make this a nicer function to use than
calling :meth:`str.decode` directly:

* Instead of defaulting to the :term:`ASCII` encoding which fails with all
  but the simple American English characters, it defaults to :term:`UTF-8`.
* Instead of raising an error if it cannot decode a value, it will replace
  the value with the unicode "Replacement character" symbol (``�``).
* If you happen to call this method with something that is not a :class:`str`
  or :class:`unicode`, it will return an empty :class:`unicode` string.

All three of these can be overridden using different keyword arguments to the
function. See the :func:`to_unicode` documentation for more information.
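
Those three defaults can be captured in a few lines. The following is a simplified, python3-flavoured sketch written only to illustrate the behaviour described in the bullets above (UTF-8 by default, replacement character on errors, empty string for nonstring input); it is not kitchen's actual implementation:

```python
def to_unicode(obj, encoding='utf-8', errors='replace'):
    # Bytes are decoded as UTF-8 by default; undecodable bytes become the
    # replacement character U+FFFD instead of raising UnicodeError.
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    # Text passes through unchanged.
    if isinstance(obj, str):
        return obj
    # Nonstring input yields an empty text string by default.
    return u''

print(to_unicode(b'Toshio \xe3\x81\x8f'))  # decodes cleanly as UTF-8
print(repr(to_unicode(b'\xff')))           # undecodable byte -> replacement char
print(repr(to_unicode(5)))                 # nonstring -> empty text string
```

Passing ``errors='strict'`` or a different ``encoding`` to this sketch changes its behaviour the same way the corresponding keyword arguments change kitchen's real function.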

On line 15 we push the data back out to a file. Two things you should note here:

1. We deal with the strings as :class:`unicode` until the last instant. The
   string format that we're using is :class:`unicode` and the variable also
   holds :class:`unicode`. People sometimes get into trouble when they mix
   a byte :class:`str` format with a variable that holds a :class:`unicode`
   string (or vice versa) at this stage.
2. :func:`~kitchen.text.converters.to_bytes`, does the reverse of
   :func:`to_unicode`. In this case, we're using the default values which
   turn :class:`unicode` into a byte :class:`str` using :term:`UTF-8`. Any
   errors are replaced with a ``�`` and sending nonstring objects yield empty
   :class:`unicode` strings. Just like :func:`to_unicode`, you can look at
   the documentation for :func:`to_bytes` to find out how to override any of
   these defaults.

When to use an alternate strategy
---------------------------------

The default strategy of decoding to :class:`unicode` strings when you take
data in and encoding to a byte :class:`str` when you send the data back out
works great for most problems but there are a few times when you shouldn't:

* The values aren't meant to be read as text
* The values need to be byte-for-byte when you send them back out -- for
  instance if they are database keys or filenames.
* You are transferring the data between several libraries that all expect
  byte :class:`str`.

In each of these instances, there is a reason to keep around the byte
:class:`str` version of a value. Here's a few hints to keep your sanity in
these situations:

1. Keep your :class:`unicode` and :class:`str` values separate. Just like the
   pain caused when you have to use someone else's library that returns both
   :class:`unicode` and :class:`str` you can cause yourself pain if you have
   functions that can return both types or variables that could hold either
   type of value.
2. Name your variables so that you can tell whether you're storing byte
   :class:`str` or :class:`unicode` string. One of the first things you end
   up having to do when debugging is determine what type of string you have in
   a variable and what type of string you are expecting. Naming your
   variables consistently so that you can tell which type they are supposed to
   hold will save you from at least one of those steps.
3. When you get values initially, make sure that you're dealing with the type
   of value that you expect as you save it. You can use :func:`isinstance`
   or :func:`to_bytes` since :func:`to_bytes` doesn't do any modifications of
   the string if it's already a :class:`str`. When using :func:`to_bytes`
   for this purpose you might want to use::

       try:
           b_input = to_bytes(input_should_be_bytes_already, errors='strict', nonstring='strict')
       except:
           handle_errors_somehow()

   The reason is that the default of :func:`to_bytes` will take characters
   that are illegal in the chosen encoding and transform them to replacement
   characters. Since the point of keeping this data as a byte :class:`str` is
   to keep the exact same bytes when you send it outside of your code,
   changing things to replacement characters should be raising red flags that
   something is wrong. Setting :attr:`errors` to ``strict`` will raise an
   exception which gives you an opportunity to fail gracefully.
4. Sometimes you will want to print out the values that you have in your byte
   :class:`str`. When you do this you will need to make sure that you
   transform :class:`unicode` to :class:`str` before combining them. Also be
   sure that any other function calls (including :mod:`gettext`) are going to
   give you strings that are the same type. For instance::

       print to_bytes(_('Username: %(user)s'), 'utf-8') % {'user': b_username}

Gotchas and how to avoid them
=============================

Even when you have a good conceptual understanding of how python2 treats
:class:`unicode` and :class:`str` there are still some things that can
surprise you. In most cases this is because, as noted earlier, python or one
of the python libraries you depend on is trying to convert a value
automatically and failing. Explicit conversion at the appropriate place
usually solves that.

str(obj)
--------

One common idiom for getting a simple, string representation of an object is to use::

    str(obj)

Unfortunately, this is not safe. Sometimes str(obj) will return
:class:`unicode`. Sometimes it will return a byte :class:`str`. Sometimes,
it will attempt to convert from a :class:`unicode` string to a byte
:class:`str`, fail, and throw a :exc:`UnicodeError`. To be safe from all of
these, first decide whether you need :class:`unicode` or :class:`str` to be
returned. Then use :func:`to_unicode` or :func:`to_bytes` to get the simple
representation like this::

    u_representation = to_unicode(obj, nonstring='simplerepr')
    b_representation = to_bytes(obj, nonstring='simplerepr')

print
-----

python has a builtin :func:`print` statement that outputs strings to the
terminal. This originated in a time when python only dealt with byte
:class:`str`. When :class:`unicode` strings came about, some enhancements
were made to the :func:`print` statement so that it could print those as well.
The enhancements make :func:`print` work most of the time. However, the times
when it doesn't work tend to make for cryptic debugging.

The basic issue is that :func:`print` has to figure out what encoding to use
when it prints a :class:`unicode` string to the terminal. When python is
attached to your terminal (ie, you're running the interpreter or running
a script that prints to the screen) python is able to take the encoding value
from your locale settings :envvar:`LC_ALL` or :envvar:`LC_CTYPE` and print the
characters allowed by that encoding. On most modern Unix systems, the
encoding is :term:`utf-8` which means that you can print any :class:`unicode`
character without problem.

There are two common cases of things going wrong:

1. Someone has a locale set that does not accept all valid unicode characters.
   For instance::

       $ LC_ALL=C python
       >>> print u'\ufffd'
       Traceback (most recent call last):
         File "<stdin>", line 1, in <module>
       UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

   This often happens when a script that you've written and debugged from the
   terminal is run from an automated environment like :program:`cron`. It
   also occurs when you have written a script using a :term:`utf-8` aware
   locale and released it for consumption by people all over the internet.
   Inevitably, someone is running with a locale that can't handle all unicode
   characters and you get a traceback reported.
2. You redirect output to a file. Python isn't using the values in
   :envvar:`LC_ALL` unconditionally to decide what encoding to use. Instead
   it is using the encoding set for the terminal you are printing to which is
   set to accept different encodings by :envvar:`LC_ALL`. If you redirect
   to a file, you are no longer printing to the terminal so :envvar:`LC_ALL`
   won't have any effect. At this point, python will decide it can't find an
   encoding and fallback to :term:`ASCII` which will likely lead to
   :exc:`UnicodeError` being raised. You can see this in a short script::

       #! /usr/bin/python -tt
       print u'\ufffd'

   And then look at the difference between running it normally and redirecting to a file:

   .. code-block:: console

       $ ./test.py
       �
       $ ./test.py > t
       Traceback (most recent call last):
         File "test.py", line 3, in <module>
           print u'\ufffd'
       UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

The short answer to dealing with this is to always use bytes when writing
output. You can do this by explicitly converting to bytes like this::

    from kitchen.text.converters import to_bytes
    u_string = u'\ufffd'
    print to_bytes(u_string)

or you can wrap stdout and stderr with a :class:`~codecs.StreamWriter`.
A :class:`~codecs.StreamWriter` is convenient in that you can assign it to
encode for :data:`sys.stdout` or :data:`sys.stderr` and then have output
automatically converted but it has the drawback of still being able to throw
:exc:`UnicodeError` if the writer can't encode all possible unicode
codepoints. Kitchen provides an alternate version which can be retrieved with
:func:`kitchen.text.converters.getwriter` which will not traceback in its
standard configuration.
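
The :class:`~codecs.StreamWriter` approach can be demonstrated with the stdlib :mod:`codecs` module. The example below wraps an in-memory byte stream rather than :data:`sys.stdout` so it is easy to inspect; kitchen's ``getwriter`` returns a similar wrapper that substitutes replacement characters instead of raising on unencodable codepoints:

```python
import codecs
import io

# codecs.getwriter returns a StreamWriter class for the named codec;
# instantiating it around a byte stream lets you write text and have it
# encoded on the way out.
byte_stream = io.BytesIO()
writer = codecs.getwriter('utf-8')(byte_stream)

writer.write(u'\ufffd')            # text in ...
print(byte_stream.getvalue())      # ... UTF-8 bytes out
```

The same pattern with ``sys.stdout.buffer`` (or ``sys.stdout`` on python2) is what the paragraph above refers to as wrapping stdout and stderr.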
||||||
|
|
||||||
|
.. _unicode-and-dict-keys:
|
||||||
|
|
||||||
|
Unicode, str, and dict keys
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
The :func:`hash` of the :term:`ASCII` characters is the same for
|
||||||
|
:class:`unicode` and byte :class:`str`. When you use them in :class:`dict`
|
||||||
|
keys, they evaluate to the same dictionary slot::
|
||||||
|
|
||||||
|
>>> u_string = u'a'
|
||||||
|
>>> b_string = 'a'
|
||||||
|
>>> hash(u_string), hash(b_string)
|
||||||
|
(12416037344, 12416037344)
|
||||||
|
>>> d = {}
|
||||||
|
>>> d[u_string] = 'unicode'
|
||||||
|
>>> d[b_string] = 'bytes'
|
||||||
|
>>> d
|
||||||
|
{u'a': 'bytes'}
|
||||||
|
|
||||||
|
When you deal with key values outside of :term:`ASCII`, :class:`unicode` and
|
||||||
|
byte :class:`str` evaluate unequally no matter what their character content or
|
||||||
|
hash value::
|
||||||
|
|
||||||
|
>>> u_string = u'ñ'
|
||||||
|
>>> b_string = u_string.encode('utf-8')
|
||||||
|
>>> print u_string
|
||||||
|
ñ
|
||||||
|
>>> print b_string
|
||||||
|
ñ
|
||||||
|
>>> d = {}
|
||||||
|
>>> d[u_string] = 'unicode'
|
||||||
|
>>> d[b_string] = 'bytes'
|
||||||
|
>>> d
|
||||||
|
{u'\\xf1': 'unicode', '\\xc3\\xb1': 'bytes'}
|
||||||
|
>>> b_string2 = '\\xf1'
|
||||||
|
>>> hash(u_string), hash(b_string2)
|
||||||
|
(30848092528, 30848092528)
|
||||||
|
>>> d = {}
|
||||||
|
>>> d[u_string] = 'unicode'
|
||||||
|
>>> d[b_string2] = 'bytes'
|
||||||
|
{u'\\xf1': 'unicode', '\\xf1': 'bytes'}
|
||||||
|
|
||||||
|
How do you work with this one? Remember rule #1: Keep your :class:`unicode`
|
||||||
|
and byte :class:`str` values separate. That goes for keys in a dictionary
|
||||||
|
just like anything else.
|
||||||
|
|
||||||
|
* For any given dictionary, make sure that all your keys are either
|
||||||
|
:class:`unicode` or :class:`str`. **Do not mix the two.** If you're being
|
||||||
|
given both :class:`unicode` and :class:`str` but you don't need to preserve
|
||||||
|
separate keys for each, I recommend using :func:`to_unicode` or
|
||||||
|
:func:`to_bytes` to convert all keys to one type or the other like this::
|
||||||
|
|
||||||
|
>>> from kitchen.text.converters import to_unicode
|
||||||
|
>>> u_string = u'one'
|
||||||
|
>>> b_string = 'two'
|
||||||
|
>>> d = {}
|
||||||
|
>>> d[to_unicode(u_string)] = 1
|
||||||
|
>>> d[to_unicode(b_string)] = 2
|
||||||
|
>>> d
|
||||||
|
{u'two': 2, u'one': 1}
|
||||||
|
|
||||||
|
* These issues also apply to using dicts with tuple keys that contain
|
||||||
|
a mixture of :class:`unicode` and :class:`str`. Once again the best fix
|
||||||
|
is to standardise on either :class:`str` or :class:`unicode`.
|
||||||
|
|
||||||
|
* If you absolutely need to store values in a dictionary where the keys could
|
||||||
|
be either :class:`unicode` or :class:`str` you can use
|
||||||
|
:class:`~kitchen.collections.strictdict.StrictDict` which has separate
|
||||||
|
entries for all :class:`unicode` and byte :class:`str` and deals correctly
|
||||||
|
with any :class:`tuple` containing mixed :class:`unicode` and byte
|
||||||
|
:class:`str`.

---------
Functions
---------

Unicode and byte str conversion
===============================

.. autofunction:: kitchen.text.converters.to_unicode
.. autofunction:: kitchen.text.converters.to_bytes
.. autofunction:: kitchen.text.converters.getwriter
.. autofunction:: kitchen.text.converters.to_str
.. autofunction:: kitchen.text.converters.to_utf8

Transformation to XML
=====================

.. autofunction:: kitchen.text.converters.unicode_to_xml
.. autofunction:: kitchen.text.converters.xml_to_unicode
.. autofunction:: kitchen.text.converters.byte_string_to_xml
.. autofunction:: kitchen.text.converters.xml_to_byte_string
.. autofunction:: kitchen.text.converters.bytes_to_xml
.. autofunction:: kitchen.text.converters.xml_to_bytes
.. autofunction:: kitchen.text.converters.guess_encoding_to_xml
.. autofunction:: kitchen.text.converters.to_xml

Working with exception messages
===============================

.. autodata:: kitchen.text.converters.EXCEPTION_CONVERTERS
.. autodata:: kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS
.. autofunction:: kitchen.text.converters.exception_to_unicode
.. autofunction:: kitchen.text.converters.exception_to_bytes
33
kitchen3/docs/api-text-display.rst
Normal file
@ -0,0 +1,33 @@
.. automodule:: kitchen.text.display

.. autofunction:: kitchen.text.display.textual_width

.. autofunction:: kitchen.text.display.textual_width_chop

.. autofunction:: kitchen.text.display.textual_width_fill

.. autofunction:: kitchen.text.display.wrap

.. autofunction:: kitchen.text.display.fill

.. autofunction:: kitchen.text.display.byte_string_textual_width_fill

Internal Data
=============

There are a few internal functions and variables in this module.  Code outside
of kitchen shouldn't use them, but people coding on kitchen itself may find
them useful.

.. autodata:: kitchen.text.display._COMBINING

.. autofunction:: kitchen.text.display._generate_combining_table

.. autofunction:: kitchen.text.display._print_combining_table

.. autofunction:: kitchen.text.display._interval_bisearch

.. autofunction:: kitchen.text.display._ucp_width

.. autofunction:: kitchen.text.display._textual_width_le
2
kitchen3/docs/api-text-misc.rst
Normal file
@ -0,0 +1,2 @@
.. automodule:: kitchen.text.misc
   :members:
3
kitchen3/docs/api-text-utf8.rst
Normal file
@ -0,0 +1,3 @@
.. automodule:: kitchen.text.utf8
   :members:
   :deprecated:
22
kitchen3/docs/api-text.rst
Normal file
@ -0,0 +1,22 @@
=============================================
Kitchen.text: unicode and utf8 and xml oh my!
=============================================

The kitchen.text module contains functions that deal with text manipulation.

.. toctree::

   api-text-converters
   api-text-display
   api-text-misc
   api-text-utf8

:mod:`~kitchen.text.converters`
    deals with converting text for different encodings and to and from XML
:mod:`~kitchen.text.display`
    deals with issues with printing text to a screen
:mod:`~kitchen.text.misc`
    is a catchall for text manipulation functions that don't seem to fit
    elsewhere
:mod:`~kitchen.text.utf8`
    contains deprecated functions to manipulate utf8 byte strings
6
kitchen3/docs/api-versioning.rst
Normal file
@ -0,0 +1,6 @@
===============================
Helpers for versioning software
===============================

.. automodule:: kitchen.versioning
   :members:
220
kitchen3/docs/conf.py
Normal file
@ -0,0 +1,220 @@
# -*- coding: utf-8 -*-
#
# Kitchen documentation build configuration file, created by
# sphinx-quickstart on Sat May 22 00:51:26 2010.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import sys, os

sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
import kitchen.release

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.append(os.path.abspath('.'))

# -- General configuration -----------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.pngmath', 'sphinx.ext.ifconfig']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
#source_encoding = 'utf-8'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = kitchen.release.NAME
copyright = kitchen.release.COPYRIGHT

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.2'
# The full version, including alpha/beta/rc tags.
release = kitchen.__version__

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
language = 'en'

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'

# List of documents that shouldn't be included in the build.
#unused_docs = []

# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = []

# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
show_authors = True

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []

highlight_language = 'python'

# -- Options for HTML output ---------------------------------------------------

# The theme to use for HTML and HTML Help pages.  Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
html_theme = 'default'

# Theme options are theme-specific and customize the look and feel of a theme
# further.  For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []

# The name for this set of Sphinx documents.  If None, it defaults to
# "<project> v<release> documentation".
#html_title = None

# A shorter title for the navigation bar.  Default is the same as html_title.
#html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True

# Content template for the index page.
html_index = 'index.html'

# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}

# If false, no module index is generated.
#html_use_modindex = True

# If false, no index is generated.
#html_use_index = True

# If true, the index is split into individual pages for each letter.
#html_split_index = False

# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it.  The value of this option must be the
# base URL from which the finished HTML is served.
html_use_opensearch = kitchen.release.DOWNLOAD_URL + 'docs/'

# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''

# Output file base name for HTML help builder.
htmlhelp_basename = 'kitchendoc'


# -- Options for LaTeX output --------------------------------------------------

# The paper size ('letter' or 'a4').
#latex_paper_size = 'letter'

# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
  ('index', 'kitchen.tex', 'kitchen Documentation',
   'Toshio Kuratomi', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False

# Additional stuff for the LaTeX preamble.
#latex_preamble = ''

# Documents to append as an appendix to all manuals.
#latex_appendices = []

# If false, no module index is generated.
#latex_use_modindex = True

automodule_skip_lines = 4
autoclass_content = "class"

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'http://docs.python.org/': None,
        'https://fedorahosted.org/releases/p/y/python-fedora/doc/': None,
        'https://fedorahosted.org/releases/p/a/packagedb/doc/': None}

rst_epilog = '''
.. |projpage| replace:: project webpage
.. _projpage: %(url)s
.. |docpage| replace:: documentation page
.. _docpage: %(download)s/docs
.. |downldpage| replace:: download page
.. _downldpage: %(download)s
.. |stdlib| replace:: python standard library
.. _stdlib: http://docs.python.org/library
''' % {'url': kitchen.release.URL, 'download': kitchen.release.DOWNLOAD_URL}
690
kitchen3/docs/designing-unicode-apis.rst
Normal file
@ -0,0 +1,690 @@
.. _DesigningUnicodeAwareAPIs:

============================
Designing Unicode Aware APIs
============================

APIs that deal with byte :class:`str` and :class:`unicode` strings are
difficult to get right.  Here are a few strategies with the pros and cons of
each.

.. contents::

-------------------------------------------------
Take either bytes or unicode, output only unicode
-------------------------------------------------

In this strategy, you allow the user to enter either :class:`unicode` strings
or byte :class:`str` but what you give back is always :class:`unicode`.  This
strategy is easy for novice end users to start using immediately as they will
be able to feed either type of string into the function and get back a string
that they can use in other places.

However, it does lead to the novice writing code that functions correctly when
testing it with :term:`ASCII`-only data but fails when given data that contains
non-:term:`ASCII` characters.  Worse, if your API is not designed to be
flexible, the consumer of your code won't be able to easily correct those
problems once they find them.

Here's a good API that uses this strategy::

    from kitchen.text.converters import to_unicode

    def truncate(msg, max_length, encoding='utf8', errors='replace'):
        msg = to_unicode(msg, encoding, errors)
        return msg[:max_length]

The call to :func:`truncate` starts with the essential parameters for
performing the task.  It ends with two optional keyword arguments that define
the encoding to use to transform from a byte :class:`str` to :class:`unicode`
and the strategy to use if undecodable bytes are encountered.  The defaults
may vary depending on the use cases you have in mind.  When the output is
generally going to be printed for the user to see, ``errors='replace'`` is
a good default.  If you are constructing keys to a database, raising an
exception (with ``errors='strict'``) may be a better default.  In either case,
having both parameters allows the person using your API to choose how they
want to handle any problems.  Having the values is also a clue to them that
a conversion from byte :class:`str` to :class:`unicode` string is going to
occur.

.. note::

    If you're targeting python-3.1 and above, ``errors='surrogateescape'`` may
    be a better default than ``errors='strict'``.  You need to be mindful of
    a few things when using ``surrogateescape`` though:

    * ``surrogateescape`` will cause issues if a non-:term:`ASCII` compatible
      encoding is used (for instance, UTF-16 and UTF-32).  That makes it
      unhelpful in situations where a true general purpose method of encoding
      must be found.  :pep:`383` mentions that ``surrogateescape`` was
      specifically designed around the limitations of translating using system
      locales (where :term:`ASCII` compatibility is generally seen as
      inescapable) so you should keep that in mind.
    * If you use ``surrogateescape`` to decode from :class:`bytes`
      to :class:`unicode` you will need to use an error handler other than
      ``strict`` to encode, as the lone surrogate that this error handler
      creates makes for invalid unicode that must be handled when encoding.
      In Python-3.1.2 or less, a bug in the encoder error handlers means that
      you can only use ``surrogateescape`` to encode; anything else will throw
      an error.

    Evaluate your usages of the variables in question to see what makes sense.

Here's a bad example of using this strategy::

    from kitchen.text.converters import to_unicode

    def truncate(msg, max_length):
        msg = to_unicode(msg)
        return msg[:max_length]

In this example, we don't have the optional keyword arguments for
:attr:`encoding` and :attr:`errors`.  A user who uses this function is more
likely to miss the fact that a conversion from byte :class:`str` to
:class:`unicode` is going to occur.  And once an error is reported, they will
have to look through their backtrace and think harder about where they want to
transform their data into :class:`unicode` strings instead of having the
opportunity to control how the conversion takes place in the function itself.
Note that the user does have the ability to make this work by making the
transformation to unicode themselves::

    from kitchen.text.converters import to_unicode

    msg = to_unicode(msg, encoding='euc_jp', errors='ignore')
    new_msg = truncate(msg, 5)
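For comparison, the same "accept either, return unicode" pattern can be sketched without kitchen at all.  This is a Python 3 sketch, where ``str`` plays the role of ``unicode`` and ``bytes`` the role of byte ``str``:

```python
def truncate(msg, max_length, encoding='utf-8', errors='replace'):
    # Accept either bytes or str; always return str (the "unicode" type).
    if isinstance(msg, bytes):
        msg = msg.decode(encoding, errors)
    return msg[:max_length]

print(truncate('naïve text', 5))                  # str in  -> str out
print(truncate('naïve text'.encode('utf-8'), 5))  # bytes in -> str out
```

Both calls return the same five-character ``str``, regardless of which type went in; the ``encoding`` and ``errors`` keywords keep the conversion under the caller's control, just as in the kitchen version above.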

--------------------------------------------------
Take either bytes or unicode, output the same type
--------------------------------------------------

This strategy is sometimes called polymorphic because the type of data that is
returned is dependent on the type of data that is received.  The concept is
that when you are given a byte :class:`str` to process, you return a byte
:class:`str` in your output.  When you are given :class:`unicode` strings to
process, you return :class:`unicode` strings in your output.

This can work well for end users, as the ones that know about the difference
between the two string types will already have transformed the strings to
their desired type before giving them to this function.  The ones that don't
can remain blissfully ignorant (at least, as far as your function is
concerned) as the function does not change the type.

In cases where the encoding of the byte :class:`str` is known or can be
discovered based on the input data this works well.  If you can't figure out
the input encoding, however, this strategy can fail in any of the following
cases:

1. It needs to do an internal conversion between byte :class:`str` and
   :class:`unicode` string.
2. It cannot return the same data as either a :class:`unicode` string or byte
   :class:`str`.
3. You may need to deal with byte strings that are not byte-compatible with
   :term:`ASCII`.

First, a couple of examples of using this strategy in a good way::

    def translate(msg, table):
        replacements = table.keys()
        new_msg = []
        for index, char in enumerate(msg):
            if char in replacements:
                new_msg.append(table[char])
            else:
                new_msg.append(char)

        return ''.join(new_msg)

In this example, all of the strings that we use (except the empty string,
which is okay because it doesn't have any characters to encode) come from
outside of the function.  Due to that, the user is responsible for making sure
that :attr:`msg` and the keys and values in :attr:`table` all match in terms
of type (:class:`unicode` vs :class:`str`) and encoding (you can do some error
checking to make sure the user gave all the same type but you can't do the
same for the user giving different encodings).  You do not need to make
changes to the string that require you to know the encoding or type of the
string; everything is a simple replacement of one element in the array of
characters in the message with the character in the table.

::

    import json
    from kitchen.text.converters import to_unicode, to_bytes

    def first_field_from_json_data(json_string):
        '''Return the first field in a json data structure.

        The format of the json data is a simple list of strings.
        '["one", "two", "three"]'
        '''
        if isinstance(json_string, unicode):
            # On all python versions, json.loads() returns unicode if given
            # a unicode string
            return json.loads(json_string)[0]

        # Byte str: figure out which encoding we're dealing with
        if '\x00' not in json_string[:2]:
            encoding = 'utf8'
        elif '\x00\x00\x00' == json_string[:3]:
            encoding = 'utf-32-be'
        elif '\x00\x00\x00' == json_string[1:4]:
            encoding = 'utf-32-le'
        elif '\x00' == json_string[0] and '\x00' == json_string[2]:
            encoding = 'utf-16-be'
        else:
            encoding = 'utf-16-le'

        data = json.loads(unicode(json_string, encoding))
        return data[0].encode(encoding)

In this example the function takes either a byte :class:`str` type or
a :class:`unicode` string that has a list in json format and returns the first
field from it as the type of the input string.  The first section of code is
very straightforward; we receive a :class:`unicode` string, parse it with
a function, and then return the first field from our parsed data (which our
function returned to us as json data).

The second portion that deals with byte :class:`str` is not so
straightforward.  Before we can parse the string we have to determine what
characters the bytes in the string map to.  If we didn't do that, we wouldn't
be able to properly find which characters are present in the string.  In order
to do that we have to figure out the encoding of the byte :class:`str`.
Luckily, the json specification states that all strings are unicode and
encoded with one of UTF32be, UTF32le, UTF16be, UTF16le, or :term:`UTF-8`.  It
further defines the format such that the first two characters are always
:term:`ASCII`.  Each of these encodings uses a different sequence of NULLs
when it encodes an :term:`ASCII` character.  We can use that to detect which
encoding was used to create the byte :class:`str`.

Finally, we return the byte :class:`str` by encoding the :class:`unicode` back
to a byte :class:`str`.

As you can see, in this example we have to convert from byte :class:`str` to
:class:`unicode` and back.  But we know from the json specification that the
byte :class:`str` has to be one of a limited number of encodings that we are
able to detect.  That ability makes this strategy work.
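The NULL-pattern detection used above can be isolated into a small helper.  This is a sketch in modern Python 3, operating on ``bytes``; the heuristic mirrors the one in the example (the first two characters of a JSON text are :term:`ASCII`, so the placement of NUL bytes reveals the UTF flavor):

```python
def detect_json_encoding(raw):
    """Guess the UTF flavor of a BOM-less JSON byte string (sketch)."""
    if b'\x00' not in raw[:2]:
        return 'utf-8'
    if raw[:3] == b'\x00\x00\x00':
        return 'utf-32-be'
    if raw[1:4] == b'\x00\x00\x00':
        return 'utf-32-le'
    if raw[0:1] == b'\x00' and raw[2:3] == b'\x00':
        return 'utf-16-be'
    return 'utf-16-le'

# Every flavor round-trips: encode, then detect the encoding from the bytes.
for enc in ('utf-8', 'utf-16-le', 'utf-16-be', 'utf-32-le', 'utf-32-be'):
    assert detect_json_encoding('["one", "two"]'.encode(enc)) == enc
```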

Now for some examples of using this strategy in ways that fail::

    import unicodedata
    def first_char(msg):
        '''Return the first character in a string'''
        if not isinstance(msg, unicode):
            try:
                msg = unicode(msg, 'utf8')
            except UnicodeError:
                msg = unicode(msg, 'latin1')
        msg = unicodedata.normalize('NFC', msg)
        return msg[0]

If you look at that code and think that there's something fragile and prone to
breaking in the ``try: except:`` block you are correct in being suspicious.
This code will fail on multi-byte character sets that aren't :term:`UTF-8`.
It can also fail on data where the sequence of bytes is valid :term:`UTF-8`
but the bytes are actually of a different encoding.  The reason this code
fails is that we don't know what encoding the bytes are in and the code must
convert from a byte :class:`str` to a :class:`unicode` string in order to
function.

In order to make this code robust we must know the encoding of :attr:`msg`.
The only way to know that is to ask the user, so the API must do that::

    import unicodedata
    def number_of_chars(msg, encoding='utf8', errors='strict'):
        if not isinstance(msg, unicode):
            msg = unicode(msg, encoding, errors)
        msg = unicodedata.normalize('NFC', msg)
        return len(msg)

Another example of failure::

    import os
    def listdir(directory):
        files = os.listdir(directory)
        if isinstance(directory, str):
            return files
        # files could contain both bytes and unicode
        new_files = []
        for filename in files:
            if not isinstance(filename, unicode):
                # What to do here?
                continue
            new_files.append(filename)
        return new_files

This function illustrates the second failure mode.  Here, not all of the
possible values can be represented as :class:`unicode` without knowing more
about the encoding of each of the filenames involved.  Since each filename
could have a different encoding there are a few different options to pursue.
We could make this function always return byte :class:`str` since that can
accurately represent anything that could be returned.  If we want to return
:class:`unicode` we need to at least allow the user to specify what to do in
case of an error decoding the bytes to :class:`unicode`.  We can also let the
user specify the encoding to use for doing the decoding but that won't help in
all cases since not all files will be in the same encoding (or even
necessarily in any encoding)::

    import locale
    import os
    def listdir(directory, encoding=locale.getpreferredencoding(), errors='strict'):
        # Note: In python-3.1+, surrogateescape may be a better default
        files = os.listdir(directory)
        if isinstance(directory, str):
            return files
        new_files = []
        for filename in files:
            if not isinstance(filename, unicode):
                filename = unicode(filename, encoding=encoding, errors=errors)
            new_files.append(filename)
        return new_files

Note that although we use :attr:`errors` in this example as what to pass to
the codec that decodes to :class:`unicode`, we could also have an
:attr:`errors` argument that decides other things to do, like skip a filename
entirely, return a placeholder (``Nondisplayable filename``), or raise an
exception.
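As an aside, Python 3's own ``os.listdir`` settled on exactly this polymorphic contract: a ``str`` path yields ``str`` names (with undecodable bytes smuggled through the ``surrogateescape`` error handler), while a ``bytes`` path yields the raw ``bytes`` names.  A quick check of that behavior:

```python
import os

# str path in -> str filenames out; bytes path in -> bytes filenames out.
assert all(isinstance(name, str) for name in os.listdir('.'))
assert all(isinstance(name, bytes) for name in os.listdir(b'.'))
```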

This leaves us with one last failure to describe::

    def first_field(csv_string):
        '''Return the first field in a comma separated values string.'''
        try:
            return csv_string[:csv_string.index(',')]
        except ValueError:
            return csv_string

This code looks simple enough.  The hidden error here is that we are searching
for a comma character in a byte :class:`str` but not all encodings will use
the same sequence of bytes to represent the comma.  If you use an encoding
that's not :term:`ASCII` compatible on the byte level, then the literal comma
``','`` in the above code will match inappropriate bytes.  Some examples of
how it can fail:

* Will find the byte representing an :term:`ASCII` comma in another character
* Will find the comma but leave trailing garbage bytes on the end of the
  string
* Will not match the character that represents the comma in this encoding

There are two ways to solve this.  You can either take the encoding value from
the user or you can take the separator value from the user.  Of the two,
taking the encoding is the better option for two reasons:

1. Taking a separator argument doesn't clearly document for the API user that
   the reason they must give it is to properly match the encoding of the
   :attr:`csv_string`.  They're just as likely to think that it's simply a way
   to specify an alternate character (like ":" or "|") for the separator.
2. It's possible for a variable width encoding to reuse the same byte sequence
   for different characters in multiple sequences.

   .. note::

       :term:`UTF-8` is resistant to this as any character's sequence of
       bytes will never be a subset of another character's sequence of bytes.
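The failure the note guards against is easy to demonstrate.  In this Python 3 sketch, the byte ``0x2C`` that encodes an :term:`ASCII` comma also appears inside an unrelated UTF-16 character, while in :term:`UTF-8` it cannot:

```python
# U+012C (LATIN CAPITAL LETTER I WITH BREVE) encodes to b',\x01' in UTF-16-LE,
# so a naive byte-level search for b',' finds a comma that isn't there.
assert b',' in '\u012c'.encode('utf-16-le')

# UTF-8 multi-byte sequences never contain ASCII bytes, so no false match.
assert b',' not in '\u012c'.encode('utf-8')
```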
|
||||||
|
|
||||||
|
With that in mind, here's how to improve the API::
|
||||||
|
|
||||||
|
    def first_field(csv_string, encoding='utf-8', errors='replace'):
        if not isinstance(csv_string, unicode):
            u_string = unicode(csv_string, encoding, errors)
            is_unicode = False
        else:
            u_string = csv_string
            is_unicode = True

        try:
            field = u_string[:u_string.index(u',')]
        except ValueError:
            return csv_string

        if not is_unicode:
            field = field.encode(encoding, errors)
        return field

.. note::

    If you decide you'll never encounter a variable width encoding that reuses
    byte sequences you can use this code instead::

        def first_field(csv_string, encoding='utf-8'):
            try:
                return csv_string[:csv_string.index(','.encode(encoding))]
            except ValueError:
                return csv_string

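To make the pitfall concrete, here is a small self-contained sketch of both
approaches (the name ``first_field_naive`` and the byte-oriented signatures
are invented for this illustration; ``utf-16-be`` stands in for any
encoding that is not :term:`ASCII compatible`):

```python
def first_field_naive(csv_bytes):
    # the naive byte-oriented version from the start of this section
    try:
        return csv_bytes[:csv_bytes.index(b',')]
    except ValueError:
        return csv_bytes

def first_field(csv_bytes, encoding='utf-8', errors='replace'):
    # the encoding-aware version: decode, slice on the text, re-encode
    u_string = csv_bytes.decode(encoding, errors)
    try:
        field = u_string[:u_string.index(u',')]
    except ValueError:
        return csv_bytes
    return field.encode(encoding, errors)

data = u'first,second'.encode('utf-16-be')
# The naive search finds the comma's low byte in the middle of its
# two-byte pair, returning an odd-length slice with a trailing garbage
# byte that is not a valid UTF-16-BE string
assert len(first_field_naive(data)) == 11
# The encoding-aware version returns a decodable byte string
assert first_field(data, 'utf-16-be').decode('utf-16-be') == u'first'
```

The same calls succeed unchanged for UTF-8 input because UTF-8 is
:term:`ASCII compatible`.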
------------------
Separate functions
------------------

Sometimes you want to be able to take either byte :class:`str` or
:class:`unicode` strings, perform similar operations on either one and then
return data in the same format as was given. Probably the easiest way to do
that is to have separate functions for each and adopt a naming convention to
show that one is for working with byte :class:`str` and the other is for
working with :class:`unicode` strings::

    def translate_b(msg, table):
        '''Replace values in str with other byte values like unicode.translate'''
        if not isinstance(msg, str):
            raise TypeError('msg must be of type str')
        str_table = [chr(s) for s in xrange(0, 256)]
        delete_chars = []
        for chr_val in (k for k in table.keys() if isinstance(k, int)):
            if chr_val > 255:
                raise ValueError('Keys in table must not exceed 255')
            if table[chr_val] is None:
                delete_chars.append(chr(chr_val))
            elif isinstance(table[chr_val], int):
                if table[chr_val] > 255:
                    raise TypeError('table values cannot be more than 255 or less than 0')
                str_table[chr_val] = chr(table[chr_val])
            else:
                if not isinstance(table[chr_val], str):
                    raise TypeError('character mapping must return integer, None or str')
                str_table[chr_val] = table[chr_val]
        str_table = ''.join(str_table)
        delete_chars = ''.join(delete_chars)
        return msg.translate(str_table, delete_chars)

    def translate(msg, table):
        '''Replace values in a unicode string with other values'''
        if not isinstance(msg, unicode):
            raise TypeError('msg must be of type unicode')
        return msg.translate(table)

There are several things that we have to do in this API:

* Because the function names might not be enough of a clue to the user about
  which value types are expected, we have to check that the types are
  correct.
* We keep the behaviour of the two functions as close to the same as possible,
  just with byte :class:`str` and :class:`unicode` strings substituted for
  each other.


-----------------------------------------------------------------
Deciding whether to take str or unicode when no value is returned
-----------------------------------------------------------------

Not all functions have a return value. Sometimes a function is there to
interact with something external to python (for instance, writing a file out
to disk) or a method exists only to update the internal state of a data
structure. One of the main questions with these APIs is whether to take byte
:class:`str`, :class:`unicode` string, or both. The answer depends on your
use case but I'll give some examples here.

Writing to external data
========================

When your information is going to an external data source like writing to
a file you need to decide whether to take in :class:`unicode` strings or byte
:class:`str`. Remember that most external data sources are not going to be
dealing with unicode directly. Instead, they're going to be dealing with
a sequence of bytes that may be interpreted as unicode. With that in mind,
you either need to have the user give you a byte :class:`str` or convert to
a byte :class:`str` inside the function.

Next you need to think about the type of data that you're receiving. If it's
textual data (for instance, this is a chat client and the user is typing
messages that they expect to be read by another person) it probably makes
sense to take in :class:`unicode` strings and do the conversion inside your
function. On the other hand, if this is a lower level function that's passing
data into a network socket, it probably should be taking byte :class:`str`
instead.

Just as noted in the API notes above, you should specify an :attr:`encoding`
and :attr:`errors` argument if you need to transform from :class:`unicode`
string to byte :class:`str` and you are unable to guess the encoding from the
data itself.

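As a hedged sketch of this advice (the ``write_message`` helper and its
signature are invented for the example), a text-oriented writer keeps the
byte conversion inside the function while leaving the encoding and error
policy under the caller's control:

```python
import io

def write_message(fileobj, msg, encoding='utf-8', errors='replace'):
    # textual data arrives as a unicode string; the conversion to the
    # bytes that the external destination needs happens here
    fileobj.write(msg.encode(encoding, errors))

buf = io.BytesIO()
write_message(buf, u'café')
assert buf.getvalue() == b'caf\xc3\xa9'   # utf-8 bytes by default
```

A lower level function that writes to a socket would instead take the byte
:class:`str` directly and do no conversion at all.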
Updating data structures
========================

Sometimes your API is just going to update a data structure and not
immediately output that data anywhere. Just as when writing external data,
you should think about both what your function is going to do with the data
eventually and what the caller of your function is thinking that they're
giving you. Most of the time, you'll want to take :class:`unicode` strings
and enter them into the data structure as :class:`unicode` when the data is
textual in nature. You'll want to take byte :class:`str` and enter them into
the data structure as byte :class:`str` when the data is not text. Use
a naming convention so the user knows what's expected.

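A minimal sketch of that convention (the class and method names here are
invented for illustration): textual fields are stored as text, binary
payloads as bytes, and the ``_b`` suffix in the method name signals which
type is expected.

```python
class MessageStore(object):
    def __init__(self):
        self.subjects = []   # textual data, kept as unicode strings
        self.payloads = []   # non-text data, kept as byte strings

    def add_subject(self, subject):
        # textual in nature: enter it into the structure as unicode
        self.subjects.append(subject)

    def add_payload_b(self, payload):
        # not text (e.g. image data): enter it as a byte string
        self.payloads.append(payload)

store = MessageStore()
store.add_subject(u'hello')
store.add_payload_b(b'\x89PNG')
```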
-------------
APIs to Avoid
-------------

There are a few APIs that are just wrong. If you catch yourself making an API
that does one of these things, change it before anyone sees your code.

Returning unicode unless a conversion fails
===========================================

This type of API usually deals with byte :class:`str` at some point and
converts it to :class:`unicode` because it's usually thought to be text.
However, there are times when the bytes fail to convert to a :class:`unicode`
string. When that happens, this API returns the raw byte :class:`str` instead
of a :class:`unicode` string. One example of this is present in the
|stdlib|_: python2's :func:`os.listdir`::

    >>> import os
    >>> import locale
    >>> locale.getpreferredencoding()
    'UTF-8'
    >>> os.mkdir('/tmp/mine')
    >>> os.chdir('/tmp/mine')
    >>> open('nonsense_char_\xff', 'w').close()
    >>> open('all_ascii', 'w').close()
    >>> os.listdir(u'.')
    [u'all_ascii', 'nonsense_char_\xff']

The problem with APIs like this is that they cause failures that are hard to
debug because they don't happen where the variables are set. For instance,
let's say you take the filenames from :func:`os.listdir` and give it to this
function::

    def normalize_filename(filename):
        '''Change spaces and dashes into underscores'''
        return filename.translate({ord(u' '): u'_', ord(u'-'): u'_'})

When you test this, you use filenames that all are decodable in your preferred
encoding and everything seems to work. But when this code is run on a machine
that has filenames in multiple encodings, the filenames returned by
:func:`os.listdir` suddenly include byte :class:`str`. And byte :class:`str`
has a different :func:`string.translate` function that takes different values.
So the code raises an exception where it's not immediately obvious that
:func:`os.listdir` is at fault.

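The type mismatch itself is easy to demonstrate in isolation (a minimal
sketch; the literals behave the same under python2.6+ and python3 — the text
type's ``translate`` accepts a mapping while the byte type's does not):

```python
table = {ord(u' '): u'_', ord(u'-'): u'_'}

# the text type accepts the mapping and does what we want
assert u'my file-name'.translate(table) == u'my_file_name'

# the byte type's translate takes a different table format entirely,
# so the same call raises instead of translating
try:
    b'my file-name'.translate(table)
except TypeError:
    pass
else:
    raise AssertionError('expected a TypeError')
```

This is exactly the exception that pops up far away from the
:func:`os.listdir` call that produced the byte :class:`str`.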
Ignoring values with no chance of recovery
==========================================

An early version of python3 attempted to fix the :func:`os.listdir` problem
pointed out in the last section by returning all values that were decodable to
:class:`unicode` and omitting the filenames that were not. This led to the
following output::

    >>> import os
    >>> import locale
    >>> locale.getpreferredencoding()
    'UTF-8'
    >>> os.mkdir('/tmp/mine')
    >>> os.chdir('/tmp/mine')
    >>> open(b'nonsense_char_\xff', 'w').close()
    >>> open('all_ascii', 'w').close()
    >>> os.listdir('.')
    ['all_ascii']

The issue with this type of code is that it is silently doing something
surprising. The caller expects to get a full list of files back from
:func:`os.listdir`. Instead, it silently ignores some of the files, returning
only a subset. This leads to code that doesn't do what is expected and may
go unnoticed until the code is in production and someone notices that
something important is being missed.

Raising a UnicodeException with no chance of recovery
=====================================================

Believe it or not, a few libraries exist that make it impossible to deal
with unicode text without raising a :exc:`UnicodeError`. What seems to occur
in these libraries is that the library has functions that expect to receive
a :class:`unicode` string. However, internally, those functions call other
functions that expect to receive a byte :class:`str`. The programmer of the
API was smart enough to convert from a :class:`unicode` string to a byte
:class:`str` but they did not give the user the chance to specify the
encodings to use or how to deal with errors. This results in exceptions when
the user passes in a byte :class:`str` because the initial function wants
a :class:`unicode` string and exceptions when the user passes in
a :class:`unicode` string because the function can't convert the string to
bytes in the encoding that it has selected.

Do not put the user in the position of not being able to use your API without
raising a :exc:`UnicodeError` with certain values. If you can only safely
take :class:`unicode` strings, document that byte :class:`str` is not allowed
and vice versa. If you have to convert internally, make sure to give the
caller of your function parameters to control the encoding and how to treat
errors that may occur during the encoding/decoding process. If your code will
raise a :exc:`UnicodeError` with non-:term:`ASCII` values no matter what, you
should probably rethink your API.

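A minimal sketch of an API that follows this advice (the ``send_line``
name is invented for the example): it converts internally, but the encoding
and the error policy stay in the caller's hands, so no input is forced to
raise.

```python
def send_line(msg, encoding='utf-8', errors='strict'):
    '''Accept text, return the bytes that would go on the wire.'''
    if isinstance(msg, bytes):
        # caller already handed us bytes: pass them through unchanged
        return msg
    return msg.encode(encoding, errors)

assert send_line(u'naïve') == b'na\xc3\xafve'           # default utf-8
assert send_line(u'naïve', 'latin-1') == b'na\xefve'    # caller's encoding
assert send_line(u'naïve', 'ascii', 'replace') == b'na?ve'  # caller's policy
assert send_line(b'raw') == b'raw'
```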
-----------------
Knowing your data
-----------------

If you've read all the way down to this section without skipping, you've seen
several admonitions about how the type of data you are processing affects the
viability of the various API choices.

Here are a few things to consider in your data:

Do you need to operate on both bytes and unicode?
=================================================

Much of the data in libraries, programs, and the general environment outside
of python is written where strings are sequences of bytes. So when we
interact with data that comes from outside of python or data that is about to
leave python it may make sense to only operate on the data as a byte
:class:`str`. There are two times when this may make sense:

1. The user is intended to hand the data to the function and then the function
   takes care of sending the data outside of python (to the filesystem, over
   the network, etc).
2. The data is not representable as text. For instance, writing a binary
   file format.

Even when your code is operating in this area you still need to think a little
more about your data. For instance, it might make sense for the person using
your API to pass in :class:`unicode` strings and let the function convert that
into the byte :class:`str` that it then sends over the wire.

There are also times when it might make sense to operate only on
:class:`unicode` strings. :class:`unicode` represents text so anytime that
you are working on textual data that isn't going to leave python it has the
potential to be a :class:`unicode`-only API. However, there are two things
that you should consider when designing a :class:`unicode`-only API:

1. As your API gains popularity, people are going to use your API in places
   that you may not have thought of. Corner cases in these other places may
   mean that processing bytes is desirable.
2. In python2, byte :class:`str` and :class:`unicode` are often used
   interchangeably with each other. That means that people programming
   against your API may have received :class:`str` from some other API and it
   would be most convenient for their code if your API accepted it.

.. note::

    In python3, the separation between the text type and the byte type
    is clearer. So in python3, there's less need to have all APIs take
    both unicode and bytes.

Can you restrict the encodings?
===============================

If you determine that you have to deal with byte :class:`str` you should
realize that not all encodings are created equal. Each has different
properties that may make it possible to provide a simpler API provided that
you can reasonably tell the users of your API that they cannot use certain
classes of encodings.

As one example, if you are required to find a comma (``,``) in a byte
:class:`str` you have different choices based on what encodings are allowed.
If you can reasonably restrict your API users to only giving :term:`ASCII
compatible` encodings you can do this simply by searching for the literal
comma character because that character will be represented by the same byte
sequence in all :term:`ASCII compatible` encodings.

The following are some classes of encodings to be aware of as you decide how
generic your code needs to be.

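That property is easy to check directly (a quick sketch; the encodings
listed are just illustrative members of each class):

```python
# In ASCII compatible encodings the comma is always the single
# byte 0x2C, so searching for b',' is safe...
for enc in ('ascii', 'utf-8', 'latin-1', 'euc-jp'):
    assert u','.encode(enc) == b','

# ...but in an encoding that is not ASCII compatible the byte
# sequence differs, so the literal search breaks down
assert u','.encode('utf-16-be') == b'\x00,'
```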
Single byte encodings
---------------------

Single byte encodings can only represent 256 total characters. They encode
the :term:`code points` for a character to the equivalent number in a single
byte.

Most single byte encodings are :term:`ASCII compatible`. :term:`ASCII
compatible` encodings are the most likely to be usable without changes to code
so this is good news. A notable exception to this is the `EBCDIC
<http://en.wikipedia.org/wiki/Extended_Binary_Coded_Decimal_Interchange_Code>`_
family of encodings.

Multibyte encodings
-------------------

Multibyte encodings use more than one byte to encode some characters.

Fixed width
~~~~~~~~~~~

Fixed width encodings have a set number of bytes to represent all of the
characters in the character set. ``UTF-32`` is an example of a fixed width
encoding that uses four bytes per character and can express every unicode
character. There are a number of problems with writing APIs that need to
operate on fixed width, multibyte characters. To go back to our earlier
example of finding a comma in a string, we have to realize that even in
``UTF-32`` where the :term:`code point` for :term:`ASCII` characters is the
same as in :term:`ASCII`, the byte sequence for them is different. So you
cannot search for the literal byte character as it may pick up false
positives and may break a byte sequence in an odd place.

Variable Width
~~~~~~~~~~~~~~

ASCII compatible
""""""""""""""""

:term:`UTF-8` and the `EUC <http://en.wikipedia.org/wiki/Extended_Unix_Code>`_
family of encodings are examples of :term:`ASCII compatible` multi-byte
encodings. They achieve this by adhering to two principles:

* All of the :term:`ASCII` characters are represented by the byte that they
  are in the :term:`ASCII` encoding.
* None of the :term:`ASCII` byte sequences are reused in any other byte
  sequence for a different character.

Escaped
"""""""

Some multibyte encodings work by using only bytes from the :term:`ASCII`
encoding but when a particular sequence of those bytes is found, they are
interpreted as meaning something other than their :term:`ASCII` values.
``UTF-7`` is one such encoding that can encode all of the unicode
:term:`code points`. For instance, here are some Japanese characters encoded
as ``UTF-7``::

    >>> a = u'\u304f\u3089\u3068\u307f'
    >>> print a
    くらとみ
    >>> print a.encode('utf-7')
    +ME8wiTBoMH8-

These encodings can be used when you need to encode unicode data that may
contain non-:term:`ASCII` characters for inclusion in an :term:`ASCII` only
transport medium or file.

However, they are not :term:`ASCII compatible` in the sense that we used
earlier as the bytes that represent an :term:`ASCII` character are being
reused as part of other characters. If you were to search for a literal plus
sign in this encoded string, you would run across many false positives, for
instance.

Other
"""""

There are many other popular variable width encodings, for instance ``UTF-16``
and ``shift-JIS``. Many of these are not :term:`ASCII compatible` so you
cannot search for a literal :term:`ASCII` character without danger of false
positives or false negatives.

107
kitchen3/docs/glossary.rst
Normal file

@ -0,0 +1,107 @@
========
Glossary
========

.. glossary::

    "Everything but the kitchen sink"
        An English idiom meaning to include nearly everything that you can
        think of.

    API version
        Version that is meant for computer consumption. This version is
        parsable and comparable by computers. It contains information about
        a library's API so that computer software can decide whether it works
        with the software.

    ASCII
        A character encoding that maps numbers to characters essential to
        American English. It maps 128 characters using 7 bits.

        .. seealso:: http://en.wikipedia.org/wiki/ASCII

    ASCII compatible
        An encoding in which the particular byte that maps to a character in
        the :term:`ASCII` character set is only used to map to that character.
        This excludes EBCDIC based encodings and many multi-byte fixed and
        variable width encodings since they reuse the bytes that make up the
        :term:`ASCII` encoding for other purposes. :term:`UTF-8` is notable
        as a variable width encoding that is :term:`ASCII` compatible.

        .. seealso::

            http://en.wikipedia.org/wiki/Variable-width_encoding
                For another explanation of various ways bytes are mapped to
                characters in a possibly incompatible manner.

    code points
        :term:`code point`

    code point
        A number that maps to a particular abstract character. Code points
        make it so that we have a number pointing to a character without
        worrying about implementation details of how those numbers are stored
        for the computer to read. Encodings define how the code points map to
        particular sequences of bytes on disk and in memory.

    control characters
        :term:`control character`

    control character
        The set of characters in unicode that are used, not to display glyphs
        on the screen, but to tell the display program to do something.

        .. seealso:: http://en.wikipedia.org/wiki/Control_character

    grapheme
        characters or pieces of characters that you might write on a page to
        make words, sentences, or other pieces of text.

        .. seealso:: http://en.wikipedia.org/wiki/Grapheme

    I18N
        I18N is an abbreviation for internationalization. It's often used to
        signify the need to translate words, number and date formats, and
        other pieces of data in a computer program so that it will work well
        for people who speak a language other than your own.

    message catalogs
        :term:`message catalog`

    message catalog
        Message catalogs contain translations for user-visible strings that
        are present in your code. Normally, you need to mark the strings to
        be translated by wrapping them in one of several :mod:`gettext`
        functions. The function serves two purposes:

        1. It allows automated tools to find which strings are supposed to be
           extracted for translation.
        2. The functions perform the translation when the program is running.

        .. seealso::

            `babel's documentation
            <http://babel.edgewall.org/wiki/Documentation/messages.html>`_
                for one method of extracting message catalogs from source
                code.

    Murphy's Law
        "Anything that can go wrong, will go wrong."

        .. seealso:: http://en.wikipedia.org/wiki/Murphy%27s_Law

    release version
        Version that is meant for human consumption. This version is easy for
        a human to look at to decide how a particular version relates to other
        versions of the software.

    textual width
        The amount of horizontal space a character takes up on a monospaced
        screen. The units are number of character cells or columns that it
        takes the place of.

    UTF-8
        A character encoding that maps all unicode :term:`code points` to a
        sequence of bytes. It is compatible with :term:`ASCII`. It uses a
        variable number of bytes to encode all of unicode. ASCII characters
        take one byte. Characters from other parts of unicode take two to
        four bytes. It is widespread as an encoding on the internet and in
        Linux.
359
kitchen3/docs/hacking.rst
Normal file

@ -0,0 +1,359 @@
=======================================
Conventions for contributing to kitchen
=======================================

-----
Style
-----

* Strive to be :pep:`8` compliant
* Run :command:`pylint` over the code and try to resolve most of its
  nitpicking

------------------------
Python 2.4 compatibility
------------------------

At the moment, we're supporting python-2.4 and above. Understand that there
are a lot of python features that we cannot use because of this.

Sometimes modules in the |stdlib|_ can be added to kitchen so that they're
available. When we do that we need to be careful of several things:

1. Keep the module in sync with the version in the python-2.x trunk. Use
   :file:`maintainers/sync-copied-files.py` for this.
2. Sync the unittests as well as the module.
3. Be aware that not all modules are written to remain compatible with
   Python-2.4 and might use python language features that were not present
   then (generator expressions, relative imports, decorators, with, try: with
   both except: and finally:, etc). These are not good candidates for
   importing into kitchen as they require more work to keep synced.

---------
Unittests
---------

* At least smoketest your code (make sure a function will return expected
  values for one set of inputs).
* Note that even 100% coverage is not a guarantee of working code! Good tests
  will realize that you need to also give multiple inputs that test the code
  paths of called functions that are outside of your code. Example::

      def to_unicode(msg, encoding='utf8', errors='replace'):
          return unicode(msg, encoding, errors)

      # Smoketest only. This will give 100% coverage for your code (it
      # tests all of the code inside of to_unicode) but it leaves a lot of
      # room for errors as it doesn't test all combinations of arguments
      # that are then passed to the unicode() function.

      tools.ok_(to_unicode('abc') == u'abc')

      # Better -- tests now cover non-ascii characters and that error
      # conditions occur properly. There's a lot of other permutations
      # that can be added along these same lines.
      tools.ok_(to_unicode('café', 'utf8', 'replace') == u'café')
      tools.assert_raises(UnicodeError, to_unicode,
              u'cafè ñunru'.encode('latin1'), 'utf8', 'strict')

* We're using nose for unittesting. Rather than depend on unittest2
  functionality, use the functions that nose provides.
* Remember to maintain python-2.4 compatibility even in unittests.

----------------------------
Docstrings and documentation
----------------------------

We use sphinx to build our documentation. We use the sphinx autodoc extension
to pull docstrings out of the modules for API documentation. This means that
docstrings for subpackages and modules should follow a certain pattern. The
general structure is:

* Introductory material about a module in the module's top level docstring.

  * Introductory material should begin with a level two title: an overbar and
    underbar of '-'.

* docstrings for every function.

  * The first line is a short summary of what the function does
  * This is followed by a blank line
  * The next lines are a `field list
    <http://sphinx.pocoo.org/markup/desc.html#info-field-lists>`_ giving
    information about the function's signature. We use the keywords:
    ``arg``, ``kwarg``, ``raises``, ``returns``, and sometimes ``rtype``.
    Use these to describe all arguments, keyword arguments, exceptions
    raised, and return values.

    * Parameters that are ``kwarg`` should specify what their default
      behaviour is.

.. _kitchen-versioning:

------------------
Kitchen versioning
------------------

Currently the kitchen library is in early stages of development. While we're
in this state, the main kitchen library uses the following pattern for version
information:

* Versions look like this::

      __version_info__ = ((0, 1, 2),)
      __version__ = '0.1.2'

* The Major version number remains at 0 until we decide to make the first 1.0
  release of kitchen. At that point, we're declaring that we have some
  confidence that we won't need to break backwards compatibility for a while.
* The Minor version increments for any backwards incompatible API changes.
  When this is updated, we reset micro to zero.
* The Micro version increments for any other changes (backwards compatible API
  changes, pure bugfixes, etc).

.. note::

    Versioning is only updated for releases that generate sdists and new
    uploads to the download directory. Usually we update the version
    information for the library just before release. By contrast, we update
    kitchen :ref:`subpackage-versioning` when an API change is made. When in
    doubt, look at the version information in the last release.

----
|
||||||
|
I18N
|
||||||
|
----
|
||||||
|
|
||||||
|
All strings that are used as feedback for users need to be translated.
|
||||||
|
:mod:`kitchen` sets up several functions for this. :func:`_` is used for
|
||||||
|
marking things that are shown to users via print, GUIs, or other "standard"
|
||||||
|
methods. Strings for exceptions are marked with :func:`b_`. This function
|
||||||
|
returns a byte :class:`str` which is needed for use with exceptions::
|
||||||
|
|
||||||
|
from kitchen import _, b_
|
||||||
|
|
||||||
|
def print_message(msg, username):
|
||||||
|
print _('%(user)s, your message of the day is: %(message)s') % {
|
||||||
|
'message': msg, 'user': username}
|
||||||
|
|
||||||
|
raise Exception b_('Test message')
|
||||||
|
|
||||||
|
This serves several purposes:
|
||||||
|
|
||||||
|
* It marks the strings to be extracted by an xgettext-like program.
|
||||||
|
* :func:`_` is a function that will substitute available translations at
|
||||||
|
runtime.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
By using the ``%()s with dict`` style of string formatting, we make this
|
||||||
|
string friendly to translators that may need to reorder the variables when
|
||||||
|
they're translating the string.
|
||||||
|
|
||||||
|
`paver <http://www.blueskyonmars.com/projects/paver/>_` and `babel
|
||||||
|
<http://babel.edgewall.org/>_` are used to extract the strings.
|
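The reordering benefit that the note describes can be seen in a small,
self-contained sketch (the reordered template below is invented for
illustration; it is not a real catalog entry):

```python
# An English template and a hypothetical "translation" that needs the
# substitutions in the opposite order. Because the values are looked up
# by name from a dict, both orderings work with the same data.
english = '%(user)s, your message of the day is: %(message)s'
reordered = 'Message of the day: %(message)s (for %(user)s)'

values = {'user': 'toshio', 'message': 'hello'}
print(english % values)    # toshio, your message of the day is: hello
print(reordered % values)  # Message of the day: hello (for toshio)
```

With positional ``%s`` formatting the translator would be stuck with the
English field order; the dict style removes that constraint.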
-----------
API updates
-----------

Kitchen strives to have a long deprecation cycle so that people have time to
switch away from any APIs that we decide to discard. Discarded APIs should
raise a :exc:`DeprecationWarning` and clearly state in the warning message and
the docstring how to convert old code to use the new interface. An example of
deprecating a function::

    import warnings

    from kitchen import _
    from kitchen.text.converters import to_bytes, to_unicode
    from kitchen.text.new_module import new_function

    def old_function(param):
        '''**Deprecated**

        This function is deprecated. Use
        :func:`kitchen.text.new_module.new_function` instead. If you want
        unicode strings as output, switch to::

            >>> from kitchen.text.new_module import new_function
            >>> output = new_function(param)

        If you want byte strings, use::

            >>> from kitchen.text.new_module import new_function
            >>> from kitchen.text.converters import to_bytes
            >>> output = to_bytes(new_function(param))
        '''
        warnings.warn(_('kitchen.text.old_function is deprecated. Use'
                ' kitchen.text.new_module.new_function instead'),
                DeprecationWarning, stacklevel=2)

        as_unicode = isinstance(param, unicode)
        message = new_function(to_unicode(param))
        if not as_unicode:
            message = to_bytes(message)
        return message

If a particular API change is very intrusive, it may be better to create a new
version of the subpackage and ship both the old version and the new version.

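A caller can verify that a deprecated wrapper actually warns by using the
stdlib :mod:`warnings` machinery. This is a minimal sketch with a stand-in
``old_function``; it does not import kitchen:

```python
import warnings

def old_function(param):
    # Stand-in for the deprecated wrapper shown above.
    warnings.warn('kitchen.text.old_function is deprecated. Use'
                  ' kitchen.text.new_module.new_function instead',
                  DeprecationWarning, stacklevel=2)
    return param

with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is hidden by default in many contexts.
    warnings.simplefilter('always')
    old_function('x')

assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```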
---------
NEWS file
---------

Update the :file:`NEWS` file when you make a change that will be visible to
the users. This is not a ChangeLog file so we don't need to list absolutely
everything but it should give the user an idea of how this version differs
from prior versions. API changes should be listed here explicitly. Bug fixes
can be more general::

    -----
    0.2.0
    -----
    * Relicense to LGPLv2+
    * Add kitchen.text.format module with the following functions:
      textual_width, textual_width_chop.
    * Rename the kitchen.text.utils module to kitchen.text.misc. Use of the
      old names is deprecated but still available.
    * Bug fixes applied to kitchen.pycompat24.defaultdict that fix some
      tracebacks

-------------------
Kitchen subpackages
-------------------

Kitchen itself is a namespace. The kitchen sdist (tarball) provides certain
useful subpackages.

.. seealso::

    `Kitchen addon packages`_
        For information about subpackages not distributed in the kitchen sdist
        that install into the kitchen namespace.

.. _subpackage-versioning:

Versioning
==========

Each subpackage should have its own version information which is independent
of the other kitchen subpackages and the main kitchen library version. This is
used so that code that depends on kitchen APIs can check the version
information. The standard way to do this is to put something like this in the
subpackage's :file:`__init__.py`::

    from kitchen.versioning import version_tuple_to_string

    __version_info__ = ((1, 0, 0),)
    __version__ = version_tuple_to_string(__version_info__)

:attr:`__version_info__` is documented in :mod:`kitchen.versioning`. The
values of the first tuple should describe API changes to the module. There
are at least three numbers present in the tuple: (major, minor, micro). The
major version number is for backwards incompatible changes (for
instance, removing a function, or adding a new mandatory argument to
a function). Whenever one of these occurs, you should increment the major
number and reset minor and micro to zero. The second number is the minor
version. Any time new but backwards compatible changes are introduced this
number should be incremented and the micro version number reset to zero. The
micro version should be incremented when a change is made that does not change
the API at all. This is a common case for bugfixes, for instance.

Version information beyond the first three parts of the first tuple may be
useful for versioning but semantically has a meaning similar to the micro
version.

.. note::

    We update the :attr:`__version_info__` tuple when the API is updated.
    This way there's less chance of forgetting to update the API version when
    a new release is made. However, we try to only increment the version
    numbers a single step for any release. So if kitchen-0.1.0 has
    kitchen.text.__version__ == '1.0.1', kitchen-0.1.1 should have
    kitchen.text.__version__ == '1.0.2' or '1.1.0' or '2.0.0'.

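The bump rules above can be sketched as a small helper (a hypothetical
illustration of the policy, not part of kitchen's API):

```python
def bump_version(version, change):
    # version is a (major, minor, micro) tuple like the first tuple of
    # __version_info__; change names the kind of change being released.
    major, minor, micro = version
    if change == 'incompatible':      # API break: bump major, reset the rest
        return (major + 1, 0, 0)
    if change == 'compatible':        # new backwards compatible API: bump minor
        return (major, minor + 1, 0)
    return (major, minor, micro + 1)  # bugfix or other non-API change

print(bump_version((1, 0, 1), 'compatible'))    # (1, 1, 0)
print(bump_version((1, 1, 0), 'incompatible'))  # (2, 0, 0)
```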
Criteria for subpackages in kitchen
===================================

Subpackages within kitchen should meet these criteria:

* Generally useful or needed for other pieces of kitchen.

* No mandatory requirements outside of the |stdlib|_.

* Optional requirements from outside the |stdlib|_ are allowed. Things with
  mandatory requirements are better placed in `kitchen addon packages`_.

* Somewhat API stable -- this is not a hard requirement. We can change the
  kitchen API. However, it is better not to, as people may come to depend on
  it.

.. seealso::

    `API Updates`_

----------------------
Kitchen addon packages
----------------------

Addon packages are very similar to subpackages integrated into the kitchen
sdist. This section just lists some of the differences to watch out for.

setup.py
========

Your :file:`setup.py` should contain entries like this::

    # It's suggested to use a dotted name like this so the package is easily
    # findable on pypi:
    setup(name='kitchen.config',
        # Include kitchen in the keywords, again, for searching on pypi
        keywords=['kitchen', 'configuration'],
        # This package lives in the directory kitchen/config
        packages=['kitchen.config'],
        # [...]
        )

Package directory layout
========================

Create a :file:`kitchen` directory in the toplevel. Place the addon
subpackage in there. For example::

    ./                          <== toplevel with README, setup.py, NEWS, etc
    kitchen/
    kitchen/__init__.py
    kitchen/config/             <== subpackage directory
    kitchen/config/__init__.py

Fake kitchen module
===================

The :file:`__init__.py` in the :file:`kitchen` directory is special. It
won't be installed. It just needs to pull in the kitchen from the system so
that you are able to test your module. You should be able to use this
boilerplate::

    # Fake module. This is not installed. It's just made to import the real
    # kitchen modules for testing this module
    import pkgutil

    # Extend the __path__ with everything in the real kitchen module
    __path__ = pkgutil.extend_path(__path__, __name__)

.. note::

    :mod:`kitchen` needs to be findable by python for this to work. Installing
    it into the :file:`site-packages` directory or adding it to
    :envvar:`PYTHONPATH` will work.

Your unittests should now be able to find both your submodule and the main
kitchen module.

Versioning
==========

It is recommended that addon packages version similarly to
:ref:`subpackage-versioning`. The :data:`__version_info__` and
:data:`__version__` strings can be changed independently of the version
exposed by setup.py so that you have both an API version
(:data:`__version_info__`) and a release version that's easier for people to
parse. However, you aren't required to do this and you could follow
a different methodology if you want (for instance, :ref:`kitchen-versioning`).

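How :func:`pkgutil.extend_path` finds the real package can be demonstrated
with a throwaway directory on :data:`sys.path` (the ``kitchen_fake_demo``
name and the paths below are invented for this demonstration):

```python
import os
import sys
import tempfile
import pkgutil

# Create a directory on sys.path containing a 'kitchen_fake_demo' package.
base = tempfile.mkdtemp()
pkg_dir = os.path.join(base, 'kitchen_fake_demo')
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, '__init__.py'), 'w').close()
sys.path.insert(0, base)

# extend_path scans sys.path for other directories that provide the same
# package and appends them -- this is what the boilerplate above relies on.
merged = pkgutil.extend_path(['/hypothetical/installed/kitchen_fake_demo'],
                             'kitchen_fake_demo')
assert pkg_dir in merged

sys.path.remove(base)
```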
140
kitchen3/docs/index.rst
Normal file
@@ -0,0 +1,140 @@

================================
Kitchen, everything but the sink
================================

:Author: Toshio Kuratomi
:Date: 19 March 2011
:Version: 1.0.x

We've all done it. In the process of writing a brand new application we've
discovered that we need a little bit of code that we've invented before.
Perhaps it's something to handle unicode text. Perhaps it's something to make
a bit of python-2.5 code run on python-2.4. Whatever it is, it ends up being
a tiny bit of code that seems too small to worry about pushing into its own
module so it sits there, a part of your current project, waiting to be cut and
pasted into your next project. And the next. And the next. And since that
little bitty bit of code proved so useful to you, it's highly likely that it
proved useful to someone else as well. Useful enough that they've written it
and copy and pasted it over and over into each of their new projects.

Well, no longer! Kitchen aims to pull these small snippets of code into a few
python modules which you can import and use within your project. No more copy
and paste! Now you can let someone else maintain and release these small
snippets so that you can get on with your life.

This package forms the core of Kitchen. It contains some useful modules for
using newer |stdlib|_ modules on older python versions, text manipulation,
:pep:`386` versioning, and initializing :mod:`gettext`. With this package we're
trying to provide a few useful features that don't have too many dependencies
outside of the |stdlib|_. We'll be releasing other modules that drop into the
kitchen namespace to add other features (possibly with larger deps) as time
goes on.

------------
Requirements
------------

We've tried to keep the core kitchen module's requirements lightweight. At the
moment kitchen only requires

:python: 2.4 or later

.. warning:: Kitchen-1.1.0 was the last release to support python-2.3.x.

Soft Requirements
=================

If found, these libraries will be used to make the implementation of some part
of kitchen better in some way. If they are not present, the API that they
enable will still exist but may function in a different manner.

`chardet <http://pypi.python.org/pypi/chardet>`_
    Used in :func:`~kitchen.text.misc.guess_encoding` and
    :func:`~kitchen.text.converters.guess_encoding_to_xml` to help guess the
    encoding of byte strings being converted. If not present, unknown
    encodings will be converted as if they were ``latin1``.

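The fallback behaviour described for ``chardet`` can be sketched as follows
(a simplified illustration, not kitchen's actual implementation):

```python
def guess_encoding_with_fallback(byte_string):
    # Ask chardet for a guess when it is installed; otherwise fall back to
    # latin1, which can decode any sequence of bytes.
    try:
        import chardet
    except ImportError:
        chardet = None
    if chardet is not None:
        guess = chardet.detect(byte_string).get('encoding')
        if guess:
            return guess
    return 'latin1'

# 'café' encoded as utf-8; the chosen encoding can always decode it.
encoding = guess_encoding_with_fallback(b'caf\xc3\xa9')
text = b'caf\xc3\xa9'.decode(encoding)
```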
---------------------------
Other Recommended Libraries
---------------------------

These libraries implement commonly used functionality that everyone seems to
invent. Rather than reinvent their wheel, I simply list the things that they
do well for now. Perhaps if people can't find them normally, I'll add them as
requirements in :file:`setup.py` or link them into kitchen's namespace. For
now, I just mention them here:

`bunch <http://pypi.python.org/pypi/bunch/>`_
    Bunch is a dictionary that you can access with attribute lookup as well
    as with bracket notation. Setting it apart from most homebrewed
    implementations is the :func:`bunchify` function which will descend
    nested structures of lists and dicts, transforming the dicts to Bunches.
`hashlib <http://code.krypto.org/python/hashlib/>`_
    Python 2.5 and forward have a :mod:`hashlib` library that provides secure
    hash functions to python. If you're developing for python2.4 though, you
    can install the standalone hashlib library and have access to the same
    functions.
`iterutils <http://pypi.python.org/pypi/iterutils/>`_
    The python documentation for :mod:`itertools` has some examples
    of other nice iterable functions that can be built from the
    :mod:`itertools` functions. This third-party module creates those recipes
    as a module.
`ordereddict <http://pypi.python.org/pypi/ordereddict/>`_
    Python 2.7 and forward have a :mod:`~collections.OrderedDict` that
    provides a :class:`dict` whose items are ordered (and indexable) as well
    as named.
`unittest2 <http://pypi.python.org/pypi/unittest2>`_
    Python 2.7 has an updated :mod:`unittest` library with new functions not
    present in the |stdlib|_ for Python 2.6 or less. If you want to use those
    new functions but need your testing framework to be compatible with older
    Python, the unittest2 library provides the update as an external module.
`nose <http://somethingaboutorange.com/mrl/projects/nose/>`_
    If you want to use a test discovery tool instead of the unittest
    framework, nosetests provides a simple-to-use way to do that.

-------
License
-------

This python module is distributed under the terms of the
`GNU Lesser General Public License Version 2 or later
<http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html>`_.

.. note:: Some parts of this module are licensed under terms less restrictive
    than the LGPLv2+. If you separate these files from the work as a whole
    you are allowed to use them under the less restrictive licenses. The
    following is a list of the files that are known:

    `Python 2 license <http://www.python.org/download/releases/2.4/license/>`_
        :file:`_subprocess.py`, :file:`test_subprocess.py`,
        :file:`defaultdict.py`, :file:`test_defaultdict.py`,
        :file:`_base64.py`, and :file:`test_base64.py`

--------
Contents
--------

.. toctree::
    :maxdepth: 2

    tutorial
    api-overview
    porting-guide-0.3
    hacking
    glossary

------------------
Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

-------------
Project Pages
-------------

More information about the project can be found on the |projpage|_.

The latest published version of this documentation can be found on the
|docpage|_.

209
kitchen3/docs/porting-guide-0.3.rst
Normal file
@@ -0,0 +1,209 @@

===================
1.0.0 Porting Guide
===================

The 0.1 through 1.0.0 releases focused on bringing in functions from yum and
python-fedora. This porting guide tells how to port from those APIs to their
kitchen replacements.

-------------
python-fedora
-------------

=================================== ===================
python-fedora                       kitchen replacement
----------------------------------- -------------------
:func:`fedora.iterutils.isiterable` :func:`kitchen.iterutils.isiterable` [#f1]_
:func:`fedora.textutils.to_unicode` :func:`kitchen.text.converters.to_unicode`
:func:`fedora.textutils.to_bytes`   :func:`kitchen.text.converters.to_bytes`
=================================== ===================

.. [#f1] :func:`~kitchen.iterutils.isiterable` has changed slightly in
    kitchen. The :attr:`include_string` attribute has switched its default
    value from :data:`True` to :data:`False`. So you need to change code
    like::

        >>> # Old code
        >>> isiterable('abcdef')
        True
        >>> # New code
        >>> isiterable('abcdef', include_string=True)
        True

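The new default can be sketched with a simplified stand-in (this illustrates
the documented semantics; it is not kitchen's implementation):

```python
def isiterable(obj, include_string=False):
    # Strings are iterable, but are only reported as such when the caller
    # passes include_string=True -- the new kitchen default described above.
    if isinstance(obj, (str, bytes)) and not include_string:
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(isiterable('abcdef'))                       # False (new default)
print(isiterable('abcdef', include_string=True))  # True
print(isiterable(['a', 'b']))                     # True
```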
---
yum
---

================================= ===================
yum                               kitchen replacement
--------------------------------- -------------------
:func:`yum.i18n.dummy_wrapper`    :meth:`kitchen.i18n.DummyTranslations.ugettext` [#y1]_
:func:`yum.i18n.dummyP_wrapper`   :meth:`kitchen.i18n.DummyTranslations.ungettext` [#y1]_
:func:`yum.i18n.utf8_width`       :func:`kitchen.text.display.textual_width`
:func:`yum.i18n.utf8_width_chop`  :func:`kitchen.text.display.textual_width_chop`
                                  and :func:`kitchen.text.display.textual_width` [#y2]_ [#y4]_
:func:`yum.i18n.utf8_valid`       :func:`kitchen.text.misc.byte_string_valid_encoding`
:func:`yum.i18n.utf8_text_wrap`   :func:`kitchen.text.display.wrap` [#y3]_
:func:`yum.i18n.utf8_text_fill`   :func:`kitchen.text.display.fill` [#y3]_
:func:`yum.i18n.to_unicode`       :func:`kitchen.text.converters.to_unicode` [#y5]_
:func:`yum.i18n.to_unicode_maybe` :func:`kitchen.text.converters.to_unicode` [#y5]_
:func:`yum.i18n.to_utf8`          :func:`kitchen.text.converters.to_bytes` [#y5]_
:func:`yum.i18n.to_str`           :func:`kitchen.text.converters.to_unicode`
                                  or :func:`kitchen.text.converters.to_bytes` [#y6]_
:func:`yum.i18n.str_eq`           :func:`kitchen.text.misc.str_eq`
:func:`yum.misc.to_xml`           :func:`kitchen.text.converters.unicode_to_xml`
                                  or :func:`kitchen.text.converters.byte_string_to_xml` [#y7]_
:func:`yum.i18n._`                See: :ref:`yum-i18n-init`
:func:`yum.i18n.P_`               See: :ref:`yum-i18n-init`
:func:`yum.i18n.exception2msg`    :func:`kitchen.text.converters.exception_to_unicode`
                                  or :func:`kitchen.text.converters.exception_to_bytes` [#y8]_
================================= ===================

.. [#y1] These yum methods provided fallback support for :mod:`gettext`
    functions in case either ``gaftonmode`` was set or :mod:`gettext` failed
    to return an object. In kitchen, we can use the
    :class:`kitchen.i18n.DummyTranslations` object to fulfill that role.
    Please see :ref:`yum-i18n-init` for more suggestions on how to do this.

.. [#y2] The yum version of these functions returned a byte :class:`str`. The
    kitchen version listed here returns a :class:`unicode` string. If you
    need a byte :class:`str` simply call
    :func:`kitchen.text.converters.to_bytes` on the result.

.. [#y3] The yum version of these functions would return either a byte
    :class:`str` or a :class:`unicode` string depending on what the input
    value was. The kitchen version always returns :class:`unicode` strings.

.. [#y4] :func:`yum.i18n.utf8_width_chop` performed two functions. It
    returned the piece of the message that fit in a specified width and the
    width of that message. In kitchen, you need to call two functions, one
    for each action::

        >>> # Old way
        >>> utf8_width_chop(msg, 5)
        (5, 'く ku')
        >>> # New way
        >>> from kitchen.text.display import textual_width, textual_width_chop
        >>> (textual_width(msg), textual_width_chop(msg, 5))
        (5, u'く ku')

.. [#y5] If the yum version of :func:`~yum.i18n.to_unicode` or
    :func:`~yum.i18n.to_utf8` is given an object that is not a string, it
    returns the object itself. :func:`kitchen.text.converters.to_unicode` and
    :func:`kitchen.text.converters.to_bytes` default to returning the
    ``simplerepr`` of the object instead. If you want the yum behaviour, set
    the :attr:`nonstring` parameter to ``passthru``::

        >>> from kitchen.text.converters import to_unicode
        >>> to_unicode(5)
        u'5'
        >>> to_unicode(5, nonstring='passthru')
        5

.. [#y6] :func:`yum.i18n.to_str` could return either a byte :class:`str` or
    a :class:`unicode` string. In kitchen you can get the same effect but you
    get to choose whether you want a byte :class:`str` or a :class:`unicode`
    string. Use :func:`~kitchen.text.converters.to_bytes` for :class:`str`
    and :func:`~kitchen.text.converters.to_unicode` for :class:`unicode`.

.. [#y7] :func:`yum.misc.to_xml` was buggy as written. I think the intention
    was for you to be able to pass a byte :class:`str` or :class:`unicode`
    string in and get out a byte :class:`str` that was valid to use in an xml
    file. The two kitchen functions
    :func:`~kitchen.text.converters.byte_string_to_xml` and
    :func:`~kitchen.text.converters.unicode_to_xml` do that for each string
    type.

.. [#y8] When porting :func:`yum.i18n.exception2msg` to use kitchen, you
    should set up two wrapper functions to aid in your port. They'll look
    like this:

    .. code-block:: python

        from kitchen.text.converters import EXCEPTION_CONVERTERS, \
                BYTE_EXCEPTION_CONVERTERS, exception_to_unicode, \
                exception_to_bytes

        def exception2umsg(e):
            '''Return a unicode representation of an exception'''
            c = [lambda e: e.value]
            c.extend(EXCEPTION_CONVERTERS)
            return exception_to_unicode(e, converters=c)

        def exception2bmsg(e):
            '''Return a utf8 encoded str representation of an exception'''
            c = [lambda e: e.value]
            c.extend(BYTE_EXCEPTION_CONVERTERS)
            return exception_to_bytes(e, converters=c)

    The reason to define this wrapper is that many of the exceptions in yum
    put the message in the :attr:`value` attribute of the :exc:`Exception`
    instead of adding it to the :attr:`args` attribute. So the default
    :data:`~kitchen.text.converters.EXCEPTION_CONVERTERS` don't know where to
    find the message. The wrapper tells kitchen to check the :attr:`value`
    attribute for the message. The reason to define two wrappers may be less
    obvious. :func:`yum.i18n.exception2msg` can return a :class:`unicode`
    string or a byte :class:`str` depending on a combination of what
    attributes are present on the :exc:`Exception` and what locale the
    function is being run in. By contrast,
    :func:`kitchen.text.converters.exception_to_unicode` only returns
    :class:`unicode` strings and
    :func:`kitchen.text.converters.exception_to_bytes` only returns byte
    :class:`str`. This is much safer as it keeps code that can only handle
    :class:`unicode` or only handle byte :class:`str` correctly from getting
    the wrong type when an input changes, but it means you need to examine
    the calling code when porting from :func:`yum.i18n.exception2msg` and use
    the appropriate wrapper.

.. _yum-i18n-init:

Initializing Yum i18n
=====================

Previously, yum had several pieces of code to initialize i18n. From the
toplevel of :file:`yum/i18n.py`::

    try:
        '''
        Setup the yum translation domain and make _() and P_() translation wrappers
        available.
        using ugettext to make sure translated strings are in Unicode.
        '''
        import gettext
        t = gettext.translation('yum', fallback=True)
        _ = t.ugettext
        P_ = t.ungettext
    except:
        '''
        Something went wrong so we make a dummy _() wrapper there is just
        returning the same text
        '''
        _ = dummy_wrapper
        P_ = dummyP_wrapper

With kitchen, this can be changed to::

    from kitchen.i18n import easy_gettext_setup, DummyTranslations
    try:
        _, P_ = easy_gettext_setup('yum')
    except:
        translations = DummyTranslations()
        _ = translations.ugettext
        P_ = translations.ungettext

.. note:: In :ref:`overcoming-frustration`, it is mentioned that for some
    things (like exception messages), using the byte :class:`str` oriented
    functions is more appropriate. If this is desired, the setup portion is
    only a second call to :func:`kitchen.i18n.easy_gettext_setup`::

        b_, bP_ = easy_gettext_setup('yum', use_unicode=False)

The second place where i18n is setup is in :meth:`yum.YumBase._getConfig` in
:file:`yum/__init__.py` if ``gaftonmode`` is in effect::

    if startupconf.gaftonmode:
        global _
        _ = yum.i18n.dummy_wrapper

This can be changed to::

    if startupconf.gaftonmode:
        global _
        _ = DummyTranslations().ugettext

19
kitchen3/docs/tutorial.rst
Normal file
@@ -0,0 +1,19 @@

================================
Using kitchen to write good code
================================

Kitchen's functions won't automatically make you a better programmer. You
have to learn when and how to use them as well. This section of the
documentation is intended to show you some of the ways that you can apply
kitchen's functions to problems that may have arisen in your life. The goal
of this section is to give you enough information to understand what the
kitchen API can do for you and where in the :ref:`KitchenAPI` docs to look
for something that can help you with your next issue. Along the way,
you might pick up the knack for identifying issues with your code before you
publish it. And that *will* make you a better coder.

.. toctree::
    :maxdepth: 2

    unicode-frustrations
    designing-unicode-apis

504
kitchen3/docs/unicode-frustrations.rst
Normal file
@@ -0,0 +1,504 @@

.. _overcoming-frustration:
|
||||||
|
|
||||||
|
==========================================================
|
||||||
|
Overcoming frustration: Correctly using unicode in python2
|
||||||
|
==========================================================
|
||||||
|
|
||||||
|
In python-2.x, there's two types that deal with text.
|
||||||
|
|
||||||
|
1. :class:`str` is for strings of bytes. These are very similar in nature to
|
||||||
|
how strings are handled in C.
|
||||||
|
2. :class:`unicode` is for strings of unicode :term:`code points`.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
**Just what the dickens is "Unicode"?**
|
||||||
|
|
||||||
|
One mistake that people encountering this issue for the first time make is
|
||||||
|
confusing the :class:`unicode` type and the encodings of unicode stored in
|
||||||
|
the :class:`str` type. In python, the :class:`unicode` type stores an
|
||||||
|
abstract sequence of :term:`code points`. Each :term:`code point`
|
||||||
|
represents a :term:`grapheme`. By contrast, byte :class:`str` stores
|
||||||
|
a sequence of bytes which can then be mapped to a sequence of :term:`code
|
||||||
|
points`. Each unicode encoding (:term:`UTF-8`, UTF-7, UTF-16, UTF-32,
|
||||||
|
etc) maps different sequences of bytes to the unicode :term:`code points`.
|
||||||
|
|
||||||
|
What does that mean to you as a programmer? When you're dealing with text
|
||||||
|
manipulations (finding the number of characters in a string or cutting
|
||||||
|
a string on word boundaries) you should be dealing with :class:`unicode`
|
||||||
|
strings as they abstract characters in a manner that's appropriate for
|
||||||
|
thinking of them as a sequence of letters that you will see on a page.
|
||||||
|
When dealing with I/O, reading to and from the disk, printing to
|
||||||
|
a terminal, sending something over a network link, etc, you should be dealing
|
||||||
|
with byte :class:`str` as those devices are going to need to deal with
|
||||||
|
concrete implementations of what bytes represent your abstract characters.
|
||||||
|
|
||||||
|
In the python2 world many APIs use these two classes interchangably but there
|
||||||
|
are several important APIs where only one or the other will do the right
|
||||||
|
thing. When you give the wrong type of string to an API that wants the other
|
||||||
|
type, you may end up with an exception being raised (:exc:`UnicodeDecodeError`
|
||||||
|
or :exc:`UnicodeEncodeError`). However, these exceptions aren't always raised
|
||||||
|
because python implicitly converts between types... *sometimes*.
|
||||||
|

-----------------------------------
Frustration #1: Inconsistent Errors
-----------------------------------

Although converting when possible seems like the right thing to do, it's
actually the first source of frustration.  A programmer can test out their
program with a string like: ``The quick brown fox jumped over the lazy dog``
and not encounter any issues.  But when they release their software into the
wild, someone enters the string: ``I sat down for coffee at the café`` and
suddenly an exception is thrown.  The reason?  The mechanism that converts
between the two types is only able to deal with :term:`ASCII` characters.
Once you throw non-:term:`ASCII` characters into your strings, you have to
start dealing with the conversion manually.

So, if I manually convert everything to either byte :class:`str` or
:class:`unicode` strings, will I be okay?  The answer is.... *sometimes*.
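The café failure can be reproduced, and avoided, with explicit conversions;
this sketch behaves the same under python2 and python3:

```python
text = u'I sat down for coffee at the caf\xe9'

# Explicitly encoding with a capable codec handles any character
data = text.encode('utf-8')
assert data.decode('utf-8') == text

# The ASCII codec -- what python2's implicit conversion uses -- cannot
try:
    text.encode('ascii')
except UnicodeEncodeError:
    pass  # expected: \xe9 is outside the ASCII range
```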

---------------------------------
Frustration #2: Inconsistent APIs
---------------------------------

The problem you run into when converting everything to byte :class:`str` or
:class:`unicode` strings is that you'll be using someone else's API quite
often (this includes the APIs in the |stdlib|_) and find that the API will only
accept byte :class:`str` or only accept :class:`unicode` strings.  Or worse,
that the code will accept either when you're dealing with strings that consist
solely of :term:`ASCII` but throw an error when you give it a string that's
got non-:term:`ASCII` characters.  When you encounter these APIs you first
need to identify which type will work better and then you have to convert your
values to the correct type for that code.  Thus the programmer that wants to
proactively fix all unicode errors in their code needs to do two things:

1. You must keep track of what type your sequences of text are.  Does
   ``my_sentence`` contain :class:`unicode` or :class:`str`?  If you don't
   know that then you're going to be in for a world of hurt.
2. Anytime you call a function you need to evaluate whether that function will
   do the right thing with :class:`str` or :class:`unicode` values.  Sending
   the wrong value here will lead to a :exc:`UnicodeError` being thrown when
   the string contains non-:term:`ASCII` characters.
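One way to keep point 1 manageable is to normalize at the call site with
a tiny helper.  This is a hypothetical sketch in the spirit of kitchen's
``to_unicode``, not its actual implementation:

```python
def guarantee_text(value, encoding='utf-8'):
    """Return text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        # 'replace' avoids a traceback on undecodable bytes
        return value.decode(encoding, 'replace')
    return value

# Both call styles now yield the same unicode value
assert guarantee_text(b'caf\xc3\xa9') == guarantee_text(u'caf\xe9')
```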

.. note::

    There is one mitigating factor here.  The python community has been
    standardizing on using :class:`unicode` in all its APIs.  Although there
    are some APIs that you need to send byte :class:`str` to in order to be
    safe (including things as ubiquitous as :func:`print` as we'll see in the
    next section), it's getting easier and easier to use :class:`unicode`
    strings with most APIs.

------------------------------------------------
Frustration #3: Inconsistent treatment of output
------------------------------------------------

Alright, since the python community is moving to using :class:`unicode`
strings everywhere, we might as well convert everything to :class:`unicode`
strings and use that by default, right?  Sounds good most of the time but
there's at least one huge caveat to be aware of.  Anytime you output text to
the terminal or to a file, the text has to be converted into a byte
:class:`str`.  Python will try to implicitly convert from :class:`unicode` to
byte :class:`str`... but it will throw an exception if the bytes are
non-:term:`ASCII`::

    >>> string = unicode(raw_input(), 'utf8')
    café
    >>> log = open('/var/tmp/debug.log', 'w')
    >>> log.write(string)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

Okay, this is simple enough to solve: Just convert to a byte :class:`str` and
we're all set::

    >>> string = unicode(raw_input(), 'utf8')
    café
    >>> string_for_output = string.encode('utf8', 'replace')
    >>> log = open('/var/tmp/debug.log', 'w')
    >>> log.write(string_for_output)
    >>>

So that was simple, right?  Well... there's one gotcha that makes things a bit
harder to debug sometimes.  When you attempt to write non-:term:`ASCII`
:class:`unicode` strings to a file-like object you get a traceback every time.
But what happens when you use :func:`print`?  The terminal is a file-like
object so it should raise an exception, right?  The answer to that is....
*sometimes*:

.. code-block:: pycon

    $ python
    >>> print u'café'
    café

No exception.  Okay, we're fine then?

We are until someone does one of the following:

* Runs the script in a different locale:

  .. code-block:: pycon

      $ LC_ALL=C python
      >>> # Note: if you're using a good terminal program when running in the C locale
      >>> # The terminal program will prevent you from entering non-ASCII characters
      >>> # python will still recognize them if you use the codepoint instead:
      >>> print u'caf\xe9'
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

* Redirects output to a file:

  .. code-block:: pycon

      $ cat test.py
      #!/usr/bin/python -tt
      # -*- coding: utf-8 -*-
      print u'café'
      $ ./test.py >t
      Traceback (most recent call last):
        File "./test.py", line 4, in <module>
          print u'café'
      UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

Okay, the locale thing is a pain but understandable: the C locale doesn't
understand any characters outside of :term:`ASCII` so naturally attempting to
display those won't work.  Now why does redirecting to a file cause problems?
It's because :func:`print` in python2 is treated specially.  Whereas the other
file-like objects in python always convert to :term:`ASCII` unless you set
them up differently, using :func:`print` to output to the terminal will use
the user's locale to convert before sending the output to the terminal.  When
:func:`print` is not outputting to the terminal (being redirected to a file,
for instance), :func:`print` decides that it doesn't know what locale to use
for that file and so it tries to convert to :term:`ASCII` instead.

So what does this mean for you, as a programmer?  Unless you have the luxury
of controlling how your users use your code, you should always, always, always
convert to a byte :class:`str` before outputting strings to the terminal or to
a file.  Python even provides you with a facility to do just this.  If you
know that every :class:`unicode` string you send to a particular file-like
object (for instance, :data:`~sys.stdout`) should be converted to a particular
encoding you can use a :class:`codecs.StreamWriter` object to convert from
a :class:`unicode` string into a byte :class:`str`.  In particular,
:func:`codecs.getwriter` will return a :class:`~codecs.StreamWriter` class
that will help you to wrap a file-like object for output.  Using our
:func:`print` example:

.. code-block:: python

    $ cat test.py
    #!/usr/bin/python -tt
    # -*- coding: utf-8 -*-
    import codecs
    import sys

    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout)
    print u'café'
    $ ./test.py >t
    $ cat t
    café

-----------------------------------------
Frustrations #4 and #5 -- The other shoes
-----------------------------------------

In English, there's a saying "waiting for the other shoe to drop".  It means
that when one event (usually bad) happens, you come to expect another event
(usually worse) to come after.  In this case we have two other shoes.


Frustration #4: Now it doesn't take byte strings?!
==================================================

If you wrap :data:`sys.stdout` using :func:`codecs.getwriter` and think you
are now safe to print any variable without checking its type I am afraid
I must inform you that you're not paying enough attention to :term:`Murphy's
Law`.  The :class:`~codecs.StreamWriter` that :func:`codecs.getwriter`
provides will take :class:`unicode` strings and transform them into byte
:class:`str` before they get to :data:`sys.stdout`.  The problem is if you
give it something that's already a byte :class:`str` it tries to transform
that as well.  To do that it tries to turn the byte :class:`str` you give it
into :class:`unicode` and then transform that back into a byte :class:`str`...
and since it uses the :term:`ASCII` codec to perform those conversions,
chances are that it'll blow up when making them::

    >>> import codecs
    >>> import sys
    >>> UTF8Writer = codecs.getwriter('utf8')
    >>> sys.stdout = UTF8Writer(sys.stdout)
    >>> print 'café'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/codecs.py", line 351, in write
        data, consumed = self.encode(object, self.errors)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

To work around this, kitchen provides an alternate version of
:func:`codecs.getwriter` that can deal with both byte :class:`str` and
:class:`unicode` strings.  Use :func:`kitchen.text.converters.getwriter` in
place of the :mod:`codecs` version like this::

    >>> import sys
    >>> from kitchen.text.converters import getwriter
    >>> UTF8Writer = getwriter('utf8')
    >>> sys.stdout = UTF8Writer(sys.stdout)
    >>> print u'café'
    café
    >>> print 'café'
    café

-------------------------------------------
Frustration #5: Inconsistent APIs Part deux
-------------------------------------------

Sometimes you do everything right in your code but other people's code fails
you.  With unicode issues this happens more often than we want.  A glaring
example of this is when you get values back from a function that aren't
consistently :class:`unicode` strings or byte :class:`str`.

An example from the |stdlib|_ is :mod:`gettext`.  The :mod:`gettext` functions
are used to help translate messages that you display to users in the users'
native languages.  Since most languages contain letters outside of the
:term:`ASCII` range, the values that are returned contain unicode characters.
:mod:`gettext` provides you with :meth:`~gettext.GNUTranslations.ugettext` and
:meth:`~gettext.GNUTranslations.ungettext` to return these translations as
:class:`unicode` strings and :meth:`~gettext.GNUTranslations.gettext`,
:meth:`~gettext.GNUTranslations.ngettext`,
:meth:`~gettext.GNUTranslations.lgettext`, and
:meth:`~gettext.GNUTranslations.lngettext` to return them as encoded byte
:class:`str`.  Unfortunately, even though they're documented to return only
one type of string or the other, the implementation has corner cases where the
wrong type can be returned.

This means that even if you separate your :class:`unicode` string and byte
:class:`str` correctly before you pass your strings to a :mod:`gettext`
function, afterwards, you might have to check that you have the right sort of
string type again.

.. note::

    :mod:`kitchen.i18n` provides alternate gettext translation objects that
    return only byte :class:`str` or only :class:`unicode` strings.

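A defensive wrapper around such an API can be sketched like this.  Here
``translate`` is a stand-in for any third-party function with inconsistent
return types, not a real :mod:`gettext` call:

```python
def translate(msg):
    # stand-in: a third-party function that sometimes returns bytes
    return msg.encode('utf-8') if msg.startswith(u'b:') else msg

def translate_to_text(msg):
    """Call the inconsistent API, then normalize the result to text."""
    result = translate(msg)
    if isinstance(result, bytes):
        result = result.decode('utf-8', 'replace')
    return result

# Both code paths now come back as the text type
assert isinstance(translate_to_text(u'b:caf\xe9'), type(u''))
assert isinstance(translate_to_text(u'caf\xe9'), type(u''))
```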
---------------
A few solutions
---------------

Now that we've identified the issues, can we define a comprehensive strategy
for dealing with them?

Convert text at the border
==========================

If you get some piece of text from a library, read from a file, etc, turn it
into a :class:`unicode` string immediately.  Since python is moving in the
direction of :class:`unicode` strings everywhere it's going to be easier to
work with :class:`unicode` strings within your code.

If your code is heavily involved with using things that are bytes, you can do
the opposite and convert all text into byte :class:`str` at the border and
only convert to :class:`unicode` when you need it for passing to another
library or performing string operations on it.

In either case, the important thing is to pick a default type for strings and
stick with it throughout your code.  When you mix the types it becomes much
easier to operate on a string with a function that can only use the other type
by mistake.

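The first strategy looks like this in miniature (decode on the way in, encode
on the way out; runs under python2 and python3 alike):

```python
raw = b'caf\xc3\xa9\n'            # bytes as read from a file or socket

text = raw.decode('utf-8')        # border: bytes -> unicode, immediately
assert len(text.strip()) == 4     # len() now counts characters, not bytes

out = text.encode('utf-8')        # border: unicode -> bytes, only on output
assert out == raw
```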
.. note:: In python3, the abstract unicode type becomes much more prominent.
    The type named ``str`` is the equivalent of python2's :class:`unicode` and
    python3's ``bytes`` type replaces python2's :class:`str`.  Most APIs deal
    in the unicode type of string with just some pieces that are low level
    dealing with bytes.  The implicit conversions between bytes and unicode
    are removed and whenever you want to make the conversion you need to do so
    explicitly.

When the data needs to be treated as bytes (or unicode) use a naming convention
===============================================================================

Sometimes you're converting nearly all of your data to :class:`unicode`
strings but you have one or two values where you have to keep byte
:class:`str` around.  This is often the case when you need to use the value
verbatim with some external resource.  For instance, filenames or key values
in a database.  When you do this, use a naming convention for the data you're
working with so you (and others reading your code later) don't get confused
about what's being stored in the value.

If you need both a textual string to present to the user and a byte value for
an exact match, consider keeping both versions around.  You can either use two
variables for this or a :class:`dict` whose key is the byte value.

.. note:: You can use the naming convention used in kitchen as a guide for
    implementing your own naming convention.  It prefixes byte :class:`str`
    variables of unknown encoding with ``b_`` and byte :class:`str` of known
    encoding with the encoding name like: ``utf8_``.  If the default was to
    handle :class:`str` and only keep a few :class:`unicode` values, those
    variables would be prefixed with ``u_``.

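The convention in use; the names ``b_filename`` and ``display_name`` here are
illustrative, not kitchen APIs:

```python
b_filename = b'caf\xc3\xa9.txt'   # b_ prefix: exact bytes for the filesystem

def display_name(b_name):
    # the decoded form is for humans only; the bytes stay authoritative
    return b_name.decode('utf-8', 'replace')

assert display_name(b_filename) == u'caf\xe9.txt'
```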
When outputting data, convert back into bytes
=============================================

When you go to send your data back outside of your program (to the filesystem,
over the network, displaying to the user, etc) turn the data back into a byte
:class:`str`.  How you do this will depend on the expected output format of
the data.  For displaying to the user, you can use the user's default encoding
using :func:`locale.getpreferredencoding`.  For entering into a file, your
best bet is to pick a single encoding and stick with it.

.. warning::

    When using the encoding that the user has set (for instance, using
    :func:`locale.getpreferredencoding`), remember that they may have their
    encoding set to something that can't display every single unicode
    character.  That means when you convert from :class:`unicode` to a byte
    :class:`str` you need to decide what should happen if the byte value is
    not valid in the user's encoding.  For purposes of displaying messages to
    the user, it's usually okay to use the ``replace`` encoding error handler
    to replace the invalid characters with a question mark or other symbol
    meaning the character couldn't be displayed.

You can use :func:`kitchen.text.converters.getwriter` to do this automatically
for :data:`sys.stdout`.  When creating exception messages be sure to convert
to bytes manually.

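The ``replace`` error handler from the warning above in action.  ASCII is
forced here so the result is deterministic; in real code you would pass the
encoding obtained from :func:`locale.getpreferredencoding`:

```python
message = u'I sat down for coffee at the caf\xe9'

# An encoding that can't represent every character would otherwise raise
# UnicodeEncodeError; 'replace' degrades gracefully to '?' instead.
safe = message.encode('ascii', 'replace')
assert safe == b'I sat down for coffee at the caf?'
```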
When writing unittests, include non-ASCII values and both unicode and str types
===============================================================================

Unless you know that a specific portion of your code will only deal with
:term:`ASCII`, be sure to include non-:term:`ASCII` values in your unittests.
Including a few characters from several different scripts is highly advised as
well because some code may have special cased accented roman characters but
not know how to handle characters used in Asian alphabets.

Similarly, unless you know that that portion of your code will only be given
:class:`unicode` strings or only byte :class:`str` be sure to try variables
of both types in your unittests.  When doing this, make sure that the
variables are also non-:term:`ASCII` as python's implicit conversion will mask
problems with pure :term:`ASCII` data.  In many cases, it makes sense to check
what happens if byte :class:`str` and :class:`unicode` strings that won't
decode in the present locale are given.

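A minimal sketch of such a test; ``text_length`` is a hypothetical helper
under test, and each case uses non-:term:`ASCII` data of a different type:

```python
import unittest

def text_length(value, encoding='utf-8'):
    """Hypothetical helper under test: count characters in text or bytes."""
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return len(value)

class TextLengthTests(unittest.TestCase):
    def test_non_ascii_unicode(self):
        # u'café' is four characters
        self.assertEqual(text_length(u'caf\xe9'), 4)

    def test_non_ascii_bytes(self):
        # the same word as five utf-8 bytes is still four characters
        self.assertEqual(text_length(b'caf\xc3\xa9'), 4)
```

Run it with ``python -m unittest`` against the module holding the tests.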
Be vigilant about spotting poor APIs
====================================

Make sure that the libraries you use return only :class:`unicode` strings or
byte :class:`str`.  Unittests can help you spot issues here by running many
variations of data through your functions and checking that you're still
getting the types of string that you expect.

Example: Putting this all together with kitchen
===============================================

The kitchen library provides a wide array of functions to help you deal with
byte :class:`str` and :class:`unicode` strings in your program.  Here's
a short example that uses many kitchen functions to do its work::

    #!/usr/bin/python -tt
    # -*- coding: utf-8 -*-
    import locale
    import os
    import sys
    import unicodedata

    from kitchen.text.converters import getwriter, to_bytes, to_unicode
    from kitchen.i18n import get_translation_object

    if __name__ == '__main__':
        # Setup gettext driven translations but use the kitchen functions so
        # we don't have the mismatched bytes-unicode issues.
        translations = get_translation_object('example')
        # We use _() for marking strings that we operate on as unicode
        # This is pretty much everything
        _ = translations.ugettext
        # And b_() for marking strings that we operate on as bytes.
        # This is limited to exceptions
        b_ = translations.lgettext

        # Setup stdout
        encoding = locale.getpreferredencoding()
        Writer = getwriter(encoding)
        sys.stdout = Writer(sys.stdout)

        # Load data.  Format is filename\0description
        # description should be utf-8 but filename can be any legal filename
        # on the filesystem
        # Sample datafile.txt:
        #   /etc/shells\x00Shells available on caf\xc3\xa9.lan
        #   /var/tmp/file\xff\x00File with non-utf8 data in the filename
        #
        # And to create /var/tmp/file\xff (under bash or zsh) do:
        #   echo 'Some data' > /var/tmp/file$'\377'
        datafile = open('datafile.txt', 'r')
        data = {}
        for line in datafile:
            # We're going to keep filename as bytes because we will need the
            # exact bytes to access files on a POSIX operating system.
            # description, we'll immediately transform into unicode type.
            b_filename, description = line.split('\0', 1)

            # to_unicode defaults to decoding output from utf-8 and replacing
            # any problematic bytes with the unicode replacement character
            # We accept mangling of the description here knowing that our file
            # format is supposed to use utf-8 in that field and that the
            # description will only be displayed to the user, not used as
            # a key value.
            description = to_unicode(description, 'utf-8').strip()
            data[b_filename] = description
        datafile.close()

        # We're going to add a pair of extra fields onto our data to show the
        # length of the description and the filesize.  We put those between
        # the filename and description because we haven't checked that the
        # description is free of NULLs.
        datafile = open('newdatafile.txt', 'w')

        # Name filename with a b_ prefix to denote byte string of unknown encoding
        for b_filename in data:
            # Since we have the byte representation of filename, we can read any
            # filename
            if os.access(b_filename, os.F_OK):
                size = os.path.getsize(b_filename)
            else:
                size = 0
            # Because the description is unicode type, we know the number of
            # characters corresponds to the length of the normalized unicode
            # string.
            length = len(unicodedata.normalize('NFC', data[b_filename]))

            # Print a summary to the screen
            # Note that we do not let implicit type conversion from str to
            # unicode transform b_filename into a unicode string.  That might
            # fail as python would use the ASCII codec.  Instead we use
            # to_unicode() to explicitly transform in a way that we know will
            # not traceback.
            print _(u'filename: %s') % to_unicode(b_filename)
            print _(u'file size: %s') % size
            print _(u'desc length: %s') % length
            print _(u'description: %s') % data[b_filename]

            # First combine the unicode portion
            line = u'%s\0%s\0%s' % (size, length, data[b_filename])
            # Since the filenames are bytes, turn everything else to bytes
            # before combining.  Turning into unicode first would be wrong as
            # the bytes in b_filename might not convert
            b_line = '%s\0%s\n' % (b_filename, to_bytes(line))

            # Just to demonstrate that getwriter will pass bytes through fine
            print b_('Wrote: %s') % b_line
            datafile.write(b_line)
        datafile.close()

        # And just to show how to properly deal with an exception.
        # Note two things about this:
        # 1) We use the b_() function to translate the string.  This returns a
        #    byte string instead of a unicode string
        # 2) We're using the b_() function returned by kitchen.  If we had
        #    used the one from gettext we would need to convert the message to
        #    a byte str first
        message = u'Demonstrate the proper way to raise exceptions.  Sincerely, \u3068\u3057\u304a'
        raise Exception(b_(message))

.. seealso:: :mod:`kitchen.text.converters`