#1
04-11-2008
cebix
Registered User
 
Name: Christian Bauer
Organization: AREVA NP
Project: Diagnostics frontend for industrial I&C
 
Join Date: Jun 2007
Location: Erlangen, Germany
Posts: 5
Python and Unicode

Quote:
For languages other than C++, Ice encodes strings in their native Unicode representation, so applications can transparently use characters from non-English alphabets.
...says chapter 32.21 of the Ice manual. But if that is the intent, shouldn't IcePy map Slice strings to Python Unicode strings instead of 8-bit strings?

For example, the Python implementation of the "Hello World" demo server:
Code:
class PrinterI(Demo.Printer):
  def printString(self, s, current=None):
    print s
only works with a UTF-8 locale. 's' is an 8-bit string which, as long as the printString() operation is called from a correctly written client, will always be in UTF-8. A more correct server implementation should look something like:
Code:
class PrinterI(Demo.Printer):
  def printString(self, s, current=None):
    print s.decode('utf8')
which would, however, generate a UnicodeEncodeError if a client sends a string with characters that are not representable in the server's locale, so more effort is needed to have robust printing in the server (the best I've been able to come up with is
Code:
print s.decode('utf8').encode(locale.getpreferredencoding(), 'replace')
which is not that trivial any more...).
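To make the one-liner above easier to reuse, here is a minimal sketch of the same idea wrapped in a helper. It is written for Python 3 (where `bytes` plays the role of the 8-bit string used in this thread), and the name `to_printable` and its `target_encoding` parameter are made up for illustration; nothing here is part of Ice.

```python
import locale


def to_printable(raw, target_encoding=None):
    """Decode UTF-8 bytes and re-encode them for the target encoding.

    Characters the target encoding cannot represent become '?'
    instead of raising UnicodeEncodeError.
    """
    if target_encoding is None:
        # Same default as in the post: whatever the process locale prefers.
        target_encoding = locale.getpreferredencoding()
    text = raw.decode('utf8')
    return text.encode(target_encoding, 'replace')


# "Hällö Wörld! €" as it would arrive from a well-behaved client (UTF-8).
wire = "H\u00e4ll\u00f6 W\u00f6rld! \u20ac".encode('utf8')

# Latin-1 has the umlauts but no euro sign, so only the euro sign is replaced.
print(to_printable(wire, 'latin-1'))
```

Passing the encoding explicitly, as in the last line, keeps the behaviour independent of the machine's locale, which is convenient when testing.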

Likewise, in a Python client, I would like to be able to
Code:
printer.printString(u"Hällö Wörld!")
directly (after setting the proper coding for the Python script, of course) instead of
Code:
printer.printString(u"Hällö Wörld!".encode('utf8'))
but this gives me a "ValueError: invalid value for argument 1 in operation `printString'" from Ice.
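Until the binding accepts Unicode strings directly, the explicit `encode` call can at least be centralised. The following is only a sketch under stated assumptions: it is Python 3, `as_slice_string` is a hypothetical helper, and `FakePrinter` is a stand-in that merely imitates the ValueError described above. It is not the IcePy proxy API.

```python
class FakePrinter:
    """Stand-in for the Printer proxy (an assumption, not real IcePy)."""

    def __init__(self):
        self.received = []

    def printString(self, s):
        # Imitates the behaviour reported in the post: only 8-bit strings pass.
        if not isinstance(s, bytes):
            raise ValueError(
                "invalid value for argument 1 in operation `printString'")
        self.received.append(s)


def as_slice_string(value):
    """Return UTF-8 bytes for either a Unicode string or UTF-8 bytes."""
    if isinstance(value, str):
        return value.encode('utf8')
    if isinstance(value, bytes):
        value.decode('utf8')  # raises UnicodeDecodeError if not valid UTF-8
        return value
    raise TypeError("expected str or bytes, got %r" % type(value))


printer = FakePrinter()
printer.printString(as_slice_string("H\u00e4ll\u00f6 W\u00f6rld!"))
print(printer.received[0])
```

Validating byte strings on the way out catches the case where a Latin-1 string is passed by mistake, which would otherwise only show up as garbage on the receiving side.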

Alternatively, if IcePy uses 8-bit strings for Slice strings, it should provide an automatic string conversion facility as in C++. Our applications have to run with a Latin1 locale for legacy reasons. In C++ this works very nicely and transparently after installing a UTF-8 <-> Latin1 StringConverter, but in Python it gets ugly and increases the potential for mistakes (are encode/decode correctly applied to all strings that go over the Ice interface?).
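For illustration, the manual conversion that currently has to happen at every call site looks roughly like the sketch below. It is Python 3, and the class `Latin1Converter` with its two methods is hypothetical; it only mirrors the two conversion directions of a string converter and is not the Ice C++ StringConverter interface.

```python
class Latin1Converter:
    """Hypothetical helper converting between Latin-1 and UTF-8 bytes."""

    native_encoding = 'latin-1'

    def to_utf8(self, native):
        # Application side (Latin-1) -> wire side (UTF-8).
        return native.decode(self.native_encoding).encode('utf8')

    def from_utf8(self, wire):
        # Wire side (UTF-8) -> application side (Latin-1).
        # Strict: raises UnicodeEncodeError if a character has no Latin-1 mapping.
        return wire.decode('utf8').encode(self.native_encoding)


conv = Latin1Converter()
native = b'H\xe4ll\xf6'          # "Hällö" in Latin-1
wire = conv.to_utf8(native)      # what should go over the Ice interface
print(wire)
print(conv.from_utf8(wire) == native)
```

Every string crossing the interface has to pass through one of the two methods, which is exactly the bookkeeping an installed converter would take over.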

I guess the best option I currently have is to patch the C++ code of the IcePy module to install a StringConverter there?

In any case, it would be nice if IcePy could marshal Unicode strings to Slice strings instead of raising a ValueError.
#2
04-23-2008
bernard
ZeroC Staff
 
Name: Bernard Normier
Organization: ZeroC, Inc.
Project: Ice
 
Join Date: Feb 2003
Location: Palm Beach Gardens, FL
Posts: 751
Hi Christian,

Thanks for your analysis: these issues will be addressed in Ice 3.3.0.

We'll provide the ability to plug in a string converter (with the underlying Ice for C++ communicator), and you'll be able to pass Unicode strings as in parameters for remote operations that take Slice strings.

Best regards,
Bernard
__________________
Bernard Normier
ZeroC, Inc.
#3
04-23-2008
bernard
ZeroC Staff
 
Hi Christian,

Quote:
which would, however, generate a UnicodeEncodeError if a client sends a string with characters that are not representable in the server's locale, so more effort is needed to have robust printing in the server (the best I've been able to come up with is
Code:
print s.decode('utf8').encode(locale.getpreferredencoding(), 'replace')
which is not that trivial any more...).
We provide 3 string converter implementations in Ice 3.3:
  • UnicodeWstringConverter
    Converts UTF-16 or UTF-32 wstrings to/from UTF-8 sequences. By default, it's "lenient", i.e. some malformed input sequences are transformed into the Unicode replacement character. In 3.3.0, you'll be able to get the strict behavior as well (no replacement character).
  • IconvStringConverter
    Converts narrow strings or wstrings from the specified iconv encoding to/from UTF-8. It's always strict, i.e. if there is no mapping, you get an exception.
  • WindowsStringConverter
    Converts narrow strings encoded in a given code-page to/from UTF-8. Like with the Iconv converter, if there is no mapping, you get an exception.

So if you want to use a replacement character, you'll probably need to write your own C++ string converter.
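The difference between "lenient" and "strict" conversion can be pictured with Python's own codec error handlers. This is only an analogy in Python 3, not the behavior of the Ice converters themselves: `'strict'` raises on input that cannot be converted, while `'replace'` substitutes a replacement character.

```python
# A byte sequence that is not valid UTF-8 (0xFF can never appear in UTF-8).
malformed = b'Hi \xff'

# Lenient decoding: the bad byte becomes U+FFFD, the Unicode replacement character.
lenient = malformed.decode('utf8', 'replace')

# Strict decoding: the same input raises an exception.
try:
    malformed.decode('utf8', 'strict')
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# The same choice exists when there is no mapping in the target encoding:
# the euro sign has no Latin-1 code point.
euro = '\u20ac'
replaced = euro.encode('latin-1', 'replace')
try:
    euro.encode('latin-1', 'strict')
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

print(repr(lenient), strict_decode_failed, replaced, strict_encode_failed)
```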

Cheers,
Bernard
#4
04-25-2008
cebix
Registered User
 
Hi Bernard,

Thanks for taking the time to look into this. I'll be looking forward to Ice 3.3, then.
(c) 2008 ZeroC, Inc.