#1
08-10-2003
sylvain
Registered User
 
Name: Sylvain Fasel
Organization: university of Geneva
Project: quantum cryptographic systems
 
Join Date: Feb 2003
Location: Geneva (Switzerland)
Posts: 33
Accent problems with Freeze::map

Hi!

I am trying to implement an application using Freeze, and I am having problems with accented characters such as "é" or "è".

When I insert a string as a key (or as a value) into a Freeze::map, I can no longer retrieve it through Freeze (the string comes back empty), even though the insertion did reach the database, as I can check with db_dump.

I am not sure whether this is a problem with Freeze, with Berkeley DB, with my locale settings, or a conflict between UTF-8 and ISO-8859-15, but in any case I cannot make it work. Below is some information that may be relevant.

I am using Linux, kernel version 2.4 (Mandrake 9.1). This Mandrake distribution seems to have trouble with encodings, but I don't know enough about the subject yet.

The following code shows the problem:

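#include <Freeze/Freeze.h>
#include <StringStringMap.h> // generated by slice2freeze
#include <iostream>

int
main(int argc, char* argv[])
{
    Ice::CommunicatorPtr communicator = Ice::initialize(argc, argv);
    // Open the Freeze database environment in the directory "db"
    Freeze::DBEnvironmentPtr dbEnv = Freeze::initialize(communicator, "db");
    // Open the database "simple", creating it if necessary
    Freeze::DBPtr simpleDB = dbEnv->openDB("simple", true);
    StringStringMap Map(simpleDB);

    Map.clear();

    // In this source file "élephant" is a single-byte ISO-8859-15 string
    // (é == 0xE9, as the db_dump output below shows)
    Map.insert(std::make_pair("yes", "elephant"));
    Map.insert(std::make_pair("non", "élephant"));

    // Print every key/value pair stored in the map
    for(StringStringMap::const_iterator p = Map.begin(); p != Map.end(); ++p)
    {
        std::cout << p->first << " " << p->second << std::endl;
    }

    simpleDB->close();
    dbEnv->close();
    communicator->destroy();
    return 0;
}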

When running this code, the output is:

non
yes elephant

instead of

non élephant
yes elephant


The command "db_dump -p simple" output:

VERSION=3
format=print
type=btree
db_pagesize=4096
HEADER=END
\0a<Key>non</Key>
\0a<Value>\e9lephant</Value>
\0a<Key>yes</Key>
\0a<Value>elephant</Value>
DATA=END


The "StringStringMap" class was generated with:
slice2freeze --dict StringStringMap,string,string StringStringMap

My c++ compiler is gcc 3.3.

The "locale" command output:

LANG=fr_CH.ISO-8859-15
LC_CTYPE=fr_CH.ISO-8859-15
LC_NUMERIC=fr_CH.ISO-8859-15
LC_TIME=fr_CH.ISO-8859-15
LC_COLLATE=fr_CH.ISO-8859-15
LC_MONETARY=fr_CH.ISO-8859-15
LC_MESSAGES=fr_CH.ISO-8859-15
LC_PAPER=fr_CH.ISO-8859-15
LC_NAME=fr_CH.ISO-8859-15
LC_ADDRESS=fr_CH.ISO-8859-15
LC_TELEPHONE=fr_CH.ISO-8859-15
LC_MEASUREMENT=fr_CH.ISO-8859-15
LC_IDENTIFICATION=fr_CH.ISO-8859-15
LC_ALL=


I hope this is enough information.

Thank you in advance for any hint.

Last edited by sylvain : 08-10-2003 at 10:27 AM.
#2
08-10-2003
marc
ZeroC Staff
 
Name: Marc Laukien
Organization: ZeroC, Inc.
Project: The Internet Communications Engine
 
Join Date: Feb 2003
Location: Florida
Posts: 1,772
Can you try the following C++ code:

std::string s = "élephant";
std::cout << s << std::endl;

What does it print?

As a general note, I recommend using Unicode and std::wstring, and then converting to std::string:

std::wstring ws = ... "élephant" in unicode format ...
std::string s = IceUtil::wstringToString(ws); // s now holds "élephant" in utf-8

I don't think that this is the cause of your problems. However, the on-the-wire string representation of the Ice protocol is UTF-8, not ISO-8859-15. If you don't use UTF-8, you won't be able to interoperate with, for example, Ice for Java when you use non-ASCII strings.
#3
08-10-2003
sylvain
Quote:
Originally posted by marc
Can you try the following C++ code:

std::string s = "élephant";
std::cout << s << std::endl;

What does it print?
It prints:
élephant

Quote:
As a general note, I recommend using Unicode and std::wstring, and then converting to std::string:

std::wstring ws = ... "élephant" in unicode format ...
std::string s = IceUtil::wstringToString(ws); // s now holds "élephant" in utf-8
I tried the following modification of my previous code example:
(...)
std::string ss="élephant";
std::wstring ws(ss.begin(),ss.end());// in unicode format ...
Map.insert(std::make_pair("yes","elephant"));
Map.insert(std::make_pair("non",IceUtil::wstringToString(ws)));
(...)
But the output is still wrong. By the way:
std::string ss="élephant";
std::wstring ws(ss.begin(),ss.end());// in unicode format ...
std::string s = IceUtil::wstringToString(ws); // s now holds "élephant" in utf-8
std::cout << s << std::endl;

prints:
élephant
as well...

Quote:
I don't think that this is the cause of your problems. However, the on-the-wire string representation of the Ice protocol is UTF-8, not ISO-8859-15. If you don't use UTF-8, you won't be able to interoperate with, for example, Ice for Java when you use non-ASCII strings.
I think I will have to learn a little bit about wchar_t, UTF-8, and all that stuff. That was on my "to learn" list anyway...

Thank you.
#4
08-10-2003
marc
Quote:
Originally posted by sylvain

I tried the following modification of my previous code example:
(...)
std::string ss="élephant";
std::wstring ws(ss.begin(),ss.end());// in unicode format ...
Map.insert(std::make_pair("yes","elephant"));
Map.insert(std::make_pair("non",IceUtil::wstringToString(ws)));
(...)
That won't work. You must enter "élephant" in Unicode format in your editor.

All this doesn't work because you are using "élephant" in ISO format, while the XML encoding in Freeze expects it in Unicode format.

You can use an editor that supports Unicode, enter the escape sequence that represents "élephant" in Unicode, or look for a method that converts ISO to Unicode.

Note that future versions of Freeze will be less sensitive to such problems, once we use a binary encoding for Freeze. However, your code would still be wrong, because you are using ISO strings where Ice expects Unicode.

Cheers,
Marc
#5
08-10-2003
sylvain
Obviously, I should dive a little bit more into the subject...

But there is something I don't understand. It seems that the Ice runtime correctly received the accented word "élephant", because in the db the entry is
"\0a<Value>\e9lephant</Value>"
for "élephant", i.e. with \e9 for the é (isn't \00e9 the Unicode code for 'é'?)
and
"\0a<Value>elephant</Value>"
for "elephant".

So why does iterating through the map return an empty string for the value "\e9lephant"? This string was written by the Ice runtime, so the Ice runtime should be able to read it back. Because of the ISO/Unicode mismatch, I don't expect to get exactly "élephant" back, but maybe something like "lephant"...

I also tried accented words like "mariés", where the accented letter is not the first one, but I still get an empty string. Where did my non-accented characters go?

regards,
Sylvain
#6
08-10-2003
sylvain
Just for your information, in case this is relevant:

I tested a little further with a Java client that sends a string to a (C++) servant, which inserts it into a StringStringMap the same way as the code in my first post and then prints the whole content of the Freeze::map.

The Java client is just a text field (Swing) and a button. When the button is clicked, the content of the text field is sent to the servant through a simple operation from an Ice interface: "void add(string s)"

My Java installation is the one bundled with the Sun NetBeans 3.5 IDE (so it's jsdk/jre 1.4.2).

When I enter "élephant", the servant says it received "élephant" and puts it in the db, but it is unable to retrieve it from the db. More precisely, the value returned is an empty string (not a garbled string). The same happens for words whose accented characters are not at the beginning.

A "db_dump -p" of the db file output:

\0a<Value>\c3\a9lephant</Value>

I don't know exactly how my Java handles accented characters or which charset it uses, but I think it should be independent of the system's charset...

Anyway, as far as I understand it, the problem is the same: the Ice runtime writes something to the db that it cannot read back. If I write a distributed program, how can I ensure that all the clients run on systems with the correct charset?


Regards,
Sylvain

Last edited by sylvain : 08-11-2003 at 06:18 AM.
#7
08-11-2003
mes
ZeroC Staff
 
Name: Mark Spruiell
Organization: ZeroC, Inc.
Project: Ice Developer
 
Join Date: Feb 2003
Location: California
Posts: 962
Hi,

I've been able to reproduce this problem. It looks like an issue with Xerces-C++, which is used for the XML encoding of Freeze maps. As a workaround, you can use the binary encoding by passing the --binary option to slice2freeze.

I will reply again when I've resolved this issue.

- Mark
#8
08-12-2003
mes
This was caused by a bug in Ice, and has been fixed for the next release. If you would like a patch sooner, please let me know.

Thanks for the bug report.

- Mark
(c) 2008 ZeroC, Inc.