Re: how to continue invalid gbk char?

From: xiao feng <xiao.feng(at)trs.com.cn>
Date: Fri Jul 20 2007 - 05:41:38 EDT


It is very common for Chinese websites to contain both GBK and GB18030 characters. These characters are converted from UTF8 to GB.
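
As a rough illustration of the point above, the sketch below converts a short string to both GB18030 and GBK. It uses Python's built-in codecs as a stand-in for the conversion step, not MySQL's own converter, and the sample string and the `to_gb` helper are hypothetical names chosen for this example. U+20000 is assumed here as a character that GB18030 can represent and GBK cannot.

```python
# Illustrative only: Python's codecs stand in for the UTF8 -> GB conversion.
# SAMPLE and to_gb are hypothetical names, not part of MySQL.
SAMPLE = "中\U00020000文"  # U+20000 is a CJK Extension B ideograph


def to_gb(text, charset):
    """Encode text to a GB charset, substituting '?' for unmappable chars."""
    return text.encode(charset, errors="replace")


gb18030_bytes = to_gb(SAMPLE, "gb18030")
gbk_bytes = to_gb(SAMPLE, "gbk")

# GB18030 round-trips the whole string; GBK loses the middle character.
print(gb18030_bytes.decode("gb18030") == SAMPLE)
print(gbk_bytes.decode("gbk"))
```

Data produced this way is valid GB18030, but a consumer that expects strict GBK will see bytes it cannot interpret.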

----- Original Message -----
From: "xiao feng" <xiao.feng@trs.com.cn>
To: "Sergei Golubchik" <serg@mysql.com>
Cc: <internals@lists.mysql.com>
Sent: Thursday, July 19, 2007 8:59 AM
Subject: Re: how to continue invalid gbk char?

> Hi,
> A string has some UCS2 chars that are invalid in GBK.
>
> GB18030 can cover all GBK chars.
> It is very common on Chinese websites; maybe some GB18030 chars are
> converted from UTF8.
>
> MySQL cuts off a string that contains invalid chars; I think that is
> very bad news.
> I think it is a shortcoming; maybe they could be replaced with "?" or
> another char.
>
> Tools such as Java/Perl/Win32API/iconv always replace an invalid char
> with '?' or 0x25A1.
>
>
> ----- Original Message -----
> From: "Sergei Golubchik" <serg@mysql.com>
> To: "xiao feng" <xiao.feng@trs.com.cn>
> Cc: <internals@lists.mysql.com>
> Sent: Wednesday, July 18, 2007 7:20 PM
> Subject: Re: how to continue invalid gbk char?
>
>> Hi!
>>
>> On Jul 06, xiao feng wrote:
>>> Hi,
>>> In my storage engine, some data has an invalid gbk char, but mysql cuts
>>> off the string.
>>>
>>> How can I continue past an invalid gbk char?
>>>
>>> Example:
>>> 1111(invalid gbk char)2222
>>> The client only reads "1111".
>>>
>>> How can I resolve the problem? Thanks!
>>
>> You cannot. MySQL ensures that the content of a gbk column is a valid
>> gbk string.
>>
>> There's a loophole, though: if there's a charset conversion involved,
>> characters that cannot be converted are replaced with question marks.
>> That means, if you define your column to be utf8, then when you enter
>> an invalid gbk character it'll be inserted as '?', and the string won't
>> be cut. Or simply set character_set_connection to utf8, and keep the
>> column and the client in gbk.
>>
>> Alternatively, you can, of course, define your column as binary - then
>> MySQL won't do any character validation, the string will be simply an
>> array of bytes.
>>
>> Regards / Mit vielen Grüssen,
>> Sergei
>>
>> --
>> __ ___ ___ ____ __
>> / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@mysql.com>
>> / /|_/ / // /\ \/ /_/ / /__ Principal Software Developer
>> /_/ /_/\_, /___/\___\_\___/ MySQL GmbH, Radlkoferstr. 2, D-81373
>> München
>> <___/ Geschäftsführer: Kaj Arnö - HRB München
>> 162140
>
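
The three behaviours discussed in the quoted reply (cutting the string at the invalid byte, replacing it with '?', and treating the value as plain bytes) can be sketched outside MySQL. The snippet below is only an emulation using Python's `gbk` codec, not MySQL's validation code. `RAW`, `strict_prefix` and `with_replacement` are hypothetical names, and the byte 0xFF is assumed as the invalid gbk char from the "1111(invalid gbk char)2222" example.

```python
# Emulation only: Python's gbk codec stands in for a gbk validity check.
RAW = b"1111\xff2222"  # 0xFF is not a valid GBK lead byte


def strict_prefix(data, charset="gbk"):
    """Keep only the valid prefix, like the truncation the client observed."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        return data[:exc.start].decode(charset)


def with_replacement(data, charset="gbk", marker="?"):
    """Substitute a marker for undecodable bytes instead of cutting."""
    return data.decode(charset, errors="replace").replace("\ufffd", marker)


print(strict_prefix(RAW))     # valid prefix only
print(with_replacement(RAW))  # invalid byte shown as '?'
print(RAW)                    # binary view: every byte kept, no validation
```

The last line corresponds to the binary-column suggestion: nothing is lost, but interpreting the bytes is left to the application.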

-- 
MySQL Internals Mailing List
For list archives: 
http://lists.mysql.com/internals
To unsubscribe:    
http://lists.mysql.com/internals?unsub=lists@pantek.com
Received on Fri Jul 20 05:38:35 2007

This archive was generated by hypermail 2.1.8 : Thu Aug 09 2007 - 19:06:19 EDT

