[ACCEPTED]-MySQL Convert latin1 data to UTF8-load-data-infile

Accepted answer
Score: 11

I've had cases like this in old wordpress 16 installations with the problem being that 15 the data itself was already in UTF-8 within 14 a Latin1 database (due to WP default charset). This 13 means there was no real need for conversion 12 of the data but the ddbb and table formats. In 11 my experience things get messed up when 10 doing the dump as I understand MySQL will 9 use the client's default character set which 8 in many cases is now UTF-8. Therefore making 7 sure that exporting with the same coding 6 of the data is very important. In case of 5 Latin1 DDBB with UTF-8 coding:

$ mysqldump –default-character-set=latin1 –databases wordpress > m.sql

Then replace 4 the Latin1 references within the exported 3 dump before reimporting to a new database 2 in UTF-8. Sort of:

$ replace "CHARSET=latin1" "CHARSET=utf8" \
    "SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql

In my case this link was of great 1 help. Commented here in spanish.

Score: 7

Though it is hardly still actual for the 12 OP, I happen to have found a solution in 11 MySQL documentation for ALTER TABLE. I post it here 10 just for future reference:

Warning

The CONVERT TO 9 operation converts column values between 8 the character sets. This is not what you 7 want if you have a column in one character 6 set (like latin1) but the stored values 5 actually use some other, incompatible character 4 set (like utf8). In this case, you have 3 to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

The 2 reason this works is that there is no conversion 1 when you convert to or from BLOB columns.

Score: 2

LOAD DATA INFILE allows you to set an encoding 1 file is supposed to be in:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Score: 2

I wrote that http://code.google.com/p/mysqlutf8convertor/ for Latin Database to UTF-8 2 Database. All tables and field to change 1 UTF-8.

Score: 1

Converting latin1 to UTF8 is not what you 3 want to do, you kind of need the opposite.

If 2 what really happened was this:

  1. UTF-8 strings were interpreted as Latin-1 and transcoded to UTF-8, mangling them.
  2. You are now, or could be, reading UTF-8 strings with no further interpretation

What you must 1 do now is:

  1. Read the "UTF-8" with no transcode.
  2. Convert it to Latin-1. Now you should actually have the original UTF-8.
  3. Now put it in your "UTF-8" column with no further conversion.
Score: 1

I recently completed a shell script that 6 automates the conversion process. It is 5 also configurable to write custom filters 4 for any text you wish to replace or remove. For 3 example : stripping HTML characters etc. Table 2 whitelists and blacklists are also possible. You 1 can download it at sourceforge: https://sourceforge.net/projects/mysqltr/

Score: 0

Try this:

1) Dump your DB

mysqldump --default-character-set=latin1 -u username -p databasename < dump.sql

2) Open dump.sql 3 in text editor and replace all occurences 2 of "SET NAMES latin1" by "SET NAMES utf8"

3) Create 1 a new database and restore your dumpfile

cat dump.sql | mysql -u root -p newdbname

More Related questions