Tuesday, 3 May 2016

How can character encoding be handled correctly in both PHP and a MySQL database?











I've searched high and low for a solution and tried many variations before posting this question.



What is required to have names appear the same in phpMyAdmin and on the HTML page? Can this even be accomplished?



EDIT 1: It would seem that this is a MySQL issue. Why? Because the PHP-generated HTML page always shows the correct characters. At this point it is only the database that shows them incorrectly.



EDIT 2: Clarification. With the original settings shown in the code snippet and images below:





  1. Enter João and submit

  2. João displayed incorrectly (garbled) in database

  3. João displayed after reload



After adding mysqli_query ( $link, 'SET NAMES utf8' ):




  1. Enter João and submit

  2. João displayed in database


  3. Jo�o displayed after reload



end Edit 2



In a MySQL database, viewed with phpMyAdmin:
[image: database structure]



The items appear in the database like this (I've modified the first João so that it appears correctly in the database):




[image: phpMyAdmin view of 2 entries of same name]



And on the HTML page, with the encoding set, the names appear like this (the order is reversed, and the modified entry has a black diamond):



[image: appearance in html page]



Encoding:



I have tried changing the column collation to utf8_bin, utf8_general_ci and utf8_unicode_ci, all with no change on either side. I also changed the document encoding (in BBEdit) from UTF-8 to UTF-8 (with BOM), ISO Latin 1 and Windows Latin 1. Several of these created more black diamonds, making the issue worse. (It is set to UTF-8 in the images.) I even tried using preg_replace to swap ã, é etc. for their encoded equivalents.




The short story is: João is entered on the page (content type above), a garbled João is in the database, and João comes back to the HTML page on refresh.



Looking for ideas. Thanks.


Answer



Character set issues are often really tricky to figure out. Basically, you need to make sure that all of the following are true:




  • The DB connection is using UTF-8

  • The DB tables are using UTF-8

  • The individual columns in the DB tables are using UTF-8


  • The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you've imported from bad sources, or changed table or column collations)

  • The web page is requesting UTF-8

  • Apache is serving UTF-8
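To illustrate why the connection and the column have to agree, here is a minimal Python sketch (not the PHP from the question) of a server that transcodes between the connection charset and a utf8 column. The `store` and `fetch` helpers are hypothetical names for this simplified model, and Python's `latin-1` codec stands in for MySQL's latin1; a real server differs in detail.

```python
# Simplified model of charset conversion between a client connection and a
# utf8 column. `store` and `fetch` are hypothetical helpers, not a real API.

def store(client_bytes: bytes, connection_charset: str) -> bytes:
    """Interpret incoming bytes using the connection charset, keep them as UTF-8."""
    return client_bytes.decode(connection_charset).encode("utf-8")


def fetch(stored_bytes: bytes, connection_charset: str) -> bytes:
    """Convert the stored UTF-8 text back into the connection charset."""
    return stored_bytes.decode("utf-8").encode(connection_charset)


# A UTF-8 web page submits the name as UTF-8 bytes.
page_bytes = "João".encode("utf-8")

# Case 1: the connection is (wrongly) treated as latin-1.
stored_wrong = store(page_bytes, "latin-1")
# A tool that reads the column as UTF-8 sees the double-encoded form.
seen_in_admin = stored_wrong.decode("utf-8")
# The same mislabelled connection undoes the damage on the way back out,
# so the UTF-8 page still renders the name correctly.
round_trip = fetch(stored_wrong, "latin-1").decode("utf-8")

# Case 2: the connection is declared as UTF-8, so nothing is re-encoded.
stored_right = store(page_bytes, "utf-8")

print(repr(seen_in_admin), repr(round_trip), stored_right == page_bytes)
```

In this model the page looks fine while the stored bytes are double-encoded, which is consistent with the "correct on the page, wrong in phpMyAdmin" symptom described in the question.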



Here's a good tutorial on dealing with that list, from start to finish: http://www.bluebox.net/news/2009/07/mysql_encoding/



It sounds like your problem is specifically that you've got double-encoded (or triple-encoded) characters, probably from changing character sets or importing already-encoded data with the wrong charset. There's a whole section on fixing that in the above tutorial.
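As a rough sketch of what undoing double-encoding involves, here is an illustration in Python. It is not the tutorial's procedure, and it assumes the damage came from UTF-8 bytes being read as Latin-1 one or more times. The function names `undo_one_layer` and `repair` are made up for this example.

```python
def undo_one_layer(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes misread as Latin-1', if possible."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8: treat it as not double-encoded and leave it alone.
        return text


def repair(text: str, max_rounds: int = 3) -> str:
    """Peel off up to max_rounds layers (handles double or triple encoding)."""
    for _ in range(max_rounds):
        fixed = undo_one_layer(text)
        if fixed == text:
            break
        text = fixed
    return text


good = "João"
double = good.encode("utf-8").decode("latin-1")    # one bad round trip
triple = double.encode("utf-8").decode("latin-1")  # two bad round trips

print(repair(double) == good, repair(triple) == good, repair(good) == good)
```

A heuristic like this can misfire on the rare legitimate string that happens to look double-encoded, so it is safer to try it on a copy of the data first.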

