Saturday, 25 June 2016

php - "Â " character showing up instead of " "



I found this thread which describes my issue pretty well and this answer describes my issue exactly.




The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1 comes out as "Â ". That includes a trailing nbsp...
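The byte arithmetic in that quote can be checked directly. This is only a small sketch, assuming the mbstring extension is loaded; the variable names are made up for illustration.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes C2 A0.
$utf8Nbsp = "\u{00A0}";
echo bin2hex($utf8Nbsp), "\n"; // c2a0

// Simulate the mistake: treat each of those bytes as one ISO-8859-1
// character and re-encode the result as UTF-8.
$misread = mb_convert_encoding($utf8Nbsp, 'UTF-8', 'ISO-8859-1');

// 0xC2 becomes U+00C2 ("Â", UTF-8 bytes C3 82) and 0xA0 becomes U+00A0
// (UTF-8 bytes C2 A0), so the text now reads "Â" followed by an nbsp.
echo bin2hex($misread), "\n"; // c382c2a0
```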




However, I have managed to track my issue down to a function I use to wrap image tags in divs.




function img_format($str)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($str); // <-- Bonus points for the explanation of the @

    // $tags object
    $tags = $doc->getElementsByTagName('img');

    foreach ($tags as $tag) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'inner-copy');
        $tag->parentNode->insertBefore($div, $tag);
        $div->appendChild($tag);

        $tag->setAttribute('class', 'inner-img');
    }

    $str = $doc->saveHTML();

    return $str;
}


Quite simply, how can I fix this issue within this function?



I understand that using







will fix this issue, but there is obviously something I'm overlooking within the function itself.



I've tried:



$doc->validateOnParse = true;


To no avail. (I don't quite know what that does anyway.)



Answer



Found it!



@$doc->loadHTML(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));


This answer explains the issue and gives the workaround above:




DOMDocument::loadHTML will treat your string as being in ISO-8859-1 unless you tell it otherwise. This results in UTF-8 strings being interpreted incorrectly.
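For completeness, here is a rough sketch of the whole function with the encoding handled before parsing. It is not the code from the post: img_format_utf8 is a hypothetical name, and it swaps the 'HTML-ENTITIES' conversion for mb_encode_numericentity(), since as far as I know the 'HTML-ENTITIES' pseudo-encoding is deprecated from PHP 8.2 onwards. It also assumes the input really is UTF-8. On the @ in the original: it only silences the warnings libxml raises for imperfect HTML, and libxml_use_internal_errors() is a less blunt way to get the same effect.

```php
<?php
// Sketch only: img_format_utf8 is a hypothetical name, not from the original post.
function img_format_utf8(string $str): string
{
    // Turn every non-ASCII code point into a numeric entity (U+00A0 -> &#160;)
    // so the parser only sees ASCII and its ISO-8859-1 default no longer matters.
    $ascii = mb_encode_numericentity($str, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Collect libxml parse warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Take a snapshot of the live node list before moving nodes around.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'inner-copy');
        $img->parentNode->insertBefore($div, $img);
        $div->appendChild($img);
        $img->setAttribute('class', 'inner-img');
    }

    return $doc->saveHTML();
}

// Example input containing a UTF-8 non-breaking space next to an image.
$out = img_format_utf8("<p>caption\u{00A0}text <img src=\"a.png\"></p>");
echo $out;
```

Because the non-ASCII characters reach the parser as entities, the result should no longer contain the stray "Â", whatever charset libxml assumes for the raw bytes.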



