Occasionally you may find a web page that renders a series of nonsense characters in the midst of otherwise sensible text. The nonsense characters may be question marks inside black diamonds, or inverted question marks, or things that look like Ã (the A-Tilde) or Å (the A-Ring) followed by some other characters. Whenever you see this, it is the signature of a character encoding error. While there are many ways to botch character encoding, as a practical matter these errors almost always arise when Extended-ASCII data (single-byte encodings such as ISO-8859-1) and UTF-8 data are intermixed. With release 5.4, PHP changes its position on some aspects of character encoding. This presentation looks at the history of PHP character encoding and gives some strategies for dealing with legacy data in an increasingly UTF-8 world.
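As a rough illustration of how these symptoms arise, the sketch below mixes the two encodings on purpose. It is written in Python for brevity, the sample strings are made up for the example, and ISO-8859-1 (Latin-1) stands in as an assumed representative of Extended-ASCII; the byte-level behavior would be the same in PHP.

```python
# Illustrative sketch only: the sample strings are hypothetical, and Latin-1
# is assumed here as the single-byte "Extended-ASCII" encoding.
text = "caf\u00e9"  # "café"

utf8_bytes = text.encode("utf-8")      # é is stored as two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # é is stored as one byte: 0xE9

# UTF-8 bytes read as Latin-1: each byte becomes its own character,
# so é turns into A-Tilde followed by a second character.
mojibake = utf8_bytes.decode("latin-1")

# The same mistake with š (U+0161, UTF-8 bytes 0xC5 0xA1) produces
# A-Ring followed by a second character.
mojibake_ring = "\u0161".encode("utf-8").decode("latin-1")

# Latin-1 bytes read as UTF-8: 0xE9 is not a complete UTF-8 sequence, so a
# lenient decoder substitutes U+FFFD, which browsers typically draw as a
# question mark inside a black diamond.
replacement = latin1_bytes.decode("utf-8", errors="replace")

print(mojibake)
print(mojibake_ring)
print(replacement)
```

The two failure modes run in opposite directions: the A-Tilde and A-Ring pairs appear when UTF-8 data is treated as single-byte text, while the black diamond appears when single-byte data is treated as UTF-8.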
Ray Paseur is a consulting PHP developer who lives and works in McLean, VA. He is a member of the DC PHP Group and serves on the Product Advisory Committee for Experts-Exchange.