While UTF-8 may be the expected norm for code and the web these days, that wasn't always the case. Legacy PostgreSQL databases defaulted to the SQL_ASCII character set. Unfortunately, SQL_ASCII is not a true character set, but permission to scribble on the database using any (or no!) character set you want. Converting the resulting hodgepodge of encodings in a large, internationally used database can require downtime (unacceptable), guessing (unacceptable), and/or multiple iterations (barely acceptable).
In this presentation, we'll cover the techniques used in a journey from mixed encodings to pure UTF-8, all transcoded while the database was running online, in production!
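To make the problem concrete, here is a minimal Python sketch of why SQL_ASCII data is hard to convert: the stored bytes carry no encoding label, so the most a program can prove is whether a value is pure ASCII, valid UTF-8, or neither. This is an illustration only, not the technique from the talk; the `classify` function and the sample values are hypothetical.

```python
def classify(raw: bytes) -> str:
    """Label a byte string by the strictest encoding it satisfies."""
    try:
        raw.decode("ascii")
        return "ascii"      # 7-bit clean: identical in ASCII and UTF-8
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"      # well-formed UTF-8 (almost certainly intended as such)
    except UnicodeDecodeError:
        return "unknown"    # some legacy 8-bit encoding; which one is a guess


# Hypothetical values as they might sit side by side in one SQL_ASCII column.
samples = [
    b"plain",
    "café".encode("utf-8"),
    "café".encode("latin-1"),
]
labels = [classify(s) for s in samples]
print(labels)
```

Only the "unknown" bucket needs a decision about the source encoding, which is where the guessing mentioned above comes in.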
About the Presenter (Bob Lunney)
Bob’s first encounter with a relational database involved a VAX 11/750, a tape drive, and Boeing’s database engine written in Fortran. Hooked by this gateway code, he moved on to successively harder databases like Rbase, Sybase, Informix, and Oracle, finally reaching relational nirvana with PostgreSQL. As a recovering 20-year Oracle DBA, Bob appreciates the relative simplicity, extensibility, power, camaraderie and organized anarchy of PostgreSQL and its community. Currently, Bob oversees data management as the Lead Data Architect at MeetMe.