HOWTO: Configure an Amazon RDS instance to use UTF-8

RDS has been working out pretty well for AdGrok — it’s a one-click MySQL 5.1 instance that seems pretty promising:

  • Automatic replication and failover to another deployment zone (colo)
  • Easy to set up firewall configuration
  • Easy to scale up (but not down) for CPU and disk space

If you’re just starting out, it’s a whole lot simpler than, say, a DRBD/Heartbeat/MySQL configuration running on EC2 instances.

There’s just one catch. It doesn’t use UTF-8 encoding by default, it uses latin1_swedish. If you’re going to do business outside the US, utf8 is a must.

The second catch — the Amazon web UI for managing RDS “parameter groups” is read-only. If you know the magick incantations, though, it’s not that bad. Here’s how to make it all go:

Install the RDS tools

I found them here, but that URL doesn’t seem very “authoritative.” The googles might find a newer version for you.

Unzip the archive into /opt, and add this to your ~/.bashrc — but note that the JAVA_HOME is a mac-specific thing — change that, and the REGION, to your liking, and go fetch your specific cert and private key from your EC2 “account” page. Note also that this assumes you made a symlink from the version of the RDS tools to /opt/rds, and from the ec2 tools to /opt/ec2.

Create your RDS instance

OK. Let’s assume you’ve got all that squared away now, and that you’re starting out with a new database (so you don’t have to convert the contents from latin1 to utf8). In the web interface, click the RDS tab, and create a new RDS instance. If in doubt, create a new security group. You can change the definition of the security group, but you can’t change the group for a given RDS instance. I set up “classes” of security groups for AdGrok — one for the blog, and one for the production application.

Let’s also assume you named your database “db1” because you’re an engineer, and think that creativity in naming should be reserved for yachts.

Set up and assign the Parameter Group

Create and configure a new “parameter group” — the default parameter group, default.mysql5.1, isn’t editable.

Then assign your database instance to use the new group:

And the final bit of pain — to make the parameter group settings take affect, you need to reboot the instance. Note that this will take several minutes before you see your RDS instance back online.

And finally, test that everything stuck by opening a mysql prompt:

You want all utf8s — if you don’t, make sure your .my.cnf has these lines:

and, if you’re on Ruby on Rails, your config/database.yml has these lines:

Posted in Technical HOWTOs Tagged with: , , ,
  • Ross Golder

    “Let’s also assume you named your database “db1″ because you’re an engineer, and think that creativity in naming should be reserved for yachts.” 🙂

  • Ross Golder

    btw, you’ve got a typo in the rds-create-db-parameter-group command. It should probably be a ‘-f’ not a ‘-e’.

  • I had to set character_set_database as well. So my modify statement looked like:

    rds-modify-db-parameter-group utf8  
    –parameters=”name=character_set_database, value=utf8, method=immediate”  
    –parameters=”name=character_set_server, value=utf8, method=immediate”  
    –parameters=”name=collation_server, value=utf8_general_ci, method=immediate”

  • Thanks for the information. I had same problems. I dint want to mess with the command lines. So i found easy GUI way to change my.ini parameters. I wrote a post here, it may help you guys.

  • You should probably use utf8_unicode_ci.   utf8_general_ci doesn’t do proper sorting in all languages, and its only benefit it a minor performance increase for sorting.

  • notyoutoo

    I had problems just changing parameters (and rebooting) for an existing DB because when I did:
    > show create database foo;

    I was still seeing:
    > CREATE DATABASE foo /*!40100 DEFAULT CHARACTER SET latin1 */

    And in the information_schema/SCHEMATA the DEFAULT_CHARACTER_SET_NAME and DEFAULT_COLLATION_NAME were still latin1/latin1_general_ci respectively. :-{

    My DBA clued me in on how they were getting databases created with the correct _default_ character set (which is what JIRA inexplicably uses instead of explicitly setting it to UTF8) — it was happening when they did the restore from mysqldump files.

    Which led me to the (slightly confusing) solution that after setting all of the above parameters and rebooting the RDS instance, you still need to do:

    > DROP DATABASE foo;

    It was hard to wrap my head around this because I was actually connecting to foo on the mysql command-line, so I wasn’t aware I could delete the database I was connecting to. But apparently I can.

  • No

    As an update. You can now create parameter groups from within the UI.

  • No Ducks

    Thanks. Works great. For anyone doing this, just remember to select the correct parameter group family, as the parameter group won’t appear in the ‘Modify DB Instance’ dropdown otherwise.