Ruby 1.9 Encoding Issues, Again.

By Loren Segal on November 2nd, 2009 at 11:45 PM

I covered Ruby 1.9 encodings a while back on my blog, but apparently I left out a few major issues. I only noticed them recently, when running 1.9 in a new environment. It turns out that tweaking your environment is all it takes to fix most of them, and I had already done this on my main machine.

So here's the issue:

You’re using Rails, and you see this:

invalid byte sequence in US-ASCII

You might also see:

incompatible character encodings: ASCII-8BIT and UTF-8

That happens all the time when trying to get YARD to parse Rails source. It also happens when your ERB templates and your DB data have mismatched encodings.
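
If you want to see both errors outside of Rails, here is a minimal sketch that reproduces them. The method names are made up for illustration, and the hard-coded bytes simply stand in for data coming from a template or the DB:

```ruby
# Hypothetical helpers for illustration; none of these exist in Rails or YARD.

# UTF-8 bytes for "café" mislabeled as US-ASCII, which is roughly what you get
# when a UTF-8 file is read while the external encoding is US-ASCII.
def mislabeled_ascii
  "caf\xC3\xA9".dup.force_encoding("US-ASCII")
end

# Matching a regexp makes Ruby validate the bytes against the string's
# encoding, so the mislabeled string raises an ArgumentError.
def trigger_invalid_byte_sequence
  mislabeled_ascii =~ /caf/
  nil
rescue ArgumentError => e
  e
end

# Concatenating a UTF-8 string with binary (ASCII-8BIT) data raises as soon as
# both sides contain non-ASCII bytes.
def trigger_incompatible_encodings
  utf8   = "caf\xC3\xA9".dup.force_encoding("UTF-8")
  binary = [0xC3, 0xA9].pack("C*") # pack("C*") returns an ASCII-8BIT string
  utf8 + binary
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts trigger_invalid_byte_sequence.message
puts trigger_incompatible_encodings.message
```

The exact wording of the second message (and the order of the two encodings in it) depends on which side of the operation each string is on and on your Ruby version.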

The Problem

Ruby has multiple default encoding values: external (the encoding Ruby assumes for data read from files and other IO), internal (the encoding that IO data gets transcoded to as it is read, if one is set), and script (the encoding of the content of Ruby scripts, which is also what string literals in that script get). When dealing with IO, you need to make sure both your internal and external encodings match up with your data, especially since Ruby falls back to US-ASCII and not UTF-8 when your locale doesn't say otherwise. The “script” encoding problems were covered in my last blog post on the subject.
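
To see what your own environment resolves these values to, a quick check like the one below can help. It only uses core Encoding calls; the encoding_report name is my own invention for this sketch:

```ruby
# Hypothetical helper: collect the encodings Ruby is currently working with.
def encoding_report
  {
    :script   => __ENCODING__,              # encoding of this source file
    :external => Encoding.default_external, # assumed for data read from IO
    :internal => Encoding.default_internal, # nil unless something has set it
    :locale   => Encoding.find("locale")    # what Ruby derived from LANG / LC_*
  }
end

encoding_report.each do |name, encoding|
  puts "#{name}: #{encoding.inspect}"
end
```

Running it once under LANG=C and once under a UTF-8 locale (assuming one is installed on your machine) should show the external and locale lines change.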

The Fix

All you need to do is tweak your environment variables (your region / language may differ):

$ export LANG=en_US.UTF-8

Now Ruby will use UTF-8 for your external encoding. In irb, __ENCODING__ follows the locale as well, which makes it a quick way to check:

$ LANG=en_US.ASCII-7BIT irb19
>> __ENCODING__
=> #<Encoding:US-ASCII>
$ LANG=en_US.UTF-8 irb19
>> __ENCODING__
=> #<Encoding:UTF-8>

Note that this doesn’t cover your default_internal, but usually that will be handled for you. And if you can’t set your ENV, you can set both values right in Ruby:

Encoding.default_internal, Encoding.default_external = ['utf-8'] * 2

This sets both your default internal and external encodings.

Now when Rails (well, ERB) tries to read files on disk, it will default to UTF-8 rather than US-ASCII, and your UTF-8 data from the DB will work just fine.
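
Here is a rough sketch of what that difference looks like for a file on disk. The temp file stands in for an ERB template, read_with_external and compare_external_encodings are hypothetical helpers of mine, and the whole thing assumes default_internal is left unset (nil) so no transcoding happens on read:

```ruby
require "tempfile"

# Hypothetical helper: read a file under the given default external encoding,
# then restore whatever the default was before.
def read_with_external(path, encoding)
  previous = Encoding.default_external
  Encoding.default_external = encoding
  File.read(path)
ensure
  Encoding.default_external = previous
end

# Write the UTF-8 bytes for "café" to a temp file and read them back under
# both defaults. Assumes Encoding.default_internal is nil.
def compare_external_encodings
  file = Tempfile.new("encoding-demo")
  file.binmode
  file.write("caf\xC3\xA9")
  file.close

  ascii = read_with_external(file.path, "US-ASCII")
  utf8  = read_with_external(file.path, "UTF-8")

  [[ascii.encoding, ascii.valid_encoding?],
   [utf8.encoding, utf8.valid_encoding?]]
ensure
  file.close! if file # close and delete the temp file
end

p compare_external_encodings
```

The first read comes back tagged as US-ASCII and fails valid_encoding?, which is the string that later blows up with "invalid byte sequence in US-ASCII". The second read is tagged UTF-8 and is valid.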

If you have this problem the other way around (your DB is ASCII), just reverse everything I said.

Questions? Comments? Follow me on Twitter (@lsegal) or email me.