<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>gnuu.org &#187; unicode</title>
	<atom:link href="http://gnuu.org/tag/unicode/feed/" rel="self" type="application/rss+xml" />
	<link>http://gnuu.org</link>
	<description>my word against yours, fight.</description>
	<lastBuildDate>Fri, 16 Jul 2010 22:12:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Get Ruby 1.9, Rails, MySQL and UTF-8 to Work Together</title>
		<link>http://gnuu.org/2009/11/06/ruby19-rails-mysql-utf8/</link>
		<comments>http://gnuu.org/2009/11/06/ruby19-rails-mysql-utf8/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 02:01:33 +0000</pubDate>
		<dc:creator>Loren Segal</dc:creator>
				<category><![CDATA[post]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby 1.9]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[utf-8]]></category>

		<guid isPermaLink="false">http://gnuu.org/2009/11/06/get-ruby-1-9-rails-mysql-and-utf-8-to-work-together/</guid>
		<description><![CDATA[Update: You can also try out my fork of mysql-ruby that is properly encoding aware. It doesn&#8217;t just convert everything to UTF-8, but rather to your default encoding, so make sure to set your LANG! Here&#8217;s a quick little hack to get MySQL 2.8.1 using UTF-8 in Rails 2.3.4 and Ruby 1.9.1. Filename: lib/mysql_utf8.rb Put [...]]]></description>
			<content:encoded><![CDATA[<p class="note"><strong>Update:</strong> You can also try out my <a href="http://github.com/lsegal/mysql-ruby">fork of mysql-ruby</a> that is properly encoding aware. It doesn&#8217;t just convert everything to UTF-8, but rather to your default encoding, so make sure to <a href="http://gnuu.org/2009/11/02/ruby-1-9-encoding-issues-again/">set your LANG</a>!</p>
<p>Here&#8217;s a quick little hack to get MySQL/Ruby 2.8.1 (the <tt>mysql</tt> gem) returning UTF-8 strings in Rails 2.3.4 and Ruby 1.9.1.</p>
<p class="note">Filename: lib/mysql_utf8.rb</p>
<p><script src="http://gist.github.com/228455.js"></script></p>
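<pre class="sh_ruby">class Mysql::Result
  # Tag String values with the given encoding; leave nil, numbers, etc. alone.
  # force_encoding relabels the bytes in place, it does not convert them.
  def encode(value, encoding = "utf-8")
    String === value ? value.force_encoding(encoding) : value
  end

  # Wrap #each so every column of every row comes back tagged.
  def each_utf8(&amp;block)
    each_orig do |row|
      yield row.map {|col| encode(col) }
    end
  end
  alias each_orig each
  alias each each_utf8

  # Same idea for #each_hash, which yields column name / value pairs.
  def each_hash_utf8(&amp;block)
    each_hash_orig do |row|
      row.each {|k, v| row[k] = encode(v) }
      yield(row)
    end
  end
  alias each_hash_orig each_hash
  alias each_hash each_hash_utf8
end</pre>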
<p>Put that snippet in your Rails project (I used <tt>lib/mysql_utf8.rb</tt>) and load it in your environment.</p>
<p>That's all. Now the strings that come back from your queries should be tagged as UTF-8:</p>
<pre class="sh_ruby">$ ./script/console
Loading development environment (Rails 2.3.4)
&gt;&gt; u = User.find(1)
=&gt; #&lt;User id: 1, name: &quot;Test&quot;, ...&gt;
&gt;&gt; u.name.encoding
=&gt; #&lt;Encoding:ASCII-8BIT&gt;
&gt;&gt; require 'lib/mysql_utf8'
=&gt; []
&gt;&gt; u = User.find(1)
=&gt; #&lt;User id: 1, name: &quot;Test&quot;, ...&gt;
&gt;&gt; u.name.encoding
=&gt; #&lt;Encoding:UTF-8&gt;</pre>
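<p>The hack leans entirely on <tt>String#force_encoding</tt>, which only relabels the bytes and never converts them. Here is a minimal standalone sketch of that behaviour, with no database involved; the byte string is made up for illustration and stands in for what the driver hands back:</p>
<pre class="sh_ruby"># encoding: utf-8
# Pretend these bytes came from the driver, tagged as binary (ASCII-8BIT).
raw = "caf\xC3\xA9".dup.force_encoding("ASCII-8BIT")
puts raw.encoding          # binary: Ruby counts bytes, not characters
puts raw.length

# Relabel a copy as UTF-8. The bytes themselves are untouched.
tagged = raw.dup.force_encoding("UTF-8")
puts tagged.encoding
puts tagged.valid_encoding?              # only true if the bytes really are UTF-8
puts tagged.length                       # now counted in characters
puts tagged.bytes.to_a == raw.bytes.to_a # same bytes before and after</pre>
<p>If the bytes in your tables are not really UTF-8, <tt>valid_encoding?</tt> will come back false and you will still hit encoding errors later, so this only works when the data itself is already UTF-8.</p>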
<p>Note that if you have BLOB columns in MySQL, this hack tags their binary data as UTF-8 too, so you’ll need to <tt>force_encoding</tt> those values back to a binary encoding such as ASCII-8BIT. You could do this in your model.</p>]]></content:encoded>
			<wfw:commentRss>http://gnuu.org/2009/11/06/ruby19-rails-mysql-utf8/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ruby 1.9 Encoding Issues, Again.</title>
		<link>http://gnuu.org/2009/11/02/ruby-1-9-encoding-issues-again/</link>
		<comments>http://gnuu.org/2009/11/02/ruby-1-9-encoding-issues-again/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 04:45:39 +0000</pubDate>
		<dc:creator>Loren Segal</dc:creator>
				<category><![CDATA[post]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[problems]]></category>
		<category><![CDATA[ruby 1.9]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://gnuu.org/2009/11/02/ruby-1-9-encoding-issues-again/</guid>
		<description><![CDATA[I covered Ruby 1.9 encodings a while back on my blog, but apparently I left out a few other major issues. I noticed these just recently, when running 1.9 on a new environment. It turns out, tweaking your environment is all it takes to fix most of the issues, and I had already done this [...]]]></description>
			<content:encoded><![CDATA[<p>I covered Ruby 1.9 encodings <a href="http://gnuu.org/2009/02/02/ruby-19-common-problems-pt-1-encoding/">a while back</a> on my blog, but apparently I left out a few other major issues. I noticed these just recently, when running 1.9 in a new environment. It turns out that tweaking your environment is all it takes to fix most of the issues, and I had already done this on my main machine.</p>
<p>So here we go, the issue:</p>
<h3>You’re using Rails, and you see this:</h3>
<blockquote><p>invalid byte sequence in US-ASCII</p>
</blockquote>
<p>You might also see:</p>
<blockquote><p>incompatible character encodings: ASCII-8BIT and UTF-8</p>
</blockquote>
<p>That happens <em>all the time</em> when trying to get <a href="http://yard.soen.ca">YARD</a> to parse Rails source. It also happens when your ERB templates and your DB data have mismatched encodings.</p>
<h3>The Problem</h3>
<p>Ruby has multiple default encoding values: external (the encoding assumed for data read from IO, such as files), internal (the encoding that IO data is transcoded to as it is read, when one is set), and script (the encoding of the source text of each Ruby file). When dealing with IO, you need to make sure your external encoding matches the data you are actually reading, especially since Ruby 1.9 falls back to US-ASCII and not UTF-8 when your locale doesn’t name an encoding. The “script” encoding problems were covered in my <a href="http://gnuu.org/2009/02/02/ruby-19-common-problems-pt-1-encoding/">last blog post</a> on the subject.</p>
<h3>The Fix</h3>
<p>All you need to do is tweak your environment variables (your region / language may differ):</p>
<pre>$ export LANG=en_US.UTF-8</pre>
<p>Now Ruby will use UTF-8 for your external encoding. irb also takes its script encoding from the locale, which is what <tt>__ENCODING__</tt> shows here:</p>
<pre>$ LANG=en_US.ASCII-7BIT irb19
&gt;&gt; __ENCODING__
=&gt; #&lt;Encoding:US-ASCII&gt;
$ LANG=en_US.UTF-8 irb19
&gt;&gt; __ENCODING__
=&gt; #&lt;Encoding:UTF-8&gt;</pre>
<p>Note, this doesn’t cover your default_internal, but usually this will be handled for you. And if you can’t set your ENV, you can set this stuff right in Ruby:</p>
<pre class="sh_ruby">Encoding.default_internal, Encoding.default_external = ['utf-8'] * 2</pre>
<p>This sets both your default internal and external encodings. </p>
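<p>If you want to convince yourself that the setting took, here is a small sketch that sets the defaults the long way and then reads a file back. The temp file name and its contents are made up for the example:</p>
<pre class="sh_ruby">require "tempfile"

# Same effect as the one-liner, spelled out.
Encoding.default_external = "utf-8"
Encoding.default_internal = "utf-8"

puts Encoding.default_external
puts Encoding.default_internal

data = nil
Tempfile.open("enc_demo") do |f|
  f.write("caf\u00E9")        # one non-ASCII character
  f.flush
  data = File.read(f.path)    # no explicit encoding: the defaults apply
end

puts data.encoding
puts data.valid_encoding?</pre>
<p>Setting these at runtime only affects IO opened afterwards, so it belongs as early as possible in your boot process.</p>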
<p>Now when Rails (well, ERB) tries to read files on disk, it will default to UTF-8 rather than ASCII, and your UTF-8 data from the DB will work just fine.</p>
<p>If you have this problem the other way around (your DB is ASCII), just reverse everything I said.</p>]]></content:encoded>
			<wfw:commentRss>http://gnuu.org/2009/11/02/ruby-1-9-encoding-issues-again/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Ruby 1.9 Common Problems Pt. 1: Encoding</title>
		<link>http://gnuu.org/2009/02/02/ruby-19-common-problems-pt-1-encoding/</link>
		<comments>http://gnuu.org/2009/02/02/ruby-19-common-problems-pt-1-encoding/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 06:05:41 +0000</pubDate>
		<dc:creator>Loren Segal</dc:creator>
				<category><![CDATA[post]]></category>
		<category><![CDATA[BlueCloth]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[problems]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[programming languages]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby 1.9]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://gnuu.org/2009/02/02/ruby-19-common-problems-pt-1-encoding/</guid>
		<description><![CDATA[If you’re migrating from Ruby 1.8.x to 1.9 you probably have run into one of the following error messages: invalid multibyte char (US-ASCII) - OR - CompatibilityError: incompatible encoding regexp match (Windows-31J regexp with UTF-8 string) The errors themselves are relatively self-explanatory. Ruby 1.9 is far more Unicode aware than 1.8, and this error happens [...]]]></description>
			<content:encoded><![CDATA[<p>If you’re migrating from Ruby 1.8.x to 1.9 you probably have run into one of the following error messages:</p>
<p><code class="dark">invalid multibyte char (US-ASCII)</code></p>
<p>- OR -</p>
<p><code class="dark">CompatibilityError: incompatible encoding regexp match (Windows-31J regexp with UTF-8 string)</code></p>
<p>The errors themselves are relatively self-explanatory. Ruby 1.9 is far more Unicode aware than 1.8, and these errors happen when you have some non-ASCII text (usually UTF-8) in one of your files. What you have to do to fix these, however, is not always as straightforward.</p>
<p>Well, after some time pulling my hair out, I&#8217;ve figured out that the solutions to these issues are actually quite simple. </p>
<h3>Invalid multibyte char (encoding here)</h3>
<p>If you get this issue, add the following to the top of each exploding file (below the <a href="http://en.wikipedia.org/wiki/Shebang_(Unix)">shebang</a> if there is one):</p>
<pre class="dark"># encoding: utf-8</pre>
<p style="border-right: #ccc 1px solid; padding-right: 7px; border-top: #ccc 1px solid; padding-left: 7px; font-size: 0.8em; background: #eee; padding-bottom: 7px; margin: 7px; border-left: #ccc 1px solid; padding-top: 7px; border-bottom: #ccc 1px solid"><strong>Note:</strong> You can also use &quot;coding:&quot; or even the Emacs style <tt>-*- encoding: utf-8 -*-</tt> but I like the simple term, &#8216;encoding&#8217;. Also note that you might need to replace &#8216;utf-8&#8217; with your specific encoding if it&#8217;s something else.</p>
<p>This should resolve the issue.</p>
<p>Basically, Ruby 1.9 by default assumes that every source file is encoded as US-ASCII, so when it hits a multibyte UTF-8 character (or any other byte outside the 7-bit range) it freaks out. You have to tell it that the file is encoded as UTF-8 with the comment at the top of the file, as we did above. Yes, it&#8217;s a little anal, but I&#8217;m sure we&#8217;ll get used to it.</p>
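<p>Here is a quick way to check what Ruby thinks a file&#8217;s encoding is. This is a minimal sketch with a made-up string; note that on Ruby 2.0 and later UTF-8 is already the default source encoding, so the magic comment only changes the outcome on 1.9:</p>
<pre class="sh_ruby"># encoding: utf-8
# The magic comment has to be the first line (or the second, after a shebang).
name = "café"               # a literal with one non-ASCII character

puts __ENCODING__           # source encoding of this file
puts name.encoding          # string literals inherit the source encoding
puts name.valid_encoding?   # are the bytes well-formed for that encoding?</pre>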
<h3>Incompatible encoding regexp match</h3>
<p>This issue is a little bit hairier. I had this problem with the <a href="http://www.deveiate.org/projects/BlueCloth/">BlueCloth</a> 1.0.0 gem (on line 972 of bluecloth.rb). It turns out that there are a <a href="http://www.zenspider.com/Languages/Ruby/QuickRef.html#11">few switches</a> that turn on specific encodings, and for some reason BlueCloth turns on the <tt>//s</tt> switch which enables the SJIS encoding (maybe <a href="http://daringfireball.net/">John Gruber</a> wanted <tt>//m</tt> for multiline?). Ruby 1.8 didn&#8217;t mind having this on, but 1.9 freaks out. Moral of the story, when you see this error, check your Regexp switches.</p>]]></content:encoded>
			<wfw:commentRss>http://gnuu.org/2009/02/02/ruby-19-common-problems-pt-1-encoding/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Ruby 1.9.1 Favorite New Features</title>
		<link>http://gnuu.org/2009/01/31/ruby-191-favorite-new-features/</link>
		<comments>http://gnuu.org/2009/01/31/ruby-191-favorite-new-features/#comments</comments>
		<pubDate>Sat, 31 Jan 2009 21:35:43 +0000</pubDate>
		<dc:creator>Loren Segal</dc:creator>
				<category><![CDATA[post]]></category>
		<category><![CDATA[enumerations]]></category>
		<category><![CDATA[new features]]></category>
		<category><![CDATA[ordered hash]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby 1.9]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://gnuu.org/2009/01/31/ruby-191-favorite-new-features/</guid>
		<description><![CDATA[I just grabbed Ruby 1.9.1 today (it was released January 30th 2009, so I’m 24 hours late) and decided to play with it. I’m interested in the changes they’ve been making to the parser, something I’m particularly interested in given my Ruby documentation tool, YARD, is highly dependent on syntax (and I’m messing with the [...]]]></description>
			<content:encoded><![CDATA[<p>I just grabbed <a href="http://ruby-lang.org">Ruby 1.9.1</a> today (it was released January 30th 2009, so I’m 24 hours late) and decided to play with it. I’m mostly interested in the changes they’ve been making to the parser, since my Ruby documentation tool, <a href="http://github.com/lsegal/yard">YARD</a>, is highly dependent on syntax (and I’m messing with the parser), but some of the other changes are quite nice too. I came across this nice <a href="http://www.scribd.com/doc/2589469/Migrating-to-Ruby-19">migration guide</a> that covers some of the basic changes, but here is a more distilled list of my favorites:</p>
<h3>1. Hash entries are <em>Ordered!</em></h3>
<pre class="sh_ruby">h = { cat: 1, zebra: 2, dog: 3 }
h.delete(:zebra)
h[:monkey] = 4
h.each {|key, val| puts "#{key}: #{val}" }

# Guaranteed to print:
# cat: 1
# dog: 3
# monkey: 4</pre>
<p>I’ve been praying for this for who knows how long. Everyone knows hashes don’t maintain order internally, but much of Ruby development makes use of Hashes as dictionaries because of the simple syntax. The problem is that dictionaries, unlike straight hash objects, usually require order to be maintained. There are tons of places in YARD where I had to resort to nested Arrays (e.g. <tt>[[key, val], ...]</tt>) to maintain order of a set of associated objects… I’ve even written countless <em>OrderedHash</em> classes to solve this problem. This is definitely the biggest feature in 1.9 to me.</p>
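<p>To make the difference concrete, here is a rough sketch of the nested-Array workaround next to a plain 1.9 Hash. The data is made up for illustration and is not taken from YARD:</p>
<pre class="sh_ruby"># The 1.8-era workaround: an Array of [key, value] pairs keeps its order.
pairs = [
  [:cat, 1],
  [:dog, 3],
  [:monkey, 4],
]

# On 1.9 a Hash built from those pairs remembers insertion order too.
ordered = Hash[pairs]
ordered[:zebra] = 5             # new keys go on the end

puts ordered.keys.inspect
puts pairs.assoc(:dog).last     # lookup in the Array form is a linear scan
puts ordered[:dog]              # lookup in the Hash form is a normal hash lookup</pre>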
<p><strike>PS. You might have noticed the Pythonish <tt>{a: 1}</tt> syntax. Yep, that&#8217;s new too; it&#8217;s not quite cool enough to make its own item in our list, but it’s really convenient to type.</strike></p>
<h3>2. Symbols are now Intuitively Comparable</h3>
<p>YARD calls #to_s on about a million things. Last I checked, Rails does too. This is because internally we try to keep everything stored as a Symbol for obvious efficiency. The problem is that when writing APIs we&#8217;re usually allowing developers to pass in a String <em>or</em> a Symbol, which means comparison is <em>always</em> annoying. It looks like the Ruby devs looked at a lot of this code, puked all over their keyboards, and cleaned up both their keyboards and the code to deal with this. One caveat: a Symbol still never equals a String (<tt>:hello == "hello"</tt> is false, and so is <tt>===</tt>), so mixed input still needs a #to_s or #to_sym somewhere. But Symbols did pick up a bunch of String-like methods, and we can now do things like:</p>
<pre class="sh_ruby">:hello =~ /e/        # => 1
:hello[1] == "e"     # => true
:hello &lt;=&gt; :help     # => -1</pre>
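<p>When an API accepts either a String or a Symbol, the usual trick is to normalize once at the boundary and compare after that. A minimal sketch follows; <tt>normalize_name</tt> and <tt>same_name?</tt> are hypothetical helpers, not part of YARD or Rails:</p>
<pre class="sh_ruby"># Hypothetical helper: turn whatever the caller passed into a Symbol.
def normalize_name(name)
  name.to_sym
end

# Hypothetical helper: compare two names regardless of which type each one is.
def same_name?(a, b)
  normalize_name(a) == normalize_name(b)
end

puts same_name?(:hello, "hello")   # compared as Symbols
puts same_name?("hello", :world)
puts(:hello == "hello")            # the raw comparison, without normalizing</pre>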
<h3>3. Enumerations on a Hash return a Hash</h3>
<p>For the same reasons as above, this is awesome. Performing a #select on a hash was always a pain in Ruby 1.8 because you would end up with an Array of pairs… totally not a Hash. This inconsistency has been dealt with for the filtering methods (#map still hands back an Array), and we can now do:</p>
<pre class="sh_ruby">{a:1,b:2,c:3}.select {|k,v| k != :b }
=> {:a=>1, :c=>3}</pre>
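<p>Here is a short sketch of which calls give you a Hash back on 1.9 and which still need converting. The <tt>stock</tt> data is made up for the example:</p>
<pre class="sh_ruby">stock = { apples: 3, pears: 0, plums: 7 }

# Filtering keeps the Hash shape on 1.9.
in_stock = stock.select { |name, count| count > 0 }
puts in_stock.class

# #map still returns an Array, here an Array of [key, value] pairs...
doubled_pairs = stock.map { |name, count| [name, count * 2] }
puts doubled_pairs.class

# ...which Hash[] turns back into a Hash.
doubled = Hash[doubled_pairs]
puts doubled[:plums]</pre>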
<h3>4. Proper Unicode Support</h3>
<p>Let&#8217;s not get too far into this list without giving credit for the awesome work Ruby 1.9 has done in fixing the Unicode support. In short, it is now possible to do things like:</p>
<pre class="sh_ruby">"Hi!".encode("utf-16be") # => "\x00H\x00i\x00!"
File.read('test.txt', encoding: 'utf-8').encoding</pre>
<p>We also get Enumerations like <tt>#each_char</tt> that iterate properly over such strings. There are plenty more goodies regarding Unicode, but I still don’t know all of the details. Interestingly enough, I ran into a Ruby Unicode problem the other day, so I think I’ll be revisiting the problem with these new tools soon and write about what happened.</p>
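<p>To see the difference between characters and bytes, here is a minimal sketch that assumes the string is UTF-8. The word is just an example:</p>
<pre class="sh_ruby"># encoding: utf-8
word = "na\u00EFve"             # "naive" with a diaeresis on the i

chars = word.each_char.to_a     # one entry per character
bytes = word.each_byte.to_a     # one entry per byte

puts chars.length
puts bytes.length               # longer: the accented letter takes two bytes in UTF-8
puts word.encode("utf-16be").bytesize</pre>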
<h3>5. Regexps get Look-ahead/Look-behind</h3>
<p>Ruby 1.8 already had look-ahead, but look-behind, probably one of the most powerful features of regular expressions, has been missing from Ruby for the longest time. I can’t remember any specific times when I hit this limitation, but I imagine it will simplify a lot of hacky code attempting to work around the feature omission in 1.8. </p>
<h3>6. Object #tap</h3>
<p>This is an easy one because anyone could have implemented it, but it&#8217;s a nice way to silently hook into a method call chain. The <a href="http://www.ruby-doc.org/core-1.9/classes/Object.html#M000309">Ruby 1.9 docs</a> give a cool example of this, but I think a better example is when dealing with method calls that aren&#8217;t chainable like <tt>Hash#delete</tt>: </p>
<pre class="sh_ruby"># In 1.8 we need to do this because #delete returns the removed value, not the hash:
h = {:a => 1, :b => 2, :c => 3}
h.delete(:b)
h.each {|k,v| puts k }

# In 1.9 we can do this in one line:
{a:1,b:2,c:3}.tap {|o| o.delete(:b) }.each {|k,v| puts k }</pre>
<h3>7. Fibers are lighter Threads</h3>
<p>Still have yet to dig into this one, but <a href="http://pragdave.blogs.pragprog.com/pragdave/2007/12/pipelines-using.html">Dave Thomas</a> covers the new Fiber class. In short, they&#8217;re coroutines: blocks that can pause themselves with <tt>Fiber.yield</tt> and be resumed later by the caller, so your code does the scheduling rather than the VM. Of course it’s not the implementation but the way you use these guys that makes them quite elegant. I can think of a few things right off the top of my head, like generators and pipelines built from inline procs.</p>
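<p>To get a feel for it, here is a minimal sketch of a Fiber used as a generator. It is a toy example and is not taken from the linked post:</p>
<pre class="sh_ruby"># Each resume runs the block until the next Fiber.yield, then hands control back.
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a        # pause here and give the current number to the caller
    a, b = b, a + b
  end
end

first_five = 5.times.map { fib.resume }
puts first_five.inspect</pre>
<p>The loop inside the Fiber never finishes; it only advances when the caller asks for the next value, which is what makes this cheaper than spinning up a thread for the same job.</p>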
<h3>8. New Hash Key Syntax</h3>
<p><small>UPDATE:</small> Okay, I wasn&#8217;t going to put this in the list until I realized how the syntax can be applied to method calls. Consider the following 1.8 code:</p>
<pre class="sh_ruby">def open(filename, opts = {}) end

open('test.txt', :access => :read, :close => true)</pre>
<p>The same code in 1.9 can be called as:</p>
<pre class="sh_ruby">def open(filename, opts = {}) end

open('test.txt', access: :read, close: true)</pre>
<p>That <em>is</em> cool.</p>]]></content:encoded>
			<wfw:commentRss>http://gnuu.org/2009/01/31/ruby-191-favorite-new-features/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
