Ruby 2.1 is the next significant version of Ruby, having been released on Christmas Day 2013, just 10 months after 2.0.0. It comes with a whole host of changes and improvements, and this post dives into the details of what’s new.
New versioning policy
With 2.1 Ruby moves to a new versioning scheme based on Semantic Versioning.
The scheme is MAJOR.MINOR.TEENY, so with 2.1.0 the major version is 2, the minor version is 1, and the teeny version is 0. The teeny version number takes over from the patchlevel for minor bug and security fixes. The minor version number will be used for new features that are largely backwards compatible, and major for incompatible changes that can’t be released as a minor.
This means rather than referring to, say, 1.9.3 in general and 1.9.3-p545 specifically it will be 2.1 in general and 2.1.1 specifically.
The plan is to release a new minor version every 12 months, so we can expect to see Ruby 2.2 on Christmas Day 2014.
Required keyword arguments
After being introduced in Ruby 2.0.0 keyword arguments get a small improvement in 2.1. Required keyword arguments allow you to omit the default value for a keyword argument in the method definition, and an error will be raised if they are not given when the method is called.
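A minimal sketch of a required keyword argument. The method name copy_file and its keywords are made up for illustration:

```ruby
# Hypothetical method: both keywords are required because neither has a
# default value in the method definition.
def copy_file(from:, to:)
  "copying #{from} to #{to}"
end

puts copy_file(from: "a.txt", to: "b.txt")

# Leaving out a required keyword raises ArgumentError when the method is called.
begin
  copy_file(from: "a.txt")
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```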
As you can see in the example above there are some cases where keyword arguments can really help disambiguate which argument is which, but there isn’t any sensible default. Now you don’t have to choose.
As strings in Ruby are mutable, any string literal must result in a new string each time it is evaluated.
This can be quite wasteful, creating and then garbage collecting a lot of objects. To allow you to avoid this, calling #freeze directly on a string literal is special cased to look up the string in a table of frozen strings. This means the same string will be reused.
String literals as keys in Hash literals will also be treated the same, without the need to call #freeze.
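A small sketch of the behaviour described above. The method name frozen_greeting is only for illustration:

```ruby
# Calling #freeze directly on a literal looks the string up in a shared
# table, so no new String is allocated each time the method runs.
def frozen_greeting
  "hello".freeze
end

# Both calls hand back the very same object.
puts frozen_greeting.equal?(frozen_greeting)

# String literal keys in a Hash literal are stored frozen as well.
config = { "mode" => "fast" }
puts config.keys.first.frozen?
```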
During the development of 2.1 this feature started off as a syntax addition, with "string"f resulting in a frozen string. It was decided to switch to the technique of special casing the #freeze call on a literal as it allows for writing code that is backwards and forwards compatible, plus subjectively many people weren’t fond of the new syntax.
def returns the method name as a Symbol
The result of defining a method is no longer nil, instead it’s a symbol of the method’s name. The canonical example of this is making a single method private.
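A sketch of passing the symbol returned by def straight to private. The Greeter class is hypothetical:

```ruby
class Greeter
  def greet
    "#{greeting} world"
  end

  # def returns :greeting, which is handed straight to private, so only
  # this one method becomes private.
  private def greeting
    "hello"
  end
end

puts Greeter.new.greet
```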
It also makes for a nice way of adding method decorators, here’s an example using Module#prepend to wrap before/after calls around a method.
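One possible shape for such a decorator, as a sketch rather than a definitive implementation. The Around module, the around method, and the Job class are all invented for this example:

```ruby
# Hypothetical decorator: wraps the named method with before/after logging
# by prepending an anonymous module that calls super.
module Around
  def around(method_name)
    wrapper = Module.new do
      define_method(method_name) do |*args, &block|
        log << "before #{method_name}"
        result = super(*args, &block)
        log << "after #{method_name}"
        result
      end
    end
    prepend wrapper
    method_name
  end
end

class Job
  extend Around

  def log
    @log ||= []
  end

  # def returns :perform, which is passed to the around decorator.
  around def perform
    log << "performing"
    :done
  end
end

job = Job.new
job.perform
puts job.log.inspect
```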
The define_method and define_singleton_method methods have also been updated to return symbols rather than their proc arguments.
Rational and Complex literals
Integer (1) and Float (1.0) literals are a given, now we have Rational (1r) and Complex (1i) literals too.
These work really nicely with Ruby’s casting mechanism for mathematical operations, such that a rational number like one third (1/3 in mathematical notation) can be written 1/3r in Ruby. 3i produces the complex number 0+3i, which means complex numbers can be written in standard mathematical notation: 2+3i produces the complex number 2+3i!
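A few lines showing the new literals combined with ordinary arithmetic:

```ruby
# An Integer divided by a Rational literal gives a Rational.
third = 1/3r
half  = third + 1/6r
puts third.inspect
puts half.inspect

# An Integer plus a Complex literal gives a Complex.
z = 2+3i
rotated = z * 1i
puts z.inspect
puts rotated.inspect
```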
The many classes that got a #to_h method in Ruby 2.0.0 are now joined by Array and any other class including Enumerable.
This will come in handy with all those Enumerable methods on Hash that return an Array.
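For instance, mapping over a Hash gives an Array of pairs that can now be turned straight back into a Hash. The data here is made up:

```ruby
prices = { apple: 100, pear: 150 }

# Hash#map returns an Array of [key, value] pairs; #to_h converts it back.
doubled = prices.map { |name, price| [name, price * 2] }.to_h
puts doubled.inspect

# Any Array of pairs works, as does anything else that includes Enumerable.
puts [[:a, 1], [:b, 2]].to_h.inspect
puts (1..3).each_with_index.to_h.inspect
```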
Fine grained method cache
Prior to 2.1 Ruby used a global method cache, which would be invalidated for all classes whenever a method was defined, a module was included, an object was extended with a module, etc. anywhere in your code. This made some classes (such as OpenStruct) and some techniques (such as exception tagging) unusable for performance reasons.
This is no longer an issue: Ruby 2.1 uses a method cache based on the class hierarchy, invalidating the cache for only the class in question and any subclasses.
A method has been added to the RubyVM class to return some debugging information on the status of the method cache.
Exceptions now have a #cause method that will return the causing exception. The causing exception will automatically be set when you rescue one exception and raise another.
Currently the causing error isn’t output anywhere, and rescue won’t pay attention to the cause, but just having the cause automatically set should be a great help while debugging.
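A short sketch of the cause being set automatically. ConfigError and load_config are hypothetical names:

```ruby
class ConfigError < StandardError; end

# Rescuing one exception and raising another records the first as the cause.
def load_config
  Integer("not a number")
rescue ArgumentError
  raise ConfigError, "could not load config"
end

begin
  load_config
rescue ConfigError => e
  puts "#{e.class}: #{e.message}"
  puts "caused by #{e.cause.class}: #{e.cause.message}"
end
```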
Exceptions also get the #backtrace_locations method that was curiously missing from 2.0.0. This returns Thread::Backtrace::Location objects rather than strings, giving easier access to the details of the backtrace.
Generational Garbage Collection
Ruby 2.1 introduces a generational garbage collector, which divides all objects into young and old generations. During the marking phase a regular GC run will only look at the young generation, with the old being marked less frequently. Sweeping is done with the same lazy sweeping system introduced in 1.9.3. An object is promoted to the old generation when it survives a young generation run.
If you have objects in the old generation referring to objects in the young generation, but you’re only looking at the young generation, it may seem like an object doesn’t have any references, and you might incorrectly GC an in-use object. Write barriers prevent this by adding old generation objects to a ‘remember set’ when they are modified to refer to a young generation object (e.g. old_array.push(young_string)). This ‘remember set’ is then taken into account when marking the young generation.
Most generational garbage collectors need these write barriers on all objects, but with the many 3rd party C extensions available for Ruby this isn’t possible, so a workaround was devised whereby objects that aren’t write barrier protected (“shady” objects) won’t ever be promoted to the old generation. This isn’t ideal as you won’t get the full benefit of the generational GC, but it does maximise backwards compatibility.
While the marking phase is now a lot faster, the write barriers do add some overhead, and any performance gains are very dependent on what exactly your code is doing.
The GC.start method gets two new keyword arguments, full_mark and immediate_sweep. Both of these default to true. With full_mark set to true both generations are marked, false will only mark the young generation. With immediate_sweep set to true a full ‘stop the world’ sweep will be performed, false will perform a lazy sweep, deferred to when it’s required and only sweeping the minimum required.
The GC.stress debugging option can now be set to an integer flag to control which part of the garbage collector to stress.
The output of GC.stat has been updated to include some more details, and the method itself now takes a key argument to return just the value for that key, rather than building and returning the full hash.
GC also gets a new method latest_gc_info which returns information about the most recent garbage collection run.
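The calls described above can be sketched as follows. The keys in the returned hashes differ between Ruby versions, so only :count is looked up here:

```ruby
# Minor collection: mark only the young generation and sweep lazily.
GC.start(full_mark: false, immediate_sweep: false)

# Full collection, which is what the defaults give you.
GC.start(full_mark: true, immediate_sweep: true)

# Ask for a single statistic rather than building the whole hash.
puts "GC runs so far: #{GC.stat(:count)}"

# Details of the most recent garbage collection run.
puts GC.latest_gc_info.inspect
```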
GC tuning environment variables
Ruby will pay attention to a whole bunch of new environment variables now when it’s started up, that can be used to tune the behaviour of the garbage collector.
RUBY_GC_HEAP_INIT_SLOTS
This was available before as RUBY_HEAP_MIN_SLOTS. It sets the initial allocation slots, and defaults to 10000.
RUBY_GC_HEAP_FREE_SLOTS
This was also available before, as RUBY_FREE_MIN. It sets the minimum number of slots that should be available after GC. New slots will be allocated if GC hasn’t freed up enough. Defaults to 4096.
RUBY_GC_HEAP_GROWTH_FACTOR
Grows the number of allocated slots by the given factor. (next slots number) = (current slots number) * (this factor). The default is 1.8.
RUBY_GC_HEAP_GROWTH_MAX_SLOTS
The maximum number of slots that will be allocated at one time. The default is 0, which means no maximum.
RUBY_GC_MALLOC_LIMIT
This one isn’t new, but it’s worth covering. It is the amount of memory that can be allocated without triggering garbage collection. It defaults to 16 * 1024 * 1024 (16MB).
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR
The rate at which the malloc_limit grows, the default is 1.4.
RUBY_GC_MALLOC_LIMIT_MAX
The maximum the malloc_limit can reach. Default 32 * 1024 * 1024 (32MB).
RUBY_GC_OLDMALLOC_LIMIT
The amount the old generation can increase by before triggering a full GC. Default is 16 * 1024 * 1024 (16MB).
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR
The rate at which the old_malloc_limit grows. Default 1.2.
RUBY_GC_OLDMALLOC_LIMIT_MAX
The maximum the old_malloc_limit can reach. Default 128 * 1024 * 1024 (128MB).
ObjectSpace tools to track down memory leaks
Ruby 2.1 adds some more tools to help track down when you’re keeping references to old/large objects and not letting the garbage collector claim them.
We now get a collection of methods to trace object allocations and report on them.
The number returned by allocation_generation is the number of garbage collections that had been run when the object was created. So if this is a small number then the object was created early in the lifetime of the application.
There are also trace_object_allocations_start and trace_object_allocations_stop as alternatives to trace_object_allocations with a block, and trace_object_allocations_clear to clear recorded allocation data.
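A minimal sketch of tracing a single allocation. The Widget class and the allocation_details helper are invented for the example:

```ruby
require "objspace"

class Widget; end # hypothetical class to allocate

# Allocates a Widget while tracing is active and collects what was recorded.
def allocation_details
  details = nil
  ObjectSpace.trace_object_allocations do
    widget = Widget.new
    details = {
      file:       ObjectSpace.allocation_sourcefile(widget),
      line:       ObjectSpace.allocation_sourceline(widget),
      generation: ObjectSpace.allocation_generation(widget),
    }
  end
  details
end

puts allocation_details.inspect
```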
Further to this it’s possible to output this information and a little more to a file or string as JSON for further analysis or visualisation.
You can also use ObjectSpace.dump_all to dump the entire heap.
Both these methods can be used without activating object allocation tracing, but you’ll get less detail in the output.
There is also ObjectSpace.reachable_objects_from_root, which works similarly to ObjectSpace.reachable_objects_from but takes no argument and works from the root instead. There is one slight quirk to this method in that it returns a hash that has been put into ‘compare by identity’ mode, so you need the exact same string objects that it uses for keys to get anything out of it. Fortunately there is a workaround.
Refinements are no longer experimental and won’t generate a warning, they also get a couple of small tweaks to make them more useable.
Along with the top level #using to activate refinements in a file, there is now a Module#using method to activate refinements in a module. However, the effect of ‘using’ a refinement is still lexical, it won’t be active when reopening a module definition.
Refinement definitions are now inherited with Module#include, meaning you can group together a bunch of refinements defined in separate modules into just one, and activate them all with a single using.
String#scrub has been added to Ruby 2.1 to help deal with strings that have
ended up with invalid bytes in them.
You wouldn’t ever create a string like this deliberately (or at least I hope not), but it’s not uncommon for a string that has been through a number of systems to get mangled like this.
Presented with just the end result it’s pretty much impossible to untangle it all, but we can at least get rid of the characters that are now invalid.
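A sketch of String#scrub on a string with one invalid byte. The sample text and the mangled helper are made up:

```ruby
# A UTF-8 string containing a stray Latin-1 byte (\xE9) that is not valid
# UTF-8 in this position.
def mangled
  "caf\xE9 au lait"
end

puts mangled.valid_encoding?

# With no argument the invalid bytes are replaced with U+FFFD.
puts mangled.scrub

# A replacement string, or a block that receives the invalid bytes, can be given.
puts mangled.scrub("?")
puts mangled.scrub { |bytes| "<#{bytes.unpack("H*").first}>" }
```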
The same result can also be achieved by calling #encode with the current encoding and invalid: :replace as arguments.
Bignum/Rational performance improvements
Bignum and Rational now use the GNU Multiple Precision Arithmetic Library (GMP) to improve performance.
$SAFE level 4 removed
$SAFE = 4 was intended to put Ruby in a ‘sandbox’ type mode and allow execution of untrusted code. However it wasn’t terribly effective, required a lot of code scattered all over Ruby, and was almost never used, so it has been removed.
Ruby now has access to the system’s clock_gettime() function through Process.clock_gettime, which allows easy access to a number of different time values. It must be called with a clock id as the first argument.
Supplying Process::CLOCK_REALTIME will give you a unix timestamp as the return value. This will match Time.now.to_f, but as it skips creating a Time instance it’s a little bit quicker.
Another use for Process.clock_gettime is to get access to a monotonic clock, that is a clock that always moves forwards, regardless of adjustments to the system clock. This is perfect for critical timing or benchmarking.
However the monotonic clock value only makes sense when compared to another as the starting reference point is arbitrary.
Another clock useful for benchmarking is CLOCK_PROCESS_CPUTIME_ID. This works similarly to the monotonic clock in that it always advances, and only makes sense when referenced against another cpu time, but it only advances when the cpu has to do any work.
These three clocks, realtime, monotonic, and cpu, should always be available. Depending on your system you may have access to other clocks, check the documentation for the others that might be available.
To check if any of these clocks are supported you can check for the presence of the constant storing its clock id.
There is also a Process.clock_getres method available that can be used to discover the resolution provided by a specific clock.
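The clocks discussed in this section can be exercised with a short sketch like the following. The time_block helper is hypothetical:

```ruby
# Times a block with the monotonic clock, so adjustments to the system
# clock can't skew the result. Only the difference is meaningful.
def time_block
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = time_block { sleep 0.05 }
puts "elapsed: #{elapsed.round(3)}s"

# The realtime clock gives a unix timestamp as a Float.
puts "unix timestamp: #{Process.clock_gettime(Process::CLOCK_REALTIME)}"

# Check a clock is supported by checking for the constant holding its id.
if defined?(Process::CLOCK_PROCESS_CPUTIME_ID)
  puts "cpu time: #{Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)}"
end

puts "monotonic resolution: #{Process.clock_getres(Process::CLOCK_MONOTONIC)}"
```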
The --file (or -g) option to gem install no longer requires a file name for the dependency file, it will auto-detect the Gemfile. A gem install will also generate a Gemfile.lock if one is not present, and respect the versions it specifies if it exists.
You can see the full list of changes in the RubyGems History File.
Deprecated Rake features removed
The bundled Rake has been updated to version 10.1.0, this removes a bunch of deprecated features. Older versions of Rake have warned about these features for quite a while so hopefully you won’t encounter any compatibility problems.
RDoc template update
The included version of RDoc is now at 4.1, which brings a nice update to the default template with some accessibility improvements. See the RDoc History file for the full set of changes.
A new method Process.setproctitle has been added to set the process title without assigning to $0. A corresponding method Process.argv0 has also been added to retrieve the original value of $0 even if it has been assigned to.
Say you had some code in a background processing worker that looked like the following
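A minimal sketch of such a worker. The run_jobs helper and the job names are hypothetical, and a real worker would pull its jobs from a queue:

```ruby
# Sets the process title to the current job, then back to idle at the end.
def run_jobs(jobs)
  jobs.each do |job|
    Process.setproctitle("worker: #{job}")
    # ... do the work for the job here ...
  end
  Process.setproctitle("worker: idle")
end

run_jobs(%w[resize_image send_email])

# The original program name is still available.
puts Process.argv0
```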
you’d see something like the following if you were to run
Symbols now join integers and floating point numbers in being frozen.
This change was made to set things up for garbage collection of symbols in a future version of Ruby.
Fixed eval scope leak
When private, protected, public, or module_function was used without arguments in a string evaluated with eval, instance_eval, or module_eval, the method visibility scope would leak out to the calling scope, such that foo in the following example would be private.
This is fixed in 2.1, so foo would be public in this example.
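A sketch of the kind of code affected, reusing the foo name from the text. The Foo class is made up:

```ruby
class Foo
  # The visibility change is confined to the evaluated string.
  eval "private"

  def foo
    "foo"
  end
end

puts Foo.public_method_defined?(:foo)
puts Foo.new.foo
```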
#untrusted? is now an alias of #tainted?
Ruby previously had two sets of methods for marking/checking objects as untrusted, the first set, #tainted?, #taint, and #untaint, and the second #untrusted?, #untrust, and #trust. These behaved the same, but set separate flags, so an object could be untrusted, but not tainted.
These methods have been unified to set/get a single flag, with #tainted? etc being the preferred names and #untrusted? etc generating warnings.
generates the warning
return in lambda now always returns from lambda
Lambdas differ from Procs/blocks in that using return in a lambda returns from the lambda, not the enclosing method. However there was an exception to this, if a lambda was passed to a method with & and called with yield. This exception has now been removed.
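A sketch of the case in question. The method names call_with_yield and run_example are invented:

```ruby
def call_with_yield
  yield
end

def run_example
  greeting = lambda { return "hello from lambda" }
  # The lambda is passed with & and invoked through yield. Its return now
  # only leaves the lambda, so execution carries on to the next line.
  call_with_yield(&greeting)
  "hello from method"
end

puts run_example
```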
The example above would have returned "hello from lambda" under Ruby <= 2.0.0.
Get interface addresses
It is now possible to get details of the system’s network interfaces with
Socket.getifaddrs. This returns an array of Socket::Ifaddr objects.
Named capture support in StringScanner
StringScanner#[] now accepts symbols as arguments, and will return the corresponding named capture from the last match.
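For example, a hypothetical parse_date helper could read named captures back out of the scanner:

```ruby
require "strscan"

# Scans an ISO style date and returns its parts using named captures.
def parse_date(text)
  scanner = StringScanner.new(text)
  return nil unless scanner.scan(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)
  { year: scanner[:year], month: scanner[:month], day: scanner[:day] }
end

puts parse_date("2013-12-25").inspect
```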
YAML (Psych, the underlying yaml implementation) has had a safe_load method added. By default only the following classes can be deserialised: TrueClass, FalseClass, NilClass, Numeric, String, Array, and Hash. To deserialise other classes that you know will be safe you can pass a whitelist as an argument.
If a disallowed class is found Psych::DisallowedClass will be raised, this can also be referenced as YAML::DisallowedClass.
Resolv one-shot MDNS and LOC record support
Ruby’s Resolv DNS library gets basic support for one-shot multicast DNS lookups. It doesn’t support continuous queries, and can’t do service discovery, but it’s still a pretty neat new feature (check out the dnssd gem for full DNS Service Discovery support).
Combined with the resolv-replace library this allows you to use mDNS names with most Ruby networking libraries.
Resolv also gains the ability to query DNS LOC records.
And the final change for Resolv, it’s now possible to get back the full DNS message with Resolv::DNS#fetch_resource.
Improved Socket error messages
Errors from sockets have been improved to include the socket address in the message.
Hash#shift much faster
The performance of Hash#shift has been massively improved and this, coupled with Hash being insertion ordered since Ruby 1.9, makes it practical to implement a simple least recently used cache.
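One way such a cache could look, as a sketch under the assumption that a plain Hash is the only storage. The LRUCache class is invented for this example:

```ruby
# Hypothetical least recently used cache built on Hash insertion order.
class LRUCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  # Reading a key deletes and re-inserts it, moving it to the end of the
  # Hash so it counts as the most recently used entry.
  def [](key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key)
  end

  # Writing a key evicts the oldest entry with Hash#shift once over the limit.
  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @max_size
    value
  end

  def keys
    @store.keys
  end
end

cache = LRUCache.new(2)
cache[:a] = 1
cache[:b] = 2
cache[:a]       # :a is now the most recently used
cache[:c] = 3   # evicts :b
puts cache.keys.inspect
```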
Queue, SizedQueue, and ConditionVariable performance improvements
Queue, SizedQueue, and ConditionVariable have been sped up by implementing them in C rather than Ruby.
Timeout internal exception can’t be rescued
It is no longer possible to rescue the exception used internally by Timeout to abort the block it’s given. This is mostly an internal implementation detail that’s nothing to worry about, the Timeout::Error raised externally when the timeout is reached is unchanged and can be rescued as normal.
#intersect? returns true if the receiver and the argument have at least one value in common, and false otherwise. #disjoint? is the opposite and returns true if the sets have no elements in common, false otherwise.
Another minor change to Set, #to_set called on a set will simply return self, rather than a copy.
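A quick sketch with some made-up sets:

```ruby
require "set"

evens  = Set[2, 4, 6]
primes = Set[2, 3, 5]
odds   = Set[1, 3, 5]

puts evens.intersect?(primes)   # they share 2
puts evens.disjoint?(odds)      # nothing in common

# #to_set on a Set hands back the same object rather than a copy.
puts evens.to_set.equal?(evens)
```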
Easier streaming responses with WEBrick
The WEBrick HTTP response body can now be set to anything responding to #readpartial. Previously it had to be an instance of IO or a String. The example below implements a class that wraps an enumerator, and then uses this to stream out a response of the current time every second for 10 seconds.
The #step method on Numeric can now accept the keyword arguments by: and to: rather than positional arguments. The to: argument is optional, and if omitted it will result in an infinite sequence. If using positional arguments you can pass nil as the first argument to get the same behaviour.
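A few calls illustrating the keyword form, with arbitrary numbers:

```ruby
# Bounded sequence using both keywords.
puts 1.step(by: 2, to: 9).to_a.inspect

# Without to: the sequence is infinite, so take only the first few values.
puts 1.step(by: 3).first(4).inspect

# The positional equivalent passes nil as the limit.
puts 1.step(nil, 3).first(4).inspect
```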
would both output
The IO#seek method now accepts :CUR, :END, and :SET as symbols, along with the old flags named by the constants IO::SEEK_CUR, IO::SEEK_END, and IO::SEEK_SET.
New are IO::SEEK_DATA and IO::SEEK_HOLE (or :DATA and :HOLE) for its second argument. When these are supplied then the first argument is used as the minimum size of the data/hole to seek to.
These may not be supported on all platforms, you can check by testing whether the constants are defined.
_nonblock without raising exceptions
IO#read_nonblock and IO#write_nonblock now each get an exception keyword argument. When set to false (default is true) this causes the methods to return a symbol on error, rather than raise exceptions.
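A sketch using a pipe. The try_read helper is hypothetical:

```ruby
# Reads whatever is available without blocking and without raising when
# there is nothing to read yet.
def try_read(io)
  io.read_nonblock(1024, exception: false)
end

reader, writer = IO.pipe

# Nothing has been written yet, so a symbol comes back instead of an exception.
puts try_read(reader).inspect

writer.write("hi")
puts try_read(reader).inspect
```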
IO ignores internal encoding if external encoding is ASCII-8BIT
If you set default internal and external encodings Ruby will transcode from the external encoding to the internal. The exception to this is when the external encoding is set to ASCII-8BIT (aka binary), where no transcoding takes place.
The same exception should be made if the encodings were supplied to an IO method as an argument, but there was a bug, and the transcoding would take place. This has now been fixed.
#include and #prepend now public
Affecting Module and Class, the #include and #prepend methods are now public.
Module and Class gain a #singleton_class? method that, predictably, returns whether or not the receiver is a singleton class.
Module#ancestors more consistent
#ancestors called on a singleton class now includes singleton classes in the returned array, this makes the behaviour more consistent between being called on regular classes and singleton classes. It also clears up an irregularity where singleton classes would show up, but only if a module had been prepended (not included) into the singleton class.
Object#singleton_method is similar to #method and #instance_method, but will return only singleton methods.
Method and UnboundMethod gain an #original_name method to return the un-aliased name.
Mutex#owned? is no longer experimental, and there’s not much more to say about that.
Hash#reject on a subclass of Hash will issue a warning. In Ruby 2.2 #reject called on a subclass of Hash will return a new Hash instance, rather than an instance of the subclass. So in preparation for that potentially breaking change there is a warning.
Generates the following warning.
Ruby 2.1.1 accidentally included the full change, returning Hash in the example above and not generating a warning. This was reverted in 2.1.2.
The Vector class gains a cross_product instance method.
Calling #bit_length on an integer will return the number of binary digits (bits) it takes to represent that number.
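A few sample values, chosen arbitrarily:

```ruby
# 255 is 11111111 in binary, so it needs 8 bits; 256 needs 9.
[1, 255, 256, 2**64].each do |number|
  puts "#{number}.bit_length = #{number.bit_length}"
end
```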
unpack Native Endian
Array#pack and String#unpack gain the ability to work with native endian long longs with the Q_/Q! and q_/q! directives.
Dir glob returns composed characters
The HFS Plus filesystem on Mac OS X uses the UTF8-MAC encoding for filenames, with decomposed characters, e.g. é is represented with e and U+0301, rather than just U+00E9 (with some exceptions). Dir.glob and Dir[] now normalise this back to UTF8 encoded strings with composed characters.
Better type coercion for Numeric#quo
Numeric#quo now calls #to_r on the receiver which should allow for better behaviour when implementing your own Numeric subclasses. It also means TypeError rather than ArgumentError will be raised if the receiver can’t be converted. Note that TypeError is not a subclass of ArgumentError (both inherit directly from StandardError), so code that rescues ArgumentError here will need updating.
Binding gets methods to get/set local variables. This can come in handy if you really want to use a keyword argument that clashes with a reserved word
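A sketch of reading a keyword argument named after a reserved word. The html_tag helper is hypothetical:

```ruby
# class is a reserved word, so the keyword argument can't be read as an
# ordinary local variable; Binding#local_variable_get fetches it by name.
def html_tag(name, class: nil)
  css_class = binding.local_variable_get(:class)
  attribute = css_class ? %( class="#{css_class}") : ""
  "<#{name}#{attribute}></#{name}>"
end

puts html_tag("p", class: "intro")
```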
Or if you want to use a Hash to populate local variables in a Binding, say for evaluating a template
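A minimal sketch of the template case, assuming ERB as the template engine. The empty_binding and render helpers are made up:

```ruby
require "erb"

# Returns a fresh Binding with no local variables of its own.
def empty_binding
  binding
end

# Copies each Hash entry into the Binding as a local variable, then
# evaluates the template against that Binding.
def render(template, locals)
  scope = empty_binding
  locals.each { |name, value| scope.local_variable_set(name, value) }
  ERB.new(template).result(scope)
end

puts render("Hello, <%= name %>!", name: "world")
```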
CGI class methods now available from CGI::Util
CGI has a few handy utility class methods for escaping url and html strings. These have been moved to the CGI::Util module which can be included into other classes or the main scope for scripts.
Digest::Class.file passes arguments to initialiser
The various Digest classes have a shortcut method for producing the digest for a given file, this method has been updated to pass any extra arguments past the filename to the implementation’s initialiser. So rather than:
It’s possible to do:
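The two forms can be sketched with SHA2 and a 512 bit length as the extra argument. The digests_match? helper and the temporary file are only there to make the example self-contained:

```ruby
require "digest/sha2"
require "tempfile"

# Compares the longer form with the shortcut that passes the extra
# argument through to the initialiser.
def digests_match?(path)
  long_form = Digest::SHA2.new(512).file(path).hexdigest
  shortcut  = Digest::SHA2.file(path, 512).hexdigest
  long_form == shortcut
end

Tempfile.create("digest-example") do |file|
  file.write("hello")
  file.flush
  puts digests_match?(file.path)
end
```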
It is now possible to abort an SMTP transaction by sending the RSET command
open-uri supports repeated headers
open-uri extends Kernel#open to open resources with a URI, and will extend the return value with OpenURI::Meta. This gains a new #metas method to return the header values as arrays, for the case when a header has been used multiple times, e.g. set-cookie.
Write to files with Pathname
Pathname gains #write and #binwrite methods to write to files.
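A short sketch writing to a file inside a temporary directory. The file name is arbitrary:

```ruby
require "pathname"
require "tmpdir"

Dir.mktmpdir do |dir|
  path = Pathname.new(dir) + "notes.txt"

  # #write takes a string and returns the number of bytes written.
  path.write("first line\n")
  puts path.read

  # #binwrite does the same in binary mode.
  path.binwrite([0, 1, 2].pack("C*"))
  puts path.binread.bytes.inspect
end
```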
Tempfile now has a create method similar to new, but rather than returning a Tempfile instance that uses a finaliser to clean up the file when the object is garbage collected, it yields a plain File object to a block and cleans up the file at the end of the block.
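A sketch showing the yielded object and the clean-up. The with_scratch_file helper is hypothetical:

```ruby
require "tempfile"

# Returns the class of the yielded object and whether the file still
# exists once the block has finished.
def with_scratch_file
  klass = nil
  path = nil
  Tempfile.create("scratch") do |file|
    klass = file.class
    path = file.path
    file.write("temporary data")
  end
  [klass, File.exist?(path)]
end

puts with_scratch_file.inspect
```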
Rinda multicast support
The Rinda Ring classes are now able to listen on/connect to multicast addresses.
Here’s an example of using Rinda to create an extremely simple service registry listening on the multicast address 22.214.171.124
To have a service register itself:
And discover the address of a service:
I had some issues with the tuple_space = ring_finger.lookup_ring_any line causing a segfault, and had to use the following in its place:
Easy setting of extra HTTP options for XMLRPC
XMLRPC::Client#http returns the Net::HTTP instance being used by the client
to allow minor configuration options that don’t have an accessor on the client
to be set.
decode_www_form updated to match WHATWG standard
URI.encode_www_form and URI.decode_www_form have been updated to match the WHATWG standard.
URI.decode_www_form no longer treats ; as a separator, & is the only default separator, but there is a new separator: keyword argument if you need to change it.
URI.decode_www_form can also now successfully decode the output of
URI.encode_www_form when a value is nil.
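A few calls showing the separator behaviour, with arbitrary form data:

```ruby
require "uri"

# Only & separates pairs by default, so the ; stays part of the value.
puts URI.decode_www_form("a=1;b=2&c=3").inspect

# The separator: keyword switches to a different separator.
puts URI.decode_www_form("a=1;b=2", separator: ";").inspect

# Output from encode_www_form with a nil value can be decoded again.
encoded = URI.encode_www_form(a: 1, b: nil)
puts encoded
puts URI.decode_www_form(encoded).inspect
```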
RbConfig::SIZEOF has been added to provide the size of C types.
Can set facility with Syslog::Logger
Syslog::Logger, the Logger-compatible interface to Syslog, gets the ability to set the facility.
CSV.foreach with no block returns working enumerator
CSV.foreach called without a block argument returns an enumerator, however
this has for a long time resulted in an IOError when it was actually used. This
has now been fixed.
OpenSSL::BN.new now accepts integers as well as strings.
Enumerator.new size argument fixed to accept any callable object
Enumerator.new takes a size argument which can either be an integer, or an object responding to #call. Under 2.0.0 only integers and Procs would work, despite what the documentation said. This has now been fixed.
curses library removed
curses has been removed from the standard library and is now available as a gem.
TSort class methods
TSort can be useful for determining an order to complete tasks from a list of dependencies. However it’s a bit of a hassle to use, having to implement a class, include TSort, and implement #tsort_each_node and #tsort_each_child.
But now TSort is a little easier to use with, say, a hash. The same methods that are available as instance methods are now available on the module itself, taking two callable objects, one to take the place of #tsort_each_node and the second #tsort_each_child.
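A sketch of the module level methods with a Hash of dependencies. The build_order helper and the task names are made up:

```ruby
require "tsort"

# Each key is a task, each value the list of tasks it depends on.
def build_order(dependencies)
  each_node  = ->(&block) { dependencies.each_key(&block) }
  each_child = ->(node, &block) { dependencies.fetch(node).each(&block) }
  TSort.tsort(each_node, each_child)
end

tasks = { "eat" => ["cook"], "cook" => ["shop"], "shop" => [] }
puts build_order(tasks).inspect
```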
TCP Fast Open
Ruby 2.1 has added support for TCP Fast Open if it is available on your system. It’s possible to check whether it is available by checking for the existence of the Socket::TCP_FASTOPEN and Socket::MSG_FASTOPEN constants.
And that’s it…
Please let me know if there’s anything missing or incorrect here. Also, don’t forget that we’re hiring…