The Ruby Challenger

Friday, January 18, 2013

Some undocumented differences between 1.8 and 1.9

Here are some differences among Ruby versions 1.8.7, 1.9.1 and 1.9.2 which I didn't find on other sites, maybe because they represent very rare use cases (or maybe because I didn't search enough). In these cases, version 1.9.3 behaves like 1.9.2. I ran into them when trying to make Namebox compatible with Ruby 1.8.7 and 1.9.1, but after weeks of work I concluded that it isn't worth the effort.

Method names' type

The method names returned by instance_methods are Strings in 1.8.7 and Symbols since 1.9.1.
Since instance_method(method_name) doesn't care whether method_name is a Symbol or a String, this seems to be an innocuous difference, but if your code has something like Klass.instance_methods.include?("f") it will break when changing versions. The same applies to methods and singleton_methods.

class A; def f; end; end

p A.instance_method(:f)         #=> #<UnboundMethod: A#f>
p A.instance_method('f')        #=> #<UnboundMethod: A#f>

p A.instance_methods(false)
#=> Ruby 1.8.7: ["f"]
#=> Ruby 1.9.1: [:f]
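
If the same code has to run on both sides of that change, one option is to avoid comparing against the raw array. A minimal sketch using only core methods (the local variable names are just for illustration):

```ruby
class A; def f; end; end

# Normalize every name to a String, so the check does not depend on
# whether instance_methods returns Strings (1.8.7) or Symbols (1.9+).
names = A.instance_methods(false).map { |m| m.to_s }
defined_by_name = names.include?("f")

# method_defined? accepts a String or a Symbol.
defined_by_query = A.method_defined?(:f) && A.method_defined?("f")

p defined_by_name     #=> true
p defined_by_query    #=> true
```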

The superclass of the singleton class of a class

The superclass of the singleton class (also known as eigenclass) of a class X is the singleton class of the superclass of X. This follows the natural lookup path of class methods (not counting extended modules). However, this holds only since 1.9.1. In Ruby 1.8.7, the superclass of the singleton class of any class is the singleton class of the class Class:

class A; end
class B < A; end

# singleton class of B
SB = class << B; self; end

p SB        #=> #<Class:B>

p SB.superclass
#=> Ruby 1.8.7: #<Class:Class>
#=> Ruby 1.9.1: #<Class:A>
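
A quick way to check which behavior the running interpreter has, as a sketch: compare the two singleton classes directly. The eigenclass helper is a name made up for this example (from 1.9.2 on, Object#singleton_class does the same job).

```ruby
class A; end
class B < A; end

# Hypothetical helper: returns the singleton class of any object.
def eigenclass(obj)
  class << obj; self; end
end

same_chain = eigenclass(B).superclass.equal?(eigenclass(A))

p same_chain    #=> true on Ruby 1.9.1 and later
```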

Binding class methods to a subclass

For instance methods, you can bind an unbound method to an object as long as that object is an instance of the method's class or of a subclass. For class methods, you can bind an unbound method to a subclass of the method's class (or to the class itself). This works since 1.9.2. Earlier versions raise TypeError if you try to bind a class method to a subclass:

class A
  def self.f
    "self is #{self}"
  end
end

class B < A; end

p A.method(:f).unbind.bind(B).call
#=> Ruby until 1.9.1: TypeError
#=> Ruby since 1.9.2: "self is B"
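
The rule for instance methods mentioned above can be sketched the same way. The class names and the who method are invented for this example; binding to an instance of an unrelated class is what raises TypeError.

```ruby
class A
  def who
    "I am a #{self.class}"
  end
end

class B < A; end
class C; end                     # unrelated to A

um = A.instance_method(:who)     # an UnboundMethod

p um.bind(B.new).call            #=> "I am a B"

begin
  um.bind(C.new)                 # C is not A nor a subclass of A
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```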

super from a module method after binding

In Ruby 1.8.7, if you bind an instance method of a module to an object (of a class which includes that module), and that method calls super, it will raise NoMethodError instead of looking for the super method. It works normally if that method is invoked without bind.

class A
  def f
    "Hello"
  end
end

module M
  def f
    super + " world!"
  end
end

class B < A
  include M
end

b = B.new
p b.f       #=> "Hello world!"

p M.instance_method(:f).bind(b).call
#=> Ruby 1.8.7: NoMethodError
#=> Ruby 1.9.1: "Hello world!"

Saturday, January 5, 2013

Namebox

What is Namebox?

Namebox is a gem I've developed to create namespace boxes to protect the core classes' methods from changes, like Refinements. But, unlike Refinements, Namebox can protect your classes from changes made by external unrefined libraries.

Why Namebox instead of Refinements?

Well, there are a number of reasons for that:

Refinements are supposed to be included in Ruby 2.0, but they're complex (see discussions) and may not make it into that version. Namebox can be used now.
Personal implementations of Refinements (like mine) could work before Ruby 2.0, but only in the programmer's own new projects/libraries. You can't use your Refinements to refine a gem made by other people. And even after Ruby 2.0 you must wait for gem authors to refine them. With Namebox you can protect your core classes from changes caused by existing gems and other external unrefined libraries.
Namebox doesn't work exactly like Refinements.

Refinements use modules with the keyword refine, and blocks/regions with the keyword using. There is lexical scope, but it will probably affect subclasses in other files, which is bug-prone. There are some discussions about using modules for Refinements; that new module's role could create confusion.

Namebox is a box where changes can occur in the initialize block. When you want to use those changes, you open the box at some point in your code file, and close it later. Every method call (to changed methods of the protected classes) inside that region will invoke the code defined in the initialize block. Elsewhere those methods keep their original behavior (or new ones, defined after Namebox initialization). An open region is restricted to its file; it affects neither subclasses nor other files. There's no room for confusion or bad surprises.

Why "Namebox"?

It was like looking for a new domain name. The original name was "Namespace", but there was a gem with that name. I tried other names, looking for a short one, and the word "classbox" made me think about "namespace boxes" - that is what nameboxes are.

What are Namebox issues?

Performance. Needless to say, every piece of code takes some time to run. Namebox must check all methods of the protected classes once to detect changes. Moreover, the changed methods are redefined to check whether the namebox is open and decide which version of the method must be called. If such a method is called heavily, it can impact performance. Effort was made to keep those redefined methods as clean and fast as possible. I believe that the speed changes will be imperceptible in most cases, but only everyday use will tell how much.

As mentioned above, Namebox will redefine the changed methods as a decisor method, which will sometimes live in the class or module being protected. If that method is redefined later (outside another Namebox definition), there will be no way to get the decisor method back.
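
To make the idea of a decisor method concrete, here is a deliberately simplified sketch. It is not Namebox's code: TinyBox, Greeter and the _hello_* names are all invented for this example, and the real gem decides by where the call comes from, while this sketch uses a plain on/off flag.

```ruby
# Hypothetical stand-in for a namebox: just an on/off flag.
class TinyBox
  def initialize; @open = false; end
  def open;  @open = true;  end
  def close; @open = false; end
  def open?; @open; end
end

BOX = TinyBox.new

class Greeter
  def hello; "hello"; end                  # original behavior

  alias_method :_hello_original, :hello    # keep the original reachable
  def _hello_boxed; "HELLO!"; end          # behavior defined "inside the box"

  # The decisor: same name as the original, picks a version on every call.
  def hello
    BOX.open? ? _hello_boxed : _hello_original
  end
end

g = Greeter.new
p g.hello     #=> "hello"
BOX.open
p g.hello     #=> "HELLO!"
BOX.close
p g.hello     #=> "hello"
```

If some later code defines Greeter#hello again, it replaces the decisor, and nothing selects between the two versions any more. That is the situation described above.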

There are some situations where Namebox won't work as expected. For example, if two (or more) nameboxes require the same file somewhere in the initialization block, and that file changes methods of the protected classes, only the first namebox will detect those changes (since require loads each file only once), and those changes won't be available to subsequent nameboxes, unless they're defined where the first is open. I tried to hack Kernel.require to avoid it, but it would impact performance and cause several side effects. I expect that kind of case to be rare and easy to work around.
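
The "require loads each file only once" part is easy to observe in plain Ruby. A small sketch (patch_demo.rb and $load_count are names invented for the example): the temporary file bumps a counter every time it is actually executed.

```ruby
require "tmpdir"

$load_count = 0
first = second = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "patch_demo.rb")
  File.open(path, "w") { |f| f.puts "$load_count += 1" }

  first  = require(path)   # loads and runs the file
  second = require(path)   # already loaded: the file is not run again
end

p first         #=> true
p second        #=> false
p $load_count   #=> 1
```

A second namebox requiring the same file is in the position of the second call here: the file's code never runs for it, so it has no changes to detect.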

Versions. Ruby 1.8.7 and 1.9.1 have some issues regarding class methods. To make Namebox compatible with them, it's necessary to write more complex code, which runs differently and consumes more memory. Even so, when running on Ruby 1.8.7, class methods can lose self when changed by extending modules in subclasses (self will be the class instead of the subclass when the namebox is closed). Namebox versions 0.1.8 and 0.1.9 are compatible with Ruby 1.8.7 and 1.9.1, but newer versions (0.2.0 and greater) are compatible only with Ruby 1.9.2 and later.

Why Public Domain?

I like freedom, but I like it so much that I don't like the limits of GPL/LGPL: they impose freedom in a way that makes me feel constrained. Public Domain is the most freedom-prone license I've ever heard of. With Public Domain, anyone can do anything with the code, including changing the license, using it in private code, earning money, and even improving the code and publishing it in a free form.

And the most expected question...

How to use Namebox?

The principle is simple:

# Create a namebox to protect String's methods
NB = Namebox.new(String) do
  # hack String
  class String
    def to_hex
      unpack('H*')[0]
    end
  end
end

NB.open

# String#to_hex is visible here:
puts 'ABC'.to_hex   #=> '414243'

NB.close

# but not here:
puts 'ABC'.to_hex   #=> NoMethodError

Protecting core classes when requiring something:

# :core refers to all loaded modules (but not submodules)
NB = Namebox.new(:core) do
  require "sequel"
end

NB.open

puts :abc & :def     #=> # a Sequel object

NB.close

puts :abc & :def     #=> NoMethodError

There is a wrapper for require:

# :core refers to all loaded modules (but not submodules)
NB = Namebox.require "sequel", :core

NB.open

puts :abc & :def     #=> # a Sequel object

NB.close

puts :abc & :def     #=> NoMethodError

Those examples are very silly. Although you can open and close a namebox several times, you won't want to do that every time. A more practical use is to define methods to use later in your program, like in the Camelize example:

NB_CAMEL = Namebox.new(:String) do
  class String
    def camelize
      dup.gsub(/_([a-z])/) { $1.upcase }
    end
  end
end

class Foo
  NB_CAMEL.open
  def camelize_string(str)
    str.camelize            # works because NB_CAMEL is open here
  end
  NB_CAMEL.close
end

class Bar < Foo
  def change_string1(str)
    camelize_string(str)    # ok
  end
  def change_string2(str)
    str.camelize            # NoMethodError: NB_CAMEL isn't open here
  end
end

Mixing nameboxes:

# with this, you won't need to specify the
# protected modules in this file.
Namebox.default_modules = :core

NB_SEQUEL = Namebox.require "sequel"

NB_OBJECT = Namebox.new do
  class Symbol
    def & x
      "#{self} and #{x}"
    end
  end
end

NB_OBJECT.open

p :abc & :def     #=> abc and def

NB_SEQUEL.open

p :abc & :def     #=> Sequel object;
                  # included modules wins over
                  # superclass changes

NB_OBJECT.close   # you don't need to close the
                  # last opened namebox first

p :abc & :def     #=> Sequel object

NB_SEQUEL.close

p :abc & :def     #=> NoMethodError

You can use Namebox to hack Namebox itself (e.g., to define default_modules across other files, as constants are global):

NB_DEF_MOD = Namebox.new(Namebox) do
  def Namebox.default_modules
    [Symbol, String]
  end
end

NB_DEF_MOD.open

NB_SEQUEL = Namebox.require("sequel")
# would be the same as Namebox.require("sequel", Symbol, String) 

NB_DEF_MOD.close

You can invoke the previous version of the method through _old(...):

s = "abc"

puts s.length       #=> 3, any doubt?

NB = Namebox.new(String) do
  class String
    def length
      _old + 1      #=> please, don't do that out of home
    end
  end
end

puts s.length       #=> 3 (namebox is closed here)

NB.open

puts s.length       #=> 4

NB.close

puts s.length       #=> 3

The current version (0.2.2, as of 2013-01-27) is under test, and can be considered a pre-release for 1.0.0. Please come back later to check for that version! ;-)

Thursday, December 13, 2012

Refinements in Ruby: an ingenuous implementation

UPDATE: I've worked on Namebox, an improved way to protect methods from changes, inspired by this implementation of Refinements.

I'm back to programming after some months of pause. The last thing I've heard about Ruby before pausing was Refinements. And I fell in love with it.

I found that idea so smart that I couldn't continue programming without it. I couldn't wait for Ruby 2.0. I had to implement it on my own.

Ruby 1.8.7 gives us enough tools for designing a lexically scoped activation of refinements. I could use using with set_trace_func to detect the end of blocks (scopes), but I preferred to use enable and disable, because:

it's simpler to implement;
it's explicit and easy to read;
the programmer has the freedom to enable and disable the refinements whenever he/she considers it necessary.

My solution is so simple that I called it "an ingenuous implementation". It has many differences from the original proposal, as I will discuss later, but it brings what I consider the most important feature: the refinements are limited to physical ranges within the text file. There are no outside consequences. Anyone can use my refined libraries with no (unpleasant) surprises. And the unrefined methods are not affected (if you're thinking about performance impact).

# Refinements for Ruby: an ingenuous implementation
#
# (c) 2012 Sony Fermino dos Santos
# http://rubychallenger.blogspot.com/2012/12/refinements-in-ruby-ingenuous.html
# 
# License: Public Domain
# This software is released "AS IS", without any warranty.
# The author is not responsible for the consequences of use of this software.

# This code is not intended to look professional,
# provided that it does what it is supposed to do.
#
# This software was little tested on Ruby 1.8.7 and 1.9.3, with success.
# However, no heavy tests were made, e.g. threads, continuation, benchmarks, etc.
#
# The intended use is in the straightforward flux of execution.
#
# Instead of using +using+ as in the original proposal, here we use
# Module#enable and Module#disable. They're lexically scoped by the
# file:line of where they're called from.
#
# E.g.: Let StrUtils be a module which refines the String class.
# module StrUtils
#   refine String do
#     def foo
#       #...
#     end
#   end
# end
#
# Using it in the code snippets:
#
# StrUtils.enable
# "abc".foo                       #=> works (foo is "visible")
# def bar; puts "abc".foo; end    #=> bar is defined where foo is "visible"
# StrUtils.disable
# "abc".foo                       #=> doesn't work (foo is "invisible")
# bar                             #=> works, as bar was defined where foo is "visible"
# def baz; puts "abc".foo; end
# baz                             #=> doesn't work.
#
# You can enable and disable a module at any time, since you:
# * enable and disable in this order, in the file AND in the execution flow;
# * disable all modules that you enabled in the same file;
# * don't reenable (or redisable) an already enabled (or disabled) module.
#
# See refine_test.rb for more examples.

# Refinements is to avoid monkey patches, but
# we need some minimal patching to implement it.
class Module

  # Opens an enabled range for this module's refinements
  def enable
    info = ranges_info

    # there should be no open range
    raise "Module #{self} was already enabled in #{info[:file]}:#{info[:last]}" if info[:open]

    # range in progress
    info[:ranges] << info[:line]
  end

  # Close a previously opened enabled range
  def disable
    info = ranges_info

    # there must be an open range in progress
    raise "Module #{self} was not enabled in #{info[:file]} before line #{info[:line]}" unless info[:open]

    # beginning of range must be before end
    r_beg = info[:last]
    r_end = info[:line]
    raise "#{self}.disable in #{info[:file]}:#{r_end} must be after #{self}.enable (line #{r_beg})" unless r_end >= r_beg
    r = Range.new(r_beg, r_end)

    # replace the single initial line with the range, making sure it's unique
    info[:ranges].pop
    info[:ranges] << r unless info[:ranges].include? r
  end

  # Check whether a refined method is called from an enabled range
  def enabled?
    info = ranges_info
    info[:ranges].each do |r|
      case r
      when Range
        return true if r.include?(info[:line])
      when Integer
        return true if info[:line] >= r
      end
    end
    false
  end

  private

  # Stores enabled line ranges of caller files for this module
  def enabled_ranges
    @enabled_ranges ||= {}
  end

  # Get the caller info in a structured way (hash)
  def caller_info
    # ignore internal calls (using skip would differ from 1.8.7 to 1.9.3)
    c = caller.find { |s| !s.start_with?(__FILE__, '(eval)') } and
        m = c.match(/^([^:]+):(\d+)(:in `(.*)')?$/) and
        {:file => m[1], :line => m[2].to_i, :method => m[4]} or {}
  end

  # Get line ranges info for the caller file
  def ranges_info
    ci = caller_info
    ranges = enabled_ranges[ci[:file]] ||= []
    ci[:ranges] = ranges

    # check whether there is an opened range in progress for the caller file
    last = ranges[-1]
    if last.is_a? Integer
      ci[:last] = last
      ci[:open] = true
    end

    ci
  end

  # Here the original methods will be replaced with one which checks
  # whether the method is called from an enabled or disabled region,
  # and then decide which method to call.
  def refine klass, &blk
    modname = to_s
    mdl = Module.new &blk

    klass.class_eval do

      # Rename the klass's original (affected) methods
      mdl.instance_methods.each do |m|
        if method_defined? m
          alias_method "_#{m}_changed_by_#{modname}", m
          remove_method m
        end
      end

      # Include the refined methods
      include mdl
    end

    # Rename the refined methods and replace them with
    # a method which will check what method to call.
    mdl.instance_methods.each do |m|
      klass.class_eval <<-STR
        alias_method :_#{modname}_#{m}, :#{m}

        def #{m}(*args, &b)
          if #{modname}.enabled?
            _#{modname}_#{m}(*args, &b)
          else
            begin
              _#{m}_changed_by_#{modname}(*args, &b)
            rescue NoMethodError
              raise NoMethodError.new("Undefined method `#{m}' for #{klass}")
            end
          end
        end
      STR
    end
  end
end
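
A side note on caller_info above: it parses the strings returned by caller with a regexp so that it also works on 1.8.7. On Ruby 2.0 and later, caller_locations returns structured objects instead. The sketch below is only an illustration of that alternative (call_site is a made-up name, and unlike caller_info it does not skip internal or eval frames):

```ruby
# Hypothetical helper: file, line and method label of the code that called it.
def call_site
  loc = caller_locations(1, 1).first
  { :file => loc.path, :line => loc.lineno, :method => loc.label }
end

here = call_site; expected_line = __LINE__

p here[:line] == expected_line   #=> true
p here[:file].class              #=> String
```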

Here are some examples:

#!/usr/bin/ruby

require "./refine"

class A
  def a
    'a'
  end
  def b
    a + 'b'
  end
end

module A2
  refine A do
    def a
      a + '2'     # A#a, since here A2 is disabled
    end

    A2.enable     # You must make A2 explicit here
    def d
      a + 'd'     # A2#a, since here A2 is enabled
    end
    A2.disable
  end

  refine String do
    def length
      length + 1  # Original String#length, since A2 is disabled here
    end
  end
end

a = A.new
str = 'abc'

puts a.a        # a
puts a.b        # ab
puts str.length # 3

A2.enable

class A
  def c
    a + 'c'   # a2c, as A2 is enabled
  end
end

puts ''
puts a.a      # a2
puts a.b      # ab (b was not refined nor defined where A2 is enabled)
puts a.c      # a2c
puts a.d      # a2d
puts str.length

A2.disable

puts ''
puts a.a      # a
puts a.b      # ab
puts a.c      # a2c (it was defined where A2 is enabled)
# puts a.d      # NoMethodError, since A2 is disabled
puts str.length

# In-method enabling test

def x(y)
  A2.enable
  puts y.a    # a2
  A2.disable
end

x(a)          # a2
x(a)          # enabling multiple times at same line with no error

# Lazy enabling test

def e
  A2.enable
end

def z(y)
  puts y.a    # defined between enable and disable, but affected only after running e() and d()
end

def d
  A2.disable
end

z(a)          # a
e             # now, activating the range for refinements
d
z(a)          # a2

def e2
  A2.enable
end

e2            # running before d(), but...
d             # error, as you are enabling *after* the disable (physically in the text file)

Differences from the original proposal:

enable and disable instead of using;
calls to refined methods only work if they're within the enabled range in the file; so subclasses won't be affected unless their code is in an enabled range;
super doesn't work for calling the original methods, but you can call them by name from a non-enabled range, or by calling the renamed methods (see the code for refine).
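
For comparison, here is a sketch of the same kind of scoping written with the refine/using syntax as it later shipped in Ruby (it needs Ruby 2.1 or later for using inside a module body; StrUtils, shout and Demo are names made up for the example):

```ruby
module StrUtils
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Demo
  using StrUtils            # active from here to the end of this module body

  def self.run
    "abc".shout             # the refinement is visible here
  end
end

p Demo.run                       #=> "ABC!"
p "abc".respond_to?(:shout)      #=> false (not visible outside Demo)
```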

I think this solution is good enough for me, and I guess it won't have the evil side of refinements which was very well discussed in this post (and I agree).

I'm open to discussing errors, consequences and improvements to my code; feel free. ;-)

Monday, July 23, 2012

Problems with permission in Apache

If you're getting Error 403 Forbidden on your site, and the owner and filesystem permissions of your files are correctly set, check your Apache configuration files to see whether the DirectoryIndex directive includes the file you want as index, and whether the Directory directive for your project's path allows the requests.

Saturday, July 14, 2012

RVM + Apache + CGI Scripts

Hello!

I've configured a new server on Ubuntu 12.04 and started to use RVM, an excellent version manager which lets you have multiple versions of Ruby installed on a single server (and many versions of the gems - see gemsets), and makes it easy to switch among them.

I've installed RVM under my user (as myself, not as root with sudo) by following Ryan Bigg's guide, with no Ruby previously installed system-wide. So, I didn't have any Ruby under /usr/bin. My first task then was to replace the shebang line of all my CGI scripts, from

#!/usr/bin/ruby

to

#!/usr/bin/env ruby
# encoding: utf-8

(The second line is needed to define the encoding of the string literals in my code, for Ruby 1.9.)

However, my scripts didn't run under Apache. In the terminal I could run them (by typing ./index.cgi, for example), but not through a browser. A relevant note: the user is the same in both cases, i.e., the Apache user is the same as the one logged in on the terminal. Through PHP tests, I verified that the RVM environment was not loaded under Apache. (If anyone can solve that, please let me know.)

I saw this tip for running CGI scripts with RVM, which suggests putting the complete path of a specific version of Ruby in the shebang line. That can be useful if you have scripts which run on different versions of Ruby. But that solution doesn't work for me, because my scripts must run on different machines, with different users, different Ruby versions and different paths.

The solution which works for me is to put a symlink of the desired Ruby version under /usr/bin:

sudo ln -s /home/sony/.rvm/rubies/ruby-1.8.7-p370/bin/ruby /usr/bin/ruby

(Notes: sony is my username and I chose 1.8.7 because by now my scripts aren't 1.9-compliant yet.)

So I didn't need to change the shebang lines after all. :) But I guess that change will be useful in the future.

Tuesday, May 22, 2012

How to UPDATE a table using data from another row

The Problem: I have a table with name, num, and diff. The unusual case here is that the diff must be updated with the difference between two sibling nums for the same name. So I'll need to know the num of the next row (for the same name) to update the current one. AND I want to do that in SQL with a single UPDATE.

Here I'll assume the rows must be sorted by num, but they could be sorted by another field, like id or timestamp.

The solution in MySQL is to use an inline temporary table to get the num of the next row and associate it with the current id, and use that table in the UPDATE statement.

Here is the code. Enjoy!

-- the table used for test (MySQL syntax)
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) NOT NULL,
  `num` int(11) NOT NULL,
  `diff` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=9 ;

-- some values to play on
INSERT INTO `test` (`id`, `name`, `num`, `diff`) VALUES
(1, 'a', 10, NULL),
(2, 'b', 8, NULL),
(3, 'a', 18, NULL),
(4, 'a', 21, NULL),
(5, 'b', 14, NULL),
(6, 'a', 32, NULL),
(7, 'b', 20, NULL),
(8, 'b', 21, NULL);

-- a select to test the ability to retrieve the desired diff value
select id, name, num, (select min(num) from test where name = t1.name and num > t1.num)-num
from test t1
where diff is null
order by name, num;

-- updating test with the calculated diff using another row on same table.
update test t2, (
  select id, (select min(num) from test where name = t1.name and num > t1.num)-num as diff
  from test t1
) t3
set t2.diff = t3.diff
where t2.id = t3.id
and t2.diff is null;

-- an alternative query to get more than one column from t2

select t1.id, t1.name, t1.num, t2.name, t2.num
from test t1, test t2
where t2.id = (select id from test where name = t1.name and num > t1.num order by num limit 1)
order by t1.name, t1.num;
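
To cross-check what the UPDATE should store for the sample rows, the same rule (the smallest larger num with the same name, minus the current num) can be sketched in plain Ruby. This is only an in-memory check with the data copied from the INSERT above; it doesn't touch MySQL.

```ruby
rows = [
  { :id => 1, :name => "a", :num => 10 },
  { :id => 2, :name => "b", :num => 8  },
  { :id => 3, :name => "a", :num => 18 },
  { :id => 4, :name => "a", :num => 21 },
  { :id => 5, :name => "b", :num => 14 },
  { :id => 6, :name => "a", :num => 32 },
  { :id => 7, :name => "b", :num => 20 },
  { :id => 8, :name => "b", :num => 21 }
]

diffs = {}
rows.each do |row|
  # Same condition as the subselect: same name, strictly larger num.
  later    = rows.select { |r| r[:name] == row[:name] && r[:num] > row[:num] }
  next_num = later.map { |r| r[:num] }.min
  # Like min(num) - num in SQL, the result is nil (NULL) when there is no next row.
  diffs[row[:id]] = next_num && next_num - row[:num]
end

p diffs
#=> {1=>8, 2=>6, 3=>3, 4=>11, 5=>6, 6=>nil, 7=>1, 8=>nil}
```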

See this post in Portuguese.

Tuesday, March 13, 2012

Capitalized method names

You can define methods with capitalized names:

#!/usr/bin/ruby

def Foo
  puts 'foo'
end

def Bar x
  puts x
end

At call time, to keep Ruby from interpreting them as constants, you must make clear that you are using them as functions, by using parentheses or parameters:

Foo()      #=> 'foo'
Bar 3      #=> 3
Foo        #=> NameError: uninitialized constant Foo

Sequel uses this feature to define methods named String, Integer, etc. for creating/altering tables.
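
Ruby's core does the same thing with Kernel#Integer, Kernel#String and Kernel#Array, which live side by side with the classes of the same name. A short sketch (Celsius is a made-up method for illustration):

```ruby
p Integer("42")     #=> 42       (an argument: the method Kernel#Integer)
p Array(nil)        #=> []       (the method Kernel#Array)
p Integer           #=> Integer  (no arguments or parentheses: the constant)

# A user-defined capitalized method.
def Celsius(degrees)
  "#{degrees} C"
end

p Celsius(21)       #=> "21 C"
```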