Data Structures as Functions (or, Implementing Set#to_proc and Hash#to_proc in Ruby)

By ,
on

View the full presentation on the Skills Matter web site At the March 2015 meeting of the London Ruby User Group, I delivered a presentation based on this blog post with some supplemental material. I’ve since tweaked some of the examples to closer match the ones in the final talk but the content should largely be the same.

Reading Henrik Nyh’s “Array#to_proc for hash access” made me think about a similar concept in Clojure: data structures being used as functions.

Functions

First of all, let’s be clear what we mean by “functions”. The definition of a function in mathematics is as follows:

A function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.

So a function f can be considered as a way of mapping from some input (its “domain”) to some output (its “codomain”), e.g.

f(1)
# => 2

Crucially, whenever the function is called with a specific input, it should always produce the same output. This means f in the following example would not be a function because it produces two different outputs given the same input:

f(1)
# => 2

f(1)
# => 3

Whole programming languages are built out of this concept (it’s called functional programming for a reason) but they also appear in languages that aren’t purely functional, such as Ruby.

A common place you might find them is when using methods from Enumerable such as all? and map:

(1..10).all? { |x| x > 4 }
# => false

(1..10).map { |x| x * 2 }
# => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

In both of the above cases, the blocks passed to all? and map are actually simple functions.

proc { |x| x > 4 }

No matter how many times you call this function with the number 0, it will always return false.

proc { |x| x * 2 }

Similarly, no matter how many times you call this function with the number 1, it will always return 2.

With this in mind, let’s take a look at functions in Clojure.

Using sets as functions

In Clojure (as in many languages, including Ruby), there are data structures for sets and maps (not to be confused with the function of the same name but something akin to a Hash in Ruby or a dictionary in Python) but, unlike other languages, these structures can be used in places where you would expect a function.

For example, Clojure’s filter takes two arguments: a function pred and a collection coll. As you might expect from its name, it goes through coll and only returns elements that return true when fed into pred. We can use this to filter even numbers from a range using even?:

(filter even? (range 1 30))
;; => (2 4 6 8 10 12 14 16 18 20 22 24 26 28)

Of course, even? can be used by itself too:

(even? 5)
;; => false

Nothing too surprising there but what if we used a set instead of an existing function?

Let’s define a set of numbers we like:

(def perfect-numbers #{6 28 496})

Now let’s filter our range again but using the set instead of even?:

(filter perfect-numbers (range 1 30))
;; => (6 28)

To understand what happened here, let’s just try using the set as a function as we did even?:

(perfect-numbers 1)
;; => nil

(perfect-numbers 6)
;; => 6

It turns out that sets can be used as functions of their members: when given some input, they will return that input if (and only if) it is a member of the set; if it isn’t, it returns nil. This behaviour makes for a nifty shortcut when testing whether a value is in some whitelist:

(def valid-attributes #{:name :age})

(defn foo [attribute]
  (when (valid-attributes attribute)
    ;; insert money-making business logic here
    (println "PROFIT")))

The way this actually works in Clojure is that sets implement the IFn interface meaning that they have an invoke method that can be used to call them as if they were ordinary functions. There’s a very similar __invoke magic method in PHP that allows objects to be called as functions:

<?php
class Monkey
{
    public function __invoke($name)
    {
        return "Ook, {$name}!";
    }
}

$monkey = new Monkey;
$monkey('Bob');
// => 'Ook, Bob!'

We can achieve similar behaviour in Ruby by implementing Set#to_proc (in the same way Henrik implemented to_proc on Array):

class Set
  def to_proc
    proc { |element| element if member?(element) }
  end
end

This means we can do the same sort of filtering using select:

require "set"

perfect_numbers = Set[6, 28, 496]

(1..30).select(&perfect_numbers)
# => [6, 28]

If you’re unfamiliar with the syntax above, let’s take a quick detour (if this is old news to you, feel free to skip ahead).

What’s with the ampersand?

In Ruby, there are methods (such as select above) that take blocks. A good example of this is map which evaluates a given block for each item in a collection and returns a new collection with the results:

(1..30).map { |x| x.to_s }
# => ["1", "2", "3", ..., "30"]

Wherever you can pass a block in Ruby, you can also pass a Proc using an ampersand like so:

stringify = proc { |x| x.to_s }

(1..30).map(&stringify)
# => ["1", "2", "3", ..., "30"]

Note that stringify is already a Proc here but if it wasn’t, Ruby would internally call to_proc on it in an attempt to coerce it to the right type.

As mapping over a collection and calling a method on each element is quite a common task and armed with the knowledge of Ruby’s internal use of to_proc, a technique emerged of using symbols as a shortcut:

(1..30).map(&:to_s)
# => ["1", "2", "3", ..., "30"]

This works by implementing Symbol#to_proc with something like so:

class Symbol
  def to_proc
    proc { |x| x.send(self) }
  end
end

(The real implementation actually features a cache to reduce some of the performance overhead of this approach.)

You can think of this as expanding like so:

(1..30).map(&:to_s)
(1..30).map(&:to_s.to_proc)
(1..30).map(&proc { |x| x.send(:to_s) })
(1..30).map { |x| x.send(:to_s) }
(1..30).map { |x| x.to_s }

Calling Procs

With to_proc implemented on sets, can we use them as we did in Clojure?

perfect_numbers.call(1)
# NoMethodError: undefined method `call' for #<Set: {6, 28, 496}>

Sadly not, as sets themselves are not Procs. Perhaps we could use the ampersand trick?

&perfect_numbers.call(1)
# SyntaxError: unexpected &, expecting end-of-input
# &perfect_numbers.call(1)
#  ^

No luck there either. In fact, we’d have to do the following:

perfect_numbers.to_proc.call(1)
# => nil

This is where our efforts diverge from Clojure: simply implementing to_proc doesn’t mean that sets are now functions. You could maybe make them more Proc-like with something like the following but there is no equivalent to Clojure’s IFn or PHP’s __invoke in Ruby:

class Object
  def call(*args)
    to_proc.call(*args)
  end
end

perfect_numbers.call(3)
# => 6

Using hashes as functions

A slightly more interesting case is that of maps in Clojure:

({:a 2 :b 3} :a)
;; => 2

({:a 2 :b 3} :c)
;; => nil

Like the behaviour of sets, maps are functions of their keys: given the input of the key, they return the corresponding value (or nil if there is no such entry). This fits quite nicely with our original mathematical functions: the keys being the domain and the values being the codomain.

As for how this might be useful, Jay Fields wrote up an interesting use case when comparing and filtering two maps. It also gives us an alternate way to access hash keys as in Henrik’s original post.

First, let’s implement Hash#to_proc to mirror Clojure’s behaviour:

class Hash
  def to_proc
    method(:[]).to_proc
  end
end

Here we simply take Hash’s existing [] method for accessing values and convert it to a Proc. Now we can use it like so:

%w(a b c d).map(&{"a" => 1, "b" => 2, "c" => 3})
# => [1, 2, 3, nil]

Or, to use a more realistic example:

person = { name: "Robert Paulson", age: 43 }

name, age = [:name, :age].map(&person)
# => ["Robert Paulson", 43]

In this second example, it’s almost the reverse of Henrik’s Array#to_proc behaviour where the hash is passed into a function created from the keys:

[{ name: "A" }, { name: "B" }].map(&[:name])
# => ["A", "B"]

In our case, it is the key that is passed into a function created from the hash:

[:name].map(&{ name: "A" })
# => "A"

Other data structures as functions

Maps and sets are not the only data structures to implement the IFn interface in Clojure.

There are vectors:

([1 2 3] 0)
;; => 1

Which would be equivalent to the following Array#to_proc:

class Array
  def to_proc
    method(:[]).to_proc
  end
end

(1..3).map(&["A", "B", "C", "D", "E"])
# => ["B", "C", "D"]

There are also keywords which are similar to Ruby’s symbols:

(:name {:name "Robert Paulson", :age 42})
;; => "Robert Paulson"

To implement this in Ruby, we’d have to override the now standard Symbol#to_proc like so:

class Symbol
  def to_proc
    proc { |coll| coll[self] }
  end
end

[{ name: "Robert Paulson", age: 42 }].map(&:name)
# => ["Robert Paulson"]

Of course, this would mean you can no longer use it for method access.

In conclusion, we shouldn’t be afraid to experiment with concepts from other languages. After all, as Alan Perlis said:

A language that doesn’t affect the way you think about programming is not worth knowing.

Further reading