Select and Map Are Good

Jun 25, 2014

This article argues that, whenever you can, you should break iteration over an array down into #map and #select rather than doing all of the work inside a single #each.

The Examples

Throughout this article I will refer to the following contrived example:

You have an array of numbers [1, 2, 3, 4, 5] and you want to subtract 3 from each of the items and then remove all items that are 0.

A) Using #each you could express this as:

the_array = [1, 2, 3, 4, 5]
new_array = []

the_array.each do |item|
  new_item = item - 3
  if new_item != 0
    new_array << new_item
  end
end

B) Using #select and #map you could express this as:

the_array = [1, 2, 3, 4, 5]

new_array = the_array.map { |item| item - 3 }

new_array.select! { |item| item != 0 }

The Arguments

Better Separation of Logic

Example A is doing two things in one block whereas example B is doing just one thing in each of the two blocks. In general, the less there is in a block the easier that block is to understand. Breaking a problem down into map and select means that you have broken the problem up into two distinct parts.

Clarity

This brings us to what those two parts do: they are actually named. If I am reading example B and I am trying to find the part where items are removed, then I look in the select block. If I am looking for the part where the items are changed, then I look in the map block. The method names tell you what each block is for.

When reading example A, I have to read all of the #each block, whether I am looking for where the values are changed or for where the items are removed.
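One way to push the naming idea a little further is to give each block its own name. The following is only a sketch: the constants SUBTRACT_THREE and NON_ZERO and the method subtract_and_filter are hypothetical names made up for this illustration, and it uses the non-mutating #select instead of the #select! from example B.

```ruby
# Hypothetical names, introduced only for this illustration.
SUBTRACT_THREE = ->(item) { item - 3 }  # the "change" step
NON_ZERO       = ->(item) { item != 0 } # the "keep or remove" step

def subtract_and_filter(the_array)
  # Each step is named twice: once by the method (#map / #select)
  # and once by the lambda that is passed to it as a block.
  the_array.map(&SUBTRACT_THREE).select(&NON_ZERO)
end

p subtract_and_filter([1, 2, 3, 4, 5]) # prints [-2, -1, 1, 2]
```

Because each lambda does one thing, each can also be read or tested in isolation, which is harder to do with the body of the #each block in example A.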

Potential Counter Arguments

I think some people may argue that speed is a big issue. The idea is that you are iterating over the enumerable twice, so it takes more time.

Okay, let us assume that a is the time to compute item - 3 and store the result (in a variable or in the array), that b is the time to check new_item != 0 (or item != 0) and add the item to the array, and that c is the time to set up each iteration over the array. We will also assume that i is the number of iterations needed to traverse the array.

So example A will take i(a + b + c) time and example B will take i(a + c) + i(b + c). The difference between the two, B - A, is i(a + b) + 2ic - (i(a + b) + ic) = ic. In other words, the difference is the time it takes to set up the extra iterations.
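The ic term can be made visible by counting how many times each approach yields to a block. This is a rough sketch rather than a measurement: the two counting methods below are hypothetical helpers written for this illustration, and they only count iterations, not time.

```ruby
# Hypothetical helper: counts block calls for the #each version (example A).
def count_block_calls_with_each(the_array)
  calls = 0
  new_array = []
  the_array.each do |item|
    calls += 1
    new_item = item - 3
    new_array << new_item if new_item != 0
  end
  calls
end

# Hypothetical helper: counts block calls for the #map and #select! version (example B).
def count_block_calls_with_map_and_select(the_array)
  calls = 0
  new_array = the_array.map do |item|
    calls += 1
    item - 3
  end
  new_array.select! do |item|
    calls += 1
    item != 0
  end
  calls
end

puts count_block_calls_with_each([1, 2, 3, 4, 5])           # prints 5
puts count_block_calls_with_map_and_select([1, 2, 3, 4, 5]) # prints 10
```

Example B sets up i extra iterations (5 here), which is where the ic in the formula comes from. The work done inside the blocks, a and b, is the same in both versions.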

Let us check the actual difference in time with a much bigger array:

require 'benchmark'

def with_each(the_array)
  new_array = []

  the_array.each do |item|
    new_item = item - 3
    if new_item != 0
      new_array << new_item
    end
  end

  new_array
end

def with_map_and_select(the_array)
  new_array = the_array.map { |item| item - 3 }

  new_array.select! { |item| item != 0 }

  # select! returns nil when nothing was removed, so return the array itself
  new_array
end

the_array = (-10000000..10000000).to_a

Benchmark.bmbm do |x|
  x.report("With #each") { with_each(the_array) }
  x.report("With #map and #select") { with_map_and_select(the_array) }
end

The output on my computer is:

                            user     system      total        real
With #each              1.830000   0.040000   1.870000 (  1.924606)
With #map and #select   2.440000   0.030000   2.470000 (  2.468480)

So example B takes roughly 30% longer than example A (2.47s versus 1.87s in total), and this is a really simple example. As a percentage, the difference will fall as the operations inside the blocks become more complex.
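Under the simple cost model from earlier, the extra time of example B relative to example A is (B - A) / A = ic / (i(a + b + c)) = c / (a + b + c). The sketch below plugs in some made-up values to show how that ratio shrinks as a and b grow. The method name relative_overhead and the numbers passed to it are assumptions for illustration only, and the model ignores costs such as allocating the intermediate array, so it will not reproduce the benchmark figures.

```ruby
# Hypothetical helper: relative overhead of example B over example A
# under the cost model c / (a + b + c). The units are arbitrary.
def relative_overhead(a, b, c)
  Rational(c, a + b + c)
end

# Cheap blocks: the iteration setup is a large share of the total work.
puts relative_overhead(1, 1, 1).to_f.round(2)   # prints 0.33

# Heavier blocks: the same setup cost is a much smaller share.
puts relative_overhead(10, 10, 1).to_f.round(2) # prints 0.05
```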

So the time difference does exist, but I don't think it is going to be a huge factor in most cases.