Ruby is my language of choice for general-purpose programming because of its expressive power and clarity. Ruby's dynamic, generic object structure makes it a good match for BSON and MongoDB, as is also true of Python, Perl, JavaScript, etc. But the following questions are raised.
These rounded numbers express the relative cost of operations and calls in CPU ticks. Please see the graphs below for the actual measured numbers. The numbers in the following table were measured on an Intel Sandy Bridge CPU (Mac mini, mid 2011).
Operation | C++ Measured-Rounded | Ruby 2.0.0 Measured-Rounded | JRuby 1.7.3 Measured-Rounded |
---|---|---|---|
branch / loop | 0 | 110 | 100 |
intXor | 1 | 80 | 50 |
function call | 1 | 70 | 50 |
stack object | 2 | | |
fwrite | 50 | 610 | 390 |
fread | 50 | 4,300 | 460 |
malloc + free | 270 | | |
new heap object | 300 | 450 | 100 |
write | 1,500 | 4,600 | 2,900 |
read | 1,500 | 11,000 | 2,700 |
recv / send | 5,000 | | |
fork + wait | 200,000 | | |
system | 450,000 | | |
map/hash word count | 330 | 880 | 540 |
map/hash int to string | 1,400 | 1,100 | 410 |
Here's the link to C/C++ Call/Operation Cost in CPU Ticks measurements, analysis, and notes.
Note: C++ STL map is traditionally an ordered tree-based map, while hash_map and unordered_map are hash-based maps. In these tests, I used map because it has long been part of the official STL, whereas hash_map is not official and unordered_map is more recent than G++ 4.2. Also, hash_map requires the programmer to supply an additional include, a namespace, and a hash function, and the simple hash function using string::c_str() is not efficient.
Object allocation was measured for various expressions with the following procedure: disable the garbage collector (GC), collect GC stats, run the test expression repeatedly, enable and run the GC, collect GC stats again, and calculate the difference to get the number of objects allocated per iteration of the expression.
Object Class | Expression | Objects Allocated |
---|---|---|
FalseClass | false \|\| false | 0 |
Fixnum | 1 + 2 | 0 |
Float | 1.0 + 2.0 | 0 |
NilClass | nil | 0 |
Symbol | :my_symbol | 0 |
TrueClass | true && true | 0 |
Array | Array.new | 1 |
Bignum | 1 << 64 | 1 |
Class | Class.new | 2 |
Exception | Exception.new | 1 |
Hash | Hash.new | 1 |
Method | method(:exit) | 1 |
Module | Module.new | 1 |
Numeric | Numeric.new | 1 |
Object | Object.new | 1 |
Proc | Proc.new{} | 2 |
Range | Range.new(1,3) | 1 |
Regexp | Regexp.new(".") | 4 |
String | String.new | 1 |
Struct | Struct.new(:x) | 3 |
Thread | Thread.new{} | 7 |
Time | Time.new | 1 |
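The measurement procedure described before the table can be approximated with a short Ruby sketch. This is not the original test harness: the helper name `objects_per_iteration` is hypothetical, and it reads the `:total_allocated_objects` counter from `GC.stat`, which is the key name in Ruby 2.2 and later (older versions used a different key), rather than diffing full GC stats around a collection.

```ruby
# Hypothetical helper, not the original harness: estimates how many
# objects a block allocates per call, with the GC disabled while measuring.
def objects_per_iteration(iterations = 1_000)
  GC.start                                      # begin from a freshly collected heap
  GC.disable                                    # no collection during the measurement
  before = GC.stat(:total_allocated_objects)    # key name assumes Ruby 2.2+
  iterations.times { yield }
  after = GC.stat(:total_allocated_objects)
  (after - before).fdiv(iterations)             # average objects per iteration
ensure
  GC.enable                                     # always turn the GC back on
end

puts objects_per_iteration { Object.new }
puts objects_per_iteration { 1 + 2 }
puts objects_per_iteration { :my_symbol }
```

Counts on a current Ruby can differ from the Ruby 2.0.0 numbers in the table, since allocation behavior has changed between versions.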
In Ruby, all variables are references, i.e., pointers to objects. Because pointers are word aligned, their low bits are always zero, so Ruby uses those bits to store some small built-in types directly in the Ruby VALUE pointer as immediate objects, namely Fixnum, Symbol, true, false, and nil. Therefore, the expressions above with zero objects allocated indicate an immediate object. A least-significant bit set to one denotes a 63-bit integer on a 64-bit architecture. Other low-order bit values specify the encoding of the other immediate objects. On 64-bit architectures, some floats that do not require the full 64 bits can also be stored as immediate objects. Symbol expressions have no allocation because there is only one global instance of each symbol (which is not subject to GC).
MRI and YARV Ruby reduce malloc overhead by allocating a large number of object slots in a single malloc. Object slots are 40 bytes on 64-bit architectures, which allows small objects to be self-contained in a slot. From the measurements (100 ticks for a new heap object in JRuby versus 450 in Ruby 2.0.0), it is clear that JRuby optimizes allocation.
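The tag bit on small integers can be observed from Ruby itself. The sketch below assumes MRI, where `object_id` of a small integer happens to expose the tagged VALUE; this is an implementation detail rather than a documented guarantee, and `tagged_value` is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical helper: the VALUE encoding MRI uses for a small integer n,
# i.e. n shifted left by one with the least-significant bit set to one.
def tagged_value(n)
  (n << 1) | 1
end

[0, 1, 2, -1, 1000].each do |n|
  puts "#{n}: object_id=#{n.object_id} tagged=#{tagged_value(n)}"
end

# Immediate objects with the same value are the same object,
# while separately created heap objects are not.
puts 1.equal?(1)
puts :my_symbol.equal?(:my_symbol)
puts String.new("abc").equal?(String.new("abc"))
```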
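One way to look at slot usage on MRI is `ObjectSpace.memsize_of` from the `objspace` standard library. The sketch below assumes MRI; the reported sizes are approximate by the library's own description and vary by Ruby version, and `heap_bytes` is a hypothetical wrapper name.

```ruby
require 'objspace'

# Hypothetical wrapper: approximate bytes consumed by an object.
# Immediate objects occupy no slot; large objects add malloc'ed memory
# on top of their slot.
def heap_bytes(obj)
  ObjectSpace.memsize_of(obj)
end

puts heap_bytes(Object.new)     # a small object that fits in one slot
puts heap_bytes(1)              # an immediate object
puts heap_bytes("x" * 1000)     # a string too large to fit in a slot
```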
VALUE as an Immediate Object - Extending Ruby - Programming Ruby The Pragmatic Programmer's Guide
MRI Memory Allocation, A Primer For Developers
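map/hash word count, C++:

    map<string, int> m;
    for (size_t i = 0; i < iterations; i++) {
        stringstream ss(STRING_1024);
        string word;
        while (ss >> word) {
            m[word] += 1;  // count each whitespace-separated word
        }
    }

map/hash word count, Ruby:

    h = Hash.new(0)
    iterations.times do
      STRING_1024.split.each do |w|
        h[w] += 1
      end
    end

map/hash int to string, C++:

    for (size_t i = 0; i < iterations; i++) {
        map<int, string> m;
        stringstream ss;
        for (int j = 0; j < VECTOR_SIZE; j++) {
            ss.seekg(0);  // rewinds only the read position; writes still append
            ss << j;
            m[j] = ss.str();
        }
    }

map/hash int to string, Ruby:

    iterations.times do
      h = Hash.new
      VECTOR_SIZE.times do |j|
        h[j] = j.to_s
      end
    end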
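For readers who want to time loops like these themselves, a rough harness is sketched below. It is not the original measurement code: `ticks_per_iteration` is a hypothetical helper, the 2.5 GHz default clock rate is an assumed placeholder rather than the rate of the test machine, and the sample text stands in for `STRING_1024`, whose contents are not shown here.

```ruby
require 'benchmark'

# Hypothetical helper: converts wall-clock time per iteration into an
# estimated CPU tick count, given an assumed clock rate in Hz.
def ticks_per_iteration(iterations, hz = 2.5e9)
  seconds = Benchmark.realtime { iterations.times { yield } }
  seconds * hz / iterations
end

# Stand-in for STRING_1024: 1024 characters of repeated filler words.
text = ("lorem ipsum dolor sit amet " * 40)[0, 1024]

h = Hash.new(0)
ticks = ticks_per_iteration(1_000) do
  text.split.each { |w| h[w] += 1 }
end
puts format("word count: about %d ticks per iteration", ticks)
```

Wall-clock timing includes scheduler and GC noise, so the result is only an estimate, and it covers one pass over the whole string, which may not be the unit used in the table.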
Sandy Bridge - Instruction Decode and uop Cache - for some clue about sub-CPU-tick measurements due to micro-ops.