Gary J. Murakami, Ph.D.
Lead Engineer & Ruby Evangelist
Tuesday, September 25, 2012
$ brew update
$ brew install mongodb
$ mongod
$ gem install mongo
$ gem install bson_ext
Gemfile
gem 'mongo'
gem 'bson_ext'
IDE includes support for Mongoid and MongoDB
require 'mongo'
require 'httpclient'
require 'json'
screen_name = ARGV[0] || 'garymurakami'
friends_ids_uri = "https://api.twitter.com/1/friends/ids.json?cursor=-1&screen_name=#{screen_name}"
friend_ids = JSON.parse(HTTPClient.get(friends_ids_uri).body)['ids']
a few lines of Ruby get and parse JSON data from the web (the Twitter API)
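The parsing step can be tried without network access. The sketch below assumes only what the code above relies on, namely that the response body is a JSON object with an `ids` array. The sample body, its id values, and the helper name `parse_friend_ids` are hypothetical.

```ruby
require 'json'

# Hypothetical sample of a friends/ids response body; the id values are made up.
SAMPLE_FRIENDS_IDS_BODY = '{"ids":[14763,56289,91002]}'

# Extract the array of friend ids from a response body, or [] when the key is absent.
def parse_friend_ids(body)
  JSON.parse(body).fetch('ids', [])
end

puts parse_friend_ids(SAMPLE_FRIENDS_IDS_BODY).inspect
```

Using `fetch` with a default keeps a later `each_slice` call from failing on `nil` when the key is missing.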
connection = Mongo::Connection.new
db = connection['twitter']
collection = db['users']
friend_ids.each_slice(100) do |ids| # best practices
  users_lookup_uri = "https://api.twitter.com/1/users/lookup.json?user_id=#{ids.join(',')}"
  response = HTTPClient.get(users_lookup_uri)
  docs = JSON.parse(response.body) # high-level objects - Array of Hashes
  docs.each{|doc| doc['_id'] = doc['id']} # user supplied _id
  collection.insert(docs, :safe => true) # no schema! - bulk insert - best practices
end
puts "users:#{collection.count}"
a small number of lines load JSON data from the web into a collection
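The two data-shaping steps in the loader, batching ids and supplying `_id`, can be sketched on plain Ruby data with no database or network. The helper names `lookup_batches` and `with_supplied_ids` are hypothetical. The batch size of 100 mirrors the slice size used above.

```ruby
# Split ids into request-sized batches (100 per batch, as in the loader above).
def lookup_batches(friend_ids, batch_size = 100)
  friend_ids.each_slice(batch_size).to_a
end

# Copy each document's 'id' into '_id' so the stored key matches the source id.
# Returns new Hashes and leaves the input documents untouched.
def with_supplied_ids(docs)
  docs.map { |doc| doc.merge('_id' => doc['id']) }
end

puts lookup_batches((1..250).to_a).map(&:size).inspect
puts with_supplied_ids([{ 'id' => 7, 'screen_name' => 'example' }]).inspect
```

Supplying `_id` from the source id means a repeated load tries to reuse the same key instead of creating a second copy of the same user.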
users = Mongo::Connection.new['twitter']['users']
users.find({}, :sort => { followers_count: -1 }, :limit => 10).each do |doc|
  puts doc.values_at('followers_count', 'screen_name').join("\t")
end
a simple query with sort and limit
1409213	Dropbox
25092	MongoDB
9074	oscon
8148	railsconf
7302	spf13
4374	10gen
2259	kchodorow
1925	rit
1295	dmerr
1148	aaronheckmann
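For intuition, the same sort-and-limit can be expressed over an in-memory Array of Hashes. This is only an analogy for what the query asks the server to do, not how the driver or server implements it. The helper name `top_by` is hypothetical, and the sample rows are taken from the output above.

```ruby
# Sample documents taken from the first rows of the output above, deliberately unsorted.
SAMPLE_USERS = [
  { 'screen_name' => 'MongoDB', 'followers_count' => 25092 },
  { 'screen_name' => 'oscon',   'followers_count' => 9074 },
  { 'screen_name' => 'Dropbox', 'followers_count' => 1409213 }
].freeze

# In-memory analogue of find({}, :sort => { field => -1 }, :limit => limit).
def top_by(docs, field, limit)
  docs.sort_by { |doc| -doc[field] }.first(limit)
end

top_by(SAMPLE_USERS, 'followers_count', 2).each do |doc|
  puts doc.values_at('followers_count', 'screen_name').join("\t")
end
```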
db = Mongo::Connection.new['twitter']
users = db['users']
tweets = db['tweets']
tweets.ensure_index('user.id') # dot notation to specify subfield
users.find({}, {fields: {id: true, screen_name: true, since_id: true}}).each do |user|
  twitter_user_timeline_uri = "https://api.twitter.com/1/statuses/user_timeline.json?user_id=#{user['id']}&count=200&include_rts=true&contributor_details=true"
  twitter_user_timeline_uri += "&since_id=#{user['since_id']}" if user['since_id'] # '&' separates query parameters
  response = HTTPClient.get(twitter_user_timeline_uri)
  docs = JSON.parse(response.body) # high-level objects
  docs.each{|doc| doc['_id'] = doc['id']} # user supplied _id
  tweets.insert(docs, :continue_on_error => true) # bulk insert
  # record the newest tweet id; skip the update when no tweets came back
  users.update({_id: user['_id']}, {'$set' => {'since_id' => docs.map{|doc| doc['id']}.max}}) unless docs.empty?
  puts tweets.count(:query => {'user.id' => user['id']})
end
puts "tweets:#{tweets.count}"
a small number of lines load and record tweets
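Building the timeline query string from a Hash avoids hand-placing the `&` separators between parameters. The sketch below uses the endpoint and parameters from the code above. The helper name `user_timeline_uri` and the constant name are hypothetical.

```ruby
require 'uri'

# Endpoint taken from the loader above.
TIMELINE_ENDPOINT = 'https://api.twitter.com/1/statuses/user_timeline.json'

# Build the timeline URI; since_id is appended only when one has been recorded.
def user_timeline_uri(user_id, since_id = nil)
  params = { user_id: user_id, count: 200, include_rts: true, contributor_details: true }
  params[:since_id] = since_id if since_id
  "#{TIMELINE_ENDPOINT}?#{URI.encode_www_form(params)}"
end

puts user_timeline_uri(42)
puts user_timeline_uri(42, 99)
```

`URI.encode_www_form` also percent-encodes values, which matters once a parameter contains characters such as spaces or commas.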
tweets = Mongo::Connection.new['twitter']['tweets']
screen_name = 'MongoDB'
tweets.find({'user.screen_name' => screen_name}, :sort => { retweet_count: -1 }, :limit => 10).each do |doc|
  puts doc.values_at('retweet_count', 'text').join("\t")
end
a simple query with a query selector, sort and limit
172	#MongoDB v2.2 released http://t.co/sN6Rzc7D
77	#mongoDB2dot2 officially released, w/ Advanced Aggregation Framework, Multi-Data Center Deployment + 600 new features http://t.co/DMSdWGwN
59	Announcing free online MongoDB classes. http://t.co/RIoAb7l7
30	RT @jmikola: Introducing mongoqp: a frontend for #MongoDB's query profiler https://t.co/ROCSzs6W
29	Sign up for free online MongoDB courses starting in October http://t.co/Im68Q4NM
24	1,000+ have signed up for free MongoDB training since the courses were announced today http://t.co/oINxfH1t
22	How Disney built a big data platform on a startup budget using MongoDB. http://t.co/8qW5yc7T
21	#mongoDB2dot2, available for download. http://t.co/kcGhUDhI
19	MongoDB Sharding Visualizer http://t.co/onIG08jv another product of #10genLabs
16	MongoDB Java Driver 2.9.0 released http://t.co/UDvZoAYV
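The selector `'user.screen_name'` uses dot notation to reach a field inside a nested document. A rough in-memory illustration follows, assuming documents are nested Hashes. The helper name `dig_path` is hypothetical, and real dot notation can also reach into arrays, which this sketch does not attempt.

```ruby
# Resolve a dotted path such as 'user.screen_name' against nested Hashes.
# Returns nil as soon as a step along the path is not a Hash.
def dig_path(doc, path)
  path.split('.').reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

tweet = { 'text' => 'hello', 'user' => { 'id' => 1, 'screen_name' => 'MongoDB' } }
puts dig_path(tweet, 'user.screen_name')
```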
{"BSON": ["awesome", 5.05, 1986]} →
"\x31\x00\x00\x00
\x04BSON\x00
&\x00\x00\x00
\x020\x00\x08\x00\x00\x00awesome\x00
\x011\x00333333\x14@
\x102\x00\xc2\x07\x00\x00
\x00
\x00"
fast - no parsing, no server-side data repackaging to disk
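The byte layout above can be reproduced by hand with `Array#pack`. This is a minimal sketch covering only the four types in the example (String, Float, 32-bit Integer, Array); it is not the encoder used by `bson_ext` or the driver. The method names are hypothetical.

```ruby
# Element names are NUL-terminated C strings.
def bson_cstring(name)
  name.b + "\x00".b
end

# Encode one element: a type byte, the element name, then the payload.
def bson_element(name, value)
  case value
  when String   # 0x02: int32 byte length (including the trailing NUL), then the bytes
    bytes = value.b + "\x00".b
    "\x02".b + bson_cstring(name) + [bytes.bytesize].pack('l<') + bytes
  when Float    # 0x01: 64-bit little-endian IEEE 754 double
    "\x01".b + bson_cstring(name) + [value].pack('E')
  when Integer  # 0x10: 32-bit little-endian signed integer
    "\x10".b + bson_cstring(name) + [value].pack('l<')
  when Array    # 0x04: an embedded document keyed by "0", "1", "2", ...
    "\x04".b + bson_cstring(name) + bson_document(value.each_with_index.map { |v, i| [i.to_s, v] })
  else
    raise ArgumentError, "unsupported type: #{value.class}"
  end
end

# A document is an int32 total length, the elements, and a terminating NUL.
def bson_document(pairs)
  body = pairs.map { |name, value| bson_element(name, value) }.join.b
  [body.bytesize + 5].pack('l<') + body + "\x00".b
end

encoded = bson_document({ 'BSON' => ['awesome', 5.05, 1986] })
puts encoded.bytesize # 49, i.e. the leading \x31
puts encoded.inspect
```

Each length prefix lets a reader skip over a value without scanning it, which is what the "no parsing" caption refers to.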
multiple database server nodes
one primary member, many secondary members
Ruby on Rails, Sinatra, etc.
class User
  include MongoMapper::Document
  key :name, String
  key :age, Integer
  many :hobbies
end
class Hobby
  include MongoMapper::EmbeddedDocument
  key :name, String
  key :started, Time
end
user = User.new(:name => 'Brandon')
user.hobbies.build(:name => 'Programming', :started => 10.years.ago)
user.save!
User.where(:name => 'Brandon').first
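An embedded document is stored inside its parent document, not in a separate collection. The sketch below shows, in plain Ruby, the approximate shape such a user might take. It is an illustration only, not MongoMapper's serialization; a real mapper may add fields such as `_id`. The helper name `user_document` and the start date are hypothetical.

```ruby
# Hypothetical helper: nest the hobbies inside the user as an array of sub-documents.
def user_document(name, hobbies)
  {
    'name'    => name,
    'hobbies' => hobbies.map { |hobby| { 'name' => hobby[:name], 'started' => hobby[:started] } }
  }
end

# The start date is made up for illustration.
doc = user_document('Brandon', [{ name: 'Programming', started: Time.utc(2002, 9, 25) }])
puts doc.inspect
```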
class Artist
  include Mongoid::Document
  field :name, type: String
  embeds_many :instruments
end
class Instrument
  include Mongoid::Document
  field :name, type: String
  embedded_in :artist
end
syd = Artist.where(name: "Syd Vicious").between(age: 18..25).first
syd.instruments.create(name: "Bass")
syd.with(database: "bands", session: "backup").save!
session = Moped::Session.new([ "127.0.0.1:27017" ])
session.use "echo_test"
session.with(safe: true) do |safe|
  safe[:artists].insert(name: "Syd Vicious")
end
session[:artists].find(name: "Syd Vicious").update(:$push => { instruments: { name: "Bass" }})
a MongoDB driver for Ruby which exposes a simple, elegant, and fast API
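The `$push` update in the Moped example appends a value to an array field on the server. An in-memory analogy on a plain Hash follows. It only illustrates the effect of the operator and is not part of Moped's API. The helper name `apply_push` is hypothetical.

```ruby
# In-memory analogue of {'$push' => { field => value }}: append to an array field,
# creating the array when the field is missing. Returns a new Hash.
def apply_push(doc, field, value)
  current = doc.fetch(field, [])
  raise TypeError, "#{field} is not an array" unless current.is_a?(Array)
  doc.merge(field => current + [value])
end

artist = { 'name' => 'Syd Vicious' }
puts apply_push(artist, 'instruments', { 'name' => 'Bass' }).inspect
```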
"Lightroom 3 Catalog.lrcat"
Objective
Write about best practices for using MongoDB with Ruby.
Post your blog posts in the comments section of
this blog post by October 10 to be entered to win.
Google: mongodb and ruby blogging contest
{
  name: "Gary J. Murakami, Ph.D.",
  title: "Lead Engineer and Ruby Evangelist",
  company: "10gen (the MongoDB company)",
  phone: "1-866-237-8815 x8015",
  mobile: "1-908-787-6621",
  email: "gary.murakami@10gen.com",
  im: "gjmurakami (AIM)",
  twitter: "@GaryMurakami",
  blog: "grayghostwriter.blogspot.com",
  website: "www.nobell.org",
  linkedin: "www.linkedin.com/pub/gary-murakami/1/36a/327",
  facebook: "facebook.com/gary.j.murakami"
}