Harry Heymann, the Engineering Lead for foursquare, will present on scaling foursquare with MongoDB. Before joining foursquare, Harry spent five years at Google where he worked on Google Payments, Dodgeball and various backend advertising systems. Previously, he held roles at Microsoft and Intel. Harry has a bachelor's degree in engineering from Carnegie Mellon University. In addition to pursuing his obvious interest in mobile technologies, Harry is currently obsessed with exploring the Scala programming language.
I do not want you to go off on too big of a tangent, but I would like to know how Lift fits in with MongoDB/scalability.
@Ivan - I think that would be interesting as well.
I'd like to hear about how this affected your domain design choices and how you handle corrupt data, if any, in a schema-less data store.
Are you guys using MongoDB as the sole DB system, or as a hybrid with PostgreSQL?
Anything new about the last Foursquare outage that you can share?
I'd be curious about the geolocation benefits mongo gave you.
@nate you might want to check out http://www.mongodb.org/display/DOCS/Geospatial+Indexing
I'm guessing 4sq makes heavy use of this
Does anyone use, or has anyone written, a Java lib to do data migrations in Mongo? Something like http://liquibase.org/
I only know of one, for Ruby - https://github.com/terrbear/mongrations
I've read the postmortems (both the foursquare and the 10gen ones) from the October outage, so I understand the technical reasons adding the new shard did not work as expected.
I'm interested to know what method you were using to determine when to commission the new EC2 instance with MongoDB. What happened to make the one db shard grow larger than available RAM ahead of schedule so suddenly? Were there certain pressures pushing you guys to delay creation of the new EC2 shard?
Harry posted his slides here: http://bit.ly/e0p3b4
Will work on getting the video up ASAP!
For the fellow asking about compound indexes on two arrays, here's the relevant MongoDB doc:
http://www.mongodb.org/display/DOCS/Multikeys
See the part about "Parallel Arrays".
There is, I believe, a 30-index limit per collection, but it's best to check the MongoDB IRC channel for the official word on the limit and/or on working around it. As usual, depending on your query pattern, there are many potential workarounds.
@eli Thank you for confirming my issue. This states that you cannot insert a document containing two arrays in indexed fields.
"Topper"? made a good suggestion at the meetup. He advised putting the arrays together, grouped as a subelement of the document, and then indexing on the parent element:
{
  '_id': ObjectId('.....'),
  'name': 'some test',
  'indexed_arrays': {
    'arr1': [],
    'arr2': []
  }
}
then index on 'indexed_arrays'
@Aryeh - yup that was me :-) (Topper)
Here's what I was suggesting though (let's pretend you have tags and categories).
{
  '_id': ObjectId('...'),
  name: 'some test',
  tags_and_categories: [{type: 'tag', value: 'taggytaggy'}, {type: 'category', value: 'cattycat'}]
}
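To make the suggestion concrete, here is a rough sketch of what a query against that layout could look like. It is not tested against a real server. It assumes the `$all` and `$elemMatch` operators are available in your MongoDB version and reuses the `things` collection name from elsewhere in this thread. `buildQuery` and `matches` are hypothetical helpers written only for this illustration: `matches` is a small in-memory stand-in for the server-side match, so the snippet runs in plain JavaScript without a database. Since 'tags_and_categories.type' and 'tags_and_categories.value' live in the same array, a compound index on those two fields should not run into the "Parallel Arrays" restriction, but check that against the docs for your version.

```javascript
// Hypothetical document following the combined-array layout suggested above.
const doc = {
  name: 'some test',
  tags_and_categories: [
    { type: 'tag', value: 'taggytaggy' },
    { type: 'category', value: 'cattycat' },
  ],
};

// Build a query in which every (type, value) pair must match some element
// of the array. In the mongo shell, the result would be passed to
// db.things.find(query); here we only construct the object.
function buildQuery(pairs) {
  return {
    tags_and_categories: {
      $all: pairs.map(([type, value]) => ({ $elemMatch: { type, value } })),
    },
  };
}

// In-memory approximation of how that query selects documents
// (illustration only; the real matching happens on the server).
function matches(document, query) {
  const wanted = query.tags_and_categories.$all.map((c) => c.$elemMatch);
  return wanted.every((w) =>
    document.tags_and_categories.some(
      (e) => e.type === w.type && e.value === w.value
    )
  );
}

const query = buildQuery([
  ['tag', 'taggytaggy'],
  ['category', 'cattycat'],
]);

console.log(JSON.stringify(query, null, 2));
console.log(matches(doc, query));
```

The trade-off is the one raised in this thread: the document only has one indexed array, but each condition has to be spelled out as a (type, value) pair, so the queries are wordier than a plain match on two separate array fields.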
@Topper, ah, I see. Interesting. I think the way I proposed (if it works) would let me write less complicated queries.
I'm going to play around with it and see what works.
Aryeh,
Is there a naive representation of the query you're trying to do? Like this one that can't use the compound index on the two arrays:
db.things.find( { array1 : "blah", array2 : "glorb" } );
Where you want things that have "blah" somewhere in array1 and "glorb" somewhere in array2. How many items can be in each array? Are the arrays fixed length? (E.g., are these 10-item subsets from some larger sets?)
Anyway, feel free to e-mail me offline if you want.
Video is now available here: http://www.10gen.com/video/misc/foursquare
If anyone has anything in particular you'd like me to discuss in my presentation, feel free to mention it here and I will do my best.