What does it mean to fit "working set" into RAM for MongoDB?

Accepted answer
Score: 81

"Working set" is basically the amount of data AND indexes that will be active/in use by your system.

So for example, suppose you have 1 year's worth of data. For simplicity, each month relates to 1GB of data, giving 12GB in total, and to cover each month's worth of data you have 1GB worth of indexes, again totalling 12GB for the year.

If you are always accessing the last 12 months' worth of data, then your working set is: 12GB (data) + 12GB (indexes) = 24GB.

However, if you actually only access the last 3 months' worth of data, then your working set is: 3GB (data) + 3GB (indexes) = 6GB. In this scenario, if you had 8GB RAM and then you started regularly accessing the past 6 months' worth of data, your working set (now 12GB) would exceed your available RAM and performance would suffer.
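The arithmetic above can be sketched in a few lines of Python. This is only a ballpark calculation under the example's assumptions (1GB of data and 1GB of indexes per month); the function names are made up for illustration, and it ignores memory used by the OS and other processes.

```python
def working_set_gb(months_accessed, data_gb_per_month=1.0, index_gb_per_month=1.0):
    """Ballpark working set: data plus indexes for the months in regular use."""
    return months_accessed * (data_gb_per_month + index_gb_per_month)


def fits_in_ram(working_set, ram_gb):
    """True when the estimated working set is no larger than the available RAM."""
    return working_set <= ram_gb


ram_gb = 8  # RAM figure from the example
for months in (3, 6, 12):
    ws = working_set_gb(months)
    print(f"{months} months: {ws:.0f}GB working set, fits in {ram_gb}GB: {fits_in_ram(ws, ram_gb)}")
# 3 months: 6GB working set, fits in 8GB: True
# 6 months: 12GB working set, fits in 8GB: False
# 12 months: 24GB working set, fits in 8GB: False
```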

But generally, if you have enough RAM to cover the amount of data/indexes you expect to be frequently accessing, then you will be fine.

Edit: Response to question in comments
I'm not sure I quite follow, but I'll have a go at answering. Firstly, the calculation for working set is a "ball park figure". Secondly, if you have a (e.g.) 1GB index on user_id, then only the portion of that index that is commonly accessed needs to be in RAM (e.g. suppose 50% of users are inactive, then only about 0.5GB of the index will be frequently needed in RAM). In general, the more RAM you have, the better, especially as the working set is likely to grow over time due to increased usage. This is where sharding comes in - split the data over multiple nodes and you can cost-effectively scale out. Your working set is then divided over multiple machines, meaning more of it can be kept in RAM. Need more RAM? Add another machine to shard on to.
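As a rough illustration of those two points, here is a small Python sketch. It assumes the active part of an index scales linearly with the fraction of active users, and that sharding spreads the working set evenly across machines (which in practice depends on the shard key and access pattern). The helper names are hypothetical.

```python
def hot_index_gb(index_gb, active_fraction):
    """Portion of an index that is frequently accessed (linear assumption)."""
    return index_gb * active_fraction


def per_shard_working_set_gb(total_working_set_gb, shard_count):
    """Working set per machine, assuming an even split across shards."""
    return total_working_set_gb / shard_count


# 1GB index on user_id with 50% of users active
print(hot_index_gb(1.0, 0.5))  # 0.5

# A 24GB working set spread over 1, 2 and 3 shards
for shards in (1, 2, 3):
    print(shards, per_shard_working_set_gb(24, shards))
# 1 24.0
# 2 12.0
# 3 8.0
```

With these numbers, three machines with 8GB of RAM each would roughly cover a 24GB working set, before allowing for anything else running on them.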

Score: 6

The working set is basically the stuff you are using most frequently. If you use index A for collection B to search for a subset of documents, then you could consider that your working set. As long as the most commonly used parts of those structures can fit in memory, things will be exceedingly fast. As parts stop fitting in memory, such as many of the documents, things can slow down. Generally things will become much slower if your indexes exceed your memory.
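To see why things degrade once the working set no longer fits, here is a toy simulation in Python. It is not how MongoDB's cache actually works: it assumes a plain LRU cache of fixed-size pages and a workload that cycles through every page of the working set in order, which is close to a worst case. `hit_rate` is a hypothetical helper.

```python
from collections import OrderedDict


def hit_rate(cache_pages, working_set_pages, passes=10):
    """Fraction of accesses served from a toy LRU cache while cycling over the working set."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for page in range(working_set_pages):
            accesses += 1
            if page in cache:
                hits += 1
                cache.move_to_end(page)  # mark as most recently used
            else:
                cache[page] = True
                if len(cache) > cache_pages:
                    cache.popitem(last=False)  # evict the least recently used page
    return hits / accesses


print(hit_rate(cache_pages=100, working_set_pages=80))   # 0.9
print(hit_rate(cache_pages=100, working_set_pages=120))  # 0.0
```

When the working set fits, only the first pass misses. Once it is larger than the cache, this access pattern evicts each page just before it is needed again, so every access goes to disk. Real workloads are usually less extreme, but the drop-off is the effect being described.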

Yes, you can have lots of data where most of it is "archived" and rarely used without affecting the performance of your application or impacting your working set (which doesn't include that archived data).
