[ACCEPTED] Alternative to memcached that can persist to disk (distributed)

Accepted answer
Score: 22

I have never tried it, but what about Redis?
Its homepage says (quoting):

Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements.

In order to be very fast but at the same time persistent, the whole dataset is kept in memory, and from time to time and/or when a number of changes to the dataset have been performed, it is written asynchronously to disk. You may lose the last few queries, which is acceptable in many applications, but it is as fast as an in-memory DB (Redis supports non-blocking master-slave replication in order to address this by redundancy).
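The snapshotting behaviour the quote describes is controlled by `save` directives in redis.conf; a minimal sketch (standard directive names from the stock config, the values here are just illustrative):

```
# Write a snapshot to disk if at least 1 key changed within 900 s,
# 10 keys within 300 s, or 10000 keys within 60 s.
save 900 1
save 300 10
save 60 10000

# Filename and directory for the on-disk dump.
dbfilename dump.rdb
dir /var/lib/redis
```

Tuning the `save` thresholds is exactly the "you may lose the last few queries" trade-off: tighter thresholds mean less potential loss but more disk I/O.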

It seems to answer some of the points you talked about, so maybe it might be helpful in your case?

If you try it, I'm pretty interested in what you find out, btw ;-)


As a side note: if you need to write all this to disk, maybe a cache system is not really what you need... after all, if you are using memcached as a cache, you should be able to re-populate it on demand, whenever it is necessary -- still, I admit, there might be some performance problems if your whole memcached cluster falls at once...

So, maybe some "more" key/value-store-oriented software could help? Something like CouchDB, for instance?
It will probably not be as fast as memcached, though, as data is stored on disk rather than in RAM...

Score: 18

Maybe your problem is like mine: I have only a few machines for memcached, but with lots of memory. Even if one of them fails or needs to be rebooted, it seriously affects the performance of the system. According to the original memcached philosophy I should add a lot more machines with less memory each, but that's not cost-efficient and not exactly "green IT" ;)

For our solution, we built an interface layer for the cache system so that providers for the underlying cache systems can be nested, like you can do with streams, and wrote a cache provider for memcached as well as our own very simple key-value-to-disk storage provider. Then we define a weight for cache items that represents how costly it is to rebuild an item if it cannot be retrieved from cache. The nested disk cache is only used for items with a weight above a certain threshold, maybe around 10% of all items.

When storing an object in the cache, we don't lose time, as saving to one or both caches is queued for asynchronous execution anyway. So writing to the disk cache doesn't need to be fast. Same for reads: first we go to memcached, and only if it's not there and it is a "costly" object do we check the disk cache (which is orders of magnitude slower than memcached, but still so much better than recalculating 30 GB of data after a single machine went down).

This way we get the best of both worlds, without replacing memcached with anything new.
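The nesting described above can be sketched in Python. This is a minimal illustration only, not the answerer's actual code; the dicts passed in are stand-ins for a real memcached client and the key-value-to-disk provider, and the writes are synchronous here even though the real system queues them asynchronously:

```python
class LayeredCache:
    """Nested cache providers: a fast volatile layer plus a slow
    persistent layer used only for 'costly' items."""

    def __init__(self, fast, slow, weight_threshold):
        self.fast = fast                      # stand-in for a memcached client
        self.slow = slow                      # stand-in for the disk store
        self.weight_threshold = weight_threshold

    def put(self, key, value, weight):
        self.fast[key] = value
        if weight >= self.weight_threshold:   # only costly items go to disk
            self.slow[key] = value

    def get(self, key, weight):
        if key in self.fast:                  # first try the fast layer
            return self.fast[key]
        if weight >= self.weight_threshold:   # costly item: try the disk cache
            value = self.slow.get(key)
            if value is not None:
                self.fast[key] = value        # re-populate the fast layer
            return value
        return None                           # cheap item: caller rebuilds it

# Costly items survive losing the fast layer:
cache = LayeredCache({}, {}, weight_threshold=10)
cache.put("report", b"expensive-result", weight=50)
cache.fast.clear()                            # simulate a memcached node failure
print(cache.get("report", weight=50))         # recovered from the disk layer
```

The interesting design choice is that cheap items never touch the disk layer at all, so the persistent store stays small and its slowness only matters on the rare miss path.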

Score: 13

EhCache has a "disk persistent" mode which dumps the cache contents to disk on shutdown, and will reinstate the data when started back up again. As for your other requirements, when running in distributed mode it replicates the data across all nodes, rather than storing it on just one. Other than that, it should fit your needs nicely. It's also still under active development, which many other Java caching frameworks are not.
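For reference, a sketch of what the disk-persistent setup looks like in an ehcache.xml. The attribute names are from the Ehcache 2.x configuration style; they have changed between major versions, so treat this as an assumption to check against your release:

```xml
<ehcache>
  <diskStore path="java.io.tmpdir"/>
  <cache name="myCache"
         maxElementsInMemory="10000"
         overflowToDisk="true"
         diskPersistent="true"
         timeToIdleSeconds="0"
         timeToLiveSeconds="0"/>
</ehcache>
```

`diskPersistent="true"` is the flag that makes the on-disk contents survive a JVM restart, as described above.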

Score: 7

Try go-memcached - a memcache server written in Go. It persists cached data to disk out of the box. Go-memcached is compatible with memcache clients. It has the following features missing in the original memcached:

  • Cached data survives server crashes and/or restarts.
  • Cache size may exceed available RAM size by multiple orders of magnitude.
  • There is no 250 byte limit on key size.
  • There is no 1 MB limit on value size; value size is actually limited to 2 GB.
  • It is faster than the original memcached. It also uses less CPU when serving incoming requests.

Here are performance numbers obtained via go-memcached-bench:

workerMode   | go-memcached v1  | original memcached v1.4.13
             | Kqps | cpu time  | Kqps  | cpu time
-------------+------+-----------+-------+----------
GetMiss      |  648 |    17     |  468  |    33
GetHit       |  195 |    16     |  180  |    17
Set          |  204 |    14     |  182  |    25
GetSetRand   |  164 |    16     |  157  |    20

Statically linked binaries for go-memcached and go-memcached-bench are available at the downloads page.

Score: 4

Take a look at the Apache Java Caching System (JCS)

JCS is a distributed caching system written in Java. It is intended to speed up applications by providing a means to manage cached data of various dynamic natures. Like any caching system, JCS is most useful for high-read, low-put applications. Latency times drop sharply and bottlenecks move away from the database in an effectively cached system. Learn how to start using JCS.

JCS goes beyond simply caching objects in memory. It provides numerous additional features:

* Memory management
* Disk overflow (and defragmentation)
* Thread pool controls
* Element grouping
* Minimal dependencies
* Quick nested categorical removal
* Data expiration (idle time and max life)
* Extensible framework
* Fully configurable runtime parameters
* Region data separation and configuration
* Fine grained element configuration options
* Remote synchronization
* Remote store recovery
* Non-blocking "zombie" (balking facade) pattern
* Lateral distribution of elements via HTTP, TCP, or UDP
* UDP Discovery of other caches
* Element event handling
* Remote server chaining (or clustering) and failover
* Custom event logging hooks
* Custom event queue injection
* Custom object serializer injection
* Key pattern matching retrieval
* Network efficient multi-key retrieval

Score: 4

I think membase is what you want.


Score: 3

In my experience, it is best to write an intermediate layer between the application and the backend storage. This way you can pair up memcached instances with, for example, sharedanced (basically the same kind of key-value store, but disk-based). The most basic way to do this is: always read from memcached and fall back to sharedanced, and always write to both sharedanced and memcached.
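That read/write discipline can be sketched as follows - a minimal illustration, with plain dicts standing in for the memcached and sharedanced clients:

```python
class WriteThroughCache:
    """Write to both stores; read from the fast one first and fall
    back to the persistent one, re-populating the fast store."""

    def __init__(self, memcache, persistent):
        self.memcache = memcache      # stand-in for a memcached client
        self.persistent = persistent  # stand-in for sharedanced

    def set(self, key, value):
        self.persistent[key] = value  # persistent store first, so a crash
        self.memcache[key] = value    # between the two writes loses nothing

    def get(self, key):
        value = self.memcache.get(key)
        if value is None:             # cache miss, e.g. after a restart
            value = self.persistent.get(key)
            if value is not None:
                self.memcache[key] = value  # warm the cache again
        return value

store = WriteThroughCache({}, {})
store.set("user:42", "alice")
store.memcache.clear()               # simulate a memcached restart
print(store.get("user:42"))          # served from the persistent store
```

Note the write ordering: the persistent store is written first, so the worst case after a crash is a stale cache entry that the fallback read path corrects, never lost data.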

You can scale writes by sharding between multiple sharedance instances. You can scale reads N-fold by using a solution like repcached (replicated memcached).

If this is not trivial for you, you can still use sharedanced as a basic replacement for memcached. It is fast, and most of the filesystem calls are eventually cached - using memcached in combination with sharedance only avoids reading from sharedanced until some data expires in memcache. A restart of the memcached servers would cause all clients to read from the sharedance instance at least once - not really a problem, unless you have extremely high concurrency for the same keys and clients contend for the same key.

There are certain issues if you are dealing with a severely high-traffic environment. One is the choice of filesystem (reiserfs performs 5-10x better than ext3 because of some internal caching of the fs tree). Another is that sharedance does not have UDP support (TCP keepalive is quite an overhead if you use sharedance only; memcached has UDP thanks to the Facebook team). And scaling is usually done in your application (by sharding data across multiple sharedance server instances).

If you can leverage these factors, then this might be a good solution for you. In our current setup, a single sharedanced/memcached server can scale up to about 10 million pageviews a day, but this is application dependent. We don't use caching for everything (like Facebook does), so results may vary for your application.

And now, a good 2 years later: Membase is a great product for this. Or Redis, if you need additional functionality like hashes, lists, etc.

Score: 2

Have you looked at BerkeleyDB?

  • Fast, embedded, in-process data management.
  • Key/value store, non-relational.
  • Persistent storage.
  • Free, open-source.

However, it fails to meet one of your criteria:

  • BDB supports distributed replication, but the data is not partitioned. Each node stores the full data set.
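The embedded, in-process model BDB uses can be illustrated with Python's standard-library `dbm` module. This is only a stand-in for the programming model, not BerkeleyDB itself (depending on the platform, `dbm` is backed by GDBM, NDBM, or a pure-Python fallback):

```python
import dbm

# Open (or create) an on-disk key/value database in-process:
# no server involved, the library manages the files directly.
with dbm.open("cache.db", "c") as db:
    db[b"session:1"] = b"payload"

# The data persists across process restarts.
with dbm.open("cache.db", "r") as db:
    print(db[b"session:1"])  # b'payload'
```

This "library, not server" shape is exactly why BDB is fast and persistent, and also why distribution has to be bolted on via replication rather than partitioning.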

Score: 2

What about Terracotta?


Score: 2

Oracle NoSQL is based on BerkeleyDB (the solution that Bill Karwin pointed to), but adds sharding (partitioning of the data set) and elastic scale-out. See: http://www.oracle.com/technetwork/products/nosqldb/overview/index.html

I think it meets all of the requirements of the original question.

For the sake of full disclosure, I work at Oracle (but not on the Oracle NoSQL product). The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

Score: 2

memcached can be substituted by Couchbase - an open-source and commercial continuation of this product line. It has data-to-disk persistence (very efficient and configurable). The original authors of memcached have also been working on Couchbase, and it is compatible with the memcached protocol - so you don't need to change your client application code! It is a very performant product and comes with 24/7 clustering and Cross Datacenter Replication (XDCR) built in. See the technical paper.

Score: 2

You could use Tarantool (http://tarantool.org). It is an in-memory database with persistence, master-master replication and scriptable key expiration rules - https://github.com/tarantool/expirationd
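"Scriptable expiration rules" - what expirationd provides inside Tarantool, where the rules are written in Lua - boils down to a periodic sweep with a user-supplied predicate. A minimal Python illustration of the idea (not Tarantool code):

```python
import time

def expire(store, is_expired):
    """Delete every entry the user-supplied rule marks as expired."""
    for key in [k for k, v in store.items() if is_expired(k, v)]:
        del store[key]

# Each value carries its own deadline; the "rule" is just a function.
now = time.time()
store = {"fresh": ("data", now + 3600), "stale": ("data", now - 1)}
expire(store, lambda key, value: value[1] < time.time())
print(sorted(store))  # ['fresh']
```

Because the rule is an arbitrary function, expiry can depend on anything in the record, not only a TTL - which is the main advantage over memcached's fixed per-key expiration.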

Score: 1

We are using OSCache. I think it meets almost all your needs except periodically saving the cache to disk, but you should be able to create 2 cache managers (one memory-based and one hdd-based) and periodically run a Java cronjob that goes through all in-memory cache key/value pairs and puts them into the hdd cache. What's nice about OSCache is that it is very easy to use.
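The two-cache-manager idea can be sketched like this (Python and the stdlib `shelve` module as stand-ins for the two OSCache managers and the Java cronjob; purely illustrative):

```python
import shelve

memory_cache = {}            # stand-in for the memory-based cache manager

def flush_to_disk(mem, path):
    """The 'cronjob': copy every in-memory key/value pair to the hdd cache."""
    with shelve.open(path) as disk:
        for key, value in mem.items():
            disk[key] = value

memory_cache["page:/home"] = "<html>...</html>"
flush_to_disk(memory_cache, "hddcache")   # run this on a schedule

with shelve.open("hddcache") as disk:     # contents survive a restart
    print(disk["page:/home"])
```

The flush frequency bounds how much cached work you can lose, just like the periodic-save approach in the Redis answer above.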

Score: 1

You can use GigaSpaces XAP, which is a mature commercial product that answers your requirements and more. It is the fastest distributed in-memory data grid (cache++), it is fully distributed, and it supports multiple styles of persistence.

Guy Nirpaz, GigaSpaces

Score: 1

Just to complete this list - I just found Couchbase. However, I haven't tested it yet.
