Re: [ljc] Open Source Java Caching APIs

From: Esa
Sent on: Thursday, May 31, 2012 12:23 PM
Kevin, 

You still need some kind of equivalent of GC for the off-heap data.
Of course, traversing the whole serialized object graph is no longer required because, as I said, the semantics of the data are known in advance: survivorship is decided by your eviction policy, not by following references in the data itself.

Yes, fragmentation won't be an issue IF you know your data to be uniform.
But at least the Terracotta engineers have to think about more general situations where you have heterogeneous block sizes.
So we're talking about trade-off decisions here: is it really worth ditching the JVM's automatic memory management, and if so, when?
They even use the heap themselves for the hot data - as a sort of... cache.


Gustavo, 
Do you know if there's something similar coming out of Infinispan in the near future?
I bumped into this:


       -esa

From: [address removed] [[address removed]] on behalf of Kevin Wright [[address removed]]
Sent: 31 May[masked]:56
To: [address removed]
Subject: Re: [ljc] Open Source Java Caching APIs



On 31 May[masked]:41, Esa <[address removed]> wrote:

Speaking of BigMemory - has anyone any experience of their handmade off-heap memory allocation implementation?


Putting the data outside the heap does not make the underlying problem of garbage collection suddenly vanish, does it?


It does
 

If the data becomes obsolete, it must be deallocated and the memory space rearranged to prevent fragmentation over time.


This really depends on what you're caching.  In most cases, it's a whole bunch of objects that are *exactly the same size*, so when an object falls out of scope you simply return the relevant index back to the pool and re-use it.  You often don't even need to zero it first.
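
As a rough illustration of the fixed-size case, here is a minimal sketch of a slot pool over a single direct (off-heap) buffer. The class name SlotPool and its methods are made up for this example; it is not how BigMemory or any other product is implemented, and it is neither thread-safe nor protected against releasing the same slot twice.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical sketch: fixed-size slots carved out of one direct buffer. */
final class SlotPool {
    private final ByteBuffer buffer;   // off-heap storage
    private final int slotSize;
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();

    SlotPool(int slotCount, int slotSize) {
        this.slotSize = slotSize;
        this.buffer = ByteBuffer.allocateDirect(slotCount * slotSize);
        for (int i = 0; i < slotCount; i++) {
            freeSlots.add(i);          // every slot index starts out in the pool
        }
    }

    /** Takes a slot from the pool and copies the value in; returns -1 if the pool is empty. */
    int put(byte[] value) {
        if (value.length != slotSize) {
            throw new IllegalArgumentException("value must be exactly " + slotSize + " bytes");
        }
        Integer slot = freeSlots.poll();
        if (slot == null) {
            return -1;                 // caller has to evict something first
        }
        ByteBuffer view = buffer.duplicate();
        view.position(slot * slotSize);
        view.put(value);               // overwrites whatever the slot held before
        return slot;
    }

    /** Copies the bytes stored in the given slot back onto the heap. */
    byte[] get(int slot) {
        byte[] out = new byte[slotSize];
        ByteBuffer view = buffer.duplicate();
        view.position(slot * slotSize);
        view.get(out);
        return out;
    }

    /** Returns the slot index to the pool; the old bytes are not zeroed. */
    void release(int slot) {
        freeSlots.push(slot);
    }

    int freeCount() {
        return freeSlots.size();
    }
}
```

Because every slot is the same size, releasing one never leaves a hole that a later value cannot fill, which is why fragmentation does not arise in this case.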

Fragmentation happens when dealing with a heterogeneous cache, containing objects of different sizes.  And yes, this definitely adds to the complexity of the solution.  It can also be useful to think about locality of reference, aligning related objects so they'll tend to be paged into/out of memory at the same time.  Again, more potential complexity here, but also not necessary in a great many use-cases.

 

They are damning the long pauses created by the mark-and-sweep algorithm (and even G1), but writing and reading data through direct buffers, with the serialization/deserialization that involves, takes time too.


No marking, no scanning, no sweeping.  Just a set of offsets into the file plus a tiny bit of metadata (TTL, last access, etc) that start off in a pool and are removed from it as they become allocated.

 

Of course they have a much narrower range of use cases to worry about than the GC designers. They can also exploit the information at hand about eviction policies and the semantics of the data.


         -esa


If you're talking about having a cache of hundreds of gigabytes in a single JVM then I agree with you, unless you're comfortable with pauses that can eventually last several minutes. In that case, you'll probably also need to think about high availability in case that JVM goes down, but that's another story.

OTOH, HotSpot is very competitive (and getting better over time) at less dramatic sizes, given a little bit of tuning: I've seen 30 GB heaps used as caches where full garbage collection pauses took less than 2 seconds.

James's use case is that data is only ever added once a week, so assuming the data size is not massive, there'll be very little stress in garbage collection terms.

Back to Infinispan, which is a data grid rather than a 'classic' cache: you can have, for example, hundreds of nodes in a cluster that combine their heaps to form one massive, consistent cache. Replication, fail-over, high availability and elasticity are what you gain from an architecture like that; you add nodes and they're automatically discovered and become available to hold data; if a node crashes, the data is still available on other nodes thanks to replication.

That doesn't mean you need to plan up front for dozens of machines to use Infinispan; you can start with something as simple as a local-only cache that, from the programming point of view, is just a Cache interface with no network involved. And it's very easy to go from a local cache to a distributed one (as described above) if requirements change in the future: it's mainly a matter of configuration.

Cheers,
Gustavo

From: [address removed] [[address removed]] on behalf of Kevin Wright [[address removed]]
Sent: 30 May[masked]:36
To: [address removed]
Subject: Re: [ljc] Open Source Java Caching APIs

Generally speaking, I advise against caching within the JVM heap - it's really not optimised for this use case, as cached objects typically live long enough to be promoted out of the young (eden) generation, at which point they put extra pressure on garbage collection.

If you can, cache as close to the edge as possible; Varnish is a very effective solution if you have a RESTful architecture.

If you can't, still try to push it out of the heap.  Consider using memcached or perhaps a commercial solution like BigMemory.

Or if you're feeling really adventurous you can use a memory mapped file (http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ314_029.htm) and manage your own indexes into the thing.  If you do this, be aware that it's genuinely advanced stuff, and that you'll be reinventing the wheel - it's the same approach that Varnish and BigMemory use already :)
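
To give a feel for the memory-mapped-file route, here is a minimal sketch that stores fixed-size long values in a mapped file and keeps a key-to-record index by hand. The class MappedLongStore is invented for this example and makes simplifying assumptions: the index lives on the heap and is not persisted, there is no eviction or deletion, and it is not thread-safe. It only uses standard java.nio calls (FileChannel.open, FileChannel.map, and the absolute getLong/putLong on the buffer).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: fixed-size records in a memory-mapped file. */
final class MappedLongStore {
    private static final int RECORD_SIZE = Long.BYTES;

    private final MappedByteBuffer map;
    private final int capacity;
    // Hand-managed index: key -> record number within the mapped region.
    private final Map<String, Integer> index = new HashMap<>();

    MappedLongStore(Path file, int capacity) {
        this.capacity = capacity;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            this.map = channel.map(FileChannel.MapMode.READ_WRITE, 0,
                    (long) capacity * RECORD_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the value into the key's record, assigning a new record if needed. */
    void put(String key, long value) {
        Integer record = index.get(key);
        if (record == null) {
            if (index.size() == capacity) {
                throw new IllegalStateException("store is full");
            }
            record = index.size();     // next unused record number
            index.put(key, record);
        }
        map.putLong(record * RECORD_SIZE, value);   // absolute write at the record's offset
    }

    /** Returns the stored value, or null if the key has never been written. */
    Long get(String key) {
        Integer record = index.get(key);
        return record == null ? null : map.getLong(record * RECORD_SIZE);
    }
}
```

A real implementation also has to deal with variable-sized values, freeing and reusing records, and rebuilding the index after a restart, which is where most of the complexity mentioned above comes from.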


On 30 May[masked]:45, James Bowkett <[address removed]> wrote:
Hi All,

I am looking to create a caching layer in my application.  My application has model objects that require a few costly, star-like SQL queries to construct, and this data is only ever added to once per week.  So I plan to create the whole object population once and use an object cache for fast key-based lookup, and I'm looking for some opinions/recommendations/conjecture on appropriate technologies.

I really only want a HashMap that pages its values to disk (I'm not afraid to write it myself, but I'd rather use an open source API (the budget for this is my time only) if there's one available and it's nice and lightweight)
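
For reference, a hand-written version of "a HashMap that pages its values to disk" could look roughly like the sketch below. It is only an illustration under several assumptions: the class name DiskPagedMap is invented, every value is written through to its own file using plain Java serialization, a small LRU of recently used values is kept on the heap via LinkedHashMap, and there is no removal, no thread-safety and no recovery of the key index after a restart. It is not the API of Ehcache or any other library.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: write-through map with a small in-memory LRU in front of disk. */
final class DiskPagedMap<V extends Serializable> {
    private final Path dir;
    private final Map<String, Path> onDisk = new HashMap<>();  // key -> file holding the value
    private final LinkedHashMap<String, V> hot;                // recently used values

    DiskPagedMap(Path dir, int maxInMemory) {
        this.dir = dir;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.hot = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxInMemory;   // evicted values stay readable from disk
            }
        };
    }

    /** Serializes the value to its file and keeps it in the in-memory LRU. */
    void put(String key, V value) {
        Path file = onDisk.get(key);
        if (file == null) {
            file = dir.resolve("entry-" + onDisk.size() + ".bin");
            onDisk.put(key, file);
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        hot.put(key, value);
    }

    /** Returns the value from memory if present, otherwise reads it back from disk. */
    @SuppressWarnings("unchecked")
    V get(String key) {
        V value = hot.get(key);
        if (value != null) {
            return value;
        }
        Path file = onDisk.get(key);
        if (file == null) {
            return null;                       // key was never stored
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            value = (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        hot.put(key, value);                   // page the value back into memory
        return value;
    }

    int inMemoryCount() {
        return hot.size();
    }
}
```

With data that is only added to once a week, a write-through design like this keeps the disk copy authoritative, so dropping a value from memory never loses anything.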

Has anyone used Ehcache?....what are your thoughts?....has anyone used any of the other open source alternatives?....Are there any alternatives that I should be looking at instead? 

I'm finding it hard to find many APIs out there (most of the projects listed here: http://java-source.net/open-source/cache-solutions are really old).  Has caching fallen out of favour in preference for some other way of doing things?...or is everyone using things like NoSQL to distribute their caches? (If so, is there a particular NoSQL toolset that would be most appropriate for me?...Should I be looking at something like Voldemort?)

Cheers,

-James


P.S. As a footnote, this is the history of caching APIs as far as I can make it out:

There was a JSR for this (107), which looks like it's been dormant for a little over 10 years. The reference implementation for it was JCache, which came out of some functionality in Oracle 9i. This looks like it was then superseded by Apache's JCS (Java Caching System), although all of these now look like fairly dormant projects (the last JCS release was in 2007).

It looks like Ehcache was originally a fork of JCS (http://commons.apache.org/jcs/JCSvsEHCache.html), (indeed JCS claims to be faster) but is Ehcache now the only open source caching solution out there that's still under active development?





--
Please Note: If you hit "REPLY", your message will be sent to everyone on this mailing list ([address removed])
This message was sent by Kevin Wright ([address removed]) from LJC - London Java Community.