I successfully remove myself from a circle (DELETE /circle-members/), but my list of circles is not updated correctly (GET /circles/): it returns an outdated version of the container, because the circle-member resource has been refreshed but not the connected circle.
```python
def test_leave_circle_user_cache_updates(self):
    self.setUpLoggedInUser()
    another_user = get_random_user()
    self.setUpCircle(owner=another_user)
    me = CircleMember.objects.create(user=self.user, circle=self.circle, is_admin=False)

    response = self.client.get('/users/{}/'.format(self.user.username))
    self.assertEqual(len(response.data['circles']['ldp:contains']), 1)

    response = self.client.delete('/circle-members/{}/'.format(me.pk))
    self.assertEqual(response.status_code, 204)

    response = self.client.get('/users/{}/'.format(self.user.username))
    # -- FAILS HERE --
    self.assertEqual(len(response.data['circles']['ldp:contains']), 0)
```
@jbpasquier I've confirmed that this behaviour doesn't happen when you ping the container directly, e.g. running the same test above but calling /users/x/circles/ each time
Why?
The cache-cleaning code invalidates the LDListMixin cache altogether, and then invalidates the LDPSerializer cache only for the urlid of the object which has been changed:
```python
@receiver([pre_save, pre_delete, m2m_changed])
def invalidate_caches(instance, **kwargs):
    from djangoldp.serializers import LDListMixin, LDPSerializer

    LDListMixin.to_representation_cache.reset()
    if hasattr(instance, 'urlid'):
        LDPSerializer.to_representation_cache.invalidate(instance.urlid)
```
In LDPSerializer.to_representation, the cache is keyed on the urlid, and a hit returns the whole object's serialization.
Fixes
Finding the ideal solution is difficult because there are lots of different considerations
My initial design was to also remove the cache records of the connected ForeignKey objects, where the reverse relation is to-many, and only when the change is a create or a delete.
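Something along these lines is what I had in mind. A rough sketch only, reusing the cache calls from the receiver above; the signal wiring and the `invalidate_related_caches` helper are illustrative, not djangoldp code:

```python
from django.db import models
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver
from djangoldp.serializers import LDPSerializer


def invalidate_related_caches(instance):
    # also invalidate the cached serialization of any object this instance
    # points at via a ForeignKey (i.e. where the reverse relation is to-many)
    for field in instance._meta.get_fields():
        if isinstance(field, models.ForeignKey):
            related = getattr(instance, field.name, None)
            if related is not None and hasattr(related, 'urlid'):
                LDPSerializer.to_representation_cache.invalidate(related.urlid)


@receiver(post_save)
def invalidate_on_create(instance, created=False, **kwargs):
    if created:  # only when the change is a create, not an update
        invalidate_related_caches(instance)


@receiver(post_delete)
def invalidate_on_delete(instance, **kwargs):
    invalidate_related_caches(instance)
```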
But then take, for example, depth > 0. All of a sudden updates and to-many relations are important too, and all sorts of cache records will be invalidated whenever I change anything. Or take the case where I update a Circle by changing its owner. Update is now important, and what's worse is that in this listener I can't know in Django that the owner is about to change, without storing a duplicate of every foreign key.
So then I get to thinking: maybe we have to solve part of this where the cache is hit, in LDPSerializer.to_representation.
So now I'm thinking that we shouldn't cache the whole object at all, but the fields, i.e. (see the sketch after this list):
- I make a GET request on /users/x/circles/. It's cached for my user
- I make a GET request on /users/x/. I hit the cache on the circles container, and then I serialize the other containers and fields from scratch
- I make a GET request on /users/x/. Actually, I don't hit the cache on the whole object, but on each container. It might perform slightly worse than caching the whole object, BUT
- I make a DELETE on /circle-members/1/. It resets the cache on /circles/x/members/ and on /users/x/circles/, but only on these! The cached containers for the user's projects aren't reset because of the CircleMember
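A minimal sketch of the idea, with a plain dict standing in for the real cache; all names here are illustrative, not djangoldp's actual API:

```python
container_cache = {}  # (container_urlid, username) -> serialized container


def get_container(container_urlid, username, serialize):
    """Return the cached container for this user, serializing it on a miss."""
    key = (container_urlid, username)
    if key not in container_cache:
        container_cache[key] = serialize()
    return container_cache[key]


def invalidate_container(container_urlid):
    """Reset every user's copy of one container, e.g. /users/x/circles/."""
    for key in [k for k in container_cache if k[0] == container_urlid]:
        del container_cache[key]


# DELETE /circle-members/1/ would then call, for the affected member only:
# invalidate_container('/circles/x/members/')
# invalidate_container('/users/x/circles/')
```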
It's also probably not ideal to have two caches, with the LDListMixin cache being reset every time anything changes; both are things we may be solving by doing this.
For the second scenario (the owner change), it should be enough to clean the cache in both the pre-save and the post-save, so that both the old owner and the new owner get cache resets.
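As a sketch (and only a sketch): the old owner is looked up with an extra query in pre_save instead of storing duplicates of every foreign key, and `invalidate_user_containers` is a hypothetical stand-in for whichever cache reset we settle on:

```python
from django.db.models.signals import pre_save, post_save
from django.dispatch import receiver


def invalidate_user_containers(user_id):
    # hypothetical helper: reset whatever containers we cache for this user
    ...


@receiver(pre_save)
def reset_old_owner_cache(instance, **kwargs):
    # before the save, the database still holds the previous owner
    if getattr(instance, 'pk', None) and hasattr(instance, 'owner_id'):
        old_owner_id = (type(instance).objects
                        .filter(pk=instance.pk)
                        .values_list('owner_id', flat=True)
                        .first())
        if old_owner_id is not None:
            invalidate_user_containers(old_owner_id)


@receiver(post_save)
def reset_new_owner_cache(instance, **kwargs):
    # after the save, instance.owner_id is the new owner
    if getattr(instance, 'owner_id', None) is not None:
        invalidate_user_containers(instance.owner_id)
```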
What I'm talking about here is a complex system though, right? I'd love to hear that I'm overcomplicating it and there's something simpler that I'm missing?
Caching is really complex in our case, because pretty much everything can be impacted by a single unrelated request. Even a permission change from a guardian group could affect other requests.
I think that we can keep the current caching system if we invert the entry point: instead of user->view, we could cache requests by model->view->user. This would allow us to regenerate batches of cache at once with less incoherence.
A more efficient cache could be a per-field, per-user cache. As we may be using Redis for activities, we can take advantage of it here too: no cache at all by default, a configurable [Redis per-field per-user cache](https://github.com/jazzband/django-redis), and keep the serializer not cached, as it's not relevant for our usage.
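For reference, wiring django-redis in as Django's cache backend would look roughly like this in settings.py (the Redis URL is just an example deployment value); per-field, per-user entries would then go through Django's normal cache API, with the exact key scheme still to be designed:

```python
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```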
Not about the proposed solution above, but in terms of temporary fixes to Hubl's problem: why don't we override a CircleMemberViewSet so that the cache is reset manually on circle-member create or delete?
Obviously this doesn't solve the underlying problem, but it should resolve the bug in the meantime.
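Something like this is what I'm picturing; just a sketch, assuming we subclass djangoldp's LDPViewSet and reusing the two cache calls from the receiver above:

```python
from djangoldp.serializers import LDListMixin, LDPSerializer
from djangoldp.views import LDPViewSet


class CircleMemberViewSet(LDPViewSet):
    def perform_create(self, serializer, **kwargs):
        super().perform_create(serializer, **kwargs)
        self._reset_member_caches(serializer.instance)

    def perform_destroy(self, instance):
        user_urlid, circle_urlid = instance.user.urlid, instance.circle.urlid
        super().perform_destroy(instance)
        self._reset_member_caches_by_urlid(user_urlid, circle_urlid)

    def _reset_member_caches(self, member):
        self._reset_member_caches_by_urlid(member.user.urlid, member.circle.urlid)

    def _reset_member_caches_by_urlid(self, user_urlid, circle_urlid):
        # brute-force reset so the affected user and circle re-serialize
        LDListMixin.to_representation_cache.reset()
        LDPSerializer.to_representation_cache.invalidate(user_urlid)
        LDPSerializer.to_representation_cache.invalidate(circle_urlid)
```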
> Not about the proposed solution above, but in terms of temporary fixes to Hubl's problem: why don't we override a CircleMemberViewSet so that the cache is reset manually on circle-member create or delete?
The problem is that we'd need this temporary fix on everything: Circles, Projects, Notifications, Contacts, etc., everything that's used on Hubl. Disabling the cache would lead to nearly the same result, at a really expensive performance cost.
I haven't seen the Redis per-field per-user cache before, but it sounds promising. @jbpasquier are you sure it will hit at all the correct times (e.g. when an update is required because of a ForeignKey, as above)?
Part of the plan in #362 (closed) involves bringing in Celery/Redis, so maybe that will help towards the effort. @jbpasquier do you have a good idea of the additional work that would be involved?
> I haven't seen the Redis per-field per-user cache before, but it sounds promising. @jbpasquier are you sure it will hit at all the correct times (e.g. when an update is required because of a ForeignKey, as above)?
No certainty; it may need some tweaking to reach a generic, efficient behaviour.
> Part of the plan in #362 (closed) involves bringing in Celery/Redis, so maybe that will help towards the effort. @jbpasquier do you have a good idea of the additional work that would be involved?
> I think that we can keep the current caching system if we invert the entry point: instead of user->view, we could cache requests by model->view->user. This would allow us to regenerate batches of cache at once with less incoherence.
@jbpasquier I like that your proposal here removes a lot of the complexity from the original suggestion. If I understand correctly:
on the save listener, we reset the cache for the model. E.g. when we save any Notification, the views like /notifications/ and all /users/../inbox/ will be reset
[my addition] we ditch the resource cache (we don't cache /users/x/), but we hit the container caches like /users/ and within a resource /users/x/inbox/, /users/x/circles
Is this what you had in mind? If so I can give an estimation for it
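To make sure we're talking about the same thing, here's a toy version of that hierarchy (names are illustrative only, not an implementation):

```python
cache = {}  # cache[model_name][view_path][username] -> serialized response


def get_cached_view(model_name, view_path, username, serialize):
    per_view = cache.setdefault(model_name, {}).setdefault(view_path, {})
    if username not in per_view:
        per_view[username] = serialize()
    return per_view[username]


def on_model_saved(model_name):
    # e.g. saving any Notification drops /notifications/ and every
    # /users/../inbox/ entry, for every user, in one operation
    cache.pop(model_name, None)
```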
It's hard to visualise so I'd need to implement it to confirm that it would work. In my head it works and the impact would be:
we'd be gaining performance in that we're no longer resetting the global LDListMixin cache every time we save anything
we'd be losing performance in that we're not caching the whole resource, sub-containers and all; the user gets serialized fresh each time, but not their containers (inbox, circles etc.)
I think on the whole we'd be gaining performance
we'd be fixing this bug and probably #357 (closed) as well
we have one cache instead of two
we shed complexity from my original suggestion
we lose some performance from my original suggestion, namely that when I save a Notification it doesn't only reset the cache for the user who has this notification, but all users who have it. However, we could write the cache clearing as a hook in the Model, which allows a model to override the cache reset and optimise it, gaining maximum performance without much more code
@balessan technically an LDP container can have more than one type of model in it. This isn't the case for us now but it may be in future? Similarly I think that all of these suggestions and the current system are still tightly coupled to the LDPSerializer and aren't batteries-included for #282 (closed) or #277
Have I missed anything?
Finally, I haven't used Redis as a cache store before. I think it's not required for this fix, though; I'd like to open another issue for that and treat it as a next step?
I'm not sure I understand this one: "it doesn't only reset the cache for the user who has this notification, but all users who have it"?
> cache clearing as a hook in the Model
I think that's a must-have; it doesn't look much more complicated but allows more flexibility.
> Finally, I haven't used Redis as a cache store before. I think it's not required for this fix, though; I'd like to open another issue for that and treat it as a next step?
> I'm not sure I understand this one: "it doesn't only reset the cache for the user who has this notification, but all users who have it"?
It just means that the basic Model.clear_cache implementation will do cache[model] = None, meaning that all of the nested caches on that model are cleared too
We'll override it in Notification.clear_cache so that it's a bit cleverer and doesn't do that
Maybe instead of using the view path as well as the model we would be better off using the hierarchy cache[model][field][user]. Otherwise the design remains the same I guess
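A rough sketch of the hook, assuming the cache[model][field][user] hierarchy above; the Notification override is illustrative only:

```python
from django.conf import settings
from django.db import models

cache = {}  # cache[model_name][field][username] -> serialized data


class CacheClearingMixin:
    @classmethod
    def clear_cache(cls, instance=None):
        # default: cache[model] = None, dropping every nested field/user entry
        cache.pop(cls.__name__, None)


class Notification(CacheClearingMixin, models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

    @classmethod
    def clear_cache(cls, instance=None):
        if instance is None:
            return super().clear_cache()
        # cleverer override: only drop the recipient's cached entries
        for field_entries in cache.get(cls.__name__, {}).values():
            field_entries.pop(instance.user.username, None)
```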
I'd estimate 10h for this, and I spent 2h finding the solution
4h base implementation + 1h30m testing this and #357 (closed) + 1h30m implementing clear_cache overrides in key models (Notification, Circle, Project) + 2h performance testing (see below) + 1h float time
12 hours equates to 108 tokens I think @alexbourlier. There were 18 tokens released already on #357 (closed) and I spent 9.
For the due date, I need to see with @balessan when it fits on the roadmap. I'd aim to devote a day to it and start & finish on the same day. I also had another question for @balessan about the design (#365 (comment 57339)).
> technically an LDP container can have more than one type of model in it. This isn't the case for us now but it may be in future? Similarly I think that all of these suggestions and the current system are still tightly coupled to the LDPSerializer and aren't batteries-included for #282 (closed) or #277
> technically an LDP container can have more than one type of model in it. This isn't the case for us now but it may be in future?
Yes it may. I would say it will.
Okay, tricky one. Need to resolve this before the design is finalised I would say
We can separate the cache from LDPSerializer
We can use container_urlid as a cache key
However I'm not sure at the moment how to change the design so that the instances/models don't need to know which containers they're in when resetting the cache, since obviously the hierarchy cache[model][container_urlid][user] doesn't make sense if the container could have a host of models inside it
An initial idea, but it would be O(N) when the cache is set and O(M) when reset (where N is the number of items in the container, M the number of containers with records on the model):
cache records are aware of which model classes are serialized in the list. The cache is made up of one store cache[container_urlid][user] and a second meta_cache[model] = [<container_urlid>, <container_urlid>, ... ]
This would also add complexity but should remove the need for Model subclasses to optimise their caches. I'd make it overridable anyway so that the cache could be tweaked or disabled
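A toy version of the two stores, just to make the O(N) set and O(M) reset concrete; names are illustrative:

```python
class ContainerCache:
    def __init__(self):
        self.cache = {}       # cache[container_urlid][user] -> serialized container
        self.meta_cache = {}  # meta_cache[model] -> {container_urlid, ...}

    def set(self, container_urlid, user, data, items):
        self.cache.setdefault(container_urlid, {})[user] = data
        # O(N): walk the container's items to record which models it holds
        for item in items:
            self.meta_cache.setdefault(type(item), set()).add(container_urlid)

    def get(self, container_urlid, user):
        return self.cache.get(container_urlid, {}).get(user)

    def invalidate_model(self, model):
        # O(M): reset every container known to hold records of this model
        for container_urlid in self.meta_cache.pop(model, set()):
            self.cache.pop(container_urlid, None)
```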