They should be removed after a while by a routine. @balessan @jbpasquier do you know if there is any ActivityPub guidance on the lifetime of an activity?
Update 07/05/2021
I looked into this one again and thought I should elaborate. The Linked Data Notifications spec refers to storing notifications and returning them
unless the activity is transient, MUST include the new id in the Location header ... The server MUST then add this new Activity to the outbox collection
transient activities do not need to be stored indefinitely. The examples in the spec are (intentionally) loose, which implies that it's the choice of the application developer; the examples given are chat messages and game notifications
the conclusion was that "Clarify that it's up to servers if they want to keep around objects as long as they want. If they want to delete objects, like maybe delete a bunch of game notifications, that's a-ok"
Hubzilla and others are using expiry keys on activities
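To make the "removed after a while in a routine" idea concrete, here is a minimal sketch of a pruning pass. It assumes each stored activity carries a creation date and an optional expiry date; `StoredActivity`, `prune_activities`, `expires_at` and the 30-day default are all hypothetical names and values, not something taken from the spec or from our models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class StoredActivity:
    # Hypothetical stand-in for a stored activity record
    id: str
    created_at: datetime
    expires_at: Optional[datetime] = None  # optional per-activity expiry


def prune_activities(
    activities: List[StoredActivity],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed retention period
) -> List[StoredActivity]:
    """Return only the activities that should still be kept at `now`."""
    kept = []
    for activity in activities:
        # Drop anything whose explicit expiry date has passed
        if activity.expires_at is not None and activity.expires_at <= now:
            continue
        # Drop anything older than the retention period
        if now - activity.created_at > max_age:
            continue
        kept.append(activity)
    return kept
```

A periodic job could call this (or the equivalent database delete) once a day; the retention period would be the server's choice, which is what the spec discussion seems to allow.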
In our case, at the moment, I think that the only Activities which we're storing and which continue to be useful after they are sent are failed activities, used in debugging? @jbpasquier
We could make the default be to not store an activity at all, unless a server setting STORE_ACTIVITIES = 'all' is present or something? This could also be a setting on the Model meta but we might want to make this TODO until someone tells us they need it
The Linked Data Notifications spec doesn't seem to have this same flexibility (I've not researched it in as much detail). It might be a good time to clarify that we're following ActivityStreams/ActivityPub and not Linked Data Notifications, except where the two intersect?
In our case, at the moment, I think that the only Activities which we're storing and which continue to be useful after they are sent are failed activities, used in debugging?
We're storing everything. The Community database actually holds 103,495 activities. :-)
Only failed ones, targeted at Prosody, are useful. As for the others, well, usually I don't even know what their purpose is.
We could make the default be to not store an activity at all, unless a server setting STORE_ACTIVITIES = 'all' is present or something? This could also be a setting on the Model meta but we might want to make this TODO until someone tells us they need it
What about an approach more-or-less like the logger of Django? With a top layer array that would accept some filter(s) to keep activities?
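A rough sketch of what such a filter list could look like, loosely in the spirit of Django's logging filters. Everything here is an assumption for illustration: the activity is represented as a plain dict, `success` is a hypothetical field, and `keep_failed`, `keep_targeting` and `should_keep` are made-up helper names.

```python
from typing import Callable, Dict, List

Activity = Dict[str, object]
ActivityFilter = Callable[[Activity], bool]


def keep_failed(activity: Activity) -> bool:
    # Keep any activity that did not succeed ("success" is a hypothetical field)
    return not activity.get("success", False)


def keep_targeting(host_fragment: str) -> ActivityFilter:
    # Build a filter that keeps activities sent to a matching inbox URL
    def _filter(activity: Activity) -> bool:
        return host_fragment in str(activity.get("external_id", ""))
    return _filter


def should_keep(activity: Activity, filters: List[ActivityFilter]) -> bool:
    # An activity is stored if at least one configured filter accepts it;
    # an empty filter list therefore stores nothing
    return any(f(activity) for f in filters)
```

The top-level setting would then just be a list such as `[keep_failed, keep_targeting("prosody")]`, and anything no filter claims is never written to the database.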
Only failed ones, targeted at Prosody, are useful. As for the others, well, usually I don't even know what their purpose is.
Their purpose was to follow the spec really. Very occasionally successful ones are useful in debugging, usually because there's another which is failed
What about an approach more-or-less like the logger of Django? With a top layer array that would accept some filter(s) to keep activities?
To be honest I think that for now this might be a sledgehammer to crack a nut. I think that we could make do with this?
old activities are useful in debugging, @jbpasquier can we make the logger standard on the production servers? I could log their success to here instead, and when reproducing an activity we can set
Oh, right, Sentry is the actual place, I think, but not for debug things. I'm not sure that we want to keep every debug message in a file on production? Maybe we can add some settings like yours, but commented out, to allow easier debugging?
For debugging when we're reproducing something I think we could just change the setting to
STORE_ACTIVITIES = 'verbose'
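As a minimal sketch of how the setting could gate storage: `'all'` and `'verbose'` are the values floated in this thread, while the `'errors'` mode, the function name and the choice of default are assumptions made only for illustration.

```python
def should_store_activity(success: bool, store_activities: str = "errors") -> bool:
    """Decide whether a sent activity should be written to the database."""
    # 'all' / 'verbose': keep everything, e.g. while reproducing a bug
    if store_activities in ("all", "verbose"):
        return True
    # 'errors' (hypothetical mode): keep only failed activities for debugging
    if store_activities == "errors":
        return not success
    # Any other value: store nothing
    return False
```

In a Django project the second argument would typically come from something like `getattr(settings, 'STORE_ACTIVITIES', 'errors')`, so switching to verbose storage is a one-line settings change.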
with this I had in mind that a record of historical activities might be useful, and somewhere in the admin logs would make sense - as with incoming HTTP logs
in other cases running more verbose logs would be useful as well though
We already have them, from the Apache proxy log by Alwaysdata
Maybe some kind of naive logrotate would avoid filling the disk with useless logs while still keeping a few days' worth. @plup Could we? Drop debug logs in a predefined file on each server and ensure we keep only 7 days' worth of logs or something like that?
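If we wanted to do the rotation from Python rather than with logrotate itself, the standard library's `TimedRotatingFileHandler` can keep a fixed number of daily files. This is only a sketch: the logger name, the file name and the temporary directory are placeholders, and the 7-day figure is the one suggested above.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler


def build_debug_logger(log_path: str, days_to_keep: int = 7) -> logging.Logger:
    # Rotate at midnight and keep only `days_to_keep` old files
    handler = TimedRotatingFileHandler(
        log_path, when="midnight", interval=1, backupCount=days_to_keep
    )
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("activities.debug")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    logger.propagate = False  # keep debug noise out of the root logger
    return logger


# Placeholder location; on a server this would be the predefined log file
log_file = os.path.join(tempfile.mkdtemp(), "activities-debug.log")
logger = build_debug_logger(log_file)
logger.debug("activity sent: %s", "https://example.org/activities/1")
```

Older files beyond `backupCount` are deleted by the handler when it rotates, so disk usage stays bounded without a separate cleanup job.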
@balessan @jbpasquier I mentioned in the stand-up that this is likely a big reason why the performance of activities is worse in production than in testing
The Community database actually holds 103,495 activities. :-)
We do some database lookups to check that an activity is definitely valid before sending it, e.g.
    def get_most_recent_sent_activity(source_obj, source_target_origin):
        # get a list of activities with the right type
        activities = ActivityModel.objects.filter(external_id=url, is_finished=True, type__in=['add', 'remove']).order_by('-created_at')[:10]
        # we are searching for the most recent Add/Remove activity which shares inbox, object and target/origin
        for a in activities.all():
            astream = a.to_activitystream()
            obj = astream.get('object', None)
            target_origin = astream.get('target', astream.get('origin', None))
            if obj is None or target_origin is None:
                continue
            if source_obj == obj and source_target_origin == target_origin:
                return a
        return None
as you can see we limit the comparison to 10 but this is still a SELECT query ORDER BY on 103,500 resources
I realised that this wasn't in the scope of #362 (closed). If there's funding for it we could try it first and see how it affects reactivity. Obviously in any case we get the guaranteed bonus of freeing storage on the production databases
Yeah nice 👍 I had made a note of it a while back in another issue (#285)
With #332 I've started moving the ActivityPub stuff into a different repository though, so DjangoLDP-side I think extending the models with these kinds of optimisations would be wise
I'm moving everything I can into the new repository, designing it so that DjangoLDP can inject its behaviour and with something like the (suggested) Rest Framework LDP refactor in mind, whilst minimising scope creep to avoid spending the budget 😅 So far so good, I haven't run into any major issues. I should be able to push some code & documentation soon :tm:
I've done a little code duplication for things like the urlid field and allowing rdf_type on the Model Meta. One day I think these should belong to a Rest Framework LDP library which both DjangoLDP and Django-ActivityPub can extend
The Linked Data Notifications spec doesn't seem to have this same flexibility (I've not researched it in as much detail). It might be a good time to clarify that we're following ActivityStreams/ActivityPub and not Linked Data Notifications, except where the two intersect?
LDN is only a protocol that describes the way senders (applications) can send messages to receivers (servers) and how consumers (applications) can retrieve them.
It does not describe the life-cycle of the content of the notification, only that of the notification itself. We can notify whatever we want whenever we feel it's necessary, for whichever reason.
as you can see we limit the comparison to 10 but this is still a SELECT query ORDER BY on 103,500 resources
It would also save a lot of database space. To recap, the main use of storing successful activities is to debug them in the context of unsuccessful ones or of an issue with synchronisation (e.g. when I was debugging an issue in circle creation here djangoldp-circle#2 (closed)), but providing the setting STORE_ACTIVITIES = 'verbose' would cover this well enough in my opinion
the main use of storing successful activities is to debug them in the context of unsuccessful ones
This isn't true 🤦 the main use of storing successful activities is to check past activities before sending a new one. For example we used a check_update_is_new function to prevent an Update activity being fired on every save: it first verifies that the object being sent has changed since the last successful update
This is still a bottleneck, though. I have some ideas for how we might refactor away the database access in this, then we can continue with the same plan, but it would be good to discuss them next week
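To give next week's discussion something concrete, here is one possible shape for an "is this update new?" check that compares a fingerprint of the payload instead of loading past activities. It is an illustrative assumption, not the existing `check_update_is_new` implementation and not necessarily one of the ideas mentioned above; `fingerprint`, the signature and where the last fingerprint would be kept are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def fingerprint(obj: dict) -> str:
    # Canonical JSON (sorted keys) so that key order does not change the hash
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_update_is_new(obj: dict, last_sent_fingerprint: Optional[str]) -> bool:
    """Return True if `obj` differs from what was last sent successfully."""
    # Nothing was sent before, so the update is new by definition
    if last_sent_fingerprint is None:
        return True
    return fingerprint(obj) != last_sent_fingerprint
```

Storing one short hash per object and inbox would be far smaller than keeping every successful activity, though whether that fits the current models is an open question.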