Activities and notifications would both benefit from a queue service, allowing:
resending of undelivered requests, up to a configurable number of attempts
ignoring duplicate notifications
I propose that we develop a system to provide this within DjangoLDP
There are many libraries which provide this functionality. The ones that stand out for me are Celery, DjangoQ and Huey. I've used Celery before and found it comprehensive, if a bit of a pain to deploy with Elastic Beanstalk (AWS)
The objectives of the Queue are to:
attempt to resend the activity 3 times (by default) if it fails. If every attempt fails, save the Activity with success = False
if there are multiple activities scheduled on the same (resource, inbox) combination, send only the most recent, for example when Create/Update is fired multiple times
if the server is restarted, recover the scheduled activities on the queue
thread-safe queue usage
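A minimal sketch of the first two objectives plus thread safety, using only the standard library. The class name `CoalescingQueue`, the `send` callable and the `failed` list are hypothetical and are not part of DjangoLDP; a real implementation would persist activities rather than hold them in memory.

```python
import threading
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (resource URL, inbox URL)


class CoalescingQueue:
    """Hypothetical in-memory queue keeping one pending activity per (resource, inbox)."""

    def __init__(self, send: Callable[[str, dict], bool], max_attempts: int = 3):
        self._send = send                  # returns True when delivery succeeded
        self._max_attempts = max_attempts
        self._pending: Dict[Key, dict] = {}
        self._lock = threading.Lock()
        self.failed: List[dict] = []       # activities that exhausted their attempts

    def schedule(self, resource: str, inbox: str, activity: dict) -> None:
        # A newer activity for the same key overwrites the older one.
        with self._lock:
            self._pending[(resource, inbox)] = activity

    def flush(self) -> None:
        # Swap the pending dict out under the lock, then send without holding it.
        with self._lock:
            batch, self._pending = self._pending, {}
        for (_resource, inbox), activity in batch.items():
            for _ in range(self._max_attempts):
                if self._send(inbox, activity):
                    break
            else:
                # Every attempt failed: record the activity as unsuccessful.
                self.failed.append({**activity, "success": False})
```

Scheduling a Create and then an Update for the same (resource, inbox) before `flush()` results in only the Update being sent.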
Possibly also:
validate Update activity objects have new info before sending the notification (#234 (closed) )
better admin display for activities
after an expiry date, old Activity objects are deleted, with a cron job? Seems to be omitted from the ActivityStreams spec
view to get the status of the ActivityQueue (admin)
if unable to send an activity inform a sentry server: #281 (closed)
I compared the libraries, and Celery looks like the best option (most actively supported, most popular, compatible). Celery supports the most popular brokers (e.g. Redis, Amazon SQS, RabbitMQ), which can be configured by the user in settings.py. However, it requires a broker, so server admins using DjangoLDP apps will need to deploy one, such as Redis, to use the feature
I think that I could do the DjangoLDP side of things in 2 days, but each client who wished to use it would have the deployment/infrastructure overhead of the broker
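For illustration, the broker configuration in settings.py could look like the fragment below. This assumes the Celery app is set up with `app.config_from_object("django.conf:settings", namespace="CELERY")`, as in Celery's Django integration guide; the Redis URL and the environment variable name are placeholders.

```python
# settings.py (fragment)
import os

# With namespace="CELERY", Celery reads its broker URL from this setting.
# The default below points at a local Redis instance and is only a placeholder.
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
```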
Without Brokers
We can avoid the need for brokers by using Advanced Python Scheduler. It doesn't introduce any hard dependencies and doesn't require infrastructure changes. For this option with testing I'd estimate 1.5 days
Conclusion
My preference would be to implement Advanced Python Scheduler in DjangoLDP, to avoid unnecessary infrastructure overhead for our clients
That said I think that it should be written to be modular, so that a client could implement a celery-based solution if they wish (e.g. for the scaling benefits of using workers)
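One way the modular design could be sketched is a small backend interface that the sending code depends on, with the scheduler-based implementation as the default. Everything here (`ActivityQueueBackend`, `InProcessBackend`, `load_backend`) is a hypothetical name for illustration; a Celery-backed class would implement the same `enqueue` method by handing the task to a worker.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple


class ActivityQueueBackend(ABC):
    """Hypothetical interface; DjangoLDP does not define this class."""

    @abstractmethod
    def enqueue(self, task: Callable[..., None], *args) -> None:
        """Register a task to be run later with the given arguments."""


class InProcessBackend(ActivityQueueBackend):
    """Runs tasks in the current process, with no broker involved."""

    def __init__(self) -> None:
        self._tasks: List[Tuple[Callable[..., None], tuple]] = []

    def enqueue(self, task: Callable[..., None], *args) -> None:
        self._tasks.append((task, args))

    def run_pending(self) -> int:
        # Run tasks in FIFO order and report how many were executed.
        count = 0
        while self._tasks:
            task, args = self._tasks.pop(0)
            task(*args)
            count += 1
        return count


def load_backend(name: str) -> ActivityQueueBackend:
    # A real version might resolve a dotted path from settings instead.
    backends = {"in_process": InProcessBackend}
    return backends[name]()
```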
For the record: the absence of this is causing some circles on Hubl to have incomplete lists of members when a user tries to @mention another one.
Saving the circle again usually solves the problem but most users can't do so because they lack the appropriate permissions or knowledge about this workaround, and won't report the problem leading to miserable UX, dissatisfaction, and eventually death threats on @balessan's mother.
Now I know we all want to avoid such a strained situation, therefore I mention it here. Any idea when this will be addressed?
Starting this today :) @alexbourlier could you explain to me please, why does the lack of ActivityQueue cause the incomplete list of members? It's because a backlink activity was lost and not resent?
Because... when the list of members of a circle is updated, an update HTTP request is sent to Prosody, and Prosody powers the list of usernames that pops up in the chat when someone hits @. So if something fails, and is not resent, then Prosody is not updated correctly and when you hit @ to mention someone you just added to a circle, she might not be in the list.
Saving the circle again might solve the problem, but you might not be able to do that if you are not an admin or the owner of the circle.
Unrelated, but I'm pretty worried that the "top priorities" are shipped in a month. @balessan @calummackervoy Shouldn't we ask for JB Lemée to intervene? He explicitly said he was looking for some work, and it feels to me we are too slow but... I'm far from the topic and you know better. I'm just suggesting.
Just so you guys know, Christophe Henry also said he was available if we had anything for him to do.
Hi Alex, the due date was updated last week to 23rd September after I discussed it with @balessan. I like to quote dates to clients which I'm certain I will meet :) next time I will give you two - it could be done by next Wednesday, it will be done by the 23rd September, if God wills it
I'd prefer to work on this issue personally because I've started it and I'll enjoy it, but if you want me to work on something else let me know. I really liked working with Christophe during the time we spent together at the start of the project, and I haven't worked with JB Lemée yet but it's clear from his code that he knows his stuff.
why does the lack of ActivityQueue cause the incomplete list of members? It's because a backlink activity was lost and not resent?
It's because we need to upgrade the way djangoldp-notifications works to use this queue too. We can upgrade it to Activities, as it would make sense too. Need some sync with @MattJ though.
Or we can keep two different queues, but feels redundant.
Certainly LDPNotifications should be changed to use the ActivityPubService during sending. This is a very minor change but removes some code duplication
In fact, we want something more sophisticated than regular timeouts/retries, right? Activities which fail to send are a difference in state with the receiving server
I would propose that if a timeout is hit on a request, then we should reschedule the activity for an hour later?
If it fails three times, then the Activity is stored as unsuccessful
Then it's the responsibility of the receiver, when it reboots, to request from its peers activities which have failed? I'm in the only-works-on-DjangoLDP-sea and I can't see the shore anymore
EDIT: there was a suggestion to send these to a Sentry server. The endpoint to notify could be configured in the settings. If it's unable to let the Sentry know then I don't know what we can do
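The reschedule-then-give-up proposal above (one hour between attempts, three attempts in total) can be sketched as a pure function. The names `after_timeout`, `RETRY_DELAY` and `MAX_ATTEMPTS` are hypothetical, and the values are just the ones floated in this thread.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

RETRY_DELAY = timedelta(hours=1)  # delay proposed in this thread
MAX_ATTEMPTS = 3                  # attempts before the Activity is stored as unsuccessful


def after_timeout(attempts_made: int, now: datetime) -> Tuple[str, Optional[datetime]]:
    """Decide what to do after an attempt timed out.

    Returns ("retry", when_to_run_next) or ("failed", None).
    """
    if attempts_made >= MAX_ATTEMPTS:
        return ("failed", None)
    return ("retry", now + RETRY_DELAY)
```

On "failed", the caller would save the Activity as unsuccessful and, if configured, report it to the Sentry endpoint.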
attempt to resend the activity several (3?) times if it fails
if there are multiple activities scheduled on the same (resource, inbox) combination, send only the most recent (even if the types are different - e.g. Create, Update, Delete, Add, Remove ... or is there an order of precedence?)
extend Activity to store success status
possibly: validate Update activity objects have new info before sending the notification (#234 (closed) )?
possibly: change the way Activity is stored to be more specific than payload (BinaryField)?
possibly: after an expiry date, old Activity objects are deleted, with a cron job? I can't find in the ActivityPub spec when it's acceptable to delete old activities
if the server is restarted, recover the scheduled activities on the queue
if there are multiple activities scheduled on the same (resource, inbox) combination, send only the most recent (even if the types are different - e.g. Create, Update, Delete, Add, Remove ... or is there an order of precedence?)
attempt to resend the activity several (3?) times if it fails
And
possibly: validate Update activity objects have new info before sending the notification (#234 (closed) )?
This is to handle the fact that CREATE activities are triggering UPDATE activities too. Clearer?
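A possible way to drop an Update that carries no new information is to fingerprint the object last sent to each inbox and compare. This is only a sketch: `UpdateFilter` and `fingerprint` are hypothetical names, and it assumes each activity is a dict with a `type` and an `object` that has an `@id`.

```python
import hashlib
import json
from typing import Dict, Tuple


def fingerprint(obj: dict) -> str:
    # Canonical JSON (sorted keys) so key order does not change the hash.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UpdateFilter:
    """Remembers the last object sent per (inbox, object id)."""

    def __init__(self) -> None:
        self._last_sent: Dict[Tuple[str, str], str] = {}

    def should_send(self, inbox: str, activity: dict) -> bool:
        obj = activity["object"]
        key = (inbox, obj["@id"])
        digest = fingerprint(obj)
        # Skip an Update whose object is identical to what this inbox last received.
        if activity["type"] == "Update" and self._last_sent.get(key) == digest:
            return False
        self._last_sent[key] = digest
        return True
```

With this filter, a Create immediately followed by an Update on the unchanged object sends only the Create.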
possibly: after an expiry date, old Activity objects are deleted, with a cron job? I can't find in the ActivityPub spec when it's acceptable to delete old activities
Looks like there are no established good practices on that. We have to decide, I guess.
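Whatever retention period is chosen, the expiry check itself is simple. The sketch below uses a stand-in dataclass rather than the real Activity model; `StoredActivity`, `created_at` and `split_expired` are hypothetical names, and the 90-day period in the usage lines is an arbitrary example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class StoredActivity:
    """Stand-in for a stored Activity row (field names are assumptions)."""
    id: int
    created_at: datetime


def split_expired(
    activities: List[StoredActivity], now: datetime, max_age: timedelta
) -> Tuple[List[StoredActivity], List[StoredActivity]]:
    """Return (kept, expired); an activity expires once it is older than max_age."""
    cutoff = now - max_age
    kept = [a for a in activities if a.created_at >= cutoff]
    expired = [a for a in activities if a.created_at < cutoff]
    return kept, expired


now = datetime(2020, 9, 23)
rows = [StoredActivity(1, datetime(2020, 1, 1)), StoredActivity(2, datetime(2020, 9, 1))]
kept, expired = split_expired(rows, now, timedelta(days=90))
```

A cron job or scheduled task would run this periodically and delete the expired rows.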
attempt to resend the activity several (3?) times if it fails
if there are multiple activities scheduled on the same (resource, inbox) combination, send only the most recent (even if the types are different - e.g. Create, Update, Delete, Add, Remove ... or is there an order of precedence?)
If you miss my Create activity, even after 3 retries, but you receive my Delete one, what will happen?
attempt to resend the activity several (3?) times if it fails
Maybe don't retry on HTTP code 4xx.
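The 4xx suggestion could be expressed as a small decision function. This is a sketch under assumptions: `should_retry` is a hypothetical helper, and `status_code` is `None` when no response was received (timeout or connection error). Some 4xx codes, such as 408 or 429, are arguably worth retrying; they are treated like any other 4xx here for simplicity.

```python
from typing import Optional


def should_retry(status_code: Optional[int], attempts_made: int, max_attempts: int = 3) -> bool:
    """Decide whether a failed delivery should be attempted again."""
    if attempts_made >= max_attempts:
        return False
    if status_code is None:
        # No response at all: the receiver may simply be down, so try again.
        return True
    if 400 <= status_code < 500:
        # Client errors will not fix themselves when the same request is resent.
        return False
    return status_code >= 500
```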
extend Activity to store success status
+Something that can report the failure (after 3 retries) to a Sentry.
@balessan How should we handle the Sentry server on server side? Some global variable on the settings.py/packages.yml?
possibly: after an expiry date, old Activity objects are deleted, with a cron job? I can't find in the ActivityPub spec when it's acceptable to delete old activities
Can't find any mention of something asking to keep them. Where does it ask to save them?
+A way to plug the djangoldp-notification queue on this one.
If you miss my Create activity, even after 3 retries, but you receive my Delete one, what will happen?
I started a thread on this above. Added the sentry suggestion into my comment. Should we open an issue for this? I guess the Sentry isn't just useful for this issue
Where does it ask to save them?
It's mostly for supporting third-party consumers (see LDN spec). It's also used to implement activities such as Undo
+A way to plug the djangoldp-notification queue on this one
I started a thread on this above. Added the sentry suggestion into my comment. Should we open an issue for this? I guess the Sentry isn't just useful for this issue
Yes, I think so
I do not know but the semapps team was suggesting a SIB component to display activities so I guess it's common to save them :-)
Would be really nice to have a component solid-queue for displaying the status of the ActivityQueue.