When creating a bunch of activities, the ActivityQueue takes somewhere between 10 and 60 seconds before actually sending each activity.
To reproduce:
Create a DjangoLDP server following the Hubl format
You may want to clog the Prosody part; you can emulate it with this node server using fastify+fastify-plugin (or with the minimal stand-in sketched after these steps). If you use it, you'll need to set JABBER_DEFAULT_HOST in the template to the JABBER_HOST value from the js file, and PROSODY_HTTP_URL to http://localhost:4848.
Create an RSA key: `./manage.py creatersakey`
Register an administrator: `./manage.py createsuperuser`
Now create a user, a circle, or save any resource related to DjangoLDP-Account, DjangoLDP-Circle or DjangoLDP-Project, then go to http://localhost:8000/admin/djangoldp/activity/. Notice how long it takes to send any activity, and how long before you get something on the node server.
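If you don't want to run the fastify server, a rough Python stand-in that simply logs when each activity arrives (so you can measure the lag) could look like this; the port matches the PROSODY_HTTP_URL above, everything else is just an assumption:

```python
# Minimal stand-in for the Prosody HTTP endpoint used in the repro above: it
# accepts any POST and logs a timestamp so you can see how long the
# ActivityQueue waited before delivering. Paths and port are only assumptions.
import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class ActivityLogger(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(f"{datetime.datetime.now().isoformat()} {self.path} {body[:200]!r}")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # 4848 matches the PROSODY_HTTP_URL from the steps above
    HTTPServer(("localhost", 4848), ActivityLogger).serve_forever()
```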
Just so it is written somewhere, this is the cause of one of the most pressing issues on Hubl at the moment. When someone adds a user to a circle, it sometimes takes minutes before being able to @mention them. Our users conclude that adding that person to the circle didn't work. I personally get this bug reported to me almost every day now.
Last week I discussed the ActivityQueueService and potential redesigns with JB:
Why the delay before sending the activity?
Referring to this delay, during which the processor switches back to the main thread
We decided to include it because previously a lot of redundant activities were being sent (especially with many-to-many fields); the delay before sending allows time for new activities to come along and invalidate the pending one
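For context, a simplified sketch of that idea, not the real ActivityQueueService code: each scheduled activity waits for a short delay, and a newer activity for the same target replaces the pending one so only the latest gets sent.

```python
# Simplified illustration of "delay so newer activities can invalidate older
# ones" -- not the actual ActivityQueueService implementation.
import threading

ACTIVITY_DELAY = 0.1   # seconds; purely illustrative
_pending = {}          # target URL -> latest scheduled activity
_lock = threading.Lock()

def schedule(target, activity, send):
    with _lock:
        _pending[target] = activity  # a newer activity replaces the pending one

    def flush():
        with _lock:
            if _pending.get(target) is not activity:
                return               # superseded; the newer activity's own timer will send it
            latest = _pending.pop(target)
        send(target, latest)         # only the most recent activity is delivered

    threading.Timer(ACTIVITY_DELAY, flush).start()
```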
Why are they sent in a single blocking thread?
In Python, having multiple threads is something of an illusion: because of the GIL they effectively execute one at a time. I can share my notes on the Python concurrency research I did when I decided to use multi-threading rather than multi-processing if you like, but from memory it was because the queue is thread-safe where multi-processing isn't (a minimal sketch of the pattern follows below)
Thread-safety
We decided against using Celery at the time because it adds a broker (like Redis) as an infrastructure dependency (post)
We didn't use asyncio because it requires coroutines to explicitly say when they will give up the processor
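For reference, the single-worker pattern described above boils down to roughly this (a sketch under my own assumptions, not the actual ActivityQueueService code):

```python
# Sketch of the current single-worker design: many request threads enqueue
# activities, one background thread sends them. queue.Queue handles the
# locking, which is why this design is thread-safe.
import queue
import threading
import requests

activity_queue = queue.Queue()

def worker():
    while True:
        target, activity = activity_queue.get()
        try:
            # this blocking HTTP call is what holds up every other activity
            requests.post(target, json=activity, timeout=10)
        except requests.RequestException:
            pass  # a real implementation would retry / log here
        finally:
            activity_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def schedule_activity(target, activity):
    activity_queue.put((target, activity))  # safe to call from any thread
```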
Questions about how it could be
Removing the sender delay
Ideally we'd remove this delay, but we would need to find an alternative solution to the problem it resolved. The activity is scheduled in a listener (e.g. post_save), where obviously we can't know what might be scheduled later, except for example that a Create activity will be followed by an Update in Django
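To make that concrete, an indicative sketch of where the scheduling happens (the receiver, the `urlid` check and the `schedule_activity` helper are illustrative, not the exact DjangoLDP code):

```python
# A post_save listener can't know whether another save (e.g. the Update that
# follows a Create) will arrive a moment later, which is what the sender
# delay was compensating for. Names below are illustrative only.
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save)
def schedule_backlink_activity(sender, instance, created, **kwargs):
    if not hasattr(instance, "urlid"):
        return  # only LDP resources carry a urlid
    activity_type = "Create" if created else "Update"
    schedule_activity(instance.urlid, {"type": activity_type})  # hypothetical helper
```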
Thread-safe storage of scheduled activities
We store activities in a scheduled state because it allows us to revive them if, for example, the server goes down before they were delivered. I discussed with JB using the file system to store the temporary activities, and I think this would be a better solution
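A rough sketch of that file-system idea (the spool path and layout are assumptions; nothing like this exists yet):

```python
# Assumed spool-directory layout for persisting scheduled activities so they
# can be revived after a crash; purely illustrative.
import json
import os
import uuid

SPOOL_DIR = "/var/spool/djangoldp-activities"   # assumed path

def persist(target, activity):
    os.makedirs(SPOOL_DIR, exist_ok=True)
    path = os.path.join(SPOOL_DIR, f"{uuid.uuid4()}.json")
    with open(path, "w") as f:
        json.dump({"target": target, "activity": activity}, f)
    return path   # delete this file once the activity is delivered

def revive(send):
    # called on startup: re-send anything that was never delivered
    if not os.path.isdir(SPOOL_DIR):
        return
    for name in os.listdir(SPOOL_DIR):
        path = os.path.join(SPOOL_DIR, name)
        with open(path) as f:
            entry = json.load(f)
        send(entry["target"], entry["activity"])
        os.remove(path)
```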
Sending activities concurrently
We could switch to a Celery/Redis-based solution and have it manage the concurrency for us (see the sketch after this list)
This avoids reinventing the wheel but adds infrastructure dependencies for all users of DjangoLDP
When we made the first version(s) of the backlinks system it was fairly simple, so we decided it was better to avoid the dependencies than the programming overhead. Is this still the case?
We could use a multi-processing solution
i.e. parallelism instead of concurrency: one or more parallel processes run the Activity Queue Worker(s) to send the activities
Using the file system for scheduled activities as suggested above should remove the thread-safety issue of using a separate process
Without having done a full estimation, I think refactoring to use Redis and using multiprocessing with the filesystem are similar scales of work
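For scale, the Celery variant mentioned above would look roughly like this (the task and its options are my assumptions, not an existing DjangoLDP API; Celery's broker would replace the in-process queue):

```python
# Rough shape of the Celery/Redis option; illustrative only.
from celery import shared_task
import requests

@shared_task(bind=True, max_retries=3, default_retry_delay=5)
def send_activity(self, target, activity):
    try:
        response = requests.post(target, json=activity, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        raise self.retry(exc=exc)   # Celery reschedules the delivery for us

# callers would replace the in-process queue with:
#   send_activity.delay(target, activity)
```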
Waiting for the Activity response shouldn't block me from processing other activities
This is a shortcoming of the Python requests library which we're using. We should use an asynchronous variant, like aiohttp
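For comparison, the non-blocking send this points at could look something like this with aiohttp (a sketch, not existing code):

```python
# Sketch of sending several activities without waiting on each response in
# turn, using aiohttp instead of the blocking requests call.
import asyncio
import aiohttp

async def send_all(deliveries):
    async with aiohttp.ClientSession() as session:
        async def send(target, activity):
            async with session.post(target, json=activity) as resp:
                return target, resp.status
        # all requests are in flight at once; one slow inbox no longer blocks the rest
        return await asyncio.gather(*(send(t, a) for t, a in deliveries))

# asyncio.run(send_all([("https://example.org/inbox", {"type": "Update"})]))
```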
Keep the current infrastructure. Add a Celery+Redis alternative, make each one possible to activate/de-activate, and enable one or the other based on the configuration.
About the sender delay, let's keep it as low as possible, like <100ms. It sounds like the current delay is applied per activity, meaning that 10 identical activities with DEFAULT_ACTIVITY_DELAY set to 3 may lead to 30 seconds of waiting before the activity is actually sent.
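To make the suggestion concrete, the kind of settings this implies (the Celery flag is hypothetical, not an existing DjangoLDP setting; DEFAULT_ACTIVITY_DELAY is the one discussed above):

```python
# settings.py sketch: keep the current queue but make the backend and delay
# configurable. USE_CELERY_ACTIVITY_QUEUE is a hypothetical name.
DEFAULT_ACTIVITY_DELAY = 0.1         # seconds, applied per activity
USE_CELERY_ACTIVITY_QUEUE = False    # switch to the Celery+Redis backend when True
```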
@calummackervoy @balessan It's not OK to take 15+ days to answer messages.
I can totally understand that you guys have plenty to do, but answering us here takes one minute.
If this can be prioritized and done faster, that's really helpful.
That's one of the most painful things in Hubl at the moment.
@alexbourlier sorry about that. I didn't know the answer to your question but I suppose I could have replied "I don't know"
If this can be prioritized and done faster, that's really helpful
I was wondering about the relationship to #332. A priori it can be done alongside it but there will need to be good communication between those devs. Is one or the other a clear priority for FNK?
Yes, I think not letting conversations die is easy, and it removes this sense of a black hole that we sometimes get
I'm not able to answer regarding the priority between the two issues you're pointing out. My understanding is that they are both needed to have notifications on the job board working. That's our need: notifications on the job board.
@jbpasquier may have an opinion about how to prioritize those two, or whether they can be parallelized or not
In the meantime I opened an MR for a shorter activity delay (!209 (merged)); we could deploy it next week when JB's back
I changed the delay from 3s to 100ms; in my testing on the pre-prod it didn't mean that redundant activities were sent, and evidently the queue is a lot faster. Sorry, I could've thought to do this sooner, or been less cautious with my original default!
No worries, I'm not sure we're able to nail the right approach on our first try all the time. Coming from an English culture probably makes it worse (I'm joking)
P.S. I think that #380 and #381 are not required by FNK since you plan to use a new celery-based solution
There were some discussions on another thread about whether this issue is a core-team concern or a client package. I think that #382 at least is no-doubt an extension to DjangoLDP, it's not a bug that we decided not to use celery
Do I have a green light to provide an estimation for #382 then?