Better threadsafe storage of ScheduledActivities
We store activities in a scheduled state so that we can revive them if, for example, the server goes down before they are delivered. I discussed with JB using the filesystem to store the temporary activities, and I think this would be a better solution.
Why change this?
It will allow us to run multiple Activity queue workers while remaining thread-safe (the workers only need read-only access to the files), which will make delivery faster (e.g. when using Celery, #382).
Currently an `Activity` is saved only when it is delivered. Replacing the `ScheduledActivity` with a save of the (undelivered) `Activity` allows us to create it with the correct `urlid`, and thus to send Activities with their `id` in the payload (#332).
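A minimal sketch of the intended order of operations: save first, then serialize with the id. It assumes a plain dataclass standing in for the Django `Activity` model, a hypothetical `BASE_URL` for building the `urlid`, and an `"id"` key in the payload; the real model fields, URL scheme and JSON-LD key may differ.

```python
import json
import uuid
from dataclasses import dataclass, field

# Hypothetical base URL for locally stored activities (assumption for illustration).
BASE_URL = "https://example.org/activities/"


@dataclass
class Activity:
    """Stand-in for the Django Activity model (hypothetical fields)."""
    payload: dict
    delivered: bool = False
    urlid: str = field(default="")

    def save(self) -> None:
        # A real model would write to the database here; this stand-in only
        # assigns a urlid on the first save, as a database insert would.
        if not self.urlid:
            self.urlid = BASE_URL + uuid.uuid4().hex


def prepare_activity(payload: dict) -> tuple[Activity, str]:
    """Save the undelivered Activity first, then serialize it with its id."""
    activity = Activity(payload=payload)  # delivered=False by default
    activity.save()  # the urlid exists before anything is sent
    body = dict(activity.payload, id=activity.urlid)  # "id" key is an assumption
    return activity, json.dumps(body)
```

With this ordering, a response to the Activity can later update the same saved record instead of creating a new one.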
Proposal
- An `Activity` should be created prior to sending (`send_activity`), with `delivered=False`
- `send_activity` should render the activity into JSON-LD for sending and save that serialization to a new file in a dedicated filesystem directory
- Remove the `ScheduledActivity` model
^ estimate: 6h
- The response to an Activity should update the existing Activity rather than create a new one
- The `ActivityQueueService` should `revive_activities` from the file storage; once delivery of an Activity has concluded (success or failure), its file should be deleted from the filesystem
^ estimate: 6h
- The ActivityQueue worker will need to be adjusted to review the scheduled activity files to see whether the activity is new or outdated
Here is where it gets a little tricky: reading every file in the directory might be expensive when the activity queue is busy, and the fields used for this check (`to`, `published`) are not stored in the serialized activity but as metadata on the `Activity` model. We could serialize them into the activity, or store this information in the scheduled activity file alongside it.
^ estimate: TODO
- Live testing: estimate 4h
Unit tests and float time are included in the estimates for the individual tasks.
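The file-based storage could look roughly like the sketch below. It assumes one JSON file per scheduled activity that also carries the `to` and `published` metadata (storing them in the scheduled activity file, one of the options for the worker check). `schedule_activity`, `load_scheduled_activities` and `conclude_delivery` are hypothetical standalone helpers, not existing djangoldp functions or the real `revive_activities` method.

```python
import json
import os
import tempfile
from pathlib import Path


def schedule_activity(directory: Path, name: str, body: dict, to: str, published: str) -> Path:
    """Write the serialized activity plus its scheduling metadata to one file.

    The data is written to a temporary file in the same directory and then moved
    into place with os.replace, so a worker scanning for *.json files never reads
    a half-written file.
    """
    directory.mkdir(parents=True, exist_ok=True)
    record = {"activity": body, "to": to, "published": published}
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(record, tmp)
    final_path = directory / f"{name}.json"
    os.replace(tmp_path, final_path)  # atomic on POSIX within one filesystem
    return final_path


def load_scheduled_activities(directory: Path) -> list[tuple[Path, dict]]:
    """Read every scheduled activity file (read-only), e.g. on server restart."""
    revived = []
    for path in sorted(directory.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            revived.append((path, json.load(f)))
    return revived


def conclude_delivery(path: Path) -> None:
    """Delete the file once delivery has succeeded or definitively failed."""
    path.unlink(missing_ok=True)  # tolerate a file already removed
```

Readers only ever open complete files and never modify them, which is what keeps concurrent scanning safe; deciding which worker actually sends a given activity is a separate concern this sketch does not address.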
Alternatives
We could possibly use the `Activity` model directly, without the filesystem, and share the information about other activities outside of the worker. This makes more sense with Celery (#382) than with the native `ActivityQueueService`, though, since the worker needs to access the other scheduled activities to check whether the activity is still valid and should be sent (see #362 (comment 52917) for why we do that).
The TL;DR is that we send backlinks automatically based on links, and this generates redundant activities that need to be filtered out to prevent excessive communication.
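A minimal sketch of the validity check the worker performs, whichever storage is chosen. The actual rule is the one discussed in #362 (comment 52917); here it is *assumed* that an activity is outdated when another scheduled activity to the same recipient (`to`) about the same object was published later. `Scheduled`, `object_id` and `is_outdated` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Scheduled:
    """Hypothetical scheduling metadata read from a scheduled activity."""
    name: str
    to: str            # recipient inbox
    object_id: str     # resource the activity is about (assumed grouping key)
    published: datetime


def is_outdated(candidate: Scheduled, scheduled: list[Scheduled]) -> bool:
    """True if a later activity to the same recipient about the same object
    is also scheduled, so sending the candidate would be redundant."""
    return any(
        other.name != candidate.name
        and other.to == candidate.to
        and other.object_id == candidate.object_id
        and other.published > candidate.published
        for other in scheduled
    )
```

Each check scans every other scheduled activity, which is why reading all files on a busy queue may be expensive, and why it matters where `to` and `published` are stored.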