Better threadsafe storage of ScheduledActivities

We store activities in a scheduled state so that we can revive them if, for example, the server goes down before they are delivered. I discussed with JB using the file system to store these temporary activities instead, and I think this would be a better solution.

Why change this?

It will allow us to run multiple Activity queue workers while remaining thread-safe (the workers only need read-only access to the files), which will make delivery faster (e.g. when using Celery, #382).

Currently an Activity is saved only once it has been delivered. Replacing the ScheduledActivity with a saved (undelivered) Activity lets us create it with the correct urlid, and thus send Activities with their id in the payload (#332).
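A minimal sketch of why creating the Activity first helps: once the record exists, its urlid is known and can be put into the payload before sending. The `Activity` dataclass is an in-memory stand-in for the model, and `SITE_URL`, the `/activities/<pk>/` URL pattern and `build_payload` are hypothetical names used only for illustration.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical base URL; the real urlid depends on the server's configuration
SITE_URL = "https://example.org"

_pk_counter = itertools.count(1)


@dataclass
class Activity:
    """In-memory stand-in for the Activity model, saved before sending."""
    payload: dict
    delivered: bool = False
    pk: int = field(default_factory=lambda: next(_pk_counter))

    @property
    def urlid(self) -> str:
        # Hypothetical URL pattern for a stored Activity
        return f"{SITE_URL}/activities/{self.pk}/"


def build_payload(activity: Activity) -> dict:
    # The Activity already exists, so its id can be sent along with the payload
    return {**activity.payload, "@id": activity.urlid}
```

With the ScheduledActivity approach there is no saved Activity yet at this point, so there is no urlid to put in `@id`.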

Proposal

An Activity should be created prior to sending (send_activity), with delivered=False
send_activity should save the activity body, with the activity serialization, to a new file in a special directory
The body should be rendered into JSON-LD for sending and saved to that directory
Remove the ScheduledActivity model

^ estimate: 6h
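A sketch of the first three items above, assuming a hypothetical `SCHEDULED_DIR` and one `<activity id>.jsonld` file per scheduled activity (both names are illustrative, not existing settings). The `Activity` dataclass stands in for the Django model and the HTTP request itself is left out. Writing to a temporary file and then calling `os.replace` means a worker listing the directory never sees a half-written file, which helps keep read-only access from several workers safe.

```python
import json
import os
import tempfile
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Tuple

# Hypothetical location for scheduled (not yet delivered) activities
SCHEDULED_DIR = Path(tempfile.gettempdir()) / "scheduled_activities"


@dataclass
class Activity:
    """Stand-in for the Activity model, reduced to what this sketch needs."""
    to: str
    payload: dict
    delivered: bool = False
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


def _atomic_write(path: Path, text: str) -> None:
    # Write to a temp file in the same directory, then rename it over the target.
    # os.replace is atomic within one filesystem, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def send_activity(to: str, payload: dict,
                  scheduled_dir: Path = SCHEDULED_DIR) -> Tuple[Activity, Path]:
    # 1. Create the Activity before sending, with delivered=False
    activity = Activity(to=to, payload=payload, delivered=False)
    # 2. Serialize the (JSON-LD) payload to a new file named after the Activity
    scheduled_dir.mkdir(parents=True, exist_ok=True)
    path = scheduled_dir / f"{activity.id}.jsonld"
    _atomic_write(path, json.dumps(payload))
    # 3. The queue worker sends the file's contents later (HTTP request not shown)
    return activity, path
```

Workers only need to glob `*.jsonld`, so in-progress `.tmp` files are never picked up.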

An Activity response should result in updating the existing Activity, not creating a new one
The ActivityQueueService should revive_activities from the file storage
Concluding the success or failure of an Activity delivery should delete its file from the filesystem

^ estimate: 6h
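A sketch of the second group of items, under the same hypothetical `<activity id>.jsonld` layout: `revive_activities` rebuilds the queue from whatever files are still in the directory (for example after a restart), and `conclude_delivery` updates the existing Activity rather than creating a new one, then deletes the file whether delivery succeeded or failed. The `ACTIVITIES` dict stands in for the Activity table, and treating any 2xx status as success is an assumption.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical directory: one <activity id>.jsonld file per scheduled activity
SCHEDULED_DIR = Path(tempfile.gettempdir()) / "scheduled_activities"


@dataclass
class Activity:
    """Stand-in for the Activity model (not the real Django model)."""
    id: str
    delivered: bool = False
    response_code: Optional[int] = None


# Stand-in for the Activity table, keyed by activity id
ACTIVITIES: Dict[str, Activity] = {}


def revive_activities(scheduled_dir: Path = SCHEDULED_DIR) -> Iterator[Tuple[Activity, dict]]:
    """Yield (activity, JSON-LD payload) for every file still waiting to be sent,
    e.g. after the server restarted before delivery."""
    for path in sorted(scheduled_dir.glob("*.jsonld")):
        activity = ACTIVITIES.get(path.stem)
        if activity is None:
            continue  # no matching Activity record; this sketch just skips the file
        yield activity, json.loads(path.read_text(encoding="utf-8"))


def conclude_delivery(activity: Activity, response_code: int,
                      scheduled_dir: Path = SCHEDULED_DIR) -> None:
    # Update the existing Activity instead of creating a new one
    activity.response_code = response_code
    activity.delivered = 200 <= response_code < 300  # assumption: any 2xx is success
    # Success or failure, the scheduled file is no longer needed
    (scheduled_dir / f"{activity.id}.jsonld").unlink(missing_ok=True)
```

Because the file is the only record of "still to send", a restart loses nothing that was not already concluded.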

The ActivityQueue worker will need to be adjusted to review the scheduled activity files to see whether the activity is new or outdated

This is where it gets a little tricky. Reading every file in the directory might be expensive when the activity queue is busy, and the fields used for this check (to, published) are not stored in the serialized activity but as metadata on the Activity model. We could either serialize them with the activity, or store this information in the scheduled activity file alongside it.

^ estimate: TODO
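A sketch of the second option above: `to` and `published` are stored in the scheduled activity file next to the JSON-LD payload, so the worker does not need the Activity model to decide whether an activity is outdated. The comparison rule (a later-published file for the same recipient and the same `object` makes an activity outdated) is an assumption standing in for the filtering discussed in #362, and the `object` key, `SCHEDULED_DIR` and the helper names are hypothetical. Note that `is_outdated` still reads every file in the directory, so the cost concern for a busy queue is not solved here.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical directory: one <activity id>.jsonld file per scheduled activity
SCHEDULED_DIR = Path(tempfile.gettempdir()) / "scheduled_activities"


def write_scheduled(activity_id: str, to: str, payload: dict,
                    published: Optional[datetime] = None,
                    scheduled_dir: Path = SCHEDULED_DIR) -> Path:
    # Keep the metadata the worker needs next to the payload (atomic write omitted)
    published = published or datetime.now(timezone.utc)
    record = {"to": to, "published": published.isoformat(), "payload": payload}
    scheduled_dir.mkdir(parents=True, exist_ok=True)
    path = scheduled_dir / f"{activity_id}.jsonld"
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def _read(path: Path) -> Optional[dict]:
    # Another worker may delete a file between listing and reading it
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


def _key(record: dict) -> tuple:
    # ASSUMPTION: two activities compete if they share a recipient and an object
    return record["to"], record["payload"].get("object")


def is_outdated(path: Path) -> bool:
    """True if another scheduled file has the same key and a later `published`."""
    mine = _read(path)
    if mine is None:
        return True  # already concluded by another worker; nothing left to send
    mine_published = datetime.fromisoformat(mine["published"])
    for other_path in path.parent.glob("*.jsonld"):
        if other_path == path:
            continue
        other = _read(other_path)
        if other is None:
            continue
        if (_key(other) == _key(mine)
                and datetime.fromisoformat(other["published"]) > mine_published):
            return True
    return False
```

Files deleted by another worker mid-scan are skipped rather than raising, which matters once several workers share the directory.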

Live testing: estimate 4h

Unit tests and float time are included in the estimates for the individual tasks.

Alternatives

We could possibly use the Activity model directly, without the filesystem, and share information about the other activities outside of the worker. This makes more sense with Celery (#382) than with the native ActivityQueueService, though, since the worker needs to access the other scheduled activities to check whether an activity is still valid and should be sent (see #362 (comment 52917) for why we do that). TL;DR: we send backlinks automatically based on links, which generates redundant activities that need to be filtered out to prevent excessive communication.
