Integrating EDC and Django LDP

This issue is a recap of the discussions around the 6th of March 2025 about the next steps of the project.

Where are we?

We currently have a very crude integration in the djangoldp-indexing package. This integration depends on the feature/dataspace-profile branch of the djangoldp-account package.

What we do at the moment

When accessing a local index (a pre-generated static file) on a djangoldp instance (to be considered the data plane of a provider in the dataspace):
We call the control plane of the consumer in the MVD to get the accessible catalog.
- The URL of the EDC connector is in a setting of the djangoldp instance
- This call is secured via an API key that we get from the dataspace profile of the logged-in djangoldp user (hence the dependence on a branch of djangoldp-account)
We look in the catalog for an asset with an idx:IndexEntry: <URL of the local index being accessed>
- If found, we serve the static file
- If not found, we return a 403 Forbidden
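
The catalog lookup described above could be sketched roughly as follows. This is only an illustration, not the actual djangoldp-indexing code: the function names, the `dcat:dataset` key used to list assets, and the assumption that `idx:IndexEntry` is a flat property on each asset are all hypothetical.

```python
from typing import Any


def index_is_in_catalog(catalog: dict[str, Any], index_url: str) -> bool:
    """Return True if any asset in the catalog references index_url."""
    # Assumption: assets are listed under "dcat:dataset".
    datasets = catalog.get("dcat:dataset", [])
    # JSON-LD may compact a single-element list into a bare object.
    if isinstance(datasets, dict):
        datasets = [datasets]
    # Assumption: the index URL is stored as a flat "idx:IndexEntry" property.
    return any(ds.get("idx:IndexEntry") == index_url for ds in datasets)


def status_for_index(catalog: dict[str, Any], index_url: str) -> int:
    """200 when the index is listed (serve the static file), 403 otherwise."""
    return 200 if index_is_in_catalog(catalog, index_url) else 403
```

Note that the whole catalog has to be fetched and scanned for every index access.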

When following the links from the index, we end up getting URLs to actual resources stored on that djangoldp instance (which is fine, it's the goal of the indexing process).

Right now, we can follow these links and access the indexed resources, but not for the right reason. Access is granted because we are logged in as a Django user on this instance (which is necessary to read the dataspace profile and then fetch the catalog), and that login also gives access to all resources on the server.

Where are we going?

There are several issues we need to fix in this process:

The local index sits on the provider's side and therefore shouldn't rely on a call to a consumer's control plane
Fetching the whole catalog and checking whether it contains a reference to the index being accessed is inefficient and is likely to become a performance bottleneck in the short term
We need access to the indexed resources to be protected by the provider's EDC control plane instead of the internal permission system of djangoldp: we want to give access to a resource only if there is an active contract between the consumer accessing it and the provider providing it.

What we agreed upon on 06/03/2025

@robert will investigate the possibility to directly call the provider's control plane to
- get a yes/no response to the accessibility of a given index
- know if there is an active contract on a given asset
- This means reverse-engineering the catalog request and, in the end, developing an extension of the connector
@sylvain.roquebert will implement (in coordination with @robert) a new permission check logic in djangoldp that relies on the provider's control plane.
- If there isn't an active contract, one will be automatically negotiated
- If the automatic negotiation fails, a 403 with an explicit message will be raised
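
A minimal sketch of the permission logic agreed above, under stated assumptions: `ensure_contract`, `AccessDenied` and the two injected callables are hypothetical names. The callables stand in for calls to the provider's control plane, whose exact API is still to be investigated.

```python
from typing import Callable, Optional


class AccessDenied(Exception):
    """Would be mapped to an HTTP 403 response by the view layer."""


def ensure_contract(
    asset_id: str,
    consumer_id: str,
    find_active_contract: Callable[[str, str], Optional[str]],
    negotiate_contract: Callable[[str, str], Optional[str]],
) -> str:
    """Return the id of an active contract, negotiating one if needed."""
    contract_id = find_active_contract(asset_id, consumer_id)
    if contract_id is not None:
        return contract_id
    # No active contract: try an automatic negotiation.
    contract_id = negotiate_contract(asset_id, consumer_id)
    if contract_id is None:
        raise AccessDenied(
            f"No active contract for asset {asset_id!r} "
            "and automatic negotiation failed"
        )
    return contract_id
```

Injecting the two callables keeps the djangoldp permission class independent of how the control plane is queried, so it can be tested without a running connector.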

What we agreed upon on 10/03/2025

Current EDC workflow for contract negotiation & data transfer:

Search & Discovery

Get a catalog
Find the assetId you want and its policyId

Contract Negotiation

Initiate a contract negotiation (presenting the VCs needed for the policyId; an extension in the MVD currently automates this)
Contract negotiation is asynchronous, so you get a receipt Id
Query .../contractnegotiations/request to see the status of your negotiation, e.g. FINALISED

Data Transfer

Once it's FINALISED, you take the contractAgreementId and initiate a transfer to .../transferprocesses (this is also asynchronous)
Periodically query .../transferprocesses/request to see the status of your transfer request.
Once the transfer request is in the STARTED state, you can use the transferProcessesId to get yet another auth token, needed for the actual transfer, via .../edrs/request (EndpointDataReference).
Taking the auth token from the EDR, we can now query the public API of the provider, passing the token and retrieving the data. (NOTE: In this scenario, the public API acts as a proxy for the resources, which is different from our linked-data approach.)
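
Both asynchronous steps in this workflow boil down to polling a status until it reaches a target state. A minimal sketch of such a loop, assuming a hypothetical `poll_until` helper and a caller-supplied `fetch_state` callable that wraps the actual HTTP query (the request and response formats are deliberately left out):

```python
import time
from typing import Callable, Collection


def poll_until(
    fetch_state: Callable[[], str],
    target: str,
    failure_states: Collection[str] = (),
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll fetch_state() until it returns target.

    Returns False if a failure state is seen or attempts run out.
    """
    for _ in range(attempts):
        state = fetch_state()
        if state == target:
            return True
        if state in failure_states:
            return False
        time.sleep(delay)
    return False
```

In the flow above, this would be called once with the negotiation status and the target FINALISED, then once with the transfer status and the target STARTED.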

The proposal

For requesting access to a resource, our scenario will deviate somewhat from the EDC's MVD: we simplify things by cutting the whole Data Transfer step out.

Instead, we will ask: "Do you have a contract negotiated for this asset?"
- If yes: here you go
- If no: please go and negotiate one

For the "contract check", I propose that the consumer pass their contractId to the data plane (djangoldp) when they want to 'access' a given resource.

Before the data server decides whether or not to return the resource, it would query its own management API (../contractnegotiations/request/{contractId}/agreement, with the contractId provided by the consumer) to check whether it has a contractAgreement with this id. The response contains:

assetId
consumerId
providerId
policies
....

(see screenshot for detailed example of the response)

The check then has 2 scenarios:

A. The contractId exists, so we check which assetId they want, who the consumer is, etc. and decide whether it's valid or not.
B. The contractId doesn't exist, return "No contract, please negotiate one if you want"
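
The two scenarios could be sketched as below. The field names `assetId` and `consumerId` come from the response listed above. The function name `check_contract`, the exact validity rules and the error messages are assumptions for illustration, and how the requester's identity is established is out of scope here.

```python
from typing import Any, Optional


def check_contract(
    agreement: Optional[dict[str, Any]],
    requested_asset_id: str,
    requester_id: str,
) -> tuple[bool, str]:
    """Decide whether the agreement returned by the management API grants access.

    agreement is None when no agreement exists for the supplied contractId.
    """
    # Scenario B: the contractId does not exist.
    if agreement is None:
        return False, "No contract, please negotiate one if you want"
    # Scenario A: the contract exists, so check what it actually covers.
    if agreement.get("assetId") != requested_asset_id:
        return False, "Contract does not cover the requested asset"
    if agreement.get("consumerId") != requester_id:
        return False, "Contract belongs to a different consumer"
    return True, "OK"
```

A djangoldp permission class would turn a `False` result into a 403 carrying the returned message.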

Implication 1

This puts the logic of the 'contract check' in the hands of the data server. If we wanted to keep the logic on the connector's side, an extension could handle that, but this seems like an 'improvement' rather than a 'blocker' for making progress?

Implication 2

The request for a resource requires a contract negotiated in advance. If the consumer does not have a valid contract for this asset, I guess a separate service could be triggered to help negotiate one. In any case, it seems to me like a good idea to decouple this functionality from the 'contract check'.

=> The consumer would have to

determine the asset of which the current resource is a part
get a list of their contract agreements
see if there is a match
If yes, build the proper payload to access it through the data plane (assetId, contractAgreementId, VC ?)
If not, trigger a contract negotiation
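
The consumer-side steps above could look roughly like this. It is a sketch under assumptions: `plan_access` is a hypothetical name, each agreement is assumed to carry its id under `@id` and its asset under `assetId`, and the VC part of the payload is left out since it is still an open question.

```python
from typing import Any


def plan_access(asset_id: str, agreements: list[dict[str, Any]]) -> dict[str, Any]:
    """Decide whether to access the asset directly or negotiate first."""
    for agreement in agreements:
        # Assumption: an agreement exposes its asset as "assetId" and its id as "@id".
        if agreement.get("assetId") == asset_id and "@id" in agreement:
            return {
                "action": "access",
                "payload": {
                    "assetId": asset_id,
                    "contractAgreementId": agreement["@id"],
                },
            }
    # No matching agreement: a contract negotiation has to be triggered.
    return {"action": "negotiate", "assetId": asset_id}
```

The `asset_id` argument is the result of the first step (finding the asset the resource belongs to), which depends on the index exposing the assetId as noted in Implication 4.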

Implication 3

As alluded to earlier, we cut EDC's data-transfer process out. In EDC's case, it offers another scope at which policies can be applied, though at the cost of more complexity in the overall flow (refer to the first message). In our case, if a contract has been negotiated and agreed upon (i.e. contractAgreementId + state FINALISED), then the consumer can get access to the asset. No policy is applied at the transfer scope.

=> We will start by dropping this level of policy check; we can reintroduce it in a later iteration of the project

Implication 4

The index should include the assetId in its results for a given resource.

Questions

What's the mechanism for "blocking"/"protecting" the asset so that the user cannot access it directly via a linked-data property (from the index)? Would we need a proxy server similar to what EDC proposes with the public API?

Edited Mar 17, 2025 by Robert Sahakyan