Integrating EDC and Django LDP
This issue is a recap of the discussions around the 6th of March 2025 about the next steps of the project.
Where are we?
We currently have a very crude integration in the djangoldp-indexing package. This integration depends on the feature/dataspace-profile branch of the djangoldp-account package.
What we do at the moment
- When a local index (a pre-generated static file) is accessed on a djangoldp instance (which is to be considered the data plane of a provider in the dataspace)
- We call the control plane of the consumer in the MVD to get the accessible catalog.
- The URL of the EDC connector is in a setting of the djangoldp instance
- This call is secured via an API key that we get from the dataspace profile of the logged-in djangoldp user (hence the dependence on a branch of djangoldp-account)
- We look in the catalog for an asset whose idx:IndexEntry is <URL of the local index being accessed>
- If found, we serve the static file
- If not found, we raise a 403 Forbidden
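For reference, the catalog lookup in the current flow boils down to a linear scan over every dataset of the catalog, on every request, which is why it is a performance concern. A minimal Python sketch, assuming the catalog is JSON-LD with its assets under "dcat:dataset" and that idx:IndexEntry holds either a plain URL or an {"@id": ...} node; `index_is_in_catalog` is a hypothetical helper, not the actual djangoldp-indexing code.

```python
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    # JSON-LD compaction may return a single object instead of a one-element array
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_is_in_catalog(catalog: dict, index_url: str) -> bool:
    """Return True if some dataset in the catalog points at index_url.

    Assumption: assets are listed under "dcat:dataset" and idx:IndexEntry is
    either a string or a node of the form {"@id": "<url>"}.
    """
    for dataset in _as_list(catalog.get("dcat:dataset")):
        for entry in _as_list(dataset.get("idx:IndexEntry")):
            value = entry.get("@id") if isinstance(entry, dict) else entry
            if value == index_url:
                return True
    return False
```

When this returns False, the view raises the 403 Forbidden; otherwise it serves the static index file.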
When following the links from the index, we end up getting URLs to actual resources stored on that djangoldp instance (which is fine, it's the goal of the indexing process).
Right now, we can follow these links and access the indexed resources, but not for the right reason: we are granted access because we are logged in as a Django user on this instance (which is necessary to access the dataspace profile and then fetch the catalog), and that login also gives access to all resources on the server.
Where are we going?
There are several issues we need to fix in this process:
- The local index sits on the provider's side and therefore shouldn't rely on a call to a consumer's control plane
- Fetching the catalog and checking whether it contains a reference to the index being accessed is highly suboptimal and is likely to become a performance bottleneck in the short term
- Access to the indexed resources needs to be protected by the provider's EDC control plane instead of djangoldp's internal permission system: we want to give access to a resource only if there is an active contract between the consumer accessing it and the provider providing it.
What we agreed upon on 06/03/2025
- @robert will investigate the possibility of directly calling the provider's control plane to
- get a yes/no response to the accessibility of a given index
- know if there is an active contract on a given asset
- This means reverse-engineering the catalog request and, ultimately, developing an extension of the connector
- @sylvain.roquebert will implement (in coordination with @robert) new permission-check logic in djangoldp that relies on the provider's control plane.
- If there isn't an active contract, one will be negotiated automatically
- If the automatic negotiation fails, a 403 with an explicit message will be raised
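The permission logic agreed on above can be sketched as a small decision function. This is a minimal sketch: `has_active_contract` and `negotiate` are hypothetical placeholders for calls to the provider's control plane (the yes/no endpoint being investigated does not exist yet), and `AccessDecision` is an illustrative type, not a djangoldp class.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AccessDecision:
    status: int   # HTTP status the data plane would return
    message: str  # explicit message for the consumer


def check_access(
    consumer_id: str,
    asset_id: str,
    has_active_contract: Callable[[str, str], bool],  # hypothetical control-plane call
    negotiate: Callable[[str, str], bool],  # hypothetical automatic negotiation
) -> AccessDecision:
    """Grant access on an active contract, else try to negotiate one, else 403."""
    if has_active_contract(consumer_id, asset_id):
        return AccessDecision(200, "Active contract found")
    # No active contract: attempt an automatic negotiation
    if negotiate(consumer_id, asset_id):
        return AccessDecision(200, "Contract negotiated automatically")
    return AccessDecision(
        403,
        f"No active contract for asset {asset_id!r} and automatic negotiation failed",
    )
```

Since contract negotiation is asynchronous in EDC, a real `negotiate` would have to wait for the negotiation to reach a final state before answering, which may be too slow to do inline in a request.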
What we agreed upon on 10/03/2025
Current EDC workflow for contract negotiation & data transfer:
Search & Discovery
- Get a catalog
- Find an assetId and its policyId you want
Contract Negotiation
- Initiate a contract negotiation (presenting your VCs needed for the policyId; this is currently handled by an extension in the MVD to automate this)
- Contract negotiation is asynchronous, so you get a receipt Id
- Query .../contractnegotiations/request to see the status of your negotiation (e.g. FINALISED)
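The polling step of the negotiation can be sketched as follows. This is a minimal sketch, assuming `get_negotiation` is a hypothetical callable wrapping the management API query and returning a dict with "state" and "contractAgreementId" keys; EDC itself spells the final state FINALIZED, so both spellings are accepted.

```python
import time
from typing import Callable, Optional

# EDC uses the US spelling; the notes above use FINALISED
FINAL_STATES = {"FINALIZED", "FINALISED"}
FAILED_STATES = {"TERMINATED"}


def wait_for_agreement(
    negotiation_id: str,
    get_negotiation: Callable[[str], dict],  # hypothetical management API wrapper
    max_attempts: int = 30,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll a negotiation until it is finalised.

    Returns the contractAgreementId, or None if the negotiation was
    terminated or never finalised within max_attempts polls.
    """
    for _ in range(max_attempts):
        negotiation = get_negotiation(negotiation_id)
        state = negotiation.get("state")
        if state in FINAL_STATES:
            return negotiation.get("contractAgreementId")
        if state in FAILED_STATES:
            return None
        sleep(interval_s)  # negotiation still in progress, wait and retry
    return None
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.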
Data Transfer
- Once it's FINALISED, take the contractAgreementId and initiate a transfer via .../transferprocesses (this is also asynchronous)
- Periodically query .../transferprocesses/request to see the status of your transfer request.
- Once the transfer request is in a STARTED state, you can use the transferProcessesId to get yet another auth token, needed for the actual transfer, via .../edrs/request (EndpointDataReference).
- Taking the auth token from the EDR, we can now query the public API of the provider, passing the token and retrieving the data. (NOTE: In this scenario, the public API acts as a proxy for the resources, which is different from our linked-data approach.)
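The last step of the transfer flow amounts to an authenticated GET on the provider's public API. A minimal sketch, assuming the EDR exposes "endpoint" and "authorization" entries (as EDC's EndpointDataReference data address does in recent versions; check the actual .../edrs/request response before relying on these keys). `build_public_api_request` is a hypothetical helper that only builds the request, it does not send it.

```python
import urllib.request


def build_public_api_request(edr: dict, path: str = "") -> urllib.request.Request:
    """Build (but do not send) a GET against the provider's public API.

    Assumption: edr["endpoint"] is the public API base URL and
    edr["authorization"] is the token to pass along.
    """
    base = edr["endpoint"].rstrip("/")
    url = base + ("/" + path.lstrip("/") if path else "")
    req = urllib.request.Request(url, method="GET")
    # The token from the EDR is passed as-is in the Authorization header
    req.add_header("Authorization", edr["authorization"])
    return req
```

Sending it is then a matter of `urllib.request.urlopen(req)`; the provider's public API proxies the response back from the actual data source.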
The proposal
For requesting access to a resource, our scenario will deviate somewhat from EDC's MVD: we simplify things by cutting out the whole Data Transfer phase.
Instead, we will ask: "Do you have a contract negotiated for this asset?" If yes, here you go; if no, please go and negotiate one.
For the "contract check", I propose that a consumer pass their contractId to the data plane (djangoldp) when they want to 'access' a given resource.
Before the data server returns (or refuses to return) the resource, it would query its own management API at ../contractnegotiations/request/{contractId}/agreement, with the contractId provided by the consumer, to check whether it has a matching contractAgreement. The response contains:
- assetId
- consumerId
- providerId
- policies
- ....
(see the screenshot for a detailed example of the response)
The check then has 2 scenarios:
A. The contractId exists, so we check which assetId they want, who the consumer is, etc., and decide whether it's valid or not.
B. The contractId doesn't exist, return "No contract, please negotiate one if you want"
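Scenarios A and B can be sketched as a single check run by djangoldp before serving a resource. This is a minimal sketch: `fetch_agreement` is a hypothetical wrapper around the agreement query above that returns None when the contractId is unknown, field names follow the response listed above, and `consumer_id` is assumed to come from however the data plane authenticates the consumer, which the proposal leaves open.

```python
from typing import Callable, Optional, Tuple


def contract_check(
    contract_id: Optional[str],
    requested_asset_id: str,
    consumer_id: str,
    fetch_agreement: Callable[[str], Optional[dict]],  # hypothetical management API call
) -> Tuple[int, str]:
    """Return (HTTP status, message) for a resource request carrying a contractId."""
    if not contract_id:
        return 403, "No contract provided, please negotiate one if you want access"
    agreement = fetch_agreement(contract_id)
    if agreement is None:
        # Scenario B: the contractId does not exist on the provider's side
        return 403, "No contract, please negotiate one if you want access"
    # Scenario A: the agreement exists, check it covers this asset and consumer
    if agreement.get("assetId") != requested_asset_id:
        return 403, "Contract does not cover the requested asset"
    if agreement.get("consumerId") != consumer_id:
        return 403, "Contract was not agreed with this consumer"
    return 200, "OK"
```

Checks on the policies contained in the agreement are deliberately left out here; they would slot in after the consumer check.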
Implication 1
This puts the logic of the 'contract check' in the hands of the data server. If we wanted to keep that logic on the connector's side, an extension could handle it, but this seems like an 'improvement' rather than a 'blocker' to making progress?
Implication 2
The request for a resource requires a contract negotiated in advance. If the consumer does not have a valid contract for the asset, I guess a separate service could be triggered to help negotiate one? Either way, it seems to me like a good idea to decouple this functionality from the 'contract check'.
=> The consumer would have to
- determine the asset to which the current resource is a part of
- get a list of his contract agreements
- see if there is a match
- If yes, build the proper payload to access it through the data plane (assetId, contractAgreementId, VC?)
- If not, trigger a contract negotiation
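The consumer-side steps above can be sketched as one function. This is a minimal sketch: `asset_of`, `list_agreements` and `start_negotiation` are hypothetical callables (resource-to-asset lookup, the consumer's agreement list, and a negotiation trigger), agreements are assumed to carry their ID under "@id", and the VC part of the payload is left out since it is still an open question.

```python
from typing import Callable, Dict, List, Optional


def prepare_access(
    resource_url: str,
    asset_of: Callable[[str], Optional[str]],  # hypothetical resource -> assetId lookup
    list_agreements: Callable[[], List[dict]],  # hypothetical: consumer's agreements
    start_negotiation: Callable[[str], str],  # hypothetical: returns a negotiation ID
) -> Dict[str, str]:
    """Return either an access payload or the ID of a newly started negotiation."""
    asset_id = asset_of(resource_url)
    if asset_id is None:
        raise LookupError(f"No asset known for {resource_url}")
    for agreement in list_agreements():
        if agreement.get("assetId") == asset_id:
            # Match: payload for the data plane (VC intentionally omitted)
            return {
                "action": "access",
                "assetId": asset_id,
                "contractAgreementId": agreement["@id"],
            }
    # No match: trigger a contract negotiation for the asset
    return {
        "action": "negotiating",
        "assetId": asset_id,
        "negotiationId": start_negotiation(asset_id),
    }
```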
Implication 3
As alluded to earlier, we cut EDC's data-transfer process out. In EDC's case, that process offers another scope at which policies can be applied, but it makes the overall flow more complex (refer to the first message). In our case, if a contract has been negotiated and agreed upon (i.e. a contractAgreementId in state FINALISED), then the consumer can get access to the resource. No policy is applied at the transfer scope.
=> We will start by cutting off this level of policy check and will be able to get it back in a further iteration of the project
Implication 4
The index results should include, for each resource, the assetId it belongs to.
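As an illustration of what an index result carrying the assetId could look like, here is a minimal sketch. `IndexResult` is a hypothetical type and "idx:assetId" a hypothetical predicate name; neither is part of the current index vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class IndexResult:
    resource: str  # URL of the indexed resource on the djangoldp instance
    asset_id: str  # EDC assetId the resource belongs to (the proposed addition)

    def to_jsonld(self) -> Dict[str, str]:
        # "idx:assetId" is a hypothetical predicate, used here for illustration only
        return {"@id": self.resource, "idx:assetId": self.asset_id}


def asset_lookup(results: Iterable[IndexResult]) -> Dict[str, str]:
    """Map each resource URL found in the index results to its assetId."""
    return {r.resource: r.asset_id for r in results}
```

With such a mapping, the consumer can determine the asset a resource belongs to without an extra round trip to the provider.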
Questions
- What's the mechanism for "blocking"/"protecting" the asset, so that the user cannot access it directly via a linked-data property (from the index)? Would we need a proxy server similar to the public API that EDC proposes?
