It's not breaking a feature, but it's frustrating for the user when there's an error, and it consumes developer time when there's a bug because the request looks like it's stuck in some kind of loop.
Ping @jbpasquier and @plup for inspiration. If you have any ideas on how to avoid ending up with no logs, and how to track down hanging requests, it would be nice to document them here :-)
I just know from experience that:
it happens frequently
it's really frustrating not to be able to get at our error message
We can use the DjangoLDP logger remotely, right? Locally, print statements work.
Any error and any logger.error call should go to the Sentry server... looking at that, there are a few issues which could be the culprit for djangoldp_community.
For me, I want it to return a 500 in the response. Yesterday I spent a couple of hours fixing an AttributeError that would've taken 5 minutes if I'd gotten the 500 in debug.
I think we have a major issue with our logs. Those kinds of errors should go to Sentry, but in any case they MUST be logged in the server log files. And they aren't...
So the question is: do we produce enough logs, or do we have a problem with the logging on the hosting side?
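For reference, this is roughly what I'd expect the LOGGING setting to look like if we want errors to land both in Sentry and in a file on the server. It's just a sketch: the file path and logger names below are placeholders, and it assumes sentry-sdk is initialised elsewhere (its logging integration picks up logger.error calls on its own).

```python
# settings.py -- minimal sketch, not our actual config
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '{asctime} {levelname} {name} {message}',
            'style': '{',
        },
    },
    'handlers': {
        'file': {
            'level': 'ERROR',
            'class': 'logging.FileHandler',
            'filename': '/var/log/djangoldp/error.log',  # placeholder path
            'formatter': 'verbose',
        },
    },
    'loggers': {
        # unhandled request errors (the 500s) should hit the file too
        'django.request': {'handlers': ['file'], 'level': 'ERROR', 'propagate': True},
        'djangoldp': {'handlers': ['file'], 'level': 'ERROR', 'propagate': True},
    },
}
```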
@decentral1se @3wc just on the off chance... have either of you seen behaviour like this with Django or Django REST Framework before?
So far it seems to me that it happens in our ModelSerializer (LDPSerializer), and I think only in a nested serializer, i.e. a nested object POSTed in the request. In the case of my AttributeError yesterday, it was happening in a post_save listener triggered by a nested serializer save.
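To make that failure mode concrete, here's a rough sketch (the model and receiver are made up, not our actual code): the nested serializer save calls the receiver synchronously, so whatever it raises propagates back up through serializer.save(), and if that exception is never logged, the request just looks stuck.

```python
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Member  # hypothetical model


@receiver(post_save, sender=Member)
def notify_community(sender, instance, created, **kwargs):
    if created:
        # e.g. instance.community could be None here -> AttributeError,
        # raised from inside the nested serializer's save()
        community_name = instance.community.name
        print(f"new member joined {community_name}")
```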
On another instance of this bug I spent some time stepping through breakpoints that were looping in Django's database backend code, and the actual problem was an IntegrityError because an object had been saved with a NULL value on a NOT NULL field (#359).
That could be relevant. I'm aware that if the transaction failed, it would raise the error and roll back the transaction's changes. Maybe Django gets stuck because a post_save listener has made some related changes, or something along those lines...?
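Something like this is the scenario I have in mind (the models are made up): an IntegrityError raised inside the atomic block rolls back everything done in it, including whatever the post_save listeners saved, and if nothing logs the exception, all we see from outside is a request that appears to hang.

```python
from django.db import IntegrityError, transaction

from .models import Community, Member, Project  # hypothetical models


def save_with_nested(data):
    try:
        with transaction.atomic():
            community = Community.objects.create(name=data["community"])
            member = Member.objects.create(community=community)
            # post_save listeners for Member fire here and may save
            # further related objects...
            # ...then a later save violates a NOT NULL constraint:
            Project.objects.create(community=community, name=None)
    except IntegrityError:
        # everything in the block is rolled back, including the listeners'
        # related changes; this needs to be logged somewhere or it vanishes
        raise
```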
Does this sound plausible with djangoldp_community's setup, @jbpasquier? And for our coopstarter problems, @balessan?
The NestedLDPSerializer extends the behaviour of the LDPSerializer, so I wonder whether there could be a "transaction within a transaction" or something like that... I'll look into the REST Framework behaviour to see if that's a possible lead, but first I'll try removing the transaction code and see if that changes the behaviour.
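For reference, Django treats a nested atomic block as a savepoint rather than a second transaction, so nesting on its own should be safe. A quick illustration with made-up models (not our serializer code):

```python
from django.db import IntegrityError, transaction

from .models import Community, Member  # hypothetical models

with transaction.atomic():                        # outer block: one transaction
    community = Community.objects.create(name="test")
    try:
        with transaction.atomic():                # inner block: a savepoint
            Member.objects.create(community=None)     # violates NOT NULL
    except IntegrityError:
        # only the inner savepoint is rolled back; `community` is still
        # pending and commits normally when the outer block exits
        pass
```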
@calummackervoy yes, for my old coopstarter problems I am quite sure it was linked to errors in the templates of an email notification which was triggered on post_save.
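If that's the cause, one mitigation (just a sketch, the model and template names are invented) would be to catch and log anything that blows up while rendering or sending the notification, so the POST that triggered the save doesn't fail silently:

```python
import logging

from django.core.mail import send_mail
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.template.loader import render_to_string

from .models import Request  # hypothetical coopstarter model

logger = logging.getLogger(__name__)


@receiver(post_save, sender=Request)
def send_request_notification(sender, instance, created, **kwargs):
    if not created:
        return
    try:
        body = render_to_string("emails/new_request.txt", {"request": instance})
        send_mail("New request", body, "noreply@example.org", [instance.reviewer_email])
    except Exception:
        # a broken template should be logged, not crash (or silently hang)
        # the request that triggered the save
        logger.exception("failed to send notification for request %s", instance.pk)
```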
@jbpasquier I was able to replicate this locally yesterday, but I think I may need to replicate it on test (1?) today... you mentioned that the bug survives server restarts? Do you know how I can reset my test environment after reproducing it?
You need to watch the current processes with top via SSH and then kill every worker from Alwaysdata (Advanced -> Processus).
Wait a bit between each one, and make a call to https://api.test1.startinblox.com/ from a browser to spawn another worker.
If after 2-3 minutes no worker gets to 100% CPU, you've won this fight! Otherwise, try again.
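If it helps, something like this could be run over SSH instead of eyeballing top. It's a rough sketch: it assumes psutil is available and that the workers show up as uwsgi/gunicorn/python processes, so adjust the filter to whatever Alwaysdata actually names them.

```python
# find_hung_workers.py -- flag worker processes pinned near 100% CPU
import psutil

CANDIDATES = ("uwsgi", "gunicorn", "python")

for proc in psutil.process_iter(["pid", "name"]):
    name = (proc.info["name"] or "").lower()
    if not any(c in name for c in CANDIDATES):
        continue
    cpu = proc.cpu_percent(interval=1.0)   # sample CPU over one second
    if cpu > 90:
        print(f"worker {proc.info['pid']} ({name}) is at {cpu:.0f}% CPU -- kill it")
```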
Edit:
Notice that sometimes Alwaysdata refuses to spawn a worker again. Maybe it's a bug on their side, I don't know. When that happens, kill the master thread.
Whenever you start a new server (= fresh DB), you have to kill the master, otherwise it'll still keep the old one.
If your master does not pop back up on AD, you lose. AD will automatically spawn it again the next night, and until then you need to crash another server.