[jira] [Created] (APEXCORE-743) Killed container is shown as running

JIRA jira@apache.org
Sandesh created APEXCORE-743:

             Summary: Killed container is shown as running
                 Key: APEXCORE-743
                 URL: https://issues.apache.org/jira/browse/APEXCORE-743
             Project: Apache Apex Core
          Issue Type: Bug
            Reporter: Sandesh

Here is the observed behavior:

1. A container heartbeat timeout occurred.
2. The AppMaster sent a request to kill the container.
3. The container was killed.
4. The AppMaster state was not updated (the container is still shown as running), and no new container was allocated.

After analyzing the code, here is the likely cause:
1. The AppMaster sends the kill request to the NodeManager (NM).
2. The NM kills the container, but the NM callback never fires. RecoverContainer is invoked from the NM callback, so in this case it is never called.
3. As a result, the AppMaster state is not updated.

Possible fix:
Add a timeout for the NM callback, so that if the NM does not confirm within that time that the container has been killed, the AppMaster calls RecoverContainer itself.
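The proposed timeout can be sketched as follows. This is a minimal, hypothetical Java sketch, not the actual Apex code: `PendingKillTracker`, `killRequested`, `stopConfirmed` and `checkTimeouts` are illustrative names, container IDs are plain strings, and the recovery action is passed in as a callback standing in for RecoverContainer. The current time is passed in explicitly so that the timeout check is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical helper: remembers kill requests sent to the NM and recovers
 * containers whose stop confirmation never arrives within the timeout.
 */
class PendingKillTracker {
  private final long timeoutMillis;
  // Stand-in for the AppMaster's RecoverContainer logic (assumption).
  private final Consumer<String> recoverContainer;
  // containerId -> time (ms) at which the kill request was sent
  private final Map<String, Long> pendingKills = new HashMap<>();

  PendingKillTracker(long timeoutMillis, Consumer<String> recoverContainer) {
    this.timeoutMillis = timeoutMillis;
    this.recoverContainer = recoverContainer;
  }

  /** Record a kill request right after it is sent to the NM. */
  public synchronized void killRequested(String containerId, long nowMillis) {
    pendingKills.putIfAbsent(containerId, nowMillis);
  }

  /**
   * Called from the NM callback when the stop is confirmed. A callback that
   * arrives after the timeout already fired is ignored, so a container is
   * never recovered twice.
   */
  public synchronized boolean stopConfirmed(String containerId) {
    if (pendingKills.remove(containerId) == null) {
      return false;
    }
    recoverContainer.accept(containerId);
    return true;
  }

  /** Called periodically; recovers every kill that has waited too long. */
  public synchronized List<String> checkTimeouts(long nowMillis) {
    List<String> timedOut = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = pendingKills.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMillis - entry.getValue() >= timeoutMillis) {
        it.remove();
        timedOut.add(entry.getKey());
      }
    }
    // Recover after the iteration so the callback cannot disturb the map.
    for (String id : timedOut) {
      recoverContainer.accept(id);
    }
    return timedOut;
  }
}
```

In a YARN-based AppMaster, `stopConfirmed` would presumably be wired to the NM callback handler (for example `NMClientAsync.CallbackHandler.onContainerStopped`), and `checkTimeouts` could be driven from the existing heartbeat-monitoring loop; the timeout value is left to configuration. Removing the entry before recovering is what makes a late NM callback harmless.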

This message was sent by Atlassian JIRA