Storm supervisor configs overwrite with each other #62

Open
clockfly opened this issue Jan 2, 2014 · 6 comments

Comments

clockfly (Contributor) commented Jan 2, 2014

Hi,

When starting a supervisor container, the client writes the config under /user/xx/.storm/appattempt_xx/conf. When starting multiple supervisors, they all share the same path, so one supervisor may overwrite the config of another, which results in errors like the following:

2014-01-02 14:37:31,183 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 39 cluster_timestamp: 1388020955932 } attemptId: 1 } id: 3 } state: C_COMPLETE diagnostics: "Resource hdfs://IDHV22-01:8020/user/yarn/.storm/appattempt_1388020955932_0039_000001/conf changed on src filesystem (expected 1388644474340, was 1388644474889\n" exit_status: -1000
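For reference, the shared destination comes from the per-attempt application home, so every supervisor container in the same attempt resolves to one conf path. A minimal sketch based on the code touched by the patch below (the exact path layout is an assumption):

// Sketch (assumption): every container of one app attempt resolves to the
// same HDFS conf path, e.g. /user/yarn/.storm/appattempt_xx/conf, so
// concurrent supervisor launches overwrite the file and bump its timestamp.
String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
Path confDst = Util.createConfigurationFileInFs(fs, appHome,
    this.storm_conf, this.hadoopConf);  // same Path for every supervisor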

clockfly (Contributor, Author) commented Jan 2, 2014

Patch for this:

Index: StormAMRMClient.java
===================================================================
--- StormAMRMClient.java    (revision 23570)
+++ StormAMRMClient.java    (revision 23571)
@@ -182,8 +182,11 @@
               LocalResourceType.ARCHIVE, LocalResourceVisibility.APPLICATION));

     String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
-    Path confDst = Util.createConfigurationFileInFs(fs, appHome,
+    String containerHome = appHome + Path.SEPARATOR + container.getId().getId();
+    
+    Path confDst = Util.createConfigurationFileInFs(fs, containerHome,
             this.storm_conf, this.hadoopConf);
+    
     localResources.put("conf", Util.newYarnAppResource(fs, confDst));

     launchContext.setLocalResources(localResources);
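With this change each supervisor localizes its conf from its own subdirectory (appHome plus the container id), so concurrent launches no longer race on a single path, at the cost of one conf copy per container; the HDFS namespace cost is discussed below.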

revans2 (Collaborator) commented Jan 2, 2014

The exact same config file should be output for each of the supervisors. It would only change if the AM/nimbus crashed and needed to be brought up on a new node, which should be handled by the app attempt id, or if someone pushed new config values to the AM, which still needs some work so that the changes are reflected in the supervisors.

That is why it was made an info message and not an error or a warning. If you want to separate them out you can, but I would prefer moving the creation of the config file to when Nimbus comes up, and then it can be reused.

clockfly (Contributor, Author) commented Jan 2, 2014

Hi Bobby,

The key here is that the supervisor will fail to start! It is not about the content, it is about the timestamp.

Here is the sequence:
1. Write the conf for supervisor A at time 100, then start A.
2. Write the conf for supervisor B at time 110, then start B.
3. A is starting...

When A tries to start, it fetches the conf from the distributed cache and expects the timestamp to be 100, but it actually finds 110, so A fails to download the resource, and A fails to start.
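For context, the LocalResource built at submission pins the file's size and modification time, and the NodeManager re-checks both during localization. A minimal sketch using the standard Hadoop 2.x YARN API (assumption: Util.newYarnAppResource works along these lines):

// Sketch (assumption): the conf LocalResource pins the HDFS timestamp.
// If another supervisor rewrites the same file between submission and
// localization, the NodeManager sees a newer mtime and fails the container
// with "Resource ... changed on src filesystem (expected ..., was ...)".
FileStatus stat = fs.getFileStatus(confDst);
LocalResource conf = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(confDst),
    LocalResourceType.FILE,
    LocalResourceVisibility.APPLICATION,
    stat.getLen(),                 // size checked at localization
    stat.getModificationTime());   // timestamp checked at localization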

clockfly (Contributor, Author) commented Jan 2, 2014

Before this fix, I had to start supervisors one by one. Starting multiple supervisors at once does not work:
storm-yarn addSupervisors -appId=xx -supervisors=3

revans2 (Collaborator) commented Jan 2, 2014

I understand; that is why moving the creation of the file so it happens only once would fix it, because the timestamp would not change after that. If you want me to merge this in for now, I am fine with that. It just means we are now wasting a lot more HDFS namespace, which is a much smaller problem; I would just like another issue raised to reduce the namespace usage later on.
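A rough sketch of that alternative (the caching field and helper here are hypothetical; the idea is to write the conf to HDFS once and reuse the same Path for every supervisor launch, so the timestamp never changes):

// Sketch: write the conf once at AM startup and reuse it.
// cachedConfDst and getOrCreateConfInFs() are hypothetical names.
private Path cachedConfDst;

private synchronized Path getOrCreateConfInFs() throws IOException {
  if (cachedConfDst == null) {
    String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
    cachedConfDst = Util.createConfigurationFileInFs(fs, appHome,
        this.storm_conf, this.hadoopConf);
  }
  return cachedConfDst;  // timestamp fixed after the first write
}

// In the container launch path:
// localResources.put("conf", Util.newYarnAppResource(fs, getOrCreateConfInFs()));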

clockfly (Contributor, Author) commented Jan 2, 2014

OK, I will create a pull request for this; let's merge this in and improve the file duplication later. Since we only create one copy when a supervisor starts, the problem should not be big, considering a cluster only has a limited number of supervisors.

And maybe we should clean up the folders on HDFS when shutting down the cluster: the 1st folder for the master, the 2nd for the supervisors; then no more problems.
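A minimal sketch of that cleanup (assumption: removing the whole per-attempt home directory on shutdown is safe once all containers have exited):

// Sketch: recursively delete the per-attempt HDFS directory on shutdown,
// e.g. /user/yarn/.storm/appattempt_xx/, including per-container conf dirs.
Path appHome = new Path(Util.getApplicationHomeForId(appAttemptId.toString()));
if (fs.exists(appHome)) {
  fs.delete(appHome, true);  // true = recursive delete
}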

