Storm supervisor configs overwrite with each other #62
Patch for this:

Index: StormAMRMClient.java
===================================================================
--- StormAMRMClient.java	(revision 23570)
+++ StormAMRMClient.java	(revision 23571)
@@ -182,8 +182,11 @@
         LocalResourceType.ARCHIVE, LocalResourceVisibility.APPLICATION));
     String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
-    Path confDst = Util.createConfigurationFileInFs(fs, appHome,
+    String containerHome = appHome + Path.SEPARATOR + container.getId().getId();
+
+    Path confDst = Util.createConfigurationFileInFs(fs, containerHome,
         this.storm_conf, this.hadoopConf);
+
     localResources.put("conf", Util.newYarnAppResource(fs, confDst));
     launchContext.setLocalResources(localResources);
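The effect of the patch is to give each supervisor container its own configuration directory under the application home, so the uploads no longer collide. A minimal sketch of the path construction, with stand-ins (not the actual storm-yarn/Hadoop API) for `Util.getApplicationHomeForId` and the container id accessor:

```java
// Sketch of the per-container config path introduced by the patch above.
// applicationHome() and the base directory are illustrative stand-ins.
public class ContainerConfPath {
    static final String SEPARATOR = "/"; // stand-in for Hadoop's Path.SEPARATOR

    static String applicationHome(String appAttemptId) {
        // Mirrors the idea of Util.getApplicationHomeForId:
        // a per-attempt directory under the user's .storm dir.
        return "/user/yarn/.storm" + SEPARATOR + appAttemptId;
    }

    static String containerHome(String appAttemptId, int containerId) {
        // The fix: append the container id so each supervisor
        // writes its conf to a distinct directory.
        return applicationHome(appAttemptId) + SEPARATOR + containerId;
    }

    public static void main(String[] args) {
        String attempt = "appattempt_1388020955932_0039_000001";
        // Before the patch, every supervisor wrote conf under the same dir:
        System.out.println(applicationHome(attempt));
        // After the patch, containers 3 and 4 get distinct dirs:
        System.out.println(containerHome(attempt, 3));
        System.out.println(containerHome(attempt, 4));
    }
}
```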
The exact same config file should be output for each of the supervisors. It would only change if the AM/Nimbus crashed and had to be brought up on a new node, which should be handled by the app attempt id, or if someone pushes new config values to the AM, which still needs some work so that the changes are reflected in the supervisors. That is why it was made an info message and not an error or a warning. If you want to separate them out you can, but I would prefer moving the creation of the config file to when Nimbus comes up, so that it can be reused.
Hi Bobby, the key here is that the supervisor will fail to start! It is not about the content, it is about the timestamp. Here is the sequence: when supervisor A tries to start, it tries to fetch the conf from the distributed cache, expecting the timestamp to be 100, but it actually sees 110, so A fails to download the resource and fails to start.
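The failure mode can be reproduced with a toy model of YARN resource localization: the launch context records the resource's timestamp at submission time, and the NodeManager refuses to localize the file if its timestamp has changed by download time. A hedged sketch in plain Java, with no Hadoop dependency (the map is a stand-in for the shared filesystem):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of YARN localization: a resource request carries the timestamp
// observed at submission, and localization fails if the file on the shared
// "filesystem" has a different timestamp when it is actually fetched.
public class LocalizationCheck {
    // path -> modification timestamp on the shared filesystem
    static final Map<String, Long> fsTimestamps = new HashMap<>();

    static void write(String path, long timestamp) {
        fsTimestamps.put(path, timestamp); // an overwrite bumps the timestamp
    }

    static boolean localize(String path, long expectedTimestamp) {
        Long actual = fsTimestamps.get(path);
        return actual != null && actual == expectedTimestamp;
    }

    public static void main(String[] args) {
        String conf = "/user/yarn/.storm/appattempt_x/conf";
        write(conf, 100L);       // supervisor A's conf is uploaded
        long expectedByA = 100L; // A's launch context records ts=100
        write(conf, 110L);       // supervisor B overwrites the same path
        // A now fails to localize: expected 100, filesystem says 110
        System.out.println(localize(conf, expectedByA)); // false
        // B, whose launch context recorded 110, localizes fine
        System.out.println(localize(conf, 110L));        // true
    }
}
```

With per-container paths, A and B never write to the same key, so each one's recorded timestamp stays valid.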
Before this fix, I had to start the supervisors one by one. When trying to start multiple supervisors at once, it does not work.
I understand; that is why moving the creation of the file to happen only once would fix it, because the timestamp would not change after that. If you want me to merge this in for now, I am fine with that. It just means we are now wasting a lot more HDFS namespace, which is a much smaller problem. I would just like another issue raised to try and reduce the namespace usage later on.
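Bobby's preferred alternative can be sketched as a create-once pattern: upload the configuration file a single time when Nimbus comes up, cache the resulting path, and hand the same path to every supervisor launch context, so the timestamp never changes after the first upload. Names here are illustrative, not the actual storm-yarn code, and the sketch assumes a single AM thread (a real version would need synchronization):

```java
// Sketch: create the conf file once and reuse it for every container,
// so its timestamp never changes and HDFS holds a single copy.
public class ConfOnce {
    private String cachedConfPath; // set on first use, reused afterwards
    private long uploads = 0;      // counts actual writes, for illustration

    // Stand-in for Util.createConfigurationFileInFs plus the HDFS upload.
    private String uploadConf(String appHome) {
        uploads++;
        return appHome + "/conf/storm.yaml";
    }

    // Called from each container launch; only the first call uploads.
    public String confFor(String appHome) {
        if (cachedConfPath == null) {
            cachedConfPath = uploadConf(appHome);
        }
        return cachedConfPath;
    }

    public long uploadCount() { return uploads; }
}
```

With this shape, launching N supervisors performs one upload instead of N, which also addresses the namespace concern raised above.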
OK, I will create a pull request for this; let's move this in. And maybe we should clean the folders on HDFS when shutting down the

On Thu, Jan 2, 2014 at 11:20 PM, Robert (Bobby) Evans wrote:
Hi,
When starting a supervisor container, it will try to write its config under /user/xx/.storm/appattempt_xx/conf. When starting multiple supervisors, they share the same path, so one supervisor may overwrite the config of another, which results in the following error:
2014-01-02 14:37:31,183 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 39 cluster_timestamp: 1388020955932 } attemptId: 1 } id: 3 } state: C_COMPLETE diagnostics: "Resource hdfs://IDHV22-01:8020/user/yarn/.storm/appattempt_1388020955932_0039_000001/conf changed on src filesystem (expected 1388644474340, was 1388644474889\n" exit_status: -1000