Storm supervisor configs overwrite with each other #62

Open
clockfly opened this issue Jan 2, 2014 · 6 comments

Comments

clockfly (Contributor) commented Jan 2, 2014

Hi,

When starting a supervisor container, the client writes the config under /user/xx/.storm/appattempt_xx/conf. When starting multiple supervisors, they all share the same path, so one supervisor may overwrite the config of another, which results in errors like the following:

2014-01-02 14:37:31,183 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 39 cluster_timestamp: 1388020955932 } attemptId: 1 } id: 3 } state: C_COMPLETE diagnostics: "Resource hdfs://IDHV22-01:8020/user/yarn/.storm/appattempt_1388020955932_0039_000001/conf changed on src filesystem (expected 1388644474340, was 1388644474889\n" exit_status: -1000
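For reference, the shared destination comes from the per-attempt application home, so every supervisor container in the same attempt resolves to one conf path. A minimal sketch based on the code touched by the patch below (the exact path layout is an assumption):

// Sketch (assumption): every container of one app attempt resolves to the
// same HDFS conf path, e.g. /user/yarn/.storm/appattempt_xx/conf, so
// concurrent supervisor launches overwrite the file and bump its timestamp.
String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
Path confDst = Util.createConfigurationFileInFs(fs, appHome,
    this.storm_conf, this.hadoopConf);  // same Path for every supervisor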

clockfly (Contributor, Author) commented Jan 2, 2014

Patch for this:

Index: StormAMRMClient.java
===================================================================
--- StormAMRMClient.java    (revision 23570)
+++ StormAMRMClient.java    (revision 23571)
@@ -182,8 +182,11 @@
               LocalResourceType.ARCHIVE, LocalResourceVisibility.APPLICATION));

     String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
-    Path confDst = Util.createConfigurationFileInFs(fs, appHome,
+    String containerHome = appHome + Path.SEPARATOR + container.getId().getId();
+    
+    Path confDst = Util.createConfigurationFileInFs(fs, containerHome,
             this.storm_conf, this.hadoopConf);
+    
     localResources.put("conf", Util.newYarnAppResource(fs, confDst));

     launchContext.setLocalResources(localResources);
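With this change each supervisor localizes its conf from its own subdirectory (appHome plus the container id), so concurrent launches no longer race on a single path, at the cost of one conf copy per container; the HDFS namespace cost is discussed below.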

revans2 (Collaborator) commented Jan 2, 2014

The exact same config file should be output for each of the supervisors. It would only change if the AM/nimbus crashed and needed to be brought up on a new node, which should be handled by the app attempt id, or if someone pushed new config values to the AM, which still needs some work so that the changes are reflected in the supervisors.

That is why it was made an info message and not an error or a warning. If you want to separate them out you can, but I would prefer moving the creation of the config file to when Nimbus comes up, and then it can be reused.

clockfly (Contributor, Author) commented Jan 2, 2014

Hi Bobby,

The key here is that the supervisor will fail to start! It is not about the content, it is about the timestamp.

Here is the sequence:
1. Write the conf for supervisor A at time 100, then start A.
2. Write the conf for supervisor B at time 110, then start B.
3. A is starting...

When A tries to start, it fetches the conf from the distributed cache and expects the timestamp to be 100, but it actually finds 110, so A fails to download the resource, and A fails to start.
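For context, the LocalResource built at submission pins the file's size and modification time, and the NodeManager re-checks both during localization. A minimal sketch using the standard Hadoop 2.x YARN API (assumption: Util.newYarnAppResource works along these lines):

// Sketch (assumption): the conf LocalResource pins the HDFS timestamp.
// If another supervisor rewrites the same file between submission and
// localization, the NodeManager sees a newer mtime and fails the container
// with "Resource ... changed on src filesystem (expected ..., was ...)".
FileStatus stat = fs.getFileStatus(confDst);
LocalResource conf = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(confDst),
    LocalResourceType.FILE,
    LocalResourceVisibility.APPLICATION,
    stat.getLen(),                 // size checked at localization
    stat.getModificationTime());   // timestamp checked at localization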

clockfly (Contributor, Author) commented Jan 2, 2014

Before this fix, I had to start supervisors one by one. Starting multiple supervisors at once does not work:
storm-yarn addSupervisors -appId=xx -supervisors=3

revans2 (Collaborator) commented Jan 2, 2014

I understand; that is why moving the creation of the file so it happens only once would fix it, because the timestamp would not change after that. If you want me to merge this in for now, I am fine with that. It just means we are now wasting a lot more HDFS namespace, which is a much smaller problem; I would just like another issue raised to reduce the namespace usage later on.
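A rough sketch of that alternative (the caching field and helper here are hypothetical; the idea is to write the conf to HDFS once and reuse the same Path for every supervisor launch, so the timestamp never changes):

// Sketch: write the conf once at AM startup and reuse it.
// cachedConfDst and getOrCreateConfInFs() are hypothetical names.
private Path cachedConfDst;

private synchronized Path getOrCreateConfInFs() throws IOException {
  if (cachedConfDst == null) {
    String appHome = Util.getApplicationHomeForId(appAttemptId.toString());
    cachedConfDst = Util.createConfigurationFileInFs(fs, appHome,
        this.storm_conf, this.hadoopConf);
  }
  return cachedConfDst;  // timestamp fixed after the first write
}

// In the container launch path:
// localResources.put("conf", Util.newYarnAppResource(fs, getOrCreateConfInFs()));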

clockfly (Contributor, Author) commented Jan 2, 2014

OK, I will create a pull request for this; let's merge this in and improve the file duplication later. Since we only create one copy when a supervisor starts, the problem should not be big, considering a cluster only has a limited number of supervisors.

And maybe we should clean up the folders on HDFS when shutting down the cluster: the 1st folder for the master, the 2nd for the supervisors; then no more problems.
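A minimal sketch of that cleanup (assumption: removing the whole per-attempt home directory on shutdown is safe once all containers have exited):

// Sketch: recursively delete the per-attempt HDFS directory on shutdown,
// e.g. /user/yarn/.storm/appattempt_xx/, including per-container conf dirs.
Path appHome = new Path(Util.getApplicationHomeForId(appAttemptId.toString()));
if (fs.exists(appHome)) {
  fs.delete(appHome, true);  // true = recursive delete
}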

