Command line arguments from json file #33

Open
dnr2 opened this issue May 7, 2013 · 19 comments

@dnr2
Member

dnr2 commented May 7, 2013

In issue #3 we started implementing the -in option, which lets the user specify a JSON file containing the names of the projects to be downloaded and analysed by Groundhog. We now want to extend this -in feature so that the JSON file provides not only the project names but also the main arguments that would otherwise be passed on the command line.

We will have to define the structure of the JSON file and decide which arguments it will contain. The current arguments (extracted from the Option class) are:

Option(name="-forge", usage="forge to be used in search and crawling process")
Option(name="-dest", usage="destination folder into which projects will be downloaded") 
Option(name="-out", usage="output folder to metrics files")
Option(name="-datetime", usage="datetime of projects source code to be processed")
Option(name="-nprojects", usage="maximum number of projects to be downloaded and processed")
Option(name="-nthreads", usage="maximum number of concurrent threads")  
Option(name="-o", usage="determine the output format of the metrics")   
Arguments( usage="list of names of the projects to be downloaded and processed" )

Therefore, I believe a good structure for the JSON file would be roughly the following:

{
    "forge": "github",
    "dest": "C:/groundhog/dest",
    "out": "C:/groundhog/metrics",
    "datetime": "2012-07-01_12_00",
    "nprojects": 30,
    "nthreads": 4,
    "outputformat": "csv",
    "search": [
        { "project":"rails" },
        { "project":"bootstrap" },
        { "username":"gustavopinto" }
    ]
}
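To make the correspondence between the proposed JSON keys and the existing options concrete, here is a minimal Java sketch. It assumes the file has already been parsed into a Map by some JSON library (none is named in this thread); the class JsonOptionMapping and its method toArgs are hypothetical and not part of Groundhog. Note that "outputformat" has to map onto "-o", and that "search" is skipped because it does not correspond to a single option.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns the top-level keys of the proposed JSON file into
// the equivalent command-line arguments. JSON parsing itself is assumed done.
final class JsonOptionMapping {
    // JSON key -> existing command-line option, in a fixed order.
    static final Map<String, String> KEY_TO_OPTION = new LinkedHashMap<>();

    static {
        KEY_TO_OPTION.put("forge", "-forge");
        KEY_TO_OPTION.put("dest", "-dest");
        KEY_TO_OPTION.put("out", "-out");
        KEY_TO_OPTION.put("datetime", "-datetime");
        KEY_TO_OPTION.put("nprojects", "-nprojects");
        KEY_TO_OPTION.put("nthreads", "-nthreads");
        KEY_TO_OPTION.put("outputformat", "-o"); // the only key whose name differs
    }

    // Emits "-option value" pairs for every known key present in the parsed JSON.
    // Unknown keys (including "search") are ignored here.
    static List<String> toArgs(Map<String, Object> json) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : KEY_TO_OPTION.entrySet()) {
            Object value = json.get(entry.getKey());
            if (value != null) {
                args.add(entry.getValue());
                args.add(String.valueOf(value));
            }
        }
        return args;
    }
}
```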

P.S.: @rodrigoalvesvieira argued that it would be better to leave out some arguments such as "nthreads" (allowing them only on the command line itself), because the JSON file should describe only the projects and the search, not the details of the computation. We can discuss his point of view.

@gustavopinto
Member

Sorry, maybe I missed the point. Is this the JSON file that will be passed as java -jar groundhog.jar ... -in projects.json?

If so, will it be the user's responsibility to create this file, or will Groundhog somehow create it? Personally, I think it would be very difficult to write a file like this by hand, which could become a barrier to Groundhog's adoption.

@dnr2
Member Author

dnr2 commented May 8, 2013

Yes @gustavopinto, that was the initial idea. We thought that, in the future or even now, Groundhog may require too many parameters to be passed on the command line, and that it would be tedious for users to type every single parameter each time they open a new console/terminal. Moreover, some terminals limit the length of command-line arguments [1] (although the original -in option already solved this). The JSON file would address both problems.

Another advantage is that whenever we want to add a new parameter that may take many arguments (such as searching for projects by username), it would be easy to adapt the JSON file to the new requirements.

I also think this JSON file wouldn't be hard to create or understand (we could also provide a sample file). Besides, users will still be able to use the traditional command-line parameters, so anyone unfamiliar with JSON or the -in input format can still use Groundhog.

But I understand that this format is not very user-friendly. So we could change it or maybe discard this idea and close this issue. What do you guys think? @fernandocastor, @rodrigoalvesvieira
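One way both input paths could coexist is to treat the JSON file as a set of defaults and let explicit command-line values win. This precedence rule is an assumption (the thread does not settle it), and the OptionMerge helper below is hypothetical; it works on already-parsed key/value pairs only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: combines options read from the -in JSON file with
// options typed on the command line.
final class OptionMerge {
    // Assumed precedence: a command-line value replaces the JSON value for the
    // same key; keys present in only one source are kept as-is.
    static Map<String, String> merge(Map<String, String> fromJson, Map<String, String> fromCli) {
        Map<String, String> merged = new HashMap<>(fromJson);
        merged.putAll(fromCli);
        return merged;
    }
}
```

With this rule, a user could keep forge, dest and out in the file and still pass a different -nthreads for a single run.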

[1] http://askubuntu.com/questions/14081/what-is-the-maximum-length-of-command-line-arguments-in-gnome-terminal

@fernandocastor
Member

I think this is, at least currently, our best option to specify the search parameters. Your arguments just strengthened this impression, @dnr2. I don't think using json is an obstacle as it would be if we employed XML or a domain-specific language. As for the parameters, we should focus on the ones we already know and create our system so that it is extensible.

@gustavopinto
Member

hmm.. ok! Great arguments! They really changed my opinion 😉

@gustavopinto
Member

Have you already started implementing this issue?

@rodrigoalvesvieira

@dnr2
Member Author

dnr2 commented May 10, 2013

Not yet @gustavopinto, I normally assign myself to an issue whenever I start implementing it.

@rodrigoalvesvieira

oops

@gustavopinto
Member

I changed the format of the search attribute a bit:

{
    "forge": "github",
    "dest": "C:/groundhog/dest",
    "out": "C:/groundhog/metrics",
    "datetime": "2012-07-01_12_00",
    "nprojects": 30,
    "nthreads": 4,
    "outputformat": "csv",
    "search": {
        "projects": ["rails", "bootstrap"],
        "username":"gustavopinto"
    }
}

But I'm wondering whether projects and username are independent or related. For example, are rails and bootstrap projects created by gustavopinto? Or does this file mean that I want to download rails and bootstrap and also all projects created by gustavopinto?

@rodrigoalvesvieira

For me, it'd mean: "download the 'rails' and 'bootstrap' projects from the user 'gustavopinto'". Anything else looks very confusing to me.

@gustavopinto
Member

Ok. Another question: are projects and username required? If the user does not pass the projects attribute, will it download all projects created by 'gustavopinto', or will it simply fail?

Moreover, could I pass more than one username?

@rodrigoalvesvieira

It would work. Specifying both projects and username just narrows the search (reducing the number of possible results). Providing only projects should download them regardless of the username, and providing only username should download all projects created by that user, as you mentioned.
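The rules just described (each field optional, fields given together narrowing the result) could be captured by a small predicate. The SearchClause class below is a hypothetical sketch: it matches on exact project name and owner login only, and it rejects a clause with neither field, since that would match every repository; that rejection is an assumption the thread does not address.

```java
import java.util.List;

// Hypothetical model of one "search" object; both fields are optional.
final class SearchClause {
    final List<String> projects; // empty means "any project name"
    final String username;       // null means "any owner"

    SearchClause(List<String> projects, String username) {
        this.projects = projects == null ? List.of() : List.copyOf(projects);
        this.username = username;
        if (this.projects.isEmpty() && username == null) {
            // Assumption: an empty clause is treated as a user error.
            throw new IllegalArgumentException("a search clause needs projects, a username, or both");
        }
    }

    // Every field that was given must match (AND); absent fields match anything.
    boolean matches(String projectName, String owner) {
        boolean projectOk = projects.isEmpty() || projects.contains(projectName);
        boolean userOk = username == null || username.equals(owner);
        return projectOk && userOk;
    }
}
```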

@dnr2
Member Author

dnr2 commented May 13, 2013

Agreed! I think the same way as @rodrigoalvesvieira. Still, we should consider that users may want to run different kinds of searches at once, e.g. I may want to search for both (projects related to groundhog created by the user gustavopinto) AND (projects related to bootstrap created by the user dnr2). We could provide this by modifying the structure of the JSON (possibly with an array of searches), but this may become a bit complicated for the user.

@fernandocastor
Member

I agree with @dnr2 that we should provide some way for users to specify both ANDs and ORs. To me, the simplest answer would be to treat username and projects as specifying sets of projects, with multiple items within a single search clause always having AND semantics. For example:

"search": {
    "projects": ["rails", "bootstrap"],
    "username": "gustavopinto"
}

would mean "download projects rails and bootstrap created by the user gustavopinto". What if we want to download every project by user gustavopinto and, at the same time, projects named rails and bootstrap? We could specify two different search clauses in the same JSON file:

"search": {
    "projects": ["rails", "bootstrap"]
}

"search": {
    "username": "gustavopinto"
}

This would have OR semantics instead of AND and would retrieve a considerably larger number of projects. A relevant question in this case is: what if there are multiple projects named "rails"? Do we download them all? Moreover, what other kinds of options are we interested in supporting? For example, do we need to support a search where the user wants to download only projects FORKED by the user "gustavopinto"? Would that be required to answer any of those RQs?
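The AND-within-a-clause, OR-across-clauses semantics proposed above can be sketched with java.util.function.Predicate composition. SearchQuery, Repo, clause and anyOf are hypothetical names for illustration only; they do not describe how the implementation in the later commit works.

```java
import java.util.List;
import java.util.function.Predicate;

final class SearchQuery {
    // Hypothetical repository descriptor used only for this sketch.
    static final class Repo {
        final String name;
        final String owner;

        Repo(String name, String owner) {
            this.name = name;
            this.owner = owner;
        }
    }

    // Fields inside one clause are ANDed; a missing field adds no constraint.
    static Predicate<Repo> clause(List<String> projects, String username) {
        Predicate<Repo> p = r -> true;
        if (projects != null && !projects.isEmpty()) {
            p = p.and(r -> projects.contains(r.name));
        }
        if (username != null) {
            p = p.and(r -> username.equals(r.owner));
        }
        return p;
    }

    // Separate clauses are ORed: a repository is selected if any clause accepts it.
    static Predicate<Repo> anyOf(List<Predicate<Repo>> clauses) {
        List<Predicate<Repo>> copy = List.copyOf(clauses);
        return r -> copy.stream().anyMatch(c -> c.test(r));
    }
}
```

Because a clause with only projects matches by name, this sketch selects every repository called rails regardless of owner; whether Groundhog should do that is exactly the open question raised above.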

What do you think of this solution?

@gustavopinto
Member

The last commit enables Groundhog to use AND and OR semantics (only through the JSON file). In the future we can add more parameters, such as is_fork or watchers.

@rodrigoalvesvieira

whoa! 👍

@dnr2
Member Author

dnr2 commented May 21, 2013

Cool!! =D

@dnr2
Member Author

dnr2 commented Jun 23, 2013

@gustavopinto, has this issue already been implemented?

If so, we should close it...

@gustavopinto
Member

This issue is labeled 'continuous', so it may change as Groundhog evolves; we should therefore keep it open.

4 participants