Command line arguments from json file #33

dnr2 · 2013-05-07T23:40:49Z

In issue #3 we started implementing the -in option, which lets the user specify a json file containing the names of the projects to be downloaded and analysed by Groundhog. Now we are going to improve -in so that the json file provides not only the project names but also the main arguments that would otherwise be passed on the command line.

We will have to define the structure of the json file and decide which arguments it will contain. The current arguments are (extracted from the Option class):

Option(name="-forge", usage="forge to be used in search and crawling process")
Option(name="-dest", usage="destination folder into which projects will be downloaded") 
Option(name="-out", usage="output folder to metrics files")
Option(name="-datetime", usage="datetime of projects source code to be processed")
Option(name="-nprojects", usage="maximum number of projects to be downloaded and processed")
Option(name="-nthreads", usage="maximum number of concurrent threads")  
Option(name="-o", usage="determine the output format of the metrics")   
Arguments( usage="list of names of the projects to be downloaded and processed" )

Therefore I believe that a good structure for the json file would be roughly:

{
    "forge": "github",
    "dest": "C:/groundhog/dest",
    "out": "C:/groundhog/metrics",
    "datetime": "2012-07-01_12_00",
    "nprojects": 30,
    "nthreads": 4,
    "outputformat": "csv",
    "search": [
        { "project":"rails" },
        { "project":"bootstrap" },
        { "username":"gustavopinto" }
    ]
}

P.S.: @rodrigoalvesvieira argued that it would be better to omit some arguments such as "nthreads" (allowing them only on the command line itself), because this json file should only describe the projects and the search, not the details of the computation. We can discuss his point of view.
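For illustration, the top-level keys of the proposed file could be mirrored by a small holder class. This is only a sketch: the class name `InputSettings` and its method are hypothetical, and the datetime pattern is an assumption inferred solely from the sample value "2012-07-01_12_00". No JSON library is used here; the fields would be filled by whatever parser is chosen.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical holder for the top-level keys of the proposed json file.
class InputSettings {
    // Assumed pattern, inferred only from the sample value "2012-07-01_12_00".
    static final DateTimeFormatter DATETIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH_mm");

    String forge;            // corresponds to -forge
    String dest;             // corresponds to -dest
    String out;              // corresponds to -out
    LocalDateTime datetime;  // corresponds to -datetime
    int nprojects;           // corresponds to -nprojects
    int nthreads;            // corresponds to -nthreads
    String outputformat;     // corresponds to -o

    // Converts the raw "datetime" string into a typed value;
    // throws DateTimeParseException if the text does not fit the assumed pattern.
    static LocalDateTime parseDatetime(String raw) {
        return LocalDateTime.parse(raw, DATETIME);
    }
}

class InputSettingsDemo {
    public static void main(String[] args) {
        InputSettings settings = new InputSettings();
        settings.forge = "github";
        settings.nprojects = 30;
        settings.datetime = InputSettings.parseDatetime("2012-07-01_12_00");
        System.out.println(settings.datetime.getYear()); // 2012
    }
}
```

If "nthreads" ends up being command-line only, as suggested in the P.S., the corresponding field would simply be dropped from such a holder.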

gustavopinto · 2013-05-08T01:17:29Z

Sorry, maybe I missed the point. Is this json the file that will be provided to java -jar groundhog.jar ... -in projects.json?

If so, will it be the user's responsibility to create this file? Or will groundhog somehow create it? I think it would be very difficult for me to create a file like this by hand, which could be a barrier to groundhog's adoption.

❓

dnr2 · 2013-05-08T02:40:20Z

Yes @gustavopinto, that was the initial idea. We thought that in the future, or even now, groundhog may require too many parameters to be passed on the command line, and that it would be tedious for the user to type every parameter each time they open a new console/terminal. Moreover, some terminals limit the length of the command line [1] (although the original -in option already solved this). The json file would be a solution to these problems.

Another advantage is that whenever we want to create a new parameter that may take many arguments (like searching for projects by usernames) it would be easy to adapt this json file to the new requirements.

I also think that this json wouldn't be so difficult to create/understand (we could also provide a sample json file). Besides, the user will still be able to use the traditional command line parameters, so anyone that is not familiar with json or the -in input format will still be capable of using groundhog.

But I understand that this format is not so user friendly. So we could change it or, maybe, discard this idea and close this issue. What do you guys think? @fernandocastor, @rodrigoalvesvieira

[1] http://askubuntu.com/questions/14081/what-is-the-maximum-length-of-command-line-arguments-in-gnome-terminal

fernandocastor · 2013-05-08T10:21:24Z

I think this is, at least currently, our best option to specify the search parameters. Your arguments just strengthened this impression, @dnr2. I don't think using json is an obstacle as it would be if we employed XML or a domain-specific language. As for the parameters, we should focus on the ones we already know and create our system so that it is extensible.

gustavopinto · 2013-05-08T12:00:28Z

hmm.. ok! great arguments! They really changed my opinion 😉

gustavopinto · 2013-05-10T00:46:59Z

Have you already started the implementation of this issue?

rodrigoalvesvieira · 2013-05-10T03:22:54Z

he has https://github.com/spgroup/groundhog/tree/ft-metrics-output-csv

dnr2 · 2013-05-10T03:28:16Z

Not yet @gustavopinto, I normally assign myself to an issue whenever I start implementing it.

rodrigoalvesvieira · 2013-05-10T03:54:50Z

oops

gustavopinto · 2013-05-12T02:28:13Z

I changed the json format of the search attribute a bit.

{
    "forge": "github",
    "dest": "C:/groundhog/dest",
    "out": "C:/groundhog/metrics",
    "datetime": "2012-07-01_12_00",
    "nprojects": 30,
    "nthreads": 4,
    "outputformat": "csv",
    "search": {
        "projects": ["rails", "bootstrap"],
        "username":"gustavopinto"
    }
}

But I'm wondering whether projects and username are independent or related. For example, are rails and bootstrap projects created by gustavopinto? Or, with this file, do I want to download rails and bootstrap and also download all projects created by gustavopinto?

rodrigoalvesvieira · 2013-05-12T13:59:39Z

For me, it'd mean: "download the 'rails' and 'bootstrap' projects from the user 'gustavopinto'". Anything else looks very confusing to me.

gustavopinto · 2013-05-12T15:10:51Z

Ok. Another question: are projects and username required? If the user does not pass the projects attribute, will it download all projects created by 'gustavopinto'? Or will it simply not work?

Moreover, could I pass more than one username?

rodrigoalvesvieira · 2013-05-12T15:58:48Z

It would work. Adding both projects and username is just a way of narrowing the search (reducing the number of possible results). Providing only projects should download them regardless of the username, and providing only username should download all projects created by that user, as you mentioned.
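The rule described here can be sketched as a single predicate. This is only an illustration: the `Project` and `SearchClause` classes and their fields are hypothetical names, not taken from Groundhog's code, and a missing attribute is modeled as `null`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a project found on a forge.
class Project {
    final String name;
    final String owner;

    Project(String name, String owner) {
        this.name = name;
        this.owner = owner;
    }
}

// Hypothetical model of the "search" object; null means "attribute not given".
class SearchClause {
    final List<String> projects; // e.g. ["rails", "bootstrap"], or null
    final String username;       // e.g. "gustavopinto", or null

    SearchClause(List<String> projects, String username) {
        this.projects = projects;
        this.username = username;
    }

    // Every attribute that is present must match; absent attributes do not filter.
    boolean matches(Project p) {
        boolean nameOk = projects == null || projects.contains(p.name);
        boolean ownerOk = username == null || username.equals(p.owner);
        return nameOk && ownerOk;
    }
}

class NarrowingDemo {
    public static void main(String[] args) {
        SearchClause both = new SearchClause(Arrays.asList("rails", "bootstrap"), "gustavopinto");
        SearchClause onlyUser = new SearchClause(null, "gustavopinto");

        System.out.println(both.matches(new Project("rails", "gustavopinto")));         // true
        System.out.println(both.matches(new Project("groundhog", "gustavopinto")));     // false
        System.out.println(onlyUser.matches(new Project("groundhog", "gustavopinto"))); // true
    }
}
```

Under this sketch a clause with neither attribute would match every project; whether such a clause should be rejected instead is not settled in this thread.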

dnr2 · 2013-05-13T03:59:04Z

Agreed! I think the same way as @rodrigoalvesvieira. However, we should consider that the user may want to make different kinds of searches at once. E.g., I may want to search for both (projects related to: groundhog created by the user: gustavopinto) AND (projects related to: bootstrap created by the user: dnr2). We could provide this functionality by modifying the structure of the JSON (possibly creating an array of searches), but this may become a bit complicated for the user.

fernandocastor · 2013-05-13T23:51:47Z

I agree with @dnr2 that we should provide some kind of operator for users to specify both ANDs and ORs. To me, the simplest answer would be to think of username and projects as specifying sets of projects, with multiple items within a search clause always having AND semantics. For example:

"search": {
    "projects": ["rails", "bootstrap"],
    "username": "gustavopinto"
}

would mean "download projects rails and bootstrap created by the user gustavopinto". What if we want to download every project by user gustavopinto and, at the same time, projects named rails and bootstrap? We could specify two different search clauses in the same JSON file:

"search": {
    "projects": ["rails", "bootstrap"]
}

"search": {
    "username": "gustavopinto"
}

This would have OR semantics instead of AND, and would retrieve a considerably larger number of projects. A relevant question in this case is: what if there are multiple projects named "rails"? Do we download them all? Moreover, what other kinds of options are we interested in supporting? For example, do we need to support a search where the user wants to download only projects FORKED by the user "gustavopinto"? Would that be required to answer any of those RQs?

What do you think of this solution?
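A rough sketch of these semantics follows: AND between the attributes of one clause, OR across clauses. All class names (`Repo`, `Clause`, `SearchSpec`) are hypothetical and not taken from Groundhog's code. The sketch assumes the clauses arrive as a list, for instance from the array of searches mentioned by @dnr2 above, because RFC 8259 says that names within a JSON object should be unique and that many implementations report only the last value for a repeated name such as "search". It also assumes that every project matching a name is selected when several share it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a project found on a forge.
class Repo {
    final String name;
    final String owner;

    Repo(String name, String owner) {
        this.name = name;
        this.owner = owner;
    }
}

// One search clause: attributes that are present are combined with AND.
class Clause {
    final List<String> projects; // null means "not given"
    final String username;       // null means "not given"

    Clause(List<String> projects, String username) {
        this.projects = projects;
        this.username = username;
    }

    boolean matches(Repo r) {
        return (projects == null || projects.contains(r.name))
            && (username == null || username.equals(r.owner));
    }
}

// Several clauses in one file: a project is selected if ANY clause matches (OR).
class SearchSpec {
    final List<Clause> clauses;

    SearchSpec(List<Clause> clauses) {
        this.clauses = clauses;
    }

    List<Repo> select(List<Repo> candidates) {
        List<Repo> selected = new ArrayList<>();
        for (Repo r : candidates) {
            for (Clause c : clauses) {
                if (c.matches(r)) {
                    selected.add(r); // add once, even if several clauses match
                    break;
                }
            }
        }
        return selected;
    }
}

class SearchSpecDemo {
    public static void main(String[] args) {
        List<Repo> candidates = Arrays.asList(
            new Repo("rails", "rails"),
            new Repo("groundhog", "gustavopinto"),
            new Repo("jquery", "jquery"));
        SearchSpec spec = new SearchSpec(Arrays.asList(
            new Clause(Arrays.asList("rails", "bootstrap"), null),
            new Clause(null, "gustavopinto")));
        System.out.println(spec.select(candidates).size()); // 2
    }
}
```

Putting both attributes in a single clause narrows the result, while splitting them into two clauses widens it, which is the difference between the two examples above.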

gustavopinto · 2013-05-19T03:19:56Z

The last commit enables groundhog to use AND and OR semantics (only through the json file). In the future we can add more parameters, such as is_fork or watchers.

rodrigoalvesvieira · 2013-05-20T13:26:29Z

whoa! 👍

dnr2 · 2013-05-21T04:32:27Z

Cool!! =D

dnr2 · 2013-06-23T06:23:21Z

@gustavopinto, Is this issue already implemented?

If the answer is yes, then we should close it...

gustavopinto · 2013-06-23T22:48:57Z

This issue was labeled as 'continuous'. So it may change as groundhog evolves, and thus we should keep it open.

ghost assigned gustavopinto May 12, 2013

gustavopinto pushed a commit that referenced this issue May 12, 2013

importing json file from command line. related to #33

d629494

dnr2 mentioned this issue May 13, 2013

-in option to search from file #3

Closed

gustavopinto pushed a commit that referenced this issue May 17, 2013

using args4j as default. working on #33

8b51012

gustavopinto pushed a commit that referenced this issue May 17, 2013

initializing groundhog parameters thru json file. a lot of work still have to be done on #33

ffbd6f9

gustavopinto pushed a commit that referenced this issue May 17, 2013

improving log messages. #33 and #6

8414def

gustavopinto pushed a commit that referenced this issue May 17, 2013

few changes in the code. #33 and #6

7a9149f

gustavopinto pushed a commit that referenced this issue May 17, 2013

with method to gather projects list by user. working on #33

2464e8b

gustavopinto pushed a commit that referenced this issue May 19, 2013

merging from master.. still working on #33

5002628

gustavopinto pushed a commit that referenced this issue May 19, 2013

download projects from a giver user. almost done. rel #33

7abcafb

gustavopinto pushed a commit that referenced this issue Jun 5, 2013

adding tests and improving code quality. rel #33

46ef0f5

dnr2 mentioned this issue Jun 23, 2013

Download projects by username #17

Closed
