
Implementing few features #1

Open · wants to merge 98 commits into master

Conversation

kanishka-linux

The following features, corresponding to JSON Schema draft 4, have been implemented (an illustrative usage sketch follows the list):

  1. type
  2. anyOf
  3. required
  4. allOf
  5. enum
  6. minimum
  7. maximum
  8. pattern
  9. items
  10. minItems
  11. maxItems
  12. uniqueItems
  13. minLength
  14. maxLength
  15. minProperties
  16. maxProperties
  17. additionalItems
  18. additionalProperties
  19. properties
  20. multipleOf
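
For illustration, a schema exercising several of these keywords could be fed to the generator entry point that appears throughout this diff. This is a hypothetical usage sketch: it assumes Jake.gen_init/1 takes a decoded schema map and returns a StreamData generator; the exact public entry point may differ.

schema = %{
  "type" => "array",
  "items" => %{"type" => "integer", "minimum" => 0, "maximum" => 10},
  "minItems" => 1,
  "maxItems" => 5,
  "uniqueItems" => true
}

# Sample a few generated values (the shape of the output is illustrative only).
schema |> Jake.gen_init() |> Enum.take(3)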

lib/jake.ex Outdated
_ ->
list
end

Contributor:

Why do we have to do all of this? Can't we just use member_of, assuming the given schema is valid?

Author:

Consider the case where the enum list for a number type contains null or string items alongside regular numbers. If we use only member_of without filtering first, we may get a wrong result even though a correct one is possible.

You can also check out the integer enum test.
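
A minimal sketch of the filtering idea, using a plain Enum.filter before member_of (illustrative only, not the exact code in this PR):

enum = [10, 12, nil, "x"]

enum
|> Enum.filter(&is_number/1)   # drop members that are not numbers
|> StreamData.member_of()      # then pick only among the valid members
|> Enum.take(5)
# => values drawn only from [10, 12]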

Contributor:

I had assumed the enum values would always be correct; I don't understand why anybody would use invalid values.

But let's say we follow the spec and validate each value against the enclosing schema: we only seem to handle the type, right? What if it's a more complex case with other constraints like maximum? Do you think this could be handled in a simpler way?

Author:

> I had assumed the enum values would always be correct

In fact, I had assumed the same, but later I looked at the schema documentation, which says that the elements in the enum array should also be valid against the enclosing schema. Therefore, I decided to filter the enum list depending on the schema type.

> But let's say we follow the spec and validate each value against the enclosing schema: we only seem to handle the type, right?

Yes.

> What if it's a more complex case with other constraints like maximum? Do you think this could be handled in a simpler way?

I thought about it too, but it is difficult, so I handled only the simple case. I think validating a particular value against a schema is out of scope for this project and is a separate project in itself. For that we could use something like ExJsonSchema.Validator at the start to filter out the valid enum values, which could then be fed to StreamData.
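
A rough sketch of that approach, assuming the ex_json_schema package is added as a dependency (hypothetical usage, not part of this PR):

# Pre-filter enum members with ExJsonSchema, then generate from the survivors.
schema = ExJsonSchema.Schema.resolve(%{"type" => "integer", "maximum" => 20})

[10, 12, nil, "x", 100]
|> Enum.filter(&ExJsonSchema.Validator.valid?(schema, &1))
|> StreamData.member_of()
|> Enum.take(5)
# => values drawn only from [10, 12]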

Member:

This is possibly a check on the sanity of the schema itself? It complicates the code here. A separate tool could probably be used to warn the user or show an error if the schema declares the wrong enclosing type for the given enums. This library could assume that the schema is correct and concentrate only on generating data for a correct schema.

Author:

> This library could assume that the schema is correct.

It will certainly simplify things.


def gen_array(map, type), do: arraytype(map, map["enum"], map["items"])

def arraytype(map, enum, items) when enum != nil, do: Jake.gen_enum(enum, "array")
Contributor:

Could this happen? Won't the enum get handled in the Jake module?

Author:

Consider the schema {"type": "integer", "enum": [10, 12, null]}. It will first check the type and enter the Jake.Number module; if an enum list is found there, it will be handled by that module.

If the schema is {"enum": [10, 12, null]}, without a type, then it will be handled by the Jake module.

Contributor:

But the pattern match for enum happens early, so this will never get executed?

  def gen_all(map, enum, type) when enum != nil, do: gen_enum(enum, type)

Author:

Good catch! I'll move it lower.

Contributor:

By the way, is it really necessary to handle enum in every type-specific generator? It's going to call the root module anyway. Could we instead just remove the enum handling from all the type-specific generators?

Author:

I was also thinking the same. I'll check it out!

Author:

> Could we instead just remove the enum handling from all the type-specific generators?

It won't handle a case like {"type": ["integer", "string"], "enum": [1, 2, 3, "hello", null, true]}; all other cases will work. To accept the above case, we'll need to change or override the Jake.gen_enum() function to handle a list of types.

I'll see if it can be done without complicating things.
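
A possible sketch of a gen_enum clause that accepts a list of types (module and helper names are assumed here, not the final implementation):

defmodule JakeEnumSketch do
  # Hypothetical: keep enum members matching any of the allowed JSON types,
  # then pick among the survivors.
  def gen_enum(enum, types) when is_list(types) do
    enum
    |> Enum.filter(fn value -> Enum.any?(types, &json_type?(value, &1)) end)
    |> StreamData.member_of()
  end

  defp json_type?(v, "integer"), do: is_integer(v)
  defp json_type?(v, "number"), do: is_number(v)
  defp json_type?(v, "string"), do: is_binary(v)
  defp json_type?(v, "boolean"), do: is_boolean(v)
  defp json_type?(v, "null"), do: is_nil(v)
  defp json_type?(v, "array"), do: is_list(v)
  defp json_type?(v, "object"), do: is_map(v)
  defp json_type?(_v, _), do: false
end

# JakeEnumSketch.gen_enum([1, 2, 3, "hello", nil, true], ["integer", "string"])
# generates only 1, 2, 3 or "hello".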

for(n <- @type_list, is_map(n), do: Jake.gen_init(n)) |> StreamData.one_of()
end

def add_additional_items(list, bool, max, min) when is_boolean(bool) and bool do
Contributor:

Is this the same as pattern matching on bool = true?

Author:

Yes, it is the same as bool == true, but it gives a much better idea of the code, since bool can also be a map, which is handled by another function clause.
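
As a small illustration of that dispatch (hypothetical module, reduced to the decision itself):

defmodule AdditionalItemsDispatch do
  # Matching the literal is equivalent to the boolean guard; a separate
  # clause still handles the schema-map form of "additionalItems".
  def kind(true), do: :any_extra_items_allowed
  def kind(false), do: :no_extra_items
  def kind(schema) when is_map(schema), do: :extra_items_must_match_schema
end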


def add_additional_items(list, bool, max, min) when is_boolean(bool) and not bool do
if length(list) in min..max do
StreamData.fixed_list(list)
Contributor:

What will happen in the else case? Could it return nil?

Author (kanishka-linux, Nov 14, 2018):

If we have a list but additional items are not allowed, and length(list) (i.e. the total number of items) does not fit in the range, then the test will fail.

Contributor:

I have seen this pattern in multiple places; my question is why we are returning nil when we know it's going to fail anyway. Assuming the expected return type is a generator, returning nil here would cause some random error in some unrelated place?

Author:

One way I look at it is that it is easier to understand what the code is doing, even after a few years.

> Assuming the expected return type is a generator, returning nil here would cause some random error in some unrelated place?

Yes, that's possible. Maybe we can use a try..catch block or raise some custom error to make things better. I think nil can also be used as an error code in this case, meaning "no generator", and then the caller needs to handle it.

Member:

Throwing an error would be better. Using nil as an error is an assumption that would have to be remembered. Explicit is better.

Author:

> Throwing an error would be better.

I'm thinking the same.
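
A sketch of the explicit-error clause (function name and argument order follow the diff above; the error message is assumed):

defmodule ArraySketch do
  # Raise instead of returning nil when additionalItems is false and the
  # fixed items list cannot satisfy the minItems/maxItems range.
  def add_additional_items(list, false, max, min) do
    unless length(list) in min..max do
      raise ArgumentError, "#{length(list)} items do not fit in #{min}..#{max}"
    end

    StreamData.fixed_list(list)
  end
end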

end

def get_one_of() do
for(n <- @type_list, is_map(n), do: Jake.gen_init(n)) |> StreamData.one_of()
Contributor:

Could this reuse the notype generator?

Author (kanishka-linux, Nov 14, 2018):

No. In notype, a type is added to the schema depending on the properties.

In get_one_of, that is not needed, since we already know the type; here we use it to generate random values of random types when no information is given about the array items. I think we could also use StreamData.term() here. What do you think?

Contributor:

StreamData.term() might not generate valid JSON values (a tuple, for example). I was assuming notype should handle cases like the empty spec {} and would generate some valid value.
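
One way to stay within JSON-representable terms would be a fixed union of the JSON primitive generators (a sketch, not the existing notype implementation):

# Sketch: an alternative to StreamData.term/0 that only yields JSON values.
json_scalar =
  StreamData.one_of([
    StreamData.constant(nil),
    StreamData.boolean(),
    StreamData.integer(),
    StreamData.float(),
    StreamData.string(:printable)
  ])

json_value =
  StreamData.one_of([
    json_scalar,
    StreamData.list_of(json_scalar),
    StreamData.map_of(StreamData.string(:printable), json_scalar)
  ])

Enum.take(json_value, 5)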

StreamData.integer()
@num_min 0

@num_max 100_000
Contributor:

Why is the default so low? The generator should ideally generate every possible value, assuming the cost of generation is not that high.

Author:

They are just temporary; I also wanted to discuss the possible limits. The cost of generation is not high, except for floating-point multipleOf, which I'll look into afterwards.

end

def get_float_number(min, max) do
StreamData.filter(StreamData.float([{:min, min}, {:max, max}]), fn x -> true end, 100)
Contributor:

Why do we use filter here? It seems to always return true.

Author:

For smaller floating-point ranges it was throwing FilterTooNarrowError, so I'm using filter only for the sake of retrying up to 100 times.

If there is another way to keep retrying after a failure, we can change that option.

Contributor (ananthakumaran, Nov 14, 2018):

filter and bind_filter should be avoided if possible, or used only in cases where we know we will filter out just a small fraction of the values produced by the underlying generator.

I might be misunderstanding the problem. Why would StreamData.float throw FilterTooNarrowError?

Author:

It seems I made a mistake while testing. It won't throw FilterTooNarrowError; I was using a different method earlier which gave that error. After studying StreamData properly and using :min and :max with StreamData.float, I did not get that error. I'll remove the unnecessary filter.
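
For reference, the filter-free version just passes the bounds directly (a sketch of the intended replacement for the function shown above):

def get_float_number(min, max) do
  # StreamData.float/1 accepts :min and :max options, so no filter is needed.
  StreamData.float(min: min, max: max)
end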

re = Randex.stream(~r/[a-zA-Z0-9\_]{#{min},#{max}}/)

if min <= max do
StreamData.bind_filter(StreamData.string(:alphanumeric), fn
Contributor:

Why not use min_length/max_length instead of bind_filter?

Author:

It was written before I had read the StreamData.string documentation properly. Later I kept it just to get your opinion on the manual approach above.
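
For reference, the option-based version would be roughly (sketch, assuming min <= max):

# StreamData.string/2 accepts :min_length and :max_length directly.
StreamData.string(:alphanumeric, min_length: min, max_length: max)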

{:cont, StreamData.constant(x)}

x when true ->
{:cont, Enum.take(re, 1) |> StreamData.member_of()}
Contributor:

Will this return the same value every time?

Author:

No, when I checked, it was returning a different value every time.

Contributor:

OK, but it's not supposed to be used like this. You could just use Randex.stream(~r/[a-zA-Z0-9\_]{#{min},#{max}}/, mod: Randex.Generator.StreamData), because Randex itself returns a generator.

Author:

> it's not supposed to be used like this

I've realised that too. In the beginning I was somewhat confused about how to convert a Randex.stream into StreamData; I only learned how when I looked at the activesphere/jake implementation. But as I said in the previous post, I wanted to know your opinion on it.

I'll change the method.
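
The resulting call, following the suggestion above (sketch; min and max are bound here only for illustration):

min = 3
max = 8

# With the StreamData backend, Randex returns a generator directly, so the
# Enum.take/member_of round-trip is not needed.
~r/[a-zA-Z0-9\_]{#{min},#{max}}/
|> Randex.stream(mod: Randex.Generator.StreamData)
|> Enum.take(3)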


if min <= max do
StreamData.bind_filter(pat, fn
x when byte_size(x) in min..max -> {:cont, StreamData.constant(x)}
Contributor:

What is the difference between bind_filter and filter? Is bind_filter necessary here?

Author:

I'll check this out!

Author:

bind_filter is not necessary here; it can be done with just filter.
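
The filter-only form would look something like this (sketch; assumes pat is a StreamData generator of strings and min/max are in scope, as in the snippet above):

# Keep only pattern-generated strings whose byte size falls in the range.
StreamData.filter(pat, fn x -> byte_size(x) in min..max end)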

{min, max} = get_min_max(map)

case map["additionalItems"] do
x when (is_boolean(x) and x) or is_nil(x) ->
Member:

How about

      x when is_map(x) ->
        generate_list(list, Jake.gen_init(map), max, min)
      x when x == true or x == nil -> 
        generate_list(list, get_one_of(), max, min)
      _ -> 
        unless length(list) in min..max, do: raise ...
        StreamData.fixed_list(list)

That avoids the extra indirection; otherwise it becomes difficult to understand the code.

Author:

This is much better. I'll apply the necessary changes.


when length(list) in min..max ->
{:cont, StreamData.constant(list)}

nlist when true ->
Member:

Suggested change:

- nlist when true ->
+ _ -> :skip

Won't that be enough?

Author:

> Won't that be enough?

Certainly.

end

def decide_min_max(map, item, min, max)
when is_integer(min) and is_integer(max) and min < max do
Member:

What happens if this doesn't match? If it should throw an error, then being explicit about it might be better.

{min, max}
end

def arraytype(map, enum, items) when is_nil(items) and is_nil(enum) do
Member:

This could possibly be combined with the previous arraytype definition (assuming we handle enum as in one of the previous comments).

def list_gen(map, items) when is_nil(items) or is_map(items) do
  item = if is_nil(items), do: get_one_of(), else: Jake.gen_init(items)
  {min, max} = get_min_max(map)
  decide_min_max(map, item, min, max)
end


@max_items 1000

def gen_array(map, type), do: arraytype(map, map["enum"], map["items"])
Member (ciju, Nov 18, 2018):

Maybe we can name things by what they are about. E.g. "array" could refer to either the list or the tuple case. The code seems to handle both, but the cases could probably be named better.

E.g. rename the arraytype functions to either list_gen or tuple_gen based on the case handled:

def gen_array(%{"items" => items} = map, _) do
  cond do
    is_map(items) or items == nil -> list_gen(map, items) # note: not passing map["enum"], as mentioned in a previous comment
    is_list(items) -> tuple_gen(map, items)
    true -> raise error # ?
  end
end

Author:

I agree about naming things properly; it will avoid unnecessary ambiguity.
