
Implementing few features #1

Open · wants to merge 98 commits into master

Conversation

kanishka-linux

The following features, corresponding to JSON Schema draft 4, have been implemented (an illustrative usage sketch follows the list):

  1. type
  2. anyOf
  3. required
  4. allOf
  5. enum
  6. minimum
  7. maximum
  8. pattern
  9. items
  10. minItems
  11. maxItems
  12. uniqueItems
  13. minLength
  14. maxLength
  15. minProperties
  16. maxProperties
  17. additionalItems
  18. additionalProperties
  19. properties
  20. multipleOf
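
For illustration, a schema exercising several of these keywords could be fed to the generator entry point that appears throughout this diff. This is a hypothetical usage sketch: it assumes Jake.gen_init/1 takes a decoded schema map and returns a StreamData generator; the exact public entry point may differ.

schema = %{
  "type" => "array",
  "items" => %{"type" => "integer", "minimum" => 0, "maximum" => 10},
  "minItems" => 1,
  "maxItems" => 5,
  "uniqueItems" => true
}

# Sample a few generated values (the shape of the output is illustrative only).
schema |> Jake.gen_init() |> Enum.take(3)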

lib/jake.ex Outdated
_ ->
list
end

Contributor:

Why do we have to do all of this? Can't we just use member_of, assuming the given schema is valid?

Author:

Consider the case where the enum list for a number type contains null or string items alongside regular numbers. If we use only member_of without filtering first, we may get a wrong result even though a correct one is possible.

You can also check out the integer enum test.
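
A minimal sketch of the filtering idea, using a plain Enum.filter before member_of (illustrative only, not the exact code in this PR):

enum = [10, 12, nil, "x"]

enum
|> Enum.filter(&is_number/1)   # drop members that are not numbers
|> StreamData.member_of()      # then pick only among the valid members
|> Enum.take(5)
# => values drawn only from [10, 12]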

Contributor:

I had assumed the enum values would always be correct; I don't understand why anybody would use invalid values.

But let's say we follow the spec and validate each value against the enclosing schema: we only seem to handle the type, right? What if it's a more complex case with other constraints like maximum? Do you think this could be handled in a simpler way?

Author:

> I had assumed the enum values would always be correct

In fact, I had assumed the same, but later I looked at the schema documentation, which says that the elements in the enum array should also be valid against the enclosing schema. Therefore, I decided to filter the enum list depending on the schema type.

> But let's say we follow the spec and validate each value against the enclosing schema: we only seem to handle the type, right?

Yes.

> What if it's a more complex case with other constraints like maximum? Do you think this could be handled in a simpler way?

I thought about it too, but it is difficult, so I handled only the simple case. I think validating a particular value against a schema is out of scope for this project and is a separate project in itself. For that we could use something like ExJsonSchema.Validator at the start to filter out the valid enum values, which could then be fed to StreamData.
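
A rough sketch of that approach, assuming the ex_json_schema package is added as a dependency (hypothetical usage, not part of this PR):

# Pre-filter enum members with ExJsonSchema, then generate from the survivors.
schema = ExJsonSchema.Schema.resolve(%{"type" => "integer", "maximum" => 20})

[10, 12, nil, "x", 100]
|> Enum.filter(&ExJsonSchema.Validator.valid?(schema, &1))
|> StreamData.member_of()
|> Enum.take(5)
# => values drawn only from [10, 12]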

Member:

This is possibly a check on the sanity of the schema itself? It complicates the code here. A separate tool could probably be used to warn the user or show an error if the schema declares the wrong enclosing type for the given enums. This library could assume that the schema is correct and concentrate only on generating data for a correct schema.

Author:

> This library could assume that the schema is correct.

It will certainly simplify things.


def gen_array(map, type), do: arraytype(map, map["enum"], map["items"])

def arraytype(map, enum, items) when enum != nil, do: Jake.gen_enum(enum, "array")
Contributor:

Could this happen? Won't the enum get handled in the Jake module?

Author:

Consider the schema {"type": "integer", "enum": [10, 12, null]}. It will first check the type and enter the Jake.Number module; if an enum list is found there, it will be handled by that module.

If the schema is {"enum": [10, 12, null]}, without a type, then it will be handled by the Jake module.

Contributor:

But the pattern match for enum happens early, so this will never get executed?

  def gen_all(map, enum, type) when enum != nil, do: gen_enum(enum, type)

Author:

Good catch! I'll move it lower.

Contributor:

By the way, is it really necessary to handle enum in every type-specific generator? It's going to call the root module anyway. Could we instead just remove the enum handling from all the type-specific generators?

Author:

I was also thinking the same. I'll check it out!

Author:

> Could we instead just remove the enum handling from all the type-specific generators?

It won't handle a case like {"type": ["integer", "string"], "enum": [1, 2, 3, "hello", null, true]}; all other cases will work. To accept the above case, we'll need to change or override the Jake.gen_enum() function to handle a list of types.

I'll see if it can be done without complicating things.
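
A possible sketch of a gen_enum clause that accepts a list of types (module and helper names are assumed here, not the final implementation):

defmodule JakeEnumSketch do
  # Hypothetical: keep enum members matching any of the allowed JSON types,
  # then pick among the survivors.
  def gen_enum(enum, types) when is_list(types) do
    enum
    |> Enum.filter(fn value -> Enum.any?(types, &json_type?(value, &1)) end)
    |> StreamData.member_of()
  end

  defp json_type?(v, "integer"), do: is_integer(v)
  defp json_type?(v, "number"), do: is_number(v)
  defp json_type?(v, "string"), do: is_binary(v)
  defp json_type?(v, "boolean"), do: is_boolean(v)
  defp json_type?(v, "null"), do: is_nil(v)
  defp json_type?(v, "array"), do: is_list(v)
  defp json_type?(v, "object"), do: is_map(v)
  defp json_type?(_v, _), do: false
end

# JakeEnumSketch.gen_enum([1, 2, 3, "hello", nil, true], ["integer", "string"])
# generates only 1, 2, 3 or "hello".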

for(n <- @type_list, is_map(n), do: Jake.gen_init(n)) |> StreamData.one_of()
end

def add_additional_items(list, bool, max, min) when is_boolean(bool) and bool do
Contributor:

Is this the same as pattern matching on bool = true?

Author:

Yes, it is the same as bool == true, but it gives a much better idea of the code, since bool can also be a map, which is handled by another function clause.
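
As a small illustration of that dispatch (hypothetical module, reduced to the decision itself):

defmodule AdditionalItemsDispatch do
  # Matching the literal is equivalent to the boolean guard; a separate
  # clause still handles the schema-map form of "additionalItems".
  def kind(true), do: :any_extra_items_allowed
  def kind(false), do: :no_extra_items
  def kind(schema) when is_map(schema), do: :extra_items_must_match_schema
end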


def add_additional_items(list, bool, max, min) when is_boolean(bool) and not bool do
if length(list) in min..max do
StreamData.fixed_list(list)
Contributor:

What will happen in the else case? Could it return nil?

Author (kanishka-linux, Nov 14, 2018):

If we have a list but additional items are not allowed, and length(list) (i.e. the total number of items) does not fit in the range, then the test will fail.

Contributor:

I have seen this pattern in multiple places; my question is why we are returning nil when we know it's going to fail anyway. Assuming the expected return type is a generator, returning nil here would cause some random error in some unrelated place?

Author:

One way I look at it is that it is easier to understand what the code is doing, even after a few years.

> Assuming the expected return type is a generator, returning nil here would cause some random error in some unrelated place?

Yes, that's possible. Maybe we can use a try..catch block or raise some custom error to make things better. I think nil can also be used as an error code in this case, meaning "no generator", and then the caller needs to handle it.

Member:

Throwing an error would be better. Using nil as an error is an assumption that would have to be remembered. Explicit is better.

Author:

> Throwing an error would be better.

I'm thinking the same.
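
A sketch of the explicit-error clause (function name and argument order follow the diff above; the error message is assumed):

defmodule ArraySketch do
  # Raise instead of returning nil when additionalItems is false and the
  # fixed items list cannot satisfy the minItems/maxItems range.
  def add_additional_items(list, false, max, min) do
    unless length(list) in min..max do
      raise ArgumentError, "#{length(list)} items do not fit in #{min}..#{max}"
    end

    StreamData.fixed_list(list)
  end
end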

end

def get_one_of() do
for(n <- @type_list, is_map(n), do: Jake.gen_init(n)) |> StreamData.one_of()
Contributor:

Could this reuse the notype generator?

Author (kanishka-linux, Nov 14, 2018):

No. In notype, a type is added to the schema depending on the properties.

In get_one_of, that is not needed, since we already know the type; here we use it to generate random values of random types when no information is given about the array items. I think we could also use StreamData.term() here. What do you think?

Contributor:

StreamData.term() might not generate valid JSON values (a tuple, for example). I was assuming notype should handle cases like the empty spec {} and would generate some valid value.
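
One way to stay within JSON-representable terms would be a fixed union of the JSON primitive generators (a sketch, not the existing notype implementation):

# Sketch: an alternative to StreamData.term/0 that only yields JSON values.
json_scalar =
  StreamData.one_of([
    StreamData.constant(nil),
    StreamData.boolean(),
    StreamData.integer(),
    StreamData.float(),
    StreamData.string(:printable)
  ])

json_value =
  StreamData.one_of([
    json_scalar,
    StreamData.list_of(json_scalar),
    StreamData.map_of(StreamData.string(:printable), json_scalar)
  ])

Enum.take(json_value, 5)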

StreamData.integer()
@num_min 0

@num_max 100_000
Contributor:

Why is the default so low? The generator should ideally generate every possible value, assuming the cost of generation is not that high.

Author:

They are just temporary; I also wanted to discuss the possible limits. The cost of generation is not high, except for floating-point multipleOf, which I'll look into afterwards.

end

def get_float_number(min, max) do
StreamData.filter(StreamData.float([{:min, min}, {:max, max}]), fn x -> true end, 100)
Contributor:

Why do we use filter here? It seems to always return true.

Author:

For smaller floating-point ranges it was throwing FilterTooNarrowError, so I'm using filter only for the sake of retrying up to 100 times.

If there is another way to keep retrying after a failure, we can change that option.

Contributor (ananthakumaran, Nov 14, 2018):

filter and bind_filter should be avoided if possible, or used only in cases where we know we will filter out just a small fraction of the values produced by the underlying generator.

I might be misunderstanding the problem. Why would StreamData.float throw FilterTooNarrowError?

Author:

It seems I made a mistake while testing. It won't throw FilterTooNarrowError; I was using a different method earlier which gave that error. After studying StreamData properly and using :min and :max with StreamData.float, I did not get that error. I'll remove the unnecessary filter.
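
For reference, the filter-free version just passes the bounds directly (a sketch of the intended replacement for the function shown above):

def get_float_number(min, max) do
  # StreamData.float/1 accepts :min and :max options, so no filter is needed.
  StreamData.float(min: min, max: max)
end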

re = Randex.stream(~r/[a-zA-Z0-9\_]{#{min},#{max}}/)

if min <= max do
StreamData.bind_filter(StreamData.string(:alphanumeric), fn
Contributor:

Why not use min_length/max_length instead of bind_filter?

Author:

It was written before I had read the StreamData.string documentation properly. Later I kept it just to get your opinion on the manual approach above.
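
For reference, the option-based version would be roughly (sketch, assuming min <= max):

# StreamData.string/2 accepts :min_length and :max_length directly.
StreamData.string(:alphanumeric, min_length: min, max_length: max)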

{:cont, StreamData.constant(x)}

x when true ->
{:cont, Enum.take(re, 1) |> StreamData.member_of()}
Contributor:

Will this return the same value every time?

Author:

No, when I checked, it was returning a different value every time.

Contributor:

OK, but it's not supposed to be used like this. You could just use Randex.stream(~r/[a-zA-Z0-9\_]{#{min},#{max}}/, mod: Randex.Generator.StreamData), because Randex itself returns a generator.

Author:

> it's not supposed to be used like this

I've realised that too. In the beginning I was somewhat confused about how to convert a Randex.stream into StreamData; I only learned how when I looked at the activesphere/jake implementation. But as I said in the previous post, I wanted to know your opinion on it.

I'll change the method.
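
The resulting call, following the suggestion above (sketch; min and max are bound here only for illustration):

min = 3
max = 8

# With the StreamData backend, Randex returns a generator directly, so the
# Enum.take/member_of round-trip is not needed.
~r/[a-zA-Z0-9\_]{#{min},#{max}}/
|> Randex.stream(mod: Randex.Generator.StreamData)
|> Enum.take(3)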


if min <= max do
StreamData.bind_filter(pat, fn
x when byte_size(x) in min..max -> {:cont, StreamData.constant(x)}
Contributor:

What is the difference between bind_filter and filter? Is bind_filter necessary here?

Author:

I'll check this out!

Author:

bind_filter is not necessary here; it can be done with just filter.
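
The filter-only form would look something like this (sketch; assumes pat is a StreamData generator of strings and min/max are in scope, as in the snippet above):

# Keep only pattern-generated strings whose byte size falls in the range.
StreamData.filter(pat, fn x -> byte_size(x) in min..max end)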

{min, max} = get_min_max(map)

case map["additionalItems"] do
x when (is_boolean(x) and x) or is_nil(x) ->
Member:

How about

      x when is_map(x) ->
        generate_list(list, Jake.gen_init(map), max, min)
      x when x == true or x == nil -> 
        generate_list(list, get_one_of(), max, min)
      _ -> 
        unless length(list) in min..max, do: raise ...
        StreamData.fixed_list(list)

That avoids the extra indirection; otherwise it becomes difficult to understand the code.

Author:

This is much better. I'll apply the necessary changes.


when length(list) in min..max ->
{:cont, StreamData.constant(list)}

nlist when true ->
Member:

Suggested change:

- nlist when true ->
+ _ -> :skip

Won't that be enough?

Author:

> Won't that be enough?

Certainly.

end

def decide_min_max(map, item, min, max)
when is_integer(min) and is_integer(max) and min < max do
Member:

What happens if this doesn't match? If it should throw an error, then being explicit about it might be better.

{min, max}
end

def arraytype(map, enum, items) when is_nil(items) and is_nil(enum) do
Member:

This could possibly be combined with the previous arraytype definition (assuming we handle enum as in one of the previous comments).

def list_gen(map, items) when is_nil(items) or is_map(items) do
  item = if is_nil(items), do: get_one_of(), else: Jake.gen_init(items)
  {min, max} = get_min_max(map)
  decide_min_max(map, item, min, max)
end


@max_items 1000

def gen_array(map, type), do: arraytype(map, map["enum"], map["items"])
Member (ciju, Nov 18, 2018):

Maybe we can name things by what they are about. E.g. "array" could refer to either the list or the tuple case. The code seems to handle both, but the cases could probably be named better.

E.g. rename the arraytype functions to either list_gen or tuple_gen based on the case handled:

def gen_array(%{"items" => items} = map, _) do
  cond do
    is_map(items) or items == nil -> list_gen(map, items) # note: not passing map["enum"], as mentioned in a previous comment
    is_list(items) -> tuple_gen(map, items)
    true -> raise error # ?
  end
end

Author:

I agree about naming things properly; it will avoid unnecessary ambiguity.
