Deduplicate instances and fix some cases.#155
Deduplicate instances and fix some cases.#155zx80 wants to merge 2 commits intosourcemeta-research:mainfrom
Conversation
|
@michaelmior Does this mean we should look for more instances now, or is it OK? |
|
@jviotti I'd like to know more about what was changed. There may be some cases where what was written wasn't what was intended, but at the same time, these are real documents found in the wild so I'd personally lean towards keeping any "errors." |
|
Also, as far as the duplicate instances go, these instances were collected independently as opposed to being duplicated on our end, so I'm personally not too concerned about it. Although it could be worth looking into the amount of duplication. |
My guess is that duplicate values come from file copies, not necessary two distinct projects which just happend to use the very same json value written independendently one from the other. The amount of duplication is 20%. |
They are listed in
I do not understand the logic of keeping broken values as well as broken schemas in a benchmark: for me the point of such tools is too show how fast json validation can be, not to show great performance figures because checks are silently skipped or validating obviously wrong values very quickly, as it does not serve the use case well. Anyway, if you want to keep them, I'll add an option to my scripts in order not to fix obvious user error when generating and compiling models. There is already one, but more is needed to reach 100% pass. |
This remove duplicate instances from test files and makes minor adjustments so that values are valid wrt to the schema intent, not only its letter.
The total number of remaining test cases is 28,188 instead of 35,407 initially.