For some schema-less TVFs/tables it would be nice to be able to hint at the expected schema up front.
This has several advantages:
- The query becomes strongly typed at compile time, so missing columns and similar mistakes can be reported as compile errors.
- Schema-less TVFs/tables can be optimized much more aggressively when the schema is known up front. E.g. a CSV reader could skip reading a lot of data if only a handful of columns are wanted.
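The CSV case in the last bullet could look roughly like the sketch below. It is only an illustration under stated assumptions: `ProjectedCsvReader` and `read` are hypothetical names, not part of the framework, the hint is reduced to a list of column names, and the parsing is a naive split on commas with no quoting support. The point is that the hinted columns are resolved once, unknown names fail early, and each line is only split as far as the right-most wanted column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not part of the framework: CSV reading driven by a schema hint. */
final class ProjectedCsvReader {

    /**
     * Returns one String[] per data row holding only the hinted columns, in hint order.
     * Assumes the first line is the header and that cells contain no quoted commas.
     */
    static List<String[]> read(List<String> lines, List<String> hintedColumns) {
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));

        // Resolve each hinted column to its position once, failing fast on unknown names.
        int[] ordinals = new int[hintedColumns.size()];
        int maxOrdinal = -1;
        for (int i = 0; i < ordinals.length; i++) {
            ordinals[i] = header.indexOf(hintedColumns.get(i));
            if (ordinals[i] < 0) {
                throw new IllegalArgumentException("Missing column: " + hintedColumns.get(i));
            }
            maxOrdinal = Math.max(maxOrdinal, ordinals[i]);
        }

        List<String[]> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            // Split only as far as the right-most wanted column; the rest of the line stays unsplit.
            String[] cells = line.split(",", maxOrdinal + 2);
            String[] projected = new String[ordinals.length];
            for (int i = 0; i < ordinals.length; i++) {
                // Short rows yield null for the columns they lack.
                projected[i] = ordinals[i] < cells.length ? cells[ordinals[i]] : null;
            }
            rows.add(projected);
        }
        return rows;
    }
}
```

Calling `ProjectedCsvReader.read(lines, Arrays.asList("name", "id"))` returns only those two cells per row. A real reader would also use the hinted types to convert values, which this sketch leaves out.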
A good syntax for this could be a dedicated schema section inside the table options.
For TVFs this works well, since the framework is already prepared to pass a schema to the execute method, for example:
select *
from http#query('http://') x
with
(
    option = 123,
    option.two = 'value'
    schema (
        column1 Int,
        column2 String
    )
)
For tables, however, it is a bit awkward, since the schema is provided earlier, in Catalog#getTableSchema. The framework could handle this automatically: extract the schema from the options, and if #getTableSchema returns a non-empty schema, throw an exception, since the table already has a static schema.
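A minimal sketch of that rule, assuming a schema can be modelled as an ordered map from column name to type name. `SchemaHintResolver` and `resolve` are hypothetical names used only for illustration, and the map stands in for the framework's real schema type, which is not shown here.

```java
import java.util.Map;

/** Hypothetical sketch: deciding which schema applies when a table has a schema hint. */
final class SchemaHintResolver {

    /**
     * @param catalogSchema what the catalog returned for the table (empty when schema-less)
     * @param hintedSchema  the schema section extracted from the table options (empty when absent)
     */
    static Map<String, String> resolve(Map<String, String> catalogSchema,
                                       Map<String, String> hintedSchema) {
        // No hint given: use whatever the catalog provided, empty or not.
        if (hintedSchema.isEmpty()) {
            return catalogSchema;
        }
        // A hint on top of a static schema is a conflict, so reject it.
        if (!catalogSchema.isEmpty()) {
            throw new IllegalArgumentException(
                    "Table already has a static schema; a schema hint is not allowed");
        }
        // Schema-less table with a hint: the hint becomes the table schema.
        return hintedSchema;
    }
}
```

With this shape the catalog does not need to know about hints at all. The check runs once in the framework, after the options have been parsed and the catalog has been asked for its schema.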
Should all column types be allowed?
Maybe hold off on array/table types at first, since they can get messy because of the recursion.
It would be nice to build a few system TVFs for JSON/CSV/XML that take an input schema and can be accessed by catalogs. These could be optimized, for example by generating a Janino class that implements TupleVector, etc.