Schema specification on table sources from SQL #54

@kuseman

For some schema-less TVFs/tables it would be nice to be able to hint at the expected schema up front.

This has several advantages:

  • We get a strongly typed, static query at compile time, which can produce compile errors about missing columns etc.
  • Schema-less TVFs/tables can be optimized considerably if the schema is known up front. For example, a CSV reader could skip reading a lot of data if only a handful of columns are wanted.
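To illustrate the second point, here is a minimal sketch of a CSV reader that only materializes the requested columns and stops scanning each line after the last wanted one. The class name `CsvProjectionSketch` and the method `readProjected` are hypothetical and not part of the framework, and the parsing assumes plain comma-separated values without quoting or escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvProjectionSketch {
    /**
     * Returns one String[] per data line, holding only the wanted columns
     * in the order they were requested. Hypothetical helper; assumes a
     * header line and no quoted or escaped fields.
     */
    public static List<String[]> readProjected(String csv, List<String> wanted) {
        String[] lines = csv.split("\n");
        List<String> header = Arrays.asList(lines[0].split(","));

        // target[i] is the output slot for file column i, or -1 if unwanted
        int[] target = new int[header.size()];
        Arrays.fill(target, -1);
        int last = -1;
        for (int i = 0; i < wanted.size(); i++) {
            int idx = header.indexOf(wanted.get(i));
            if (idx < 0) {
                // With a known schema this kind of error can surface early
                throw new IllegalArgumentException("Unknown column: " + wanted.get(i));
            }
            target[idx] = i;
            last = Math.max(last, idx);
        }

        List<String[]> rows = new ArrayList<>();
        for (int l = 1; l < lines.length; l++) {
            String line = lines[l];
            String[] row = new String[wanted.size()];
            int start = 0;
            // Stop after the last wanted column; the rest of the line is never touched
            for (int col = 0; col <= last; col++) {
                int end = line.indexOf(',', start);
                if (end < 0) {
                    end = line.length();
                }
                if (target[col] >= 0) {
                    // Only wanted columns are copied out of the line
                    row[target[col]] = line.substring(start, end);
                }
                start = Math.min(end + 1, line.length());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Calling `readProjected(csv, List.of("city", "id"))` on a file with the columns `id,name,city` would never copy the `name` values. A real reader would work on a stream rather than a String, but the idea of deriving the projection from the hinted schema stays the same.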

A good syntax for this could be a special schema section inside the table options.

For TVFs this works well, since the framework is already prepared to provide the schema to the execute method, i.e.:

    select *
    from http#query('http://') x
    with
    (
        option = 123,
        option.two = 'value'

        schema (
            column1 Int,
            column2 String
        )
    )

... but for tables this is a bit awkward, since the schema is provided earlier, in Catalog#getTableSchema. The framework could handle this automatically by extracting the schema from the options and, if #getTableSchema returns a non-empty schema, throwing an exception, since the table already has a static schema.
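The conflict rule for tables could look roughly like the sketch below. Everything here is hypothetical: `SchemaHintSketch`, `resolveSchema`, the `Type` enum, and the representation of a schema as a map from column name to type are stand-ins for illustration, not the framework's actual classes.

```java
import java.util.Map;

public class SchemaHintSketch {
    /** Hypothetical column types; array/table types are deliberately left out. */
    public enum Type { INT, STRING }

    /**
     * Picks the schema to use for a table source.
     * catalogSchema stands in for what Catalog#getTableSchema returned,
     * hintedSchema for what was extracted from the schema section in the options.
     * Both are assumed non-null; an empty map means "no schema".
     * An insertion-ordered map (e.g. LinkedHashMap) should be passed if column order matters.
     */
    public static Map<String, Type> resolveSchema(
            String table, Map<String, Type> catalogSchema, Map<String, Type> hintedSchema) {
        if (hintedSchema.isEmpty()) {
            // No hint in the query: use whatever the catalog provided (possibly empty)
            return catalogSchema;
        }
        if (!catalogSchema.isEmpty()) {
            // The table already has a static schema, so a hint is rejected
            throw new IllegalArgumentException(
                    "Table '" + table + "' already has a static schema; a schema hint is not allowed");
        }
        return hintedSchema;
    }
}
```

Doing this check once in the framework would mean individual catalogs do not need to know about the schema section at all; they would keep returning an empty schema for schema-less tables.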

Should we allow all column types?
Maybe hold off on array/table types, since those can get messy with recursion.

It would be nice to build a couple of system TVFs for JSON/CSV/XML that take an input schema and can be accessed by catalogs. These could be optimized, for example by building a Janino class that implements TupleVector etc.
