Conversation
```go
}

rangeSQL := fmt.Sprintf(
	"SELECT min(%[1]s) as `min`, max(%[1]s) as `max`, %[2]s as `watermark` FROM %[3]s %[4]s",
```
This is not an efficient query, even when running on a partition column.
An optimization can be done where we check whether this is the partition column of the table and read the min/max directly from the partition metadata.
Given this is an often-executed query, I think it can be done in a follow-up. @begelundmuller thoughts?
If the optimization can be done in a fast/cheap/safe way, then yeah it sounds good to me
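For illustration, the metadata shortcut could query BigQuery's `INFORMATION_SCHEMA.PARTITIONS` view instead of scanning the table. The helper below is a hypothetical sketch: the view and its `partition_id` column are real BigQuery features, but the function name, the `__NULL__` filter, and converting `partition_id` strings back to timestamps are assumptions:

```go
package main

import "fmt"

// partitionRangeSQL builds the min/max lookup against BigQuery's
// INFORMATION_SCHEMA.PARTITIONS metadata view instead of scanning the table.
// Hypothetical helper: the caller would still need to parse partition_id
// strings (e.g. "20240101") back into timestamps.
func partitionRangeSQL(dataset, table string) string {
	return fmt.Sprintf(
		"SELECT min(partition_id) as `min`, max(partition_id) as `max` "+
			"FROM `%s.INFORMATION_SCHEMA.PARTITIONS` "+
			"WHERE table_name = '%s' AND partition_id != '__NULL__'",
		dataset, table,
	)
}

func main() {
	fmt.Println(partitionRangeSQL("mydataset", "events"))
}
```

This reads a handful of metadata rows rather than the column itself, which is why it should be much cheaper for partitioned tables.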
```go
// MaxBytesBilled is the maximum number of bytes billed for a query. This is a safety mechanism to prevent accidentally running large queries.
// Set this to -1 for project defaults.
// Only applies to dashboard queries and does not apply when ingesting data from BigQuery into Rill.
MaxBytesBilled int64 `mapstructure:"max_bytes_billed"`
```
Is it normal for BI tools to offer this for BigQuery? If we do, I think it should be much higher than 10 GB by default (which at the high usage-based list price is only $0.0625).
Another danger of setting a default limit is that it may break existing projects that use a BigQuery query for partition discovery.
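To put that number in context, here is a back-of-the-envelope sketch of the worst-case cost per query under a byte cap, assuming an on-demand list price of $6.25 per TB scanned (the rate is an assumption, inferred from the $0.0625-per-10GB figure above; the helper is hypothetical):

```go
package main

import "fmt"

// costAtLimit estimates the worst-case on-demand charge for a single query
// capped at maxBytes. The $6.25 per TB (decimal) rate is an assumption taken
// from the $0.0625-per-10GB figure discussed above.
func costAtLimit(maxBytes int64) float64 {
	const pricePerTB = 6.25
	return float64(maxBytes) / 1e12 * pricePerTB
}

func main() {
	fmt.Printf("$%.4f\n", costAtLimit(10_000_000_000)) // 10 GB cap → $0.0625
}
```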
```go
ctx, cancel := context.WithTimeout(ctx, drivers.DefaultQuerySchemaTimeout)
defer cancel()

res, err := c.Query(ctx, &drivers.Statement{
```
Did you check if BigQuery's dry run feature can be used for this? IIRC they have some of the best dry run support of any database.
```go
if d == DialectBigQuery && isFullJoin {
	// BigQuery requires plain equality for FULL joins
	// TODO: find a better way to handle this
	return fmt.Sprintf("coalesce(CAST(%s AS STRING), '__rill_sentinel__') = coalesce(CAST(%s AS STRING), '__rill_sentinel__')", lhs, rhs)
```
Does this TODO need to be addressed?
(Since optimizing small things like this for BigQuery perhaps doesn't really make sense, consider removing the isFullJoin param and just using this approach for the BigQuery dialect for any join type)
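Dropping the isFullJoin parameter would just make the BigQuery branch unconditional; the sentinel-based null-safe equality itself stays the same. A minimal stdlib-only sketch (the function name is assumed):

```go
package main

import "fmt"

// nullSafeEq builds a join condition that treats NULL = NULL as a match by
// coalescing both sides to a sentinel, mirroring the BigQuery branch above
// but applied to any join type.
func nullSafeEq(lhs, rhs string) string {
	return fmt.Sprintf(
		"coalesce(CAST(%s AS STRING), '__rill_sentinel__') = coalesce(CAST(%s AS STRING), '__rill_sentinel__')",
		lhs, rhs,
	)
}

func main() {
	fmt.Println(nullSafeEq("a.dim", "b.dim"))
}
```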
```go
case DialectBigQuery:
	// BigQuery uses UNION ALL for generating time series
```
Did you check if it can use BigQuery's RANGE function? https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/range-functions#range
```go
// BigQuery converts time.Time type to TIMESTAMP which is not compatible with DATE type dimensions
// so we need to convert it back to civil.Date if the dimension type is DATE
// TODO: remove conversion of civil.Date in the rill driver and handle it wherever required and remove this conversion here
if result.Schema.Fields[0].Type.Code == runtimev1.Type_CODE_DATE {
	if t, ok := dimVal.(time.Time); ok {
		dimVal = civil.DateOf(t)
	}
}
```
Does this TODO need to be addressed?
Did you consider moving this conversion into the BigQuery driver (i.e. iterating over the args passed to Query and Exec)?
```go
// Store the time dimension's data type so it's available for downstream queries (e.g. Schema validation).
e.metricsView.Dimensions = append(e.metricsView.Dimensions, &runtimev1.MetricsViewSpec_Dimension{
	Name: e.metricsView.TimeDimension,
```
My previous comment applies here as well. I thought we already had this
```go
if d == DialectBigQuery && typeCode == runtimev1.Type_CODE_DATE {
	return "DATE(?)"
}
```
Is this also needed if you cast to civil in Query? Or is there any other way we could push this into the driver and avoid knowing the time type in advance?
```go
ctx := t.Context()

_, currentFile, _, _ := goruntime.Caller(0)
projectPath := filepath.Join(currentFile, "..", "testdata", "ad_bids_bigquery")
```
The testruntime/testdata directory is only used for legacy tests. Is it really necessary here? Or can the same be accomplished with a normal NewInstanceWithOptions call like we use for new tests that defines the file(s) needed for the specific test inline?
```go
inst := &drivers.Instance{
	Environment:   "test",
	OLAPConnector: "bigquery",
```
The hard-coded OLAP connector option predates the time when we supported changing the OLAP connector in rill.yaml. It shouldn't be necessary in new tests – can you just put olap_connector: bigquery in the test's rill.yaml?
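The suggested alternative would put the connector choice in the test project's rill.yaml rather than on drivers.Instance; roughly (a minimal sketch, assuming the standard rill.yaml shape):

```yaml
# rill.yaml in the test project
olap_connector: bigquery
```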
```diff
@@ -180,33 +181,157 @@ func (q *TableHead) generalExport(ctx context.Context, rt *runtime.Runtime, inst
 }

 func (q *TableHead) buildTableHeadSQL(ctx context.Context, olap drivers.OLAPStore) (string, error) {
```
It seems like there's a huge complexity increase in this function. Two questions:

- We don't run `TableHead` very often, so is it necessary to optimize it so hard? In general, I would assume people who connect a BI tool to a data warehouse are fine with a `SELECT * FROM tbl LIMIT 100` query being run.
- If it really is necessary, is it possible to combine it into one nested query and push it into the dialect somehow?
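If combining it into one query is feasible, the shape might be as simple as a single outer wrapper around whatever inner SQL the dialect produces; a minimal sketch with hypothetical names:

```go
package main

import "fmt"

// tableHeadSQL wraps the dialect-specific inner query in a single outer
// SELECT ... LIMIT, rather than building several specialized variants
// inside TableHead itself.
func tableHeadSQL(innerSQL string, limit int) string {
	return fmt.Sprintf("SELECT * FROM (%s) LIMIT %d", innerSQL, limit)
}

func main() {
	fmt.Println(tableHeadSQL("SELECT * FROM tbl", 100))
}
```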
closes https://linear.app/rilldata/issue/PLAT-450/metrics-views-on-bigquery

Added

TODOs to be done in follow-ups:

- Remove conversion of `civil.Date` to `time.Time` in the rill driver and handle it wherever required

Checklist: