Open
Labels: ext/datafusion (Relates to the DataFusion integration)
Description
I don't think this statistic is currently used anywhere upstream, so it's hard for me to tell how they intended it to be used.
The current rustdoc is:
/// Estimated size of this column's data in bytes for the output.
///
/// Note that this is not the same as the total bytes that may be scanned,
/// processed, etc.
///
/// E.g. we may read 1GB of data from a Parquet file but the Arrow data
/// the node produces may be 2GB; it's this 2GB that is tracked here.
///
/// Currently this is accurately calculated for primitive types only.
/// For complex types (like Utf8, List, Struct, etc), this value may be
/// absent or inexact (e.g. estimated from the size of the data in the source Parquet files).
///
/// This value is automatically scaled when operations like limits or
/// filters reduce the number of rows (see [`Statistics::with_fetch`]).
Personally, I'm happy to leave it absent or inexact until we have more clarity about that.
Originally posted by @AdamGS in #6309 (comment)
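To make the rustdoc's wording concrete, here is a minimal std-only sketch of one way to read it. Everything in it is an assumption made for illustration: `ByteSize`, `ColumnKind`, `output_byte_size`, and `scale_for_fetch` are hypothetical names, not DataFusion or Vortex APIs. The sketch ignores validity bitmaps and buffer padding, and the choice to downgrade a scaled value to inexact is this sketch's own, not something the upstream docs confirm.

```rust
/// Hypothetical three-state statistic, mirroring the exact / inexact /
/// absent distinction that the rustdoc describes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteSize {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical column classification used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    /// Fixed-width primitive, e.g. `width: 8` for a 64-bit integer.
    Primitive { width: usize },
    /// Variable-length or nested type (Utf8, List, Struct, ...).
    Complex,
}

/// Estimate the output size of a column in bytes.
/// Primitives with a known row count get an exact value (rows * width);
/// everything else is reported as absent rather than guessed.
pub fn output_byte_size(kind: ColumnKind, num_rows: Option<usize>) -> ByteSize {
    match (kind, num_rows) {
        (ColumnKind::Primitive { width }, Some(rows)) => match rows.checked_mul(width) {
            Some(bytes) => ByteSize::Exact(bytes),
            None => ByteSize::Absent, // overflow: refuse to report a wrong number
        },
        _ => ByteSize::Absent,
    }
}

/// Scale a byte size proportionally when a limit reduces the row count.
/// Assumption of this sketch: the scaled value is always marked inexact.
pub fn scale_for_fetch(size: ByteSize, rows_before: usize, rows_after: usize) -> ByteSize {
    let bytes = match size {
        ByteSize::Exact(b) | ByteSize::Inexact(b) => b,
        ByteSize::Absent => return ByteSize::Absent,
    };
    if rows_before == 0 {
        // No rows to derive a ratio from.
        return ByteSize::Absent;
    }
    // Clamp so the ratio never exceeds 1; widen to u128 to avoid overflow
    // in the intermediate product.
    let kept = rows_after.min(rows_before) as u128;
    let scaled = (bytes as u128) * kept / (rows_before as u128);
    // `scaled <= bytes`, so the cast back to usize cannot truncate.
    ByteSize::Inexact(scaled as usize)
}
```

Under these assumptions, a 1000-row 64-bit integer column reports `Exact(8000)`, a Utf8 column reports `Absent`, and applying a limit of 10 rows to the former yields `Inexact(80)`. Returning `Absent` for complex types matches the "leave it absent or inexact" position above.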