Allow dataset_names to update without package re-release

Currently, dataset metadata such as `df_summary` and `dataset_names` is static. This means that each time a dataset is changed or added, we need to re-release the package for everything to be updated. I propose we turn these into functions that fetch the related metadata in real time (perhaps with a session cache). I already made this change in pmlbr https://github.com/EpistasisLab/pmlbr/pull/5 (new release coming on CRAN in a day or two), but I'll leave the Python implementation for someone else with more expertise. 🙏🏽  @lacava  @weixuanfu 

https://github.com/EpistasisLab/pmlb/blob/7c1f4bdc00136dc2e55c87fa6b8ba6e8af6d1a68/pmlb/dataset_lists.py#L29-L32


	df_summary = pandas.read_csv(StringIO(data.decode("utf-8")) , sep='\t')
	regression_dataset_names = df_summary.query('task=="regression"')['dataset'].tolist()
	classification_dataset_names = df_summary.query('task=="classification"')['dataset'].tolist()
	dataset_names = regression_dataset_names + classification_dataset_names
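
For discussion, here is a rough sketch of what the function-based version with a session cache could look like. It is only an illustration, not a proposed patch: `SUMMARY_URL` is a placeholder rather than the real file location, the helper names (`_download`, `parse_summary`, `names_for_task`) are made up for this example, and it uses the standard library's `csv` module instead of pandas to stay self-contained.

```python
import csv
import io
import urllib.request
from functools import lru_cache

# Hypothetical location of the tab-separated summary file (an assumption;
# replace it with wherever the up-to-date summary actually lives).
SUMMARY_URL = "https://example.com/all_summary_stats.tsv"


def _download(url):
    """Fetch the raw TSV text over HTTP."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_summary(text):
    """Parse TSV text into a list of row dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


@lru_cache(maxsize=1)
def df_summary():
    """Return the summary rows, downloading them at most once per session."""
    return parse_summary(_download(SUMMARY_URL))


def names_for_task(rows, task):
    """Return the dataset names whose 'task' column equals `task`."""
    return [row["dataset"] for row in rows if row["task"] == task]


def dataset_names():
    """Regression names followed by classification names, as in the static list."""
    rows = df_summary()
    return names_for_task(rows, "regression") + names_for_task(rows, "classification")
```

With this shape nothing is fetched at import time. The first call to `dataset_names()` downloads the summary, and later calls in the same session reuse the cached rows. Calling `df_summary.cache_clear()` would force a fresh download.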

Allow dataset_names to update without package re-release #189
