Data

In addition to our Excel chart corpus, we use the public Plotly corpus to evaluate baselines, and HTML tables (crawled from the public web) for human evaluation.

Plotly corpus

The public Plotly corpus is available in the VizML repository. The Plotly data is prepared with the following steps:

1. Download the full Plotly corpus with retrieve_data.sh in the VizML repository.
2. Deduplicate the data with the code in the VizML data_cleaning directory.
3. Apply the same combo chart splitting and down sampling procedures to the remaining (table, charts) pairs as for the Excel corpus.
   - Chart splitting: Start from Data/Plotly/ChartSplit/HandlePlotlyTable.cs in this repository. ExtractForPlotlyTablesAll() is the entry point for splitting all (table, chart) pairs from a TSV file, and ExtractForPlotlyTables() is the entry point for splitting one (table, chart) pair from a JSON file.
   - Down sampling: See the details in the Data/Plotly/DownSampling folder in this repository.
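
As a rough illustration of the idea behind combo chart splitting, the Python sketch below groups the series of one chart by their type, so that each group can be treated as a single-type chart. It is not the logic in HandlePlotlyTable.cs; the function name `split_combo_chart`, the trace dictionaries, and the choice to split purely on the `type` key are assumptions made for this example.

```python
from collections import defaultdict


def split_combo_chart(traces):
    """Group a chart's traces by type, one group per single-type chart.

    Assumption: each trace is a dict with an optional "type" key,
    similar to the traces in Plotly figure JSON.
    """
    groups = defaultdict(list)
    for trace in traces:
        # Plotly treats a trace without an explicit type as "scatter".
        groups[trace.get("type", "scatter")].append(trace)
    return dict(groups)


# Hypothetical combo chart with two bar series and one scatter series.
combo = [
    {"type": "bar", "name": "sales"},
    {"type": "scatter", "name": "trend"},
    {"type": "bar", "name": "costs"},
]

charts = split_combo_chart(combo)
for chart_type, chart_traces in charts.items():
    print(chart_type, [t["name"] for t in chart_traces])
```

Grouping keeps the original order of the series within each type, which matters if field order carries meaning in later steps.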

Human Evaluation Data

We crawled 500 public web HTML tables with different schemas; after human evaluation, 330 of them were judged suitable for generating charts. The JSON file Results/HumanEvaluation/human_eval_results.json contains these 330 original tables, the corresponding charts recommended by Table2Charts, Data2Vis, and DeepEye, and the human evaluation results.

The evaluation result for one (table, charts) pair is organized in the following format:

{
    "Table": {
        "Url": "...",
        "Header": ["..."],
        "Value": [["..."],["..."]]
    },
    "Table2Chart chart": {
        "ANA": "...",
        "Y": ["...", "..."],
        "X": ["..."],
        "GRP": ["..."],
        "score": 0.8
    },
    "DeepEye chart": {
        ...
    },
    "Data2Vis chart": {
        ...
    },
    "Table2Chart ratings": [5, 5, 5],
    "DeepEye ratings": [4, 4, 4],
    "Data2Vis ratings": [3, 3, 3]
}

For a table, Url, Header, and Value are the URL of the webpage the table comes from, the header of the table, and the values of the table. For a chart, ANA, X, Y, GRP, and score are the chart type, x fields, y fields, grouping type, and confidence score of recommending this chart. The ratings that the three raters gave a chart are stored in a list.
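
A minimal Python sketch of reading one such record is shown below. The record is a hypothetical example that reuses the placeholder values from the format above, and the helper name `summarize` is an assumption for illustration. How records are arranged inside human_eval_results.json (for example, as a top-level list) is not stated here, so the sketch parses a single record from a string.

```python
import json

# Hypothetical record in the format above; "..." placeholders are kept as-is.
RECORD_JSON = """
{
    "Table": {"Url": "...", "Header": ["..."], "Value": [["..."], ["..."]]},
    "Table2Chart chart": {
        "ANA": "...", "Y": ["...", "..."], "X": ["..."], "GRP": ["..."],
        "score": 0.8
    },
    "Table2Chart ratings": [5, 5, 5],
    "DeepEye ratings": [4, 4, 4],
    "Data2Vis ratings": [3, 3, 3]
}
"""

SYSTEMS = ["Table2Chart", "DeepEye", "Data2Vis"]


def summarize(record):
    """Collect the Table2Chart recommendation and the mean rating per system."""
    chart = record["Table2Chart chart"]
    mean_ratings = {}
    for system in SYSTEMS:
        ratings = record[f"{system} ratings"]
        mean_ratings[system] = sum(ratings) / len(ratings)
    return {
        "url": record["Table"]["Url"],
        "chart_type": chart["ANA"],
        "x_fields": chart["X"],
        "y_fields": chart["Y"],
        "score": chart["score"],
        "mean_ratings": mean_ratings,
    }


summary = summarize(json.loads(RECORD_JSON))
print(summary["mean_ratings"])
```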

Results/HumanEvaluation/humanEvaluation.py reports the distribution of the ratings of the three systems, conducts a Wilcoxon signed-rank test, and computes the Cliff's delta effect size to compare Table2Charts with the other two systems.
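
For readers unfamiliar with these statistics, here is a small pure-Python sketch of the three quantities. It is not the code in humanEvaluation.py: the function names, the sample ratings, and the way ratings are paired are assumptions for illustration, and only the signed-rank statistic is computed, not its p-value.

```python
from collections import Counter
from itertools import product


def rating_distribution(ratings):
    """Count how often each rating value occurs."""
    return dict(sorted(Counter(ratings).items()))


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (n * m)."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def wilcoxon_statistic(xs, ys):
    """Signed-rank statistic: the smaller of the positive and negative rank sums."""
    # Zero differences are dropped before ranking.
    diffs = sorted((x - y for x, y in zip(xs, ys) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        # Tied absolute differences share the average of ranks i+1..j.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(positive, negative)


# Hypothetical paired ratings for the same five tables.
table2charts = [5, 4, 5, 3, 4]
other_system = [4, 4, 3, 3, 5]

print(rating_distribution(table2charts))
print(cliffs_delta(table2charts, other_system))
print(wilcoxon_statistic(table2charts, other_system))
```

A positive Cliff's delta means the first system tends to receive higher ratings than the second. To obtain a p-value for the signed-rank test, a statistics library such as SciPy (`scipy.stats.wilcoxon`) can be used instead of the hand-written statistic.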

The tables and human evaluation labels are published under the CDLA license.
