
Perplexity when generating dataset #180

@battmanux

This code ends up confusing the LLM ("perplexity"):

                # Add information about already generated data
                if generated_data:
                    already_generated = "\nAlready generated data (please avoid these exact combinations):\n"
                    for row in generated_data:
                        already_generated += f"{','.join(map(str, row))}\n"
                    content += already_generated

Every generated row is appended to the prompt, so at a certain point, whatever the model, the context becomes too big and the generation instructions are ignored, leading to unusable output (model perplexity).

I suggest sampling a few lines instead of sending all of already_generated.
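
A minimal sketch of what the sampling could look like, assuming `generated_data` is a list of rows as in the snippet above. The helper name `build_already_generated_section` and the `max_rows` and `seed` parameters are hypothetical and not part of the project; the default cap of 20 rows is an arbitrary placeholder.

```python
import random


def build_already_generated_section(generated_data, max_rows=20, seed=None):
    """Return a prompt fragment listing at most `max_rows` earlier rows.

    Hypothetical helper: the name, `max_rows`, and `seed` are illustrative
    assumptions, not part of the existing code base.
    """
    if not generated_data:
        return ""

    rows = list(generated_data)
    if len(rows) > max_rows:
        # Draw a random subset so the prompt size stays bounded
        # no matter how many rows have been generated so far.
        rows = random.Random(seed).sample(rows, max_rows)

    section = "\nAlready generated data (please avoid these exact combinations):\n"
    for row in rows:
        section += f"{','.join(map(str, row))}\n"
    return section
```

The call site would then become something like `content += build_already_generated_section(generated_data)`. With sampling, the model no longer sees every earlier row, so exact duplicates may need to be filtered out after generation instead.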

Labels

enhancement (New feature or request)
