data intermediate

Generate Synthetic Test Data for Database Testing

Create realistic synthetic test data for databases and applications. Generate CSV files with custom schemas, relationships, and realistic values.

Works with: ChatGPT, Claude, Gemini

Prompt Template

I need you to generate synthetic test data for a [DATABASE_TABLE] with the following specifications:

Table Schema:
- [COLUMN_SPECIFICATIONS]

Data Requirements:
- Generate [NUMBER_OF_RECORDS] records
- Ensure data follows realistic patterns and distributions
- Include edge cases (nulls, boundary values, etc.) in approximately 5-10% of records
- Maintain referential integrity if foreign keys are specified
- Use realistic naming conventions and formatting for the [DOMAIN_CONTEXT]

Output Format:
- Provide the data in CSV format with headers
- Include data validation rules or constraints that should be applied
- Add comments explaining any complex relationships or business logic

Additional Constraints:
[ADDITIONAL_CONSTRAINTS]

Please ensure the generated data is:
1. Realistic and contextually appropriate
2. Diverse enough to test various scenarios
3. Compliant with common data privacy practices (no real personal information)
4. Suitable for [TESTING_PURPOSE]

After generating the data, provide a brief summary of the data characteristics, any patterns included, and suggestions for additional test scenarios this dataset could support.
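If you reuse the template often, the bracketed placeholders can be filled in programmatically rather than by hand. The sketch below is one possible approach, not part of the template itself: the function name `fill_template` and the shortened `excerpt` string are illustrative assumptions, and the real template text would be substituted for the excerpt.

```python
import re


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [PLACEHOLDER] with its value; fail loudly if one is missing.

    `fill_template` is a hypothetical helper name used for illustration.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for [{name}]")
        return values[name]

    # Placeholders in the template are upper-case words joined by underscores.
    return re.sub(r"\[([A-Z_]+)\]", substitute, template)


# A shortened stand-in for the full prompt template (assumption for brevity).
excerpt = "Generate [NUMBER_OF_RECORDS] records for a [DATABASE_TABLE]."
filled = fill_template(
    excerpt,
    {"NUMBER_OF_RECORDS": "500", "DATABASE_TABLE": "user_profiles table"},
)
print(filled)
```

Raising on a missing value is a deliberate choice here: a prompt sent with a literal `[TESTING_PURPOSE]` left in it tends to produce vague output.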

Variables to Customize

[DATABASE_TABLE]

Name and type of the database table or dataset

Example: user_profiles table for an e-commerce application

[COLUMN_SPECIFICATIONS]

Detailed column definitions with data types and constraints

Example: user_id (INTEGER, PRIMARY KEY), email (VARCHAR, UNIQUE), first_name (VARCHAR), last_name (VARCHAR), birth_date (DATE), account_status (ENUM: active, inactive, suspended), created_at (TIMESTAMP)

[NUMBER_OF_RECORDS]

How many test records to generate

Example: 500

[DOMAIN_CONTEXT]

Business domain or industry context

Example: e-commerce platform

[ADDITIONAL_CONSTRAINTS]

Any specific business rules or data constraints

Example: 20% of users should be inactive, birth_dates should span ages 18-80, email domains should be realistic

[TESTING_PURPOSE]

What type of testing this data will be used for

Example: performance testing and user authentication workflows

Example Output

Here's your synthetic test data for user_profiles:

```csv
user_id,email,first_name,last_name,birth_date,account_status,created_at
1,sarah.johnson@gmail.com,Sarah,Johnson,1985-03-15,active,2023-01-15 09:23:45
2,mike.chen@yahoo.com,Michael,Chen,1992-07-22,active,2023-01-16 14:12:33
3,invalid.email,Emma,Wilson,1978-11-03,inactive,2023-01-17 11:45:21
4,david.brown@hotmail.com,David,Brown,1995-12-08,suspended,2023-01-18 16:34:12
```

Data Characteristics:
- 500 total records generated
- Age distribution: 18-25 (25%), 26-40 (40%), 41-60 (25%), 61-80 (10%)
- Account status: 70% active, 20% inactive, 10% suspended
- Edge cases included: 3% invalid emails, 2% future birth dates, 1% null values
- Email domains: Gmail (35%), Yahoo (25%), Hotmail (20%), Corporate domains (20%)

Test Scenarios Supported:
- Email validation testing
- Age-based feature restrictions
- Account status workflow testing
- Data migration and cleanup procedures
- Performance testing with realistic data distribution
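Because the summary percentages come from the model, it can be worth checking them against the actual rows before relying on the data. The following is a minimal sketch of such a check using the four sample rows above. The helper name `audit_rows` and the email pattern are assumptions for illustration; the pattern is a deliberately loose heuristic, not a full email validator.

```python
import csv
import io
import re
from collections import Counter
from datetime import date

SAMPLE = """user_id,email,first_name,last_name,birth_date,account_status,created_at
1,sarah.johnson@gmail.com,Sarah,Johnson,1985-03-15,active,2023-01-15 09:23:45
2,mike.chen@yahoo.com,Michael,Chen,1992-07-22,active,2023-01-16 14:12:33
3,invalid.email,Emma,Wilson,1978-11-03,inactive,2023-01-17 11:45:21
4,david.brown@hotmail.com,David,Brown,1995-12-08,suspended,2023-01-18 16:34:12
"""

# Loose heuristic (assumption): something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit_rows(csv_text: str, today: date) -> dict:
    """Count rows and flag the edge cases the summary claims to include."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "rows": len(rows),
        "invalid_emails": [r["user_id"] for r in rows
                           if not EMAIL_RE.match(r["email"])],
        # Empty birth_date is treated as NULL and skipped here.
        "future_births": [r["user_id"] for r in rows
                          if r["birth_date"]
                          and date.fromisoformat(r["birth_date"]) > today],
        "status_counts": Counter(r["account_status"] for r in rows),
    }


report = audit_rows(SAMPLE, today=date(2024, 1, 1))
print(report)
```

On a full 500-row file, dividing each count by `report["rows"]` gives percentages to compare with the figures in the summary.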

Pro Tips for Best Results

  • Be specific about data relationships and foreign key constraints to ensure referential integrity
  • Request a mix of valid and invalid data to test error handling and validation logic
  • Specify realistic distributions (percentages) for categorical fields to mirror production data
  • Include edge cases like boundary values, nulls, and special characters for comprehensive testing
  • Ask for data export in multiple formats if you need compatibility with different testing tools
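When you request related tables, the referential integrity mentioned in the first tip can be spot-checked once the CSVs come back. Here is a small sketch under stated assumptions: the `orders` table, its columns, and the helper name `orphaned_keys` are hypothetical and do not appear in the template.

```python
import csv
import io


def orphaned_keys(parent_csv: str, child_csv: str,
                  parent_key: str, foreign_key: str) -> list[str]:
    """Return foreign-key values in the child data with no matching parent row."""
    parent_ids = {row[parent_key]
                  for row in csv.DictReader(io.StringIO(parent_csv))}
    return [row[foreign_key]
            for row in csv.DictReader(io.StringIO(child_csv))
            # An empty value is treated as NULL and not reported as an orphan.
            if row[foreign_key] and row[foreign_key] not in parent_ids]


# Hypothetical generated data: two users and three orders.
users = "user_id,email\n1,a@example.com\n2,b@example.com\n"
orders = "order_id,user_id\n10,1\n11,2\n12,7\n"

orphans = orphaned_keys(users, orders, "user_id", "user_id")
print(orphans)  # order 12 points at user 7, which does not exist
```

If you deliberately asked for broken references as an edge case, the returned list tells you which rows carry them.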
