8. Data Validations

Data validations are essential for maintaining data integrity and ensuring that the information processed by Data Steward meets your organization's quality standards. This section will guide you through understanding, configuring, and applying data validations to your submissions.

8.1 Understanding Validation Types

Data Steward uses a hierarchical approach to validations:

  1. Validation Templates: Global definitions of validation rules.
  2. Validation Types: Organization-specific instances of validation templates.
  3. Validations: Actual application of a validation type to a specific data submission.

This structure allows for both standardization and customization of validation rules across your organization.

8.2 Creating and Configuring Validations

Available Validation Templates

Data Steward offers several pre-defined validation templates to address common data quality needs:

  1. Range Check: Ensures numeric values fall within specified limits.
  2. Format Validation: Verifies that data conforms to expected patterns (e.g., email addresses, phone numbers).
  3. Uniqueness Check: Ensures certain fields contain unique values.
  4. Referential Integrity: Checks that values in one dataset exist in another (e.g., product SKUs against a master list).
  5. Completeness Check: Verifies that required fields are not empty.
  6. Consistency Check: Ensures related fields have logically consistent values.
  7. Enum Validation: Checks that values belong to a predefined set of options.
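These templates all reduce to simple record- or field-level checks. As an illustration only (this is not Data Steward's internal implementation), the seven templates can be sketched as plain Python predicates:

```python
import re

def range_check(value, low, high):
    """Range Check: numeric value must fall within [low, high]."""
    return low <= value <= high

def format_check(value, pattern=r"[\w.+-]+@[\w-]+\.[\w.]+"):
    """Format Validation: value must match a pattern (default: a simple email regex)."""
    return re.fullmatch(pattern, value) is not None

def uniqueness_check(values):
    """Uniqueness Check: a field's values must contain no duplicates."""
    return len(values) == len(set(values))

def referential_check(values, reference):
    """Referential Integrity: every value must exist in the reference dataset."""
    return set(values) <= set(reference)

def completeness_check(value):
    """Completeness Check: a required field must not be empty."""
    return value is not None and str(value).strip() != ""

def enum_check(value, allowed):
    """Enum Validation: value must belong to a predefined set of options."""
    return value in allowed
```

A Consistency Check is the same idea applied across fields, e.g. verifying that a ship date is not earlier than the order date.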

Creating a Validation Type

To create a new validation type:

  1. Navigate to "Settings" > "Validations" in the main menu.

  2. Click "Create New Validation Type."

  3. Select the base Validation Template.

  4. Provide a name and description for your validation type.

  5. Configure the parameters specific to your needs. These may include:

    • Acceptable value ranges
    • Regular expressions for format checks
    • Reference datasets for integrity checks
    • Lists of valid enum values
  6. Set the error message to display when validation fails.

  7. Choose the severity level (Warning or Error).

  8. Save your new validation type.
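The result of these steps is a named, parameterized instance of a template. A hypothetical configuration is shown below; the field names are illustrative assumptions, not Data Steward's actual schema:

```python
# Illustrative only: field names are assumptions, not Data Steward's schema.
validation_type = {
    "name": "US Phone Format",
    "description": "Checks that phone numbers match NANP formatting.",
    "template": "Format Validation",          # the base template from step 3
    "parameters": {                           # template-specific settings (step 5)
        "pattern": r"\d{3}-\d{3}-\d{4}",
    },
    "error_message": "Phone number must use the form 555-123-4567.",  # step 6
    "severity": "Error",                      # "Warning" or "Error" (step 7)
}
```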

Assigning Validations to Submission Types

To apply validations to incoming data:

  1. Go to "Submission Types" and select the relevant type.
  2. Navigate to the "Validations" tab.
  3. Click "Add Validation."
  4. Select the validation type you want to apply.
  5. Set the order of application if using multiple validations.
  6. Save your changes.

8.3 Validation Results and Error Handling

Viewing Validation Results

To check the results of validations on a submission:

  1. Navigate to the "Submissions" section.
  2. Select a specific submission.
  3. Go to the "Validations" tab to see:
    • List of applied validations
    • Status of each validation (Passed, Warning, Failed)
    • Number of records that passed/failed each validation
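Conceptually, each validation's overall status is rolled up from its per-record results and the configured severity. A sketch of that roll-up (names assumed for illustration):

```python
def summarize(results, severity="Error"):
    """Roll per-record pass/fail booleans up into the statuses shown on the
    Validations tab: Passed, Warning, or Failed.  `results` holds one boolean
    per record; `severity` is the level set on the validation type.
    Illustrative only, not Data Steward's internal logic.
    """
    passed = sum(results)
    failed = len(results) - passed
    if failed == 0:
        status = "Passed"
    elif severity == "Warning":
        status = "Warning"   # failures exist but were configured as non-blocking
    else:
        status = "Failed"
    return {"status": status, "passed": passed, "failed": failed}
```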

Detailed Error Reports

For a closer look at validation issues:

  1. In the submission's "Validations" tab, click on a validation with warnings or errors.
  2. View the detailed report, which includes:
    • Specific records that failed the validation
    • The reason for each failure
    • Suggested corrective actions (if available)

Handling Validation Errors

When validations fail:

  1. Review the error details in the validation report.

  2. Decide on the appropriate action:

    • Reject the submission and request corrections from the data provider.
    • Apply automatic corrections if the errors are minor and you have predefined rules.
    • Manually correct the data within Data Steward (for minor issues).
    • Approve with warnings if the errors are not critical and you want to proceed.
  3. Document your decision and any actions taken in the submission comments.
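Teams often codify this decision as a simple triage policy. The rules below are one example policy, not Data Steward behavior:

```python
def triage(status, failed, auto_fix_available, threshold=10):
    """Pick a follow-up action for a submission from its validation outcome.
    Example policy (not Data Steward behavior): warnings pass through, small
    fixable error counts are auto-corrected, everything else goes back to
    the data provider for correction.
    """
    if status == "Passed":
        return "approve"
    if status == "Warning":
        return "approve-with-warnings"
    if auto_fix_available and failed <= threshold:
        return "auto-correct"
    return "reject-and-request-corrections"
```

Whatever policy you use, record the chosen action in the submission comments so the decision is auditable later.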

8.4 Advanced Validation Techniques

Chaining Validations

You can apply multiple validations in sequence:

  1. In the submission type configuration, add multiple validations.
  2. Use the drag-and-drop interface to set the order.
  3. Consider dependencies between validations when setting the order.
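Ordering matters because a later check may assume an earlier one passed, e.g. a range check can only run on a field that completeness has confirmed is present. A minimal sketch of such a chain (check functions and field names are invented for this example):

```python
def run_chain(record, validations):
    """Run validations in their configured order, stopping at the first
    Error-severity failure so dependent checks don't run on bad data.
    Each entry is (name, check_function, severity).  Illustrative only.
    """
    outcomes = []
    for name, check, severity in validations:
        ok = check(record)
        outcomes.append((name, ok))
        if not ok and severity == "Error":
            break  # later validations may depend on this one passing
    return outcomes

# Example chain: the range check would raise a KeyError on a record with
# no "qty" field, so completeness must run (and short-circuit) first.
chain = [
    ("completeness", lambda r: r.get("qty") is not None, "Error"),
    ("range",        lambda r: 0 <= r["qty"] <= 1000,    "Error"),
]
```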

Custom Validations

For unique validation needs:

  1. Go to "Settings" > "Custom Validations."
  2. Click "Create Custom Validation."
  3. Write your validation logic using the provided scripting interface.
  4. Test your validation thoroughly before deploying.
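The scripting interface itself is product-specific, but a custom validation is typically a function from a record to a pass/fail result plus a message. A hypothetical example with invented field names, in the spirit of a semiconductor manufacturing check:

```python
def wafer_yield_consistency(record):
    """Hypothetical custom validation: flag lots whose reported yield is
    inconsistent with their good-die vs. total-die counts.  All field
    names here are invented for illustration.
    """
    total = record["total_die"]
    good = record["good_die"]
    if total <= 0:
        return False, "total_die must be positive"
    implied = good / total
    reported = record["reported_yield"]
    if abs(implied - reported) > 0.01:
        return False, f"reported yield {reported:.2f} != implied {implied:.2f}"
    return True, ""
```

As step 4 advises, exercise a custom validation with both passing and failing records before deploying it.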

AI-Assisted Validations

Data Steward can use AI for complex validation scenarios:

  1. Enable AI assistance in your validation configuration.

  2. The AI can help with tasks like:

    • Detecting anomalies in time-series data
    • Identifying inconsistencies in product descriptions
    • Suggesting potential data quality issues
  3. Review AI-suggested validations before applying them.
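To make the anomaly-detection task concrete, a classical stand-in for what such assistance does on time-series data is an outlier test, e.g. flagging points far from the mean. This sketch is not Data Steward's AI; it only illustrates the kind of check involved:

```python
def zscore_anomalies(series, threshold=3.0):
    """Flag indices whose values lie more than `threshold` standard
    deviations from the mean -- a simple classical analogue of the
    anomaly detection described above.  Illustrative only.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = var ** 0.5
    if std == 0:
        return []  # constant series: nothing can be anomalous
    return [i for i, x in enumerate(series) if abs(x - mean) / std > threshold]
```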

8.5 Best Practices for Data Validations

  • Start with critical validations that directly impact data usability and gradually add more nuanced checks.
  • Use clear, specific error messages that guide users on how to correct issues.
  • Regularly review and update your validations to adapt to changing data patterns or business rules.
  • Balance strictness with practicality; overly rigid validations may lead to unnecessary rejections.
  • Use warnings for less critical issues to flag potential problems without blocking data processing.
  • Leverage Data Steward's reporting features to track validation performance over time and identify recurring issues.
  • Provide training and documentation to data providers on your validation rules to improve first-time quality.
  • Consider the performance impact of validations, especially for large datasets, and optimize where necessary.
  • Use version control for your validation configurations to track changes over time.

By implementing robust data validations in Data Steward, you ensure that only high-quality, consistent data enters your system. This is crucial for maintaining data integrity and enabling accurate analytics and decision-making in your semiconductor and high-tech manufacturing processes. Remember, the goal of validation is not just to catch errors, but to continually improve the overall quality of your data ecosystem.