11. Export and Integration

Exporting data from Data Steward and integrating it with other systems is where much of the platform's value is realized. This section guides you through the export capabilities, integration options, and best practices for incorporating Data Steward into your technology ecosystem.

11.1 Configuring Export Databases

Data Steward leverages Prisma's database support, allowing you to connect to a wide range of databases for exporting processed data. This flexibility ensures that you can seamlessly transfer enriched and validated data to your preferred operational systems.

Supported Database Types

Data Steward supports all databases that Prisma supports, including but not limited to:

  • PostgreSQL
  • MySQL
  • Microsoft SQL Server
  • SQLite
  • MongoDB
  • CockroachDB

Additionally, Data Steward supports Snowflake and similar cloud data warehouses.

Setting Up an Export Database

  1. Navigate to "Settings" > "Export Databases" in the main menu.
  2. Click "Add New Export Database."
  3. Provide the following details:
    • Name: A descriptive name for the export database
    • Connection String: The database connection string (see the example below)
    • Dialect: Choose from the list of supported database types
  4. Test the connection to ensure it's properly configured.
  5. Save the export database configuration.
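
Connection string formats vary by database type. For example, a PostgreSQL connection string generally takes the form below; the user, password, host, port, and database name shown are placeholders, not values required by Data Steward:

  postgresql://ds_export_user:YOUR_PASSWORD@db.example.com:5432/exports

The equivalent MySQL string uses the mysql:// scheme and port 3306. Consult your database's documentation for the exact format and any additional parameters (such as SSL options) your environment requires.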

Database User Permissions

When setting up the database user account for the export mapping:

  1. Create a dedicated user account for Data Steward exports.

  2. Grant this user the following permissions (see the example after this list):

    • SELECT: To read from tables (if needed for validation)
    • INSERT: To add new records
    • UPDATE: To modify existing records
    • DELETE: If your export process needs to remove data
    • CREATE TABLE: If Data Steward needs to create new tables for exports
  3. For Snowflake:

    • Assign the user to an appropriate role with the necessary privileges
    • Grant USAGE on the target schema
    • Grant appropriate warehouse privileges

Always follow the principle of least privilege, granting only the permissions necessary for the export operations.
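
As a concrete illustration, the grants for a PostgreSQL export database might look like the following sketch; the user, role, schema, and warehouse names are placeholders that you should adapt to your environment:

  -- PostgreSQL: dedicated export user with only the needed privileges (names are placeholders)
  CREATE USER ds_export_user WITH PASSWORD 'choose-a-strong-password';
  GRANT USAGE ON SCHEMA exports TO ds_export_user;
  GRANT CREATE ON SCHEMA exports TO ds_export_user;  -- only if Data Steward must create tables
  GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA exports TO ds_export_user;

  -- Snowflake: role-based equivalent (role, database, schema, and warehouse are placeholders)
  GRANT USAGE ON DATABASE export_db TO ROLE ds_export_role;
  GRANT USAGE ON SCHEMA export_db.exports TO ROLE ds_export_role;
  GRANT USAGE ON WAREHOUSE export_wh TO ROLE ds_export_role;
  GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA export_db.exports TO ROLE ds_export_role;
  GRANT ROLE ds_export_role TO USER ds_export_user;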

11.2 Setting Up Export Mappings

Export mappings define how data from specific submission types should be exported to your configured databases.

Creating an Export Mapping

  1. Go to "Settings" > "Export Mappings."
  2. Click "Create New Export Mapping."
  3. Select the Submission Type you want to export.
  4. Choose the target Export Database.
  5. Configure the mapping:
    • Map Data Steward fields to database columns
    • Set up any necessary data transformations for the export
  6. Set the export schedule or trigger (e.g., automatic export on approval, manual export).
  7. Configure the Approval Timeout:
    • Set the maximum amount of time (in seconds) before automatically marking a submission as APPROVED
    • This helps prevent delays in data export due to pending approvals
  8. Save the export mapping configuration.

Managing Export Mappings

  • Review existing mappings in the Export Mappings list.
  • Edit mappings as needed to adjust field mappings or export settings.
  • Disable or delete mappings that are no longer needed.

11.3 Integration with External Systems

Data Steward offers several methods to integrate with external systems, enabling data to flow smoothly across your technology stack.

API Integration

While Data Steward doesn't expose a built-in API, you can leverage your export database as an integration point:

  1. Configure your external systems to connect directly to the export database.
  2. Implement database triggers or scheduled jobs to detect and react to new or updated data.
  3. Use database views or stored procedures to present the data in the most useful format for your external systems.
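
For example, on a PostgreSQL export database, step 2 could be implemented with a trigger that emits a NOTIFY event, so listening applications learn about new rows without polling. The table, channel, and column names below are placeholders:

  -- Emit a lightweight notification whenever a new record lands in the export table
  CREATE OR REPLACE FUNCTION notify_new_export() RETURNS trigger AS $$
  BEGIN
      PERFORM pg_notify('new_export', NEW.invoice_number);  -- payload must be text
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER pos_export_notify
      AFTER INSERT ON pos_submissions
      FOR EACH ROW
      EXECUTE FUNCTION notify_new_export();  -- EXECUTE FUNCTION requires PostgreSQL 11+

An external application can then issue LISTEN new_export; on its own connection and react as records arrive.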

Webhook Notifications

Set up webhooks to notify external systems of events in Data Steward:

  1. Go to "Settings" > "Webhooks" (if supported by your implementation).

  2. Configure webhooks to send notifications for events such as:

    • New submission received
    • Submission approved or rejected
    • Export process completed
  3. Your external systems can listen for these webhook events and trigger appropriate actions.
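
If your implementation supports webhooks, a notification payload might resemble the sketch below. The event names and fields shown are purely illustrative, not a documented contract; check your deployment's webhook configuration for the actual format:

  {
    "event": "submission.approved",
    "submissionType": "POS",
    "submissionId": "sub_12345",
    "timestamp": "2023-05-15T14:30:00Z"
  }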

File-based Integration

For systems that prefer file-based data exchange:

  1. Configure an SFTP server or cloud storage location for file drops.
  2. Set up a process to periodically export data from your export database to files.
  3. Place these files in the designated location for your external systems to consume.
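
On PostgreSQL, for instance, step 2 could be a scheduled COPY statement like the sketch below. The output path and date filter are placeholders, and server-side COPY ... TO requires superuser or pg_write_server_files privileges:

  -- Export yesterday's records to a CSV file for downstream systems to pick up
  COPY (
      SELECT * FROM pos_submissions
      WHERE invoice_date = CURRENT_DATE - 1
  ) TO '/var/exports/pos_daily.csv' WITH (FORMAT csv, HEADER);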

ETL Tool Integration

For complex data integration scenarios, you can use ETL (Extract, Transform, Load) tools:

  1. Configure your ETL tool to connect to Data Steward's export database.
  2. Set up ETL jobs to transfer and transform data as needed.
  3. Schedule ETL jobs to run in coordination with Data Steward's export processes.
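
A common pattern is to point the ETL tool at a database view rather than the underlying table, so the tool's queries remain stable even as the table evolves. The sketch below assumes a hypothetical exported_at timestamp column for incremental loads:

  -- Incremental extraction window for the ETL job (exported_at is a hypothetical column)
  CREATE VIEW pos_submissions_recent AS
  SELECT *
  FROM pos_submissions
  WHERE exported_at >= now() - interval '1 day';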

11.4 Real-time Data Synchronization

For use cases requiring real-time data access:

  1. Implement a change data capture (CDC) mechanism on the export database.
  2. Set up a streaming pipeline (e.g., using Apache Kafka) to propagate changes in real-time.
  3. Configure your downstream systems to consume the real-time data stream.
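
On PostgreSQL, for example, step 1 typically means enabling logical replication and publishing the export tables so a CDC connector (such as Debezium feeding Kafka) can stream changes. The publication and table names below are placeholders:

  -- Logical replication must be enabled (requires a server restart to take effect)
  ALTER SYSTEM SET wal_level = 'logical';

  -- Publish the export table for CDC consumers
  CREATE PUBLICATION data_steward_cdc FOR TABLE pos_submissions;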

11.5 Best Practices for Export and Integration

  • Data Governance: Ensure that your data exports comply with your organization's data governance policies.
  • Error Handling: Implement robust error handling in your integrations to manage potential issues (e.g., network failures, data inconsistencies).
  • Monitoring: Set up monitoring for your export processes and integrations to quickly identify and resolve any issues.
  • Data Reconciliation: Regularly reconcile data between Data Steward and integrated systems to ensure consistency.
  • Performance Optimization: Monitor the performance of your exports and optimize as needed, especially for large data volumes.
  • Security: Use encryption for data in transit and ensure that all integration points are properly secured.
  • Versioning: Maintain version control for your export mappings and integration configurations.
  • Testing: Thoroughly test your exports and integrations, including edge cases and error scenarios.
  • Documentation: Keep detailed documentation of your export and integration setups for troubleshooting and knowledge transfer.
  • Scalability: Design your integrations to handle growing data volumes and potential spikes in data flow.

11.6 Troubleshooting Common Export and Integration Issues

  • Connection Failures: Check network connectivity, firewall settings, and credential validity.
  • Data Type Mismatches: Ensure that data types are compatible between Data Steward and the target system.
  • Missing Data: Verify that all required fields are mapped correctly in your export configuration.
  • Performance Issues: Monitor query performance and optimize database indexes as needed.
  • Inconsistent Data: Use Data Steward's validation features to ensure data quality before export.

11.7 Example Scenario: Processing and Exporting a POS Submission

To illustrate how Data Steward processes and exports data, let's follow a Point-of-Sale (POS) submission through the system. This example will demonstrate how the data is transformed, enriched, and ultimately exported to a database.

Initial POS Submission

Let's say a vendor submits the following POS data:

Vendor ID,Product SKU,Product Description,Quantity Sold,Invoice Date,Invoice Number,Sold To Company,Sold To Address,Sold To City,Sold To State,Sold To Zip,Sold To Country,End Customer,End Customer Address,End Customer City,End Customer State,End Customer Zip,End Customer Country
123456,ABC-123,High-Performance Laptop,5,05/15/2023,INV-001,TechRetailer,123 Main St,Anytown,CA,12345,USA,Corp Inc,456 Oak Rd,Bizville,NY,67890,USA

Data Processing in Data Steward

  1. Normalization: The "Normalize Point-of-Sale Columns" transformation standardizes column names and formats.

  2. Product Type Classification: The "Enrich Product Type" transformation classifies the product based on its SKU and description.

  3. System Data Enrichment: The "Enrich System Data" transformation adds detailed system information.

  4. CPU Data Enrichment: The "Enrich CPU Data" transformation adds CPU-specific details.

Transformed and Enriched Data

After processing, the data might look like this (simplified for brevity):

Vendor ID,Product Manufacturer SKU,Product Description,Quantity Sold,Invoice Date,Invoice Number,Sold To Company Name,Sold To Address,Sold To City,Sold To State,Sold To Zip Code,Sold To Country,End Customer Company Name,End Customer Address,End Customer City,End Customer State,End Customer Zip Code,End Customer Country,Product Type,System Form Factor,System Manufacturer,System Family,System Line,System Model,System Name,CPU Manufacturer,CPU Family,CPU Line,CPU Model,CPU Name
123456,ABC-123,High-Performance Laptop,5,05/15/2023,INV-001,TechRetailer,123 Main St,Anytown,CA,12345,USA,Corp Inc,456 Oak Rd,Bizville,NY,67890,USA,Laptop,Notebook,TechBrand,ProSeries,UltraBook,X1,TechBrand ProSeries UltraBook X1,IntelCorp,Core,i7,1185G7,Intel Core i7-1185G7

Exporting to Database

Assuming we've set up an export mapping to a PostgreSQL database, here's what the user can expect:

  1. Database Table Creation: If not already present, Data Steward will create a table in the export database, e.g., pos_submissions, with columns matching the enriched data fields (a sketch of such a table appears after this list).

  2. Data Insertion: The enriched data will be inserted into the pos_submissions table. The SQL might look like this:

    INSERT INTO pos_submissions (
        vendor_id, product_manufacturer_sku, product_description, quantity_sold,
        invoice_date, invoice_number, sold_to_company_name, sold_to_address,
        sold_to_city, sold_to_state, sold_to_zip_code, sold_to_country,
        end_customer_company_name, end_customer_address, end_customer_city,
        end_customer_state, end_customer_zip_code, end_customer_country,
        product_type, system_form_factor, system_manufacturer, system_family,
        system_line, system_model, system_name, cpu_manufacturer, cpu_family,
        cpu_line, cpu_model, cpu_name
    ) VALUES (
        123456, 'ABC-123', 'High-Performance Laptop', 5, '2023-05-15', 'INV-001',
        'TechRetailer', '123 Main St', 'Anytown', 'CA', '12345', 'USA',
        'Corp Inc', '456 Oak Rd', 'Bizville', 'NY', '67890', 'USA',
        'Laptop', 'Notebook', 'TechBrand', 'ProSeries', 'UltraBook', 'X1',
        'TechBrand ProSeries UltraBook X1', 'IntelCorp', 'Core', 'i7', '1185G7',
        'Intel Core i7-1185G7'
    );
  3. Metadata Logging: Data Steward will log metadata about the export process, including:

    • Timestamp of the export
    • Number of records exported
    • Any validation warnings or errors
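
For reference, a table matching the insert above might be defined as follows. The column types are illustrative; the actual DDL that Data Steward generates depends on your export mapping configuration:

  CREATE TABLE IF NOT EXISTS pos_submissions (
      vendor_id                  INTEGER,
      product_manufacturer_sku   TEXT,
      product_description        TEXT,
      quantity_sold              INTEGER,
      invoice_date               DATE,
      invoice_number             TEXT,
      sold_to_company_name       TEXT,
      sold_to_address            TEXT,
      sold_to_city               TEXT,
      sold_to_state              TEXT,
      sold_to_zip_code           TEXT,
      sold_to_country            TEXT,
      end_customer_company_name  TEXT,
      end_customer_address       TEXT,
      end_customer_city          TEXT,
      end_customer_state         TEXT,
      end_customer_zip_code      TEXT,
      end_customer_country       TEXT,
      product_type               TEXT,
      system_form_factor         TEXT,
      system_manufacturer        TEXT,
      system_family              TEXT,
      system_line                TEXT,
      system_model               TEXT,
      system_name                TEXT,
      cpu_manufacturer           TEXT,
      cpu_family                 TEXT,
      cpu_line                   TEXT,
      cpu_model                  TEXT,
      cpu_name                   TEXT
  );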

What the User Can Expect

  1. Data Availability: The enriched POS data is now available in the export database for querying and analysis.

  2. Data Quality: The exported data includes standardized column names, enriched product information, and consistent formatting.

  3. Traceability: The original submission is linked to the exported data through Data Steward's internal tracking, allowing for auditing and troubleshooting if needed.

  4. Automated Processing: Subsequent POS submissions will be automatically processed and exported following the same workflow.

  5. Integration Ready: The exported data in the PostgreSQL database is now ready for integration with other systems, such as business intelligence tools or inventory management systems.

This example demonstrates how Data Steward takes raw POS data, applies multiple transformations to enrich and standardize it, and then exports the result to a structured database format. This process ensures that high-quality, consistent data is available for downstream analysis and decision-making in your semiconductor and high-tech manufacturing operations.