## Data Profiler

The Data Profiler workflow uses the `orm-profiler` processor.

After running a Metadata Ingestion workflow, we can run the Data Profiler workflow. The `serviceName` must be the same as the one used in Metadata Ingestion, so that the ingestion bot can get the `serviceConnection` details from the server.

### 1. Define the YAML Config

This is a sample config for the profiler:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Source Config

You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json).

{% codeInfo srNumber=14 %}

**profileSample**: Percentage of data, or number of rows, on which to run the profiler and tests.

{% /codeInfo %}

{% codeInfo srNumber=15 %}

**threadCount**: Number of threads to use during metric computations.

{% /codeInfo %}

{% codeInfo srNumber=18 %}

**timeoutSeconds**: Profiler timeout in seconds.

{% /codeInfo %}

{% codeInfo srNumber=19 %}

**databaseFilterPattern**: Regex to only fetch databases that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=20 %}

**schemaFilterPattern**: Regex to only fetch schemas that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=21 %}

**tableFilterPattern**: Regex to only fetch tables that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=22 %}

#### Processor Configuration

Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI:

**tableConfig**: `tableConfig` allows you to set up some configuration at the table level.

{% /codeInfo %}

{% codeInfo srNumber=23 %}

#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
{% /codeInfo %}

{% partial file="/v1.5/connectors/yaml/workflow-config-def.md" /%}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

```yaml {% isCodeBlock=true %}
source:
  type: {% $connector %}
  serviceName: {% $connector %}
  sourceConfig:
    config:
      type: Profiler
```
```yaml {% srNumber=14 %}
      # profileSample: 85
```
```yaml {% srNumber=15 %}
      # threadCount: 5
```
```yaml {% srNumber=18 %}
      # timeoutSeconds: 43200
```
```yaml {% srNumber=19 %}
      # databaseFilterPattern:
      #   includes:
      #     - database1
      #     - database2
      #   excludes:
      #     - database3
      #     - database4
```
```yaml {% srNumber=20 %}
      # schemaFilterPattern:
      #   includes:
      #     - schema1
      #     - schema2
      #   excludes:
      #     - schema3
      #     - schema4
```
```yaml {% srNumber=21 %}
      # tableFilterPattern:
      #   includes:
      #     - table1
      #     - table2
      #   excludes:
      #     - table3
      #     - table4
```
```yaml {% srNumber=22 %}
processor:
  type: orm-profiler
  config: {}  # Remove braces if adding properties
    # tableConfig:
    #   - fullyQualifiedName: