BulkloadV2 Ingestion in Amorphic
- Create a Bulk Load process in Amorphic that ingests data from multiple SQL tables and stores it in a specified target location on a regular schedule.
- The Bulk Load process must be updatable.
This use case can be implemented with the following Amorphic resources:
- Connections
- Tasks
- Schedules
- DataSets
The CLI offers four subcommands for this purpose:
amorphic-cli ingestion create_or_update_connections
amorphic-cli ingestion create_or_update_tasks
amorphic-cli ingestion create_or_update_schedules
amorphic-cli ingestion create_or_update_datasets
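If you drive these subcommands from a script rather than typing them by hand, a thin wrapper keeps every call in the same `amorphic-cli ingestion <subcommand> <flags>` shape. The Python sketch below is one way to do that; the helper names `ingestion_argv` and `run_ingestion` are hypothetical, and it only assumes that `amorphic-cli` is on the `PATH`. It returns raw stdout because the exact output format of each command is not described here.

```python
import subprocess
from typing import List

# The four ingestion subcommands offered by amorphic-cli.
SUBCOMMANDS = {
    "create_or_update_connections",
    "create_or_update_tasks",
    "create_or_update_schedules",
    "create_or_update_datasets",
}


def ingestion_argv(subcommand: str, *flags: str) -> List[str]:
    """Build the argument list for `amorphic-cli ingestion <subcommand> <flags>`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown ingestion subcommand: {subcommand}")
    return ["amorphic-cli", "ingestion", subcommand, *flags]


def run_ingestion(subcommand: str, *flags: str) -> str:
    """Run one ingestion subcommand and return its raw stdout.

    check=True raises CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        ingestion_argv(subcommand, *flags),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `run_ingestion("create_or_update_connections", "--filepath", "conn_config.json")` performs the connection step. Building the argument list separately from running it makes it easy to log or dry-run a command before executing it.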
Steps Required:
- Create a connection in Amorphic to the SQL database you want to ingest from, with the required configuration data
amorphic-cli ingestion create_or_update_connections --filepath conn_config.json
The connection details are passed in a JSON configuration file.
Output: On success, the command returns the connection ID.
- Create multiple datasets, each taking a connection ID, a schemaConfig JSON, and a file path config JSON as input
amorphic-cli ingestion create_or_update_datasets --filepath task_config.json --schemapath schema_config.json --connectionId "id"
The dataset details are passed in JSON configuration files (one for --filepath, one for --schemapath).
The connection ID is passed with the --connectionId flag.
Output: On success, the command creates a dataset associated with the given connection and returns the dataset ID.
- Create a task that takes a connection ID, a load type, the IDs of the datasets to load, and a config JSON as input
amorphic-cli ingestion create_or_update_tasks --filepath task_config.json --connectionId "id" --loadtype v2 --datasetids "id1,id2"
The task details are passed in a JSON configuration file.
The connection ID is passed with the --connectionId flag.
Output: On success, the command creates a task associated with the given connection and returns the task ID.
- Create a schedule that takes a task ID and a schedule configuration JSON as input
amorphic-cli ingestion create_or_update_schedules --taskId "taskid" --filepath schedule_config.json
The schedule details are passed in a JSON configuration file.
The task ID is passed with the --taskId flag.
Output: On success, the command returns the schedule ID.
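The four steps form a chain: the connection ID feeds the dataset and task steps, the dataset IDs feed the task, and the task ID feeds the schedule. The Python sketch below runs them in that order under a few assumptions: `run` is a caller-supplied (hypothetical) function that executes a command and extracts the ID it reports, since the output format is not documented here; `--datasetids` is assumed to take the same double-dash form as the other flags; and the `filepath`/`schemapath` dictionary keys and the function name `bulkload_v2_setup` are illustrative, not Amorphic config fields or APIs.

```python
from typing import Callable, Dict, List

# A runner executes one command and returns the ID it reported.
# Extracting that ID from amorphic-cli output is left to the caller,
# because the output format is not described in this guide.
Runner = Callable[[List[str]], str]

BASE = ["amorphic-cli", "ingestion"]


def bulkload_v2_setup(run: Runner,
                      conn_config: str,
                      datasets: List[Dict[str, str]],
                      task_config: str,
                      schedule_config: str) -> Dict[str, object]:
    """Create (or update) connection, datasets, task, and schedule in order."""
    # Step 1: the connection comes first; everything else depends on its ID.
    conn_id = run(BASE + ["create_or_update_connections",
                          "--filepath", conn_config])

    # Step 2: one dataset per entry, all attached to the same connection.
    dataset_ids = [
        run(BASE + ["create_or_update_datasets",
                    "--filepath", d["filepath"],
                    "--schemapath", d["schemapath"],
                    "--connectionId", conn_id])
        for d in datasets
    ]

    # Step 3: a BulkloadV2 task over all datasets (IDs comma-separated).
    task_id = run(BASE + ["create_or_update_tasks",
                          "--filepath", task_config,
                          "--connectionId", conn_id,
                          "--loadtype", "v2",
                          "--datasetids", ",".join(dataset_ids)])  # assumed flag form

    # Step 4: a schedule that triggers the task.
    schedule_id = run(BASE + ["create_or_update_schedules",
                              "--taskId", task_id,
                              "--filepath", schedule_config])

    return {"connection": conn_id, "datasets": dataset_ids,
            "task": task_id, "schedule": schedule_id}
```

Because `run` is injected, the function can be exercised with a fake runner that returns placeholder IDs before it is pointed at the real CLI.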
Updating an existing setup: modify the values in the config files, then rerun the same commands.
NOTE: Not every value in the config files can be updated.
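An update therefore amounts to editing a config file and rerunning the matching command. Below is a minimal Python sketch of the editing half, assuming a flat JSON config; the helper name `update_config_value` is hypothetical, and it cannot tell which keys Amorphic accepts as updatable, so that check remains a manual one.

```python
import json
from pathlib import Path
from typing import Any


def update_config_value(config_path: str, key: str, value: Any) -> None:
    """Set one existing top-level key in a JSON config file, in place.

    Which keys Amorphic allows to change is not listed in this guide,
    so confirm that `key` is updatable before rerunning the command.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Refuse to add keys that were not already present, so a typo does
    # not silently introduce a new field into the config.
    if key not in config:
        raise KeyError(f"{key!r} is not present in {config_path}")
    config[key] = value
    path.write_text(json.dumps(config, indent=2))
```

After the edit, rerun the same command, for example `amorphic-cli ingestion create_or_update_schedules --taskId "taskid" --filepath schedule_config.json`, to apply the change.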