
BulkloadV2 Ingestion in Amorphic

  • Create a Bulk Load process in Amorphic that ingests data from multiple SQL tables and regularly stores it in a specified target location.
  • The Bulk Load process should be updatable.

  • This use case can be satisfied with several Amorphic resources:

    • Connections
    • Tasks
    • Schedules
    • DataSets

The CLI offers four subcommands for this purpose:

  • amorphic-cli ingestion create_or_update_connections
  • amorphic-cli ingestion create_or_update_tasks
  • amorphic-cli ingestion create_or_update_schedules
  • amorphic-cli ingestion create_or_update_datasets

Steps Required:

  1. Create a connection with the target SQL database in Amorphic with the required configuration data

amorphic-cli ingestion create_or_update_connections --filepath conn_config.json

The connection details are passed in a JSON configuration file

Output: Successful connection creation returns the connection id

  2. Create multiple datasets, which take the connection id, a schema config JSON, and a config JSON file path as input

amorphic-cli ingestion create_or_update_datasets --filepath task_config.json --schemapath schema_config.json --connectionId "id"

The dataset details are passed in JSON configuration files

The connection id is passed as a flag

Output: Successful dataset creation creates a dataset associated with the given connection and returns the dataset id

  3. Create a task that takes the connection id, load type, dataset ids, and a config JSON as input

amorphic-cli ingestion create_or_update_tasks --filepath task_config.json --connectionId "id" --loadtype v2 --datasetids "id1,id2"

The task details are passed in a JSON configuration file

The connection id, load type, and dataset ids are passed as flags

Output: Successful task creation creates a task associated with the given connection and returns the task id

  4. Create a schedule that takes the task id and a schedule config JSON as input

amorphic-cli ingestion create_or_update_schedules --taskId "taskid" --filepath schedule_config.json

The schedule details are passed in a JSON configuration file

The task id is passed as a flag

Output: Successful schedule creation returns the schedule id

Updating an existing setup: Rerun the commands after modifying the values in the config files.
NOTE: Not every value in the config files is updatable.
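
When updating, the connection, dataset, and task ids are already known from the initial creation, so the four commands can be rerun together. The following is a minimal sketch of such a rerun, not an official script. The names bulkload_setup, run, and DRY_RUN are hypothetical helpers introduced for illustration. It assumes the ids are supplied by hand, because the CLI's output format is not described here, and that the dataset ids are passed as a --datasetids flag like the other options. The config file names are the ones used in the steps above.

```shell
# Sketch: rerun the four ingestion subcommands after editing the config files.
# Assumption: ids are supplied by hand; the CLI's output format is not documented here.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Hypothetical wrapper.
# Arguments: connection id, comma-separated dataset ids, task id.
bulkload_setup() {
  connection_id="$1"
  dataset_ids="$2"
  task_id="$3"

  # Stop at the first failing step so later steps never run against a broken setup.
  run amorphic-cli ingestion create_or_update_connections \
      --filepath conn_config.json || return 1
  run amorphic-cli ingestion create_or_update_datasets \
      --filepath task_config.json --schemapath schema_config.json \
      --connectionId "$connection_id" || return 1
  # Assumption: dataset ids are passed as a --datasetids flag.
  run amorphic-cli ingestion create_or_update_tasks \
      --filepath task_config.json --connectionId "$connection_id" \
      --loadtype v2 --datasetids "$dataset_ids" || return 1
  run amorphic-cli ingestion create_or_update_schedules \
      --taskId "$task_id" --filepath schedule_config.json || return 1
}
```

With the default DRY_RUN=1, calling bulkload_setup "connection-id" "id1,id2" "task-id" only prints the four commands so they can be reviewed. Setting DRY_RUN=0 executes them in order and stops at the first failure.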