feat: allow persistence of runtime generated job metadata to the DB and allow addition of config-driven job attributes#113

Merged
Yash Shrivastava (alephys26) merged 8 commits into
patterninc:mainfrom
ShivangNagta:feat/spark-history-link
Jul 1, 2026

Conversation

@ShivangNagta Shivang Nagta (ShivangNagta) commented Jun 23, 2026

Contributor

Description

Adds a generic, config-driven mechanism to attach extra metadata to a job and surface it in the UI, plus persistence of that metadata to the DB via a new extra_job_attributes column. The first use case is a "Spark History" link on the job details page for Spark-on-EKS jobs.

Rather than a plugin-specific column, attributes are defined in config on a cluster and rendered by core after the job runs:

clusters:
  - name: spark-eks
    context:
      spark_history_url: https://spark-history.example.com
    attributes:
      Spark History:
        kind: link
        value: "{{ .Cluster.spark_history_url }}/history/{{ .Outputs.spark_application_id }}/jobs/"

Each attribute is label → { kind, value }. kind (link/text) tells the UI how to render, and value is a Go text/template rendered over four namespaces: .Job, .Command, .Cluster (context maps), and .Outputs (runtime values published by the plugin).
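As a rough illustration of how a value template could be rendered over those four namespaces, here is a minimal sketch using Go's text/template. The `templateData` struct, its field types, and the `renderValue` helper are assumptions made for this example, not the code in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// templateData mirrors the four namespaces named in the description.
// The map types are an assumption; the PR only says these are context maps
// and runtime values published by the plugin.
type templateData struct {
	Job     map[string]string
	Command map[string]string
	Cluster map[string]string
	Outputs map[string]string
}

// renderValue is a hypothetical helper: it parses an attribute's value as a
// text/template and executes it against the namespaces.
func renderValue(value string, data templateData) (string, error) {
	tmpl, err := template.New("attribute").Parse(value)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := templateData{
		Cluster: map[string]string{"spark_history_url": "https://spark-history.example.com"},
		Outputs: map[string]string{"spark_application_id": "spark-123"},
	}
	out, err := renderValue(
		"{{ .Cluster.spark_history_url }}/history/{{ .Outputs.spark_application_id }}/jobs/",
		data,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

text/template is used rather than html/template here, so the rendered URL is not HTML-escaped; any escaping is left to whatever displays the value.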

Flow:

1. Plugins publish raw runtime values to a transient outputs channel with one call - the sparkeks plugin captures Status.SparkApplicationID during monitoring and emits job.SetOutput("spark_application_id", id).
2. Core renders cluster.Attributes after Execute into job.ExtraJobAttributes and persists it to the new extra_job_attributes jsonb not null default '{}' column.
3. UI renders each attribute by kind on the job details page.

This means new attributes (static, or from existing plugin outputs) can be added by editing config alone - devs don't touch plugin code per parameter unless very specific runtime generated metadata needs to be stored. The outputs channel is transient (not persisted); only the rendered extra_job_attributes is stored.

Test

Tested locally (migration, persistence, UI).
I haven't done e2e testing in the sandbox, since this adds no extra API call for the ID: the monitor loop already fetches Status (for AppState), and SparkApplicationID is just another field on that same object, so reading it adds no new call or failure mode.

Confirmed the spark-operator populates Status.SparkApplicationID at runtime by running the operator's spark-pi example on a local kind cluster and reading the field back.
local_spark_cluster
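To make the "no new call" argument concrete, here is a small sketch of a monitor loop that reads the application ID from the same status object it already polls for AppState. The `sparkStatus` type, the `outputSetter` interface, the `recorder` test double, and the terminal state names are all hypothetical stand-ins, not the sparkeks plugin's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// sparkStatus is a hypothetical stand-in for the operator's status object;
// only the two fields mentioned in the description are modelled.
type sparkStatus struct {
	AppState           string
	SparkApplicationID string
}

// outputSetter is the single call the plugin needs from the job (assumed shape).
type outputSetter interface {
	SetOutput(key, value string)
}

// recorder is a test double that remembers what was published.
type recorder struct {
	calls  int
	values map[string]string
}

func (r *recorder) SetOutput(key, value string) {
	if r.values == nil {
		r.values = make(map[string]string)
	}
	r.calls++
	r.values[key] = value
}

// monitor polls fetch until a terminal state is seen. The application ID is
// read from the status it already fetched, and published at most once.
func monitor(fetch func() sparkStatus, job outputSetter, interval time.Duration) string {
	published := false
	for {
		st := fetch()
		if !published && st.SparkApplicationID != "" {
			job.SetOutput("spark_application_id", st.SparkApplicationID)
			published = true
		}
		// Terminal state names are assumed for this sketch.
		if st.AppState == "COMPLETED" || st.AppState == "FAILED" {
			return st.AppState
		}
		time.Sleep(interval)
	}
}

func main() {
	seq := []sparkStatus{
		{AppState: "SUBMITTED"},
		{AppState: "RUNNING", SparkApplicationID: "spark-123"},
		{AppState: "COMPLETED", SparkApplicationID: "spark-123"},
	}
	i := 0
	fetch := func() sparkStatus {
		s := seq[i]
		i++
		return s
	}
	rec := &recorder{}
	final := monitor(fetch, rec, 0)
	fmt.Println(final, rec.calls, rec.values["spark_application_id"])
}
```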

Manual seeding for testing (for spark and non-spark job)
manual_seed

Button Rendering in Job Details Page (for a spark job)
button_on_dashboard

Some Notes (open for comments)

spark_application_id is the first plugin-specific column on jobs (the other columns are generic). It looks a little odd to me, but Claude's reasoning for it was "there's no generic home for plugin runtime metadata as of now", which seems to be true, because our use case is to store runtime-generated data (spark_application_id) in the heimdall database. I could not find any other plugin doing something like this.
Some other options could be:
a. Add a separate table for storing spark_application_id, with a foreign key reference to the original job table. This separates the Spark-specific data from the generic job table, but it adds an extra API/read call, and we would still have to add a Spark-specific table update somewhere in the updateAsyncJobStatus function.
b. If more plugins need to store runtime metadata, a generic metadata column may be preferable to per-plugin columns. But this seems like an early abstraction.

EDIT

An approach in the style of option b was chosen after discussion.

@ShivangNagta Shivang Nagta (ShivangNagta) marked this pull request as ready for review June 24, 2026 09:49
Copilot AI review requested due to automatic review settings June 24, 2026 09:49

Copilot AI left a comment

Contributor

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

<Button
styleType='text-blue'
as='externalLink'
href={`https://spark-history.data-platform.aws.pattern.com/history/${jobData?.spark_application_id}/jobs/`}

Collaborator

Shivang Nagta (@ShivangNagta) this is going to be shipped with the OSS Docker image. Let's find another way to inject this in the UI.

Contributor

If this is going to the jobs table, add something more generic, maybe a json that stores key-value and the column is extra_job_attributes. And the frontend then uses the key as display text and the value as the hyperlink for all the attributes that exist for that column.
Or some better approach, anyway, a specific column for spark application ID is not what I would like to see.

@ShivangNagta

Contributor Author

If this is going to the jobs table, add something more generic, maybe a json that stores key-value and the column is extra_job_attributes. And the frontend then uses the key as display text and the value as the hyperlink for all the attributes that exist for that column. Or some better approach, anyway, a specific column for spark application ID is not what I would like to see.

I have added an extra_job_attributes column to the job table instead of spark_app_id, which was plugin-specific. It holds key-value pairs as you suggested, but for the JSON value I preferred a nested object containing kind and value instead of just a value. The kind field allows storing both hyperlinks and non-hyperlinks, which is more flexible for the future.

....
ExtraJobAttributes map[string]Attribute
....

const (
	AttributeKindLink = "link"
	AttributeKindText = "text"
)

type Attribute struct {
	Kind  string `yaml:"kind,omitempty" json:"kind,omitempty"`
	Value string `yaml:"value,omitempty" json:"value,omitempty"`
}
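
For illustration, here is a minimal sketch of how the stored column content could be decoded back into this type and branched on by kind. The `decodeAttributes` and `describe` helpers are hypothetical, and the actual UI does this in the frontend rather than in Go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	AttributeKindLink = "link"
	AttributeKindText = "text"
)

type Attribute struct {
	Kind  string `yaml:"kind,omitempty" json:"kind,omitempty"`
	Value string `yaml:"value,omitempty" json:"value,omitempty"`
}

// decodeAttributes is a hypothetical helper that parses the raw JSON held in
// the extra_job_attributes column. The column default '{}' decodes to an
// empty map, so jobs without attributes need no special case.
func decodeAttributes(raw []byte) (map[string]Attribute, error) {
	attrs := map[string]Attribute{}
	if err := json.Unmarshal(raw, &attrs); err != nil {
		return nil, err
	}
	return attrs, nil
}

// describe shows how a consumer might branch on Kind.
func describe(label string, a Attribute) string {
	switch a.Kind {
	case AttributeKindLink:
		return fmt.Sprintf("%s -> link to %s", label, a.Value)
	case AttributeKindText:
		return fmt.Sprintf("%s: %s", label, a.Value)
	default:
		return fmt.Sprintf("%s: unsupported kind %q", label, a.Kind)
	}
}

func main() {
	raw := []byte(`{"Spark History":{"kind":"link","value":"https://spark-history.example.com/history/spark-123/jobs/"}}`)
	attrs, err := decodeAttributes(raw)
	if err != nil {
		panic(err)
	}
	for label, a := range attrs {
		fmt.Println(describe(label, a))
	}
}
```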

Shivang Nagta (ShivangNagta) commented Jun 28, 2026

Contributor Author

Also, the Spark history server URL is now set in the cluster context instead of being hardcoded in the UI.

Copilot AI left a comment

Contributor

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@ShivangNagta

Contributor Author

I have moved the logic for defining the template for extra job attribute values into the config itself (on the cluster). The runtime metadata is now held in memory (the output field in the job struct) and rendered according to the template. The result is finally persisted to the DB column extra_job_attributes, as before.
Example cluster config:

  - name: spark-eks
    attributes:
      Spark History:
        kind: link
        value: "{{ .Cluster.spark_history_url }}/history/{{ .Outputs.spark_application_id }}/jobs/"
    status: active
    ....
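
One question a config like this raises is what happens when a job never publishes spark_application_id. The sketch below shows one possible policy, assuming text/template's `missingkey=error` option is used so that an attribute with a missing input is skipped instead of being stored as a broken link. Whether core behaves this way is not stated in this thread; `renderAttributes` and the map-of-maps data shape are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

type Attribute struct {
	Kind  string
	Value string
}

// renderAttributes renders every configured attribute and drops those whose
// template fails, for example because a referenced output was never set.
// data maps a namespace name ("Cluster", "Outputs", ...) to its key-value pairs.
func renderAttributes(defs map[string]Attribute, data map[string]map[string]string) map[string]Attribute {
	rendered := make(map[string]Attribute, len(defs))
	for label, def := range defs {
		// missingkey=error makes a lookup of an absent map key fail
		// instead of rendering a zero value.
		tmpl, err := template.New(label).Option("missingkey=error").Parse(def.Value)
		if err != nil {
			continue
		}
		var buf bytes.Buffer
		if err := tmpl.Execute(&buf, data); err != nil {
			continue
		}
		rendered[label] = Attribute{Kind: def.Kind, Value: buf.String()}
	}
	return rendered
}

func main() {
	defs := map[string]Attribute{
		"Spark History": {
			Kind:  "link",
			Value: "{{ .Cluster.spark_history_url }}/history/{{ .Outputs.spark_application_id }}/jobs/",
		},
	}
	cluster := map[string]string{"spark_history_url": "https://spark-history.example.com"}

	// A Spark job that published its application ID.
	sparkJob := map[string]map[string]string{
		"Cluster": cluster,
		"Outputs": {"spark_application_id": "spark-123"},
	}
	fmt.Println(len(renderAttributes(defs, sparkJob)))

	// A job with no outputs: the attribute is skipped.
	otherJob := map[string]map[string]string{
		"Cluster": cluster,
		"Outputs": {},
	}
	fmt.Println(len(renderAttributes(defs, otherJob)))
}
```

Skipping silently keeps the job details page clean for non-Spark jobs; logging the template error would be a reasonable addition if misconfigured templates need to be noticed.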

@prasadlohakpure prasadlohakpure left a comment

Collaborator

Nice work, LGTM

@ShivangNagta Shivang Nagta (ShivangNagta) changed the title from "feat: add Spark History Server link to job details" to "feat: allow persistence of runtime generated job metadata to the DB via extra_job_attributes and allow config-driven job attributes" Jul 1, 2026
@ShivangNagta Shivang Nagta (ShivangNagta) changed the title from "feat: allow persistence of runtime generated job metadata to the DB via extra_job_attributes and allow config-driven job attributes" to "feat: allow persistence of runtime generated job metadata to the DB and allow addition of config-driven job attributes" Jul 1, 2026

Shivang Nagta (ShivangNagta) commented Jul 1, 2026

Contributor Author

Rendering of config driven links and tests (they can be separated if needed)
Screenshot 2026-07-01 at 3 50 02 PM

@alephys26 Yash Shrivastava (alephys26) merged commit 8a43a12 into patterninc:main Jul 1, 2026
7 checks passed
4 participants