
A model is a software program that uses algorithms or rules to make informed decisions, predictions, or generations from a set of inputs without being given explicit instructions for every scenario — ML models, lookup tables, if-else rules, and LLMs all qualify.

A registered model in GGX typically includes:

  • Model file — stored weights, parameters, lookup tables, tensors, or other data needed to initialise the model.
  • Scoring Logic — the code that takes inputs and produces a prediction or generation.
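
As a minimal sketch of these two parts, assume a rule-based model rather than an ML one. The lookup table stands in for the model file and the function stands in for the scoring logic. The names `RISK_TABLE` and `score`, the arguments, and the band values are all hypothetical and are not part of GGX.

```python
# Hypothetical "model file": a lookup table that would normally be loaded
# from an uploaded file when the model is initialised.
RISK_TABLE = {"low": 0.05, "medium": 0.20, "high": 0.60}


def score(income: float, missed_payments: int) -> dict[str, str]:
    """Hypothetical scoring logic: if-else rules pick a band, the table supplies the rate."""
    if missed_payments >= 3:
        band = "high"
    elif missed_payments >= 1 or income < 20_000:
        band = "medium"
    else:
        band = "low"
    return {"band": band, "default_rate": str(RISK_TABLE[band])}
```

Even with no learned weights, this fits the definition above: inputs go in, a decision comes out, and the data is kept separate from the code.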

Figure: Model anatomy. Typed inputs flow into scoring logic, which calls a model source (API provider, Python logic, or uploaded weights) and returns the model output.

Every model registered in GGX is one of three types. Choosing the right one is the first real decision you make on the registration form — it determines where the model runs and how it is configured.

External provider. Best for hosted foundation models: OpenAI, Anthropic, Google Vertex AI, Azure OpenAI, AWS Bedrock, Hugging Face Inference.

  • GGX calls the provider over HTTPS using your configured credentials.
  • You pick the Model Provider and the specific Model from the dropdown.
  • You do not upload any weights.

The Model Catalog is the central place where every registered model lives, organised into customisable groups. From here you can track, monitor, test, and create new models.

Click Create on the Model Catalog page, then work through the form:

  1. Name and description. Give the model a clear name and a description of what it does and when to use it.

  2. Properties. Set the Group, Permissible Purpose, Approval Workflow, Ownership Type (Proprietary, Open Source, Internal), and Model Type (for example, LLM).

  3. Alias (required). A code-safe variable name that pipelines use to refer to this model: lowercase with underscores, no spaces.

  4. Input Type. Pick API-Based, Python-Based, or Custom. If API-Based, also pick the Model Provider and the specific Model.

  5. Output Type. The data type the model returns, e.g. dict[str, str].

  6. Input Arguments. For each argument (typically text, temperature, system_instruction, etc.), set its Alias, Type, whether it is optional, and a default value.

  7. Resources and weights. Attach any registered Global Functions or Prompts the scoring logic needs. Upload the model file under Pipeline Model File if the type is Custom.

  8. Scoring Logic. Write the Python that initialises the model and produces a result. Use Test Code to validate it against sample input.

  9. Save. Add notes or attach documentation under Additional Information, then click Create. The model is saved as a Draft until it goes through approval.
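
The form fields above can be pictured as a small data structure. The sketch below is illustrative only: `ModelDraft`, `InputArgument`, and `is_valid_alias` are hypothetical names, not GGX APIs, and the alias pattern is one assumed reading of "lowercase with underscores, no spaces".

```python
import re
from dataclasses import dataclass, field

# Assumed alias rule: starts with a lowercase letter, then lowercase letters,
# digits, or underscores. The rule GGX actually enforces may differ.
ALIAS_PATTERN = re.compile(r"[a-z][a-z0-9_]*")
INPUT_TYPES = {"API-Based", "Python-Based", "Custom"}


def is_valid_alias(alias: str) -> bool:
    return ALIAS_PATTERN.fullmatch(alias) is not None


@dataclass
class InputArgument:
    alias: str
    type: str
    optional: bool = False
    default: object = None


@dataclass
class ModelDraft:
    name: str
    alias: str
    input_type: str
    output_type: str
    input_arguments: list[InputArgument] = field(default_factory=list)
    status: str = "Draft"  # a new model stays a Draft until it is approved

    def problems(self) -> list[str]:
        """Collect form-level issues instead of stopping at the first one."""
        issues = []
        if not is_valid_alias(self.alias):
            issues.append(f"model alias {self.alias!r} is not code-safe")
        if self.input_type not in INPUT_TYPES:
            issues.append(f"unknown input type {self.input_type!r}")
        for arg in self.input_arguments:
            if not is_valid_alias(arg.alias):
                issues.append(f"argument alias {arg.alias!r} is not code-safe")
        return issues
```

Checking aliases up front matters because pipelines refer to the model and its arguments by these names in code.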

API-Based models can connect to any of the following providers. Each integration page covers the credentials and configuration the provider expects.

| Provider | Use it for |
| --- | --- |
| OpenAI | GPT family and OpenAI-hosted models. |
| Anthropic | Claude family. |
| AWS Bedrock | Bedrock-hosted foundation models from multiple vendors. |
| Google Vertex AI | Gemini and Vertex-hosted models. |
| Azure AI | Azure-hosted OpenAI and other Azure foundation models. |
| Hugging Face | Inference endpoints for open-source models. |

Testing a model means confirming three things: the scoring logic runs without error, it can reach its source (provider API, uploaded file, or in-platform code), and the output matches the Output Type you declared. GGX gives you two levels for this.

Inside the Model Catalog page, the Test Code button at the bottom-right of the Scoring Logic editor runs the model against sample inputs without saving. Use it during development to:

  • Confirm the API credentials configured in Integrations actually reach the provider.
  • Verify the return value matches the declared Output Type (e.g. dict[str, str]).
  • Sanity-check temperature, max-token, and system-instruction handling.
  • Catch import errors or missing dependencies before saving.
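
To make the Output Type check concrete, here is a minimal sketch of what "matches dict[str, str]" means at runtime. The helper `matches_dict_str_str` is a hypothetical name for illustration; it is not how GGX necessarily performs the check.

```python
def matches_dict_str_str(value: object) -> bool:
    """Return True only if value is a dict whose keys and values are all strings."""
    return isinstance(value, dict) and all(
        isinstance(key, str) and isinstance(item, str)
        for key, item in value.items()
    )


# A well-formed result passes; a result carrying None instead of text does not.
good = {"response": "Hello"}
bad = {"response": None}
print(matches_dict_str_str(good), matches_dict_str_str(bad))
```

A check like this is worth running by hand during development, since a missing text field is easy to overlook when the call itself succeeds.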

A single Test Code call tells you the model works; a Bulk Simulation tells you how it behaves across many real cases. It runs every row of a dataset through the model and produces one output per record — useful for:

  • Spotting edge cases (empty input, very long input, non-English) a single test would miss.
  • Measuring quality across a representative sample before promoting to production.
  • Attaching the run as evidence in the model’s risk-assessment evidence tab.
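
Conceptually, a bulk run is a loop over dataset rows that records one result per record, including failures. The sketch below assumes a model is any Python callable taking keyword arguments. `bulk_simulate` and `echo_model` are hypothetical stand-ins, not the GGX implementation.

```python
from typing import Any, Callable


def bulk_simulate(
    model: Callable[..., dict], rows: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Run every row through the model and keep exactly one result per row."""
    results = []
    for index, row in enumerate(rows):
        try:
            results.append({"row": index, "output": model(**row), "error": None})
        except Exception as exc:  # record the failure instead of aborting the run
            results.append(
                {"row": index, "output": None, "error": f"{type(exc).__name__}: {exc}"}
            )
    return results


def echo_model(text: str) -> dict[str, str]:
    """Stand-in model that rejects empty input, to show an edge case surfacing."""
    if not text:
        raise ValueError("empty input")
    return {"response": text.upper()}


rows = [{"text": "hello"}, {"text": ""}, {"text": "bonjour"}]
results = bulk_simulate(echo_model, rows)
```

Recording errors per row, rather than stopping at the first one, is what lets the empty-input case show up alongside the successful records.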

This example registers Google's Gemini 2.0 Flash as an API-Based model. The form fields are:

| Field | Value |
| --- | --- |
| Name | Gemini 2.0 Flash |
| Alias | gemini_2_0_flash |
| Input Type | API-Based |
| Model Provider | Google Vertex AI |
| Output Type | dict[str, str] |
| Input Arguments | text (String, required) · temperature (Numerical, optional, default 0) · system_instruction (String, optional, default None) |

The scoring logic reads an API token from the GOOGLE_API_TOKEN environment variable and calls the model through the google-genai client:

gemini_2_0_flash — scoring logic
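import os

from google import genai
from google.genai import types

# text, temperature, and system_instruction are the Input Arguments declared on the form.
# Read the API token from the environment rather than hard-coding it.
client = genai.Client(api_key=os.getenv("GOOGLE_API_TOKEN"))

# Generation settings: the caller's temperature and system instruction, plus a fixed seed.
config = types.GenerateContentConfig(
    temperature=temperature,
    seed=2025,
    system_instruction=system_instruction,
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=text,
    config=config,
)

# Return a dict[str, str] to match the declared Output Type.
return {"response": response.text}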

Once saved, it can be called from any downstream pipeline:

reply = gemini_2_0_flash(
    text=user_prompt,
    temperature=0.7,
    system_instruction="You are a helpful assistant.",
)
output_text = reply["response"]

A registered model is not only the thing being tested — it can also be the thing that does the testing. An LLM-as-a-judge is simply a model whose scoring logic pairs an LLM with a Prompt to score another system’s output against criteria, returning a structured score (for example, answer relevancy on a 0–4 scale) and a reason.
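
A minimal sketch of such a judge follows, assuming the LLM is passed in as a plain callable that takes a prompt and returns text. `JUDGE_PROMPT`, `judge_relevancy`, and `fake_llm` are hypothetical names, and the JSON reply format is an assumption made for illustration. In GGX the prompt would come from a registered Prompt and the LLM from a registered model.

```python
import json
from typing import Callable

# Hypothetical judge prompt; doubled braces are literal braces for str.format.
JUDGE_PROMPT = (
    "Rate how relevant the answer is to the question on a 0-4 scale.\n"
    'Reply with JSON only: {{"score": <integer 0-4>, "reason": "<one sentence>"}}\n'
    "Question: {question}\n"
    "Answer: {answer}"
)


def judge_relevancy(
    question: str, answer: str, llm: Callable[[str], str]
) -> dict[str, str]:
    """Ask the LLM for a verdict and return it as a structured score and reason."""
    raw = llm(JUDGE_PROMPT.format(question=question, answer=answer))
    verdict = json.loads(raw)
    score = int(verdict["score"])
    if not 0 <= score <= 4:
        raise ValueError(f"score {score} is outside the 0-4 scale")
    return {"score": str(score), "reason": str(verdict["reason"])}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return '{"score": 3, "reason": "Addresses the question but omits the date."}'


verdict = judge_relevancy("When does the offer end?", "It ends soon.", fake_llm)
```

Rejecting out-of-range scores in the scoring logic keeps a malformed reply from silently entering a report.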

Because a judge is a registered model, it is reusable across any Report and pre/post-production evaluation, and it can itself be validated: run it over a ground-truth dataset with Bulk Simulation and compare its verdicts to the known answers before you trust it. Refine its prompt — including via Prompt Optimization — to adapt a generic judge to your use case.
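
One simple way to compare a judge's verdicts with known answers is an agreement rate. The sketch below assumes integer scores on the same scale for both lists. `judge_agreement` and its `tolerance` parameter are hypothetical, and the right metric for a given use case may well be different.

```python
def judge_agreement(
    verdicts: list[int], ground_truth: list[int], tolerance: int = 0
) -> float:
    """Fraction of records where the judge's score is within tolerance of the known score."""
    if len(verdicts) != len(ground_truth):
        raise ValueError("verdicts and ground_truth must have the same length")
    if not verdicts:
        raise ValueError("at least one record is required")
    matches = sum(
        abs(judged - known) <= tolerance
        for judged, known in zip(verdicts, ground_truth)
    )
    return matches / len(verdicts)


# Judge scores from a bulk run versus hand-labelled scores for the same records.
exact = judge_agreement([4, 3, 0, 2], [4, 2, 0, 2])
within_one = judge_agreement([4, 3, 0, 2], [4, 2, 0, 2], tolerance=1)
```

Reporting both exact and within-one agreement shows whether the judge is off by a notch or disagreeing outright.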

Registering a model — rather than calling it from a one-off script — is what turns it into a governed, reusable asset:

| Capability | What you get |
| --- | --- |
| Change tracking | Every modification to a draft is snapshotted in Change History; approved versions are locked. |
| Purpose enforcement | Automatic detection of Permissible Purpose violations when the model is used downstream. |
| Testing & evaluation | Quick Test Code during development and Bulk Simulation across datasets before promoting. |
| Reusability | Reuse across pipelines, with visibility through Lineage Tracking. |
| API fingerprinting | External API connectivity is fingerprinted so changes upstream are detectable. |
| Auditable path to production | A transparent, fully auditable journey from Draft through Approval to use in pipelines. |
| Executable artifacts | Extract ready-to-productionise artifacts straight from the Catalog. |