Einblick
User-Defined Operators (UDO)
If you need functionality that the built-in operators (e.g. automl, what-if, and pivot chart) can't provide, you can create a user-defined operator (UDO). With UDOs, you can create new transformations, train models for future use, and make custom visualizations. See the UDO Showcase for examples.

Creating UDOs

A new operator can be created from the main menu, just as with a workspace. Click operator in the drop-down menu from the add new button in the top-left corner.
Adding a new user-defined operator

Using UDOs

Once an operator has been created, it will appear in the operators tab in the main menu.
Viewing UDOs from the operators tab in the main menu
To use an operator for your analysis, you must add it to a workspace. To do this, drag the operator onto the operators section of the workspace's side panel, in a similar fashion to adding datasets to a workspace. After adding a UDO to a workspace, it will appear in the list of operators in the workspace and can be used just like any built-in operator, as in the image below.
A workspace with several associated UDOs

Editing and Testing UDOs

Before a UDO can perform useful tasks, you must implement its specification. This specification covers all aspects of the operator's functionality, including:
    Name, size, and description
    Custom parameters
    Python code to run upon execution
    Visualization logic (using Vega or Vega-Lite)
You can access a UDO's specification by entering the UDO editor. Before entering the UDO editor, select the UDO in the main menu. A panel will appear to the right. In this panel, as with workspaces, you can add datasets and collaborators. The datasets specified here will become available for testing purposes in the UDO editor, and any collaborators you specify will be able to view the UDO's specification as well.
A UDO panel showing two added datasets
The UDO editor can be entered by clicking the edit button on the UDO side panel. An image of the UDO editor, with several numbered areas to be discussed next, is shown below.

Editor Menu

The menu bar at the top of the UDO editor exposes a few different functions. Each of these is numbered in the above image for reference.
Home Button (1)
Click the Home button to go back to the main menu.
Operator Name (2)
To change the name of the UDO, edit this field. The name displayed here is what will be shown in the workspace.
Import / Export (3)
UDOs can be saved as .zip files by clicking the export button, which saves all of the UDO's information in the exported file. Clicking import and selecting such a .zip file loads the contents of the file into the currently opened UDO. Alternatively, a UDO can be imported directly in the main menu by dragging and dropping a .zip file, just as with .csv files.
Importing is also useful for bringing in UDOs from Einblick's public UDO repository. To do this, find the UDO you want to import, then download its directory as a .zip file. The downloaded file can be imported directly into Einblick.
Downloading a UDO from Einblick's public UDO repository
Operator Type and Model Type (4)
The type class of the operator can be set through these two menus (note that the output model type menu is unavailable for operators with type UDF). In most cases, the UDF operator type is used for operators that work on individual rows (e.g. adding a new column based on the value of each row of another column), while the UDA operator type is used for operators that work over entire columns at a time (e.g. clustering operators).
These types are described in further detail in the Leveraging Einblick's Progressive Engine section.
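The distinction can be illustrated outside Einblick with plain pandas. The sketch below is only an analogy, not Einblick's API: the function names row_wise_udf and column_wise_uda and the sample column are hypothetical. The first function needs only the values in each row, so it gives the same result however the data is split up; the second needs a statistic of the whole column.

```python
import pandas as pd


def row_wise_udf(df: pd.DataFrame) -> pd.DataFrame:
    """UDF-style: each output value depends only on its own row."""
    out = df.copy()
    out["score_pct"] = out["Score"] * 100
    return out


def column_wise_uda(df: pd.DataFrame) -> pd.DataFrame:
    """UDA-style: each output value depends on the entire column."""
    out = df.copy()
    out["score_centered"] = out["Score"] - out["Score"].mean()
    return out


sample = pd.DataFrame({"Score": [1.0, 2.0, 6.0]})
udf_result = row_wise_udf(sample)
uda_result = column_wise_uda(sample)
```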
Play Button (5)
Runs the operator for purposes of testing. See the Run Operator View section below.
Publish (6)
Clicking publish removes the draft status of the operator and makes the operator available for use in workspaces. If the operator had previously been published, this changes all instances of the operator (e.g. in workspaces) to use the latest specification.

Run Operator View

The right side of the UDO editor contains the input settings and the resulting output after running the UDO. The image below, with relevant sections numbered, shows what this looks like after updating the UDO specification and clicking the play button.
Right-hand side of the UDO editor after running a UDO
Dataset Input (1)
In the first part of the input section, you can specify datasets to use as inputs into the operator. The number and name of these inputs will correspond to the specified DataframeInputModels in the operator specification code tab (discussed in more detail later). The available datasets to use here will be those added to the datasets section of the UDO panel in the main menu.
Attribute and Parameter Inputs (2)
The second part of the input section allows you to choose attribute and parameter settings for testing your UDO. The inputs that appear here will correspond to the AttributeConfigGroupInputDescription and ValueInputDescription sections of the operator specification code, and will be visually controlled by the corresponding sections of the InputUI section. These will also be discussed in detail further down.
Visualization (3)
If you've specified code in the visualization tab, the generated Vega or Vega-Lite visualization will appear here. Otherwise, a tabular representation of the data will appear here.
Logs (4)
If there are any logs resulting from the execution of the operator (e.g. through print statements), or if the system encounters errors from running the operator, the corresponding information will appear here.

Editing UDO Code

The primary function of the UDO editor, of course, is to specify the code underlying UDOs. A UDO's code is divided among a number of code files, each displayed in a separate tab in the UDO editor and each governing a specific part of the UDO's behavior. Most UDOs need only a few of these tabs.
    operator specification (JSON): defines the inputs and parameters of the operator
    requirements (Python): a list of required Python packages
    model definition (Python): definition of the trained model for trained UDOs
    on_open (Python): code to run upon operator initialization
    on_batch (Python): code to run upon receiving a new batch of data
    on_close (Python): code to run upon finishing execution
    on_reset (Python): code to run upon execution resets
    visualization (Vega or Vega-Lite): the Vega or Vega-Lite specification
    filters (JSON): defines filters for customizing user interactions with visualizations
Each tab is explained in further detail below. Also note that each tab has a description and documentation link available to its top-right, as illustrated in the image below.
Description and Documentation buttons

Operator Specification

This tab controls various properties of a UDO. The most important properties include:
    Visual properties, such as width, height
    Inputs and outputs, such as the name and number of input dataframes
    Attribute input menus, controlling how users select columns from input dataframes
    Custom parameter menus, allowing users to set custom values
There are two possible input methods for setting the operator specification. The first, which appears by default, is a form which allows you to specify most of the properties available to UDOs. The form view is displayed in the image below.
Operator specification form view
In the form view, you can specify the following properties:
    Basic properties:
      Width (pixels)
      Height (pixels)
      Description (Markdown): a short description of the UDO, which will be available at the bottom of the operator
      Auto-execute on change: whether to re-execute the operator automatically when any input dataframes or parameters change
      Include output dataframe: whether to expose an output dataframe once the UDO is run
    Dataframe Inputs: how many input dataframes to expose, and their names
    Attribute Selection Inputs: operator input menus allowing the selection of attributes from input dataframes
      Customizable properties include:
        Name of the input dataframe to select attributes from
        Number and type of selectable attributes
    Custom Value Inputs: operator input menus allowing the specification of custom parameters
      Available input types include:
        Text field
        Numeric field
        Numeric slider
        Checkbox
        Multiple-choice list
The second input method is through a JSON file which describes all of the above properties along with more advanced customizability. The code view of the operator specification tab is described in detail on the Operator Specification JSON page.

Requirements

In this tab, enter a list of Python packages to include for the Python code tabs.
Example requirements tab

Package Whitelist

Among external packages, only the following are currently supported as requirements:
    nltk
    sklearn
    scikit-learn
    pandas
    numpy
    scipy
    pycountry
    reverse_geocoder
    xgboost
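
Because only these packages are accepted, it can help to check a requirements list before pasting it into the tab. The helper below is a hypothetical local convenience, not part of Einblick. It assumes one package name per line, optionally followed by a version specifier, which this page does not confirm.

```python
import re
from typing import List

# Package names copied from the whitelist above.
SUPPORTED_PACKAGES = {
    "nltk", "sklearn", "scikit-learn", "pandas", "numpy",
    "scipy", "pycountry", "reverse_geocoder", "xgboost",
}


def unsupported_requirements(requirements_text: str) -> List[str]:
    """Return the package names in requirements_text that are not whitelisted."""
    rejected = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Keep only the name, dropping any version specifier such as "==1.0".
        name = re.split(r"[<>=!~\s]", line, maxsplit=1)[0].lower()
        if name not in SUPPORTED_PACKAGES:
            rejected.append(name)
    return rejected


rejected = unsupported_requirements("pandas\nscikit-learn==1.0\ntensorflow\n")
```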

Model Definition

If specified, a UDO can output a trained model instead of a dataframe. A trained model, just as with the built-in automl operator, allows us to build models on prior data and predict values on new data. Trained UDOs can be useful if, for example, you want to use a custom machine learning model during your analysis.
The specification for the code of the model definition tab is available on the Model Definition page.

Trained Model UDO Usage

If the output model type of the UDO is set, running the UDO returns a trained model instead of a dataframe or visualization. This model is returned in the form of an executor, as with the automl operator. This is shown in the image below.
Using a trained model UDO
To use an executor of the returned model, drag out the gray box that appears in the UDO element once it finishes running. The executor can then be used with other datasets.

Batch Events (on_open, on_batch, on_close, on_reset)

These four files describe the UDO's behavior upon the OPEN, BATCH, CLOSE, and RESET events with respect to Einblick's progressive engine. Unless you are working with large datasets, you will generally only need to fill out the on_batch tab. A quick summary of the four tabs is given below.
    on_open: code for initializing variables for the BATCH, CLOSE, and RESET events
    on_batch: code to run for each batch
      for smaller datasets, only a single batch will be encountered, which allows us to ignore the other three tabs
    on_close: code specifying behavior after all batches have been run
    on_reset: code specifying behavior upon the RESET event
See Leveraging Einblick's Progressive Engine for more details on the various event types.
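
How the four events fit together can be sketched with a small local simulation. Everything here is a hypothetical stand-in: the function signatures, the state dictionary, and the run_progressively driver are assumptions made for illustration and are not Einblick's API. The example keeps a running sum and count so that the mean of Score comes out the same no matter how many batches arrive.

```python
import pandas as pd


def on_open(state):
    # Initialize the variables shared by the later events.
    state["total"] = 0.0
    state["count"] = 0


def on_batch(state, df):
    # Fold one batch into the running totals and report the estimate so far.
    state["total"] += df["Score"].sum()
    state["count"] += len(df)
    return pd.DataFrame({"mean_score": [state["total"] / state["count"]]})


def on_close(state):
    # Produce the final result after the last batch.
    return pd.DataFrame({"mean_score": [state["total"] / state["count"]]})


def on_reset(state):
    # Discard partial results so execution can start over.
    state.clear()


def run_progressively(batches):
    """Hypothetical driver: OPEN once, BATCH per chunk, then CLOSE."""
    state = {}
    on_open(state)
    estimates = [on_batch(state, batch)["mean_score"].iloc[0] for batch in batches]
    final = on_close(state)["mean_score"].iloc[0]
    return estimates, final


data = pd.DataFrame({"Score": [1.0, 3.0, 5.0, 7.0]})
estimates, final = run_progressively([data.iloc[:2], data.iloc[2:]])
```

With a single batch, the estimate after on_batch already equals the final result, which matches the note above that small datasets need only the on_batch tab.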

Accessing Settings From the Operator Menus

To access the input dataframe(s), use the df keyword (or dfs when there are multiple input dataframes). To access any custom parameters or selected attributes in the operator's menus, use the attributes and params keywords. See Keywords for more details.

Visualization

If a valid Vega or Vega-Lite specification is provided in the visualization tab, the visual output of a UDO will be the corresponding visualization. The type of specification (Vega or Vega-Lite) is automatically inferred. To see what kinds of visualizations are possible and begin creating new visualizations, see the example specifications in the Vega Example Gallery and Vega-Lite Example Gallery.

Accessing Settings From the Operator Menus

To access any custom parameters or selected attributes in the operator's menus, you will need to use the attributes and params keywords. See Keywords for more details.

Visualization Data

The data output from a UDO (which may be modified by any provided Python code) will automatically be included in the data field of the specification. Therefore, in most cases, the data field can be left blank (or nearly blank in the case of Vega-Lite specifications) as below:
    // Vega
    "data": [],

    // Vega-Lite
    "data": { "values": [] }
If you want to provide additional, hard-coded data points, you may do so by inserting them appropriately into the data field. Any additional data points will be inserted after the automatically included data.
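As an illustration, a complete Vega-Lite specification that relies on the automatically included data can be very small. The sketch below builds one as a Python dictionary and prints it as JSON, ready to paste into the visualization tab. The field names GDP per capita and Score are placeholders for whatever columns the UDO outputs, and the point mark is an arbitrary choice.

```python
import json

# Minimal Vega-Lite scatterplot; "values" stays empty because the UDO's
# output data is added to the data field automatically.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": []},
    "mark": "point",
    "encoding": {
        "x": {"field": "GDP per capita", "type": "quantitative"},
        "y": {"field": "Score", "type": "quantitative"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```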

Auto-Completion and Documentation

To enable autocompletion and gain access to documentation in the editor, include a $schema value in your specification, dependent on whether you are using Vega or Vega-Lite:
    // Vega
    "$schema": "https://vega.github.io/schema/vega/v5.json"

    // Vega-Lite
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json"

Filters

The filters tab is used to specify filters for the visualization. For example, this allows selecting specific groups of data points just by selecting a single point belonging to that group.
To use a filter, add an entry to the filters array indicating the class of points to group together. For example, the following code groups together all points sharing the same value of x. If this were used in a scatterplot UDO, then upon clicking a point, all points sharing the same value of x as the selected point would also be selected.
    {
      "filters": [
        "attributes['x'][0]"
      ]
    }
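The selection behavior described above can be mimicked in a few lines of plain Python. This is only a model of the outcome, not how Einblick evaluates filters. The helper select_group and the sample points are hypothetical, and the sketch assumes the filter expression resolves to the column name chosen for the x attribute input.

```python
def select_group(points, clicked_index, group_column):
    """Return indices of all points sharing the clicked point's group value."""
    clicked_value = points[clicked_index][group_column]
    return [i for i, p in enumerate(points) if p[group_column] == clicked_value]


# Stand-in for the user's menu selections: the first (and only) column chosen
# for the attribute input named "x".
attributes = {"x": ["Region"]}

points = [
    {"Region": "Europe", "Score": 7.1},
    {"Region": "Asia", "Score": 5.9},
    {"Region": "Europe", "Score": 6.4},
]

# Clicking the first point selects every point with the same Region.
selected = select_group(points, clicked_index=0, group_column=attributes["x"][0])
```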

Keywords (Dataframes, Attributes, Custom Parameters, Container Sizing)

To access a user's settings in a UDO (e.g. selected attributes, dataframe inputs), a few keywords are available for use in the Python code tabs (e.g. on_batch) and in the visualization tab. They are described below.

Dataframe Keywords (df, dfs[i]) [Python only]

As with the Python operator, the dataframe inputs of a UDO can be accessed with the df or dfs keywords. df is used when there is only one input dataframe, while dfs[i] is used when there are multiple.
The following code block (in on_batch) returns a new dataframe, produced by adding a column of zeros to the input dataframe.
    # on_batch

    df["zeros"] = 0
    return df
Similarly, the next code block multiplies the Score column of the second dataframe by 100, stores the result as the Score column of the first dataframe, and returns the first dataframe.
    # on_batch

    dfs[0]["Score"] = dfs[1]["Score"] * 100
    return dfs[0]
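To try such code outside Einblick, the body of on_batch can be wrapped in an ordinary function. In the sketch below, the function wrapper, its dfs parameter, and the sample data are assumptions for local testing only; inside the UDO editor, dfs is provided for you and the body is written directly in the tab. Note that pandas aligns the assigned column by index, so both inputs are assumed to share the same index.

```python
import pandas as pd


def on_batch(dfs):
    # Same logic as the two-dataframe example, made explicit for local use.
    result = dfs[0].copy()  # avoid mutating the caller's dataframe
    result["Score"] = dfs[1]["Score"] * 100
    return result


countries = pd.DataFrame({"Country": ["Finland", "Norway"]})
scores = pd.DataFrame({"Score": [0.5, 0.25]})

output = on_batch([countries, scores])
```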

Attribute and Custom Parameter Keywords (attributes, params) [Python, Vega/Vega-Lite]

To retrieve the names of selected attributes, use the attributes keyword. This keyword refers to either a Python dictionary or a JavaScript object, with keys corresponding to the names of the attribute inputs (e.g. attributes, features, target). Each of these keys maps to a list (or array) of selected attribute names.
For example, in the following image, the attributes Score, GDP per capita, and Social support of the input dataset have been selected.
Example attribute selections
The resulting attributes keyword object is as follows:
    print(attributes)

    # Result:
    # {
    #     'features': [
    #         'GDP per capita',
    #         'Social support',
    #         'Score'
    #     ]
    # }
Similarly, the params keyword is available for any custom parameters. It is an object mapping parameter names to values. For example, given the following inputs:
Example custom number range parameters
the params object is:
    print(params)

    # Result:
    # {
    #     'n_clusters': 3,
    #     'n_components': 3
    # }
Both attributes and params are available in Python and Vega/Vega-Lite tabs.
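A sketch of how the two keywords might be combined in on_batch follows. The function wrapper and sample data exist only so the example runs on its own (in the editor, df, attributes, and params are provided), and the equal-width binning is a deliberately simple stand-in for a real clustering step.

```python
import pandas as pd


def on_batch(df, attributes, params):
    # Column names chosen in the attribute menu named "features".
    feature_columns = attributes["features"]
    result = df[feature_columns].copy()
    # Toy grouping: split the first feature into n_clusters equal-width bins.
    result["group"] = pd.cut(
        result[feature_columns[0]], bins=params["n_clusters"], labels=False
    )
    return result


df = pd.DataFrame({
    "GDP per capita": [0.1, 0.5, 0.9, 1.5],
    "Social support": [0.8, 1.0, 1.2, 1.5],
    "Score": [3.2, 4.8, 5.5, 7.3],
    "Country": ["A", "B", "C", "D"],
})
attributes = {"features": ["GDP per capita", "Social support", "Score"]}
params = {"n_clusters": 3, "n_components": 3}

output = on_batch(df, attributes, params)
```

Reading column names from attributes rather than hard-coding them keeps the code working when the user picks different columns in the operator's menus.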

Container Sizing (Vega/Vega-Lite)

The keywords container_width and container_height are available in the visualization tab. They are useful for setting the visualization's size to fit the operator's dimensions.

Selections (Vega/Vega-Lite)

Adding the following line to a Vega or Vega-Lite specification will enable selections within the visualization, manipulated through clicking.
    "selection": "SELECT_STORE",
The selection store itself will then be available under the name select_store (lowercase).