PostgreSQL Queries on JSONB Fields with Ecto

18 min read
elixirectopostgresql

The Promotions team at theScore is responsible for enabling dynamic marketing campaigns within theScore Bet app. Each promotional campaign allows us to specify an audience of users who will be available to participate in the campaign, by comparing on past actions that a given user has taken against a set of dynamic rules. For example, when creating a new promotion, we can specify that its audience can only include patrons who have previously deposited a specific amount of money, or placed a specific quantity of bets.

Disclaimer: The specific scenario, and format of the data shown in the examples throughout, is contrived entirely for the purposes of this article.

An overview of the PostgreSQL schema

The Promotions service depends on the ability to read past events placed by a patron. These events are received asynchronously from other internal services, via Kafka, and then stored internally in the Promotions database as a generic "user action". These actions are used for a variety of purposes within our Promotions service, but in this case, we are exploring looking at the past actions of a given user to see if they fulfill the audience criteria of a given promotional campaign. The table which they get stored within has been created with a migration that resembled the code sample below.

defmodule Promotions.Repo.Migrations.CreateUserActionsTable do
  use Ecto.Migration

  def change do
    create table(:user_actions) do
      add :action_id, :string, null: false
      add :action_type, :action_type, null: false
      add :occurred_at, :utc_datetime_usec, null: false
      add :region, :region, null: false
      add :user_id, :binary_id, null: false
      add :attributes, :jsonb, null: false

      timestamps()
    end
  end
end

The corresponding Ecto schema for the "user_actions" table looks like the following.

defmodule Promotions.UserAction do
  use Ecto.Schema

  alias Promotions.UserAction.CasinoWagerAttributes
  alias Promotions.UserAction.DepositAttributes
  alias Promotions.UserAction.RegistrationAttributes
  alias Promotions.UserAction.SportsbookBetAttributes

  schema "user_actions" do
    field :action_id, :string
    field :action_type, Ecto.Enum, values: [:registration, :deposit, :sportsbook_bet, :casino_wager]
    field :occurred_at, :utc_datetime_usec
    field :region, Ecto.Enum, values: [:usa_new_jersey, :usa_colorado, :usa_indiana, :usa_iowa, :can_ontario]
    field :user_id, Ecto.UUID

    embeds_one :casino_wager_attributes, CasinoWagerAttributes, source: :attributes
    embeds_one :deposit_attributes, DepositAttributes, source: :attributes
    embeds_one :registration_attributes, RegistrationAttributes, source: :attributes
    embeds_one :sportsbook_bet_attributes, SportsbookBetAttributes, source: :attributes

    timestamps()
  end
end

Each of the various embedded schemas to populate the attributes field is represented in a separate Ecto schema of its own. For the purposes of this article, only a truncated sample of the SportsbookBetAttributes embedded schema will be shown below as well.

defmodule Promotions.UserAction.SportsbookBetAttributes do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :amount_cents_wagered, :integer
    field :amount_cents_payout, :integer
    field :bet_type, Ecto.Enum, values: [:straight, :parlay]
    field :odds, :integer

    embeds_many :legs, SportsbookBetLegAttributes
  end
end

defmodule Promotions.UserAction.SportsbookBetLegAttributes do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :event_id, :string
    field :event_type, :string
    field :market_id, :string
    field :market_selection_id, :string
    field :market_type, :string
    field :in_play, :boolean
    field :odds, :integer
  end
end

Example Criteria to Turn into Database Queries

The audience criteria of a promotional campaign can be based on any type of action that we record (ie. deposit, sportsbook bet, etc) and the various attributes of the action. The following are examples of such an audience rule.

  • The patron registered their account in January 2022
  • The patron has placed at least 3 deposits from within New Jersey
  • The patron placed at least one sportsbook bet on the last years Super Bowl

For the first two examples, the data is stored within more fields that are of a more primitive data type within the user_actions table of the database, and are standard to query. For the last example, it requires querying the JSONB attributes field to determine if any of the "legs" on any of the users previous sportsbook bets contain the event_id associated with last years Super Bowl.

Similarly, the audience criteria also needs to support scenarios where "The patron has placed a sportsbook bet that was not on last years Super Bowl" as well. This would require looking for a sportsbook bet that does not have an event_id corresponding to the Super Bowl for the previous year in any of its legs.

The queries we need to write need to work for both "at least one leg of a sportsbook contains" and "none of the legs of a sportsbook bet can contain" scenarios. To do so, we need to investigate how to make precise queries on a set of data within a JSONB field in PostgreSQL, and subsequently how we can turn those queries into composable Ecto queries within our Elixir application as well.

Notably, we want to be able to support operators of "in" and "not in" when querying for data, as opposed to operators such as "equals", "not equals", "greater than or equals", etc.

Query of a JSONB Field

The first results our team came across while looking into this was using PostgreSQL native functions and operators. We quickly realized that these would not fit all of our use-cases. It would be challenging to turn it into a reusable Ecto query that would work for a variety of fields and would be an inelegant solution for deeply nested data.

The next option we came across was to use the function jsonb_to_recordset to create a queryable schema of records from an array within a JSONB field (ie. the embeds_many :legs section of our bet attributes).

In the example below, we’ll try to find if the user has at least one bet with the following attributes.

user_id: "fcca980b-8edc-4848-91cc-4e00ff47019c"
region: :can_ontario
amount_cents_wagered: at least 1500 (ie. $15)
event_id: (not in any of the following)
  - "e8a95b17-972c-4421-9090-8bf66c0804a9"
  - "5fbef652-e772-4bae-84f5-31255fe2fadd"
  - "dd92d86c-fe6d-43a7-8765-a3e1026145db"

Using jsonb_to_recordset with SQL

The first step to transforming this into a usable query within our Elixir app was to try to get it working with a query made directly against the database to validate that it is the right tool for the job.

After some initial research, we composed the following SQL query:

SELECT
    public.user_actions AS action,
    jsonb_to_recordset(action.attributes->'legs') AS leg("event_id" varchar)
WHERE
    action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
    action.region = 'ontario' AND
    (action.attributes->'amount_cents_wagered')::integer >= 1500 AND
    leg.event_id NOT IN (
        'e8a95b17-972c-4421-9090-8bf66c0804a9',
        '5fbef652-e772-4bae-84f5-31255fe2fadd',
        'dd92d86c-fe6d-43a7-8765-a3e1026145db'
    );

While this allows us to use the operators we want (ie. >= and NOT IN) on data within the "attributes" JSONB field, and gets us the exact records we want, it would be a challenge to try to make it fit into an Ecto query since Ecto queries only allow for one entity to be in the from clause of a given query. For example, we are permitted to begin a query with from UserAction, but are unable to create a query that selects from two data sources such as from UserAction, fragment("jsonb_to_recordset(...)"). If there is a way around that, we were unable to find a solution.

The next option we looked to was modifying the first query by utilizing a lateral-join to try to make a "user action" such that it has a one-to-many relationship with its own "legs" nested within the JSONB "attributes" field. By doing so, we eliminate the need to use multiple clauses in the FROM section of the query, which helps gets us closer to a valid Ecto query translation. The query then looks like this:

SELECT
    action.*
FROM
    user_actions AS action
INNER JOIN LATERAL (
    SELECT *
    FROM   jsonb_to_recordset(action.attributes->'legs')
    AS     leg("event_id" varchar)
) AS leg ON true
WHERE
    action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
    action.region = 'can_ontario' AND
    (action.attributes->'amount_cents_wagered')::integer >= 1500 AND
    leg.event_id NOT IN (
        'e8a95b17-972c-4421-9090-8bf66c0804a9',
        '5fbef652-e772-4bae-84f5-31255fe2fadd',
        'dd92d86c-fe6d-43a7-8765-a3e1026145db'
    )

The problem we quickly saw with this query is that will look to see if any of the "legs" of a specific bet adhere to the condition. Instead, we want to make sure that the all of the legs within the bet are not on those listed events (ie. cannot have any of the specified "event id" values). As such, we found that putting the inner join in a subquery helped to achieve the goal, wrapping it in a "NOT EXISTS" condition.

SELECT
    action.*
FROM
    user_actions AS action
WHERE
    action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
    action.region = 'ontario' AND
    (action.attributes->'amount_cents_wagered')::integer >= 1500 AND
    NOT EXISTS (
        SELECT
            *
        FROM
            user_actions AS lateral_action
        INNER JOIN LATERAL (
            SELECT *
            FROM jsonb_to_recordset(lateral_action.attributes->'legs')
            AS leg("event_id" varchar)
        ) AS leg ON true
        WHERE
            action.id = lateral_action.id AND
            leg.event_id IN (
                'e8a95b17-972c-4421-9090-8bf66c0804a9',
                '5fbef652-e772-4bae-84f5-31255fe2fadd',
                'dd92d86c-fe6d-43a7-8765-a3e1026145db'
            )
    );

This query seems to meet all of the goals that we were looking to achieve, so the next step was to take it and turn it into a composable query within Ecto.

Translating SQL query to an Ecto Query

Many parts of our query are pretty quick and easy to make into an Ecto.Query structure. We can begin with everything before the subquery. For this example, we can assume all of the values were passed in and the operators for each part of the expression are known:

def query(user_id, region, amount_cents, event_ids)
  from action in Promotions.UserAction,
    as: :user_action,
    where: action.user_id == ^user_id,
    where: action.region == ^region,
    where: fragment("(?->?)::integer", action.sportsbook_bet_attributes, "amount_cents_wagered") >= ^amount_cents
end

We also need to build a query around the jsonb_to_recordset data that we created in the lateral join seen earlier.

This was a bit challenging, as jsonb_to_recordset is a native function of PostgreSQL and there are no functions built into Ecto to work with it directly. It also requires us to define an AS ... ON ... set of clauses to use with it. Since Ecto will create its own AS alias with each "from" or "join", that presents challenges as well.

After some research on this issue on the elixir forums, we were pointed in the right direction to wrap the subquery in parentheses, so that we could use our own AS within it, and it would not conflict with the one Ecto adds on.

def sportsbook_bet_legs_query do
  from action in Promotions.UserAction,
    as: :leg_user_action,
    inner_lateral_join:
      legs in fragment(
        "(SELECT * FROM jsonb_to_recordset(?->'legs') AS record(event_id varchar)",
        field(action, :sportsbook_bet_attributes)
      )
    on: true
    as: :sportsbook_bet_legs
end

To utilize both queries together, we need to include the sportsbook_bet_legs_query function as a subquery within the first, and wrap it in a where not exists clause of the outer query.

Since the subquery uses a lateral join, we need to remember to join it back to the outer query using Ecto’s parent_as/1 function. Another example of this can be found in Ecto’s documentation for exists/1 as well (https://hexdocs.pm/ecto/Ecto.Query.API.html#exists/1)

def sportsbook_bet_query(user_id, region, amount_cents_wagered, event_ids) do
  from action in Promotions.UserAction,
    as: :action,
    where: action.user_id == ^user_id,
    where: action.region == ^region,
    where: fragment("(?->?)::integer", action.sportsbook_bet_attributes, "amount_cents_wagered") >= ^amount_cents_wagered,
    where:
      not exists(
        from [
               leg_user_action: action_subquery,
               sportsbook_bet_legs: sportsbook_bet_legs
             ] in event_ids_query(),
             where: parent_as(:action).id == action_subquery.id,
             where: sportsbook_bet_legs.event_id in ^event_ids
      )
end

With the function now complete, it can be called by simply taking the earlier example and passing in the arguments to the reusable function:

user_id = "fcca980b-8edc-4848-91cc-4e00ff47019c"
region = :can_ontario
amount_cents_wagered = 1_500
event_ids = [
  "e8a95b17-972c-4421-9090-8bf66c0804a9",
  "5fbef652-e772-4bae-84f5-31255fe2fadd",
  "dd92d86c-fe6d-43a7-8765-a3e1026145db"
]

ecto_query = sportsbook_bet_query(user_id, region, amount_cents_wagered, event_id)
matching_bets = Repo.all(ecto_query)

if not Enum.empty?(matching_bets) do
  # user qualified for promotion
end

As a reminder, one of the goals of our function was to find a "user action" record that did not contain any of the values of the list of event_id’s from the attributes of any of its array of "leg" data points. One benefit of structuring the query in the way that has been shown here is that if we want to flip it to find if one (or more) legs does include a particular event_id, we can simply remove the not from the not exists condition without any other modifications.

The queries that we need in our application are more robust, including changing up the operators that we use instead of statically choosing "equals", "greater than or equals" and "not in" as outlined in our example, but from this template we were able to build out a library of robust and composable queries to scope results based on attributes both in more primitive data types within our database schema, as well as JSONB fields too.

I've set up a small-scale Elixir application which can be used to quickly scaffold a database with records, as seen in the format throughout this blog post, so you can test the queries seen within this article and tweak them as needed to experiment. Github Repository: Ecto JSONB Example.