PostgreSQL Queries on JSONB Fields with Ecto
The Promotions team at theScore is responsible for enabling dynamic marketing campaigns within theScore Bet app. Each promotional campaign allows us to specify an audience of users who will be available to participate in the campaign, by comparing on past actions that a given user has taken against a set of dynamic rules. For example, when creating a new promotion, we can specify that its audience can only include patrons who have previously deposited a specific amount of money, or placed a specific quantity of bets.
Disclaimer: The specific scenario, and format of the data shown in the examples throughout, is contrived entirely for the purposes of this article.
An overview of the PostgreSQL schema
The Promotions service depends on the ability to read past events placed by a patron. These events are received asynchronously from other internal services, via Kafka, and then stored internally in the Promotions database as a generic "user action". These actions are used for a variety of purposes within our Promotions service, but in this case, we are exploring looking at the past actions of a given user to see if they fulfill the audience criteria of a given promotional campaign. The table which they get stored within has been created with a migration that resembled the code sample below.
defmodule Promotions.Repo.Migrations.CreateUserActionsTable do
use Ecto.Migration
def change do
create table(:user_actions) do
add :action_id, :string, null: false
add :action_type, :action_type, null: false
add :occurred_at, :utc_datetime_usec, null: false
add :region, :region, null: false
add :user_id, :binary_id, null: false
add :attributes, :jsonb, null: false
timestamps()
end
end
end
The corresponding Ecto schema for the "user_actions" table looks like the following.
defmodule Promotions.UserAction do
use Ecto.Schema
alias Promotions.UserAction.CasinoWagerAttributes
alias Promotions.UserAction.DepositAttributes
alias Promotions.UserAction.RegistrationAttributes
alias Promotions.UserAction.SportsbookBetAttributes
schema "user_actions" do
field :action_id, :string
field :action_type, Ecto.Enum, values: [:registration, :deposit, :sportsbook_bet, :casino_wager]
field :occurred_at, :utc_datetime_usec
field :region, Ecto.Enum, values: [:usa_new_jersey, :usa_colorado, :usa_indiana, :usa_iowa, :can_ontario]
field :user_id, Ecto.UUID
embeds_one :casino_wager_attributes, CasinoWagerAttributes, source: :attributes
embeds_one :deposit_attributes, DepositAttributes, source: :attributes
embeds_one :registration_attributes, RegistrationAttributes, source: :attributes
embeds_one :sportsbook_bet_attributes, SportsbookBetAttributes, source: :attributes
timestamps()
end
end
Each of the various embedded schemas to populate the attributes
field is represented in a separate Ecto schema of its own. For the purposes of this article, only a truncated sample of the SportsbookBetAttributes
embedded schema will be shown below as well.
defmodule Promotions.UserAction.SportsbookBetAttributes do
use Ecto.Schema
@primary_key false
embedded_schema do
field :amount_cents_wagered, :integer
field :amount_cents_payout, :integer
field :bet_type, Ecto.Enum, values: [:straight, :parlay]
field :odds, :integer
embeds_many :legs, SportsbookBetLegAttributes
end
end
defmodule Promotions.UserAction.SportsbookBetLegAttributes do
use Ecto.Schema
@primary_key false
embedded_schema do
field :event_id, :string
field :event_type, :string
field :market_id, :string
field :market_selection_id, :string
field :market_type, :string
field :in_play, :boolean
field :odds, :integer
end
end
Example Criteria to Turn into Database Queries
The audience criteria of a promotional campaign can be based on any type of action that we record (ie. deposit, sportsbook bet, etc) and the various attributes of the action. The following are examples of such an audience rule.
- The patron registered their account in January 2022
- The patron has placed at least 3 deposits from within New Jersey
- The patron placed at least one sportsbook bet on the last years Super Bowl
For the first two examples, the data is stored within more fields that are of a more primitive data type within the user_actions
table of the database, and are standard to query. For the last example, it requires querying the JSONB attributes
field to determine if any of the "legs" on any of the users previous sportsbook bets contain the event_id
associated with last years Super Bowl.
Similarly, the audience criteria also needs to support scenarios where "The patron has placed a sportsbook bet that was not on last years Super Bowl" as well. This would require looking for a sportsbook bet that does not have an event_id
corresponding to the Super Bowl for the previous year in any of its legs.
The queries we need to write need to work for both "at least one leg of a sportsbook contains" and "none of the legs of a sportsbook bet can contain" scenarios. To do so, we need to investigate how to make precise queries on a set of data within a JSONB field in PostgreSQL, and subsequently how we can turn those queries into composable Ecto queries within our Elixir application as well.
Notably, we want to be able to support operators of "in" and "not in" when querying for data, as opposed to operators such as "equals", "not equals", "greater than or equals", etc.
Query of a JSONB Field
The first results our team came across while looking into this was using PostgreSQL native functions and operators. We quickly realized that these would not fit all of our use-cases. It would be challenging to turn it into a reusable Ecto query that would work for a variety of fields and would be an inelegant solution for deeply nested data.
The next option we came across was to use the function jsonb_to_recordset
to create a queryable schema of records from an array within a JSONB field (ie. the embeds_many :legs
section of our bet attributes).
In the example below, we’ll try to find if the user has at least one bet with the following attributes.
user_id: "fcca980b-8edc-4848-91cc-4e00ff47019c"
region: :can_ontario
amount_cents_wagered: at least 1500 (ie. $15)
event_id: (not in any of the following)
- "e8a95b17-972c-4421-9090-8bf66c0804a9"
- "5fbef652-e772-4bae-84f5-31255fe2fadd"
- "dd92d86c-fe6d-43a7-8765-a3e1026145db"
Using jsonb_to_recordset
with SQL
The first step to transforming this into a usable query within our Elixir app was to try to get it working with a query made directly against the database to validate that it is the right tool for the job.
After some initial research, we composed the following SQL query:
SELECT
public.user_actions AS action,
jsonb_to_recordset(action.attributes->'legs') AS leg("event_id" varchar)
WHERE
action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
action.region = 'ontario' AND
(action.attributes->'amount_cents_wagered')::integer >= 1500 AND
leg.event_id NOT IN (
'e8a95b17-972c-4421-9090-8bf66c0804a9',
'5fbef652-e772-4bae-84f5-31255fe2fadd',
'dd92d86c-fe6d-43a7-8765-a3e1026145db'
);
While this allows us to use the operators we want (ie. >=
and NOT IN
) on data within the "attributes" JSONB field, and gets us the exact records we want, it would be a challenge to try to make it fit into an Ecto query since Ecto queries only allow for one entity to be in the from
clause of a given query. For example, we are permitted to begin a query with from UserAction
, but are unable to create a query that selects from two data sources such as from UserAction, fragment("jsonb_to_recordset(...)")
. If there is a way around that, we were unable to find a solution.
The next option we looked to was modifying the first query by utilizing a lateral-join to try to make a "user action" such that it has a one-to-many relationship with its own "legs" nested within the JSONB "attributes" field. By doing so, we eliminate the need to use multiple clauses in the FROM
section of the query, which helps gets us closer to a valid Ecto query translation. The query then looks like this:
SELECT
action.*
FROM
user_actions AS action
INNER JOIN LATERAL (
SELECT *
FROM jsonb_to_recordset(action.attributes->'legs')
AS leg("event_id" varchar)
) AS leg ON true
WHERE
action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
action.region = 'can_ontario' AND
(action.attributes->'amount_cents_wagered')::integer >= 1500 AND
leg.event_id NOT IN (
'e8a95b17-972c-4421-9090-8bf66c0804a9',
'5fbef652-e772-4bae-84f5-31255fe2fadd',
'dd92d86c-fe6d-43a7-8765-a3e1026145db'
)
The problem we quickly saw with this query is that will look to see if any of the "legs" of a specific bet adhere to the condition. Instead, we want to make sure that the all of the legs within the bet are not on those listed events (ie. cannot have any of the specified "event id" values). As such, we found that putting the inner join in a subquery helped to achieve the goal, wrapping it in a "NOT EXISTS" condition.
SELECT
action.*
FROM
user_actions AS action
WHERE
action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
action.region = 'ontario' AND
(action.attributes->'amount_cents_wagered')::integer >= 1500 AND
NOT EXISTS (
SELECT
*
FROM
user_actions AS lateral_action
INNER JOIN LATERAL (
SELECT *
FROM jsonb_to_recordset(lateral_action.attributes->'legs')
AS leg("event_id" varchar)
) AS leg ON true
WHERE
action.id = lateral_action.id AND
leg.event_id IN (
'e8a95b17-972c-4421-9090-8bf66c0804a9',
'5fbef652-e772-4bae-84f5-31255fe2fadd',
'dd92d86c-fe6d-43a7-8765-a3e1026145db'
)
);
This query seems to meet all of the goals that we were looking to achieve, so the next step was to take it and turn it into a composable query within Ecto.
Translating SQL query to an Ecto Query
Many parts of our query are pretty quick and easy to make into an Ecto.Query
structure. We can begin with everything before the subquery. For this example, we can assume all of the values were passed in and the operators for each part of the expression are known:
def query(user_id, region, amount_cents, event_ids)
from action in Promotions.UserAction,
as: :user_action,
where: action.user_id == ^user_id,
where: action.region == ^region,
where: fragment("(?->?)::integer", action.sportsbook_bet_attributes, "amount_cents_wagered") >= ^amount_cents
end
We also need to build a query around the jsonb_to_recordset
data that we created in the lateral join seen earlier.
This was a bit challenging, as jsonb_to_recordset
is a native function of PostgreSQL and there are no functions built into Ecto to work with it directly. It also requires us to define an AS ... ON ...
set of clauses to use with it. Since Ecto will create its own AS
alias with each "from" or "join", that presents challenges as well.
After some research on this issue on the elixir forums, we were pointed in the right direction to wrap the subquery in parentheses, so that we could use our own AS
within it, and it would not conflict with the one Ecto adds on.
def sportsbook_bet_legs_query do
from action in Promotions.UserAction,
as: :leg_user_action,
inner_lateral_join:
legs in fragment(
"(SELECT * FROM jsonb_to_recordset(?->'legs') AS record(event_id varchar)",
field(action, :sportsbook_bet_attributes)
)
on: true
as: :sportsbook_bet_legs
end
To utilize both queries together, we need to include the sportsbook_bet_legs_query
function as a subquery within the first, and wrap it in a where not exists
clause of the outer query.
Since the subquery uses a lateral join, we need to remember to join it back to the outer query using Ecto’s parent_as/1
function. Another example of this can be found in Ecto’s documentation for exists/1
as well (https://hexdocs.pm/ecto/Ecto.Query.API.html#exists/1)
def sportsbook_bet_query(user_id, region, amount_cents_wagered, event_ids) do
from action in Promotions.UserAction,
as: :action,
where: action.user_id == ^user_id,
where: action.region == ^region,
where: fragment("(?->?)::integer", action.sportsbook_bet_attributes, "amount_cents_wagered") >= ^amount_cents_wagered,
where:
not exists(
from [
leg_user_action: action_subquery,
sportsbook_bet_legs: sportsbook_bet_legs
] in event_ids_query(),
where: parent_as(:action).id == action_subquery.id,
where: sportsbook_bet_legs.event_id in ^event_ids
)
end
With the function now complete, it can be called by simply taking the earlier example and passing in the arguments to the reusable function:
user_id = "fcca980b-8edc-4848-91cc-4e00ff47019c"
region = :can_ontario
amount_cents_wagered = 1_500
event_ids = [
"e8a95b17-972c-4421-9090-8bf66c0804a9",
"5fbef652-e772-4bae-84f5-31255fe2fadd",
"dd92d86c-fe6d-43a7-8765-a3e1026145db"
]
ecto_query = sportsbook_bet_query(user_id, region, amount_cents_wagered, event_id)
matching_bets = Repo.all(ecto_query)
if not Enum.empty?(matching_bets) do
# user qualified for promotion
end
As a reminder, one of the goals of our function was to find a "user action" record that did not contain any of the values of the list of event_id
’s from the attributes of any of its array of "leg" data points. One benefit of structuring the query in the way that has been shown here is that if we want to flip it to find if one (or more) legs does include a particular event_id
, we can simply remove the not
from the not exists
condition without any other modifications.
The queries that we need in our application are more robust, including changing up the operators that we use instead of statically choosing "equals", "greater than or equals" and "not in" as outlined in our example, but from this template we were able to build out a library of robust and composable queries to scope results based on attributes both in more primitive data types within our database schema, as well as JSONB fields too.
I've set up a small-scale Elixir application which can be used to quickly scaffold a database with records, as seen in the format throughout this blog post, so you can test the queries seen within this article and tweak them as needed to experiment. Github Repository: Ecto JSONB Example.