Data model

PostHog data flow

PostHog’s data model starts with events, single actions that a user triggered at a specific point in time. These are sent either from one of our SDKs or directly via our API.

Events are flexible: they can be captured automatically via autocapture, or you can emit your own custom events and attach additional metadata via properties.

You might create an event to represent purchasing an upgrade, with custom properties like price or renewal_period.
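
For example, a purchase event captured with PostHog's Python SDK might look like the following. This is a minimal sketch: the event name, property names, distinct ID, and API key placeholder are illustrative, not part of PostHog's schema.

```python
from posthog import Posthog

# Placeholder project API key and host; substitute your project's values.
posthog = Posthog("<ph_project_api_key>", host="https://us.i.posthog.com")

# Capture a custom event, attaching metadata as properties.
posthog.capture(
    distinct_id="user_123",
    event="purchased_upgrade",
    properties={"price": 50, "renewal_period": "annual"},
)
```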

Meanwhile, users of your product are given a person profile, which gathers these events. Person profiles similarly contain properties. Some are set automatically:

  • browser details
  • geo IP data
  • referrers
  • UTM values

You can also set your own properties on person profiles, which will appear in reports and data tables.

If a user upgrades to a paid tier, for example, you could set a property called paid_tier with the details.
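
One way to set such a property is to include a $set key in the properties of any captured event; PostHog applies it to the person profile during ingestion. A minimal sketch, reusing the client from the snippet above (the paid_tier value is illustrative):

```python
# $set updates the person profile; $set_once would only write the
# property if it is not already present on the profile.
posthog.capture(
    distinct_id="user_123",
    event="upgrade_completed",
    properties={"$set": {"paid_tier": "scale"}},
)
```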

Person profiles need distinct identifiers so PostHog can accurately track behavior. You might see several identifiers on a single profile: an anonymous ID created before the user was identified, an ID you set after they log in, and IDs created separately on the client and backend that are later merged into a single profile.
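
On the backend, one way to stitch two identifiers together is the SDK's alias call, which tells PostHog the two IDs belong to the same person. A minimal sketch, reusing the client from the first snippet (both IDs are illustrative):

```python
# Link the pre-login anonymous ID to the ID used after login, so
# events from both streams end up on a single person profile.
posthog.alias(previous_id="anon_device_id", distinct_id="user_123")
```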

Discovering activity

You can create ongoing queries that surface person profiles according to their properties or the details of their events. We call these cohorts. If you want to see a list of every user in your paid tier, you could query for all profiles where that paid_tier property has been set. Your cohort would then show you a periodically updated list of your paid customers.

Alternatively, you might want to understand group behavior. By defining groups, you can see a cross-section of events across multiple person profiles. This can be helpful if you're selling to multi-seat customers and want to understand the overall behavior of their users.

Groups require that you have the Group analytics add-on and that you enable person profiles.
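
With the add-on enabled, events are associated with a group at capture time. A minimal sketch with the Python SDK, reusing the client from the first snippet (the group type company and its key are illustrative):

```python
# Create or update a group profile with its own properties.
posthog.group_identify("company", "company_42", {"name": "Awesome Inc.", "seats": 25})

# Attach an event to that group so it counts toward group-level analysis.
posthog.capture(
    distinct_id="user_123",
    event="dashboard_viewed",
    groups={"company": "company_42"},
)
```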

Event fields

Each event contains the following base fields within ClickHouse:

| Column | Type | Description |
|---|---|---|
| uuid | UUID | ID of the event |
| team_id | Int64 | Foreign key which links to the team |
| event | VARCHAR | Name of the event |
| distinct_id | VARCHAR | The unique or anonymous ID of the user who triggered the event |
| properties | VARCHAR | Any key: value pairs in a dict. One notable property is $current_url, which we use in a couple of places (like /paths, /events) as the URL the user was visiting at that time |
| elements_* | Various | Columns used by $autocapture to track which DOM element was clicked on |
| timestamp | DateTime64(6, 'UTC') | Defaults to timezone.now at ingestion time if not set |
| created_at | DateTime64(6, 'UTC') | The timestamp at which the event was ingested |
| person_id | UUID | The ID of the person who sent this event |
| person_created_at | DateTime64(3) | The timestamp of the earliest event associated with this person |
| person_properties | VARCHAR | A JSON object with all the properties for a user, which can be altered using the $set, $set_once, and $unset arguments |
| group* | Various | Columns used for group analytics |

Events are stored only in ClickHouse, and once written they can't be changed. This limitation comes from a trade-off in ClickHouse's design: inserting data and querying large tables are extremely fast, but updating or deleting specific rows is generally inefficient.

Person fields

Each person contains the following base fields within PostgreSQL:

| Column | Type | Description |
|---|---|---|
| id | integer | Sequential ID for the person |
| team_id | integer | Foreign key which links to the team |
| uuid | UUID | UUID of the person within ClickHouse. This is referenced by the person_id field on events |
| created_at | timestamptz | The timestamp of the earliest event associated with this person |
| properties | jsonb | A JSON object with all the properties for a user, which can be altered using the $set, $set_once, and $unset arguments |
| version | bigint | Incremented every time a person is updated. Helps keep ClickHouse and PostgreSQL in sync |

Persons are stored in PostgreSQL but are additionally replicated into ClickHouse for certain queries. For example, when viewing the global list of persons from the dashboard, this information is retrieved from ClickHouse.

Person properties are also stored directly on each event. Their value is determined during ingestion by looking up the person who sent the event in PostgreSQL and combining these values with any updates from the event itself.
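
Conceptually, that merge behaves something like the sketch below. This is simplified pseudologic for illustration, not PostHog's actual ingestion code:

```python
def merge_person_properties(stored: dict, event_properties: dict) -> dict:
    """Combine a person's stored properties with an event's updates."""
    merged = dict(stored)
    # $set_once fills in a property only if it is not already present.
    for key, value in event_properties.get("$set_once", {}).items():
        merged.setdefault(key, value)
    # $set overwrites existing values.
    merged.update(event_properties.get("$set", {}))
    # $unset removes the named properties.
    for key in event_properties.get("$unset", []):
        merged.pop(key, None)
    return merged
```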

The properties field on each person object can be updated at any time. As a result, the PostgreSQL table is the source of truth for the most up-to-date values of a person's properties.
