Amazon S3
Sync Expectations
The following table describes the sync behavior for Fullstory events delivered to Amazon S3 as parquet files.
Sync Interval | Relevant Timestamps | Setup Guide |
---|---|---|
Approximately processed_time rounded up to the next hour + 1 hour | event_time, ingested_time, processed_time, updated_time | Help Doc |
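For example, an event processed at 10:20 UTC would typically land in S3 by about 12:00 UTC. Below is a minimal freshness check, assuming the table_name external table created under Querying as an External Table later on this page, with recent partitions already added; the partition value is a placeholder:
-- Find the most recent sync among the queried partitions.
-- processed_time is an ISO 8601 string, so MAX compares chronologically.
SELECT MAX(processed_time) AS latest_sync
FROM table_name
WHERE ingested_time >= '2024-01-01 00:00:00';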
Events Parquet File Schema
The following table describes the schema for the parquet files containing Fullstory events.
Field | Type | Nullable | Description |
---|---|---|---|
event_id | string | N | The unique identifier for the event. |
event_time | string | Y | The time in UTC that the event occurred, formatted as yyyy-MM-ddTHH:mm:ss.SSSSSSZ. |
processed_time | string | N | The time in UTC that the event was packaged into the parquet file, formatted as yyyy-MM-ddTHH:mm:ss.SSSSSSZ. |
updated_time | long | Y | If set, the time when the parquet file was recreated, represented as the number of milliseconds since the epoch. |
device_id | long | N | The device ID as defined in the base event model. |
session_id | long | Y | The session ID as defined in the base event model. |
view_id | long | Y | The view ID as defined in the base event model. |
event_type | string | N | The type of this event. |
event_properties | string | N | A JSON string containing the associated event properties. |
source_type | string | N | The source type of the event. |
source_properties | string | N | A JSON string containing the associated source properties. |
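Because event_properties and source_properties are stored as JSON strings, they are usually unpacked with JSON functions at query time. A short sketch, assuming the table_name external table defined below; '$.url.path' mirrors the source property used in the example query at the end of this page, and the partition value is a placeholder:
-- Sample a few rows and pull one field out of the JSON source properties.
SELECT
  event_type,
  json_extract_scalar(source_properties, '$.url.path') AS path
FROM table_name
WHERE ingested_time = '2024-01-01 01:00:00'
LIMIT 10;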
Events Parquet File Path
The path follows the Hive partitioning layout (key=value, supported by BigQuery, Athena, etc.) and contains an ingested_time column:
fullstory_<org_id>/events/ingested_time=yyyy-MM-dd HH:mm:ss/<file_name>.parquet
The ingested_time column is the formatted time in UTC that the events were ingested by Fullstory's servers, truncated to seconds.
Note that the ingested time does not bound the range of event times in a parquet file; use a wider ingested_time range to make sure the target event time range is included in a query.
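To see how event times actually spread across ingested_time partitions, you can compare the two directly. A sketch, assuming the table_name external table defined in the next section; since event_time is an ISO 8601 string, MIN and MAX compare chronologically:
-- Show the earliest and latest event time landing in each partition.
SELECT
  ingested_time,
  MIN(event_time) AS earliest_event,
  MAX(event_time) AS latest_event
FROM table_name
WHERE ingested_time BETWEEN '2024-01-01 00:00:00' AND '2024-01-02 00:00:00'
GROUP BY 1
ORDER BY 1;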
Querying as an External Table
The parquet files containing events can be queried as an external table. For example, an Athena external table can be created using:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
  event_id string,
  event_time string,
  processed_time string,
  updated_time bigint,
  device_id bigint,
  session_id bigint,
  view_id bigint,
  event_type string,
  event_properties string,
  source_type string,
  source_properties string)
PARTITIONED BY (ingested_time string)
STORED AS PARQUET
LOCATION 's3://bucket/prefix/fullstory_<org_id>/events/'
TBLPROPERTIES ("parquet.compress"="ZSTD");
Before querying, add the partitions (refer to the Athena documentation for more ways to add partitions):
ALTER TABLE table_name ADD PARTITION (ingested_time='2024-01-01 01:00:00')
LOCATION 's3://bucket/prefix/fullstory_<org_id>/events/ingested_time=2024-01-01 01:00:00/';
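Athena also accepts several partitions in a single ALTER TABLE statement. A sketch, assuming hourly partition values; the actual values must match the ingested_time prefixes present in your bucket:
-- Register two partitions at once instead of one statement each.
ALTER TABLE table_name ADD
PARTITION (ingested_time='2024-01-01 01:00:00')
LOCATION 's3://bucket/prefix/fullstory_<org_id>/events/ingested_time=2024-01-01 01:00:00/'
PARTITION (ingested_time='2024-01-01 02:00:00')
LOCATION 's3://bucket/prefix/fullstory_<org_id>/events/ingested_time=2024-01-01 02:00:00/';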
The following example shows how to count rage clicks broken down by browser and URL path for a single day. Note that the ingested_time filter spans a wider range than the target event day, per the note above.
SELECT
  json_extract_scalar(source_properties, '$.user_agent.browser') AS browser,
  json_extract_scalar(source_properties, '$.url.path') AS path,
  COUNT(1) AS rage_clicks
FROM table_name
WHERE
  ingested_time BETWEEN '2024-01-01 00:00:00' AND '2024-01-03 00:00:00'
  AND event_time >= '2024-01-01T00:00:00.000000Z'
  AND event_time < '2024-01-02T00:00:00.000000Z'
  AND event_type = 'click'
  AND CAST(json_extract_scalar(event_properties, '$.fs_rage_count') AS INTEGER) > 0
GROUP BY
  1,
  2
ORDER BY
  rage_clicks DESC;
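A variation that buckets the same rage clicks by calendar day, parsing the ISO 8601 event_time string with from_iso8601_timestamp. This is a sketch; the week-long date range is a placeholder, and the ingested_time filter is again wider than the event_time window:
-- Count rage clicks per day over one week.
SELECT
  date(from_iso8601_timestamp(event_time)) AS day,
  COUNT(1) AS rage_clicks
FROM table_name
WHERE
  ingested_time BETWEEN '2024-01-01 00:00:00' AND '2024-01-09 00:00:00'
  AND event_time >= '2024-01-01T00:00:00.000000Z'
  AND event_time < '2024-01-08T00:00:00.000000Z'
  AND event_type = 'click'
  AND CAST(json_extract_scalar(event_properties, '$.fs_rage_count') AS INTEGER) > 0
GROUP BY 1
ORDER BY 1;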