Anywhere: Warehouse
Anywhere: Warehouse integrates with your data infrastructure and gives you flexible access to Fullstory's user experience data. It offers two primary methods for accessing your data:
- Raw Data: normalized event data without transformations.
- Ready to Analyze Views: pre-transformed data ready for analysis.
This guide gives an overview of the two data models available in Fullstory's Anywhere: Warehouse offering, Raw Data and Ready to Analyze Views, and compares them.
Destinations
The following destinations are available for Warehouse:
Destination | Raw Data | Ready to Analyze Views |
---|---|---|
Amazon Redshift | | Amazon Redshift Views |
Amazon S3 | Amazon S3 Raw Data | |
Azure Blob Storage | Azure Blob Storage Raw Data | |
BigQuery | | BigQuery Views |
Google Cloud Storage | Google Cloud Storage Raw Data | |
Snowflake | | Snowflake Views |
For detailed instructions on configuring a destination, see the documentation for that destination listed above.
Comparing Raw Data and Ready to Analyze Views
Aspect | Raw Data | Ready to Analyze Views |
---|---|---|
Data Model | Single raw file format for events plus defined objects | Organized into 30+ Fact, Dimension, and Sub-dimension tables |
Query Cost | More expensive to query | Less expensive to query due to pre-transformation |
Readability | Less readable, requires more transformation | More readable with neat folders of core event and user data |
Data Duplication | Possibility of duplicate events (see below) | Built-in de-duplication |
Transformation Required | Extensive transformation needed by data engineers | Pre-transformed tables ready for analysis |
Use Case Focus | Raw data access | Business intelligence and analytics |
Sync Expectations
Anywhere: Warehouse syncs with destinations on an hourly interval. See each destination's documentation page for details.
Important Timestamps
There are three timestamp fields that are relevant for destinations: event_time, processed_time, and updated_time.
- event_time is an immutable field that records the timestamp of each event according to the user's device.
- processed_time is a timestamp field indicating when the event was processed by Fullstory's servers. On average, 95% of events are captured and processed within 20 minutes of the original event time. Some events, including server-side events, may reach Fullstory's servers much later than the original event time. Depending on the contents of these late events, Fullstory may need to reprocess them to report the most accurate metrics, such as session length and page active time. In these scenarios, the processed_time for all events in the session or on the page is updated and the events are re-synced to the warehouse.
- updated_time indicates when a record was last modified (inserted or updated) and is populated by a built-in function in the warehouse during the sync. It tracks the last time a record in your warehouse changed and can be used as a filter to determine which new records to pull into your query.
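As an illustration of using updated_time as a filter, the following is a minimal Python sketch of a watermark-based incremental pull. It works on in-memory rows rather than a real warehouse connection; the function name pull_new_records, the event_id field, and the sample rows are hypothetical and are not part of Fullstory's schema or API.

```python
from datetime import datetime, timezone


def pull_new_records(records, last_watermark):
    """Return the records changed after last_watermark and the new watermark.

    records: iterable of dicts that each carry an "updated_time" datetime.
    last_watermark: the highest updated_time seen by the previous pull.
    """
    fresh = [r for r in records if r["updated_time"] > last_watermark]
    # If nothing changed, keep the old watermark so the next pull starts there.
    new_watermark = max((r["updated_time"] for r in fresh), default=last_watermark)
    return fresh, new_watermark


# Hypothetical rows standing in for a warehouse table.
rows = [
    {"event_id": "a", "updated_time": datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)},
    {"event_id": "b", "updated_time": datetime(2024, 1, 1, 11, 5, tzinfo=timezone.utc)},
]

previous_pull = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)
fresh, watermark = pull_new_records(rows, previous_pull)
print([r["event_id"] for r in fresh], watermark.isoformat())
```

Because reprocessed events are re-synced, a record that was already pulled can show up again with a later updated_time, so downstream logic should treat a pulled record as an upsert rather than a pure insert.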
Sync Latency
Sync latency describes the cadence at which Fullstory syncs new events to the destination.
Captured events flow into Fullstory continuously and are then processed on a defined cadence before being sent to the destination.
The interval is dependent on processed_time, which indicates when our servers processed the event for the particular destination.
For example, when syncing to Snowflake, the sync latency is approximately processed_time rounded to the next hour + 1 hour.
This is because Fullstory writes data to a file based on the processed_time, with each file containing events that were processed in a given hour. The file is then merged into the database table within the next hour. This timing is approximate because syncing to the destination depends on how large the file is and how much compute is available to run the merge.
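The Snowflake rule above can be sketched as a small Python calculation. This is only an estimate under stated assumptions: the function name estimated_sync_time is hypothetical, and an event processed exactly on the hour is assumed to belong to the file for the hour that is just starting.

```python
from datetime import datetime, timedelta, timezone


def estimated_sync_time(processed_time: datetime) -> datetime:
    """Estimate when an event should be visible in the destination table."""
    # Start of the hourly file window that contains processed_time.
    hour_start = processed_time.replace(minute=0, second=0, microsecond=0)
    # The file closes at the next hour boundary ...
    file_closed = hour_start + timedelta(hours=1)
    # ... and is merged into the table within the following hour.
    return file_closed + timedelta(hours=1)


processed = datetime(2024, 1, 1, 10, 17, tzinfo=timezone.utc)
print(estimated_sync_time(processed).isoformat())  # 2024-01-01T12:00:00+00:00
```

The result is an approximate upper bound, not a guarantee, since the actual merge time depends on file size and available compute.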
Duplicate Events
We guarantee at-least-once delivery for each event. For Raw Data destinations, a small share of events (less than 1%) may be duplicated in your warehouse.
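For Raw Data consumers, one possible way to handle duplicates is to keep a single record per event and prefer the most recently processed copy. The Python sketch below assumes each event carries a unique identifier, here called event_id; that field name and the sample events are hypothetical, so substitute whatever uniquely identifies an event in your export.

```python
from datetime import datetime, timezone


def deduplicate(events, key="event_id"):
    """Keep one record per key, preferring the latest processed_time."""
    latest = {}
    for event in events:
        k = event[key]
        # Keep the first copy seen, or replace it with a more recently processed one.
        if k not in latest or event["processed_time"] > latest[k]["processed_time"]:
            latest[k] = event
    return list(latest.values())


# Hypothetical raw events; "a" was delivered twice.
raw_events = [
    {"event_id": "a", "processed_time": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)},
    {"event_id": "b", "processed_time": datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc)},
    {"event_id": "a", "processed_time": datetime(2024, 1, 1, 10, 45, tzinfo=timezone.utc)},
]

unique_events = deduplicate(raw_events)
print(len(unique_events))  # 2
```

Preferring the latest processed_time also covers reprocessed sessions, where the newer copy of an event reflects the updated metrics.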