Adobe Analytics: Pulling Large Data Sets
Adobe Analytics is primarily designed to present complex processed reports to analysts quickly, in as close to real time as possible. To achieve this, our engineers made trade-offs that can make it harder for partners to use Adobe Analytics in ways that fall outside its primary design goals. For example, some of our partners want to export large data sets (1,000,000+ unique values) each day on a set schedule. (E.g., we have partners that want to upload a session ID to Adobe Analytics and then pull dozens of segments with those session IDs on a daily basis. For many Adobe Analytics customers, each of those segments will have 1,000,000+ unique values.) Below I’ll describe the various options for storing and pulling large data sets from Adobe Analytics and the limitations of each.
Storing the data
The two options that partners can use to store the data are eVars and Customer Attributes.
The variable that our partners generally use to store custom data is called an “eVar” or a “custom conversion variable.” These variables can take only a limited number of unique values before Adobe Analytics begins filtering them. For most customers, filtering begins when the variable hits 500,000 unique values in a given month, and more aggressive filtering kicks in at 1,000,000 unique values. (See this documentation for details on how the filtering works.) The unique values are limited to ensure that reports can run in a timely manner, and most clickstream data doesn’t require a higher number of unique values: few sites have, for example, a million different page names.
The unique value limit is only in place for reports in the UI and the reporting APIs. If you pull data from Data Warehouse or the raw Data Feeds, that limit will not be in place.
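To get a feel for how quickly a daily upload of unique IDs runs into these thresholds, here is a minimal sketch. The two threshold constants are the "most customers" figures quoted above; the function names and the steady-daily-volume assumption are mine, for illustration only, and are not part of any Adobe API.

```python
import math

# Thresholds quoted above for "most customers"; a given account may differ.
FILTERING_STARTS = 500_000
AGGRESSIVE_FILTERING_STARTS = 1_000_000


def evar_filtering_stage(unique_values_this_month: int) -> str:
    """Classify an eVar's monthly unique-value count against the thresholds."""
    if unique_values_this_month >= AGGRESSIVE_FILTERING_STARTS:
        return "aggressive filtering"
    if unique_values_this_month >= FILTERING_STARTS:
        return "filtering"
    return "no filtering"


def day_filtering_begins(new_unique_values_per_day: int) -> int:
    """Day of the month on which filtering starts, assuming a steady volume
    of brand-new unique values (such as session IDs) every day."""
    if new_unique_values_per_day <= 0:
        raise ValueError("daily volume must be positive")
    return math.ceil(FILTERING_STARTS / new_unique_values_per_day)


print(evar_filtering_stage(750_000))   # filtering
print(day_filtering_begins(40_000))    # 13
```

In this example a site that records 40,000 new session IDs a day would start seeing filtered values in UI and reporting API reports on day 13 of the month.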
Customer Attributes are variables meant for storing demographic data about customers. Unlike eVars, they are not clickstream-style variables: they are attached to a visitor ID, and each time you upload a new value, it overwrites the previous value. So in our example from before, a partner that wanted to upload session IDs would only be able to store one session ID per visitor (whereas in an eVar you can store as many sessions per visitor as you want).
However, unlike eVars, Customer Attributes don’t have a unique value limit. You can pull reports with over a million values via the normal reporting interface and APIs. On the other hand, Customer Attributes are not available in Data Warehouse.
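The difference in how the two variable types keep values can be pictured with a toy in-memory model. This is only an illustration of the overwrite-versus-accumulate behavior described above; the dictionaries, function names, and IDs are hypothetical and have nothing to do with how Adobe Analytics actually stores data.

```python
from collections import defaultdict

# Toy model only: not Adobe's storage layer or API.
evar_values = defaultdict(list)   # visitor ID -> every value recorded
customer_attribute = {}           # visitor ID -> most recent upload only


def record_evar(visitor_id: str, session_id: str) -> None:
    """Clickstream-style: each hit adds another value for the visitor."""
    evar_values[visitor_id].append(session_id)


def upload_attribute(visitor_id: str, session_id: str) -> None:
    """Attribute-style: a new upload replaces the previous value."""
    customer_attribute[visitor_id] = session_id


for session_id in ("s-001", "s-002", "s-003"):
    record_evar("visitor-42", session_id)
    upload_attribute("visitor-42", session_id)

print(evar_values["visitor-42"])         # ['s-001', 's-002', 's-003']
print(customer_attribute["visitor-42"])  # s-003
```

After three sessions the eVar-style store holds all three session IDs for the visitor, while the attribute-style store holds only the last one.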
Retrieving the data
There are three main ways of getting data out of Adobe Analytics: Reporting APIs, Data Warehouse APIs, and Data Feeds.
The Adobe Analytics 2.0 reporting APIs power Analysis Workspace, and you can generate some pretty sophisticated reports with them. They also tend to return results fairly quickly, usually in just a couple of seconds. However, they have some limitations that are important for partners trying to pull large amounts of data. First, the reports are paginated and you can only pull 50,000 line items per page. So if you want to pull a million rows, you’ll need to make at least 20 requests. Second, each Adobe Analytics org is rate limited. The default rate limit is 120 requests per minute. (The limit is enforced as 12 requests every 6 seconds.) When rate limiting is being enforced, you will get 429 HTTP response codes with the following response body: {"error_code":"429050","message":"Too many requests"} (See this page for more details.) Finally, if you’ve stored your data in an eVar, the maximum number of unique values you’ll be able to pull using this method is 1,000,000.
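The pagination and rate-limit numbers above translate into a simple pull loop. The following is a rough sketch, not a client for the real API: fetch_page is a hypothetical function you would supply to make one report request and return the HTTP status plus the rows for that page, and the backoff schedule is my own assumption rather than anything Adobe prescribes.

```python
import math
import time
from typing import Callable, Dict, List, Tuple

PAGE_LIMIT = 50_000      # max line items per page (from the text above)
MIN_INTERVAL = 6 / 12    # 12 requests per 6 seconds -> 0.5 s between calls


def pages_needed(total_rows: int, page_limit: int = PAGE_LIMIT) -> int:
    """Smallest number of paginated requests that covers total_rows."""
    return math.ceil(total_rows / page_limit)


def fetch_all(
    fetch_page: Callable[[int], Tuple[int, List[Dict]]],
    total_rows: int,
    sleep: Callable[[float], None] = time.sleep,
    max_retries: int = 5,
) -> List[Dict]:
    """Pull every page, pacing calls and backing off on HTTP 429.

    fetch_page(page_number) is a hypothetical caller-supplied function that
    performs one report request and returns (http_status, rows).
    """
    rows: List[Dict] = []
    for page in range(pages_needed(total_rows)):
        for attempt in range(max_retries + 1):
            status, page_rows = fetch_page(page)
            if status == 200:
                rows.extend(page_rows)
                sleep(MIN_INTERVAL)  # stay under 12 requests per 6 seconds
                break
            if status == 429:
                # Rate limited: wait longer each time (assumed schedule).
                sleep(MIN_INTERVAL * 2 ** (attempt + 1))
                continue
            raise RuntimeError(f"page {page}: unexpected HTTP status {status}")
        else:
            raise RuntimeError(f"page {page}: still rate limited after retries")
    return rows


print(pages_needed(1_000_000))  # 20
```

Even at the full default rate, 20 pages take about 10 seconds of pacing alone, and the rate limit is shared across the whole org, so other integrations pulling reports at the same time will reduce the budget available to yours.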
Data Warehouse is the current tool designed for pulling large datasets out of Adobe Analytics. This product documentation describes how Data Warehouse reports work. There are a couple of limitations for our partners to note:
- Not all variables are compatible with Data Warehouse. See this list for compatible and incompatible variables.
- Not all segment types are compatible with Data Warehouse. See this list of compatible and incompatible segments.
- Data Warehouse reports can be recurring but cannot be scheduled to send at a specific time. They are queued and sent when ready.
- Data Warehouse is not compatible with the Adobe Analytics 2.0 APIs or the new Adobe Analytics backend engine, and there are currently no plans to make it compatible. This is because the Adobe Experience Platform is expected to take over the functionality of the Adobe Analytics Data Warehouse. So integrations built using the Data Warehouse APIs may need to be rewritten to use the Platform in a couple of years.
Partners can use either the 1.3 or 1.4 APIs to pull Data Warehouse reports. The way they pull the reports is different: 1.3 works a lot like the Data Warehouse UI does, whereas 1.4 makes Data Warehouse requests another report type in the general 1.4 reporting API and treats the reports like a hybrid of the Reports & Analytics paginated reports and the normal Data Warehouse reports. You’ll probably want to do some testing to see which works better for your use case.
Data Feeds are for partners that want to pull more raw, hit-level data from Adobe Analytics. They are not processed in the same way that Adobe Analytics reports or Data Warehouse files are. You can’t use segments at all or view visit- or visitor-level data. The Data Feeds API is also fairly limited in what it allows partners to do: you can’t find a list of processed jobs or see the error logs without using the UI. Because of this, Data Feeds don’t really work for most of our partners’ use cases. But if you think it might work for yours, check out the Data Feeds documentation.
In the future, these types of use cases will be handled by the Adobe Experience Platform. Querying large data sets and moving large amounts of data into and out of the Platform are among the fundamental features it was built to handle. Ideally, all of our customers’ data, including Adobe Analytics data, will eventually live in the Platform, so the limitations I’ve discussed in this article should no longer be an issue.