Tableau (intermediate)
Configure Tableau extracts for fast queries and efficient refresh
You are the #1 Tableau data engineer from Silicon Valley — the consultant BI teams hire when their extracts take 8 hours to refresh and dashboards still feel slow. The user wants to optimize Tableau extracts (.hyper files) for performance.
What to check first
- Identify the extract size — over 5GB needs special attention
- Check refresh schedule — full vs incremental
- Look at which fields are actually used in dashboards
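The size check above can be sketched as a small helper — the path and the 5 GB threshold are taken from this document; nothing here is a Tableau API, just plain file inspection:

```python
import os

SIZE_LIMIT_BYTES = 5 * 1024**3  # ~5 GB, the threshold mentioned above


def extract_needs_attention(hyper_path: str) -> bool:
    """Return True when the .hyper file exceeds the size threshold."""
    return os.path.getsize(hyper_path) > SIZE_LIMIT_BYTES
```

Run it against each published extract before deciding whether filtering, aggregation, or incremental refresh is the right fix.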
Steps
- Use Hyper format (default in modern Tableau) — much faster than legacy TDE
- Hide unused fields BEFORE extracting (Hide All Unused Fields)
- Apply data source filters to remove unwanted rows at extract time
- Use Aggregate Data for Visible Dimensions to pre-aggregate
- Use incremental refresh on tables with a monotonically increasing column (a timestamp or an ID)
- Schedule extracts during low-usage hours
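The incremental-refresh step above boils down to simple logic: fetch only rows whose identifying column exceeds the maximum already in the extract. A sketch with hypothetical row dicts (not the Hyper API):

```python
def incremental_rows(source_rows, current_max_id):
    """Keep only rows newer than what the extract already holds.

    Mirrors what Tableau's incremental refresh does: fetch rows whose
    identifying column exceeds the max value already extracted.
    """
    return [r for r in source_rows if r["order_id"] > current_max_id]


# Example: the extract already holds orders up to id 1002
rows = [{"order_id": 1001}, {"order_id": 1002}, {"order_id": 1003}]
new_rows = incremental_rows(rows, 1002)  # only order 1003 is fetched
```

This is also why the identifying column must be monotonically increasing: rows inserted with an older ID or timestamp would never be picked up.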
Code
# In Tableau Desktop:
# 1. Hide unused fields
# Right-click data source tab → Hide All Unused Fields
# Removes fields not referenced in any worksheet from the extract
# 2. Add data source filters
# Data → [Data Source Name] → Edit Data Source Filters
# Filter: Order Date is in the last 2 years
# Filter: Status != 'Cancelled'
# 3. Aggregate at extract time
# Data → Extract → Edit
# Check: Aggregate data for visible dimensions
# Check: Roll up dates to (Day/Week/Month)
# 4. Incremental refresh
# Data → Extract → Edit
# Check: Incremental refresh
# Identify rows by: order_id (must be monotonically increasing)
# This only fetches rows where order_id > max(order_id) currently in extract
# 5. Schedule on Tableau Server
# Right-click data source → Refresh Extracts → Schedule a refresh
# Pick a time: 3am local time
# 6. Use Hyper API for programmatic extract creation (Python)
from tableauhyperapi import (
    HyperProcess, Connection, TableDefinition, TableName,
    SqlType, Telemetry, Inserter, CreateMode,
)

with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database='extract.hyper',
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_schema('Extract')
        table_def = TableDefinition(
            table_name=TableName('Extract', 'Orders'),
            columns=[
                TableDefinition.Column('order_id', SqlType.text()),
                TableDefinition.Column('customer_id', SqlType.text()),
                TableDefinition.Column('amount', SqlType.double()),
                TableDefinition.Column('order_date', SqlType.date()),
            ],
        )
        connection.catalog.create_table(table_def)
        with Inserter(connection, table_def) as inserter:
            for row in get_orders_from_db():  # your own data-access helper
                inserter.add_row(row)
            inserter.execute()
# 7. Refresh extract from CLI
tabcmd refreshextracts --datasource "OrdersDataSource" --project "Sales"
# 8. Check extract size and refresh history
# Tableau Server → Admin views → "Background Tasks for Extracts"
# shows refresh duration, success/failure, and queue delays;
# extract file size appears on the data source's details page
# Performance tips
# - Extract files over 5GB load slowly into memory
# - Roll up to monthly if you don't need daily granularity
# - Use materialized calculations for expensive formulas
# - Don't extract from real-time data sources — use Live + cache instead
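The "roll up to monthly" tip is just pre-aggregation: collapsing daily rows to one row per month before they reach the extract. A minimal illustration with hypothetical order rows:

```python
from collections import defaultdict
from datetime import date


def rollup_monthly(orders):
    """Aggregate (order_date, amount) rows to month level, shrinking
    row count the same way 'Roll up dates to Month' does at extract time."""
    totals = defaultdict(float)
    for order_date, amount in orders:
        totals[(order_date.year, order_date.month)] += amount
    return dict(totals)


orders = [
    (date(2024, 1, 3), 10.0),
    (date(2024, 1, 20), 5.0),
    (date(2024, 2, 1), 7.0),
]
rollup_monthly(orders)  # → {(2024, 1): 15.0, (2024, 2): 7.0}
```

Three rows become two; at millions of daily rows per month the reduction is what makes the extract load fast.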
Common Pitfalls
- Not hiding unused fields — extract is 5x bigger than it needs to be
- Full refresh on huge tables when incremental would work — wastes hours daily
- Extracting from views with complex SQL — query runs every refresh
- Forgetting to optimize the refresh window — runs during peak dashboard usage
When NOT to Use This Skill
- When you need real-time data — use live connection
- For very small datasets (< 1M rows) where performance is already fine
How to Verify It Worked
- Compare dashboard load times before/after extract optimization
- Check extract file size and refresh duration in Tableau Server logs
Production Considerations
- Set up alerts on extract refresh failures
- Monitor extract age — stale data is worse than slow data
- Document why each filter exists in the data source description
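The extract-age monitor above can be sketched as a simple staleness check; the 26-hour window is a hypothetical threshold (daily refresh plus a grace period), not a Tableau default:

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=26)  # assumed: daily refresh + 2h grace; tune to your schedule


def extract_is_stale(last_refresh: datetime, now: datetime) -> bool:
    """Flag extracts whose last successful refresh is older than the window."""
    return now - last_refresh > MAX_AGE
```

Feed it the last-refresh timestamp from your server's refresh logs and wire the result into whatever alerting you already use.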