# Building a Connector to Any External System

We've built connectors to ERP systems, IoT sensor platforms, payment processors, shipping APIs, and a Dutch flower auction system that communicates via SOAP. Each connector is different in the details but identical in the architecture. Here's the pattern that makes connector development predictable.

## The Connector Architecture

Every connector follows the same four-layer architecture:

### Layer 1: Authentication

How you connect to the external system. OAuth2, API keys, basic auth, client certificates: each system has its own approach. The auth layer handles token management, refresh flows, and credential storage.

### Layer 2: Transport

How data moves between systems. REST calls, GraphQL queries, SOAP envelopes, webhook receivers, or file-based exchange (SFTP, S3). The transport layer abstracts the communication protocol.

### Layer 3: Transformation

How external data maps to your internal schema. Field mapping, type conversion, unit conversion, and data enrichment. The transformation layer converts foreign data into your canonical model.

### Layer 4: Sync State

How you track what's been synced. Cursors, timestamps, change tokens, and record-level sync status. The sync state layer enables incremental syncs and error recovery.

## Auth Pattern Implementations

### OAuth2 (Most Common for Modern APIs)

The standard flow: redirect the user to the external service's auth page, receive a callback with an authorization code, exchange it for access and refresh tokens, store the tokens securely, use the access token for API calls, and refresh it when it expires.

Key implementation details:

- Store tokens encrypted at rest
- Implement automatic token refresh 5 minutes before expiry
- Handle token revocation (user disconnects the service)
- Support multiple concurrent connections to the same service

### API Key Authentication

Simpler but requires secure storage. The API key is sent with every request, typically as a header or query parameter.
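
Both placements of the key can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the header name `X-API-Key`, the query parameter name `api_key`, and the `api.example.com` URL are assumptions, so check the external system's documentation for the names it actually expects.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_header_key(
    headers: dict[str, str], api_key: str, header_name: str = "X-API-Key"
) -> dict[str, str]:
    """Return a copy of the headers with the API key added.

    The default header name is an assumption; providers differ.
    """
    return {**headers, header_name: api_key}


def with_query_key(url: str, api_key: str, param_name: str = "api_key") -> str:
    """Return the URL with the API key appended as a query parameter.

    Existing query parameters are preserved. The parameter name is an
    assumption; providers differ.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param_name, api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical endpoint and key, for illustration only.
headers = with_header_key({"Accept": "application/json"}, "secret")
url = with_query_key("https://api.example.com/v1/orders?page=2", "secret")
```

When the external system supports both, the header variant is usually the safer choice, because full URLs (query string included) tend to end up in access logs and proxies.
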
Store keys encrypted, never log them, and provide a way for users to rotate keys without downtime.

### Certificate-Based Auth

Common in enterprise and financial systems. Mutual TLS (mTLS) requires both sides to present certificates. Store certificates and private keys in a secrets manager, not in the database.

## Sync Strategy Patterns

### Full Sync

Pull all records from the external system every time. Simple but inefficient. Only appropriate for small datasets (under 10,000 records) or systems that don't support incremental queries.

### Timestamp-Based Incremental Sync

Query the external API for records modified since the last sync timestamp. Efficient and widely supported. Gotcha: clock skew between systems can cause records to be missed. Solution: overlap the query window by 5 minutes, and make writes idempotent, since the overlap re-fetches some records that were already synced.

### Cursor-Based Incremental Sync

The external API provides a cursor token that represents the sync position. Each sync request passes the cursor and receives a new cursor with the results. More reliable than timestamps but requires cursor storage.

### Event-Driven (Webhooks)

The external system pushes changes to your webhook endpoint as they occur. This gives near-real-time sync with minimal API calls, but you need to handle webhook delivery failures, out-of-order events, and duplicate deliveries.

Best practice: use webhooks as the primary sync mechanism with a periodic full sync as a safety net to catch missed events.

## Data Transformation: The Hard Part

Mapping fields between systems is where most connector complexity lives.

### Type Coercion

The external system returns dates as date-only strings ("2026-02-25"), while your system expects full ISO 8601 timestamps (2026-02-25T00:00:00Z). Currencies might be integers (cents) or floats (euros). Boolean fields might be "Y"/"N", 1/0, or true/false.

Build a type coercion library for your connector framework. Define conversion rules once, apply them across all connectors.

### Schema Mapping Configuration

Hard-coding field mappings in code is fast but inflexible.
A configuration-driven approach lets administrators adjust mappings without code changes: Source field → transformation rule → target field. For example: "price_incl_vat" → divide by 1.21 → "price_excl_vat" (assuming a 21% VAT rate).

### Handling Missing Data

External systems return null, empty strings, or simply omit fields. Your connector must decide what each case means: "clear the field", or "no value supplied, keep the existing one"? Document this decision per field.

## Error Handling in Production

### Transient Errors (Retry)

Network timeouts, 503 Service Unavailable, rate limit 429 responses. Retry with exponential backoff: 1s, 2s, 4s, 8s, up to a maximum of 5 minutes. Most transient errors resolve within the first three retries.

### Permanent Errors (Log and Skip)

Invalid data (wrong type, missing required field), authentication failures, resource not found (404). Log the error with full context, skip the record, continue with the next one. Alert if skip count exceeds a threshold.

### Partial Failures

A batch of 1,000 records where 3 fail. Don't roll back the entire batch; commit the 997 successful records and queue the 3 failures for retry or manual review.

### Circuit Breaker

If the external system returns errors on 50%+ of requests, stop making requests temporarily. A circuit breaker prevents wasting resources (and rate limit budget) on a system that's clearly down. Retry after a cooldown period.

## Monitoring Your Connectors

Essential metrics for connector health:

- **Records synced per interval:** Should be consistent. A sudden drop to zero indicates a problem.
- **Error rate:** Percentage of records that failed to sync. Normal: under 1%. Warning: 1-5%. Critical: over 5%.
- **Sync latency:** Time between a change in the external system and its reflection in your system.
- **API quota usage:** How much of your rate limit budget has been consumed.
- **Connector uptime:** Is the connector running? When was the last successful sync?
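
As a rough sketch of how the error-rate thresholds above could feed an alerting check, the function below classifies one sync interval. The function name and the status labels are illustrative assumptions, and the comparisons use integer arithmetic so that a rate of exactly 1% or 5% is not misjudged because of floating-point rounding.

```python
def classify_error_rate(failed: int, total: int) -> str:
    """Classify one sync interval using the thresholds from the metrics list.

    Normal: under 1%. Warning: 1-5%. Critical: over 5%.
    An interval with no records is reported separately, because a drop
    to zero is itself a signal worth investigating.
    """
    if total <= 0:
        return "no-records"
    # Compare failed/total against 5% and 1% without floating-point division.
    if failed * 100 > 5 * total:
        return "critical"
    if failed * 100 >= total:
        return "warning"
    return "normal"


# The partial-failure example from earlier: 3 failures in a batch of 1,000.
status = classify_error_rate(failed=3, total=1000)
```

A check like this would typically run once per sync interval, with the result sent to whatever alerting system is already in place.
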
## The Connector Template

After building 15+ connectors, we've standardized on a template that new connectors implement:

1. **Config schema:** What configuration does this connector need? (API key, URL, sync frequency)
2. **Auth handler:** How does this connector authenticate?
3. **Sync handler:** How does this connector fetch data incrementally?
4. **Transformer:** How does this connector map external data to internal models?
5. **Event handler:** How does this connector process incoming webhooks?

New connectors typically take 3-5 days from start to production-ready, because the template handles all the infrastructure (authentication management, retry logic, error handling, monitoring) and the developer focuses solely on the external system's API specifics.

## Getting Started

Pick the external system your organization most needs to connect. Read their API documentation (all of it, not just the getting-started guide). Map the data model. Then implement the four layers: auth, transport, transformation, sync state.

The first connector is the hardest. Each subsequent one is faster, because the patterns are now familiar and the infrastructure is reusable.
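
To make the five-part template concrete, here is one way it could be expressed as an abstract base class. This is a sketch under assumptions, not the actual template: the class names, method names, and the in-memory `DemoConnector` (including its `X-API-Key` header and `"Y"`/`"N"` field) are hypothetical and exist only to show how the parts fit together.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

Record = dict[str, Any]


class Connector(ABC):
    """Hypothetical base class mirroring the five template parts."""

    # 1. Config schema: the configuration keys this connector requires.
    config_schema: tuple[str, ...] = ()

    def __init__(self, config: dict[str, Any]) -> None:
        missing = [key for key in self.config_schema if key not in config]
        if missing:
            raise ValueError(f"missing config keys: {missing}")
        self.config = config

    @abstractmethod
    def authenticate(self) -> dict[str, str]:
        """2. Auth handler: return the headers to send with each request."""

    @abstractmethod
    def fetch_changes(self, cursor: Optional[str]) -> tuple[list[Record], Optional[str]]:
        """3. Sync handler: return (records since cursor, new cursor)."""

    @abstractmethod
    def transform(self, record: Record) -> Record:
        """4. Transformer: map an external record to the internal model."""

    @abstractmethod
    def handle_event(self, payload: Record) -> Record:
        """5. Event handler: process one incoming webhook payload."""

    def sync(self, cursor: Optional[str] = None) -> tuple[list[Record], Optional[str]]:
        """Shared infrastructure: one incremental sync pass."""
        self.authenticate()
        records, new_cursor = self.fetch_changes(cursor)
        return [self.transform(record) for record in records], new_cursor


class DemoConnector(Connector):
    """Toy connector backed by an in-memory list instead of a real API."""

    config_schema = ("api_key",)
    _SOURCE: list[Record] = [{"id": "a1", "active": "Y"}, {"id": "a2", "active": "N"}]

    def authenticate(self) -> dict[str, str]:
        return {"X-API-Key": self.config["api_key"]}  # header name is an assumption

    def fetch_changes(self, cursor: Optional[str]) -> tuple[list[Record], Optional[str]]:
        start = int(cursor) if cursor else 0  # cursor is a list offset here
        batch = self._SOURCE[start:]
        return batch, str(start + len(batch))

    def transform(self, record: Record) -> Record:
        return {"id": record["id"], "active": record["active"] == "Y"}

    def handle_event(self, payload: Record) -> Record:
        return self.transform(payload)


connector = DemoConnector({"api_key": "demo"})
first_batch, cursor = connector.sync()
second_batch, cursor_after = connector.sync(cursor)  # nothing new, cursor unchanged
```

In a layout like this, retry logic, monitoring, and error handling would live in the base class or the code that calls `sync`, so a new connector only fills in the five system-specific pieces.
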