Connector & Integration
Bridging Your Critical Research Systems

InSplice will target four key connectors for its MVP, aiming for 65% real-world coverage on day one across common clinical, lab, and genomics data flows, with more connectors on the roadmap.

Medidata Rave
The #1 EDC platform, used in over 50% of U.S. drug trials; connecting it keeps your clinical data instantly unified and inspection-ready.
Key benefit: Accelerate site setup, amendments, and analyses without custom coding or manual exports.

Thermo Fisher SampleManager
The most widely adopted LIMS in life sciences, streamlining lab results and sample-tracking data for downstream workflows.
Key benefit: Clean, traceable, and governed lab data ready for regulatory review or AI training.

AWS S3
Industry-standard storage powering petabyte-scale genomics and research pipelines across biotech and pharma.
Key benefit: Stream large files directly into a unified data layer without custom glue code or compliance gaps.

Oracle Clinical One
The leading EDC alternative to Rave, prevalent in top pharma and cross-border clinical trials.
Key benefit: Never say "no" to a new sponsor or CRO because of incompatible systems, signaling enterprise readiness.
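To make the AWS S3 case concrete, here is a minimal sketch of the kind of hand-rolled streaming glue code the connector is intended to replace: reading a large object in chunks so it never has to fit in memory. It assumes boto3 and AWS credentials are available; the bucket name, object key, and ingest_chunk() callback are hypothetical placeholders, not part of InSplice.

```python
import boto3

s3 = boto3.client("s3")

def ingest_chunk(chunk: bytes) -> None:
    """Hypothetical stand-in for pushing a chunk into a unified data layer."""
    ...

def stream_object(bucket: str, key: str, chunk_size: int = 8 * 1024 * 1024) -> None:
    # get_object returns a StreamingBody; iterating it keeps memory flat
    # even for multi-gigabyte genomics files.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    for chunk in body.iter_chunks(chunk_size=chunk_size):
        ingest_chunk(chunk)

stream_object("example-genomics-bucket", "runs/sample-001/reads.fastq.gz")
```

With a managed connector, this loop, plus retries, auditing, and access controls, becomes configuration rather than custom code.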
Schema, Metadata, & Governance
Building the Foundation for Trusted Data

InSplice's Schema & Metadata Management provides a single source of truth for data definitions, enabling robust validation, evolution, and auditability through a centralized schema registry and AI-powered automation.

Central Schema Registry
Supporting multiple dialects (JSON Schema, XML XSD, Avro, Protobuf), the registry normalizes definitions into a unified model while preserving the original specifications. Each schema receives a unique identifier with semantic versioning and compatibility rules to prevent breaking changes.

Ingest & Normalize Workflow
The system will automatically detect schema dialects, convert non-JSON schemas into an internal JSON Schema representation, and pre-generate validators to eliminate per-message parsing overhead.

Schema Relationships & Lineage
We will index and traverse references to build complete dependency maps of complex schemas, enabling impact analysis that shows which pipelines or models a schema change would affect.

Governance Hooks
Custom field annotations (PII, PHI, Calculated) drive tokenization and access-control policies, while embedded consent-status and retention-period metadata keep data handling compliant with applicable regulations. A sketch of how annotations and pre-generated validators fit together appears at the end of this section.

AI-Enabled Schema Detection & Enrichment
Our Functions Layer will apply machine-learning techniques to automate schema discovery, classification, and quality control, transforming raw data samples into fully annotated metadata definitions.

Automated Field Inference
ML models analyze value patterns and context to propose field names, data types, and mappings to industry vocabularies such as LOINC and SNOMED.

Sensitivity Detection
Functions automatically scan for HIPAA/HITRUST-sensitive elements and flag them with the appropriate PHI tags to drive downstream security and compliance controls.

Continuous Learning
User corrections feed back into the model pipeline, improving future inference accuracy.

Pipeline Integration
Enriched schemas propagate instantly to downstream validation stages, ensuring events are tagged, checked, and routed correctly.
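To illustrate the registry, governance annotations, and pre-generated validators described above, here is a minimal Python sketch using the jsonschema library. The SchemaRegistry class, the insplice:// identifier scheme, and the x-sensitivity annotation key are illustrative assumptions, not a committed InSplice format.

```python
from jsonschema import Draft7Validator

# Hypothetical patient-event schema with a governance annotation.
# "x-sensitivity" is an illustrative custom keyword; JSON Schema
# validators ignore unknown keywords, so it can carry policy metadata.
PATIENT_EVENT_V1 = {
    "$id": "insplice://schemas/patient-event/1.0.0",
    "type": "object",
    "required": ["patient_id", "event_type"],
    "properties": {
        "patient_id": {"type": "string", "x-sensitivity": "PHI"},
        "event_type": {"type": "string"},
        "observed_at": {"type": "string", "format": "date-time"},
    },
}

class SchemaRegistry:
    """Toy in-memory registry: compiles a validator once at registration,
    so per-message validation avoids re-parsing the schema."""

    def __init__(self):
        self._validators = {}

    def register(self, schema: dict) -> str:
        schema_id = schema["$id"]
        self._validators[schema_id] = Draft7Validator(schema)
        return schema_id

    def validate(self, schema_id: str, event: dict) -> list:
        return [e.message for e in self._validators[schema_id].iter_errors(event)]

registry = SchemaRegistry()
sid = registry.register(PATIENT_EVENT_V1)
print(registry.validate(sid, {"patient_id": "P-001", "event_type": "enrolled"}))  # []
print(registry.validate(sid, {"event_type": "enrolled"}))  # ["'patient_id' is a required property"]
```

In a production registry, the same registration step would also record the dialect, version, and compatibility rules, and downstream policies would key off annotations such as the PHI tag shown here.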
Streaming Data to Functions
Core Vision
The planned InSplice Functions System will enable researchers and data engineers to process live data from any source without proprietary storage lock-in. This MVP roadmap outlines our technical approach.

How Functions Will Work
Each InSplice "function" operates as a stateless code block running in a containerized environment and executes when triggered by streaming data events. Functions will support Python, TypeScript/Node, and Go; they process event payloads, perform transformations, and emit outputs based on user-defined logic (a sketch of a possible handler appears at the end of this section).

User-Controlled Data Flow
Users maintain full control over data storage destinations. Our "bring your own datastore" approach prevents vendor lock-in: InSplice provides the execution environments and event delivery, while storage stays with the user.

Scaling Architecture
The system enforces configurable concurrency limits per function. The trigger router runs as a stateless service, simple for early deployments and shardable as usage grows.

Transparent Development
This roadmap represents our MVP vision. We're seeking design partners to provide input as we build InSplice, ensuring it meets real-world research and data engineering needs.
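As a concrete picture of the model above, here is a minimal sketch of what a Python function handler might look like. The handler(event, emit) signature, the event fields, and the emit() callback are assumptions for illustration, not a committed InSplice API.

```python
import json
from typing import Any, Callable

def handler(event: dict, emit: Callable[[dict], None]) -> None:
    """Triggered once per streaming event; holds no state between calls."""
    payload = event.get("payload", {})

    # Example user-defined transformation: normalize a glucose lab result
    # from mg/dL to mmol/L before it reaches the downstream datastore.
    if payload.get("analyte") == "glucose" and payload.get("unit") == "mg/dL":
        payload = {
            **payload,
            "value": round(payload["value"] / 18.0, 2),  # mg/dL -> mmol/L
            "unit": "mmol/L",
        }

    # Emit the transformed record; the platform would route it to the
    # user-configured destination ("bring your own datastore").
    emit({"source": event.get("source"), "payload": payload})

# Local usage example with a stand-in emit() that just prints the record.
if __name__ == "__main__":
    sample = {"source": "lims", "payload": {"analyte": "glucose", "value": 90, "unit": "mg/dL"}}
    handler(sample, emit=lambda record: print(json.dumps(record)))
```

Because the handler is stateless, the platform can run many copies in parallel up to the configured concurrency limit without coordination between instances.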
Agent Studio
Architecture & Technical Foundations

Core Architecture
Agent Studio will function as a modular orchestration platform connecting to life science databases through standardized, policy-controlled connectors. Key components include:
- Secure database connectors supporting relational, NoSQL, file repositories, and specialized clinical/genomic systems
- Centralized credential management that keeps data under customer control
- Automated data profiling that infers structure, content types, and relationships
- Parallel, distributed processing infrastructure for handling high-volume data
- Optional vector embedding of unstructured text to enable semantic query

Advanced Reasoning System
The platform's differentiating feature will be its multi-agent orchestration layer, which turns database connections into a dynamic research environment (a sketch of this retrieval-and-reasoning loop follows at the end of this section):
- Query decomposition into retrieval and reasoning steps
- Self-prompting and multi-agent reasoning patterns that simulate expert collaboration
- Hypothesis generation and verification against original data sources
- Transparent, citation-supported answers with explicit sourcing
- Error handling, ambiguity flagging, and edge-case identification

No LLM Retraining Required
Unlike systems that need custom model training for each data source, Agent Studio will use dynamic retrieval and reasoning to adapt to any database structure or content type without retraining.

Enterprise-Grade Security
All data processing will occur within customer-controlled environments; credentials and sensitive information never leave organizational boundaries. Compliance with HIPAA, GDPR, and other regulations will be built in.

Reasoning Similar to Human Experts
By implementing a collaborative multi-agent architecture, we aim to achieve research-grade analysis that mirrors how expert teams approach complex scientific questions, with rigor and methodical validation.

Agent Studio represents our technical vision for transforming life science data access and analysis. By focusing on secure connections to existing databases and employing sophisticated reasoning rather than simplistic RAG patterns, we're building a platform that will make specialized life science knowledge as accessible as web search while maintaining the rigor required for scientific and clinical applications.
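To ground the retrieval-and-reasoning loop described above, here is a toy Python sketch of the decompose, retrieve, reason, and cite pattern. The step signatures, the Evidence and Answer types, and the stubbed callables in the usage example are illustrative assumptions, not Agent Studio's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Evidence:
    source: str   # e.g. a query or table identifier used for citation
    content: str

@dataclass
class Answer:
    text: str
    citations: List[str] = field(default_factory=list)

def answer_question(
    question: str,
    decompose: Callable[[str], List[str]],          # question -> retrieval sub-queries
    retrieve: Callable[[str], List[Evidence]],      # sub-query -> evidence from connected sources
    reason: Callable[[str, List[Evidence]], str],   # synthesize an answer from evidence
) -> Answer:
    sub_queries = decompose(question)
    evidence: List[Evidence] = []
    for sq in sub_queries:
        evidence.extend(retrieve(sq))

    if not evidence:
        # Flag missing data or ambiguity instead of guessing.
        return Answer(text="Insufficient evidence in connected sources.", citations=[])

    draft = reason(question, evidence)
    return Answer(text=draft, citations=[e.source for e in evidence])

# Stubbed usage: a real deployment would back these callables with an LLM
# and live, policy-controlled database connectors.
result = answer_question(
    "How many sites enrolled patients in 2024?",
    decompose=lambda q: ["SELECT count(*) FROM sites WHERE first_enrollment >= '2024-01-01'"],
    retrieve=lambda sq: [Evidence(source=f"sql:{sq}", content="42")],
    reason=lambda q, ev: f"42 sites, based on {len(ev)} query result(s).",
)
print(result.text, result.citations)
```

The point of the structure is that every answer carries explicit citations back to the original data sources, and the loop returns a flagged non-answer rather than an unsupported claim when retrieval comes back empty.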