DISK

Automate the hypothesize-test-evaluate discovery cycle

DISK is an open-source research framework designed to automate the cycle of hypothesis generation, testing, evaluation and revision by analysing large, growing scientific data repositories. The project’s website describes it as a “novel framework to test and revise hypotheses based on automatic analysis of scientific data repositories that grow over time.” It acts as a meta-workflow engine that monitors data influx, applies test workflows, tracks provenance of results and suggests refined hypotheses when the data or context changes.
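
The cycle this describes can be pictured in a few lines of Python. Everything below (the Hypothesis record and the three callables) is an illustrative stand-in for DISK's components, not its actual API:

    import time
    from dataclasses import dataclass, field

    @dataclass
    class Hypothesis:
        # Illustrative record only; DISK uses richer semantic representations.
        statement: str
        history: list = field(default_factory=list)   # provenance trail of revisions

    def discovery_cycle(hypothesis, fetch_new_data, run_workflow, revise):
        """One iteration of the hypothesize-test-evaluate-revise loop.
        The three callables are hypothetical stand-ins for DISK's data
        monitoring, workflow execution, and hypothesis-revision components."""
        last_check = hypothesis.history[-1]["checked_at"] if hypothesis.history else None
        new_data = fetch_new_data(since=last_check)
        if not new_data:
            return hypothesis                         # no new data: nothing to re-test
        result = run_workflow(hypothesis, new_data)   # execute the test workflow
        hypothesis.history.append({                   # record provenance of this pass
            "data": new_data,
            "result": result,
            "checked_at": time.time(),
        })
        return revise(hypothesis, result)             # revised (or confirmed) hypothesis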

Who it serves & how
This tool is aimed at researchers, data scientists and domain experts who deal with dynamic, large-scale data — for example multi-omics cancer datasets, climate proxy time series, or other domains where data accumulates continuously. For a researcher, DISK can help by:

  • accepting a hypothesis (e.g., “Gene X is over-expressed in condition Y”), automatically locating relevant datasets, executing the test, and returning the results,
  • re-running or refining analyses as new data arrives, revisiting prior hypotheses,
  • providing transparent provenance: tracking how a hypothesis was modified, what data supported the change, and which workflows were used.

It thus supports exploratory and adaptive science workflows rather than one-off static analyses; the data-matching step in the first bullet is sketched below.
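
Here is a minimal, hypothetical sketch of matching a hypothesis to candidate datasets. The catalog, tags, and keyword matching are invented for illustration; DISK resolves data through queries over repository metadata rather than simple tag overlap:

    # Hypothetical catalog of dataset metadata; ids and tags are invented.
    catalog = [
        {"id": "ds-001", "tags": {"gene-x", "condition-y", "expression"}},
        {"id": "ds-002", "tags": {"gene-x", "condition-z", "expression"}},
    ]

    hypothesis = {
        "statement": "Gene X is over-expressed in condition Y",
        "terms": {"gene-x", "condition-y"},
    }

    def find_matching_datasets(catalog, hypothesis):
        # Keyword-overlap matching as a stand-in for metadata queries.
        return [d["id"] for d in catalog if hypothesis["terms"] <= d["tags"]]

    print(find_matching_datasets(catalog, hypothesis))   # -> ['ds-001']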

Key features & value

  • Hypothesis-driven automation: Unlike standard pipelines that run once, DISK continuously monitors data growth and triggers re-analysis or new hypotheses (see the sketch after this list).
  • Domain portals: Although the framework itself is domain-agnostic, the project provides portals configured for specific domains (e.g., a “Climate DISK” portal for paleoclimate data via the LinkedEarth platform, a “NeuroDISK” portal for neuroscience data), illustrating how it adapts to different scientific fields.
  • Provenance recording: All steps in the hypothesis-test-revise loop are logged and traceable, enhancing transparency and reproducibility.
  • Open source: The source code is available for developers and labs to adapt the framework to their own data streams and hypotheses. 
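
The continuous-monitoring behaviour in the first feature above can be pictured as a simple polling loop. The interval and the size-based trigger are assumptions of this sketch, not DISK's implementation, which runs as a meta-workflow engine:

    import time

    def monitor(repository_size, retest_hypotheses, poll_seconds=3600):
        """Poll the repository and trigger re-analysis whenever it has grown.
        Both callables are hypothetical: repository_size() returns a count of
        available records, retest_hypotheses() re-runs standing test workflows."""
        last_size = repository_size()
        while True:
            time.sleep(poll_seconds)
            size = repository_size()
            if size > last_size:          # data grew: re-test standing hypotheses
                retest_hypotheses()
                last_size = size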

Considerations

  • The system is designed for large and evolving datasets—including “data that grows over time”—so its utility is highest when one has an ongoing data-acquisition workflow rather than a static snapshot. 
  • Domain-specific setup is required: configuring data sources, defining hypothesis templates, integrating workflows, and ensuring provenance capture demand infrastructure and expertise (a rough illustration follows this list).
  • The framework supports hypothesis revision, but the quality of the output still depends on the design of the hypothesis-test workflows and the interpretability of the results, so it is a tool for augmenting, not replacing, domain-expert judgement.
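
To make the setup burden concrete, a domain configuration might look roughly like the following. The keys and values are invented for this sketch and are not DISK's actual configuration schema:

    # Hypothetical domain configuration; every key here is an assumption.
    domain_config = {
        "data_source": {
            "endpoint": "https://example.org/metadata-api",   # assumed endpoint
            "poll_interval_hours": 24,
        },
        "hypothesis_templates": [
            "Gene ?g is over-expressed in condition ?c",      # template with variables
        ],
        "workflows": {
            "expression-test": "workflows/expression_test.wf",  # assumed path
        },
        "provenance_store": "provenance.db",
    }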