XiaLab Newsletter

Like many of you, I kept hearing the same promise: just hand your data to AI and let it run the analysis. So over the past few months I put it to the test, hard – pushing the latest large language models to see whether they could truly act as a reliable, autonomous co-pilot for complex omics analysis. I wanted the real answer, not the marketing.

The verdict is mixed.

These models are deeply impressive at high-level conceptual thinking. Ask them to suggest a specific statistical method, explain a parameter, or brainstorm a biological mechanism, and they are highly capable. But when it comes to the actual execution – planning the multiple, sequential, precise steps required to answer a real-world research question – AI fundamentally struggles. It misses the nuances. It takes quiet logical leaps.

Ironically, this multi-step strategic planning is exactly what a researcher needs the most.

In practice, I found myself watching the AI's output like a hawk, constantly auditing it to catch compounding methodological mistakes. My conclusion: it is simply too dangerous to let AI loose on a raw dataset on its own. Far from saving time, unguided AI analysis has proven to be incredibly costly – both in hours spent debugging and in the inherent risk to analytical integrity.

Grounding AI For Omics Data Analytics

If AI cannot reliably run a multi-step analysis on its own yet, we shouldn't force it to.

This is the motivation behind OmicsVerse: to provide grounding for omics analytics. In much the same way that retrieval-augmented generation (RAG) anchors a language model in trusted sources, OmicsVerse anchors AI in verified analytics – a library of predefined workflows, built on the community's best practices and the tools I have spent my career developing. The AI works within that frame, and every step stays open to inspect and verify in a conventional web interface.

That frees AI for what it is genuinely good at – and, in my experience, what that is depends on who you are.

If you are a researcher running an analysis, AI is most useful as a thinking partner: it helps you plan an approach, evaluate what the results are telling you, and interpret the biology – alongside the smaller conveniences of tidying a file format, refining a figure, or drafting a first report. It is a partner, never an autopilot: the verified workflow does the running, and you make the call.

If you are building the tools, as we are, AI earns its keep elsewhere – it is a genuinely strong coding partner, writing and debugging analysis code far faster than before. The same models, a different strength for a different role; OmicsVerse is the bridge that hands each side the part that actually works.

You should bring your own AI. Many of us already work with a preferred assistant – one that has learned our projects, our habits, the shape of our questions. Since analysis and interpretation are so context-specific, that familiar agent will usually serve you better than a general one wired into the platform. It also puts the token cost where it naturally belongs, with whoever chooses the model, and leaves you free to use open-source models hosted locally instead.

The familiar worries about AI – where your data goes, what the token bills come to, whether you have the compute – are not ours to settle on your behalf. With local installation they are entirely in your hands: your data stays on your machine, you choose the model and set the budget, and everything runs on hardware you already own.

How OmicsVerse Works

From the data you bring in to the machine it runs on, here is how that balance plays out in practice.

Data is always the first, most frustrating hurdle. Wrangling heterogeneous omics files into shape is tedious. More importantly, keeping that data secure is non-negotiable. For clinical cohorts, patient samples, and unpublished findings, the casual advice to "just upload it to the cloud" is risky – and often violates ethics agreements, institutional policies, and privacy laws.

OmicsVerse can run entirely on your own machine, so your data never has to leave your sight. It remains in standard folders that you own, control, and back up. Privacy isn't a toggle or a setting here; it is the fundamental architecture. The exact same boundary applies to the built-in AI helper: it only sees what is actively on your screen – never the wider internet, and never anyone else's files.

Getting started is effortless, too. You simply drop in your files, and OmicsVerse takes it from there – automatically detecting the data type and format, and reading your study design directly from the data. The tedious wrangling and the manual remapping of groups and metadata are handled for you before the analysis even begins.

From there, the workflow takes over – built on the trusted platforms that more than a million researchers already rely on: MetaboAnalyst, ExpressAnalyst, MicrobiomeAnalyst, and ProteoAnalyst.

To keep you in total control, you can work fluidly in two modes, switching between them at any moment:

Manual Mode: hands-on, tactile exploration where you control every single step.
Workflow Mode: automated execution where those exact same rigorous steps run for you.

In workflow mode, the repetitive paths researchers travel every day are captured as roughly 100 "atomic workflows" – small, rigorously verified building blocks you chain together to reach your objective. What sets them apart is a built-in intelligence for the seemingly "routine" decisions we normally make by hand, one at a time. A workflow can try every normalization and rank them by how cleanly your groups separate and how biologically consistent the result looks; sweep across every taxonomic rank rather than guessing one; and run the common methods for a task side by side, so you can see where they agree and where they diverge. The drudgery becomes a comparison you can actually review – in the form of a dashboard, a live report, or PowerPoint slides.

Drop Your Data: type, format & design auto-detected, with manual override
↓
~100 Atomic Workflows: LC-MS Spectra Processing, GC-MS Spectra Processing, RNA-seq Quantification, Marker-Gene Profiling, Shotgun Metagenomics, Differential Test, Enrichment, … +90 more
↓
Select & Compose Workflows, based on your analysis objectives: Biomarker Discovery, Pathway Analysis, Multi-Omics Integration
↓
Dashboard · Live Report · Slides: every comparison, ready to review
↻
Refine, Iterate & Interpret: tune a step, ask the AI, draw your conclusions

The day-to-day reality follows that same path. You drop your data in, and the platform detects its type, format, and study design; if it reads any of these wrong, you correct it by hand. You then select and compose the workflows that fit your objective, and the results come back as a dashboard, a live report, and presentation-ready slides. From there you refine, iterate, and interpret: tune a figure, ask the integrated AI to dig deeper into a method, or drop into manual mode instantly to inspect or alter a single step – until the conclusion is one you trust.

And all of this runs wherever you choose. Modern hardware is more than capable: many labs already have powerful workstations, and even a recent laptop with 64 GB of memory is, in my experience, enough for most omics workflows – provided we stay careful about memory and CPU. A single command brings the whole platform to life on the machine you already have, be it a lab server or your own laptop. No specialized supercomputer required.

Running locally means no waiting on slow uploads and no data on outside servers – and it behaves exactly like the web version, so you are never locked in.

The Options	What It's For	Where Your Data Goes
Training (book & webinars)	Mastering the foundational concepts behind the workflows.	Nothing to upload.
Cloud (omicsverse.com)	Quick analysis of small to medium datasets – nothing to install.	Secure cloud servers – ideal for open, public data.
Docker (your own computer)	Large datasets and raw-data processing, end to end on your machine.	Stays entirely on your local machine.

These are not separate products – just one platform, run where it fits: the cloud for small to medium data, and your own machine for large or raw data.

The Journey So Far

January – establishing the foundational tools and the early vision of an automated pilot.
February – evaluating why deep biological context is absolutely essential to meaningful analysis.
March – bringing that context into physical form with the release of my printed text.
April – launching the unified platform that marries core analytics tools, scientific context, and AI.
May (Now) – an honest look at what AI can and cannot do, and why our workflows must stay grounded in human logic.

Where the Journey Leads

Choosing a pre-defined workflow is the reality today. What I am building toward next is genuine agency: a system that can look at your data, propose a logical analysis strategy, and draft a narrative report alongside your own custom AI agent.

And because AI has made us so much faster at building, we can now do something new on our side: tailor and ship a version of OmicsVerse shaped around a single lab's own data and workflows – not just configured for you, but built around how you actually work. If that would help your team, I would be glad to talk it through.

But the core rule will never change: every automated step will always stay wide open for manual inspection, and the final judgment remains yours. Science has long imagined an automated assistant for discovery. The way to get there is by grounding it in verified, reliable workflows – not by trusting AI to improvise. At least, not for now.

What's Coming

All of this is far easier to see than to read about. So before anything else, I would like to show you – live. We are running two free 90-minute sessions this summer: a hands-on demonstration of OmicsVerse, from raw data to a finished report, followed by an open Q&A where you can ask me anything. It is the same session offered at two different times, so wherever you are in the world, one should fit your schedule.

Two Free Sessions – Pick Your Time Zone

Friday, July 10 · 8:00–9:30 am Montreal time. Timed for Asia and Europe: 8:00 pm Beijing, 2:00 pm Berlin, 8:00 am New York. A live OmicsVerse demonstration, from raw data to a finished report, with an open Q&A.
Friday, August 7 · 3:00–4:30 pm Montreal time. Timed for the Americas and Europe: 12:00 pm Los Angeles, 3:00 pm New York, 9:00 pm Berlin. The same demonstration and open Q&A.

Going Deeper

Multi-Omics Bootcamp · August 18–22. A five-day immersive course on practical multi-omics integration and AI agents – from raw data to biological insight, running the whole workflow on your own computer. Seats are strictly limited.
Weekly Fall Course · Saturday mornings, September to November. Concise deep dives into individual workflows, each mapped to a chapter of Omics Data Science. See the Training Page.

Whichever path fits, I hope you will join one of the two free sessions. It is the easiest way to see whether OmicsVerse belongs in your own work, and the best chance for me to hear what you actually need. I would genuinely love to see you there.

Best regards, Jianguo (Jeff) Xia, PhD