Visual Intelligence & Computer Use

Machines that see and act

Proteus is an AI research lab building the next generation of computer use agents, combining fast visual understanding, efficient context compression, and frontier reasoning to let machines operate any device, any interface, autonomously.

Reasoning has outpaced vision

AI can reason and write code better than most humans. But it still can't reliably use a computer. Visual understanding is the bottleneck: slow image encoding, bloated context windows, and no ability to act continuously over long horizons. We're fixing that.

01 — Fast perception

Small models, rapid action

Fine-tuned compact models for instant visual grounding and action prediction, paired with large VLMs that handle high-level planning and reasoning when it matters.

02 — Efficient encoding

Efficient representations

We use fast OCR, segmentation, and video models to build optimized visual representations that eliminate the redundancy of feeding raw screenshots into frontier models. The result: faster inference, longer horizons, lower cost.
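
To make the redundancy point concrete, here is a minimal sketch of one way to avoid re-sending an unchanged screen: compare consecutive frames tile by tile and keep only the tiles that changed. This is an illustration, not a description of Proteus's pipeline. The `Frame` type, the `changed_tiles` function, and the tile size are all hypothetical, and frames are modeled as plain nested lists of grayscale values.

```python
from typing import List, Tuple

# Hypothetical stand-in for a screenshot: rows of grayscale pixel values.
Frame = List[List[int]]

def changed_tiles(prev: Frame, curr: Frame, tile: int = 2) -> List[Tuple[int, int]]:
    """Return the (row, col) origin of every tile that differs between two frames."""
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = slice(left, min(left + tile, width))
            # A tile counts as changed if any of its rows differs.
            if any(prev[r][cols] != curr[r][cols] for r in rows):
                changed.append((top, left))
    return changed

prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[3][3] = 255  # a single pixel changes in the bottom-right tile

print(changed_tiles(prev, curr))  # [(2, 2)]
```

In this toy case only one of four tiles would need to be re-encoded. A real system would likely work on image arrays and combine this with OCR or segmentation output.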

03 — Continuous operation

Long horizon planning

Agents that run autonomously across diverse, long-horizon workflows, navigating real interfaces on desktop, mobile, and the web without human hand-holding.

Computer use, anywhere

Our agents operate across desktop, web, and mobile platforms, including iPhone and Android. Here's a preview of autonomous operation with real-time visual understanding.

Android & iPhone agent · Live demo coming soon

Autonomous mobile agents

Rapid computer use on Android and iPhone. The agent sees the screen, reasons about what to do, and takes action, navigating apps, filling forms, and completing multi-step tasks with no predefined scripts.

Powered by compact vision models for fast grounding and frontier LLMs for planning, with custom optimizations for efficient visual context that keep latency low and accuracy high.
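
The see, reason, act cycle described above can be sketched as a simple loop. This is a hedged illustration under stated assumptions, not Proteus's implementation: `Action`, `run_agent`, and the four callables (`observe`, `plan`, `ground`, `act`) are hypothetical names. Here `plan` stands in for a large planning model and `ground` for a compact grounding model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    kind: str         # hypothetical action vocabulary, e.g. "tap", "type", "done"
    target: str = ""  # the UI element the action applies to

def run_agent(
    observe: Callable[[], str],              # returns a description of the current screen
    plan: Callable[[str, str], str],         # (goal, screen) -> next subgoal; stands in for a planner LLM
    ground: Callable[[str, str], Action],    # (subgoal, screen) -> concrete action; stands in for a compact model
    act: Callable[[Action], None],           # executes the action on the device
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Run the observe -> plan -> ground -> act loop until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()
        subgoal = plan(goal, screen)
        action = ground(subgoal, screen)
        history.append(action)
        if action.kind == "done":
            break
        act(action)
    return history

# Toy environment: two screens, and any action advances to the next one.
screens = ["login form", "home"]
state = {"i": 0}

def observe() -> str:
    return screens[state["i"]]

def plan(goal: str, screen: str) -> str:
    return "finish" if screen == "home" else "submit login"

def ground(subgoal: str, screen: str) -> Action:
    return Action("done") if subgoal == "finish" else Action("tap", "submit button")

def act(action: Action) -> None:
    state["i"] = min(state["i"] + 1, len(screens) - 1)

history = run_agent(observe, plan, ground, act, goal="log in")
print([a.kind for a in history])  # ['tap', 'done']
```

Keeping planning and grounding behind separate callables mirrors the split between large and compact models. The `max_steps` cap is a simple guard against loops that never reach a "done" action.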

Windows

Native computer use on Windows desktops. Operates Win32 and UWP apps, manages files, and runs multi-step workflows across the OS.

Ubuntu

Full desktop automation on Ubuntu Linux. Navigates GNOME, terminal, and GUI apps with the same visual understanding pipeline.

macOS

Seamless computer use on macOS. Controls native apps, Finder, and system interfaces through real-time screen understanding.

Deep expertise in AI

Select papers our team members published prior to Proteus.

We also contributed to work on decoding visual imagery via fNIRS, adversarial examples, and reinforcement learning.

Our team comes from

Let's build the agentic layer

We partner with teams building computer use agents, visual AI infrastructure, and autonomous systems that need to see and act in the real world.