Robots Can See.
They Can’t Feel.
Vision-only policies plateau on contact-rich tasks. Grasping a sponge, cracking an egg, or gripping a screwdriver requires tactile signal that cameras cannot capture.
Vision Alone Plateaus
VLA models trained on vision-only data fail at force-sensitive tasks. Deformable grasping, fragile handling, slip detection: all depend on contact signals that pixels alone cannot provide.
Tactile Data Doesn’t Exist
There is no ImageNet for touch. No public dataset pairs high-resolution gel-sensor contact images with synchronized RGB-D video across diverse object categories.
Sim-to-Real Fails for Contact
Simulated tactile data cannot replicate gel deformation, real surface friction, or material compliance. Contact-rich manipulation policies must train on physical interaction data.
Vision + Touch.
Time-Synced.
Every grasp episode captures RGB-D video and high-resolution tactile signals simultaneously. The force-feedback signal sent to the operator's Meta Quest controller is recorded alongside. MCAP format, annotated, model-ready.
RGB-D + Point Clouds
- Up to 30 fps color + depth
- Per-frame 3D point cloud
- Multi-angle camera coverage
- Depth-aligned RGB frames
GelSight Gel + Force Sensors
- High-res contact geometry (GelSight)
- Normal + shear force
- Surface texture imprint
- Slip / contact event detection
Operator Haptic Signal
- Real-time force feedback to Meta Quest
- Grip force modulation signal
- Contact / release event timestamps
- Bilateral force mapping
UR-5 Joint States + EE Pose
- 6-DoF end-effector pose
- Full joint angle trajectory
- Gripper aperture sequence
- NL task annotation per episode
All streams recorded simultaneously. Sync jitter <10ms. Operator-in-the-loop ensures natural, diverse manipulation strategies.
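As an illustration of that sync figure, here is a minimal sketch of how per-frame jitter could be checked from recorded timestamps. The stream names and nearest-neighbour alignment are assumptions for the example, not our internal tooling.

```python
# Minimal sketch: worst-case per-frame offset between a reference stream and
# every other stream, in milliseconds. Stream names are illustrative.
import numpy as np

def sync_jitter_ms(stream_timestamps: dict[str, np.ndarray], ref: str = "rgb") -> np.ndarray:
    """For each reference-stream frame, the largest gap (ms) to the nearest
    sample in any other stream."""
    ref_t = np.asarray(stream_timestamps[ref], dtype=float)   # seconds, shape (N,)
    worst = np.zeros_like(ref_t)
    for name, t in stream_timestamps.items():
        if name == ref:
            continue
        t = np.asarray(t, dtype=float)
        idx = np.clip(np.searchsorted(t, ref_t), 1, len(t) - 1)
        nearest = np.minimum(np.abs(t[idx] - ref_t), np.abs(t[idx - 1] - ref_t))
        worst = np.maximum(worst, nearest)
    return worst * 1e3   # an episode passes if worst.max() stays under ~10 ms
```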
How We Collect
Trained operators teleoperate UR-5 arms via Meta Quest 3. Tactile and force sensors capture every contact. Force feedback flows back to the operator in real time.
Teleop
Meta Quest 3 headset controls a UR-5 arm with parallel-jaw gripper. GelSight + TouchTac sensors on fingertips. Force feedback to operator hands for natural grasp modulation.
Record
All streams captured simultaneously: RGB-D video, GelSight contact images, force/torque, joint states, EE pose. Sync jitter <10ms. Packaged as MCAP.
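A minimal sketch of opening one episode with the open-source `mcap` Python reader; the file name, topic name, and decoder below are placeholders, not the shipped layout.

```python
# Sketch: list the channels in one episode and pull the tactile stream.
# "episode_0001.mcap", "/tactile/gelsight", and decode_gelsight_frame()
# are hypothetical names used only for illustration.
from mcap.reader import make_reader

with open("episode_0001.mcap", "rb") as f:
    reader = make_reader(f)

    # Every recorded stream shows up as a channel/topic in the summary.
    summary = reader.get_summary()
    for channel in summary.channels.values():
        print(channel.topic)

    # Iterate one stream; message.data holds the encoded frame,
    # message.log_time is the capture timestamp in nanoseconds.
    for schema, channel, message in reader.iter_messages(topics=["/tactile/gelsight"]):
        frame = decode_gelsight_frame(message.data)   # hypothetical decoder for the gel image payload
```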
Annotate
Professional annotators label every episode: NL task description, grasp outcome (success / partial / fail), object ID, material class, grip type, force profile tags. Delivered model-ready.
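As a rough picture of what those labels look like per episode, here is a sketch of the record; field names and enum values are illustrative and may not match the delivered schema exactly.

```python
# Sketch of a per-episode annotation record, mirroring the labels listed above.
# Field names and values are illustrative, not the exact delivered schema.
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class EpisodeAnnotation:
    task_description: str                            # natural-language task, e.g. "pick up the sponge"
    grasp_outcome: Literal["success", "partial", "fail"]
    object_id: str
    material_class: str                              # one of the six material categories
    grip_type: str                                   # e.g. pinch, power, precision
    force_profile_tags: list[str] = field(default_factory=list)
```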
Human teleoperation + gel sensors = the richest manipulation signal available. No simulation. No autonomy artifacts. Natural, diverse grasp strategies on real objects.
What the sensor sees
GelSight contact · real sensor output
Every frame is real sensor data from a real arm grasping a real object. The GelSight image above shows actual contact geometry captured during a grasp episode.
50+ Objects.
6 Material Categories.
Every object chosen to elicit a specific tactile signal. YCB-aligned where applicable.
Rigid Household
Standard benchmarks. Known ground-truth grasps.
Deformable
Compress under grasp. Requires real-time force adaptation.
Textured Surfaces
Core material classification benchmark.
Granular / Filled
Contents shift during grasp. Dynamic tactile sensing.
Tools & Handles
Grip point matters, not just “hold it.”
Fragile / Thin
Learn minimum viable grasp force.
Plug Into Your Training Pipeline
Data ships as MCAP with per-frame annotations. Compatible with major VLA frameworks, including OpenVLA and LeRobot. Drop into your existing PyTorch / JAX dataloader with zero preprocessing.
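A minimal sketch of what that looks like on the PyTorch side, assuming episodes sit in a local directory and a project-specific `load_episode()` helper decodes each MCAP file into aligned per-frame arrays (both are assumptions, not a shipped utility):

```python
# Sketch: wrap a directory of episode MCAP files as a PyTorch Dataset.
# "data/episodes" and load_episode() are illustrative placeholders.
from pathlib import Path

import torch
from torch.utils.data import Dataset, DataLoader

class VisuoTactileEpisodes(Dataset):
    """One item per episode: synchronized vision, tactile, and robot streams."""

    def __init__(self, root: str):
        self.files = sorted(Path(root).glob("*.mcap"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> dict[str, torch.Tensor]:
        frames = load_episode(self.files[idx])             # hypothetical MCAP decoder
        return {
            "rgb":     torch.as_tensor(frames["rgb"]),      # (T, H, W, 3)
            "depth":   torch.as_tensor(frames["depth"]),    # (T, H, W)
            "tactile": torch.as_tensor(frames["gelsight"]), # (T, h, w, 3) contact images
            "force":   torch.as_tensor(frames["force"]),    # (T, 3) normal + shear
            "action":  torch.as_tensor(frames["ee_pose"]),  # (T, D) EE pose + gripper aperture
        }

# Variable-length episodes would need a custom collate_fn in practice.
loader = DataLoader(VisuoTactileEpisodes("data/episodes"), batch_size=8, shuffle=True)
```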
Frequently Asked Questions
Sensors, collection method, objects, annotation, framework compatibility, and how to start a pilot.
What sensors do you use?
GelSight gel sensors for high-resolution contact geometry, TouchTac tactile pads, and force sensors for normal and shear force. All mounted on UR-5 arms with parallel-jaw grippers.
How is the data collected?
Trained operators wear Meta Quest 3 headsets and teleoperate UR-5 arms in real time. Force feedback flows back to the controller so operators modulate grip naturally. Every session is recorded with <10ms sync jitter across all streams.
What objects are covered?
50+ objects across 6 material categories: rigid household (YCB-aligned), deformable, textured surfaces, granular/filled, tools & handles, and fragile/thin. Each object is chosen to elicit specific tactile signals.
How are episodes annotated?
Professional annotators label every episode: natural language task description, grasp outcome (success/partial/fail), object ID, material class, grip type, and force profile tags. Delivered in MCAP with per-frame labels.
Which training frameworks does it work with?
Compatible with OpenVLA, LeRobot, and custom PyTorch/JAX dataloaders. MCAP format with NL annotations. No preprocessing required.
How do I start a pilot?
Email hello@physicalaidata.co or fill out the contact form. We'll scope a pilot around your target objects and manipulation tasks, collect data, annotate, and deliver within weeks.
Start a pilot with us
Tell us your target objects and manipulation tasks. We’ll scope a collection pilot and deliver annotated visuo-tactile data within weeks.
Email or form. We respond within 24 hours.
