Data Collection
OpenArm is designed as a data-native platform. This guide covers everything from wiring cameras to recording episodes in LeRobot format, with quality checks at every stage.
Hardware Connections for Recording
Data collection requires more hardware than basic arm control. This section covers what to connect and where before you start LeRobot.
Wrist Camera
Mount a USB webcam or Intel RealSense D435i to the end-effector flange. Connect via USB 3.0 for 30+ fps. Verify: ls /dev/video*
Overhead / Workspace Camera
Fixed camera above the workspace for a global view. Mount at ~60 cm above the table, angled down 30°. Second USB 3.0 port.
CAN Bus (arm control)
Already connected from setup. Verify: ip link show can0. The CAN interface must be up before starting LeRobot.
Teleop Device
A 3D SpaceMouse, a second OpenArm used as a leader arm, or a gamepad. Leader-follower teleoperation with two OpenArms gives the highest-quality demonstrations.
Camera sync note: LeRobot timestamps all streams at the host PC level. For multi-camera setups, use USB 3.0 hubs (not USB 2.0 hubs) to minimize latency skew between camera frames and joint state readings. Target: < 5 ms skew between streams.
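One way to sanity-check the skew target is to log host-side capture times for each stream and compare nearest frames. A minimal sketch under that assumption; max_skew_ms is an illustrative helper, not a LeRobot API, and the timestamps below are synthetic:

```python
# Sketch: worst-case timestamp skew between two camera streams.
# Assumes you log host-side capture times (in seconds) per frame.

def max_skew_ms(stream_a, stream_b):
    """For each frame time in stream_a, find the nearest frame time in
    stream_b and return the worst-case gap in milliseconds."""
    worst = 0.0
    for t in stream_a:
        nearest = min(stream_b, key=lambda u: abs(u - t))
        worst = max(worst, abs(nearest - t) * 1000.0)
    return worst

# Synthetic 30 fps timestamps; stream B lags stream A by 3 ms
a = [i / 30.0 for i in range(30)]
b = [t + 0.003 for t in a]
print(f"max skew: {max_skew_ms(a, b):.1f} ms")  # 3.0 ms, under the 5 ms target
```

If the measured skew exceeds the target, move cameras onto separate USB 3.0 root hubs before re-recording.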
Step-by-Step Recording Workflow
Follow these steps for each recording session. Each step builds on the last — do not skip steps.
Pre-session safety check
Clear a 1 m radius around the workspace, verify the arm reaches its home position freely, and test the E-stop before recording. See the Safety page.
Bring up the CAN interface and ROS2
sudo ip link set up can0
source /opt/ros/humble/setup.bash
source ~/openarm_ws/install/setup.bash
ros2 launch openarm_ros2 openarm.launch.py use_fake_hardware:=false can_interface:=can0
Home the arm
Run the homing routine to set the reference position before each session. The arm must reach its home position with no load on the end-effector.
python3 -m openarm_can.scripts.home --interface can0
Verify camera feeds
Check that all cameras are streaming before starting LeRobot. A missing camera will silently corrupt your dataset if LeRobot does not report it.
# Quick camera check: prints which /dev/video* indices open successfully
python3 -c "
import cv2
for i in range(4):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f'Camera {i}: OK')
    cap.release()
"
Set up the task scene
Place objects in consistent starting positions. Consistent scene initialization is critical for policy generalization. Photograph or mark the starting configuration.
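If you photograph the starting configuration, the reset can also be checked programmatically between episodes. A minimal sketch, assuming you compare the saved reference photo against the current overhead frame as NumPy arrays; scene_reset_ok and its threshold are illustrative, and the frames below are synthetic stand-ins:

```python
import numpy as np

def scene_reset_ok(reference, current, threshold=10.0):
    """Mean absolute pixel difference; small values mean the scene matches."""
    diff = np.abs(reference.astype(np.float32) - current.astype(np.float32))
    return float(diff.mean()) <= threshold

# Synthetic stand-in frames (in practice: the saved photo vs. a live capture)
ref = np.zeros((120, 160, 3), dtype=np.uint8)
same = ref.copy()
moved = ref.copy()
moved[40:80, 60:100] = 255  # object left in the wrong place
print(scene_reset_ok(ref, same), scene_reset_ok(ref, moved))
```

Tune the threshold for your lighting and camera; a value that is too tight will reject normal sensor noise.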
Start LeRobot recording
source ~/.venvs/openarm/bin/activate
python -m lerobot.scripts.control_robot \
--robot.type=openarm \
--control.type=record \
--control.fps=30 \
--control.repo_id=your-username/openarm-pick-place-v1 \
--control.num_episodes=50 \
--control.single_task="Pick up the red cube and place it in the bin" \
--control.warmup_time_s=5 \
--control.reset_time_s=10
LeRobot will prompt you before each episode. Use warmup_time_s to prepare your teleop position before recording starts.
Review and replay episodes
After recording, replay suspicious episodes before finalizing the dataset. Delete poor-quality episodes immediately.
python -m lerobot.scripts.visualize_dataset \
--repo_id=your-username/openarm-pick-place-v1 \
--episode_index=0
Push to HuggingFace Hub
huggingface-cli login
python -m lerobot.scripts.push_dataset_to_hub \
--repo_id=your-username/openarm-pick-place-v1
LeRobot Dataset Format
LeRobot stores datasets in the HuggingFace dataset format using Parquet files for tabular data and MP4/PNG files for image streams. Each episode is a sequence of timestamped observations and actions.
Directory structure
your-username/openarm-pick-place-v1/
├── meta/
│ ├── info.json # Dataset metadata, fps, shapes
│ ├── episodes.jsonl # Per-episode metadata (task, length, outcome)
│ └── stats.json # Min/max/mean/std for all fields
├── data/
│ └── chunk-000/
│ ├── episode_000000.parquet
│ ├── episode_000001.parquet
│ └── ...
└── videos/
└── chunk-000/
├── observation.images.wrist_cam/
│ ├── episode_000000.mp4
│ └── ...
└── observation.images.overhead_cam/
└── ...
Episode data schema
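As an illustrative sketch of what one row of an episode Parquet file looks like: a timestamped observation paired with an action. The field names below mirror common LeRobot conventions but are assumptions here; the authoritative schema for your dataset lives in meta/info.json.

```python
import pandas as pd

# Hypothetical minimal episode frame (7-DoF arm assumed)
frame = {
    "timestamp": 0.033,              # seconds since episode start
    "episode_index": 0,
    "frame_index": 1,
    "observation.state": [0.0] * 7,  # joint positions
    "action": [0.0] * 7,             # commanded joint targets
}
df = pd.DataFrame([frame])
# In practice you would load a real episode, e.g.:
# df = pd.read_parquet("data/chunk-000/episode_000000.parquet")
print(list(df.columns))
```

Image streams are not stored in the Parquet rows; they live in the videos/ tree and are joined by timestamp at load time.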
Failure episodes as data
OpenArm is designed to safely record failed attempts, not just successes. Failure trajectories — slippage, misgrasp, collision, recovery attempts — are first-class data critical for robust policy generalization. Do not delete failure episodes automatically. Instead, annotate them with the success field in episode metadata and let the training framework decide whether to use them.
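The success annotation makes it possible to split a dataset without deleting anything. A minimal sketch, assuming each line of meta/episodes.jsonl carries episode_index and success fields (key names are an assumption; check what your episodes.jsonl actually records):

```python
import json

def split_by_success(jsonl_text):
    """Partition episode indices by a per-episode `success` flag."""
    ok, failed = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        ep = json.loads(line)
        (ok if ep.get("success") else failed).append(ep["episode_index"])
    return ok, failed

sample = (
    '{"episode_index": 0, "success": true}\n'
    '{"episode_index": 1, "success": false}\n'
    '{"episode_index": 2, "success": true}'
)
print(split_by_success(sample))  # ([0, 2], [1])
```

A training run can then choose to train on successes only, or on both with the flag as a conditioning signal.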
Quality Checklist for Collected Data
Run through this checklist after each recording session and before pushing to the Hub. Poor-quality data hurts policy performance more than low episode count.
1. Episode length is consistent. All episodes for the same task should be within ±30% of the median length. Outliers usually indicate the operator paused, missed a grasp, or the recording was interrupted.
2. No missing camera frames. Check that every episode has the expected number of frames per stream. Run lerobot.scripts.visualize_dataset on 3–5 episodes to verify video quality.
3. Joint positions are within safe limits. Verify that observation.state never exceeds the joint limits in the specs. High-velocity spikes indicate a CAN dropout or control glitch; delete those episodes.
4. Task scene was reset between episodes. Each episode must start with the object in the same initial position. If you skipped a reset, the policy will learn from inconsistent initial conditions and generalize poorly.
5. Camera coverage is complete. The wrist camera should always show the end-effector and the object being manipulated. The overhead camera should show the full workspace. Re-adjust mounts if the object leaves frame mid-episode.
6. Demonstration style is consistent. All operators should use the same approach path and grasp style. Mixed strategies confuse policy training. Use a single operator per task version, or label episodes by operator.
7. Dataset stats look reasonable. Check meta/stats.json after recording. Verify that the action mean is near zero (not stuck at joint limits) and that the action std is large enough to show variation across episodes.
8. Success rate is documented. Record the human success rate during collection. A 60–70% success rate is typical for contact-rich tasks. Lower success may indicate the task is too hard; higher may mean the task is too easy to provide useful training signal.
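Several checklist items can be automated. A minimal sketch of the episode-length check, flagging episodes more than ±30% from the median frame count; the lengths below are synthetic, and in practice they would come from your per-episode metadata:

```python
from statistics import median

def length_outliers(lengths, tolerance=0.30):
    """Indices of episodes whose frame count deviates more than
    ±tolerance from the median length."""
    m = median(lengths)
    return [i for i, n in enumerate(lengths) if abs(n - m) > tolerance * m]

lengths = [300, 310, 295, 305, 520]  # frames per episode
print(length_outliers(lengths))      # episode 4 is flagged -> [4]
```

Flagged episodes are candidates for replay and review, not automatic deletion.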
Training a Policy from Your Dataset
Once your dataset passes the quality checklist, you can train ACT or Diffusion Policy directly with LeRobot.
Train ACT
python -m lerobot.scripts.train \
--policy.type=act \
--dataset.repo_id=your-username/openarm-pick-place-v1 \
--policy.chunk_size=100 \
--training.num_epochs=5000 \
--output_dir=outputs/act-pick-place
Train Diffusion Policy
python -m lerobot.scripts.train \
--policy.type=diffusion \
--dataset.repo_id=your-username/openarm-pick-place-v1 \
--training.num_epochs=8000 \
--output_dir=outputs/diffusion-pick-place
Go deeper: Read the full Data Collection Pipeline Overview in the Robotics Library for a thorough treatment of episode structure, dataset versioning, sim-to-real alignment, and multi-task dataset composition.