Nvidia and University of Toronto Make Robotics Research Available to Small Businesses
The human hand is one of nature’s fascinating creations and one of the most sought-after goals of researchers in artificial intelligence and robotics. A robotic hand that could manipulate objects like we do would be extremely useful in factories, warehouses, offices, and homes.
Yet despite enormous progress in the field, research on robotic hands remains extremely expensive and limited to a few very wealthy companies and research labs.
Now, new research promises to make robotics research available to organizations with limited resources. In a paper published on arXiv, researchers from the University of Toronto, Nvidia, and other organizations present a system that leverages highly efficient deep reinforcement learning techniques and optimized simulated environments to train robotic hands at a fraction of the cost normally required.
Training robotic hands is expensive
As far as we know, the technology to create human-like robots is not yet here. However, with enough resources and time, you can make significant progress on specific tasks such as manipulating objects with a robotic hand.
In 2019, OpenAI introduced Dactyl, a robotic hand capable of manipulating a Rubik’s cube with impressive dexterity (though still significantly inferior to human dexterity). But it took the equivalent of 13,000 years of training to reach the point where it could reliably manipulate objects.
How do you compress 13,000 years of training into a short period of time? Fortunately, many software tasks can be parallelized. You can train multiple reinforcement learning agents simultaneously and merge their learned parameters. Parallelization can sharply reduce the time it takes to train the AI that controls the robotic hand.
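One simple way to picture merging what parallel agents have learned is averaging their parameter vectors. The sketch below is purely illustrative (real systems more often average gradients at each update step, and the worker values here are hypothetical):

```python
import numpy as np

def merge_parameters(worker_params):
    """Average the parameter vectors learned by parallel workers.

    worker_params: list of 1-D numpy arrays, one per simulated agent.
    Averaging parameters is one simple way to combine experience
    gathered in parallel.
    """
    return np.mean(np.stack(worker_params), axis=0)

# Hypothetical example: three workers, each with slightly different
# parameters after an update on its own batch of experience.
workers = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.8, 2.2])]
merged = merge_parameters(workers)
print(merged)  # [1. 2.]
```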
However, speed comes at a price. One solution is to create thousands of physical robotic hands and train them simultaneously, a path that would be financially prohibitive even for the wealthiest tech companies. Another solution is to use a simulated environment. With simulated environments, researchers can train hundreds of AI agents at the same time, then refine the model on a real physical robot. The combination of simulation and physical training has become the norm in robotics, autonomous driving, and other areas of research that require interactions with the real world.
Simulations have their own challenges, however, and the computational costs can still be too high for small businesses.
OpenAI, which has financial backing from some of the wealthiest companies and investors, developed Dactyl using expensive robotic hands and an even more expensive compute cluster comprising around 30,000 CPU cores.
Reduce robotics research costs
In 2020, a group of researchers from the Max Planck Institute for Intelligent Systems and New York University introduced an open-source robotics research platform built on affordable hardware. Named TriFinger, the system used the PyBullet physics engine for simulated learning and a low-cost robotic hand with three fingers and six degrees of freedom (6DoF). The researchers then launched the Real Robot Challenge (RRC), a Europe-based platform that gave researchers remote access to physical robots on which to test their reinforcement learning models.
The TriFinger platform reduced the costs of robotics research but still left several challenges. PyBullet is a CPU-based environment that is noisy and slow, which makes it difficult to train reinforcement learning models effectively. Poor simulated learning creates complications and widens the “sim2real gap,” the performance degradation an RL model trained in simulation suffers when transferred to a physical robot. As a result, robotics researchers must go through multiple cycles of switching between simulated training and physical testing to tune their RL models.
“Previous work on in-hand manipulation required large clusters of CPUs to run. Additionally, the engineering effort required to scale reinforcement learning methods has been prohibitive for most research teams,” Arthur Allshire, lead author of the paper and a simulation and robotics intern at Nvidia, told TechTalks. “This meant that despite progress in scaling deep RL, the pursuit of algorithmic or systems advances has been difficult. And the cost of hardware and maintenance time associated with systems such as the Shadow Hand [used in OpenAI’s Dactyl] … limited the accessibility of hardware to test learning algorithms.”
Building on the work of the TriFinger team, this new group of researchers aimed to improve the quality of simulated learning while keeping costs low.
RL agent training with single GPU simulation
The researchers replaced PyBullet with Nvidia’s Isaac Gym, a simulated environment that can run efficiently on desktop GPUs. Isaac Gym leverages Nvidia’s GPU-accelerated PhysX engine to run thousands of parallel simulations on a single GPU. It can deliver around 100,000 samples per second on an RTX 3090 GPU.
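The key idea behind that throughput is that all environments are stepped as one batched array operation rather than one at a time. This toy sketch (not the Isaac Gym API; the dynamics and reward are made up) shows the shape of a vectorized simulator:

```python
import numpy as np

class ToyVectorEnv:
    """Toy stand-in for a GPU-vectorized simulator like Isaac Gym:
    all N environments advance in a single batched operation
    instead of N separate per-environment calls."""

    def __init__(self, num_envs, obs_dim):
        self.state = np.zeros((num_envs, obs_dim))

    def step(self, actions):
        # One batched update steps every environment at once.
        self.state = self.state + 0.1 * actions
        # One reward per environment (here: distance from origin).
        rewards = -np.linalg.norm(self.state, axis=1)
        return self.state, rewards

env = ToyVectorEnv(num_envs=4096, obs_dim=8)
obs, rew = env.step(np.ones((4096, 8)))
print(obs.shape, rew.shape)  # (4096, 8) (4096,)
```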
“Our task is suitable for research laboratories with limited resources. Our method took a day to train on a single desktop-level GPU and CPU. Every university lab working in machine learning has access to this level of resources,” said Allshire.
According to the paper, a complete setup to run the system, including training, inference, and physical robot hardware, can be purchased for less than $10,000.
The efficiency of the GPU-powered virtual environment allowed researchers to train their reinforcement learning models in high-fidelity simulation without reducing the speed of the training process. Higher fidelity makes the training environment more realistic, reducing the sim2real gap and the need to refine the model with physical robots.
The researchers used an example object-manipulation task to test their reinforcement learning system. As input, the RL model receives proprioceptive data from the simulated robot as well as eight keypoints representing the pose of the target object in three-dimensional Euclidean space. The model’s output is the torques applied to the motors of the robot’s nine joints.
The system uses Proximal Policy Optimization (PPO), a model-free RL algorithm. Model-free algorithms avoid the need to compute all the details of the environment, which is very computationally expensive, especially when dealing with the physical world. AI researchers often seek cost-effective, model-free solutions to their reinforcement learning problems.
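PPO’s core trick is a clipped objective that keeps each policy update close to the policy that collected the data. A minimal sketch of that objective (illustrative only; a real implementation optimizes a neural network policy against it):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective (returned as a loss to minimize).

    ratio: pi_new(a|s) / pi_old(a|s) for each sampled action.
    advantage: how much better the action was than expected.
    Clipping the ratio to [1-eps, 1+eps] limits how far a single
    update can move the policy from the data-collecting policy.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

# A ratio well above 1+eps gets clipped to 1.2, capping the update.
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))
print(loss)  # -1.2
```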
The researchers designed the robotic hand’s RL reward as a balance between the fingers’ distance from the object, the object’s distance from its destination, and its intended pose.
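A reward balancing those three terms might look like the following sketch. The weights and the exact error terms here are hypothetical, not the paper’s formulation:

```python
import numpy as np

def manipulation_reward(fingertips, obj_pos, obj_rot, goal_pos, goal_rot,
                        w_reach=0.1, w_pos=1.0, w_rot=0.5):
    """Illustrative reward combining the three components the article
    describes: fingertip-to-object distance, object-to-goal distance,
    and orientation error. Weights and form are hypothetical."""
    reach = np.mean([np.linalg.norm(f - obj_pos) for f in fingertips])
    pos_err = np.linalg.norm(obj_pos - goal_pos)
    rot_err = np.linalg.norm(obj_rot - goal_rot)  # crude orientation error
    return -(w_reach * reach + w_pos * pos_err + w_rot * rot_err)

# Reward is 0 when the fingers touch the object and the object is
# already at the goal position and orientation.
r = manipulation_reward([np.zeros(3)] * 3, np.zeros(3), np.zeros(4),
                        np.zeros(3), np.zeros(4))
```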
To further improve the model’s robustness, the researchers added random noise to different elements of the environment during training.
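This technique is commonly called domain randomization: jitter the observations each step and perturb physics parameters each episode so the policy cannot overfit one exact simulator configuration. A minimal sketch, with made-up noise scales and parameter names:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize(obs, params, obs_noise=0.01, param_scale=0.1):
    """Domain-randomization sketch: add Gaussian noise to each
    observation and rescale physics parameters (mass, friction, ...)
    by up to +/-10% so the trained policy tolerates mismatch
    between the simulator and the real robot."""
    noisy_obs = obs + rng.normal(0.0, obs_noise, size=obs.shape)
    noisy_params = {k: v * (1.0 + rng.uniform(-param_scale, param_scale))
                    for k, v in params.items()}
    return noisy_obs, noisy_params

obs, params = randomize(np.zeros(8), {"mass": 0.1, "friction": 0.8})
```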
Test on real robots
After the reinforcement learning system was trained in the simulated environment, the researchers tested it in the real world through remote access to the TriFinger robots provided by the Real Robot Challenge. They replaced the simulator’s proprioceptive and image input with sensor and camera information provided by the remote robotic lab.
The trained system transferred its capabilities to the real robot with only a seven percent drop in accuracy, an impressive improvement in the sim2real gap compared to previous methods.
Keypoint-based object tracking proved particularly useful in ensuring that the robot’s object-manipulation capabilities generalize to different scales, poses, conditions, and objects.
“One of the limitations of our method – deployment on a cluster to which we did not have direct physical access – was the difficulty of trying other objects. However, we were able to try other objects in simulation, and our policies proved relatively robust with zero-shot transfer performance from the cube,” said Allshire.
Researchers say the same technique can work on robotic hands with more degrees of freedom. They didn’t have the physical robot to measure the sim2real gap, but the Isaac Gym simulator also includes complex robotic hands such as the Shadow Hand used in Dactyl.
This system can be integrated with other reinforcement learning systems that handle other aspects of robotics, such as navigation and pathfinding, to form a more complete solution for training mobile robots. “For example, you might have our method handling the low-level control of a gripper while higher-level planners, or even learning-based algorithms, operate at a higher level of abstraction,” said Allshire.
The researchers believe their work presents “a path for democratizing robot learning and a viable solution through large-scale simulation and robotics as a service.”
Ben Dickson is a software engineer and founder of TechTalks. He writes about technology, business and politics.
This story originally appeared on Bdtechtalks.com. Copyright 2021