{"id":4382,"date":"2020-01-20T11:53:17","date_gmt":"2020-01-20T16:53:17","guid":{"rendered":"http:\/\/blog.dankohn.info\/?p=4382"},"modified":"2020-08-17T17:32:57","modified_gmt":"2020-08-17T17:32:57","slug":"openai-teaches-robot-hand-to-solve-rubiks-cube","status":"publish","type":"post","link":"https:\/\/blog.dankohn.info\/index.php\/2020\/01\/20\/openai-teaches-robot-hand-to-solve-rubiks-cube\/","title":{"rendered":"OpenAI Teaches Robot Hand to Solve Rubik&#8217;s Cube"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">From <a href=\"https:\/\/spectrum.ieee.org\/automaton\/robotics\/robotics-hardware\/openai-demonstrates-sim2real-by-with-onehanded-rubiks-cube-solving?utm_campaign=roboticsnews-10-22-19&amp;utm_medium=email&amp;utm_source=roboticsnews&amp;mkt_tok=eyJpIjoiTjJSak9HWXlaR00zWVdReiIsInQiOiIxNXMrS0dkaUxGWWlZUGRPTGI4VEVXbGhJXC9KaGVkUEozNTdCR3UyQkVJcHpaamRaU0RNcEttSlpZRUFCZmErRnpvWjIyQlQ3dnc1M2dtOTRwbjlETFBVNHZKRFRVN2trRUFLT2dNaWRiMElzVW11RjRnMFpiMkpXQUxpM3pnbksifQ%3D%3D\">IEEE Spectrum<\/a><br>Oct 15, 2019<br>By Evan Ackerman<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"OpenAI Robot Hand Solves Rubik&#039;s Cube\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/jm-ihc7CASY?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<!--more-->\n\n\n\n<p class=\"wp-block-paragraph\">In-hand manipulation is a skill that, as far as I\u2019m aware, humans in \ngeneral don\u2019t actively learn. We just sort of figure it out by doing \nother, more specific tasks with our fingers and hands. 
This makes it \nparticularly tricky to teach robots to solve&nbsp;in-hand manipulation tasks&nbsp;because\n the way we do it is through experimentation and trial and error. Robots\n can learn through trial and error as well, but since it usually ends up\n being mostly error, it takes a very, very long time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/spectrum.ieee.org\/automaton\/robotics\/artificial-intelligence\/openai-demonstrates-complex-manipulation-transfer-from-simulation-to-real-world\">Last June, we wrote about OpenAI\u2019s approach to teaching a five-fingered robot hand to manipulate a cube<\/a>. The method that OpenAI used leveraged the same kind of experimentation and trial and error, but in <em>simulation<\/em>\n rather than on robot hardware. For complex tasks that take a lot of \nfinesse, simulation generally translates poorly into real-world skills, \nbut OpenAI made their system super robust by introducing a whole bunch \nof randomness into the simulation during the training process. That way,\n even if the simulation didn\u2019t perfectly match reality (which it \ndidn\u2019t), the system could still handle the kinds of variations that it \nexperienced on the real-world hardware.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In a preprint <a href=\"https:\/\/d4mucfpksywv.cloudfront.net\/papers\/solving-rubiks-cube.pdf\">paper<\/a>  published online today, OpenAI has managed to teach its robot hand to  solve a much more difficult version of in-hand cube manipulation:  single-handed solving of a 3&#215;3 Rubik\u2019s cube. 
The new work is also based  on the idea of solving a problem using advanced simulations and then  transferring the solution\u00a0to a real-world system, or what researchers  call \u201csim2real.\u201d\u00a0In the paper, OpenAI says the new approach \u201cvastly  improved sim2real transfer.\u201d<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The initial step was to break down the&nbsp;robot manipulation of the \nRubik\u2019s cube&nbsp;into two different tasks: 1. rotating a single face of the \ncube 90 degrees in either direction, and 2. flipping the cube to bring a\n different face to the top. Since rotating the top face is much simpler \nfor the robot than rotating other faces, the most reliable strategy is \nto just do a 90-degree flip to get the face you want to rotate on top. \nThe actual process of solving the cube is computationally \nstraightforward, although the solving process is optimized for the \nmotions that the robot can perform rather than for the solve that would take\n the fewest steps.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The physical setup that\u2019s doing the real-world cube solving is a <a href=\"https:\/\/robots.ieee.org\/robots\/shadow\/?utm_source=spectrum\">Shadow Dexterous E Series Hand<\/a> with a <a href=\"http:\/\/phasespace.com\/\">PhaseSpace motion capture system<\/a>,  plus RGB cameras for visual pose estimation. The cube that\u2019s being  manipulated is also pretty fancy:\u00a0It\u2019s stuffed with sensors that report  the orientation of each face with an accuracy of five degrees, which is  necessary because it\u2019s otherwise very difficult to know the state of a  Rubik\u2019s cube when some of its faces are occluded.\u00a0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"> While the video makes it easy to focus on the physical robot, the magic  is mostly happening in simulation, and in transferring what is learned in  simulation to the real world. 
Again, the key to this is domain  randomization\u2014jittering parts of the simulation around so that your  system has to adapt to different situations similar to those that might  be encountered in the real world. For example, maybe you slightly alter  the weight of the cube, or change the friction of the fingertips a  little bit, or turn down the lighting. If your system can handle these  simulated variations, it\u2019ll be more robust to real-world operation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"> <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/spectrum.ieee.org\/image\/MzM5MzA2Mg.jpeg\" alt=\"OpenAI solves Rubik's cube\"\/><figcaption> Image: OpenAI   <br>       The physical setup includes\u00a0the Shadow Dexterous Hand, a PhaseSpace  motion capture system, and RGB cameras. OpenAI modified the Shadow  Dexterous Hand by moving the PhaseSpace LEDs and cables inside the  fingers and by adding rubber to the fingertips.    <\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/spectrum.ieee.org\/automaton\/robotics\/artificial-intelligence\/openai-demonstrates-complex-manipulation-transfer-from-simulation-to-real-world\">When we spoke last year to Jonas Schneider<\/a>\n (one of the authors of the cube manipulation work) and asked him where \nhe thought that system was the weakest, he said that the biggest problem\n at that point was that the randomizations were both task-specific and \nhand-designed. It\u2019s probably not surprising, then, that one of the big \ncontributions of the Rubik\u2019s cube work is \u201ca novel method for \nautomatically generating a distribution over randomized environments for\n training reinforcement learning policies and vision state estimators,\u201d \nwhich the researchers call automatic domain randomization (ADR). 
Here\u2019s \nwhy ADR is important, according to the <a href=\"https:\/\/d4mucfpksywv.cloudfront.net\/papers\/solving-rubiks-cube.pdf\">paper<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><em>Our main hypothesis that motivates ADR is that training on a  maximally diverse distribution over environments leads to transfer via  emergent meta-learning. More concretely, if the model has some form of  memory, it can learn to adjust its behavior during deployment to improve  performance on the current environment over time, i.e. by implementing a  learning algorithm internally. We hypothesize that this happens if the  training distribution is so large that the model cannot memorize a  special-purpose solution per environment due to its finite capacity. ADR  is a first step in this direction of unbounded environmental  complexity: it automates and gradually expands the randomization ranges  that parameterize a distribution over environments.\u00a0<\/em><\/p><\/blockquote>\n\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Solving Rubik\u2019s Cube with a Robot Hand\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/x4O8pojMF0w?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Special-purpose solutions per environment are bad, because they work \nfor that environment, but not for other environments. 
You can think of \neach little tweak to a simulation as creating a new environment, and the\n idea behind ADR is to automate these tweaks to create so many new \nenvironments that the system is forced to instead come up with general \nsolutions that can work for many different environments all at once. \nThis reflects the robustness required for real-world operation, where no\n two environments are ever exactly alike. It turns out that ADR is both \nbetter and more efficient than the previous manual tuning, say the \nresearchers:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><em>ADR clearly leads to improved transfer with much less need for \nhand-engineered randomizations. We significantly outperformed our \nprevious best results, which were the result of multiple months of \niterative manual tuning.<\/em><\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">In terms of results, the researchers were mostly concerned with how \nmany flips and rotations the system could do in a row without failing, \nrather than how many complete solves it was capable of. It sounds like a\n complete solve was a bit of an outlier\u2014the starting configuration of \nthe cube could be solved by the system in 43 successful moves, while the\n average successful run of the best trained policy (continuously trained\n over multiple months) was about 27 moves. 
Sixty percent of the time, \nthe system could get halfway to a complete solve, and it made it the \nentire way 20 percent of the time.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/spectrum.ieee.org\/image\/MzM5MzE0NQ.jpeg\" alt=\"OpenAI dexterous robot hand\"\/><figcaption>\n   Image: OpenAI \n <\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The researchers point out that the method they\u2019ve developed here is  general purpose, and you can train a real-world robot to do pretty much  any task that you can adequately simulate. You don\u2019t need any real-world  training at all, as long as your simulations are diverse enough, which  is where the automatic domain randomization comes in. The long-term goal  is to reduce the task specialization that\u2019s inherent to most robots,  which will help them be more useful and adaptable in real-world  applications.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>From IEEE Spectrum, Oct 15, 2019, by Evan 
Ackerman<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-4382","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/posts\/4382","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/comments?post=4382"}],"version-history":[{"count":1,"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/posts\/4382\/revisions"}],"predecessor-version":[{"id":4621,"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/posts\/4382\/revisions\/4621"}],"wp:attachment":[{"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/media?parent=4382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/categories?post=4382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.dankohn.info\/index.php\/wp-json\/wp\/v2\/tags?post=4382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}