Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have used machine learning to streamline the conventional iterative task planning process. Their system, PIGINet, eliminates task plans that cannot satisfy collision-free requirements, reducing planning time by 50-80% when trained on just 300-500 problems.
In the past, robots would typically try different task plans and keep adjusting their actions until they found a workable solution. This approach can be slow and inefficient, particularly when the robot is confronted with movable and articulated obstacles.
Conventional household robots often rely on predefined recipes for performing tasks, which may not be suitable for diverse or changing environments. PIGINet, on the other hand, avoids these predefined rules. It is a neural network that takes “Plans, Images, Goals, and Initial facts” as input and predicts the likelihood that a task plan can be refined into feasible motion plans. Simply put, PIGINet employs a transformer encoder, a versatile, state-of-the-art model designed to process data sequences.
The input sequence includes task plan details, environment images, and symbolic representations of the initial and desired states. The encoder uses these inputs to predict the feasibility of the task plan, facilitating adaptive and efficient task planning for household robots in diverse and dynamic environments.
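To make the input sequence described above concrete, here is a minimal sketch of how the four input groups might be laid out as one sequence before being passed to an encoder. The `Token` class, the `build_sequence` function, and the example facts are hypothetical names chosen for illustration; plain strings stand in for the learned embeddings a real model would use, and no encoder is implemented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    modality: str  # one of "plan", "image", "goal", "initial"
    content: str   # hypothetical text stand-in for a learned embedding


def build_sequence(plan: List[str], image_ids: List[str],
                   goal: List[str], initial_facts: List[str]) -> List[Token]:
    """Concatenate plan steps, images, goals, and initial facts into one sequence."""
    tokens = [Token("plan", step) for step in plan]
    tokens += [Token("image", image_id) for image_id in image_ids]
    tokens += [Token("goal", fact) for fact in goal]
    tokens += [Token("initial", fact) for fact in initial_facts]
    return tokens


# Illustrative (assumed) kitchen problem, not data from the study.
seq = build_sequence(
    plan=["open left fridge door", "pick up cabbage", "place cabbage in fridge"],
    image_ids=["kitchen_view_0"],
    goal=["in(cabbage, fridge)"],
    initial_facts=["closed(left_fridge_door)", "in(cabbage, pot)"],
)
for token in seq:
    print(f"{token.modality:8s} {token.content}")
```

Tagging each element with its modality is one simple way to keep plan steps, image features, and symbolic facts distinguishable once they share a single sequence.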
The research team developed virtual kitchen environments with diverse layouts and specific tasks. They measured the time to solve these tasks and compared PIGINet’s performance to previous approaches.
A correct task plan, for example, could involve:
- Opening the left fridge door
- Removing a pot lid
- Transferring the cabbage from the pot to the fridge
- Placing a potato in the fridge
- Picking up a bottle from the sink
- Putting the bottle in the sink
- Picking up a tomato
- Placing the tomato somewhere
The results showed that PIGINet significantly reduced planning time, achieving an 80% reduction in simpler scenarios and a 20-50% reduction in more complex scenarios with longer plan sequences and less training data.
According to Leslie Pack Kaelbling, MIT Professor and CSAIL Principal Investigator, systems like PIGINet combine the efficiency of data-driven methods in handling familiar cases with the ability to resort to “first-principles” planning methods to validate suggestions derived from learning and solve new problems. This approach offers the advantages of both worlds, delivering reliable and efficient solutions for a wide range of problems.
PIGINet leverages multimodal embeddings within its input sequence to enhance the representation and comprehension of intricate geometric relationships. By incorporating image data, the model better understands spatial arrangements and object configurations, even without access to precise 3D object meshes for collision checking. This capability enables the model to make rapid decisions in diverse environments.
Developing PIGINet was challenging because high-quality training data was limited and traditional planning methods are time-consuming. The team overcame this by using pre-trained vision language models and data augmentation techniques. The resulting model significantly reduced planning time not only for familiar objects but also, through zero-shot generalisation, for previously unseen ones.
According to Zhutian Yang, an MIT CSAIL PhD student and lead author, adaptability is essential for robots in diverse home environments. Instead of rigidly following predefined instructions, robots should be able to solve problems flexibly. The team’s approach involves using a general-purpose task planner to generate potential plans and employing a deep learning model to select the most promising ones. This results in an efficient and adaptable household robot that can easily navigate complex and dynamic environments.
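The generate-then-select idea can be sketched as a short loop: candidate plans are tried in order of predicted feasibility, and the expensive refinement step still validates each one. This is an illustrative sketch under stated assumptions, not the team's implementation. The function `plan_with_ranking`, the stub `toy_refine`, and the scores in `toy_scores` are all hypothetical stand-ins for a real task planner, motion planner, and learned model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Plan = Tuple[str, ...]


def plan_with_ranking(
    candidates: Sequence[Plan],
    predict_feasibility: Callable[[Plan], float],
    refine: Callable[[Plan], Optional[List[str]]],
) -> Tuple[Optional[Plan], Optional[List[str]], int]:
    """Refine candidates in descending order of predicted feasibility.

    Returns (plan, motion, attempts), where attempts counts calls to the
    expensive refine step. No candidate is discarded, only reordered.
    """
    ranked = sorted(candidates, key=predict_feasibility, reverse=True)
    attempts = 0
    for plan in ranked:
        attempts += 1
        motion = refine(plan)  # first-principles check of the suggestion
        if motion is not None:
            return plan, motion, attempts
    return None, None, attempts


def toy_refine(plan: Plan) -> Optional[List[str]]:
    """Assumed stub: a plan works only if it opens the fridge door first."""
    if plan[0] == "open left fridge door":
        return [f"trajectory for: {step}" for step in plan]
    return None


candidates: List[Plan] = [
    ("pick up bottle", "place bottle in fridge"),
    ("pick up cabbage", "place cabbage in fridge"),
    ("open left fridge door", "pick up cabbage", "place cabbage in fridge"),
]
# Made-up scores standing in for a learned model's predictions.
toy_scores = {candidates[0]: 0.2, candidates[1]: 0.3, candidates[2]: 0.9}

plan, motion, attempts = plan_with_ranking(
    candidates, lambda p: toy_scores[p], toy_refine)
# A constant score keeps the original order, which gives an unranked baseline.
_, _, baseline_attempts = plan_with_ranking(
    candidates, lambda p: 0.0, toy_refine)
print(attempts, baseline_attempts)
```

Because the predictor only reorders candidates, a poor prediction costs extra refinement attempts but cannot cause a feasible plan to be missed in this sketch.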
The team aims to extend PIGINet so that it suggests alternative task plans once it identifies infeasible actions. This improvement would accelerate the generation of viable plans without the need for extensive training datasets. The researchers anticipate that this advancement could revolutionise the training and application of robots in diverse home environments.