Researchers at The Ohio State University are working to make the internet more accessible to people with disabilities. The internet is now woven into almost every part of society, but its complexity poses challenges, especially for those with disabilities.
The team, led by Yu Su, an assistant professor of computer science and engineering, is developing an artificial intelligence (AI) agent capable of executing complex tasks on any website using simple language commands, simplifying digital interactions.
As the internet has evolved over the last three decades, it has become considerably more intricate. Yu Su highlighted the need to address this complexity, particularly for individuals with disabilities, stating, “For some people, especially those with disabilities, it’s not easy for them to browse the internet. We rely increasingly on the computing world in our daily life and work, but there are increasingly a lot of barriers to that access, which, to some degree, widens the disparity.”
The team presented their work at a conference for AI and machine learning research. Their approach uses large language models to create web agents (online AI helpers) that mimic human behaviour when browsing the web. Using language processing, the AI agent demonstrated an ability to understand the layout and functionality of different websites.
A pivotal aspect of their research is the creation of Mind2Web, the first dataset specifically designed for generalist web agents. Unlike previous efforts focused on simulated websites, Mind2Web embraces the dynamic and complex nature of real-world websites, which makes it possible to test an agent’s capacity to generalise, even when faced with entirely new websites. The team collected over 2,000 open-ended tasks from 137 different real-world websites, providing diverse challenges for training the AI agent.
Tasks included in the dataset range from booking international flights and following celebrity accounts to browsing specific genres of films on streaming platforms. The versatility showcased by the AI agent opens up new possibilities for future models to navigate and learn autonomously across various websites.
The success of this research is partly attributed to the recent development of large language models. Such models have been widely used to generate content automatically, spanning poetry, jokes, cooking advice, and even medical diagnoses. However, the challenge lies in processing the vast amount of information on a single website, since one page can contain thousands of raw HTML elements.
To address this challenge, the researchers introduced a framework called MindAct. This framework utilises a two-pronged agent that combines small and large language models to carry out complex tasks. In the team’s results, MindAct outperforms other common modelling strategies and shows that it can understand a variety of concepts.
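As a rough illustration of how a two-pronged agent might cope with thousands of HTML elements, the sketch below assumes that a small, cheap model first narrows the page down to a few candidate elements and that a large model then chooses among them. That division of labour, along with every name here (`Element`, `small_model_score`, `filter_candidates`, `build_prompt`, `choose_element`), is a hypothetical stand-in for illustration and not the team's implementation. The word-overlap scorer merely takes the place of a trained small model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """A simplified page element (hypothetical structure, for illustration)."""
    element_id: str
    text: str


def small_model_score(task: str, element: Element) -> float:
    """Stand-in for a small ranking model: fraction of element words found in the task."""
    task_words = set(task.lower().split())
    element_words = set(element.text.lower().split())
    if not element_words:
        return 0.0
    return len(task_words & element_words) / len(element_words)


def filter_candidates(task: str, elements: List[Element], k: int) -> List[Element]:
    """Stage 1: keep only the k highest-scoring elements."""
    ranked = sorted(elements, key=lambda e: small_model_score(task, e), reverse=True)
    return ranked[:k]


def build_prompt(task: str, candidates: List[Element]) -> str:
    """Format the shortlisted elements as lettered options for the large model."""
    lines = [f"Task: {task}", "Which element should be acted on next?"]
    for i, element in enumerate(candidates):
        lines.append(f"{chr(ord('A') + i)}. <{element.element_id}> {element.text}")
    return "\n".join(lines)


def choose_element(
    task: str,
    elements: List[Element],
    large_model: Callable[[str], str],
    k: int = 3,
) -> Optional[Element]:
    """Stage 2: ask the large model to pick one of the shortlisted elements."""
    candidates = filter_candidates(task, elements, k)
    answer = large_model(build_prompt(task, candidates)).strip().upper()
    index = ord(answer[0]) - ord("A") if answer else -1
    if 0 <= index < len(candidates):
        return candidates[index]
    return None  # the reply did not match any option


# Toy page and a fake large model that always answers "A" (assumptions for the demo).
page = [
    Element("btn-search", "Search flights"),
    Element("btn-login", "Sign in"),
    Element("lnk-privacy", "Privacy policy"),
    Element("lnk-hotels", "Hotel deals"),
]
task = "search flights to Paris"
picked = choose_element(task, page, large_model=lambda prompt: "A", k=2)
```

The point of the design is that the expensive model only ever sees a short list of options, not the whole page, so its input stays small however large the page is. In practice `large_model` would wrap a call to a real language model.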
While the potential of this AI agent to simplify internet interactions and enhance accessibility is evident, the study also highlights ethical concerns. The ability of the model to translate online instructions into real-world actions raises the possibility of misuse, from manipulating financial information to spreading misinformation. Yu Su emphasises the need for caution, stating, “We should be extremely cautious about these factors and make a concerted effort to try to mitigate them.”
As AI research progresses, Su anticipates growth in both the commercial use and the performance of generalist web agents. Despite the potential risks, he sees the real value of these tools in saving time and making seemingly impossible tasks possible.
The research received support from the National Science Foundation, the U.S. Army Research Lab, and the Ohio Supercomputer Center. The collaborative effort involved co-authors Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, and Huan Sun, all from Ohio State. As the digital landscape evolves, the delicate balance between innovation and responsible use of advanced AI technologies will play a crucial role in shaping the future of digital accessibility.