Neural networks have been used to assist humans with everything from deciding whether a loan applicant should be approved to determining whether a patient has a specific illness. But many aspects of how these networks operate are still poorly understood by researchers.
Neural networks are machine learning models loosely inspired by the human brain. Data is processed through many layers of interconnected nodes, and scientists train a network by feeding it millions of data samples.
For instance, an encoded image could be presented to a network that has been trained to distinguish between, say, dogs and cats. Layer by layer, the network performs a series of multiplications until only one number remains. The network then decides whether the image shows a dog or a cat based on whether that number is positive or negative.
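A minimal sketch of that layer-by-layer computation is below. It is an illustration, not the paper's code: the network sizes and weights are hypothetical (random), whereas a real network would learn its weights from millions of labelled examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a tiny 64 -> 32 -> 16 -> 1 network;
# a trained network would learn these values from data.
weights = [rng.standard_normal((64, 32)),
           rng.standard_normal((32, 16)),
           rng.standard_normal((16, 1))]

def classify(encoded_image):
    """Multiply through the layers until one number remains; its sign decides."""
    x = encoded_image
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)     # matrix multiply, then a ReLU activation
    score = (x @ weights[-1]).item()   # the final layer leaves a single number
    return "dog" if score > 0 else "cat"

print(classify(rng.standard_normal(64)))
```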
Whether any specific model is the best choice for a particular task remains an open question, and a team at MIT chose to investigate it. They analysed neural networks and demonstrated that, given a large amount of labelled training data, these models can be optimised to reduce the likelihood of misclassifying borrowers or patients. But the networks must be constructed with a particular architecture to achieve this.
The study’s authors found that the building blocks developers typically use are seldom the ones that make for an optimal neural network. The optimal building blocks derived from their new analysis, the researchers report, are novel and had never been studied before.
These optimal building blocks, called activation functions, are described in a paper published this week in the Proceedings of the National Academy of Sciences. The paper demonstrates how they can be incorporated into the design of neural networks that outperform conventionally built networks across a range of datasets.
The findings hold even as the neural networks grow very large. According to the study’s senior author, MIT EECS professor Caroline Uhler, developers who choose the proper activation function can create neural networks that classify data more accurately across many domains.
Activation functions help the network discover intricate structure in the data: they transform each layer’s output before passing it to the next layer. When developing a neural network, designers settle on a single activation function, and they also choose the network’s width (the number of nodes per layer) and depth (the number of layers).
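To make those three design choices concrete, here is a hedged sketch (again with hypothetical random weights, not the paper's method) of building a network from a chosen activation function, width, and depth:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three common activation functions; a designer settles on one for the network.
activations = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
}

def make_network(width, depth, activation, in_dim=64):
    """Random network with the given width (nodes per layer) and depth (layers)."""
    act = activations[activation]
    dims = [in_dim] + [width] * depth + [1]
    ws = [rng.standard_normal((m, n)) / np.sqrt(m) for m, n in zip(dims, dims[1:])]

    def forward(x):
        for w in ws[:-1]:
            x = act(x @ w)           # the chosen activation transforms each layer
        return (x @ ws[-1]).item()   # a single output number remains
    return forward

net = make_network(width=128, depth=4, activation="tanh")
print(net(rng.standard_normal(64)))
```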
“It turns out that, if you take the standard activation functions that people use in practice and keep raising the depth of the network, it gives you terrible performance,” explained the paper’s lead author, Adityanarayanan Radhakrishnan, a graduate student in electrical engineering and computer science. “We show that if you design with different activation functions, your network will improve as you get more data.”
The team investigated a neural network that is infinitely deep and wide, meaning it is constructed by continually adding more layers and more nodes, and is then trained to carry out classification tasks, in which the network learns to sort data points into distinct categories.
They tested their findings on several classification benchmarks and found that, in many instances, the derived activation functions led to better performance. Neural network designers could also use their methods to choose an activation function that yields higher classification accuracy, as illustrated in the sketch below.
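As a rough illustration of that kind of comparison (not the paper's experiments, and without the paper's newly derived activation functions), the following sketch trains scikit-learn's off-the-shelf MLPClassifier with three standard activation functions on a toy two-class dataset and reports test accuracy:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A toy benchmark: two interleaving half-moon classes.
X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compare three standard activation functions with the same architecture.
for activation in ["relu", "tanh", "logistic"]:
    clf = MLPClassifier(hidden_layer_sizes=(64, 64),
                        activation=activation,
                        max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)
    print(f"{activation:>8}: test accuracy = {clf.score(X_te, y_te):.3f}")
```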
The researchers hope to apply their analysis in future work to less-than-ideal data sets and to networks constrained in width and depth. They are also interested in extending the research to unlabelled data sets.