(Below, ChatGPT's answer to my silly question on self-awareness.)
Of course, large language models (LLMs) are not it – they are not conscious, they are not intelligent (at least by most of the meanings of the term we would be comfortable with), they cannot really be argued to “think”, and they were not designed with an infrastructure capable of generating consciousness. That is one way I read Yann LeCun’s recent tweet about LLMs being an “off track” on the highway to artificial general intelligence.
Yet if you experiment with ChatGPT (do it – see here – it is free and quite fun) you cannot help thinking that, for all its defects – it sometimes hallucinates non-existent references, it cannot solve complex math problems, et cetera – it feels like talking to a human being. ChatGPT can discuss almost any topic, abstract or concrete, at a high level, and it does so in fluent language; and if we compare it to ourselves, we cannot deny that we too sometimes make things up or say silly things.
If anything, the machine is quite human in the way it behaves when confronted with questions it cannot properly answer. In fact, maybe precisely because of that it is off track! It seems conceivable to me that if we one day manage to develop machines possessing true general intelligence, we will find that their intelligence is fundamentally different from our own.
Computing in machines and in brains
Think about the way we have developed computing capabilities in machines. Digital computing does not work the way our brains do: computers perform computations by encoding instructions and data in the same substrate and moving bits around at a fixed clock rate, while our brains accomplish tasks by encoding information in the timing structure of individual spikes, and in the strengths of the connections between synapses and neuron somata.
The difference could not be starker, and it explains why brains are enormously more power-efficient than digital computers: it is estimated that performing on a digital computer the computing done by a human brain, which costs us 20 watts, would require about 20 MW, i.e. a million times more.
The other large advantage of biological computing over digital computing lies in the avoidance of the “von Neumann bottleneck” – the limit on information transfer between processor and memory that is intrinsic to the way digital computing works. In a brain, processing and memory reside in the same substrate – the neuron.
The two advantages above make it compelling to investigate ways to develop artificial systems capable of time encoding and of co-locating processing and memory, as such systems would be technology enablers; for instance, the internet of things requires devices able to collect and process data with minimal power consumption, and without mechanisms to transfer data to a central CPU.
While the ideas behind neuromorphic computing have a long history, the technology that implements them is relatively recent. Nowadays we have a variety of neuromorphic systems based on CMOS technology (either analog or digital); I can cite a few here by their acronyms, so you can look them up: Neurogrid, SpiNNaker, DYNAP, TrueNorth, BrainScaleS, Spirit, Tianjic… They are used to model biological systems, to better understand the mechanisms that Nature has developed to solve problems, as well as in the synthesis of systems with direct applications. These systems are characterised by in-memory computing, large parallelism, and event-based information transfer and communication. To varying degrees, all of them are inspired by the working of biological brains, and exploit the solutions implemented there – from the analog summation of potentials that triggers neuron discharge, to delays and the time structure of responses, to plasticity mechanisms.
The literature on decrypting the mechanisms that biological brains use to encode and process sensory inputs is vast, and recent research really gets to the bottom of the functional elements and their interplay. For example, a recent study by F. Sandin and collaborators modeled the auditory system of crickets in DYNAP-SE. In that article they discuss how they could reproduce the feature detection of the biological system, and show that multiple delays in single neurons can enable the detection of specific sound patterns. The figure below, taken from the publication, shows the basic circuit that provides the pattern recognition in the crickets, from the auditory stimulus to its downstream processing.
Overall, it is all quite fascinating stuff if you ask me. So much so, in fact, that a few months ago I accepted an invitation from some professors in the Machine Learning group of the Lulea University of Technology to join them and develop potential applications together.
For that reason, I have been spending a few months here in the north (Lulea is located in northern Sweden, very close to the Arctic Circle). I have been delighted by their hospitality, and the friendly environment has allowed me to learn a lot about neuromorphic computing. I am still a newcomer, but in the past weeks I have identified, together with my colleagues Fredrik Sandin and Marcus Liwicki, at least a couple of potential applications of neuromorphic computing to fundamental physics, and we are setting up a collaboration to attack these difficult, exciting new projects.
A Spiking Neural Network Finds Particle Tracks Without Being Taught What Those Are
I will leave a description of those projects for the near future. Instead, here I would like to show what the time encoding of signals in a network of neurons can do for you. You can easily write a simulation of a neuromorphic system on your laptop if you understand the basic working principles of signal propagation and neuron potentials, and that is what I have done.
The basic unit of the brain, the neuron, works by collecting signals from a number of synapses – contacts with upstream neurons in the chain – and firing a signal of its own when its electric potential reaches a given threshold. While this is also how all modern neural networks are modeled (a network of nodes, each collecting a weighted sum of signals from nodes in previous layers and firing a signal to nodes of the next layer), in neuromorphic computing what counts is the time of arrival of signals. There are several reasons why the arrival time is important, but the most direct one is that the electric potential of the neuron membrane tends to decrease with time – the charge “leaks off”, reducing the chance that the neuron reaches its threshold. So only with continued stimulation above some rate can a neuron maintain a steady firing rate.
The above concept can be modeled by considering the shape and timing of the electric signals; a simple function can then model the electric potential they induce, also accounting for the strength of the synapse-neuron connection.
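To make the leaky-potential idea concrete, here is a minimal leaky integrate-and-fire sketch in Python. This is not my actual simulation code, and the time constant, threshold, and reset-to-zero rule are illustrative choices:

```python
import math

def lif_response(spike_times, weights, tau=20.0, threshold=1.0):
    """Leaky integrate-and-fire sketch: each input spike adds its synaptic
    weight to the membrane potential, which decays exponentially with time
    constant tau between inputs. Returns the times at which the neuron
    fires (potential crosses threshold, then resets to zero)."""
    v, t_prev, fires = 0.0, 0.0, []
    for t, w in sorted(zip(spike_times, weights)):
        v *= math.exp(-(t - t_prev) / tau)  # charge leaks off since last input
        v += w                              # synaptic contribution
        if v >= threshold:
            fires.append(t)                 # discharge through the axon
            v = 0.0
        t_prev = t
    return fires
```

Feeding the same weights at a high rate makes the neuron fire, while spreading the same spikes far apart in time lets the potential leak away before the threshold is ever reached – exactly the rate sensitivity described above.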
In order to better understand how neuromorphic processing works, I coded up a spiking neural network (SNN) – SNNs are networks that can simulate neuromorphic systems by tracking the time structure and propagation of digital spikes. I set out to simulate a detector module made of eight layers of particle ionization planes, each divided into 256 elements (you may call them “strips”, as the idea is to model a silicon strip tracker) and stacked one on top of the other (see figure). Tracks come in from the bottom layer and leave “hits” in the closest strip they pass when traversing each layer. A magnetic field orthogonal to the plane of particle flow curves the tracks, with a curvature proportional to the inverse momentum of each track.
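A toy version of this geometry is easy to write down. The sketch below (my own simplification, not the actual code behind the study: the parabolic displacement and the curvature scale `k` are invented for illustration) returns the strip hit in each of the eight layers by a track of given entry point and inverse momentum:

```python
N_LAYERS, N_STRIPS = 8, 256  # eight ionization planes of 256 strips each

def track_hits(x0, inv_pT, k=40.0):
    """Toy tracker geometry: a track enters at strip x0 on the bottom layer
    and drifts sideways as it climbs, with a parabolic displacement
    proportional to its inverse momentum (the curvature induced by the
    magnetic field); k is an arbitrary curvature scale. Returns the hit
    strip per layer, or None where the track exits the module."""
    hits = []
    for layer in range(N_LAYERS):
        x = x0 + k * inv_pT * layer ** 2 / N_LAYERS
        strip = int(round(x))
        hits.append(strip if 0 <= strip < N_STRIPS else None)
    return hits
```

A straight (infinite-momentum) track hits the same strip on every layer, while a low-momentum one bends progressively toward higher strip numbers.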
An encoding in the generation time of signal “spikes” can be performed on each of the eight layers, by making a direct correspondence between the position of the strip along the horizontal direction and the time of the generated spike. One thus has eight trains of spikes that can be fed to a network of neurons. One may model, as I did, real particles coming in at some fixed rate (a normal setup in a particle detector application), while noise is also generated in the strips at random times. The spike trains thus contain both “real hits” from particle ionization and fake ones due to electronic noise. Here is a train of a few tens of “events”, where the eight spike trains are stacked vertically so you can see the spikes due to real tracks (in blue) and those due to noise (in red).
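The position-to-latency encoding can be sketched as follows. The event time window and the per-layer noise probability are assumptions of mine, not the parameters of the actual simulation:

```python
import random

N_STRIPS, T_EVENT = 256, 100.0  # strips per layer, event window (arbitrary units)

def encode_event(hits, noise_rate=0.02):
    """Map each hit strip to a spike time inside the event window (strip
    index -> latency, a direct position-to-time correspondence), and
    sprinkle a noise spike at a random time with probability noise_rate
    per layer. Returns one list of spike times per layer."""
    trains = []
    for strip in hits:
        train = [] if strip is None else [T_EVENT * strip / N_STRIPS]
        if random.random() < noise_rate:
            train.append(random.uniform(0.0, T_EVENT))  # fake hit from electronics
        trains.append(sorted(train))
    return trains
```

With this encoding, a straight track produces eight simultaneous spikes, while a curved one produces a characteristic staircase of latencies across the layers – the recurring pattern the network will latch onto.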
Once the trains of spikes arrive at the synapses of the neurons, the potential of the neuron membrane is modified with a time structure that is characteristic of the incoming spikes. If the potential exceeds some threshold, the neuron fires a signal through its own “axon”, generating an output spike train (one spike per discharge). This signal can reach other neurons downstream. The arrival of spikes at a neuron membrane also has other effects, such as a modification of the strength of the synapse-membrane interface, and some short-term plasticity effects that modify the neuron’s behavior based on the rate of incoming signals. Optionally, delays in the arrival of the signals can be modeled in a biologically inspired way.
Below you can see a sketch of the network I put together. The eight trains of spikes come in from the left and are read out by eight neurons (each neuron sees all trains) at Layer 0 (L0) as well as at Layer 1 (L1), but some random delays are initially added to create some device mismatch and give the system more flexibility. All L0 neurons also send spikes to the L1 ones. The output spikes then report what these eight neurons individually think about the incoming trains: has there been a track in the last event, or not?
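The wiring just described can be sketched as three delay matrices – the exact delay range and the uniform distribution are my own illustrative guesses:

```python
import random

N_TRAINS, N_NEURONS = 8, 8  # eight input spike trains, eight neurons per layer

def build_network(seed=42, max_delay=5.0):
    """Connectivity sketch: every L0 and every L1 neuron receives all eight
    input trains, each connection carrying its own random delay to mimic
    device mismatch; in addition, every L0 neuron feeds every L1 neuron.
    Returns the three delay matrices (input->L0, input->L1, L0->L1)."""
    rng = random.Random(seed)
    rand_delay = lambda: rng.uniform(0.0, max_delay)
    in_to_l0 = [[rand_delay() for _ in range(N_TRAINS)] for _ in range(N_NEURONS)]
    in_to_l1 = [[rand_delay() for _ in range(N_TRAINS)] for _ in range(N_NEURONS)]
    l0_to_l1 = [[rand_delay() for _ in range(N_NEURONS)] for _ in range(N_NEURONS)]
    return in_to_l0, in_to_l1, l0_to_l1
```

Because each neuron sees the same trains through a different set of delays, different neurons end up sensitive to different spike patterns – which is precisely the point of injecting the mismatch.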
I coded up all of the above and set my system to work at the task of watching those trains of spikes encoding particle crossings in my toy detector. To my amazement, the system quickly “learns” the recurring patterns of spikes across the various trains – it starts recognizing the particle tracks! This is observed by counting how often the output neurons fire in response to the presence of particles of different momentum (which generate different patterns of spikes), and how often they fire when that particular stimulus was not present.
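The counting procedure amounts to tallying, event by event, an efficiency and a fake rate per neuron. A minimal version of that bookkeeping (my paraphrase, not the original code) could look like:

```python
def firing_stats(fired, track_present):
    """Event-by-event bookkeeping for one output neuron: fired[i] says
    whether the neuron emitted a spike in event i, track_present[i]
    whether a real track crossed the detector in that event. Returns
    (efficiency, fake rate): the firing fraction with and without a
    real track in the trains."""
    n_eff = sum(1 for f, p in zip(fired, track_present) if f and p)
    n_fake = sum(1 for f, p in zip(fired, track_present) if f and not p)
    n_with = sum(track_present)
    n_without = len(track_present) - n_with
    return n_eff / max(n_with, 1), n_fake / max(n_without, 1)
```

A neuron that has “learned” a given track pattern shows a high efficiency for events containing that pattern and a low fake rate on noise-only events.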
While quite rudimentary, my little program demonstrated – at least to me – that these systems are fast learners of recurring patterns. Why, my spiking network learns what particle tracks are without anybody telling it what a track should look like, and it does so after seeing only some 10,000 tracks!
This has obvious applications to anomaly detection, of course (and in fact SNNs are used for that task, among others). So, once I realized the potential of these systems, I started to think of ways to exploit them for other particle physics applications: finding tracks without having been taught what tracks are is fun, of course, but finding tracks is something we can do very effectively with more direct means; besides, there is no point in hiding what a track is from a system whose job is to find precisely those patterns!
In principle, all use cases calling for high parallelism, low power consumption, sensitivity to the time structure of signals (or the possibility of time-encoding information), and co-location of memory and processing lend themselves as excellent application areas for neuromorphic computing systems. The actual implementations, however, may be tricky; one challenge, e.g., is the need to sidestep the intrinsic variability of response of the hardware emulating the neurons; but a solution may be to actually exploit the device mismatch of the components of a large system to make it more flexible and adaptable.
I will end this post by describing a crazy idea I had for exploiting the peculiarities of neuromorphic processing systems in particle physics applications. An emerging area of study involves asteroids, as we today have the technology to land probes on their surface. In principle, asteroids could be mined for precious substances. But prospecting the material content of an asteroid is quite challenging and costly. Also, in space one has very little power available to run one’s equipment.
(Below, an image generated by Canva on the prompt “particle detector on asteroid surface”)
One idea for studying the composition of the interior of an asteroid would be to land on its surface a set of detector planes recording the trajectories of the charged particles that cross them. By studying the flux of particles it would be possible to map the density of the asteroid body with a technique called “absorption tomography”: in essence, one exploits the fact that particles get absorbed when they traverse denser material, and manage to pass through otherwise. But this requires long data-acquisition times to become a sensitive method, and detector electronics consume power. A neuromorphic system could in principle perform the tracking of particles and the collection of data more effectively than a digital computing system, and require much less power to operate. Who knows, maybe one day somebody will actually build something similar!