i like to think about neural networks in terms of signal flows. i don't think that's a particularly standard framing, but it's the easiest metaphor for me. the interesting thing about neural nets from a signal processing perspective is that in signals we usually work with 1-dimensional systems: a stream of voltage or bits, where even video and image are dealt with by multiplexing. this is good enough for a lot of purposes; lots of things you'd like to do in terms of processing signals can easily be done with this one-at-a-time approach. however, if you're interested in doing things with larger-scale patterns in signals, it gets a little cumbersome, because those patterns aren't happening 1 bit or 1 slice of voltage at a time. they happen over a range of bits or slices of voltage, so you'd need to keep some kind of buffer around for storing chunks of data and then run convolutions on the buffer to try and extract information about a single bit's relation to its neighborhood. a convolution is simply a weighted average of a neighborhood in some kind of stream of data. you can think of it as being like an audio filter, an image filter, or a derivative.
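here's a rough sketch of that weighted-average idea in python (numpy assumed, and the signal and kernels are made up for illustration): a smoothing kernel that behaves like a crude low-pass audio filter, and a difference kernel that behaves like a discrete derivative.

```python
# a minimal sketch of a 1d convolution: a weighted average slid along a stream.
# the signal and kernels are made-up examples, not anything canonical.
import numpy as np

signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])

# smoothing kernel: each output sample is the average of a 3-sample neighborhood,
# like a crude low-pass audio filter
smooth = np.convolve(signal, np.array([1/3, 1/3, 1/3]), mode="same")

# difference kernel: each output sample is the difference between its neighbors,
# which acts like a discrete derivative / edge detector
diff = np.convolve(signal, np.array([1.0, 0.0, -1.0]), mode="same")

print(smooth)  # the edges of the pulse get blurred
print(diff)    # spikes where the signal changes
```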
it gets much easier to work with convolutions if you have access to every bit of data at the same time. this is basically how neural nets work: they're parallel in the sense that every bit of information is processed at the same time, and massively parallel in the sense that there are usually multiple levels of neural nets working in feedback and feedforward loops with one another. think about running multiple shader passes on an image before it gets drawn to the screen, like in a standard sharpen algorithm.
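a minimal sketch of that whole-image-at-once idea, using scipy's 2d convolution (an assumption on my part, any 2d convolution routine would do) with the classic 3x3 sharpen kernel:

```python
# treat the whole image as one parallel operation rather than streaming pixels.
# the "image" here is just random noise for illustration.
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(64, 64)

# classic sharpen kernel: boosts each pixel relative to its neighborhood,
# applied to every pixel "at once" like a single shader pass
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

sharpened = convolve2d(image, sharpen, mode="same", boundary="symm")
```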
if i want to find out what kinds of shapes are in an image, it's really hard to do that with individual pixels one at a time, but it becomes easier when dealing with clusters of pixels. this can be abstracted to any kind of data tho: emergent structures in data are difficult to deal with in the one-bit-at-a-time approach, but get easier once you start clumping things together and taking advantage of emergent processes to identify emergent behaviors.
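a rough illustration of the clumping idea: stacking two 3x3 convolutions means each output value depends on a 5x5 cluster of input pixels, so deeper passes respond to larger-scale structure. the kernels here are random stand-ins, not trained weights.

```python
# sketch of how stacked convolutions grow the neighborhood each output "sees"
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(32, 32)
k1 = np.random.rand(3, 3)
k2 = np.random.rand(3, 3)

# one pass: each output pixel summarizes a 3x3 neighborhood
layer1 = convolve2d(image, k1, mode="same")

# two passes: each output pixel now summarizes a 5x5 cluster of the original,
# which is where the larger-scale / emergent patterns start to show up
layer2 = convolve2d(layer1, k2, mode="same")
```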
this ties in with how video feedback works, in that the act of pointing a camera at a screen is a parallel computation. (well, strictly speaking not 100 percent of the time, but since a good chunk of the processing happens at the speed of light and/or the speed of electricity, it's usually close enuf for human perception.) it gets massively parallel because the signal is constantly being fed back upon itself, and the convolutions happen because that is, in a nutshell, how video feedback works. the feedforward in this situation is different tho: the human operating the signal chain does the feedforward by steering the camera, changing the signal processing, fuckin around with iris/video fx, etc etc, as opposed to the kind of bland stock algorithmic feedforward goals hardcoded into neural nets.
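a toy simulation of that loop, under the made-up assumption that a camera-pointed-at-a-screen chain behaves roughly like blur + gain + a slight displacement, with each output frame fed straight back in as the next input:

```python
# toy video feedback: convolve, displace, amplify, repeat.
# the kernel, gain, and shift are invented parameters, not measurements of real gear.
import numpy as np
from scipy.signal import convolve2d

blur = np.ones((3, 3)) / 9.0    # the camera/monitor chain acts like a soft blur
frame = np.random.rand(64, 64)  # seed the loop with noise

for _ in range(100):
    frame = convolve2d(frame, blur, mode="same", boundary="wrap")  # the convolution step
    frame = np.roll(frame, shift=1, axis=0)                        # camera offset: displacement per pass
    frame = np.clip(frame * 1.05, 0.0, 1.0)                        # gain, capped at the screen's max brightness
```

steering the camera, changing the fx, etc would amount to changing the shift, gain, and kernel between iterations, which is the human-in-the-loop feedforward described above.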