In reading an article in the New Scientist
yesterday, I came across a description of the 'delta function'. It
seems as if it has this property:

 (integral from a to b) (delta(x-t) f(x) dx) = f(t)

What on Earth would you want that result for? How does the delta
function do it, anyway?

[Simon on Compuserve, around 1993]
I hope that after my answer you will also receive the
opinions of other mathematicians, and of some physicist or engineer. There
is nothing mystical in the delta function, but this mathematical
concept belongs to a very large area of mathematical disciplines, some
of which are really hard.  The easiest way, I think, to understand
this sort of thing is to concentrate on two aspects:

1. Summation rules, 2. Transformation of problems.

Consider some set V of functions (which may be defined on the real
axis, or only on the natural numbers 0,1,2,..., so that in this
second case our functions are simply sequences, or even only on the
numbers 1,2,...,N, so that the functions are vectors in an N-dimensional
space), whose values may be real or complex numbers. In each of these
cases you assume that the sum of two functions belonging to V also
belongs to V, and likewise any scalar multiple of such a function. One
says then that V is a vector space of functions.

For simplicity, assume the last case considered (functions defined on a
finite set), so that a function f is given by its values
f(1),...,f(N). You may then consider the sum Sf of these values, hence
Sf = f(1)+...+f(N). It is clear that S(f+g)=Sf+Sg and that
S(af)=aS(f), if a is any number. One says that the summation operator
S is a linear operator. In many applications it now happens that
instead of this simple sum you consider a weighted sum, i.e., for
fixed w(1),...,w(N), you consider Pf = w(1)f(1)+...+w(N)f(N), and it
is clear that this operator P is also linear. In probability theory
this is a mean or average. So one is led to define an average, or mean
operator, or a summation rule as a linear operator T, defined on the
vector space V, with real or complex values (the same as for the
single functions). In the case that the functions are defined for
infinitely many arguments, e.g., if they are functions of a real
variable, one may (and this then becomes the really difficult part)
require some continuity condition, i.e., nice behaviour with respect
to limits.

Now go back to our simple example, take a fixed argument, say i, with
weight 1, and put all other weights equal to 0. Then Pf = w(i)f(i) =
f(i), that is, the operator P picks out the value of f at i. And this
is, in this case, the delta function centered at i (you have, also in
the general case, a delta function for each argument of the
function). Therefore such a delta function is really a very simple
object, but the important thing is that it is still a summation rule
like the other ones and can be treated together with them: it is the
simplest sort of them, similar to the number 1 in multiplication.

Fix now some function w and associate to each function f the integral
(in some sense) Pf of the product wf. Then (not bothering about the
nature of that integral) you will still have a summation rule P. Now
in this case too you may consider the weighted sum which associates to
each f its value f(0). And this is here the delta function! But now we
have a difficulty, because (in this infinite case) you can no longer
regard delta as a function w! Only as a summation rule!  Probably your
imagination is already satisfied and can fill in the details on its own.
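The finite case above can be tried out directly. A minimal sketch (the names summation_rule and delta_at are mine, not standard): a summation rule on functions defined on {1,...,N} is just a weighted sum, and the delta centered at i is the weight vector that is 1 at i and 0 elsewhere, so applying it picks out f(i).

```python
def summation_rule(w, f, N):
    """Apply the weighted sum Pf = w(1)f(1) + ... + w(N)f(N)."""
    return sum(w(k) * f(k) for k in range(1, N + 1))

def delta_at(i):
    """Weights of the delta 'function' centered at argument i."""
    return lambda k: 1 if k == i else 0

N = 5
f = lambda k: k * k              # an arbitrary function on {1, ..., 5}

# The plain sum Sf = f(1) + ... + f(5) uses constant weight 1:
Sf = summation_rule(lambda k: 1, f, N)
print(Sf)                                   # 55

# The delta centered at 3 picks out the single value f(3):
print(summation_rule(delta_at(3), f, N))    # 9
```

Both operators are applied through the same function, which is the point of the text: the delta is just the simplest summation rule among all the others.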

Try now to find the coefficients c(n) of the product of two
polynomials a(0)+a(1)x+..., b(0)+b(1)x+..., and you will find that
c(n) is the sum of all terms a(k)b(n-k), where all terms that do not
actually exist are to be set equal to 0. For functions f,g defined on
the real numbers there exists a similar operation: you define (g*f)(s)
as the integral from minus to plus infinity with integrand
g(t)f(s-t). g*f, considered as a function of s, is called the
convolution of g and f.  What then is delta*f? Again, instead of
"integral" you have to use the word "summation rule": (delta*f)(s) is
then the value at t=0 of the function t-->f(s-t), that is
(delta*f)(s)=f(s), hence delta*f=f.  So delta is the 1-element for
convolution.
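The polynomial picture makes this concrete. A small sketch of the discrete analogue: the coefficients c(n) of a product of polynomials are the convolution of the coefficient lists, and the list [1] (the discrete delta) is the identity for it.

```python
def convolve(a, b):
    """c(n) = sum over k of a(k) * b(n-k); missing terms count as 0."""
    c = [0] * (len(a) + len(b) - 1)
    for k, ak in enumerate(a):
        for j, bj in enumerate(b):
            c[k + j] += ak * bj
    return c

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(convolve([1, 2], [3, 4]))      # [3, 10, 8]

# The discrete delta [1] leaves any coefficient list unchanged:
f = [5, -1, 7]
print(convolve([1], f))              # [5, -1, 7]
```

The second call is exactly delta*f = f in miniature: the only weight sits at 0, so each c(n) is just f(n).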

The entire discipline we have entered belongs to transform theory. For
example, you may associate to a function f its Fourier transform Ff,
which has the fundamental properties that F(f*g)=Ff.Fg and that it
transforms differentiation into multiplication by the variable
(approximately). Often (this is a very subtle point) f and Ff
determine each other, and one of them is easier to study. So you
transform a problem which for f seems very hard into a problem which
for Ff is easy, solve that problem, and then go back to f.
Well, the philosophy is not difficult, but even in this simplistic
distillation rather long, and the mathematics involved is almost all
of mathematics, and the applications involved are almost all of
applied mathematics! What I have called summation rules,
mathematicians call distributions.
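The property F(f*g) = Ff.Fg can be checked numerically. A hedged sketch in the discrete setting (where the identity holds exactly), using a hand-written discrete Fourier transform and circular convolution:

```python
import cmath

def dft(x):
    """Discrete Fourier transform of a list of complex numbers."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def circular_convolve(f, g):
    """(f*g)(n) = sum over k of f(k) g(n-k), indices taken mod N."""
    N = len(f)
    return [sum(f[k] * g[(n - k) % N] for k in range(N)) for n in range(N)]

f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 1.0, 0.0]

lhs = dft(circular_convolve(f, g))              # F(f*g)
rhs = [a * b for a, b in zip(dft(f), dft(g))]   # Ff . Fg

print(all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs)))  # True
```

This is the transform philosophy in one line: the awkward operation (convolution) on one side becomes the easy operation (pointwise multiplication) on the other.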

Josef Eschgfäller
I'm sorry, my message was so long that I forgot the most evident
reason why the delta function is necessary. Philosophically
speaking, harmonic analysis is the mathematical theory of transforming
problems; usually it means that one tries to represent an arbitrary
function as a sum or integral of trigonometric functions. Things
become most manageable if one works with the complex exponential, for
which e^ix = cos(x) + i sin(x), the two main reasons for introducing
complex numbers here being that e^i(x+y)=(e^ix)(e^iy) (from which you
may recover the addition formulas for sin and cos) and that
multiplication by i is rotation by a right angle. Well, let us
therefore try to represent a function f as a weighted sum ("integral")
of functions of the form x-->e^ivx, using many v's, and call the
weight of the v-th function (Ff)(v). Then you obtain a "function" Ff,
which is the Fourier transform of f, and in many cases you will have

 f(x) = (integral) ((Ff)(v) e^ivx dv)

(maybe with some constant factor). But what happens if the function f
is already of the form x-->e^ivx? It is clear that in this case the
weight is degenerate, i.e. 1 at v and 0 elsewhere. And as I mentioned
in the other message, in this case Ff is no longer a function, but of
course a (very simple) summation rule: the delta function centered
at v!

And this is what comes out: The Fourier transform of the constant
function 1 (which corresponds to v=0) is the delta function centered
at 0 ("the" delta function), and the Fourier transform of the function
x-->e^ivx is the delta function centered at v. Usually one has to
include some normalizing factor, for example 2PI; let us forget about
it here.
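In the discrete setting this degeneration is visible directly. A small sketch: the discrete Fourier transform of the pure exponential n --> e^(2 pi i v n / N) is (up to the factor N) the discrete delta centered at v, i.e. all the weight sits at the single frequency v.

```python
import cmath

N, v = 8, 3

# A pure exponential sampled at N points:
f = [cmath.exp(2j * cmath.pi * v * n / N) for n in range(N)]

# Its discrete Fourier transform:
Ff = [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
      for k in range(N)]

# Ff is essentially N at k = v and 0 elsewhere:
print([round(abs(c)) for c in Ff])   # [0, 0, 0, 8, 0, 0, 0, 0]
```

The single spike of height N at index v is the discrete shadow of "the Fourier transform of x-->e^ivx is the delta centered at v" (the factor N playing the role of the normalizing constant mentioned above).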

Let's try to write it down:

(integral) (delta(w) e^iwx dw) = (e^iwx) (evaluated at w=0) = 1
(independent of x), and, more generally, (integral) (delta(w-v) e^iwx dw)
= (e^iwx) (evaluated at w=v) = e^ivx,

and remember that the notation delta(w-v) is only symbolic; it does
not mean that delta is a function. Delta is a summation rule.
The second equation means simply that the function x-->e^ivx may be
obtained as a weighted sum whose only summand (with weight 1) is this
function itself.  The power and the difficulty of the theory
mathematicians have constructed is that all these things can be
combined together, that one can even differentiate such "generalized
functions", and that many other goodies (and headaches) are found.
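The symbolic equation above can also be checked numerically by one standard device: replace delta by a narrow Gaussian of area 1. As the width eps shrinks, the integral of gaussian(w-v) * e^iwx approaches e^ivx. A sketch (the function names are mine; this is the approximation idea, not a definition of delta):

```python
import cmath
import math

def gaussian(w, eps):
    """Approximate delta: total area 1, concentrated within a few eps of 0."""
    return math.exp(-w * w / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def smeared_integral(v, x, eps, span=1.0, steps=4000):
    """Midpoint Riemann sum of gaussian(w-v) * e^(iwx) dw over [v-span, v+span]."""
    h = 2 * span / steps
    total = 0j
    for k in range(steps):
        w = v - span + (k + 0.5) * h
        total += gaussian(w - v, eps) * cmath.exp(1j * w * x) * h
    return total

v, x = 2.0, 0.7
approx = smeared_integral(v, x, eps=0.01)
exact = cmath.exp(1j * v * x)
print(abs(approx - exact) < 1e-3)   # True
```

Shrinking eps further pushes the approximation still closer to e^ivx, which is the numerical face of "the only summand, with weight 1, is the function itself".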

Josef Eschgfäller