Time-frequency
An appropriate way in order to discretize some continuous transformation is to build an orthogonal bases, well suited for some application such as compression. The question is: is-it possible to build an ortogonal basis from the STFT ? Can we found an analysis window $g$ such that
$$ \langle g_{b\nu},g_{b’\nu’}\rangle = 0 \text{ si } \left(b,\nu\right)\neq\left(b’,\nu’\right) $$ The answer is given by the Balan-Low’s Theorem
Balian-Low If $\lbrace g_{b\nu}\rbrace$ is an orthogonal bases of $L^2\left(\mathbb{R}\right)$ then $$ \int_{\mathbb{R}} t^2 |g(t)|^2 \mathrm{d}t = + \infty\quad \text{ or } \int_{\mathbb{R}} \nu^2 |\hat g\left(\nu\right)|^2 \mathrm{d}\nu = + \infty $$
The construction of redundancy dictionary is given by the notion of frame, which allows one to build a stable discretization for both analysis and synthesis of a signal.
Frame A set $\lbrace \varphi_n, \varphi_n\in\cH\rbrace$ is a frame of $\cH$ if it exists two constants $A,B>0$ such that for all $f\in\cH$ $$ A|f|^2 \leq \sum_n |\langle f,\varphi_n\rangle|^2 \leq B|f|^2 $$ if $A=B$ the frame is said tight.
We can define two operators associated with a frame
The synthesis operator $\Phi$ such that $$ \Phi \alpha = \sum_k \alpha_k \varphi_k $$
The analysis operator $\Phi^*$ $$ \Phi^* f = \lbrace \langle f,\varphi_n\rangle\rbrace $$ In finite dimension, $\Phi$ is a matrix of size $M\times N$, where $M$ is the size of the signal and $N$ the number of atoms in the frame $\lbrace \varphi_n\rbrace_{n=0}^{N-1}$, with $\varphi_n\in\mathbb{C}^M$
When the frame is redundant (ie, the frame is not a basis), it exists an infinity of left inverse. The canonical left inverse if given by the pseudo-inverse. This new frame is called the dual frame
Dual frame inversion Let $\tilde \varphi_n = \left(\Phi\Phi^*\right)^{-1} \varphi_n$. Then $$ \frac{1}{B}|f|^2 \leq \sum_n |\langle f,\tilde \varphi_n\rangle|^2 \leq \frac{1}{A}|f|^2 $$ and $$f = \sum \langle f,\varphi_n \rangle \tilde \varphi_n \sum \langle f,\tilde \varphi_n \rangle \varphi_n$$
When the frame is tight (i.e. $A=B$), then $\Phi\Phi^* = A \text{Id}$. When $A=1$, the frame is called a Parseval frame (the energy is preserved in the transform domain). If the frame is a Parseval frame but is not an orthogonal basis, then necessarily we have $|\varphi_n| \neq 1$.
The frame bounds $A$ and $B$ are the smaller and bigger eigne values of $\Phi\Phi^*$. We get $\frac{1}{B}$ and $\frac{1}{A}$ for $\left(\Phi\Phi^*\right)^{-1}$ .
In practice, a frame (and its dual) is redundant representation, which allows one to reconstruct a signal from its inner products with the atoms of the frame.
In the following, we give the construction of a Gabor frame (that is, a discretized version of the STFT). The STFT is build on the Fourier transform, using a sliding analysis window. A simple way to obtain a discretized STFT is the same framework used to define the Finite Fourier Transfrom, that is to say, the periodic discrete signals. Let $x=\lbrace x[0],\ldots,x[T-1]\rbrace$ a discrete signal of period $T$, and a periodic discrete window $g=\lbrace g[0],\ldots,g[L-1]\rbrace$, $L\leq T$, with the same period $T$. The discretized STFT reads $$ X[m,\nu] = \sum_{n=0}^T x[n]g[n-m]e^{-\frac{i2\pi \nu n}{T}} $$ and one can check that it can be inverted by $$ s[n] = \frac{1}{|g|^2}\sum_{m=0}^T\sum_{k=0}^T X[m,\nu ]\boldsymbol{g}[n-m]e^{\frac{i2\pi \nu n}{T}} $$ However, such a discretization is extremely redundant: we have $T^2$ time-frequency coefficients for a signal of size $T$. Even if the redundancy can be usefull, it would be nice to play on the time redundancy (ie, the time shift between two windows), and the frequency resolution (for example, using a Fourier transform of size $L$ or $2L$ instead of $T$).
Let $\nu_0\in\mathbb{N}$ et $k_0\in\mathbb{N}$ the sampling rates in frequency and time of the time-frequency plan used for the Gabor transform. Let the set of translation and modulation of the “mother” window generating a dictionary of Gabor atoms $\left(\boldsymbol{\varphi}_{m\nu}\right)_{m\nu}$. We denote this dictionary by $\boldsymbol{\Phi} \in \mathbb{C}^{T\times K}$. The Gabor atoms read $$ \boldsymbol{\varphi}_{m\nu} [n]= \boldsymbol{g}[n-m k_0]e^{\frac{i2\pi \nu_0 \nu n}{T}},\quad m\in \left\lbrace 0,\ldots,\frac{T}{k_0} - 1 \right\rbrace , \nu\in\left\lbrace 0,\ldots,\frac{T}{\nu_0}-1\right\rbrace $$ If the product $\nu_0k_0$ is “small enough”, more precisely if $\nu_0k_0 < T$, i.e. if the time-frequency plan is ampled enough, then the set $\left( \varphi_{m\nu}\right)_{m,\nu}$ is a frame of $\mathbb{R}^T$.
The Gabor transform is necessarily redundant (from the Balian-Low’s Theorem). Then, it exists an infinity of possibilities in order to reconstruct $\boldsymbol{x}$ from its Gabor coefficients.
The canonical reconstruction of $\boldsymbol{x}$ from its Gabor coefficients is given by the canonical invert frame, build from the dual window denoted by $\boldsymbol{\tilde g}$.
$$ \boldsymbol{x} = \sum_{m,\nu} \langle \boldsymbol{x}, \boldsymbol{\varphi}_{m\nu} \rangle \tilde{\boldsymbol{\varphi}}_{m\nu} = \sum_{m,\nu} \langle \boldsymbol{x}, \tilde{\boldsymbol{\varphi}}_{m\nu} \rangle \boldsymbol{\varphi}_{m\nu} = \boldsymbol{\Phi}^{*}\boldsymbol{x}\boldsymbol{\tilde{\Phi}} = \boldsymbol{\tilde{\Phi}}^{*}\boldsymbol{x}\boldsymbol{\Phi} $$ where $\boldsymbol{\tilde{\Phi}}$ is the Gabor dictionary obtained from the dual windows.
The gabor frame can be tight. In that case, we simply have $\tilde{\boldsymbol{g}} = \boldsymbol{g}$. More particularly, we have $\boldsymbol{\Phi}\boldsymbol{\Phi}^* = \Vert \boldsymbol{\Phi}\boldsymbol{\Phi}^*\Vert \boldsymbol{Id}$ That is why in practice, the Gaussian window, even if it has the minimal time-frequency spread, is not the best choice. It can be convenient to choose a window allowing to get a tight Gabor frames, such as the Hann window.
Remark: we cannot say anything, in general, about $\boldsymbol{\Phi}^*\boldsymbol{\Phi}\ $.
In practice, Gabor coefficients can be computed thanks to the FFT. The synthesis operation (that is, the invert transform) is obtained by invert FFT and overlap-add technics. These operations are efficiently implemented the MatLab toolbox LTFAT (http://ltfat.sourceforge.net/).