Estimate function f(x) in high-dimensional space

I’m working on a problem of estimating a function $y=f(x): \mathbb{R}^d \rightarrow \mathbb{R}$. Namely, I have an unknown function $f(x)$ (like a black box), what I can do is to input $x^{(i)}$ to it, and obtain $y^{(i)}$ ($i=1,2,\cdots, N$). Then I get a dataset $(x^{(i)}, y^{(i)})$ and am able to fit a function on it.

The simplest/dumbest way is to use a uniform grid of $x$, namely sampling $m$ points on each dimension ($x_i, i=1,2,\cdots, d$) and form a grid with $m^d$ points. The number of samples would explode with high dimension (very large $d$).

A better choice might be using Latin Hypercube or some low discrepancy sequences.

I’m wondering are there any literature with rigorous analysis on using those sampling methods to estimate functions? How can I use as less samples as possible (i.e. small $N$) to get a good estimate of $f(x)$.

For simplicity, we could assume $f(x)$ is infinitely differentiable ($f \in C^\infty$).

Presumably the question is not to estimate $f(x)$ per se, but $\bar f(x)$ over the function’s domain.

The OP is correct that in many instances, especially those involving high dimensions, using a low discrepancy sequencing method to sample the points can be more efficient that using uniform grids, jittered grids or latin (hyper-) squares.

Thus, for high dimensions there are really only two efficient estimation methods:

• Monte Carlo integration based on uniform random sampling, and
• Quasirandom Monte Carlo Integration, based on a low discrepancy sequences.

In theory, the convergence of standard Monte Carlo integration is $O(1/\sqrt{N})$, which is independent of the dimension. The fact that it is independent of $d$ is quite remarkable and suggests that as $d$ gets very large it is almost impossible to beat simple random sampling!

Whereas, the convergence rate for quasi Monte Carlo methods is $O((\log N)^d/N)$.

By simple inspection it seems that for fixed and small values of $d$, quasirandom sampling is far superior as it converges $O(1/N)$ rather than as $O(1/\sqrt{N})$ for standard Monte Carlo.

However, in practice life is not so simple and the crossover of when simple Monte Carlo integration becomes superior to QMC is not obvious at all. This is for two reasons.

Firstly, because the big O notation describes that convergence as $n \rightarrow \infty$ and does not indicate the constants of proportionality. Thus, even for a very high dimensional problem, and a fixed/finite value of $n$, quasirandom sampling may still be optimal.

The second reason is even more subtle and the reason why quasirandom sampling has had a revivial in the last few decade, and is now used in machine learning more than one might expect. It has been found that it is not the explicit number dimensions that are the key, but rather the intrinsic dimensional of the problem.

(Apparently, the revival of quasirandom monte carlo methods began when an analyst in the financial industry tried to apply to calculate a 300-dimensional integral associated with the value of options. It turned out that quasirandom sampling was far superior to standard MC methods, and they interpreted this to mean that the problem could be described by a lower dimensional manifold embedded in a very high dimension.)

One of the seminal papers, which is very readable and considers all the above-mentioned options for numerical integration is Owen’s “Quasi-Monte Carlo Sampling“. Another nice one is by L’Ecuyer. Other academic papers can be found here.

Furthermore, I have a blog post, “The unreasonable effectiveness of quasirandom sequences” which includes multiple sections that compare various approaches to estimating functions in high dimensions.

Below is an comparison of different low discrepancy methods for an 8-dimensional integral, where you can clearly see the $O(1/N)$ convergence for all methods.