I know what “surface area” means for:

- a 2d shape
- a cylinder or cone

but I don’t know what it actually means for a sphere.
For a 2d shape
Suppose I’m given a 2d shape, such as a rectangle, a triangle, or a drawing of a puddle. I can cut out a 1 cm by 1 cm piece of paper and trace it repeatedly over the shape. Many full 1 cm squares will be traced inside the shape, and there will likely be many partial squares along its edges. Suppose I accept that the partial squares can be “combined” into full squares. Then I count the total number of full squares to find the surface area.
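The counting procedure can be sketched in Python. As an assumption not in the question, a partial square is counted as one whole square when its centre lies inside the shape, which is one simple way to “combine” the edge pieces. The shape is described by a hypothetical `inside(x, y)` test, and the puddle is modelled as a disk of radius 5 cm.

```python
import math

def grid_area(inside, xmin, xmax, ymin, ymax, side):
    """Estimate an area by tracing squares of the given side over a bounding box.

    A square is counted when its centre lies inside the shape; this is one
    simple stand-in for "combining" the partial squares along the edge.
    """
    nx = math.ceil((xmax - xmin) / side)
    ny = math.ceil((ymax - ymin) / side)
    count = 0
    for i in range(nx):
        for j in range(ny):
            # Centre of the square in column i, row j.
            cx = xmin + (i + 0.5) * side
            cy = ymin + (j + 0.5) * side
            if inside(cx, cy):
                count += 1
    # Each counted square contributes side * side to the area.
    return count * side * side

def in_disk(x, y):
    # Hypothetical "puddle": a disk of radius 5 cm centred at the origin.
    return x * x + y * y <= 25.0
```

Shrinking `side` (using smaller squares) generally brings the estimate for the disk closer to its exact area $25\pi\approx 78.54$ cm².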
For a cone or cylinder
I can convert a paper cone into two 2d shapes. The bottom of the cone is a circle. I can then cut the curved (i.e. not the bottom) part of the cone with scissors and unfold it into a flat 2d shape.
Similarly, I can convert a cylinder into flat 2d shapes: two circles and a rectangle.
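As a quick check of the unfolding idea, here is a Python sketch, assuming a right circular cone and cylinder with base radius `r` and height `h` (the function names are hypothetical). The curved part of the cone unrolls into a circular sector whose radius is the slant height and whose arc length is the base circumference; the curved part of the cylinder unrolls into a rectangle.

```python
import math

def cone_surface_area(r, h):
    """Base disk plus the unrolled curved part of a right circular cone."""
    slant = math.hypot(r, h)             # radius of the unrolled sector
    arc = 2 * math.pi * r                # sector's arc length = base circumference
    sector_angle = arc / slant           # sector angle in radians
    sector_area = 0.5 * slant ** 2 * sector_angle
    base_area = math.pi * r ** 2
    return base_area + sector_area

def cylinder_surface_area(r, h):
    """Two disks plus the unrolled rectangle of a right circular cylinder."""
    rectangle = (2 * math.pi * r) * h    # width = circumference, height = h
    disks = 2 * math.pi * r ** 2
    return disks + rectangle
```

For example, a cone with `r = 3` and `h = 4` has slant height 5, so its total area is $9\pi + 15\pi = 24\pi$.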
For a sphere
But the above methods for understanding surface area don’t work for a sphere. I can’t lay a 1 cm by 1 cm piece of paper onto a sphere in a flat way. I can’t even trace a square centimetre onto the sphere using that piece of paper!
People might say, “suppose you have an orange, and you peel the orange. Then you can lay the peel flat onto the table, into a flat 2d shape”. But they’re lying! The orange peel can never be mashed down perfectly flat onto the table!
So, I don’t know what “surface area of a sphere” even means, if you cannot measure it using flat square pieces of paper!
What does “surface area of a sphere” even mean?
Answer
This is actually an interesting question. It comes down to how to define “area” on a curved surface. The examples you have provided are developable surfaces: after a few cuts they can be flattened onto a plane without stretching, and then you can compute the flattened area. You can never do this with a sphere, because no matter how small a patch of a sphere is, it can never be flattened onto a plane without stretching. The idea is to break the sphere into small patches, each flat enough that you can compute its area as if it were flat, and then add up the areas of the patches.
Mathematically, suppose $S$ is a sphere. The above procedure can be stated as follows:

1. Break up $S$ into patches $P_1,\dots,P_n$, where each $P_i$ is a patch that is flat enough, and $n$ is the number of patches you have.

2. Compute $\operatorname{Area}(P_i)$ as if each $P_i$ were flat. As suggested by levap, one way to do this is to project each patch onto one of its tangent planes. Note that I am not saying this is the only way to approximate a patch, nor that every method that seems correct at first glance really is correct; see Update 2 for an example. There is also discussion about this in the comments.

3. Use $\operatorname{Area}(P_1)+\dots+\operatorname{Area}(P_n)$ as an approximation of the area of $S$.

If the patches are small enough, then the approximation should be a good one. But if you want better precision, use smaller patches and do the above again.
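The three steps, together with the refinement, can be sketched in Python. This is a minimal sketch under assumptions that are not in the answer: the sphere is cut along latitude and longitude lines into `n` bands and `2n` slices, and each patch is treated as flat by replacing it with the flat quadrilateral through its four corners (split into two triangles), rather than by projecting onto a tangent plane. The function names are hypothetical.

```python
import math

def sphere_point(r, theta, phi):
    # theta: polar angle measured from the north pole; phi: azimuth.
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def triangle_area(a, b, c):
    # Half the length of the cross product of two edge vectors.
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def approx_sphere_area(r, n):
    """Sum of flat patch areas for a sphere of radius r cut into n bands."""
    m = 2 * n  # longitude slices, kept proportional to n
    total = 0.0
    for i in range(n):
        t0, t1 = math.pi * i / n, math.pi * (i + 1) / n
        for j in range(m):
            p0, p1 = 2 * math.pi * j / m, 2 * math.pi * (j + 1) / m
            # Step 1: the patch between these two latitudes and longitudes.
            a = sphere_point(r, t0, p0)
            b = sphere_point(r, t0, p1)
            c = sphere_point(r, t1, p1)
            d = sphere_point(r, t1, p0)
            # Step 2: treat the patch as flat (two triangles through its corners).
            # Step 3: add its flat area to the running total.
            total += triangle_area(a, b, c) + triangle_area(a, c, d)
    return total
```

Calling `approx_sphere_area(1.0, n)` for `n = 10, 20, 40, ...` gives values that increase toward $4\pi\approx 12.566$. Refining both directions together, as here, is one way to stay clear of the problem described in Update 2.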

To make the math precise (I can’t guarantee that a third-grade student can understand this): as you take smaller and smaller patches, the value of the approximation above should tend to a fixed number, which is the mathematical definition of the area.
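For readers who know calculus, here is a sketch of how that limit works out for a sphere of radius $r$, assuming we cut along latitude and longitude lines (polar angle $\theta$, azimuth $\varphi$). A small patch spanning $\Delta\theta$ by $\Delta\varphi$ is nearly a flat rectangle with sides $r\,\Delta\theta$ (along a meridian) and $r\sin\theta\,\Delta\varphi$ (along a circle of latitude), so

$$\operatorname{Area}(S)=\lim_{\Delta\theta,\Delta\varphi\to0}\sum_i r^2\sin\theta_i\,\Delta\theta\,\Delta\varphi=\int_0^{2\pi}\!\!\int_0^{\pi} r^2\sin\theta\,d\theta\,d\varphi=2\pi r^2\,\big[-\cos\theta\big]_0^{\pi}=4\pi r^2.$$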
P.S. For a visualization of this approximation, you can search online for sphere parametrization, or simply think of a football (soccer ball).
Update 1: Thanks to Leander, we have a visualization:
One might notice that this visualization is slightly different from cutting up a sphere: it takes sample points on the sphere and attaches triangles to them. I want to remark that there is no essential difference between this and my method. The idea is the same: approximation.
Update 2: A comment (by Tanner Swett) mentions that the method of using a polygon mesh may be flawed. Indeed, the example of the Schwarz lantern shows that a pathological choice of polygon mesh may produce a limit different from the surface area. The following explanation should be helpful:
As I mentioned in step 2 above, if we are not careful with how we approximate the areas of the patches, the approximation may not work. The Schwarz lantern is an example where a deliberately chosen set of approximating triangles leads to the following result: if $T$ is a triangle used to approximate a patch $P$, it is possible that ${\rm Area}(T)/{\rm Area}(P)\not\to1$. To illustrate this, consider a single triangle on the Schwarz lantern:
We assume the cylinder has total height $1$ and radius $1$. We take $n+1$ slices perpendicular to the axis, and $m$ points on each slice. The area enclosed by the red curves is a patch on the cylinder, and the triangle enclosed by the blue dashed lines is the one used to approximate the patch. Let $P$ and $T$ denote the patch and the triangle respectively. The ratio of the bottom edges of $P$ and $T$ tends to $1$ as $m\to\infty$. What really makes a difference is the ratio of their heights. Suppose the height of $P$ along the vertical direction is
$$h=1/n$$
Then the height of the triangle is
$$h_T=\sqrt{1/n^2+a^2}$$
By a simple computation, $a=1-\cos(\pi/m)\approx\pi^2/(2m^2)$. Therefore,
$$h_T/h=\sqrt{1+a^2n^2}\approx\sqrt{1+\frac{\pi^4n^2}{4m^4}}$$
If $n/m^2\to c>0$, this ratio tends to $\sqrt{1+\pi^4c^2/4}>1$, and if $n$ grows faster than $m^2$, it is unbounded. Either way, ${\rm Area}(T)/{\rm Area}(P)\not\to1$.
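These numbers can be checked with a short Python sketch that adds up all $2mn$ triangles of the lantern. Assumptions beyond the text: alternate slices are rotated by $\pi/m$ relative to each other (the usual Schwarz lantern construction), so every triangle has base $2\sin(\pi/m)$ and height $h_T=\sqrt{1/n^2+a^2}$; the function name `schwarz_lantern_area` and the optional `radius`/`height` parameters are hypothetical, with defaults matching the unit cylinder above.

```python
import math

def schwarz_lantern_area(n, m, radius=1.0, height=1.0):
    """Total area of the 2*m*n triangles of a Schwarz lantern.

    n: number of horizontal bands (n + 1 circular slices)
    m: number of points on each slice; alternate slices are rotated by pi/m
    """
    base = 2 * radius * math.sin(math.pi / m)     # bottom edge of each triangle
    a = radius * (1 - math.cos(math.pi / m))      # sideways offset a
    h_t = math.sqrt((height / n) ** 2 + a ** 2)   # triangle height h_T
    return 2 * m * n * 0.5 * base * h_t
```

With `n = m` the total approaches the cylinder's lateral area $2\pi$; with `n = m**2` it approaches $2\pi\sqrt{1+\pi^4/4}\approx 31.6$; with `n = m**3` it grows without bound.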
This problem is unlikely to arise in practice: if you actually cut the cylinder into patches, you would use $h$ rather than $h_T$ to estimate each patch’s area. But again, it is hard to make precise which approximations are acceptable without using the language of calculus.
Attribution
Source: Link, Question Author: silph, Answer Author: trisct