It can be done in constant time by precalculating sums of some basic rectangles (extending all the way to the border of the matrix). That precalculation times time O(n) by simple dynamic programming.

Please elaborate, which "basic rectangles"? Are you recursively dividing each rectangle into 4 smaller rectangles? Precalc time for doing that is not O(n)?!?

Compute the sum of the rectangles, for all i,j, bounded by (i,j), (i,m), (n,j), (n,m), where (n,m) is the size of the matrix M. Call that sum s(i,j). You can calculate s(i,j) by dynamic programming: s(i,j) = M(i,j) + s(i+1,j) + s(i,j+1) - s(i+1,j+1). And the sum of any rectangle can be computed from s(i,j).

The answer is already popular in computer vision fields!! It is called integral imaging. See this page http://en.wikipedia.org/wiki/Haar-like_features

Let a[][] be the 2d array, int i=0; for( j = row_start; j <= row_end; j++) for( k = col_start; k <= col_end; k++) i+=a[j][k];

Iterate over matrix as an array storing (new sums array) in each position the cumulative sum up to that point. For each row in the desired submatrix we can compute its sum as a difference between its end and start positions. Repeat for other rows. Add up all the row sums.