This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.StatisticsWikipedia:WikiProject StatisticsTemplate:WikiProject StatisticsStatistics articles
In my experience the terms "data matrix" and "design matrix" are used interchangeably, and the article linked does not seem to state any real differentiation between the two concepts. The definitions seem to be the same. — Preceding unsigned comment added by Denziloe (talk • contribs) 22:58, 26 April 2016 (UTC)[reply]
"Data matrix" does not seem to be an established term, but a simple composition of "data" and "matrix". References on the page do not show that "data matrix" is a separate concept. Merging "Data Matrix" and "Design Matrix" seems to be a reasonable proposition.Akaravaev (talk) 19:12, 27 June 2017 (UTC)[reply]
Agreed and DoneKlbrain (talk) 13:58, 1 February 2018 (UTC)[reply]
@Klbrain: The point surely is that this article presents the design matrix, which is the matrix X in the equation
y ≈ Xb
where a vector b of parameters is estimated from a vector y of observations of a single observable. This is the equation presented in this article.
But in multivariate statistics, when there are many observables, that gives a matrix of observations Y, that are used to estimate a matrix of regression coefficients B,
Y ≈ XB
Here X is the design matrix. It encodes how the parameters or regression coefficients B are modelled to affect the observations Y.
Y is not the design matrix. It is the matrix of actual data observed, hence its name the data matrix. It is important to keep clear the distinction between the two, underlined by their different names.
It's hard to follow why you think this article is sufficiently presenting the data matrix, when it doesn't cover multivariate regression at all. Jheald (talk) 22:04, 1 February 2018 (UTC)[reply]
If you look at the unopposed comments from Denziloe and Akaravaev (from 2016 and 2017), you can see that the key argument is that there is no evidence presented for the data matrix as a distinct and notable term. I accept that your use does show a distinction, but can you point to reliable sources which demonstrate that use? Adding those to the article would be helpful. Given the potential for confusion, perhaps the differences between data matrix and design matrix could be discussed on this page. Klbrain (talk) 22:17, 1 February 2018 (UTC)[reply]
Examples
Unless I’m missing something, the first example - calculating the arithmetic mean - states the wrong design matrix. It says that the design matrix needed would be a single column on ones - which cannot be true because that’d mean that the original data plays no part in the calculation!Instead wouldn’t the correct design matrix be a column vector containing x_0, x_2 … x_n, or the same divided by ‘n’ (depending upon what beta is desired to be). BrianOfRugby (talk) 11:16, 7 March 2023 (UTC)[reply]