Chi-squared test

A chi-squared test, also written as the $\chi^2$ test, is a statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, "chi-squared test" is often used as shorthand for Pearson's chi-squared test.^[2] The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

In the standard applications of this test, the observations are classified into mutually exclusive classes, and there is some theory, or null hypothesis, that gives the probability that any observation falls into the corresponding class. The purpose of the test is to evaluate how likely the observations would be, assuming the null hypothesis is true.

Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data,^[3]^[4]^[5]^[6] which is valid in many cases because of the central limit theorem. A chi-squared test can be used to attempt to reject the null hypothesis that the data are independent.
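As one illustration of a chi-squared test built from the sample variance, the sketch below tests a hypothesized variance for normally distributed data. It relies on the standard result that $(n-1)s^2/\sigma_0^2$ follows a chi-squared distribution with $n-1$ degrees of freedom when the data are independent and normal with variance $\sigma_0^2$. The measurements, the hypothesized variance `sigma0_sq`, and the 5% critical value (copied from a standard table) are assumptions made for the example, not values from the sources cited here.

```python
from statistics import variance

# Hypothetical measurements, assumed independent and normally distributed.
measurements = [9.8, 10.2, 10.4, 9.6, 10.0, 10.3, 9.7, 10.0]

# Hypothesized variance under the null hypothesis (an assumption for this example).
sigma0_sq = 0.04

n = len(measurements)
sample_var = variance(measurements)  # unbiased sample variance s^2

# Under the null hypothesis this follows a chi-squared distribution
# with n - 1 degrees of freedom.
statistic = (n - 1) * sample_var / sigma0_sq

# Upper 5% critical value for 7 degrees of freedom, from a standard table.
CRITICAL_5_PERCENT_DF7 = 14.067

reject_null = statistic > CRITICAL_5_PERCENT_DF7
print(f"statistic = {statistic:.3f}, df = {n - 1}, reject at 5%: {reject_null}")
```

The comparison is one-sided: it only detects a variance larger than the hypothesized one.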

History

In the 19th century, statistical analytical methods were applied mainly to the analysis of biological data, and it was customary for researchers to assume that observations followed a normal distribution, as did Sir George Airy and Professor Merriman, whose works were criticized by Karl Pearson in his 1900 paper.^[7]

By the end of the 19th century, Pearson had noticed significant skewness in some biological observations. In order to model the observations regardless of whether they were normal or skewed, Pearson, in a series of articles published from 1893 to 1916,^[8]^[9]^[10]^[11] devised the Pearson distribution, a family of continuous probability distributions that includes the normal distribution and many skewed distributions. He proposed a method of statistical analysis that consists of using the Pearson distribution to model the observations and performing a goodness-of-fit test to determine how well the model and the observations actually fit.

Pearson's chi-squared test

In 1900, Pearson published a paper^[7] on the $\chi^2$ test, which is considered one of the foundations of modern statistics.^[12] In this paper, Pearson investigated a test of goodness of fit.

Suppose that $n$ observations in a random sample from a population are classified into $k$ mutually exclusive classes with respective observed counts $x_i$ (for $i = 1, 2, \dots, k$), and that a null hypothesis gives the probability $p_i$ that an observation falls into the $i$-th class. The expected counts $m_i = n p_i$ are then available for every $i$, where

$$\begin{aligned}\sum_{i=1}^{k} p_i &= 1 \\[8pt] \sum_{i=1}^{k} m_i &= n \sum_{i=1}^{k} p_i = \sum_{i=1}^{k} x_i\end{aligned}$$

Pearson proposed that, if the null hypothesis is true, then as $n \to \infty$ the limiting distribution of the quantity given below is the $\chi^2$ distribution.

$$X^2 = \sum_{i=1}^{k} \frac{(x_i - m_i)^2}{m_i} = \sum_{i=1}^{k} \frac{x_i^2}{m_i} - n$$
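The statistic can be computed directly from observed counts and null-hypothesis probabilities. The following minimal sketch evaluates both forms of the formula and checks that they agree; the function names and the die-roll counts are hypothetical, chosen only for illustration, and the critical value comes from a standard chi-squared table.

```python
def pearson_statistic(observed, probabilities):
    """Sum over classes of (x_i - m_i)^2 / m_i, with m_i = n * p_i."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))


def pearson_statistic_shortcut(observed, probabilities):
    """Equivalent form: sum over classes of x_i^2 / m_i, minus n."""
    n = sum(observed)
    return sum(x * x / (n * p) for x, p in zip(observed, probabilities)) - n


# Hypothetical data: 60 rolls of a die; the null hypothesis is a fair die.
observed = [5, 8, 9, 8, 10, 20]
probabilities = [1 / 6] * 6

x2 = pearson_statistic(observed, probabilities)
x2_shortcut = pearson_statistic_shortcut(observed, probabilities)

# Upper 5% critical value for k - 1 = 5 degrees of freedom, from a standard table.
CRITICAL_5_PERCENT_DF5 = 11.070

print(f"X^2 = {x2:.2f} (shortcut form: {x2_shortcut:.2f})")
print("reject fairness at 5%:", x2 > CRITICAL_5_PERCENT_DF5)
```

With every expected count equal to 10, the squared deviations sum to 134, so the statistic is 13.4, which exceeds the tabulated critical value.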

Pearson dealt first with the case in which the expected counts $m_i$ are large enough known numbers in all cells, assuming that every $x_i$ may be taken as normally distributed, and reached the result that, in the limit as $n$ becomes large, $X^2$ follows the $\chi^2$ distribution with $k - 1$ degrees of freedom.
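The limiting result can be checked informally by simulation. The sketch below, with hypothetical class probabilities and sample sizes, draws repeated samples under the null hypothesis and compares the simulated statistics with two properties of a chi-squared distribution with $k - 1$ degrees of freedom: its mean, $k - 1$, and its tabulated upper 5% point. It is a plausibility check under these assumptions, not a proof.

```python
import random


def pearson_statistic(observed, probabilities):
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probabilities))


def simulate_statistics(probabilities, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n under the null and return their statistics."""
    rng = random.Random(seed)
    k = len(probabilities)
    results = []
    for _ in range(repetitions):
        draws = rng.choices(range(k), weights=probabilities, k=n)
        observed = [0] * k
        for cls in draws:
            observed[cls] += 1
        results.append(pearson_statistic(observed, probabilities))
    return results


# Hypothetical null hypothesis with k = 4 classes.
probabilities = [0.1, 0.2, 0.3, 0.4]
stats = simulate_statistics(probabilities, n=200, repetitions=2000)

mean_stat = sum(stats) / len(stats)

# Upper 5% critical value for k - 1 = 3 degrees of freedom, from a standard table.
CRITICAL_5_PERCENT_DF3 = 7.815
tail_fraction = sum(s > CRITICAL_5_PERCENT_DF3 for s in stats) / len(stats)

print(f"mean of simulated statistics: {mean_stat:.2f} (theory: 3)")
print(f"fraction above 5% critical value: {tail_fraction:.3f} (theory: about 0.05)")
```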

Pearson next considered the case in which the expected counts depend on parameters that have to be estimated from the sample. He suggested that, with $m_i$ denoting the true expected counts and $m'_i$ the estimated expected counts, the difference

$$X^2 - {X'}^2 = \sum_{i=1}^{k} \frac{x_i^2}{m_i} - \sum_{i=1}^{k} \frac{x_i^2}{m'_i}$$

will usually be positive and small enough to be neglected. In conclusion, Pearson argued that if $X'^2$ is also regarded as distributed as a $\chi^2$ distribution with $k - 1$ degrees of freedom, the error in this approximation would not affect practical decisions. This conclusion caused some controversy in practical applications and was not settled for 20 years, until Fisher's 1922 and 1924 papers.^[13]^[14]
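The practical difference between the two viewpoints is the number of degrees of freedom. The sketch below assumes the usual textbook statement of Fisher's correction, which is not spelled out in this article: one degree of freedom is subtracted for each parameter estimated from the sample, so the reference distribution has $k - 1 - s$ degrees of freedom for $s$ estimated parameters. The tallies and the choice of a Poisson model are hypothetical, and the rule is only approximate here because the mean is estimated from the raw counts rather than from the grouped classes.

```python
from math import exp, factorial

# Hypothetical tallies: number of intervals (out of 100) containing 0, 1, 2, 3, 4 events.
tallies = {0: 35, 1: 40, 2: 18, 3: 5, 4: 2}
n = sum(tallies.values())

# One parameter, the Poisson mean, is estimated from the sample itself.
lam = sum(value * count for value, count in tallies.items()) / n


def poisson_pmf(j, lam):
    return exp(-lam) * lam ** j / factorial(j)


# Classes 0, 1, 2 and "3 or more", so that no expected count is very small.
observed = [tallies[0], tallies[1], tallies[2], tallies[3] + tallies[4]]
probs = [poisson_pmf(j, lam) for j in range(3)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]  # estimated expected counts m'_i

x2_estimated = sum((x - m) ** 2 / m for x, m in zip(observed, expected))

k = len(observed)
estimated_parameters = 1
dof_pearson = k - 1                        # Pearson's original suggestion
dof_fisher = k - 1 - estimated_parameters  # assumed textbook form of Fisher's correction

print(f"estimated mean = {lam:.2f}, X'^2 = {x2_estimated:.3f}")
print(f"degrees of freedom: {dof_pearson} (Pearson) vs {dof_fisher} (Fisher)")
```

Using fewer degrees of freedom lowers the critical value, so the corrected test rejects more readily than Pearson's original proposal would.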

Applications

In cryptanalysis, the chi-squared test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext. The lowest value of the test means that the decryption was successful with high probability.^[15]^[16] This method can be generalized for solving modern cryptographic problems.^[17]
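A minimal sketch of this idea for a Caesar (shift) cipher follows: every candidate key is tried, and the candidate whose letter counts give the lowest chi-squared statistic against typical English letter frequencies is kept. The frequency table holds approximate, commonly quoted percentages, and the function names and the sample message are hypothetical; none of them come from the cited sources.

```python
import string

# Approximate English letter frequencies in percent (commonly quoted values).
ENGLISH_FREQ = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702, "F": 2.228,
    "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153, "K": 0.772, "L": 4.025,
    "M": 2.406, "N": 6.749, "O": 7.507, "P": 1.929, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150,
    "Y": 1.974, "Z": 0.074,
}


def shift_text(text, key):
    """Shift each uppercase letter by `key` positions; leave other characters alone."""
    shifted = []
    for ch in text:
        if ch in string.ascii_uppercase:
            shifted.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            shifted.append(ch)
    return "".join(shifted)


def chi_squared_vs_english(text):
    """Pearson statistic of the text's letter counts against English frequencies."""
    letters = [ch for ch in text if ch in string.ascii_uppercase]
    n = len(letters)
    total = sum(ENGLISH_FREQ.values())
    score = 0.0
    for letter, freq in ENGLISH_FREQ.items():
        expected = n * freq / total
        score += (letters.count(letter) - expected) ** 2 / expected
    return score


def crack_caesar(ciphertext):
    """Return the key and decryption with the lowest chi-squared statistic."""
    candidates = [
        (chi_squared_vs_english(shift_text(ciphertext, -key)), key)
        for key in range(26)
    ]
    _, best_key = min(candidates)
    return best_key, shift_text(ciphertext, -best_key)


plaintext = "DEFEND THE EAST WALL OF THE CASTLE AND WAIT FOR REINFORCEMENTS AT DAWN"
ciphertext = shift_text(plaintext, 3)
key, recovered = crack_caesar(ciphertext)
print(key, recovered)
```

The statistic is used here only to rank candidates, not as a formal test; very short messages may not contain enough letters for the ranking to be reliable.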

In bioinformatics, the chi-squared test is used to compare the distribution of certain properties of genes (e.g., genomic content, mutation rate, interaction network clustering, etc.) belonging to different categories (e.g., disease genes, essential genes, genes on a certain chromosome, etc.).^[18]^[19]
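Comparisons of this kind are commonly set up as a contingency table, with gene categories in the rows and the presence or absence of a property in the columns. The sketch below computes the Pearson statistic for such a table, taking expected counts from the row and column totals and $(r - 1)(c - 1)$ degrees of freedom. The counts are invented for illustration and do not come from the cited studies.

```python
def chi_squared_independence(table):
    """Pearson statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications are independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = disease genes, other genes;
# columns = gene has the property, gene lacks the property.
table = [
    [30, 70],
    [10, 90],
]

statistic, dof = chi_squared_independence(table)

# Upper 5% critical value for 1 degree of freedom, from a standard table.
CRITICAL_5_PERCENT_DF1 = 3.841

print(f"X^2 = {statistic:.2f}, df = {dof}")
print("distributions differ at 5%:", statistic > CRITICAL_5_PERCENT_DF1)
```

For these counts the expected values are 20 and 80 in each row, giving a statistic of 12.5 on 1 degree of freedom.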

References

[1] M.A. Sanders. "Characteristic function of the central chi-square distribution" (PDF). Archived from the original (PDF) on 15 July 2011. Retrieved 6 March 2009.
[2] Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling" (PDF). Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
[3] Abramowitz, Milton; Stegun, Irene Ann, eds. (1983) [June 1964]. "Chapter 26". Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series. 55 (Ninth reprint with additional corrections of tenth original printing with corrections (December 1972); first ed.). Washington D.C.; New York: United States Department of Commerce, National Bureau of Standards; Dover Publications. p. 940. ISBN 978-0-486-61272-0. LCCN 64-60036. MR 0167642. LCCN 65-12253.
[4] NIST (2006). Engineering Statistics Handbook – Chi-Squared Distribution.
[5] Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). "Chi-Square Distributions including Chi and Rayleigh". Continuous Univariate Distributions. 1 (Second ed.). John Wiley and Sons. pp. 415–493. ISBN 978-0-471-58495-7.
[6] Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (Third ed.). McGraw-Hill. pp. 241–246. ISBN 978-0-07-042864-5.
[7] Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling" (PDF). Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
[8] Pearson, Karl (1893). "Contributions to the mathematical theory of evolution [abstract]". Proceedings of the Royal Society. 54: 329–333. JSTOR 115538. doi:10.1098/rspl.1893.0079.
[9] Pearson, Karl (1895). "Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material". Philosophical Transactions of the Royal Society. 186: 343–414. Bibcode:1895RSPTA.186..343P. JSTOR 90649. doi:10.1098/rsta.1895.0010.
[10] Pearson, Karl (1901). "Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society A. 197 (287–299): 443–459. Bibcode:1901RSPTA.197..443P. JSTOR 90841. doi:10.1098/rsta.1901.0023.
[11] Pearson, Karl (1916). "Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society A. 216 (538–548): 429–457. Bibcode:1916RSPTA.216..429P. JSTOR 91092. doi:10.1098/rsta.1916.0009.
[12] Cochran, William G. (1952). "The Chi-square Test of Goodness of Fit". The Annals of Mathematical Statistics. 23 (3): 315–345. JSTOR 2236678. doi:10.1214/aoms/1177729380.
[13] Fisher, Ronald A. (1922). "On the Interpretation of chi-squared from Contingency Tables, and the Calculation of P". Journal of the Royal Statistical Society. 85 (1): 87–94. JSTOR 2340521. doi:10.2307/2340521.
[14] Fisher, Ronald A. (1924). "The Conditions Under Which chi-squared Measures the Discrepancy Between Observation and Hypothesis". Journal of the Royal Statistical Society. 87 (3): 442–450. JSTOR 2341149.
[15] "Chi-squared Statistic". Practical Cryptography. Archived from the original on 18 February 2015. Retrieved 18 February 2015.
[16] "Using Chi Squared to Crack Codes". IB Maths Resources. British International School Phuket.
[17] Ryabko, B. Ya.; Stognienko, V. S.; Shokin, Yu. I. (2004). "A new test for randomness and its application to some cryptographic problems" (PDF). Journal of Statistical Planning and Inference. 123 (2): 365–376. doi:10.1016/s0378-3758(03)00149-6. Retrieved 18 February 2015.
[18] Feldman, I.; Rzhetsky, A.; Vitkup, D. (2008). "Network properties of genes harboring inherited disease mutations". PNAS. 105 (11): 4323–432. Bibcode:2008PNAS..105.4323F. PMC 2393821. doi:10.1073/pnas.0701722105.
[19] "chi-square-tests" (PDF). Archived from the original (PDF) on 29 June 2018. Retrieved 29 June 2018.

Further reading

Weisstein, Eric W. "Chi-Squared Test". MathWorld.
Corder, G. W.; Foreman, D. I. (2014), Nonparametric Statistics: A Step-by-Step Approach, New York: Wiley, ISBN 978-1118840313
Greenwood, Cindy; Nikulin, M. S. (1996), A guide to chi-squared testing, New York: Wiley, ISBN 0-471-55779-X
Nikulin, M. S. (1973), "Chi-squared test for normality", Proceedings of the International Vilnius Conference on Probability Theory and Mathematical Statistics, 2, pp. 119–122
Bagdonavicius, V.; Nikulin, M. S. (2011), "Chi-squared goodness-of-fit test for right censored data" (PDF), The International Journal of Applied Mathematics and Statistics, pp. 30–50
Hald, Anders (1998). A history of mathematical statistics from 1750 to 1930. New York: Wiley. ISBN 978-0-471-17912-2.
Elderton, William Palin (1902). "Tables for Testing the Goodness of Fit of Theory to Observation". Biometrika. 1 (2): 155–163. doi:10.1093/biomet/1.2.155.
Hazewinkel, Michiel, ed. (2001). "Chi-squared distribution". Encyclopaedia of Mathematics. Springer. ISBN 978-1556080104.
Ramsey, PH (1988). "Evaluating the Normal Approximation to the Binomial Test". Journal of Educational Statistics. 13 (2): 173–82. JSTOR 1164752. doi:10.2307/1164752.
Lancaster, H.O. (1969), The Chi-squared Distribution, Wiley
Dasgupta, Sanjoy D. A.; Gupta, Anupam K. (January 2003). "An Elementary Proof of a Theorem of Johnson and Lindenstrauss" (PDF). Random Structures and Algorithms. 22 (1): 60–65. doi:10.1002/rsa.10073. Retrieved 1 May 2012.
M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1
Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 118. ISBN 978-0471093152.
Bartlett, M. S.; Kendall, D. G. (1946). "The Statistical Analysis of Variance-Heterogeneity and the Logarithmic Transformation". Supplement to the Journal of the Royal Statistical Society. 8 (1): 128–138. JSTOR 2983618. doi:10.2307/2983618.
Pillai, Natesh S. (2016). "An unexpected encounter with Cauchy and Lévy". Annals of Statistics. 44 (5): 2089–2097. arXiv:1505.01957. doi:10.1214/15-aos1407.

External links

Chi-squared test, archived at the Wayback Machine (2 August 2019)
