Word embedding

In natural language processing (NLP), a word embedding is a representation of a word, used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word so that words closer together in the vector space (for example, as measured by cosine similarity) are expected to be similar in meaning.^[1] Word embeddings can be obtained with language modeling and feature learning techniques, in which words or phrases from the vocabulary are mapped to vectors of real numbers.
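To make "closer in the vector space" concrete, here is a minimal Python sketch that measures closeness with cosine similarity, one common choice (Euclidean distance is another). The four 3-dimensional vectors are hand-picked for illustration, not learned from text, and `nearest_neighbor` is a hypothetical helper.

```python
import math

# Hypothetical hand-picked vectors for illustration only; real embeddings are
# learned from large corpora and usually have tens to hundreds of dimensions.
EMBEDDINGS = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.8, 0.9, 0.2],
    "car":   [0.1, 0.2, 0.9],
    "truck": [0.2, 0.1, 0.8],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor(word, embeddings):
    """Return the other vocabulary word whose vector is most similar to `word`'s."""
    query = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(query, embeddings[w]))


if __name__ == "__main__":
    print(nearest_neighbor("cat", EMBEDDINGS))  # → dog
```

With these toy values "cat" lands next to "dog" and "car" next to "truck", which is the property the definition above asks of an embedding.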

Methods for generating this mapping include neural networks,^[2] dimensionality reduction on the word co-occurrence matrix,^[3]^[4]^[5] probabilistic models,^[6] an explainable knowledge-base method,^[7] and explicit representation in terms of the contexts in which words appear.^[8]
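Several of these families start from the same object, a word co-occurrence matrix. The Python sketch below builds one from a toy corpus and reweights it with positive pointwise mutual information (PPMI), a common weighting that is assumed here rather than taken from any one cited work. Each row then serves as an explicit, sparse context vector; dimensionality-reduction methods would go on to factorize this matrix (for example with SVD), a step the sketch omits. The function names and the corpus are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def ppmi_vectors(counts):
    """Reweight raw counts with positive PMI; return sparse vectors {word: {context: weight}}."""
    total = sum(counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    vectors = {}
    for (word, context), n in counts.items():
        # PMI = log(P(w, c) / (P(w) * P(c))), with probabilities estimated from counts.
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        if pmi > 0:  # PPMI keeps only positive associations
            vectors.setdefault(word, {})[context] = pmi
    return vectors


def sparse_cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(context, 0.0) for context, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    corpus = [s.split() for s in ["a cat eats food", "a dog eats food", "a car needs fuel"]]
    vectors = ppmi_vectors(cooccurrence_counts(corpus, window=1))
    # "cat" and "dog" share the contexts "a" and "eats", so their vectors coincide.
    print(sparse_cosine(vectors["cat"], vectors["dog"]))
    print(sparse_cosine(vectors["cat"], vectors["car"]))
```

Words that appear in the same contexts ("cat" and "dog") end up with similar rows, while "car" shares only the context "a" with them.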

Word and phrase embeddings, when used as the underlying input representation, have been shown to improve performance on NLP tasks such as syntactic parsing^[9] and sentiment analysis.^[10]
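The cited parsing and sentiment work uses recursive neural models; the Python sketch below swaps in a much simpler technique to show what "input representation" means in practice: each token is looked up in an embedding table, the vectors are averaged into a fixed-size feature vector, and that vector feeds a linear classifier. The 2-dimensional vectors, the classifier weights, whitespace tokenization, and the function names are all hypothetical.

```python
# Hypothetical 2-d embeddings: dimension 0 loosely tracks "positive" usage,
# dimension 1 "negative" usage. Real vectors would be learned, not hand-set.
SENTIMENT_EMBEDDINGS = {
    "great":  [0.9, 0.1],
    "good":   [0.7, 0.2],
    "boring": [0.1, 0.8],
    "awful":  [0.0, 1.0],
    "movie":  [0.5, 0.5],
}

# Hypothetical weights of an already-trained linear classifier.
WEIGHTS = [1.0, -1.0]
BIAS = 0.0


def sentence_features(tokens, embeddings, dim=2):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def predict_sentiment(sentence):
    """Score the averaged embedding with the linear classifier."""
    features = sentence_features(sentence.lower().split(), SENTIMENT_EMBEDDINGS)
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    print(predict_sentiment("A great movie"))          # → positive
    print(predict_sentiment("an awful boring movie"))  # → negative
```

Averaging discards word order, which is one reason the cited work composes vectors along a parse tree instead.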

References

[1] Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0-13-095069-7.
[2] Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
[3] Lebret, Rémi; Collobert, Ronan (2013). "Word Emdeddings through Hellinger PCA". Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014). arXiv:1312.5542.
[4] Levy, Omer; Goldberg, Yoav (2014). Neural Word Embedding as Implicit Matrix Factorization (PDF). NIPS.
[5] Li, Yitan; Xu, Linli (2015). Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective (PDF). International Joint Conference on Artificial Intelligence (IJCAI).
[6] Globerson, Amir (2007). "Euclidean Embedding of Co-occurrence Data" (PDF). Journal of Machine Learning Research.
[7] Qureshi, M. Atif; Greene, Derek (2018-06-04). "EVE: explainable vector based embedding technique using Wikipedia". Journal of Intelligent Information Systems. 53: 137–165. ISSN 0925-9902. S2CID 10656055. arXiv:1702.06891. doi:10.1007/s10844-018-0511-x.
[8] Levy, Omer; Goldberg, Yoav (2014). Linguistic Regularities in Sparse and Explicit Word Representations (PDF). CoNLL. pp. 171–180.
[9] Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). Parsing with Compositional Vector Grammars (PDF). Proc. ACL Conf. Archived from the original (PDF) on 2016-08-11. Retrieved 2014-08-14.
[10] Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (PDF). EMNLP.
