Estudo de modelos de word embedding

Sousa, Samanta de

Use this identifier to cite or link to this item: http://repositorio.utfpr.edu.br/jspui/handle/1/12522

Full metadata record

DC field	Value	Language
dc.creator	Sousa, Samanta de
dc.date.accessioned	2020-11-16T13:09:45Z	-
dc.date.available	2020-11-16T13:09:45Z	-
dc.date.issued	2016-11-16
dc.identifier.citation	SOUSA, Samanta de. Estudo de modelos de word embedding. 2016. 53 f. Trabalho de Conclusão de Curso (Graduação) - Universidade Tecnológica Federal do Paraná, Medianeira, 2016.	pt_BR
dc.identifier.uri	http://repositorio.utfpr.edu.br/jspui/handle/1/12522	-
dc.description.abstract	The field of Artificial Intelligence (AI) seeks to build mechanisms that simulate human intelligence so that they can perform tasks that assist people. Natural Language Processing (NLP) is a subfield of AI that seeks to understand and generate natural language, and AI uses it to improve mechanisms that rely on natural language, such as writing and producing text, translation, and learning and teaching, among others. Natural language has an unstructured format that is difficult for a computer to process. Morphological and syntactic variation, as well as ambiguity, hinder comprehension, so methods in the field convert this information into forms that are easier for a computer to manipulate. Among existing information representations, Word Embedding is currently a prominent technique in NLP. It represents words as vectors whose values are similar when the words are similar. In other words, it encodes similarity relations between words, and it has a low computational cost. The goal of this work was therefore to compare three Word Embedding models, CBOW, Skip-gram and GloVe, in order to identify which performs best at generating word representation vectors (embeddings). First, a corpus was built from Wikipedia, and this information was then pre-processed for use as a training set. The models were trained with scripts written using the Python libraries Gensim and Glove. The embeddings were evaluated with the files made available by Pennington et al. (2014), and in each evaluation/test the parameters were varied in order to verify their influence on model performance.
Some specific settings for training the models were identified and reported in the study. The results showed that CBOW was the model with the best performance in most of the tests. The Word Embedding technique was found to encode similarity information between words reasonably well, even with parameter values that are small compared with other works.	pt_BR
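The abstract's central claim is that similar words receive similar vectors. The embeddings were evaluated with files from Pennington et al. (2014). The code below is a minimal sketch of how a question might be scored, assuming a word-analogy test of the form "a is to b as c is to ?" answered by vector offset and cosine similarity (often called 3CosAdd). The record does not describe the exact evaluation procedure. The vectors are hypothetical hand-written values, not output of any trained model.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 4-dimensional vectors, written by hand for illustration only.
# Real CBOW, Skip-gram or GloVe vectors are learned from a corpus and
# usually have tens to hundreds of dimensions.
EMBEDDINGS: Dict[str, Vector] = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.0, 0.1, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: close to 1 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a: str, b: str, c: str, emb: Dict[str, Vector]) -> str:
    """Answer 'a is to b as c is to ?' by vector offset (3CosAdd):
    return the word whose vector is most cosine-similar to b - a + c,
    excluding the three query words themselves."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, emb[w]))


if __name__ == "__main__":
    print(round(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]), 3))
    print(round(cosine(EMBEDDINGS["king"], EMBEDDINGS["apple"]), 3))
    print(analogy("man", "king", "woman", EMBEDDINGS))
```

With these toy values, "king" scores higher against "queen" than against "apple". The analogy "man is to king as woman is to ?" returns `queen`. In a real evaluation, accuracy is the fraction of such questions answered correctly over a whole test file.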
dc.language	por	pt_BR
dc.publisher	Universidade Tecnológica Federal do Paraná	pt_BR
dc.rights	openAccess	pt_BR
dc.subject	Inteligencia Artificial	pt_BR
dc.subject	Processamento de linguagem natural (Computação)	pt_BR
dc.subject	Bibliotecas digitais	pt_BR
dc.subject	Artificial intelligence	pt_BR
dc.subject	Natural language processing (Computer science)	pt_BR
dc.subject	Digital libraries	pt_BR
dc.title	Estudo de modelos de word embedding	pt_BR
dc.title.alternative	Study of word embedding models	pt_BR
dc.type	bachelorThesis	pt_BR
dc.description.resumo	A área de Inteligência Artificial busca construir mecanismos que simulem a inteligência do ser humano de forma que os mesmos executem tarefas que os auxiliem. Tem-se o campo de estudo de Processamento de Língua Natural (PLN), uma subárea de IA que busca compreender e gerar a língua natural; dessa forma, o PLN é utilizado pela IA como um meio para aprimorar os mecanismos que utilizam a língua natural na sua execução, como escrita e produção de um texto, tradução, aprendizagem e ensino, entre outros. A língua segue um formato não estruturado de difícil processamento pelo computador, com variações morfológicas e sintáticas, além da ambiguidade, que dificultam o processo de compreensão; dessa forma, metodologias da área convertem tais informações de forma que a manipulação das mesmas pelo computador seja mais fácil. Dentre as representações de informações existentes, a técnica de Word Embedding está em tendência atualmente no campo de PLN, em que as informações são representadas em vetores cujos valores são semelhantes quando as palavras são similares, ou seja, é uma representação que codifica as relações de similaridade entre as palavras, além de possuir um custo computacional baixo. Dessa forma, o objetivo do trabalho foi realizar um comparativo entre três modelos de Word Embeddings, CBOW, Skip-gram e GloVe, com a finalidade de identificar qual apresenta melhor desempenho na geração dos vetores de representação das palavras (embeddings). Primeiramente foi realizada a construção de um corpus utilizando a Wikipédia; em sequência, foi realizado o pré-processamento dessas informações para serem utilizadas como conjunto de treinamento. Os modelos foram treinados utilizando scripts criados com as bibliotecas do Python Gensim e Glove, e as avaliações dos embeddings foram feitas com os arquivos disponibilizados por Pennington et al.
(2014), em que, em cada avaliação/teste, os parâmetros eram modificados a fim de verificar a sua influência no desempenho dos modelos. Algumas configurações específicas para execução do treinamento dos modelos foram identificadas e relatadas no trabalho. Os resultados obtidos demonstraram que o CBOW foi o modelo que apresentou melhores desempenhos na maioria dos testes. Foi verificado que a técnica de Word Embeddings codifica razoavelmente bem as informações de similaridade entre as palavras, mesmo com os valores dos parâmetros sendo pequenos se comparados com outros trabalhos.	pt_BR
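CBOW and Skip-gram, two of the models compared in this work, are the two word2vec training architectures. They differ mainly in how a sliding context window is turned into prediction examples. The sketch below shows only that step, using a hypothetical toy sentence and window size. It omits the neural network, negative sampling, frequent-word subsampling and the randomized window that real implementations use. In Gensim, one of the libraries named above, the architecture is selected with the `sg` parameter of `Word2Vec` (`sg=0` for CBOW, `sg=1` for Skip-gram). The helper names here are illustrative, not Gensim API.

```python
from typing import List, Tuple


def _window(n: int, i: int, window: int) -> range:
    """Positions within `window` tokens of position i (i itself included)."""
    return range(max(0, i - window), min(n, i + window + 1))


def cbow_pairs(tokens: List[str], window: int) -> List[Tuple[Tuple[str, ...], str]]:
    """CBOW: each example maps the surrounding context words to the centre word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tuple(tokens[j] for j in _window(len(tokens), i, window) if j != i)
        if context:  # a one-word sentence has no context to predict from
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: each example maps the centre word to one of its context words."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in _window(len(tokens), i, window):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "the cat eats fish".split()  # hypothetical toy sentence
    for context, target in cbow_pairs(sentence, window=1):
        print("CBOW     ", context, "->", target)
    for center, context_word in skipgram_pairs(sentence, window=1):
        print("Skip-gram", center, "->", context_word)
```

With `window=1`, the four-word sentence yields four CBOW examples but six Skip-gram examples. Skip-gram creates one example per (centre, context) pair, so it performs more prediction updates per sentence than CBOW, which averages the whole context into a single example.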
dc.degree.local	Medianeira	pt_BR
dc.publisher.local	Medianeira	pt_BR
dc.contributor.advisor1	Candido Junior, Arnaldo
dc.contributor.advisor-co1	Hartmann, Nathan Siegle
dc.contributor.referee1	Candido Junior, Arnaldo
dc.contributor.referee2	Aikes Junior, Jorge
dc.contributor.referee3	Pessini, Evando Carlos
dc.publisher.country	Brasil	pt_BR
dc.publisher.program	Graduação em Ciência da Computação	pt_BR
dc.publisher.initials	UTFPR	pt_BR
dc.subject.cnpq	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO	pt_BR
Appears in collections:	MD - Ciência da Computação

Files in this item:

File	Description	Size	Format
estudomodeloswordembedding.pdf		852.42 kB	Adobe PDF
