aiffel_fastpaper_6_Attention Is All You Need

AIFFEL Daejeon, 1st cohort, NLP class: fast paper

fast paper is a reading group that reads only the abstract and introduction of a paper.

Day 2, paper 3 of 3

Link to the original paper:

1. abstract

- The CNN and RNN structures used in earlier models are removed entirely; the model relies on attention alone.

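2. introduction

- Problems with existing RNNs: 1) performance degrades as the input sentence gets longer; 2) the recurrent structure cannot be parallelized across time steps; 3) the receptive field is small (only the hidden state of t-1 and the input vector of t).

- How does the Transformer architecture solve these problems?

- 1) global receptive field: the whole sentence is seen at once, so every decision is made after looking at all vectors (1~T).

- 2) the multi-head attention structure allows parallel computation

- 3) ensemble structure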
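The contrast in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names (`rnn_forward`, `multi_head_self_attention`), the toy sizes, and the random weights are all assumptions made for the example, and reading the "ensemble" point as several heads attending independently is also an assumption.

```python
import numpy as np

def rnn_forward(x, W_h, W_x):
    """Vanilla RNN (hypothetical toy version): step t needs the state of
    step t-1, so the loop over time steps has to run sequentially."""
    T = x.shape[0]
    h = np.zeros((T, W_h.shape[0]))
    prev = np.zeros(W_h.shape[0])
    for t in range(T):
        # Only the previous hidden state and the current input are visible here.
        prev = np.tanh(W_h @ prev + W_x @ x[t])
        h[t] = prev
    return h

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, W_q, W_k, W_v, W_o, n_heads):
    """Multi-head scaled dot-product self-attention over the whole sequence.
    No masking, dropout, or residual connections (omitted for brevity)."""
    T, d = x.shape
    d_head = d // n_heads

    def split(m):
        # (T, d) -> (n_heads, T, d_head)
        return m.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(x @ W_q), split(x @ W_k), split(x @ W_v)
    # All heads and all position pairs are scored in one batched matmul.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, T, T)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                   # (n_heads, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(T, d)       # concatenate the heads
    return concat @ W_o, weights

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
T, d, n_heads = 5, 8, 2
x = rng.normal(size=(T, d))
W_h, W_x, W_q, W_k, W_v, W_o = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(6))

h = rnn_forward(x, W_h, W_x)
out, weights = multi_head_self_attention(x, W_q, W_k, W_v, W_o, n_heads)
print(h.shape, out.shape, weights.shape)
```

Each head produces a full T x T weight matrix, so every output position is a weighted sum over all T input positions, and nothing in the attention function waits on a previous time step. The RNN loop, by contrast, cannot start step t before step t-1 has finished.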

# keywords

self-attention, encoder

self-attention, decoder

encoder - decoder attention

multi-head attention

scaled dot product attention

positional encoding
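For reference, the last three keywords can be written down as formulas. The notation follows the paper's model section, which is outside the abstract and introduction covered by these notes: $Q$, $K$, $V$ are the query, key, and value matrices, $d_k$ is the key dimension, $h$ is the number of heads, and $pos$ and $i$ index the position and the embedding dimension.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}) \\
PE_{(pos,\,2i)} &= \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right),
\qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)
\end{aligned}
```

Because attention itself is order-agnostic, the positional encoding is added to the input embeddings so the model can tell positions apart.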

LEARNER