<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
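
# Attention mechanisms[[attention_mechanisms]]

Most transformer models use full attention, in the sense that the attention matrix is square: every token attends to every other token. Because the size of that matrix grows quadratically with the sequence length, it can become a major computational bottleneck for long texts. `Longformer` and `Reformer` are models that try to be more efficient by using a sparse version of the attention matrix to speed up training.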
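
To make the quadratic cost concrete, here is a minimal NumPy sketch of full scaled dot-product attention for a single head. The function name `full_attention` and the sizes are illustrative choices, not part of any library. The score matrix it builds has one entry per (query, key) pair, so its shape is `(seq_len, seq_len)`.

```python
import numpy as np

def full_attention(q, k, v):
    """Full scaled dot-product attention for a single head.

    q, k, v: arrays of shape (seq_len, d). Returns the output and the
    (seq_len, seq_len) attention weights.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                            # one score per (query, key) pair
    scores = scores - scores.max(axis=-1, keepdims=True)     # subtract row max for stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d = 8, 4
q, k, v = (rng.standard_normal((seq_len, d)) for _ in range(3))
out, weights = full_attention(q, k, v)
print(out.shape, weights.shape)  # (8, 4) (8, 8)
```

Doubling `seq_len` quadruples the number of entries in `weights`, which is what makes long inputs expensive.
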

## LSH attention[[lsh_attention]]

[Reformer](#reformer) uses LSH (Locality Sensitive Hashing) attention. In \\(\mathrm{softmax}(QK^{T})\\), only the largest elements of the matrix \\(QK^{T}\\) (along the softmax dimension) make a useful contribution. So for each query q, we can consider only the keys k that are close to q. A hash function is used to determine whether q and k are close.
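
The Reformer paper hashes vectors with random rotations (angular LSH): project onto random directions and take the argmax over the projections and their negations. The sketch below follows that idea under simplifying assumptions: a single hash, a Gaussian random matrix standing in for the rotation, and a hypothetical helper name `lsh_buckets`. Vectors pointing in similar directions tend to share a bucket, and a vector and its negation always land in different buckets.

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Angular LSH: bucket id = argmax over [xR, -xR] for a random matrix R.

    x has shape (n, d); n_buckets must be even.
    """
    r = rng.standard_normal((x.shape[-1], n_buckets // 2))  # random projection
    projected = x @ r                                        # (n, n_buckets // 2)
    return np.argmax(np.concatenate([projected, -projected], axis=-1), axis=-1)

rng = np.random.default_rng(0)
base = rng.standard_normal(64)
near = base + 0.05 * rng.standard_normal(64)  # small perturbation: usually the same bucket
opposite = -base                               # opposite direction: always another bucket
buckets = lsh_buckets(np.stack([base, near, opposite]), n_buckets=8, rng=rng)
print(buckets)
```

Because the hash depends only on direction, q and k that point the same way collide often, which is exactly the notion of "close" the attention needs.
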

The attention mask is also modified so that each token cannot attend to itself (except at the first position), because a token's query and key would be equal and therefore very similar to each other.
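
As we read the Reformer paper, a token may attend to itself only when it has no other valid target, such as the first token of a sequence in a causal model. The sketch below assumes a left-to-right causal mask and uses a hypothetical helper `causal_self_masked` to build such a boolean mask. Without the exception, row 0 would be entirely masked.

```python
import numpy as np

def causal_self_masked(seq_len):
    """Boolean attention mask (True = may attend) for a causal, shared-QK model."""
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # key position <= query position
    allowed = causal & ~np.eye(seq_len, dtype=bool)            # hide each token from itself
    allowed[0, 0] = True                                       # first token has no other target
    return allowed

mask = causal_self_masked(4)
print(mask.astype(int))
# [[1 0 0 0]
#  [1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]]
```
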

Since hashing involves some randomness, several hash functions are used in practice (determined by an `n_rounds` parameter), and their results are averaged.
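
Putting the pieces together, the following NumPy sketch runs several hashing rounds, lets each token attend only to other tokens in its bucket, and averages the per-round outputs. It is illustrative only. The names `lsh_attention`, `n_rounds` and `n_buckets` are hypothetical function arguments, the sketch takes a plain mean over rounds, and it builds dense `(n, n)` masks for readability. An efficient implementation would instead sort tokens by bucket and attend within chunks. Following the Reformer paper's shared-QK setup, keys are taken to be the normalized queries.

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Angular LSH bucket ids for the rows of x (n_buckets must be even)."""
    p = x @ rng.standard_normal((x.shape[-1], n_buckets // 2))
    return np.argmax(np.concatenate([p, -p], axis=-1), axis=-1)

def lsh_attention(qk, v, n_buckets, n_rounds, rng):
    """Shared-QK attention restricted to same-bucket pairs, averaged over rounds."""
    n, d = qk.shape
    keys = qk / np.linalg.norm(qk, axis=-1, keepdims=True)  # keys = normalized queries
    scores = qk @ keys.T / np.sqrt(d)
    outputs = []
    for _ in range(n_rounds):
        b = lsh_buckets(qk, n_buckets, rng)        # a fresh random hash each round
        allowed = b[:, None] == b[None, :]         # only pairs in the same bucket
        np.fill_diagonal(allowed, False)           # no attending to oneself...
        lonely = np.flatnonzero(~allowed.any(axis=-1))
        allowed[lonely, lonely] = True             # ...unless there is no other target
        s = np.where(allowed, scores, -np.inf)
        w = np.exp(s - s.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)         # softmax over allowed keys only
        outputs.append(w @ v)
    return np.mean(outputs, axis=0)                # plain average across rounds

rng = np.random.default_rng(0)
qk = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
out = lsh_attention(qk, v, n_buckets=4, n_rounds=3, rng=rng)
print(out.shape)  # (16, 8)
```

Implementations may weight the rounds differently (for example by each round's softmax normalizer) rather than taking the plain mean used here.
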

## Local attention[[local_attention]]

[Longformer](#longformer) uses local attention: often, the local context (e.g., what are the two tokens to the left and right?) is enough to take action for a given token. In addition, by stacking attention layers that each have a small window, the last layer gets a receptive field that covers more than just the tokens in its window. This lets the model build a representation of the whole sentence.
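
The receptive-field argument can be checked numerically. The sketch below uses hypothetical helper names and lets `w` be the number of neighbours visible on each side, a convention chosen here. It propagates reachability through `n_layers` banded attention masks and compares the count with the closed form \\(1 + 2wL\\) for \\(L\\) layers. The closed form holds as long as the window does not reach the sequence edges.

```python
import numpy as np

def band_mask(seq_len, w):
    """True where |i - j| <= w: each token sees w neighbours per side."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= w

def receptive_field(seq_len, w, n_layers, position):
    """Number of input tokens that can influence `position` after n_layers."""
    step = band_mask(seq_len, w).astype(int)
    reach = np.zeros(seq_len, dtype=int)
    reach[position] = 1
    for _ in range(n_layers):
        reach = (step @ reach > 0).astype(int)  # spread by one window per layer
    return int(reach.sum())

for n_layers in (1, 2, 6, 12):
    # simulated count next to the closed form 1 + 2 * w * n_layers (w = 2)
    print(n_layers, receptive_field(seq_len=101, w=2, n_layers=n_layers, position=50),
          1 + 2 * 2 * n_layers)
```

With a window of only 2 tokens per side, 12 layers already let a position draw on 49 input tokens.
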

Some preselected input tokens are also given global attention. For those few tokens, the attention matrix can access all tokens, and the process is symmetric: all other tokens have access to those specific tokens in addition to the ones in their local window. This is shown in Figure 2d of the Longformer paper. A sample attention mask is shown below:

<div class="flex justify-center">
<img scale="50 %" align="center" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/local_attention_mask.png"/>
</div>
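
A mask like the one pictured can be built from a sliding-window band plus full rows and columns for the global tokens. This is a minimal NumPy sketch. The names `local_global_mask`, `w` (neighbours per side) and `global_positions` are illustrative and are not the Longformer API.

```python
import numpy as np

def local_global_mask(seq_len, w, global_positions):
    """Boolean mask (True = may attend): sliding window plus symmetric global tokens."""
    idx = np.arange(seq_len)
    allowed = np.abs(idx[:, None] - idx[None, :]) <= w  # local window of w per side
    g = np.asarray(global_positions)
    allowed[g, :] = True   # global tokens attend to every token
    allowed[:, g] = True   # every token attends to the global tokens
    return allowed

mask = local_global_mask(seq_len=8, w=1, global_positions=[0])
print(mask.astype(int))
# [[1 1 1 1 1 1 1 1]
#  [1 1 1 0 0 0 0 0]
#  [1 1 1 1 0 0 0 0]
#  [1 0 1 1 1 0 0 0]
#  [1 0 0 1 1 1 0 0]
#  [1 0 0 0 1 1 1 0]
#  [1 0 0 0 0 1 1 1]
#  [1 0 0 0 0 0 1 1]]
```
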
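
Because these attention matrices contain far fewer entries than a full one (roughly linear rather than quadratic in the sequence length), the model can accept inputs with a much longer sequence length.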
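
A rough count shows the difference. The full matrix has \\(n^2\\) entries. A window of \\(w\\) neighbours per side plus \\(g\\) global tokens needs about \\(n(2w + 1) + 2gn\\) entries, which grows linearly in \\(n\\). The helpers below are a back-of-the-envelope sketch: they ignore clipping at the sequence edges and the overlap between global rows and columns. The window size 256 is only an example value.

```python
def full_entries(n):
    """Entries in a full n x n attention matrix."""
    return n * n

def local_global_entries(n, w, g):
    """Approximate entries with a window of w per side plus g global tokens
    (edge clipping and row/column overlap are ignored)."""
    return n * (2 * w + 1) + 2 * g * n

for n in (4096, 16384, 65536):
    full = full_entries(n)
    sparse = local_global_entries(n, w=256, g=1)
    print(f"n={n}: full={full:,} sparse={sparse:,} ratio={full / sparse:.1f}x")
```

The ratio keeps growing with `n`, because only the full matrix scales quadratically.
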

## Other tricks[[other_tricks]]

### Axial positional encodings[[axial_positional_encodings]]

[Reformer](#reformer) uses axial positional encodings. In traditional transformer models, the positional encoding E is a matrix of size \\(l \times d\\), where \\(l\\) is the sequence length and \\(d\\) is the dimension of the hidden state. For very long texts, this matrix can be huge and take up far too much space on the GPU.
To alleviate this, axial positional encodings factorize the big matrix E into two smaller matrices E1 and E2, of sizes \\(l_{1} \times d_{1}\\) and \\(l_{2} \times d_{2}\\) respectively, such that \\(l_{1} \times l_{2} = l\\) and \\(d_{1} + d_{2} = d\\). Because the lengths multiply while the dimensions only add, E1 and E2 together hold \\(l_{1} d_{1} + l_{2} d_{2}\\) values, much fewer than the \\(l \times d\\) values of E. The embedding for time step \\(j\\) in E is obtained by concatenating the embedding for time step \\(j \% l_{1}\\) in E1 and the embedding for time step \\(j // l_{1}\\) in E2.
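
The indexing rule can be written out directly. This NumPy sketch assembles the full table from E1 and E2 and checks the row rule. The function name is hypothetical, and random tables stand in for the learned parameters. It then compares parameter counts for one example factorization, \\(l = 65536 = 64 \times 1024\\) and \\(d = 1024 = 256 + 768\\), chosen here for illustration.

```python
import numpy as np

def axial_positional_encoding(e1, e2):
    """Full (l1 * l2, d1 + d2) table: row j = concat(e1[j % l1], e2[j // l1])."""
    l1, l2 = e1.shape[0], e2.shape[0]
    j = np.arange(l1 * l2)
    return np.concatenate([e1[j % l1], e2[j // l1]], axis=-1)

rng = np.random.default_rng(0)
e1 = rng.standard_normal((4, 3))   # l1 = 4, d1 = 3
e2 = rng.standard_normal((8, 5))   # l2 = 8, d2 = 5
e = axial_positional_encoding(e1, e2)
print(e.shape)  # (32, 8)
assert np.array_equal(e[13], np.concatenate([e1[13 % 4], e2[13 // 4]]))

# Parameter count for a long sequence: full table vs. the two factors.
l1, l2, d1, d2 = 64, 1024, 256, 768
print((l1 * l2) * (d1 + d2))   # 67108864
print(l1 * d1 + l2 * d2)       # 802816
```

Each row of E1 is reused once per block of \\(l_{1}\\) consecutive positions, while each row of E2 is shared by all positions within one such block.
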