Question
Attention Mechanism - Does BERT use an attention mechanism?
Answer
Yes. BERT is built on the Transformer encoder, so attention is its core mechanism; how much of it a model contains depends on the model size. A BERT model has 12 to 24 attention layers with 12 to 16 attention heads each: BERT-base has 12 layers of 12 heads, and BERT-large has 24 layers of 16 heads. Because the weights are not shared between layers, a single BERT model can have as many as 24 × 16 = 384 distinct attention heads.
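
The head count is just the number of layers times the number of heads per layer. A minimal sketch of that arithmetic follows; the dictionary and function names are made up for illustration and do not come from any library, and the layer and head counts are the standard BERT-base and BERT-large configurations.

```python
# Standard BERT sizes: (attention layers, attention heads per layer).
# BERT_CONFIGS and total_attention_heads are illustrative names, not a library API.
BERT_CONFIGS = {
    "bert-base": (12, 12),
    "bert-large": (24, 16),
}


def total_attention_heads(num_layers: int, heads_per_layer: int) -> int:
    """Count distinct attention heads, assuming no weights are shared across layers."""
    return num_layers * heads_per_layer


for name, (layers, heads) in BERT_CONFIGS.items():
    print(name, total_attention_heads(layers, heads))
# bert-base 144
# bert-large 384
```

So the figure of 384 applies to the large model; the base model has 144 heads.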