Attention Mechanism - Does BERT use an attention mechanism?


Yes. BERT is built entirely on the Transformer's self-attention mechanism, and how much attention a model applies depends on its size. BERT-base uses 12 attention layers with 12 attention heads each, while BERT-large uses 24 layers with 16 heads each. Because weights are not shared between layers or heads, a single BERT model can contain as many as 24 × 16 = 384 distinct attention heads, each learning its own attention pattern.
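To make the head count concrete, here is a minimal sketch of the scaled dot-product attention that each head computes, written with NumPy. The dimensions and helper name are illustrative, not BERT's actual implementation; in BERT-large, each of the 16 heads per layer works on a 64-dimensional slice of the 1024-dimensional hidden state.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """One attention head: softmax(q k^T / sqrt(d_k)) v."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_head = 8, 64  # d_head = hidden_size / num_heads (1024 / 16 in BERT-large)
q = rng.standard_normal((seq_len, d_head))
k = rng.standard_normal((seq_len, d_head))
v = rng.standard_normal((seq_len, d_head))

out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (8, 64)

# Each layer runs num_heads independent copies of this computation,
# each with its own learned projections of the input:
layers, heads = 24, 16  # BERT-large configuration
print(layers * heads)   # 384 distinct attention heads
```

Because every head has its own projection weights, each of those 384 computations can specialize in a different relationship between tokens, such as syntax, coreference, or positional patterns.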