RoBERTa-PreLayerNorm

class transformers.RobertaPreLayerNormConfig

( vocab_size = 30522, hidden_size = 768, num_hidden_layers = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = 'gelu', hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, max_position_embeddings = 512, type_vocab_size = 2, initializer_range = 0.02, layer_norm_eps = 1e-12, pad_token_id = 1, bos_token_id = 0, eos_token_id = 2, position_embedding_type = 'absolute', use_cache = True, classifier_dropout = None, **kwargs )

Parameters

vocab_size (int, optional, defaults to 30522) — Vocabulary size of the RoBERTa-PreLayerNorm model. Defines the number of different tokens that can be represented by the input_ids passed when calling RobertaPreLayerNormModel or TFRobertaPreLayerNormModel.
hidden_size (int, optional, defaults to 768) — Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (int, optional, defaults to 12) — Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (int, optional, defaults to 3072) — Dimensionality of the “intermediate” (often named feed-forward) layer in the Transformer encoder.
hidden_act (str or Callable, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
hidden_dropout_prob (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically, set this to something large just in case (e.g., 512, 1024, or 2048).
type_vocab_size (int, optional, defaults to 2) — The vocabulary size of the token_type_ids passed when calling RobertaPreLayerNormModel or TFRobertaPreLayerNormModel.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
position_embedding_type (str, optional, defaults to "absolute") — Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).
is_decoder (bool, optional, defaults to False) — Whether the model is used as a decoder or not. If False, the model is used as an encoder.
use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/value attention states (not used by all models). Only relevant if config.is_decoder=True.
classifier_dropout (float, optional) — The dropout ratio for the classification head.
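As a rough illustration of how the size parameters above relate, the sketch below derives the per-head width from hidden_size and num_attention_heads, assuming (as in BERT-style encoders) that the heads split the hidden dimension evenly. The helper attention_head_size is hypothetical and not part of the transformers API; the numbers are the defaults listed above.

```python
def attention_head_size(hidden_size: int, num_attention_heads: int) -> int:
    """Hypothetical helper: width of one attention head.

    Assumes the heads split hidden_size evenly, so hidden_size must be
    a multiple of num_attention_heads.
    """
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) is not a multiple of "
            f"num_attention_heads ({num_attention_heads})"
        )
    return hidden_size // num_attention_heads


# Default values taken from the parameter list above.
defaults = {"hidden_size": 768, "num_attention_heads": 12, "intermediate_size": 3072}

head_size = attention_head_size(defaults["hidden_size"], defaults["num_attention_heads"])
ffn_ratio = defaults["intermediate_size"] // defaults["hidden_size"]

print(head_size)  # 64
print(ffn_ratio)  # 4
```

With the defaults, each of the 12 heads is 64 wide and the feed-forward layer is 4 times the hidden size. When changing hidden_size, it is usually worth checking these two relationships before instantiating a model.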

This is the configuration class to store the configuration of a RobertaPreLayerNormModel or a TFRobertaPreLayerNormModel. It is used to instantiate a RoBERTa-PreLayerNorm model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the RoBERTa-PreLayerNorm andreasmadsen/efficient_mlm_m0.40 architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
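Because the class inherits from PretrainedConfig, the generic configuration methods should apply to it as well. The following is a minimal sketch, assuming the transformers package is installed: it overrides one architecture argument, sets the inherited output_hidden_states flag, and round-trips the configuration through a temporary directory. The value 6 for num_hidden_layers is an arbitrary illustration, not a recommended setting.

```python
import tempfile

from transformers import RobertaPreLayerNormConfig

# Override one architecture argument and one inherited output flag;
# everything else keeps its default value.
config = RobertaPreLayerNormConfig(num_hidden_layers=6, output_hidden_states=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    # save_pretrained writes config.json into the directory.
    config.save_pretrained(tmp_dir)
    # from_pretrained reads it back into a new configuration object.
    restored = RobertaPreLayerNormConfig.from_pretrained(tmp_dir)

print(restored.num_hidden_layers)
print(restored.output_hidden_states)
```

The restored object carries the overridden values alongside the untouched defaults, so it can be passed to a model class in the same way as a freshly constructed configuration.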

Examples:

>>> from transformers import RobertaPreLayerNormConfig, RobertaPreLayerNormModel

>>> # Initializing a RoBERTa-PreLayerNorm configuration
>>> configuration = RobertaPreLayerNormConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = RobertaPreLayerNormModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
