%load_ext autoreload
%autoreload 2
import os
# hide all GPUs from TensorFlow so the notebook runs on CPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

filter_loss[source]

filter_loss(loss, features, problem)
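The signature above is all this page shows for `filter_loss`. A plausible reading, given the per-problem loss multiplier convention used throughout this page, is that it suppresses a problem's loss whenever the current batch carries no rows for that problem. A minimal graph-safe sketch of that idea (illustrative only, not necessarily the library's exact code):

import tensorflow as tf

def filter_loss_sketch(loss, features, problem):
    # Keep the loss only when at least one row in the batch belongs to
    # this problem, as signalled by its loss multiplier feature.
    multiplier = tf.cast(features[f'{problem}_loss_multiplier'], loss.dtype)
    return tf.cond(tf.reduce_sum(multiplier) > 0,
                   lambda: loss,
                   lambda: tf.zeros_like(loss))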

class BertMultiTaskBody[source]

BertMultiTaskBody(*args, **kwargs) :: Model

Model to extract BERT features and dispatch the corresponding rows to each problem chunk.

For each problem chunk, we extract the features and hidden features belonging to that problem. The point of this is to save computation in downstream processing. For example, given a batch of two instances coming from problems a and b respectively:

Input:
[{'input_ids': [1,2,3], 'a_loss_multiplier': 1, 'b_loss_multiplier': 0},
 {'input_ids': [4,5,6], 'a_loss_multiplier': 0, 'b_loss_multiplier': 1}]

Output:
{
    'a': {'input_ids': [1,2,3], 'a_loss_multiplier': 1, 'b_loss_multiplier': 0},
    'b': {'input_ids': [4,5,6], 'a_loss_multiplier': 0, 'b_loss_multiplier': 1}
}
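The dispatch can be pictured as boolean-mask row selection over the batch. A self-contained sketch of the idea (toy data; dispatch_rows is a hypothetical helper, not the library's API):

import tensorflow as tf

features = {
    'input_ids': tf.constant([[1, 2, 3], [4, 5, 6]]),
    'a_loss_multiplier': tf.constant([1, 0]),
    'b_loss_multiplier': tf.constant([0, 1]),
}

def dispatch_rows(features, problem):
    # Pick the rows whose loss multiplier for this problem is non-zero.
    mask = tf.not_equal(features[f'{problem}_loss_multiplier'], 0)
    return {name: tf.boolean_mask(tensor, mask) for name, tensor in features.items()}

per_problem = {p: dispatch_rows(features, p) for p in ('a', 'b')}
assert per_problem['a']['input_ids'].numpy().tolist() == [[1, 2, 3]]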

mtl_body = BertMultiTaskBody(params=params)
set_phase(TRAIN)
features, hidden_features = mtl_body(one_batch_data)
404 Client Error: Not Found for url: https://huggingface.co/voidful/albert_chinese_tiny/resolve/main/tf_model.h5
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFAlbertModel: ['predictions.LayerNorm.bias', 'predictions.LayerNorm.weight', 'predictions.dense.weight', 'predictions.decoder.weight', 'predictions.dense.bias', 'predictions.decoder.bias', 'predictions.bias']
- This IS expected if you are initializing TFAlbertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFAlbertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFAlbertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFAlbertModel for predictions without further training.
2021-06-12 22:40:51.719 | CRITICAL | m3tl.embedding_layer.base:__init__:58 - Modal Type id mapping: 
 {
    "array": 0,
    "cate": 1,
    "text": 2
}
WARNING: AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f4a1c08b9f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model. They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

class BertMultiTaskTop[source]

BertMultiTaskTop(*args, **kwargs) :: Model

Model to create the top layer, i.e. the classification layer, for each problem.
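Conceptually, each problem gets its own head that maps the shared hidden features to problem-specific logits. A minimal sketch of such a head (illustrative only, not the library's implementation):

import tensorflow as tf

class ToyProblemHead(tf.keras.layers.Layer):
    # Hypothetical per-problem head: shared hidden features -> logits.
    def __init__(self, num_classes):
        super().__init__()
        self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, hidden_feature):
        return self.dense(hidden_feature)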

# the top layer takes per-problem features and hidden features
from m3tl.utils import dispatch_features, get_embedding_table_from_model

features_per_problem, hidden_features_per_problem = {}, {}
for problem in params.problem_list:
    features_per_problem[problem], hidden_features_per_problem[problem] = dispatch_features(
        features=features['all'], hidden_feature=hidden_features['all'], problem=problem, mode=TRAIN
    )

# pass the input embedding table so that heads such as masklm can share it
input_embeddings = get_embedding_table_from_model(mtl_body.bert.bert_model)
mtl_top = BertMultiTaskTop(params=params, input_embeddings=input_embeddings)
set_phase(TRAIN)
logit_dict = mtl_top((features_per_problem, hidden_features_per_problem))
2021-06-12 22:40:58.059 | WARNING  | m3tl.problem_types.masklm:__init__:41 - Share embedding is enabled but hidden_size != embedding_size
for problem, problem_logit in logit_dict.items():
    # the last dim of the logits equals num_classes
    assert problem_logit.shape[-1] == params.get_problem_info(problem=problem, info_name='num_classes')

class BertMultiTask[source]

BertMultiTask(*args, **kwargs) :: Model

Model groups layers into an object with training and inference features.

Arguments:
    inputs: The input(s) of the model: a keras.Input object or a list of keras.Input objects.
    outputs: The output(s) of the model. See the Functional API example below.
    name: String, the name of the model.

There are two ways to instantiate a Model:

1 - With the "Functional API", where you start from Input, you chain layer calls to specify the model's forward pass, and finally you create your model from inputs and outputs:

import tensorflow as tf

inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

2 - By subclassing the Model class: in that case, you should define your layers in __init__ and you should implement the model's forward pass in call.

import tensorflow as tf

class MyModel(tf.keras.Model):

  def __init__(self):
    super(MyModel, self).__init__()
    self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
    self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)

  def call(self, inputs):
    x = self.dense1(inputs)
    return self.dense2(x)

model = MyModel()

If you subclass Model, you can optionally have a training argument (boolean) in call, which you can use to specify a different behavior in training and inference:

import tensorflow as tf

class MyModel(tf.keras.Model):

  def __init__(self):
    super(MyModel, self).__init__()
    self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
    self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
    self.dropout = tf.keras.layers.Dropout(0.5)

  def call(self, inputs, training=False):
    x = self.dense1(inputs)
    if training:
      x = self.dropout(x, training=training)
    return self.dense2(x)

model = MyModel()

Once the model is created, you can configure the model with losses and metrics with model.compile(), train the model with model.fit(), or use the model to do prediction with model.predict().

mtl = BertMultiTask(params=params)
logit_dict = mtl(one_batch_data)
for problem, problem_logit in logit_dict.items():
    # the last dim of the logits equals num_classes
    assert problem_logit.shape[-1] == params.get_problem_info(problem=problem, info_name='num_classes')
404 Client Error: Not Found for url: https://huggingface.co/voidful/albert_chinese_tiny/resolve/main/tf_model.h5
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFAlbertModel: ['predictions.LayerNorm.bias', 'predictions.LayerNorm.weight', 'predictions.dense.weight', 'predictions.decoder.weight', 'predictions.dense.bias', 'predictions.decoder.bias', 'predictions.bias']
- This IS expected if you are initializing TFAlbertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFAlbertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFAlbertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFAlbertModel for predictions without further training.
2021-06-12 22:41:03.366 | CRITICAL | m3tl.embedding_layer.base:__init__:58 - Modal Type id mapping: 
 {
    "array": 0,
    "cate": 1,
    "text": 2
}
2021-06-12 22:41:03.459 | WARNING  | m3tl.problem_types.masklm:__init__:41 - Share embedding is enabled but hidden_size != embedding_size
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model. They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
mtl.compile()
hist = mtl.fit(train_dataset, validation_data=train_dataset, validation_steps=1, steps_per_epoch=1, epochs=3, verbose=1)
2021-06-12 22:41:04.946 | CRITICAL | __main__:compile:62 - Initial lr: 0.0
2021-06-12 22:41:04.947 | CRITICAL | __main__:compile:63 - Train steps: 0
2021-06-12 22:41:04.947 | CRITICAL | __main__:compile:64 - Warmup steps: 0
Epoch 1/3
WARNING: AutoGraph could not transform <bound method BertMultiTaskBody.call of <__main__.BertMultiTaskBody object at 0x7f490c151cd0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: invalid value for "node": expected "ast.AST", got "<class 'NoneType'>"; to visit lists of nodes, use "visit_block" instead
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model. They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
1/1 [==============================] - ETA: 0s - mean_acc: 2.4794 - weibo_fake_cls_acc: 0.5000 - weibo_fake_ner_acc: 0.1143 - BertMultiTaskTop/weibo_fake_cls/losses/0: 0.9767 - BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.2088 - BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0114 - BertMultiTaskTop/weibo_masklm/losses/0: 10.0653
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model. They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
1/1 [==============================] - 14s 14s/step - mean_acc: 2.4794 - weibo_fake_cls_acc: 0.5000 - weibo_fake_ner_acc: 0.1143 - BertMultiTaskTop/weibo_fake_cls/losses/0: 0.9767 - BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.2088 - BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0114 - BertMultiTaskTop/weibo_masklm/losses/0: 10.0653 - val_loss: 14.3237 - val_mean_acc: 0.3214 - val_weibo_fake_cls_acc: 0.5000 - val_weibo_fake_ner_acc: 0.1429 - val_BertMultiTaskTop/weibo_fake_cls/losses/0: 0.9819 - val_BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.2721 - val_BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0120 - val_BertMultiTaskTop/weibo_masklm/losses/0: 10.0577
Epoch 2/3
1/1 [==============================] - 1s 717ms/step - mean_acc: 2.4846 - weibo_fake_cls_acc: 0.5000 - weibo_fake_ner_acc: 0.1143 - BertMultiTaskTop/weibo_fake_cls/losses/0: 0.9992 - BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.1767 - BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0816 - BertMultiTaskTop/weibo_masklm/losses/0: 10.0360 - val_loss: 14.3237 - val_mean_acc: 0.3214 - val_weibo_fake_cls_acc: 0.5000 - val_weibo_fake_ner_acc: 0.1429 - val_BertMultiTaskTop/weibo_fake_cls/losses/0: 0.9819 - val_BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.2721 - val_BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0120 - val_BertMultiTaskTop/weibo_masklm/losses/0: 10.0577
Epoch 3/3
1/1 [==============================] - 1s 602ms/step - mean_acc: 2.5813 - weibo_fake_cls_acc: 0.5000 - weibo_fake_ner_acc: 0.1429 - BertMultiTaskTop/weibo_fake_cls/losses/0: 1.5078 - BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.2698 - BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0046 - BertMultiTaskTop/weibo_masklm/losses/0: 10.0627 - val_loss: 14.3237 - val_mean_acc: 0.3214 - val_weibo_fake_cls_acc: 0.5000 - val_weibo_fake_ner_acc: 0.1429 - val_BertMultiTaskTop/weibo_fake_cls/losses/0: 0.9819 - val_BertMultiTaskTop/weibo_fake_multi_cls/losses/0: 1.2721 - val_BertMultiTaskTop/weibo_fake_ner/losses/0: 2.0120 - val_BertMultiTaskTop/weibo_masklm/losses/0: 10.0577
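fit returns a standard Keras History object, so the per-problem losses and metrics logged above can also be inspected programmatically:

# the keys of hist.history follow the metric names printed during training
for metric_name, values in hist.history.items():
    print(metric_name, values)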
params.output_body_pooled_hidden = True
params.output_body_seq_hidden = True
params.output_mtl_model_hidden = True
mtl = BertMultiTask(params=params)
logit_dict = mtl(one_batch_data)
assert 'pooled' in logit_dict
assert 'seq' in logit_dict
assert 'mtl' in logit_dict
404 Client Error: Not Found for url: https://huggingface.co/voidful/albert_chinese_tiny/resolve/main/tf_model.h5
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFAlbertModel: ['predictions.LayerNorm.bias', 'predictions.LayerNorm.weight', 'predictions.dense.weight', 'predictions.decoder.weight', 'predictions.dense.bias', 'predictions.decoder.bias', 'predictions.bias']
- This IS expected if you are initializing TFAlbertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFAlbertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFAlbertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFAlbertModel for predictions without further training.
2021-06-12 22:41:25.001 | CRITICAL | m3tl.embedding_layer.base:__init__:58 - Modal Type id mapping: 
 {
    "array": 0,
    "cate": 1,
    "text": 2
}
2021-06-12 22:41:25.094 | WARNING  | m3tl.problem_types.masklm:__init__:41 - Share embedding is enabled but hidden_size != embedding_size
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model. They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
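With the three output flags enabled, the model's output dict carries the intermediate representations alongside the per-problem logits, so they can be consumed directly, e.g. for feature extraction. A brief sketch (the shape comments are assumptions based on the usual TF conventions, not documented guarantees):

pooled_hidden = logit_dict['pooled']  # pooled body output, roughly (batch_size, hidden_size)
seq_hidden = logit_dict['seq']        # sequence body output, roughly (batch_size, seq_len, hidden_size)
mtl_hidden = logit_dict['mtl']        # hidden features of the multi-task top layers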