Keras Functional Models: A Few Pointers for Debugging
Keras is a high-level API that runs on top of popular deep-learning frameworks such as TensorFlow, Theano and CNTK. Its abstraction makes it an excellent platform for beginners, but the same abstraction can become a problem when debugging, because the details you need to inspect are hidden from you.
In my opinion, you should write your Keras model without bringing backend functionality into the picture. For that, I prefer the functional API over Sequential models. As a brief introduction: in a functional model, you explicitly pass the input tensor to each layer and assign the layer's output to a variable.
The following is an example Sequential model:
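from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# max_features, embd_size, lstm_size, lstm_drop_rate and hidden_size are hyperparameters defined elsewhere
my_model = Sequential()
my_model.add(Embedding(max_features, embd_size))
my_model.add(LSTM(lstm_size, dropout=lstm_drop_rate, recurrent_dropout=lstm_drop_rate))
my_model.add(Dense(hidden_size, activation='sigmoid'))
my_model.add(Dense(1, activation='sigmoid'))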
The same model can be written as a functional model as follows:
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

in_tensor = Input(shape=(maxlen,))
embd_out = Embedding(max_features, embd_size)(in_tensor)
lstm_out = LSTM(lstm_size, dropout=lstm_drop_rate, recurrent_dropout=lstm_drop_rate)(embd_out)
hid_tensor = Dense(hidden_size, activation='sigmoid')(lstm_out)
out_tensor = Dense(1, activation='sigmoid')(hid_tensor)
my_model = Model(inputs=in_tensor, outputs=out_tensor)
As you can see, the syntax is not very different from the Sequential model. The functional API, however, lets you build architectures that a Sequential model cannot express, such as models with multiple inputs or outputs, or with shared layers. Although this post is written with functional models in mind, a few of the points also apply to Sequential models.
Before the pointers, a piece of advice: use Jupyter Notebook or a similar tool while building the model, since it makes the process much easier. Once the model checks out, you can convert the notebook to a script and run it on the full data.
Know your Keras configuration
People using CNNs for the first time often get stuck on errors caused by an incorrect ordering of image channels, width and height. Read your keras.json file and learn what each option means from here: https://keras.io/backend/#kerasjson-details
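By default Keras reads keras.json from ~/.keras/. Below is a small stdlib-only sketch that loads it (falling back to the default Keras 2 values if the file is missing) and orders one image's dimensions according to image_data_format. The helper names load_keras_config and image_input_shape are hypothetical, not part of Keras; inside a Keras session, keras.backend.image_data_format() returns the same setting.

```python
import json
import os

# Default keras.json values of a fresh Keras 2 install; used only as a
# fallback when the file is missing (an assumption of this sketch).
DEFAULT_CONFIG = {
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last",
}
CONFIG_PATH = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")


def load_keras_config(path=CONFIG_PATH):
    """Return the settings in keras.json, filling gaps from DEFAULT_CONFIG."""
    config = dict(DEFAULT_CONFIG)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))
    return config


def image_input_shape(height, width, channels, data_format):
    """Order one image's dimensions the way a Conv2D input expects them."""
    if data_format == "channels_last":
        return (height, width, channels)
    if data_format == "channels_first":
        return (channels, height, width)
    raise ValueError("unknown image_data_format: %r" % data_format)


config = load_keras_config()
print("backend:", config["backend"])
print("image_data_format:", config["image_data_format"])
# Shape of a single 28x28 RGB image under the configured format.
print("input_shape:", image_input_shape(28, 28, 3, config["image_data_format"]))
```

If the printed input_shape does not match the shape of the arrays you feed the model, the channel ordering is the first thing to suspect.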
Look at model.summary()
This gives you an overview of the model. Check the output shape of each layer to confirm that it matches what you expect.
my_model.summary() gave the following output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         (None, 100)               0
_________________________________________________________________
embedding_3 (Embedding)      (None, 100, 128)          2560000
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584
_________________________________________________________________
dense_3 (Dense)              (None, 32)                4128
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 33
=================================================================
Total params: 2,695,745.0
Trainable params: 2,695,745.0
Non-trainable params: 0.0
_________________________________________________________________
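The Param # column is also worth checking by hand, because a count that differs from your expectation usually means a layer received an input of a different size than you intended. The sketch below applies the standard parameter formulas for these three layer types. The hyperparameter values are my assumptions, inferred from the summary above (max_features=20000, embd_size=128, lstm_size=128, hidden_size=32); they are not stated in the post.

```python
# Hyperparameters inferred from the summary above (assumptions, not from the post).
max_features, embd_size, lstm_size, hidden_size = 20000, 128, 128, 32


def embedding_params(input_dim, output_dim):
    # One output_dim-sized vector per vocabulary entry.
    return input_dim * output_dim


def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Kernel plus bias.
    return input_dim * units + units


expected = {
    "embedding": embedding_params(max_features, embd_size),
    "lstm": lstm_params(embd_size, lstm_size),
    "dense_hidden": dense_params(lstm_size, hidden_size),
    "dense_out": dense_params(hidden_size, 1),
}

for name, count in expected.items():
    print("%-12s %d" % (name, count))
print("total", sum(expected.values()))  # 2695745, matching "Total params"
```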
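If I were aiming for an output at each timestep of the LSTM, I could see from this summary that the model is not right: the LSTM output shape is (None, 128), a single vector per sequence, rather than (None, 100, 128). I would need return_sequences=True in the LSTM layer, followed by TimeDistributed Dense layers (see the documentation to understand both).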
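To see what that change does to the shapes, here is a NumPy sketch (not Keras code) of what TimeDistributed(Dense(...)) computes on the output of an LSTM with return_sequences=True: the same kernel and bias applied independently at every timestep. Random arrays stand in for the LSTM output, and the sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, lstm_size, units = 4, 100, 128, 1

# Stand-in for an LSTM output with return_sequences=True: (batch, timesteps, features).
seq_out = rng.standard_normal((batch, timesteps, lstm_size))

# One Dense layer's weights, shared across all timesteps.
kernel = rng.standard_normal((lstm_size, units))
bias = np.zeros(units)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# TimeDistributed(Dense): apply the same kernel and bias at each timestep.
per_step = np.stack(
    [sigmoid(seq_out[:, t, :] @ kernel + bias) for t in range(timesteps)], axis=1
)

# Equivalent vectorized form: matmul broadcasts over the leading axes.
vectorized = sigmoid(seq_out @ kernel + bias)

print(per_step.shape)                     # (4, 100, 1): one prediction per timestep
print(np.allclose(per_step, vectorized))  # True
```

Without return_sequences=True the LSTM emits only the last timestep, so there is no time axis for the Dense layer to be distributed over.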
Checking the summary is especially useful with CNNs, because beginners often confuse the batch size with the image dimensions.
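In the summary, the leading None in every output shape is the batch dimension: Input(shape=...) describes a single sample, and Keras adds the batch axis itself. A common mistake is passing one image to predict() without that axis. A NumPy sketch, assuming a hypothetical model whose summary shows an input of (None, 28, 28, 1); matches_input_shape is a hypothetical helper, not a Keras function.

```python
import numpy as np

# Hypothetical model input shape as printed by summary(): None = any batch size.
model_input_shape = (None, 28, 28, 1)


def matches_input_shape(array, expected):
    """True if array.shape fits expected, treating None as 'any size'."""
    if array.ndim != len(expected):
        return False
    return all(e is None or e == a for e, a in zip(expected, array.shape))


single_image = np.zeros((28, 28, 1), dtype="float32")
print(matches_input_shape(single_image, model_input_shape))  # False: no batch axis

# Add a batch axis of size 1 before calling model.predict().
batch_of_one = np.expand_dims(single_image, axis=0)
print(batch_of_one.shape)                                    # (1, 28, 28, 1)
print(matches_input_shape(batch_of_one, model_input_shape))  # True
```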
Getting output at intermediate layers
Sometimes you want to look at what the individual layers in the model are producing. Here is one such problem I faced: in a model similar to the one in this post, all my predictions were the same. During training, the loss and accuracy were changing nicely, so I needed to check the outputs of the intermediate layers.
model.layers gives you a list of all the layer objects. For my_model, it gives the following:
[<keras.engine.topology.InputLayer at 0x224162554a8>,
<keras.layers.embeddings.Embedding at 0x224162554e0>,
<keras.layers.recurrent.LSTM at 0x22424ddc7b8>,
<keras.layers.core.Dense at 0x22416251e10>,
<keras.layers.core.Dense at 0x22416255e10>]
Say you want to look at the output of the LSTM layer (index 2 in model.layers). You can create a temporary model as follows:
temp_model = Model(inputs=my_model.input, outputs=my_model.layers[2].output)
temp_model.summary() gives:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         (None, 100)               0
_________________________________________________________________
embedding_3 (Embedding)      (None, 100, 128)          2560000
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584
=================================================================
Total params: 2,691,584.0
Trainable params: 2,691,584.0
Non-trainable params: 0.0
_________________________________________________________________
Now you can call temp_model.predict() to look at the LSTM output.
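When all predictions are identical, the first thing to check in an intermediate output is whether it varies across inputs at all. A minimal NumPy sketch: collapsed_fraction is a hypothetical helper, the 1e-6 tolerance is an arbitrary choice, and the random arrays stand in for something like temp_model.predict(x_test).

```python
import numpy as np


def collapsed_fraction(layer_output, tol=1e-6):
    """Fraction of output units whose value barely changes across samples.

    layer_output: array of shape (num_samples, ...), e.g. from temp_model.predict(x_test).
    A value near 1.0 means the layer gives (almost) the same output for every input.
    """
    flat = layer_output.reshape(layer_output.shape[0], -1)
    per_unit_std = flat.std(axis=0)
    return float(np.mean(per_unit_std < tol))


rng = np.random.default_rng(0)
healthy = rng.standard_normal((50, 128))                # varies from sample to sample
collapsed = np.tile(rng.standard_normal(128), (50, 1))  # 50 identical rows

print(collapsed_fraction(healthy))    # 0.0
print(collapsed_fraction(collapsed))  # 1.0
```

Running this on each layer's output in turn narrows down the first layer at which the outputs stop depending on the input.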
Another way of doing it:
from keras import backend as K
model_func = K.function([my_model.layers[0].input],[my_model.layers[2].output])
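Now just call model_func([x_test]) to get the output. It returns a list with one array per requested output, so the LSTM activations are model_func([x_test])[0].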
Getting layer weights
For my problem, I found that at a certain layer the output was very small, causing the activation function to output zero, so the final layer predicted the same number for every input.
I confirmed this by looking at the weights of that layer, which you can do with layer.get_weights(). For example, to look at the last layer's weights, run my_model.layers[-1].get_weights(). For a Dense layer, this returns a list of NumPy arrays: the kernel and the bias.
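Because get_weights() returns plain NumPy arrays, they are easy to summarize. A sketch with a hypothetical helper, weight_report; the arrays are made-up stand-ins for my_model.layers[-1].get_weights() (a Dense kernel of shape (32, 1) and its bias), and the "very small" threshold of 1e-3 is an arbitrary assumption.

```python
import numpy as np


def weight_report(weights, small=1e-3):
    """Summarize each array returned by layer.get_weights().

    Flags arrays whose mean absolute value is below `small`
    (an arbitrary threshold chosen for this sketch).
    """
    report = []
    for i, w in enumerate(weights):
        mean_abs = float(np.abs(w).mean())
        entry = {
            "index": i,
            "shape": w.shape,
            "min": float(w.min()),
            "max": float(w.max()),
            "mean_abs": mean_abs,
            "very_small": mean_abs < small,
        }
        report.append(entry)
        flag = "  <- very small" if entry["very_small"] else ""
        print("array %d shape=%s min=%.4g max=%.4g mean|w|=%.4g%s"
              % (i, w.shape, entry["min"], entry["max"], mean_abs, flag))
    return report


# Made-up stand-ins for my_model.layers[-1].get_weights(): [kernel, bias].
rng = np.random.default_rng(0)
kernel = rng.normal(scale=1e-5, size=(32, 1))
bias = np.array([0.3])

report = weight_report([kernel, bias])
```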
These points are based on my own experience and on conversations with a few other users. Feedback and suggestions for additions to the post are welcome.