Evaluation of models using CTRAIN
In this example, we evaluate a network trained on the MNIST dataset against $l_\infty$ perturbations with radius $\epsilon = 0.1$ in terms of standard accuracy, adversarial accuracy and certified accuracy.
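Note that these three metrics are nested: a certifiably robust input is in particular robust to any attack, and an input that withstands attack is in particular classified correctly (the clean input lies inside its own perturbation ball). Hence $\text{certified accuracy} \le \text{adversarial accuracy} \le \text{standard accuracy}$: the PGD-based adversarial accuracy is an upper bound on the true robust accuracy, while incomplete certification yields a lower bound.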
First, we import the necessary torch library and CTRAIN functions.
import torch
from CTRAIN.model_definitions import CNN7_Shi
from CTRAIN.model_wrappers import ShiIBPModelWrapper
from CTRAIN.data_loaders import load_mnist
Adding complete_verifier to sys.path
Thereafter, we load the MNIST dataset and define the neural network.
in_shape = [1, 28, 28]
train_loader, test_loader = load_mnist(batch_size=128, val_split=False, data_root="../../data")
model = CNN7_Shi(in_shape=in_shape, n_classes=10)
MNIST dataset - Min value: -0.4242129623889923, Max value: 2.821486711502075
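These extreme values are consistent with the usual MNIST normalization (mean 0.1307, std 0.3081); this mean/std pair is an assumption about what load_mnist applies, not taken from the CTRAIN source. A quick sanity check:

mean, std = 0.1307, 0.3081   # standard MNIST statistics (assumed here)
print((0.0 - mean) / std)    # -0.4242... matches the reported minimum
print((1.0 - mean) / std)    # 2.8215... matches the reported maximum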
To evaluate the network, we have to wrap it in one of CTRAIN's model wrappers. Here, we choose the Shi IBP wrapper, but all wrappers behave the same with respect to evaluation.
wrapped_model = ShiIBPModelWrapper(
    model,
    input_shape=in_shape,
    eps=0.1,
    num_epochs=70
)
Now, we load the weights obtained from a previous training run (see the tutorial "Certified Training with CTRAIN").
wrapped_model.load_state_dict(torch.load('../../mnist_0.1_model.pt'))
<All keys matched successfully>
To get a rough assessment of the model's performance, we call the evaluate function, which uses the cheap incomplete verification methods IBP, CROWN-IBP, and CROWN for certification. In addition, a PGD attack is run to identify adversarial examples for which the network is not robust. To save resources, we restrict the evaluation to the first 1000 images of the test set.
std_acc, cert_acc, adv_acc = wrapped_model.evaluate(test_loader, test_samples=1_000)
79it [00:00, 118.32it/s]
certified 990.0 / 1024 using IBP
10000it [00:02, 4481.84it/s]
certified 967.0 / 1000 after using CROWN
8it [00:08, 1.12s/it]
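For intuition, here is roughly what IBP certification of a batch looks like under the hood. CTRAIN builds on auto_LiRPA, so the sketch below uses auto_LiRPA directly; it is a simplified illustration under that assumption, not the actual implementation of evaluate, and the helper name ibp_certified is hypothetical.

import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

def ibp_certified(model, x, y, eps):
    # Wrap the network so auto_LiRPA can propagate interval bounds through it
    bounded = BoundedModule(model, torch.empty_like(x))
    # l_inf ball of radius eps around each input (for normalized data,
    # eps has to be rescaled into the normalized space)
    x_bounded = BoundedTensor(x, PerturbationLpNorm(norm=float("inf"), eps=eps))
    # lb[i, c] / ub[i, c] bound the logit of class c over the whole ball around x[i]
    lb, ub = bounded.compute_bounds(x=(x_bounded,), method="IBP")
    # Certified iff the true class logit's lower bound exceeds the upper
    # bound of every other class logit
    true_lb = lb.gather(1, y.unsqueeze(1)).squeeze(1)
    ub = ub.scatter(1, y.unsqueeze(1), float("-inf"))  # mask out the true class
    return true_lb > ub.max(dim=1).values

Passing a batch (x, y) from test_loader together with eps=0.1 would yield a boolean mask whose mean is the IBP-certified accuracy on that batch.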
When printing the accuracy values, we see that the network is provably robust for 96.70% of the first 1000 images in the MNIST test set.
print(f"Standard Accuracy {std_acc}")
print(f"Certified Accuracy {cert_acc}")
print(f"Adversarial Accuracy {adv_acc}")
Standard Accuracy 0.992
Certified Accuracy 0.9670000672340393
Adversarial Accuracy 0.978
However, these values were obtained using incomplete methods. Let's investigate whether we can achieve a more precise measurement using complete verification with $\alpha\beta$-CROWN, which combines bound propagation with branch and bound and can therefore, given enough time, return a definitive robust/not-robust answer for each input.
std_acc, cert_acc, adv_acc = wrapped_model.evaluate_complete(test_loader, test_samples=1_000)
print(f"Standard Accuracy {std_acc}")
print(f"Certified Accuracy {cert_acc}")
print(f"Adversarial Accuracy {adv_acc}")
Standard Accuracy 0.992
Certified Accuracy 0.9780000448226929
Adversarial Accuracy 0.9779999852180481
After the complete evaluation, we obtained a definitive result for each input, as indicated by the matching certified and adversarial accuracies. Complete verification revealed that every input for which PGD could not find an adversarial example is in fact certifiably robust. Thus, we conclude that the network achieves a certified accuracy of 97.8% on the first 1000 MNIST test images.
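To make the gain explicit, here is the arithmetic on the numbers above (a plain recomputation, not CTRAIN output):

# Incomplete run: PGD failed on 978 inputs, but only 967 were certified,
# leaving 11 of the first 1000 inputs undecided
undecided = round((0.978 - 0.967) * 1000)   # = 11
# Complete verification resolved all 11 as robust, so certified and
# adversarial accuracy now coincide at 0.978
print(undecided)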