# Ship vs iceberg discriminator
TL;DR: Discriminate between ships and icebergs from SAR imagery.
## Approach
Data augmentation and parameter sharing, using CNNs and residual networks (ResNets).
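A minimal augmentation sketch, assuming a Keras `ImageDataGenerator` pipeline
(the `.h5` weight files suggest Keras); the ranges and batch size below are
illustrative, not the settings used here:

```python
# Illustrative augmentation setup -- an assumption, not the exact pipeline used here.
from keras.preprocessing.image import ImageDataGenerator

# SAR chips have no canonical orientation, so flips plus small rotations
# and shifts are cheap, label-preserving augmentations.
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# X: (N, 75, 75, 2) band tensor, y: (N,) iceberg labels.
# flow() then yields augmented batches, e.g.:
#     model.fit_generator(augmenter.flow(X, y, batch_size=32), ...)
```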
## Data directory
The data directory is expected to have the structure shown below:

    data/
    ├── params
    │   ├── base_cnn-scaling.pkl
    │   ├── base_cnn-weights-loss.h5
    │   ├── base_cnn-weights-val_loss.h5
    │   ├── icenet-weights-loss.h5
    │   └── icenet-weights-val_loss.h5
    ├── predictions
    │   └── icenet-dev.csv
    ├── sample_submission.csv
    ├── test.json
    └── train.json
Here `{train,test}.json` are the data files from the
[kaggle website](https://www.kaggle.com/c/statoil-iceberg-classifier-challenge).
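For reference, a sketch of loading these files, with field names as documented
on the challenge's data page (`band_1`/`band_2` are flattened 75x75 dB images):

```python
import numpy as np
import pandas as pd

# Load the Kaggle JSON (a list of records) into a DataFrame.
train = pd.read_json("data/train.json")

# Reshape each flattened band to 75x75 and stack the two polarization
# bands into a (N, 75, 75, 2) tensor.
band1 = np.array([np.array(b).reshape(75, 75) for b in train["band_1"]])
band2 = np.array([np.array(b).reshape(75, 75) for b in train["band_2"]])
X = np.stack([band1, band2], axis=-1)
y = train["is_iceberg"].values
```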
## Log
### Residual base CNN
Summary:
* Test loss: 0.5099
* Test accuracy: 0.7932
* Epochs: 100
* Best validation loss at epoch 70 (training kept converging through epoch 100
  and did not overfit)
Comments:
* Low variance -- training loss is consistently a bit lower than validation
loss.
* Since the images are "artificially" labeled, it is hard to say what the bias
  is. There should be some bias, since this network does not overfit and
  training looks converged after 100 epochs (with a decaying learning rate).
* There may also be labeling noise. It is indeed suspicious that the validation
loss converges with very low variance. Perhaps revisit the labeling
approach for the base generator.
* Conclusion: Check labeling, then bring out the big guns and expand the
  residual net (see the residual-block sketch below).
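
A minimal sketch of the kind of residual block such an expansion could stack;
the filter count and layer choices are assumptions, not icenet's actual
configuration:

```python
# Hypothetical residual block -- illustrates the pattern, not icenet's layers.
from keras.layers import Activation, Add, BatchNormalization, Conv2D

def residual_block(x, filters=64):
    """Two 3x3 convs with an identity shortcut (He et al., 2016).

    Assumes x already has `filters` channels so the addition is valid.
    """
    shortcut = x
    y = Conv2D(filters, (3, 3), padding="same")(x)
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters, (3, 3), padding="same")(y)
    y = BatchNormalization()(y)
    y = Add()([shortcut, y])  # identity skip connection
    return Activation("relu")(y)
```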
Using this model as the shared basis for the 9 regions, followed by a reshape,
a conv layer, and two dense layers, yields OK performance: around 0.20 loss
after a few epochs. However, validation loss is often lower than training loss.
It might be that the train/validation splits do not follow the same
distribution for both networks -- check the random seed and verify! It might
also just be noisy training: augmentation is applied to training batches only,
which tends to push training loss above validation loss.
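
A hypothetical end-to-end sketch of that shared-base architecture; the 25x25
crop size, the feature width `F`, the stand-in `make_base_cnn`, and all layer
widths are assumptions made for illustration:

```python
from keras.layers import (Concatenate, Conv2D, Cropping2D, Dense,
                          Flatten, Input, MaxPooling2D, Reshape)
from keras.models import Model

F = 32  # assumed length of the per-region feature vector

# Stand-in for the trained base CNN; the real weights would live in
# data/params/base_cnn-weights-*.h5.
def make_base_cnn():
    crop_in = Input(shape=(25, 25, 2))
    h = Conv2D(16, (3, 3), activation="relu")(crop_in)
    h = MaxPooling2D((2, 2))(h)
    h = Flatten()(h)
    h = Dense(F, activation="relu")(h)
    return Model(crop_in, h)

base_cnn = make_base_cnn()  # one instance => weights shared across all crops

inp = Input(shape=(75, 75, 2))
features = []
for i in range(3):
    for j in range(3):
        # Non-overlapping 25x25 crop at grid position (i, j).
        crop = Cropping2D(((25 * i, 75 - 25 * (i + 1)),
                           (25 * j, 75 - 25 * (j + 1))))(inp)
        features.append(base_cnn(crop))

# Lay the 9 feature vectors out on a 3x3 grid, convolve over the grid,
# then classify with two dense layers.
x = Concatenate()(features)          # (None, 9 * F)
x = Reshape((3, 3, F))(x)
x = Conv2D(64, (2, 2), activation="relu")(x)
x = Flatten()(x)
x = Dense(64, activation="relu")(x)
out = Dense(1, activation="sigmoid")(x)

model = Model(inp, out)
```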