{ "cells": [ { "cell_type": "markdown", "id": "bac26684-8586-4ec4-ab7d-f3848e5c5fe4", "metadata": {}, "source": [ "In this practical work, you will work with the Segment Anything[1] API from Meta." ] }, { "cell_type": "markdown", "id": "0fb6ee88-64ec-42b8-8817-9cc411d977b0", "metadata": {}, "source": [ "1. First Part: Understanding the API" ] }, { "cell_type": "markdown", "id": "a0345ab4-8992-41c7-84ad-2fdde79f02d7", "metadata": {}, "source": [ "Go to the GitHub page of the project:\\\n", "https://github.com/facebookresearch/segment-anything.git\n" ] }, { "cell_type": "markdown", "id": "4798a30f-7e8a-4d90-bc38-5303d2e249cb", "metadata": {}, "source": [ "Follow the instructions there until you can render outputs for a given image.\\\n", "You can experiment on a small subset of the SAM dataset, available here:\\\n", "https://drive.google.com/drive/folders/1FCsy7yqoDoP_R1z7yTzgyvRqUs8rx17e?usp=sharing" ] }, { "cell_type": "markdown", "id": "b34e2baa-bbd6-4bc2-9e81-7e8fcd6a1c35", "metadata": {}, "source": [ "Render the results for a couple of images.\\\n", "Pay attention to the different components of the outputs. What are they?\\\n", "Which classes are part of the model prediction?\\\n", "What other insights does the model provide?\n" ] },
{ "cell_type": "markdown", "id": "614a8ac1-f107-43f1-8b58-6a965b79f6f1", "metadata": {}, "source": [ "2. Second Part: Use SAM to generate ground truth" ] }, { "cell_type": "markdown", "id": "1aaea0c4-7d84-45b1-93f3-b985aa6ee3e7", "metadata": {}, "source": [ "You will use SAM as a zero-shot prediction tool on an unseen dataset to generate the semantic segmentation ground truth.\\\n", "To do so, we will use the INFRAPARIS dataset[2], available here:\\\n", "https://drive.google.com/drive/folders/1WMErTgOy4WcBJJzTKmRLz72eTr38tmFB?usp=sharing\n" ] }, { "cell_type": "markdown", "id": "cdada55d-06c3-4f80-b768-99eb151b5741", "metadata": {}, "source": [ "Run SAM on the raw images to obtain the associated semantic segmentation maps. You don't need to predict all classes; choose just a couple of them.\n", "Even though INFRAPARIS provides ground-truth annotations, we will train a model on the pseudo-ground-truth generated by SAM to test its generalization power." ] }, { "cell_type": "markdown", "id": "f1cfbab4-8693-4309-aab6-2e81ce218ec3", "metadata": {}, "source": [ "3. Third Part: Train a U-Net on the SAM annotations" ] }, { "cell_type": "markdown", "id": "3909f069-ab73-4fd0-bf54-72101d2618f8", "metadata": {}, "source": [ "Once your training dataset is ready, train a U-Net on it:\\\n", "https://github.com/milesial/Pytorch-UNet.git" ] }, { "cell_type": "markdown", "id": "1944313c-a505-4c36-91c5-f3ec9dd905bd", "metadata": {}, "source": [ "Please do not use the INFRAPARIS ground truth for training. Use it only to test the performance of the model trained on the SAM predictions." ] }, { "cell_type": "markdown", "id": "af02b294-6205-46a0-9386-daa00ad08b75", "metadata": {}, "source": [ "Which metrics are relevant in our case? Report the results and compare them with the performance of a model trained on the INFRAPARIS annotations. Is the difference significant? What can you conclude regarding the generalization power of SAM?" 
] }, { "cell_type": "code", "execution_count": null, "id": "2ced7c0d-faf9-454d-b080-0c0158324e40", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 5 }