Until now, attention mechanisms could generally be divided into two types:
1) Detection proposals, such as those produced by the Faster R-CNN Region Proposal Network (RPN). The ROI-Pooling operation is an attention mechanism that lets the second stage of the detector attend only to the relevant features. The disadvantage of this approach is that it ignores information outside the proposal, which can be crucial for classifying it correctly in the second stage.
2) Global attention mechanisms, which re-weight the entire feature map according to a learned attention “heat map”. The disadvantage of this approach is that it does not use information about the objects in the image when generating the attention map.
This paper combines the two approaches into one, thus mitigating their disadvantages. It does so by generating the attention map over the proposals produced by the RPN, instead of over the global feature map. This is a very strong mechanism, and you can get an impression of its strength from the images below.
To implement this approach, they use Faster R-CNN to generate the top 36 proposals, and ROI-pool each proposal into a 2048-d feature vector (using average pooling).
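The ROI average-pooling step can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: it assumes box coordinates are already given in feature-map cells, and it simply crops a region and averages over its spatial positions (real ROI pooling/RoIAlign handles sub-cell alignment and bin subdivision).

```python
import numpy as np

def roi_avg_pool(feature_map, box):
    """Average-pool one proposal's region of a C x H x W feature map
    into a single C-dim vector (a simplified stand-in for ROI pooling)."""
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = box                          # box in feature-map cell coords (assumption)
    region = feature_map[:, y1:y2, x1:x2]         # crop the proposal's region
    return region.reshape(c, -1).mean(axis=1)     # average over spatial positions

# toy example: a 2048-channel feature map and one hypothetical proposal box
fmap = np.random.rand(2048, 14, 14)
vec = roi_avg_pool(fmap, (2, 3, 7, 9))
print(vec.shape)  # (2048,)
```

Applying this to each of the 36 proposals yields 36 pooled 2048-d vectors.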
These pooled feature vectors are averaged into a single vector and fed into the attention LSTM. The output of the attention LSTM is a weight vector of size 36 (one weight for each proposal).
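The weight computation can be sketched with a simple additive-attention scorer. This is a hedged sketch, not the paper's exact architecture: the hidden state `h` here is a random stand-in for the attention LSTM's state, and the hidden size `H = 512` and the initialization scale are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
K, D, H = 36, 2048, 512            # proposals, feature dim, hidden dim (H is an assumption)
V = rng.standard_normal((K, D))    # the 36 pooled proposal features
h = rng.standard_normal(H)         # stand-in for the attention LSTM's hidden state

# additive attention: score each proposal feature against the hidden state
W_v = rng.standard_normal((H, D)) * 0.01
W_h = rng.standard_normal((H, H)) * 0.01
w_a = rng.standard_normal(H) * 0.01

scores = np.tanh(V @ W_v.T + h @ W_h.T) @ w_a   # one scalar score per proposal
alpha = softmax(scores)                          # 36 non-negative weights summing to 1
print(alpha.shape)  # (36,)
```

The softmax guarantees the 36 weights form a proper distribution over proposals.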
The next stage is to compute the attended feature vector by summing the pooled feature vectors, weighted by their predicted attention weights. This attended feature can then serve as input to a second network that performs the actual task. In the paper, it is fed to a second (language) LSTM that generates one word of the image caption at each timestep.
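The weighted sum itself reduces to a single matrix product. A minimal sketch, using random stand-ins for the 36 pooled features and their attention weights:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 36, 2048                    # proposals, feature dim
V = rng.standard_normal((K, D))    # pooled proposal features (stand-in)
alpha = rng.random(K)
alpha /= alpha.sum()               # attention weights, normalized to sum to 1

attended = alpha @ V               # weighted sum -> single 2048-d attended feature
print(attended.shape)  # (2048,)
```

Because the weights sum to 1, the attended feature is a convex combination of the proposal features, so it stays in the same scale as the inputs.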
------------------------------------------------------------------------
Bottom-up vs top-down
There are two kinds of attention mechanisms in the human visual system. Top-down attention is determined by the current task: we focus on the parts most relevant to it (e.g., the question in VQA). Bottom-up attention means we are drawn to salient, distinctive, and novel things.
Most visual attention mechanisms in previous methods are of the top-down type: they take the question as input, model an attention distribution, and apply it to the image features extracted by the CNN. However, the resulting attention, shown in the left image of the figure below, does not account for the content of the picture. Human attention focuses more on the objects and other salient regions of an image, so the authors introduce a bottom-up attention mechanism, shown in the right image below, in which attention acts on object proposals.