Hi, thanks for the great work and the comprehensive evaluation framework. We recently wanted to include your work in a comparison, but we found what seems to be a problem in the evaluation of generated-image quality. Please let us know if we have misunderstood anything:

As line 233 of scripts/pose2img_metrics.py shows, the images are saved as a concatenation of four images: [generated_image, pose_image, text_image, original_image]. However, the quality evaluation in utils/metrics/evalute_metrics_of_each_category.py appears to load the entire concatenated image rather than splitting out the generated image (see line 78). We believe this lowers the quality scores.
We tried revising evalute_metrics_of_each_category.py by inserting the following three lines between lines 77 and 78 (this assumes `cv2` is already imported in that file; add `import cv2` if not):

```python
img = np.array(img).reshape(512, 4, 512, 3)              # split each row into four 512-pixel-wide tiles
img = cv2.cvtColor(img[:, 0, :, :], cv2.COLOR_BGR2RGB)   # take the leftmost (generated) tile and swap channels
img = Image.fromarray(img)
```

These lines extract the generated image from the saved concatenation. With this change, we obtain an FID of about 10 using your pretrained checkpoint, which is much lower (i.e., better) than the reported value.
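For anyone who wants to sanity-check the reshape logic, here is a minimal self-contained sketch. It assumes the saved file is a 512×2048 horizontal strip of four 512×512 images, as described above; the tile values are synthetic placeholders:

```python
import numpy as np

# Simulate a saved strip: [generated, pose, text, original], each 512x512x3.
tiles = [np.full((512, 512, 3), i, dtype=np.uint8) for i in range(4)]
strip = np.concatenate(tiles, axis=1)  # shape (512, 2048, 3)

# Reshaping to (512, 4, 512, 3) splits each row into four 512-pixel chunks
# (row-major order), so [:, 0, :, :] recovers the leftmost (generated) tile.
generated = strip.reshape(512, 4, 512, 3)[:, 0, :, :]

assert generated.shape == (512, 512, 3)
assert np.array_equal(generated, tiles[0])
# Equivalent to plain column slicing:
assert np.array_equal(generated, strip[:, :512])
```

Simple column slicing (`strip[:, :512]`) would work just as well; the reshape form merely avoids hard-coding the offset arithmetic for each tile.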
Thanks and I'm looking forward to further discussion :)