3D content creation from text prompts has recently shown remarkable success. However, current text-to-3D methods often generate results that do not align well with human preferences. In this paper, we present a comprehensive framework, named DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline that includes rating and ranking. We then build Reward3D, the first general-purpose text-to-3D human-preference reward model, to effectively encode human preferences. Building upon this 3D reward model, we perform theoretical analysis and present Reward3D Feedback Learning (DreamFL), a direct tuning algorithm that optimizes multi-view diffusion models with a redefined scorer. Grounded in theoretical analysis and extensive experimental comparisons, DreamReward generates high-fidelity, 3D-consistent results with significant gains in alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve text-to-3D models.
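As a rough illustration of how a preference reward model can be learned from the pairwise expert comparisons described above, the sketch below trains a scalar scorer with a Bradley-Terry-style ranking loss. The class name, feature interface, and dimensions are illustrative assumptions and do not reflect the released Reward3D implementation.

```python
# Minimal sketch: learning a preference reward model from pairwise comparisons.
# `RewardModel`, the feature interface, and `feat_dim` are placeholders, not Reward3D.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores how well (features of) rendered views of a 3D asset match the prompt."""
    def __init__(self, feat_dim=768):
        super().__init__()
        self.head = nn.Linear(feat_dim, 1)  # scalar preference score

    def forward(self, prompt_feat, view_feat):
        # Fuse text and multi-view image features (e.g. from a frozen CLIP-like encoder).
        return self.head(prompt_feat * view_feat).squeeze(-1)

def ranking_loss(model, prompt_feat, feat_preferred, feat_rejected):
    """Bradley-Terry loss: the human-preferred sample should receive a higher score."""
    s_win = model(prompt_feat, feat_preferred)
    s_lose = model(prompt_feat, feat_rejected)
    return -F.logsigmoid(s_win - s_lose).mean()
```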
Figure 1. The overall framework of our DreamReward: (Top) Reward3D involves data collection, annotation, and preference learning. (Bottom) DreamFL uses feedback from Reward3D to compute a reward loss and incorporates it into the SDS loss for joint optimization of the NeRF.
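To make the bottom half of the figure concrete, the snippet below sketches one way a reward term can be folded into score-distillation-based NeRF optimization. The render and diffusion interfaces (`render_fn`, `add_noise`, `predict_noise`, `sds_weight`) and the weight `lambda_reward` are assumptions for illustration, not the paper's exact DreamFL algorithm.

```python
# Schematic only: combining an SDS surrogate loss with a human-preference reward
# term when optimizing a NeRF. All interfaces below are assumed placeholders.
import torch

def dreamfl_step(render_fn, diffusion, reward_model, prompt_emb, optimizer,
                 lambda_reward=0.01):
    optimizer.zero_grad()
    views = render_fn()                              # differentiable renders, shape (B, C, H, W)

    # --- SDS term, written in the usual surrogate form ---
    t = torch.randint(20, 980, (views.shape[0],), device=views.device)
    noise = torch.randn_like(views)
    noisy = diffusion.add_noise(views, noise, t)     # assumed interface
    with torch.no_grad():
        eps_pred = diffusion.predict_noise(noisy, t, prompt_emb)
    w = diffusion.sds_weight(t).view(-1, 1, 1, 1)
    grad = (w * (eps_pred - noise)).detach()
    sds_loss = (grad * views).sum()                  # d(sds_loss)/d(views) == grad

    # --- Reward term: push renders toward higher human-preference scores ---
    # (assumes the reward model can score rendered images directly)
    reward_loss = -reward_model(prompt_emb, views).mean()

    (sds_loss + lambda_reward * reward_loss).backward()
    optimizer.step()
```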
Qualitative comparisons: MVDream | DreamReward (Ours), on the following prompts:
An ultra-detailed illustration of a mythical Phoenix, rising from ashes, vibrant feathers in a fiery palette
A delicate porcelain teacup, painted with intricate flowers, rests on a saucer
Spaceship, futuristic design, sleek metal, glowing thrusters, flying in space
A lion against the sunrise, its majestic stature prominent on the savanna
A bicycle that leaves a trail of flowers
A solid, symmetrical, smooth stone fountain, with water cascading over its edges into a clear, circular pond surrounded by blooming lilies, in the center of a sunlit courtyard
A pen leaking blue ink
A marble bust of a mouse
A smoldering campfire under a clear starry night, embers glowing softly
A rotary telephone carved out of wood
A torn hat
A Japanese forest, sunny, digital art
Frog with a translucent skin displaying a mechanical heart beating.
A solid, smooth, symmetrical porcelain teapot, with a cobalt blue dragon design, steam rising from the spout, suggesting it's just been filled with boiling water
A book bound in mysterious symbols
A pen sitting atop a pile of manuscripts
Left: User study showing the rate at which volunteers preferred each method (inset pie chart). Right: Holistic evaluation using GPTEval3D; the radar chart reports the Elo rating for each of the six criteria. Our method consistently ranks first across all metrics.
@misc{ye2024dreamreward,
title={DreamReward: Text-to-3D Generation with Human Preference},
author={Junliang Ye and Fangfu Liu and Qixiu Li and Zhengyi Wang and Yikai Wang and Xinzhou Wang and Yueqi Duan and Jun Zhu},
year={2024},
eprint={2403.14613},
archivePrefix={arXiv},
primaryClass={cs.CV}
}