DreamReward: Aligning Human Preference in Text-to-3D Generation

¹Tsinghua University, ²ShengShu

Abstract

3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline that includes rating and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model, to effectively encode human preferences. Building upon the 3D reward model, we perform a theoretical analysis and present Reward3D Feedback Learning (DreamFL), a direct tuning algorithm that optimizes multi-view diffusion models with a redefined scorer. Grounded in theoretical proof and extensive experimental comparisons, DreamReward generates high-fidelity, 3D-consistent results with significant gains in prompt alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve text-to-3D models.

Figure 1. The overall framework of DreamReward. (Top) Reward3D involves data collection, annotation, and preference learning. (Bottom) DreamFL uses feedback from Reward3D to compute RewardLoss and incorporates it into the SDS loss, so that the NeRF is optimized with both simultaneously.
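As a rough illustration of the preference-learning step, the sketch below shows a common way to train a reward model from pairwise comparisons: a Bradley-Terry style ranking loss that penalizes the model when the preferred sample does not score higher than the rejected one. This is an assumption for illustration only; the page does not state the exact loss used for Reward3D, and the function names `pairwise_preference_loss` and `mean_loss` are hypothetical.

```python
import math


def pairwise_preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).

    Hypothetical helper; the scores stand in for reward-model outputs.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(comparisons):
    """Average the loss over (score_preferred, score_rejected) pairs."""
    total = sum(pairwise_preference_loss(w, l) for w, l in comparisons)
    return total / len(comparisons)
```

The loss is small when the preferred result already receives the higher score and grows roughly linearly when the ordering is wrong, which is what pushes a reward model toward the annotators' rankings.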


Text-to-3D Generation

Results are shown in pairs: MVDream (left) | DreamReward (Ours) (right)

An ultra-detailed illustration of a mythical Phoenix, rising from ashes, vibrant feathers in a fiery palette

A delicate porcelain teacup, painted with intricate flowers, rests on a saucer

Spaceship,futuristic design,sleek metal,glowing thrusters, flying in space

A lion against the sunrise, its majestic stature prominent on the savanna

A bicycle that leaves a trail of flowers

A solid, symmetrical, smooth stone fountain, with water cascading over its edges into a clear, circular pond surrounded by blooming lilies, in the center of a sunlit courtyard

A pen leaking blue ink

A marble bust of a mouse

A smoldering campfire under a clear starry night, embers glowing softly

A rotary telephone carved out of wood

A torn hat

A japanese forest, sunny,digital art

Frog with a translucent skin displaying a mechanical heart beating.

A solid, smooth, symmetrical porcelain teapot, with a cobalt blue dragon design, steam rising from the spout, suggesting it's just been filled with boiling water

A book bound in mysterious symbols

A pen sitting atop a pile of manuscripts

Quantitative Comparison

Left: user study showing the share of volunteers' preferences for each method (inset pie chart). Right: holistic evaluation using GPTEval3D; the radar charts report the Elo rating for each of the 6 criteria. The results indicate that our method consistently ranks first across all metrics.
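For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo update for a single pairwise comparison between two methods. It is only meant to convey how pairwise wins translate into a rating; GPTEval3D's own procedure for fitting ratings may differ, and the function names and the `k` step size here are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, outcome_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k is an assumed step size, not a value taken from GPTEval3D.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

With equal starting ratings and `k=32`, a single win moves the winner up by 16 points and the loser down by 16, so a method that wins most comparisons on a criterion accumulates a higher rating on that axis of the radar chart.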

BibTeX

@misc{ye2024dreamreward,
      title={DreamReward: Text-to-3D Generation with Human Preference}, 
      author={Junliang Ye and Fangfu Liu and Qixiu Li and Zhengyi Wang and Yikai Wang and Xinzhou Wang and Yueqi Duan and Jun Zhu},
      year={2024},
      eprint={2403.14613},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}