Large Language Models (LLMs) have been shown to be effective for code generation. However, due to the complexity and opacity of LLMs, little is known about how these models generate code. We made the first attempt to bridge this knowledge gap by investigating whether LLMs attend to the same parts of a task description as human programmers during code generation. An analysis of six LLMs, including GPT-4, on two popular code generation benchmarks revealed a consistent misalignment between the attention of LLMs and that of human programmers. Furthermore, an in-depth analysis of 273 incorrect LLM-generated code snippets showed that 35% of the errors can be explained by two attention patterns: incorrect attention and semantic misunderstanding. Finally, through a quantitative experiment and a user study, we demonstrated that model attention computed by a perturbation-based method aligns most closely with human attention and is often favored by human programmers. Our findings deepen the understanding of code generation models and highlight the need for human-aligned LLMs to improve interpretability and programmer trust.