This class reading
This class reading - “Program optimization space pruning for a multithreaded GPU,” S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, J. Stratton, and W. Hwu, Proc. Intl. Symp. on Code Generation and Optimization, Mar. 2008.
Project demos - Dec 13-16, 19 (19th is full)
- Send me an email with a date and a few timeslots if your group does not have a slot
- Almost all have signed up already
Demo format - Each group gets 30 mins
- Plan for 20 mins of presentation (no more!), 10 mins questions
- Some slides are helpful; try to have all group members say something
- Talk about what you did (basic idea, previous work), how you did it (approach + implementation), and results
- Demo or real code examples are good
Report
“Compute Unified Device Architecture”
General-purpose programming model - User kicks off batches of threads on the GPU
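A minimal sketch of what "kicking off a batch of threads" looks like in CUDA source may help here. It is not taken from the slides: the kernel `vec_add`, the host helper `vec_add_max_error`, the input values, and the choice of 256 threads per block are all assumptions made for illustration. Only the CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) and the `<<<blocks, threads>>>` launch syntax are standard.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Kernel: every GPU thread adds one pair of elements.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n) c[i] = a[i] + b[i];                   // the last block may be only partly used
}

// Hypothetical host helper (not from the slides): runs vec_add on n elements and
// returns the largest absolute difference from a CPU reference, or -1 on a CUDA error.
float vec_add_max_error(int n) {
    if (n <= 0) return 0.0f;
    std::vector<float> a(n), b(n), c(n, 0.0f);
    for (int i = 0; i < n; ++i) { a[i] = 0.5f * i; b[i] = 2.0f; }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess &&
              cudaMalloc(&db, bytes) == cudaSuccess &&
              cudaMalloc(&dc, bytes) == cudaSuccess &&
              cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;                         // threads per block (arbitrary choice)
        const int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n elements
        vec_add<<<blocks, threads>>>(da, db, dc, n);     // kick off the batch of threads
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da); cudaFree(db); cudaFree(dc);  // cudaFree on a null pointer is a no-op
    if (!ok) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        max_err = std::fmax(max_err, std::fabs(c[i] - (a[i] + b[i])));
    return max_err;
}
```

Calling `vec_add_max_error(1000)` from a `main` function launches four blocks of 256 threads; the threads whose index is 1000 or higher do nothing because of the bounds check in the kernel.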
Advantages of CUDA
Who has written CUDA, how have you optimized it, how long did it take? - Did you tune using a better algorithm than trial and error?
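As one possible starting point for tuning without pure trial and error, the CUDA runtime can suggest a block size from an occupancy calculation. The sketch below is an illustration only and is not from the slides or the assigned paper: the kernel `scale` and the helper `suggested_block_size` are hypothetical names, while `cudaOccupancyMaxPotentialBlockSize` is a runtime API available in newer CUDA toolkits.

```cuda
#include <cuda_runtime.h>

// Example kernel to be tuned: scales each element in place.
__global__ void scale(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Hypothetical helper: asks the runtime for the block size that maximizes
// theoretical occupancy for `scale`. Returns -1 on a CUDA error.
int suggested_block_size() {
    int min_grid = 0;  // smallest grid size that can still reach maximum occupancy
    int block = 0;     // suggested number of threads per block
    cudaError_t err =
        cudaOccupancyMaxPotentialBlockSize(&min_grid, &block, scale, 0, 0);
    return err == cudaSuccess ? block : -1;
}
```

Occupancy is only a heuristic: the configuration with the highest occupancy is not guaranteed to be the fastest, so the suggested value is best treated as a first candidate to measure rather than a final answer.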
Is there any hope to build a GPU compiler that can automatically do what CUDA programmers do? - How would you do it?
- What’s the input language? C, C++, Java, StreamIt?
Are GPUs a compiler writer's best friend or worst enemy?
What about non-scientific codes: can they be mapped to GPUs? - How can GPUs be made more “general”?