Bibliography
[1] V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger. Clock rate versus IPC: the end of the
road for conventional microarchitectures. In Proceedings of the 27th International Symposium on
Computer Architecture (ISCA), pages 248–259, June 2000.
[2] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, and Toru Takahashi.
Task-Based FMM for Multicore Architectures. SIAM Journal on Scientific Computing, 36(1):66–93, 2014.
[3] J. A. Ang, R. F. Barrett, R. E. Benner, D. Burke, C. Chan, D. Donofrio, S. D. Hammond, K. S. Hemmer,
S. M. Kelly, H. Le, V. J. Leung, D. R. Resnick, A. F. Rodrigues, J. Shalf, D. Stark, D. Unat, and
N. J. Wright. Abstract machine models and proxy architectures for exascale computing. Technical report,
DOE (joint report of Sandia National Laboratories and Lawrence Berkeley National Laboratory), May 2014.
[4] B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis,
J. Li, N. Ni, and R. Rajamony. The PERCS High-Performance Interconnect. In Proceedings of the 18th
Symposium on High-Performance Interconnects (Hot Interconnects 2010). IEEE, Aug. 2010.
[5] C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier. StarPU: A Unified Platform for Task Scheduling
on Heterogeneous Multicore Architectures. Concurrency and Computation: Practice and Experience,
23:187–198, 2011.
[6] M. Baldauf, O. Fuhrer, M. J. Kurowski, G. de Morsier, M. Muellner, Z. P. Piotrowski, B. Rosa,
P. L. Vitagliano, M. Ziemianski, and D. Wojcik. The COSMO Priority Project ‘Conservative Dynamical
Core’ final report. Technical report, MeteoSwiss, October 2013.
[7] Barcelona Supercomputing Center. The OmpSs Programming Model. https://pm.bsc.es/ompss.
[8] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall,
and Yuli Zhou. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of PPoPP ’95. ACM,
July 1995.
[9] E. G. Boman, K. D. Devine, V. J. Leung, S. Rajamanickam, L. A. Riesen, M. Deveci, and U. Catalyurek.
Zoltan2: Next generation combinatorial toolkit. Technical Report SAND2012-9373C, Sandia National
Laboratories, 2012.
[10] Shekhar Borkar. Thousand core chips: A technology perspective. In Proceedings of the 44th Annual
Design Automation Conference, DAC ’07, pages 746–749, 2007.
[11] Shekhar Borkar and Andrew A. Chien. The future of microprocessors. Communications of the ACM,
54(5):67–77, 2011.
[12] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Hérault, J. Kurzak, J. Langou,
P. Lemarinier, H. Ltaief, P. Luszczek, A. YarKhan, and J. Dongarra. Flexible Development of Dense
Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA. In IEEE International
Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pages
1432–1441, May 2011.
[13] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Hérault, and Jack J.
Dongarra. PaRSEC: Exploiting Heterogeneity to Enhance Scalability. Computing in Science and
Engineering, 15(6):36–45, 2013.
[14] Peter J. Braam. The Lustre storage architecture. Technical report, Cluster File Systems, Inc., 2003.
[15] M. S. Campobasso and M. B. Giles. Effects of flow instabilities on the linear analysis of turbomachinery
aeroelasticity. Journal of Propulsion and Power, 19(2):250–259, March 2003.
[16] Nicolas Capit, Georges Da Costa, Yiannis Georgiou, Guillaume Huard, Cyrille Martin, Grégory Mounié,
Pierre Neyron, and Olivier Richard. A batch scheduler with high level components. In Proceedings of the
IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005), volume 2, pages 776–783.
IEEE, 2005.
[17] Philip H. Carns, Walter B. Ligon III, Robert B. Ross, and Rajeev Thakur. PVFS: A parallel file system
for Linux clusters. In Proceedings of the 4th Annual Linux Showcase and Conference, pages 317–327,
Atlanta, GA, October 2000. USENIX Association.
[18] Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanović, James Demmel, Kurt Keutzer, John
Shalf, Kathy Yelick, and Armando Fox. SEJITS: Getting productivity and performance with selective
embedded JIT specialization, 2009.
[19] Bryan C. Catanzaro, Michael Garland, and Kurt Keutzer. Copperhead: compiling an embedded data
parallel language. In Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice
of Parallel Programming, PPoPP 2011, San Antonio, TX, USA, February 12–16, 2011, pages 47–56,
2011.
[20] Ernie Chan, Field G. Van Zee, Paolo Bientinesi, Enrique S. Quintana-Ortí, Gregorio Quintana-Ortí,
and Robert van de Geijn. SuperMatrix: A multithreaded runtime scheduling system for
algorithms-by-blocks. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice
of Parallel Programming, 2008.
[21] A. Charara, H. Ltaief, D. Gratadour, D. Keyes, A. Sevin, A. Abdelfattah, E. Gendron, C. Morel, and
F. Vidal. Pipelining computational stages of the tomographic reconstructor for multi-object adaptive
optics on a multi-GPU system. In Proceedings of the 2014 ACM/IEEE Conference on Supercomputing, 2014.
[22] Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal
Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform
cluster computing. SIGPLAN Not., 40(10):519–538, October 2005.
[23] Hu Chen, Wenguang Chen, Jian Huang, Bob Robert, and H. Kuhn. MPIPP: an Automatic Profile-
Guided Parallel Process Placement Toolset for SMP Clusters and Multiclusters. In Gregory K. Egan and
Yoichi Muraoka, editors, Proceedings of the 20th Annual International Conference on Supercomputing,
ICS 2006, Cairns, Queensland, Australia, June 28 - July 01, 2006, pages 353–360. ACM, 2006.
[24] Shimin Chen, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E.
Blelloch, Babak Falsafi, Limor Fix, Nikos Hardavellas, Todd C. Mowry, and Chris Wilkerson. Scheduling
Threads for Constructive Cache Sharing on CMPs. In Proceedings of SPAA, 2007.
[25] P. Colella, D. T. Graves, D. Modiano, D. B. Serafini, and B. van Straalen. Chombo software package for
AMR applications. Technical report, Lawrence Berkeley National Laboratory, 2000.
http://seesar.lbl.gov/anag/chombo/.
[26] OpenMP Standards Committee. OpenMP 4.0 Application Program Interface.
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf, July 2013.
[27] Y. Cui, E. Poyraz, J. Zhou, S. Callaghan, P. Maechling, T. H. Jordan, L. Shih, and P. Chen. Accelerating
CyberShake calculations on the XE6/XK7 platform of Blue Waters. In Extreme Scaling Workshop (XSW),
2013, pages 8–17. IEEE, 2013.
Programming Abstractions for Data Locality