Programming Abstractions for Data Locality


Bibliography

[1] V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger. Clock rate versus IPC: the end of the road for conventional microarchitectures. In Computer Architecture, 2000. Proceedings of the 27th International Symposium on, pages 248–259, June 2000.

[2] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, and Toru Takahashi. Task-Based FMM for Multicore Architectures. SIAM Journal on Scientific Computing, 36(1):66–93, 2014.

[3] J. A. Ang, R. F. Barrett, R. E. Benner, D. Burke, C. Chan, D. Donofrio, S. D. Hammond, K. S. Hemmer, S. M. Kelly, H. Le, V. J. Leung, D. R. Resnick, A. F. Rodrigues, J. Shalf, D. Stark, D. Unat, and N. J. Wright. Abstract machine models and proxy architectures for exascale computing. Technical report, DOE Technical Report (joint report of Sandia Laboratories and Berkeley Laboratory), May 2014.

[4] B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis, J. Li, N. Ni, and R. Rajamony. The PERCS High-Performance Interconnect. In Proceedings of 18th Symposium on High-Performance Interconnects (Hot Interconnects 2010). IEEE, Aug. 2010.

[5] C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency Computat. Pract. Exper., 23:187–198, 2011. (to appear).

[6] M. Baldauf, O. Fuhrer, M. J. Kurowski, G. de Morsier, M. Muellner, Z. P. Piotrowski, B. Rosa, P. L. Vitagliano, D. Wojcik, and M. Ziemianski. The COSMO Priority Project 'Conservative Dynamical Core' final report. Technical report, MeteoSwiss, October 2013.

[7] Barcelona Supercomputing Center. The OmpSs Programming Model. https://pm.bsc.es/ompss.

[8] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of PPoPP '95. ACM, July 1995.

[9] E. G. Boman, K. D. Devine, V. J. Leung, S. Rajamanickam, L. A. Riesen, M. Deveci, and U. Catalyurek. Zoltan2: Next generation combinatorial toolkit. Technical Report SAND2012-9373C, Sandia National Laboratories, 2012.

[10] Shekhar Borkar. Thousand core chips: A technology perspective. In Proceedings of the 44th Annual Design Automation Conference, DAC '07, pages 746–749, 2007.

[11] Shekhar Borkar and Andrew A. Chien. The future of microprocessors. Communications of the ACM, 54(5):67–77, 2011.

[12] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, P. Luszczek, A. YarKhan, and J. Dongarra. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA. In IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pages 1432–1441, May 2011.

[13] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Hérault, and Jack J. Dongarra. PaRSEC: Exploiting Heterogeneity to Enhance Scalability. Computing in Science and Engineering, 15(6):36–45, 2013.

[14] Peter J. Braam. The Lustre storage architecture. Technical report, Cluster File Systems, Inc., 2003.

[15] M. S. Campobasso and M. B. Giles. Effects of flow instabilities on the linear analysis of turbomachinery aeroelasticity. Journal of Propulsion and Power, 19(2):250–259, March 2003.

[16] Nicolas Capit, Georges Da Costa, Yiannis Georgiou, Guillaume Huard, Cyrille Martin, Grégory Mounié, Pierre Neyron, and Olivier Richard. A batch scheduler with high level components. In Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on, volume 2, pages 776–783. IEEE, 2005.

[17] Philip H. Carns, Walter B. Ligon III, Robert B. Ross, and Rajeev Thakur. PVFS: A parallel file system for Linux clusters. In Proceedings of the 4th Annual Linux Showcase and Conference, pages 317–327, Atlanta, GA, October 2000. USENIX Association.

[18] Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanović, James Demmel, Kurt Keutzer, John Shalf, Kathy Yelick, and Armando Fox. SEJITS: Getting productivity and performance with selective embedded JIT specialization, 2009.

[19] Bryan C. Catanzaro, Michael Garland, and Kurt Keutzer. Copperhead: compiling an embedded data parallel language. In Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2011, San Antonio, TX, USA, February 12-16, 2011, pages 47–56, 2011.

[20] Ernie Chan, Field G. Van Zee, Paolo Bientinesi, Enrique S. Quintana-Ortí, Gregorio Quintana-Ortí, and Robert van de Geijn. SuperMatrix: A multithreaded runtime scheduling system for algorithms-by-blocks. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008.

[21] A. Charara, H. Ltaief, D. Gratadour, D. Keyes, A. Sevin, A. Abdelfattah, E. Gendron, C. Morel, and F. Vidal. Pipelining computational stages of the tomographic reconstructor for multi-object adaptive optics on a multi-GPU system. In Proceedings of the 2014 ACM/IEEE Conference on Supercomputing, 2014.

[22] Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. SIGPLAN Not., 40(10):519–538, October 2005.

[23] Hu Chen, Wenguang Chen, Jian Huang, Bob Robert, and H. Kuhn. MPIPP: an Automatic Profile-Guided Parallel Process Placement Toolset for SMP Clusters and Multiclusters. In Gregory K. Egan and Yoichi Muraoka, editors, Proceedings of the 20th Annual International Conference on Supercomputing, ICS 2006, Cairns, Queensland, Australia, June 28 - July 01, 2006, pages 353–360. ACM, 2006.

[24] Shimin Chen, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, Limor Fix, Nikos Hardavellas, Todd C. Mowry, and Chris Wilkerson. Scheduling Threads for Constructive Cache Sharing on CMPs. In Proceedings of SPAA, 2007.

[25] P. Colella, D. T. Graves, D. Modiano, D. B. Serafini, and B. van Straalen. Chombo software package for AMR applications. Technical report, Lawrence Berkeley National Laboratory, 2000. http://seesar.lbl.gov/anag/chombo/.

[26] OpenMP Standards Committee. OpenMP 4.0 Application Program Interface. http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf, July 2013.

[27] Y. Cui, E. Poyraz, J. Zhou, S. Callaghan, P. Maechling, T. H. Jordan, L. Shih, and P. Chen. Accelerating CyberShake calculations on the XE6/XK7 platform of Blue Waters. In Extreme Scaling Workshop (XSW), 2013, pages 8–17. IEEE, 2013.
