Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs

Yunho Oh, Keunsoo Kim, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, Murali Annavaram, Won Woo Ro

Research output: Contribution to journal › Article › peer-review

Abstract

This paper proposes a new architecture, called Adaptive PREfetching and Scheduling (APRES), which improves the cache efficiency of GPUs. APRES relies on the observation that GPU loads tend to exhibit either high locality or strided access patterns across warps. APRES schedules warps so that as many cache hits as possible are generated before any cache miss occurs. Without directly predicting future cache hits or misses for each warp, APRES forms a group of warps that will soon execute the same static load and prioritizes the grouped warps. If the first warp executed in the group hits the cache, the grouped warps are likely to access the same cache lines. Otherwise, APRES treats the load as strided and generates prefetch requests for the grouped warps. In addition, APRES includes a new dynamic partitioning of the L1 cache between prefetched and demand-fetched data to reduce contention between the two kinds of lines. In our evaluation, APRES achieves a 27.8 percent performance improvement.
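The abstract gives only a high-level view of the mechanism, but the core decision it describes (if the first warp of a group hits, prioritize the group; otherwise treat the load as strided and prefetch for the remaining warps) can be sketched in a few lines. The sketch below is illustrative only: the names (L1Cache, WarpGroup, apres_schedule), the line size, and the example addresses are hypothetical stand-ins and are not taken from the paper.

```cpp
// Minimal sketch of the hit-then-prioritize / miss-then-prefetch decision
// described in the abstract. All identifiers and parameters are assumptions.
#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

constexpr uint64_t LINE_SIZE = 128;  // assumed L1 line size in bytes

// Tiny stand-in for an L1 data cache: just the set of resident line addresses.
struct L1Cache {
    std::unordered_set<uint64_t> lines;
    bool hit(uint64_t addr) const { return lines.count(addr / LINE_SIZE) != 0; }
    void fill(uint64_t addr) { lines.insert(addr / LINE_SIZE); }
};

// Warps that are about to execute the same static load are grouped together.
struct WarpGroup {
    uint64_t load_pc;                 // static load shared by the group
    std::vector<uint64_t> addresses;  // per-warp effective addresses for that load
};

// Use the first warp's outcome to decide how to handle the rest of the group.
void apres_schedule(const WarpGroup& group, L1Cache& cache) {
    if (group.addresses.empty()) return;

    if (cache.hit(group.addresses.front())) {
        // High-locality case: the other warps likely touch the same lines,
        // so running them back-to-back turns their loads into hits.
        std::cout << "PC 0x" << std::hex << group.load_pc
                  << ": hit -> prioritize grouped warps\n";
    } else {
        // Strided case: issue prefetch requests for the grouped warps so
        // their lines arrive before they execute the load.
        std::cout << "PC 0x" << std::hex << group.load_pc
                  << ": miss -> prefetch for grouped warps\n";
        for (size_t i = 1; i < group.addresses.size(); ++i)
            cache.fill(group.addresses[i]);  // stand-in for a prefetch request
    }
}

int main() {
    L1Cache cache;
    cache.fill(0x1000);  // pretend one line is already resident

    WarpGroup local{0x400, {0x1000, 0x1004, 0x1008}};    // same line: locality
    WarpGroup strided{0x420, {0x8000, 0x8400, 0x8800}};  // 1 KiB stride: misses

    apres_schedule(local, cache);
    apres_schedule(strided, cache);
}
```

The dynamic L1 partitioning between prefetched and demand-fetched lines mentioned in the abstract is not modeled here; the sketch covers only the grouping and prefetch decision.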

Original language: English
Article number: 8515055
Pages (from-to): 609-616
Number of pages: 8
Journal: IEEE Transactions on Computers
Volume: 68
Issue number: 4
DOI: https://doi.org/10.1109/TC.2018.2878671
State: Published - 1 April 2019


Keywords

  • GPU
  • cache
  • data prefetching
  • performance
  • warp scheduling

Cite this

Oh, Y., Kim, K., Yoon, M. K., Park, J. H., Park, Y., Annavaram, M., & Ro, W. W. (2019). Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs. IEEE Transactions on Computers, 68(4), 609-616. [8515055]. https://doi.org/10.1109/TC.2018.2878671