NVIDIA/FasterTransformer

C++ 6.4k stars

Transformer related optimization, including BERT, GPT

README badge: [![ngmi](https://ngmi.review/badge/NVIDIA/FasterTransformer.svg)](https://ngmi.review/repo/NVIDIA/FasterTransformer)
- Merged PRs: 96
- Avg merge time: 1 day
- Fastest PR: 0m
- Slowest PR: 24 days
- Global speed rank: #243
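The dashboard's duration stats mix units ("0m", "8.6h", "24 days"). As a minimal sketch of how such values could be normalized for averaging, here is a hypothetical parser (not the site's actual code) that handles the formats visible in the table below:

```python
import re

def parse_duration_hours(s: str) -> float:
    """Convert dashboard-style durations ('2m', '8.6h', '1 day', '20 days') to hours."""
    m = re.fullmatch(r"([\d.]+)\s*(m|h|day|days)", s.strip())
    if not m:
        raise ValueError(f"unrecognized duration: {s!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "m":
        return value / 60          # minutes -> hours
    if unit == "h":
        return value               # already hours
    return value * 24              # day/days -> hours
```

With a normalizer like this, an average merge time is just the mean of the parsed values across merged PRs.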

PR Size Analysis

Lines changed (additions + deletions) vs review outcomes.

[Charts: PRs by size · Avg review time (hrs) · Clean approval rate (%)]

Top Reviewers

Recent Merged PRs

| # | Title | Author | Time | Reviews | Blocks |
|---|-------|--------|------|---------|--------|
| #705 | remove mpi_cxx from multi-gpu build | @dwyatte | 8.6h | 0 | |
| #660 | Support size_per_head=112 | @dskhudia | 20 days | 2 | |
| #683 | fix: Custom AllReduce swapTensor bug | @PerkzZheng | 24m | 0 | |
| #672 | [bugfix] Fix 2-shot Custom All Reduce kernel correctness issue (indexing bug). | @rkindi | 5 days | 0 | |
| #584 | Fix/gpt early stop | @byshiue | 2m | 0 | |
| #568 | [enhancement] improve the performace of bloom model conversion, reduce the memory and time cost | @Yangruipis | 4 days | 0 | |
| #569 | [Enhancement]create huggingface_gptneox_convert.py | @zhang-ge-hao | 3 days | 0 | |
| #524 | [fix] fix overflow in softmax_kernel when process long seqlen and big batch… | @zhangxin81 | 22 days | 0 | |
| #550 | [Enhancement]add pytorch backend support for gptneox | @zhang-ge-hao | 10 days | 7 | |
| #529 | Fix typo in gpt_guide.md | @Ying1123 | 19m | 0 | |
| #505 | fix: gpt tensor shapes inconsistency | @zhang-ge-hao | 8.9h | 0 | |
| #466 | fix: scale input bug inside AddBiasResidualLayerNorm kernel | @zobinHuang | 5 days | 0 | |
| #469 | [fix] fix integer overflow error | @luliyucoordinate | 3 days | 0 | |
| #462 | Feat/perf opt | @byshiue | 1m | 0 | |
| #460 | Fix Japanese Hugging face GPT conversion | @noppayut | 18.3h | 0 | |
| #458 | fix: link for the gpt-j slim weights | @f0rmiga | 23.1h | 0 | |
| #450 | Feat/cuda 12 support | @byshiue | 1m | 0 | |
| #448 | update t5 model conversion script to numpy mode | @lanking520 | 13.5h | 0 | |
| #447 | [Improvement] accelerate T5 model conversion and fix bloom model on multi-process | @lanking520 | 2 days | 4 | |
| #443 | [BUG FIX] place multi-processing init to main method | @lanking520 | 1 day | 0 | |