NVIDIA/FasterTransformer

C++ 6.4k stars

Transformer related optimization, including BERT, GPT

README badge: [![ngmi](https://ngmi.review/badge/NVIDIA/FasterTransformer.svg)](https://ngmi.review/repo/NVIDIA/FasterTransformer)
- Merged PRs: 96
- Avg merge time: 1 day
- Fastest PR: 0m
- Slowest PR: 24 days
- Global speed rank: #243
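The dashboard's duration stats mix units ("0m", "8.6h", "24 days"). As a minimal sketch of how such values could be normalized for averaging, here is a hypothetical parser (not the site's actual code) that handles the formats visible in the table below:

```python
import re

def parse_duration_hours(s: str) -> float:
    """Convert dashboard-style durations ('2m', '8.6h', '1 day', '20 days') to hours."""
    m = re.fullmatch(r"([\d.]+)\s*(m|h|day|days)", s.strip())
    if not m:
        raise ValueError(f"unrecognized duration: {s!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "m":
        return value / 60          # minutes -> hours
    if unit == "h":
        return value               # already hours
    return value * 24              # day/days -> hours
```

With a normalizer like this, an average merge time is just the mean of the parsed values across merged PRs.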

PR Size Analysis

Lines changed (additions + deletions) vs review outcomes.

[Charts: PRs by size · Avg review time (hrs) · Clean approval rate (%)]

Top Reviewers

Recent Merged PRs

| # | Title | Author | Time | Reviews | Blocks |
|---|-------|--------|------|---------|--------|
| #705 | remove mpi_cxx from multi-gpu build | @dwyatte | 8.6h | 0 | |
| #660 | Support size_per_head=112 | @dskhudia | 20 days | 2 | |
| #683 | fix: Custom AllReduce swapTensor bug | @PerkzZheng | 24m | 0 | |
| #672 | [bugfix] Fix 2-shot Custom All Reduce kernel correctness issue (indexing bug). | @rkindi | 5 days | 0 | |
| #584 | Fix/gpt early stop | @byshiue | 2m | 0 | |
| #568 | [enhancement] improve the performace of bloom model conversion, reduce the memory and time cost | @Yangruipis | 4 days | 0 | |
| #569 | [Enhancement]create huggingface_gptneox_convert.py | @zhang-ge-hao | 3 days | 0 | |
| #524 | [fix] fix overflow in softmax_kernel when process long seqlen and big batch… | @zhangxin81 | 22 days | 0 | |
| #550 | [Enhancement]add pytorch backend support for gptneox | @zhang-ge-hao | 10 days | 7 | |
| #529 | Fix typo in gpt_guide.md | @Ying1123 | 19m | 0 | |
| #505 | fix: gpt tensor shapes inconsistency | @zhang-ge-hao | 8.9h | 0 | |
| #466 | fix: scale input bug inside AddBiasResidualLayerNorm kernel | @zobinHuang | 5 days | 0 | |
| #469 | [fix] fix integer overflow error | @luliyucoordinate | 3 days | 0 | |
| #462 | Feat/perf opt | @byshiue | 1m | 0 | |
| #460 | Fix Japanese Hugging face GPT conversion | @noppayut | 18.3h | 0 | |
| #458 | fix: link for the gpt-j slim weights | @f0rmiga | 23.1h | 0 | |
| #450 | Feat/cuda 12 support | @byshiue | 1m | 0 | |
| #448 | update t5 model conversion script to numpy mode | @lanking520 | 13.5h | 0 | |
| #447 | [Improvement] accelerate T5 model conversion and fix bloom model on multi-process | @lanking520 | 2 days | 4 | |
| #443 | [BUG FIX] place multi-processing init to main method | @lanking520 | 1 day | 0 | |