Collections
Distributed Training
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (arXiv:2304.11277). Paper on FSDP, PyTorch's implementation of ZeRO-3 (see the usage sketch after this list). In addition to that, the following blog posts might be an easier introduction:
  - PyTorch's blog post on FSDP: https://engineering.fb.com/2021/07/15/open-source/fsdp/
  - DeepSpeed's tutorial on ZeRO: https://www.deepspeed.ai/tutorials/zero/
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv:1909.08053). Initial paper on Tensor Parallelism.
- Reducing Activation Recomputation in Large Transformer Models (arXiv:2205.05198). To read after the Megatron-LM paper: it introduces an improvement over vanilla Tensor Parallelism called "Sequence Parallelism", which shards the activations along the sequence axis outside of the Tensor Parallel regions, mostly to save memory.
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arXiv:1811.06965). Initial paper on Pipeline Parallelism.
Michael Benayoun
michaelbenayoun
AI & ML interests
None yet
Organizations
Hugging Face · Hugging Face Internal Testing Organization · AWS Inferentia and Trainium · Hugging Face Optimum · HF Canonical Model Maintainers · BigCode · Paris AI Running Club · Hugging Face Machine Learning Optimization · Optimum Internal Testing
Articles
Scaling up BERT-like model Inference on modern CPU - Part 2
Introducing Optimum: The Optimization Toolkit for Transformers at Scale
Models
michaelbenayoun/llama-2-tiny-4kv-heads-4layers-random • Text Generation • Updated Oct 14, 2024 • 5.05k downloads
michaelbenayoun/t5-tiny-random • Text2Text Generation • Updated Oct 10, 2024 • 1.33k downloads
michaelbenayoun/llama-2-tiny-4kv-heads-2layers-random • Feature Extraction • Updated May 7, 2024 • 19 downloads
michaelbenayoun/llama-2-tiny-4kv-heads-8layers-random • Feature Extraction • Updated May 3, 2024 • 27 downloads
michaelbenayoun/llama-2-tiny-4kv-heads-16layers-random • Feature Extraction • Updated Mar 14, 2024 • 3.98k downloads
michaelbenayoun/llama-2-tiny-16layers-random • Feature Extraction • Updated Jan 9, 2024 • 4.43k downloads
michaelbenayoun/llama-2-tiny-16layers-32kv-heads-random • Feature Extraction • Updated Jan 4, 2024 • 147 downloads
michaelbenayoun/gpt-neox-tiny-4layers-random • Feature Extraction • Updated Jan 4, 2024 • 24 downloads
michaelbenayoun/mistral-tiny-4layers-8kv-heads-random • Text Generation • Updated Nov 9, 2023 • 205 downloads
michaelbenayoun/llama-2-tiny-4layers-random • Text Generation • Updated Nov 6, 2023 • 140 downloads