Collections 1

Distributed Training

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (arXiv:2304.11277)
Note: Paper on FSDP, PyTorch's implementation of ZeRO-3. In addition to that, reading the following blog posts might be an easier introduction (a minimal usage sketch is given after this list):
- PyTorch's blog post on FSDP: https://engineering.fb.com/2021/07/15/open-source/fsdp/
- DeepSpeed's tutorial on ZeRO: https://www.deepspeed.ai/tutorials/zero/

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv:1909.08053)
Note: Initial paper on Tensor Parallelism.

Reducing Activation Recomputation in Large Transformer Models (arXiv:2205.05198)
Note: To read after the Megatron-LM paper. It introduces an improvement over vanilla Tensor Parallelism called "Sequence Parallelism", which shards the activations along the sequence axis outside of the Tensor Parallel regions, mostly to save memory (see the second sketch after this list).

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arXiv:1811.06965)
Note: Initial paper on Pipeline Parallelism.
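To make the FSDP recommendation more concrete, here is a minimal sketch of wrapping a model with PyTorch's FullyShardedDataParallel; the model, optimizer, and hyperparameters are placeholders, not taken from the paper.

```python
# Minimal FSDP sketch (ZeRO-3 style sharding): parameters, gradients and optimizer
# state are partitioned across data-parallel ranks, and full parameters are only
# materialized (all-gathered) around each module's forward/backward pass.
# Launch with e.g.: torchrun --nproc_per_node=8 fsdp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Transformer(d_model=512, nhead=8).cuda()  # placeholder model
    model = FSDP(model)  # shard the model; build the optimizer on the wrapped parameters
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    src = torch.randn(10, 4, 512, device="cuda")  # dummy batch: [seq, batch, hidden]
    tgt = torch.randn(10, 4, 512, device="cuda")
    loss = model(src, tgt).pow(2).mean()  # dummy loss, just to exercise forward/backward
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

And a forward-only sketch of the Sequence Parallelism idea from the activation-recomputation paper: element-wise ops run on a per-rank shard of the sequence axis, and the shards are all-gathered right before the tensor-parallel region. This is illustrative only (the actual Megatron-LM implementation also handles the backward pass with a matching reduce-scatter); `tp_group` and the function name are placeholders.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def layernorm_then_enter_tp_region(x_local, weight, bias, tp_group):
    # x_local: this rank's activation shard, shape [seq_len // tp_size, batch, hidden].
    # LayerNorm (like dropout or residual adds) is element-wise along the sequence,
    # so it can run on the shard: activation memory is tp_size times smaller per rank.
    y_local = F.layer_norm(x_local, x_local.shape[-1:], weight, bias)

    # The tensor-parallel block (column-/row-parallel linears, attention) needs the
    # full sequence, so gather the shards from every rank in the TP group.
    tp_size = dist.get_world_size(group=tp_group)
    shards = [torch.empty_like(y_local) for _ in range(tp_size)]
    dist.all_gather(shards, y_local, group=tp_group)
    return torch.cat(shards, dim=0)  # [seq_len, batch, hidden]
```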
Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":1333,"gated":false,"id":"michaelbenayoun/t5-tiny-random","availableInferenceProviders":[],"lastModified":"2024-10-10T14:01:34.000Z","likes":0,"pipeline_tag":"text2text-generation","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":19,"gated":false,"id":"michaelbenayoun/llama-2-tiny-4kv-heads-2layers-random","availableInferenceProviders":[],"lastModified":"2024-05-07T15:36:13.000Z","likes":0,"pipeline_tag":"feature-extraction","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":27,"gated":false,"id":"michaelbenayoun/llama-2-tiny-4kv-heads-8layers-random","availableInferenceProviders":[],"lastModified":"2024-05-03T15:01:45.000Z","likes":0,"pipeline_tag":"feature-extraction","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":3979,"gated":false,"id":"michaelbenayoun/llama-2-tiny-4kv-heads-16layers-random","availableInferenceProviders":[],"lastModified":"2024-03-14T09:45:33.000Z","likes":0,"pipeline_tag":"feature-extraction","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":4431,"gated":false,"id":"michaelbenayoun/llama-2-tiny-16layers-random","availableInferenceProviders":[],"lastModified":"2024-01-09T14:05:36.000Z","likes":0,"pipeline_tag":"feature-extraction","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":147,"gated":false,"id":"michaelbenayoun/llama-2-tiny-16layers-32kv-heads-random","availableInferenceProviders":[],"lastModified":"2024-01-04T16:14:26.000Z","likes":0,"pipeline_tag":"feature-extraction","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael 
Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":24,"gated":false,"id":"michaelbenayoun/gpt-neox-tiny-4layers-random","availableInferenceProviders":[],"lastModified":"2024-01-04T15:37:36.000Z","likes":0,"pipeline_tag":"feature-extraction","private":false,"repoType":"model","isLikedByUser":false},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":205,"gated":false,"id":"michaelbenayoun/mistral-tiny-4layers-8kv-heads-random","availableInferenceProviders":[],"lastModified":"2023-11-09T10:46:23.000Z","likes":0,"pipeline_tag":"text-generation","private":false,"repoType":"model","isLikedByUser":false,"widgetOutputUrls":[]},{"author":"michaelbenayoun","authorData":{"_id":"6047a3315da6ba4b1dfb9e18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","fullname":"Michael Benayoun","name":"michaelbenayoun","type":"user","isPro":false,"isHf":true,"isMod":false,"followerCount":64},"downloads":140,"gated":false,"id":"michaelbenayoun/llama-2-tiny-4layers-random","availableInferenceProviders":[],"lastModified":"2023-11-06T09:42:19.000Z","likes":0,"pipeline_tag":"text-generation","private":false,"repoType":"model","isLikedByUser":false,"widgetOutputUrls":[]}],"numberLikes":30,"papers":[],"posts":[],"totalPosts":0,"spaces":[],"u":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1615890856777-6047a3315da6ba4b1dfb9e18.png","isPro":false,"fullname":"Michael Benayoun","user":"michaelbenayoun","orgs":[{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","userRole":"write","type":"org","isHf":true,"details":"The AI community building the future.","isEnterprise":true,"numUsers":209},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1653062536500-5e9ecfc04957053f60648a3e.png","fullname":"Hugging Face Internal Testing Organization","name":"hf-internal-testing","userRole":"admin","type":"org","isHf":false,"numUsers":39},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/641d367e45487810d13800ca/40TPYJA9S2kxHqRLawrIs.png","fullname":"AWS Inferentia and Trainium","name":"aws-neuron","userRole":"write","type":"org","isHf":false,"numUsers":30},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1653061054662-5ff5d596f244529b3ec0fb89.png","fullname":"Hugging Face Optimum","name":"optimum","userRole":"admin","type":"org","isHf":false,"details":"Accelerating DL","numUsers":15},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1653306672419-5dd96eb166059660ed1ee413.png","fullname":"HF Canonical Model Maintainers","name":"hf-maintainers","userRole":"write","type":"org","isHf":false,"numUsers":10},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1659521200179-5e48005437cb5b49818287a5.png","fullname":"BigCode","name":"bigcode","userRole":"contributor","type":"org","isHf":false,"isEnterprise":true,"numUsers":360},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/60a551a34ecc5d054c8ad93e/zj4Vyk5keZrNRfy1wWR4D.png","fullname":"Paris AI Running 
Club","name":"paris-ai-running-club","userRole":"read","type":"org","isHf":false,"numUsers":62},{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5e67c47c100906368940747e/QKH5mbtZH_GuoR-cpv9kl.png","fullname":"Hugging Face Machine Learning Optimization","name":"hf-ml-opt","userRole":"write","type":"org","isHf":false,"numUsers":11},{"avatarUrl":"https://www.gravatar.com/avatar/b2b92654970640f6225d02fb6fc48239?d=retro&size=100","fullname":"Optimum Internal Testing","name":"optimum-internal-testing","userRole":"admin","type":"org","isHf":false,"numUsers":11}],"signup":{"github":"michaelbenayoun","details":"","homepage":"","twitter":"michaelbenayou1"},"isHf":true,"isMod":false,"type":"user"},"upvotes":1,"repoFilterModels":{"sortKey":"modified"},"repoFilterDatasets":{"sortKey":"modified"},"repoFilterSpaces":{"sortKey":"modified"},"numFollowers":64,"numFollowingUsers":13,"numFollowingOrgs":11,"isFollowing":false,"isFollower":false,"sampleFollowers":[{"user":"MartaVigara","fullname":"Vigara","type":"user","_id":"6333e65e86d47274100070b2","isPro":false,"avatarUrl":"/avatars/5d8bf3f075e2375af2c152fcc9e981d4.svg"},{"user":"shuyuej","fullname":"Shuyue Jia (Bruce)","type":"user","_id":"63874ebe20244d72a740548f","isPro":false,"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63874ebe20244d72a740548f/pmREOHPGNwpG1_Eeif6Sv.jpeg"},{"user":"regisss","fullname":"Régis Pierrard","type":"user","_id":"620b7c408f5871b8a1a168a7","isPro":false,"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620b7c408f5871b8a1a168a7/49M2lucv3I24rOMJFnhVd.jpeg"},{"user":"Weyaxi","fullname":"Yağız Çalık","type":"user","_id":"6468ce47e134d050a58aa89c","isPro":false,"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6468ce47e134d050a58aa89c/ApFcPlOzgI6Cjr0SYPpk6.png"}],"isWatching":false,"hardwareItems":[{"sku":["Apple Silicon","-","Apple M3 Max"],"mem":96,"num":1}],"acceptLanguages":["en","*"]}">

Michael Benayoun

michaelbenayoun

AI & ML interests

None yet

Recent Activity

liked a Space 22 days ago
nanotron/ultrascale-playbook
liked a model about 1 month ago
deepseek-ai/DeepSeek-R1

Organizations

Hugging Face, Hugging Face Internal Testing Organization, AWS Inferentia and Trainium, Hugging Face Optimum, HF Canonical Model Maintainers, BigCode, Paris AI Running Club, Hugging Face Machine Learning Optimization, Optimum Internal Testing

Articles 2

Scaling up BERT-like model Inference on modern CPU - Part 2

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

datasets

None public yet