<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Optimization |</title><link>http://josh-mccamley.com/tags/optimization/</link><atom:link href="http://josh-mccamley.com/tags/optimization/index.xml" rel="self" type="application/rss+xml"/><description>Optimization</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 29 Nov 2024 00:00:00 +0000</lastBuildDate><image><url>http://josh-mccamley.com/media/logo.svg</url><title>Optimization</title><link>http://josh-mccamley.com/tags/optimization/</link></image><item><title>FLECS</title><link>http://josh-mccamley.com/blog/flecs/</link><pubDate>Fri, 29 Nov 2024 00:00:00 +0000</pubDate><guid>http://josh-mccamley.com/blog/flecs/</guid><description>&lt;style&gt;
/* Shrink image captions specifically for this blog post */
figcaption {
font-size: 0.85rem !important;
line-height: 1.4 !important;
opacity: 0.8;
text-align: center;
}
figcaption p {
font-size: inherit !important;
margin-bottom: 0 !important;
}
/* ========================================== */
/* 1. MAIN GLASS CENTRE PIECE */
/* ========================================== */
article {
background-color: rgba(30, 41, 59, 0.6) !important;
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 1.5rem;
padding: 3rem;
margin-top: 3rem;
margin-bottom: 3rem;
/* Widens and centers the island */
width: 100% !important;
max-width: 950px !important;
margin-left: auto !important;
margin-right: auto !important;
}
/* Base text shrinking for elegance */
article p,
article li {
font-size: 0.95rem !important;
line-height: 1.6 !important;
}
/* Image Captions */
article figcaption {
font-size: 0.85rem !important;
line-height: 1.4 !important;
opacity: 0.8;
text-align: center;
}
article figcaption p {
font-size: inherit !important;
margin-bottom: 0 !important;
}
/* ========================================== */
/* 2. FLOATING TOC GLASS BOX */
/* ========================================== */
.docs-toc,
aside nav,
.js-toc {
background-color: rgba(30, 41, 59, 0.6) !important;
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 1rem;
padding: 1.5rem;
margin-top: 3rem;
}
&lt;/style&gt;
&lt;script&gt;
document.addEventListener('DOMContentLoaded', () =&gt; {
// 1. Table of Contents Scroll Highlighting
const observer = new IntersectionObserver((entries) =&gt; {
entries.forEach(entry =&gt; {
const id = entry.target.getAttribute('id');
const link = document.querySelector(`.hb-toc a[href="#${id}"]`);
if (entry.isIntersecting &amp;&amp; link) {
document.querySelectorAll('.hb-toc a').forEach(l =&gt; l.classList.remove('red-pill-active'));
link.classList.add('red-pill-active');
}
});
}, { rootMargin: '-20% 0px -70% 0px' });
document.querySelectorAll('article h2, article h3').forEach(h =&gt; observer.observe(h));
});
&lt;/script&gt;
&lt;p&gt;In the previous post, I proved Data-Oriented Design could simulate 100,000+ projectiles without melting the CPU. But Unreal’s Mass Entity framework came with suffocating boilerplate. I needed that same DOD performance, but with an API that didn&amp;rsquo;t fight me, or my teammates if they ever needed to expose more functionality.&lt;/p&gt;
&lt;p&gt;I ripped Mass out entirely and integrated FLECS, a lightweight C/C++ ECS, replacing dozens of Unreal asset files with about 150 lines of clean code. FLECS is widely used across the industry as a way to adopt ECS without building one from scratch.&lt;/p&gt;
&lt;p&gt;For our case, the actual flow of interacting with Mass just wasn’t intuitive. An individual Data Asset had to be made for each projectile. Data was stored in Fragments, which function as structs, making retrieving and sending data super messy. On top of that, Mass is so fragile that hit events had to be communicated through Interface events that couldn’t simply be inherited from parents, and instead had to be implemented on every single actor in the game.&lt;/p&gt;
&lt;p&gt;But shoving a third-party C++ library into Unreal Engine 5 led me to a few bugs, like Niagara only accepting 32-bit Particle IDs while FLECS uses 64-bit ones. The same In-Frame latency also came up, which was thankfully fixed in a very similar way, so it didn’t require too much trial and error. On top of that, I hit a funky Garbage Collector crash, which I fixed by validating whether a FLECS value still existed before purging it.&lt;/p&gt;
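&lt;p&gt;To make the ID-width problem concrete, here is a minimal sketch of one way to bridge it, assuming a lookup table that hands every live 64-bit entity handle a compact 32-bit ID. The names &lt;code&gt;ParticleIdMap&lt;/code&gt;, &lt;code&gt;Acquire&lt;/code&gt;, &lt;code&gt;Release&lt;/code&gt; and &lt;code&gt;IsTracked&lt;/code&gt; are hypothetical and are not part of FLECS, Niagara or the project code. It is plain standard C++ rather than the actual Subsystem.&lt;/p&gt;
<![CDATA[
```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not FLECS or Niagara API): maps 64-bit entity handles
// to compact 32-bit IDs for a consumer that only accepts 32-bit particle IDs.
class ParticleIdMap {
public:
    // Returns the 32-bit ID for an entity, allocating one on first use.
    std::uint32_t Acquire(std::uint64_t entity) {
        auto it = ids_.find(entity);
        if (it != ids_.end()) {
            return it->second;
        }
        std::uint32_t id;
        if (!free_.empty()) {
            id = free_.back();  // reuse an ID released by a dead entity
            free_.pop_back();
        } else {
            assert(next_ != UINT32_MAX && "ran out of 32-bit IDs");
            id = next_++;       // otherwise mint a fresh one
        }
        ids_.emplace(entity, id);
        return id;
    }

    // Releases the ID when the entity dies. Returns false for handles that
    // were never registered or were already released, so stale handles can
    // be skipped instead of being purged twice.
    bool Release(std::uint64_t entity) {
        auto it = ids_.find(entity);
        if (it == ids_.end()) {
            return false;
        }
        free_.push_back(it->second);
        ids_.erase(it);
        return true;
    }

    // True while the entity still owns a 32-bit ID.
    bool IsTracked(std::uint64_t entity) const {
        return ids_.count(entity) != 0;
    }

private:
    std::unordered_map<std::uint64_t, std::uint32_t> ids_;
    std::vector<std::uint32_t> free_;
    std::uint32_t next_ = 0;
};
```
]]>
&lt;p&gt;The &lt;code&gt;Release&lt;/code&gt; return value doubles as the kind of &amp;ldquo;does this still exist?&amp;rdquo; check described above. Recycling IDs straight away is a simplification here. A real setup might delay reuse so a trail effect never mistakes a new bullet for an old one.&lt;/p&gt;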
&lt;p&gt;The spawn code alone went from this:&lt;/p&gt;
&lt;div style="margin-bottom: 2.5rem; border-radius: 1rem; overflow: hidden; border: 1px solid rgba(255, 255, 255, 0.1); box-shadow: 0 10px 30px rgba(0,0,0,0.5); background-color: rgba(15, 23, 42, 0.4);"&gt;
&lt;img src="Old.png" class="zoomable" alt="Spawn code before, using Mass" style="width: 100%; height: auto; display: block; margin: 0 !important;" /&gt;
&lt;div style="padding: 0.75rem 1rem; text-align: center; font-size: 0.85rem; color: #94a3b8; font-style: italic;"&gt;
Spawn code before, using Mass.
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To this!&lt;/p&gt;
&lt;div style="margin-bottom: 2.5rem; border-radius: 1rem; overflow: hidden; border: 1px solid rgba(255, 255, 255, 0.1); box-shadow: 0 10px 30px rgba(0,0,0,0.5); background-color: rgba(15, 23, 42, 0.4);"&gt;
&lt;img src="new.png" class="zoomable" alt="Spawn code after, using FLECS" style="width: 100%; height: auto; display: block; margin: 0 !important;" /&gt;
&lt;div style="padding: 0.75rem 1rem; text-align: center; font-size: 0.85rem; color: #94a3b8; font-style: italic;"&gt;
Spawn code after, using FLECS.
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The only code that needed to be written is a single Subsystem! It’s made our lives so much easier and allowed us to send more “traditional” projectile data. Objects like Damage Causers, Instigators, and even Damage Types were easily sent across using Weak Pointers, so the multi-threading remained stable, and this time Niagara had no issues assigning Particle IDs! This Projectile Saga is easily the most complex system I’ve ever worked with, and I still feel like the rest of the industry is using something that’s even beyond this. But for now, FLECS will be more than enough to serve our needs!&lt;/p&gt;</description></item><item><title>The Bullet Factory</title><link>http://josh-mccamley.com/blog/react-performance/</link><pubDate>Fri, 22 Nov 2024 00:00:00 +0000</pubDate><guid>http://josh-mccamley.com/blog/react-performance/</guid><description>&lt;style&gt;
/* 1. Set the fixed, darkened background image for the whole page */
body {
background-image: linear-gradient(rgba(15, 23, 42, 0.55), rgba(15, 23, 42, 0.85)), url('featured.png') !important;
background-size: cover !important;
background-attachment: fixed !important;
background-position: top !important;
}
/* 2. Wrap your content in a premium "glass" card */
article {
background-color: rgba(30, 41, 59, 0.6) !important;
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 1.5rem;
padding: 3rem;
margin-top: 3rem;
margin-bottom: 3rem;
/* Widens and centers the island */
width: 100% !important;
max-width: 950px !important;
margin-left: auto !important;
margin-right: auto !important;
}
/* Prevent browser anchor jumps from overshooting */
article h2, article h3 {
scroll-margin-top: 120px !important;
}
/* 3. Table of Contents - Glass Card &amp; Links */
.hb-toc &gt; div {
background-color: rgba(30, 41, 59, 0.6) !important;
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border-radius: 1rem;
padding: 1.5rem !important;
border-left: 4px solid #e05e5e !important;
height: fit-content !important;
margin-top: 3rem !important;
}
.hb-toc p {
color: white !important;
font-size: 1.1rem !important;
margin-bottom: 1rem !important;
font-weight: 600 !important;
text-transform: none !important;
}
.hb-toc ul {
list-style: none !important;
padding-left: 0 !important;
margin: 0 !important;
}
.hb-toc ul ul {
padding-left: 1rem !important;
}
.hb-toc a {
color: #94a3b8 !important;
text-decoration: none !important;
display: block !important;
padding: 0.35rem 0.8rem !important;
border-radius: 9999px !important;
transition: all 0.2s ease-in-out;
margin-bottom: 0.25rem !important;
font-size: 0.9rem !important;
border: 1px solid transparent !important;
}
.hb-toc a:hover {
color: white !important;
background-color: rgba(255, 255, 255, 0.05) !important;
}
/* --- GUARANTEED RED PILL ACTIVE STATE --- */
.hb-toc a.red-pill-active {
color: white !important;
border: 1px solid #e05e5e !important;
background-color: transparent !important;
}
/* ========================================== */
/* TONY'S HIGHLIGHTS &amp; TAGS CSS */
/* ========================================== */
/* Hide the native Hugo Blox tags block at the very bottom */
.article-tags,
.pub-tags,
div:has(&gt; a[href*="/tags/"]) {
display: none !important;
}
/* Base text shrinking for elegance */
article p,
article li {
font-size: 0.95rem !important;
line-height: 1.6 !important;
}
/* Image Captions */
article figcaption {
font-size: 0.85rem !important;
line-height: 1.4 !important;
opacity: 0.8;
text-align: center;
}
article figcaption p {
font-size: inherit !important;
margin-bottom: 0 !important;
}
&lt;/style&gt;
&lt;script&gt;
document.addEventListener('DOMContentLoaded', () =&gt; {
// 1. Table of Contents Scroll Highlighting
const observer = new IntersectionObserver((entries) =&gt; {
entries.forEach(entry =&gt; {
const id = entry.target.getAttribute('id');
const link = document.querySelector(`.hb-toc a[href="#${id}"]`);
if (entry.isIntersecting &amp;&amp; link) {
document.querySelectorAll('.hb-toc a').forEach(l =&gt; l.classList.remove('red-pill-active'));
link.classList.add('red-pill-active');
}
});
}, { rootMargin: '-20% 0px -70% 0px' });
document.querySelectorAll('article h2, article h3').forEach(h =&gt; observer.observe(h));
});
&lt;/script&gt;
&lt;p&gt;During some research and development at the start of the academic year, I needed to simulate thousands of active projectiles with collision, ricochets, penetration, and team affiliation, all while sending enough data to a Niagara system to allow for variation.&lt;/p&gt;
&lt;p&gt;In this post, I want to talk about the journey of achieving this in UE5 (and the pain it caused). I’ll cover why standard Actors buckle under the weight, why pure Niagara is limiting, and how I solved the problem by building a multi-threaded ECS (Entity Component System) using Epic&amp;rsquo;s Mass Framework, along with the brutal visual bugs I had to overcome to get there.&lt;/p&gt;
&lt;p&gt;I wrote a full dissertation that breaks down the benchmark results in far more detail, which you can read below:&lt;/p&gt;
&lt;iframe src="https://drive.google.com/file/d/16Td_qokedhmdSj0rU0_FCwcCKTpfwWnB/preview" width="100%" height="640" allow="autoplay"&gt;&lt;/iframe&gt;
&lt;h2 id="attempt-1-actor-based"&gt;Attempt 1: Actor based&lt;/h2&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;&lt;img alt="alt text"
src="http://josh-mccamley.com/blog/react-performance/SOLDRIFTGIF.gif"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id="the-approach"&gt;The Approach:&lt;/h3&gt;
&lt;p&gt;The standard Unreal object-oriented way. You create a BP_Projectile Actor with a ProjectileMovementComponent and spawn one for every bullet.&lt;/p&gt;
&lt;h3 id="why-it-failed"&gt;Why it failed:&lt;/h3&gt;
&lt;p&gt;Actors can be heavy, and Object-Oriented Programming is not hardware sympathetic. Every AActor comes with massive overhead including UObject tracking, garbage collection polling, transform hierarchies, and virtual function ticks.&lt;/p&gt;
&lt;p&gt;In my benchmarking, the Actor system failed completely at 160,500 projectiles, locking up the CPU with a frame time of over 19 seconds (~0.05 FPS). The memory footprint was equally disastrous, consuming over 5.3 GB of RAM, roughly 27 KB per projectile. Furthermore, because the frame rate tanked, the collision detection suffered from severe &amp;ldquo;tunneling&amp;rdquo; (temporal aliasing, where fast projectiles step clean past targets between frames), dropping to a dismal 3.75% collision accuracy at just 25,000 entities. In retrospect, the tunneling could probably have been fixed by enabling CCD (Continuous Collision Detection) on the collision spheres, but that would have added even more to the frame time, and I wasn’t about to let my computer suffer any more.&lt;/p&gt;
&lt;h3 id="the-outcome"&gt;The Outcome:&lt;/h3&gt;
&lt;p&gt;Great for rocket launchers and anything that needed real physics. Terrible for miniguns and bullet hells. We were hitting a &amp;ldquo;Cache Wall&amp;rdquo; where the CPU spent more time fetching fragmented object memory from RAM than doing actual math.&lt;/p&gt;
&lt;p&gt;This was our primary implementation for a long time, and we didn’t really notice the performance cost until we scaled up the action.&lt;/p&gt;
&lt;h2 id="attempt-2-niagara"&gt;Attempt 2: Niagara&lt;/h2&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="Niagara-Breakdown"
srcset="http://josh-mccamley.com/blog/react-performance/GPU-Projectile_hu_a824df70ae887404.webp 320w, http://josh-mccamley.com/blog/react-performance/GPU-Projectile_hu_b5d06d4af37b49c7.webp 480w, http://josh-mccamley.com/blog/react-performance/GPU-Projectile_hu_d983db98dba65c66.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="http://josh-mccamley.com/blog/react-performance/GPU-Projectile_hu_a824df70ae887404.webp"
width="760"
height="413"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id="the-approach-1"&gt;The Approach:&lt;/h3&gt;
&lt;p&gt;Ditch the CPU Actors entirely. Do the math in Niagara and let the GPU render the bullets and handle collisions using raytracing. Signed Distance Field (SDF) collisions were also an option, but our environment and characters had geometry that was too small and too detailed to consistently generate clean SDFs, so collisions varied from one launch of the game to the next. I probably could have got around this with baked SDFs, but I didn’t have the experience or time to mess around with that, so raytracing it is!&lt;/p&gt;
&lt;h3 id="why-it-failed-1"&gt;Why it failed:&lt;/h3&gt;
&lt;p&gt;While rendering arrays in Niagara is incredibly fast, processing 250,000 particles in just ~4.3ms, doing complex gameplay logic is a nightmare.&lt;/p&gt;
&lt;p&gt;Niagara likes to operate as a walled-off simulation. Because the GPU simulation runs asynchronously from the CPU Game Thread, by the time the GPU reports a collision back to the CPU, the game state has already moved on. In my collision accuracy benchmarks, Niagara peaked at a completely nonviable 6.08% accuracy, frequently dropping to 0%. This was such a killer. And sadly the delay is baked into how the CPU and GPU communicate: results are read back from the GPU asynchronously to avoid stalling the pipeline, so they typically reach the Game Thread after the frame that produced them. Again, a custom HLSL solution within Niagara modules would probably have done it, but that kind of functionality with HLSL is beyond my current capabilities.&lt;/p&gt;
&lt;h3 id="the-outcome-1"&gt;The Outcome:&lt;/h3&gt;
&lt;p&gt;Perfect for sparks and rain. Functionally useless for registering reliable gameplay events like damage and ricochets.&lt;/p&gt;
&lt;h2 id="attempt-3-multi-threaded-ecs"&gt;Attempt 3: Multi-threaded ECS&lt;/h2&gt;
&lt;iframe width="600" height="400" src="https://www.youtube.com/embed/uqNshaHhEa0" title="Raw Gameplay" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;To get the best of both worlds, fast math and fast rendering, I moved to Data-Oriented Design using Unreal's experimental Mass Entity Framework. Instead of monolithic hierarchical objects, data is strictly separated into Fragments (Velocity, Transform) and stored in contiguous memory arrays.&lt;/p&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="Mass-Breakdown"
srcset="http://josh-mccamley.com/blog/react-performance/MassComposition1_hu_a6802c032b683e16.webp 320w, http://josh-mccamley.com/blog/react-performance/MassComposition1_hu_d43c3eace9e6e67a.webp 480w, http://josh-mccamley.com/blog/react-performance/MassComposition1_hu_883ebd4942320b15.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="http://josh-mccamley.com/blog/react-performance/MassComposition1_hu_a6802c032b683e16.webp"
width="760"
height="239"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
This repo on GitHub offered some really great examples of how to use Mass Entity in UE5. This solution would’ve taken so much longer without its two authors, considering Epic hasn’t updated their documentation for this plugin (which has been in beta since 5.0, by the way) in 3 years!
&lt;/p&gt;
&lt;h2 id="the-benefits-of-mass"&gt;The Benefits of Mass&lt;/h2&gt;
&lt;h3 id="batch-processing"&gt;Batch Processing:&lt;/h3&gt;
&lt;p&gt;Instead of invoking a virtual Tick function 100,000 times, Mass Processors iterate over chunks of data in a single massive loop. The Game Thread processed 100,000 entities in “just” ~65ms, a 131x improvement over the Actor system.&lt;/p&gt;
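&lt;p&gt;The batch idea can be sketched in plain C++. This is an illustration only, assuming one contiguous array per field. &lt;code&gt;ProjectileChunk&lt;/code&gt; and &lt;code&gt;IntegrateChunk&lt;/code&gt; are hypothetical names standing in for Mass Fragments and a Processor, not Mass API.&lt;/p&gt;
<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for ECS fragments: one contiguous array per field
// (structure of arrays) instead of one heap object per projectile.
struct ProjectileChunk {
    std::vector<float> posX, posY, posZ;
    std::vector<float> velX, velY, velZ;

    std::size_t Size() const { return posX.size(); }

    // Appends one projectile by pushing a value onto every field array.
    void Add(float px, float py, float pz, float vx, float vy, float vz) {
        posX.push_back(px);
        posY.push_back(py);
        posZ.push_back(pz);
        velX.push_back(vx);
        velY.push_back(vy);
        velZ.push_back(vz);
    }
};

// One tight loop over the whole chunk: no per-object virtual Tick call,
// and every array is read front to back.
inline void IntegrateChunk(ProjectileChunk& chunk, float dt) {
    assert(chunk.posX.size() == chunk.velX.size());
    const std::size_t count = chunk.Size();
    for (std::size_t i = 0; i < count; ++i) {
        chunk.posX[i] += chunk.velX[i] * dt;
        chunk.posY[i] += chunk.velY[i] * dt;
        chunk.posZ[i] += chunk.velZ[i] * dt;
    }
}
```
]]>
&lt;p&gt;Calling &lt;code&gt;IntegrateChunk(chunk, deltaTime)&lt;/code&gt; once per frame moves every projectile in the chunk. Independent chunks could be handed to different worker threads, which is the property the multi-threaded setup relies on.&lt;/p&gt;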
&lt;h3 id="data-locality--memory"&gt;Data Locality &amp;amp; Memory:&lt;/h3&gt;
&lt;p&gt;By packing data efficiently, we achieved linear scalability. Memory usage barely broke 1,250 MB under heavy load, which, as expected, points to the Actor system’s memory bloat being entirely self-inflicted.&lt;/p&gt;
&lt;h3 id="collision-stability"&gt;Collision Stability:&lt;/h3&gt;
&lt;p&gt;Because the frame times remained stable, the physics delta-time remained tight. In the benchmarks, Mass maintained a solid ~18.6% collision accuracy on moving targets even under heavy loads, comfortably ahead of the asynchronous GPU traces.&lt;/p&gt;
&lt;p&gt;After writing the Dissertation, I exposed more functionality on my Mass Collision Processor, letting me tune my projectiles to hit a cool 100% collision accuracy, which is awesome! I did try to bake in the collision capabilities that come with Chaos Physics, but that led to some real funkiness, and frankly would probably have caused issues for the main Game Thread. So instead the projectiles use a simple line trace (or a sphere trace for thicker projectiles), which is easily delegated to multiple threads.&lt;/p&gt;
&lt;p&gt;One more little point on the collisions: the reason the original collision accuracy in my benchmarking was so low was that the traces were firing at a fixed length, rather than being scaled by delta time and velocity like they should’ve been. That one change fixed the collision inconsistencies.&lt;/p&gt;
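&lt;p&gt;The difference between a fixed-length trace and one scaled by velocity and delta time can be sketched as follows. This is a simplified illustration in standard C++, assuming a sphere target and a straight-line sweep. &lt;code&gt;Vec3&lt;/code&gt;, &lt;code&gt;TraceEnd&lt;/code&gt; and &lt;code&gt;SegmentHitsSphere&lt;/code&gt; are hypothetical helpers, not Unreal trace functions.&lt;/p&gt;
<![CDATA[
```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

inline Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 Scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// End point of this frame's trace: the sweep covers exactly the distance
// the projectile travels during the frame (velocity * delta time).
inline Vec3 TraceEnd(Vec3 position, Vec3 velocity, float dt) {
    return Add(position, Scale(velocity, dt));
}

// Returns true if the segment from `start` to `end` passes within `radius`
// of `center`, using the closest point on the segment to the center.
inline bool SegmentHitsSphere(Vec3 start, Vec3 end, Vec3 center, float radius) {
    assert(radius >= 0.0f);
    const Vec3 segment = Sub(end, start);
    const float lengthSq = Dot(segment, segment);
    float t = 0.0f;  // a zero-length segment degenerates to a point test
    if (lengthSq > 0.0f) {
        t = Dot(Sub(center, start), segment) / lengthSq;
        t = std::fmax(0.0f, std::fmin(1.0f, t));  // clamp onto the segment
    }
    const Vec3 closest = Add(start, Scale(segment, t));
    const Vec3 offset = Sub(center, closest);
    return Dot(offset, offset) <= radius * radius;
}
```
]]>
&lt;p&gt;With a bullet moving 500 units per frame and a trace fixed at 100 units, a target sitting 300 units ahead falls in the untested gap. A sweep from the current position to &lt;code&gt;TraceEnd&lt;/code&gt; covers the full step and catches it.&lt;/p&gt;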
&lt;figure&gt;&lt;img src="http://josh-mccamley.com/blog/react-performance/ECS-Gif.gif"
alt="ECS-Benchmark"&gt;&lt;figcaption&gt;
&lt;p&gt;&lt;em&gt;Negligible performance hit below ~12,000 projectiles on screen at once (on my 6-year-old laptop!). More details in the Dissertation.&lt;/em&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;hr&gt;
&lt;h2 id="the-drawbacks--challenges"&gt;The Drawbacks &amp;amp; Challenges&lt;/h2&gt;
&lt;p&gt;While the performance of Mass was staggering, the implementation was brutal. Moving outside of Unreal&amp;rsquo;s standard ecosystem introduced some wild hurdles:&lt;/p&gt;
&lt;h3 id="the-boilerplate--usability-gap"&gt;The Boilerplate &amp;amp; Usability Gap:&lt;/h3&gt;
&lt;p&gt;As my research showed, the performance of Mass comes at the cost of immense implementation complexity. Setting up simple collision required custom Processors, Shared Fragments, and Observers. The lack of documentation makes this framework incredibly difficult to adopt.&lt;/p&gt;
&lt;h3 id="the-ribbon-spaghetti"&gt;The Ribbon Spaghetti:&lt;/h3&gt;
&lt;p&gt;Because the ECS keeps memory contiguous by grabbing the last item in an array and &amp;ldquo;swapping&amp;rdquo; it into a dead entity&amp;rsquo;s slot, array indexes constantly change. When I pushed this data to Niagara to draw ribbon trails and other effects, Niagara assumed Array Index 0 was still Bullet A. When Bullet A died and Bullet Z took its memory slot, Niagara thought the bullet had teleported, drawing a massive ribbon across the map and turning the level into a bowl of fluorescent spaghetti.&lt;/p&gt;
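&lt;p&gt;The swap-and-pop behaviour is easy to reproduce in a few lines. The sketch below is standard C++ with hypothetical names (&lt;code&gt;Bullet&lt;/code&gt;, &lt;code&gt;SwapRemove&lt;/code&gt;, &lt;code&gt;ExportForRender&lt;/code&gt;). It also shows one common remedy, tagging each bullet with a stable ID that travels with its data, which is an assumption for illustration and not necessarily how this project solved it.&lt;/p&gt;
<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Bullet {
    std::uint32_t stableId;  // assigned at spawn, never changes afterwards
    float x;                 // position reduced to one axis for brevity
};

// Swap-and-pop removal: overwrite the dead slot with the last element and
// shrink the array. Storage stays contiguous, but the element that used to
// be last now lives at `index`.
inline void SwapRemove(std::vector<Bullet>& bullets, std::size_t index) {
    assert(index < bullets.size());
    bullets[index] = bullets.back();
    bullets.pop_back();
}

// Hypothetical render export: pair every position with the bullet's stable
// ID so the consumer can match trails by ID instead of by array slot.
inline std::vector<std::pair<std::uint32_t, float>> ExportForRender(
    const std::vector<Bullet>& bullets) {
    std::vector<std::pair<std::uint32_t, float>> out;
    out.reserve(bullets.size());
    for (const Bullet& bullet : bullets) {
        out.emplace_back(bullet.stableId, bullet.x);
    }
    return out;
}
```
]]>
&lt;p&gt;After removing slot 0 from three bullets, slot 0 holds the bullet that was previously last. A consumer keyed on the array index sees a sudden jump in position, while one keyed on &lt;code&gt;stableId&lt;/code&gt; sees one bullet disappear and the others carry on.&lt;/p&gt;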
&lt;h3 id="in-frame-latency"&gt;In-Frame Latency:&lt;/h3&gt;
&lt;p&gt;Unreal&amp;rsquo;s tick groups caused a visual nightmare. Mass&amp;rsquo;s processors ran early in the frame. By the time the player&amp;rsquo;s blueprint fired the weapon and Niagara rendered the bullet, the ECS had already moved it forward for a full frame. If a bullet traveled fast enough, it visually appeared to spawn 250 units in front of the gun barrel. Fixing this required painstaking manipulation of tick phases to ensure rendering data was gathered after gameplay logic but before the GPU draw call, which was entirely a trial and error process.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Moving to a multi-threaded ECS for projectiles proves that by embracing Data-Oriented Design, you can reclaim massive amounts of performance and simulate tens of thousands of bullets on the CPU. However, fighting against Mass&amp;rsquo;s boilerplate and Niagara&amp;rsquo;s visual syncing issues left me wondering if there was a cleaner, more accessible way to achieve this same DOD performance without fighting the engine.&lt;/p&gt;
&lt;p&gt;In my next post, I’ll be diving into a completely different ECS solution that solves these exact problems with a fraction of the code.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Questions? Reach out!&lt;/p&gt;</description></item></channel></rss>