Interactive pedagogical digital twins for external unsteady aerodynamics require solvers that offer both speed and physical accuracy. Phenomena such as boundary-layer development and separation are challenging to capture accurately, yet essential for conveying realistic aerodynamic behavior. Data-driven (machine learning) approaches exist, but can only restitute accurate simulations for fixed geometrical configurations. It is then of primary importance to develop a real-time CFD solver to handle variable or deformable geometry. Traditional high-fidelity CFD methods, however, often entail extensive computational times and are thus unsuitable for these purposes or require too many computational nodes. The proposed methodology is based on developing a Lattice Boltzmann Method (LBM) kernel on Graphics Processing Units (GPUs) because of their high parallel throughput, superior memory bandwidth, and capability to handle large problem sizes in real time. Within this framework, the LBM (Qian et al. 1992; He et al. 1997) has proven to be especially amenable to GPU acceleration due to its inherent locality. Rather than directly solving the Navier-Stokes equations, LBM computes fluid flow by evolving particle distribution functions (PDFs) on a discrete lattice. Its streaming and collision steps allow each lattice cell to be processed independently, matching the massively parallel architecture of GPUs (Latt, Malaspinas, et al. 2021; Bauer et al. 2021; Latt, Coreixas, et al. 2021; Lehmann et al. 2022). Building upon recent algorithmic optimizations (Kummerländer et al. 2023; Lehmann 2022; Ma et al. 2023), we reduce memory usage by consolidating the distribution functions into a single array, thereby accelerating the propagation step, which is typically memory-bound. Our GPU-based LBM implementation has been extended to three-dimensional flows and advanced boundary treatments. To handle variable geometries, a uniform and fixed mesh is imposed and the underlying under-resolved wall mesh is treated with immersed boundary methods (IBM) in conjunction with wall-law boundary conditions. This approach is essential for external aerodynamic applications, where complex geometries and boundary layers must be resolved quickly without sacrificing accuracy. Performance benchmarks demonstrate that our solver can exceed 10 billions cells updates per second in three dimensions, with nearly twice that in two dimensions. These rates are sufficient to enable interactive simulations at low to moderate Reynolds numbers, offering real-time visual feedback on evolving flow structures. Comparative studies with other GPU-based LBM implementations confirm the algorithm's performance, placing our solver at the theoretical memory-bound limit according to roofline analysis. Benchmarks against canonical cases indicate that, despite aggressive memory optimization, key flow physics, such as boundary-layer formation, vortex shedding, and aerodynamic force coefficients, are preserved. To illustrate its applicability, we simulate the flow around a bluff-body obstacle and demonstrate how the solver captures boundary-layer development, unsteady separation, and periodic vortex shedding. The results align with published references, underscoring both accuracy and responsiveness. By leveraging GPUs and a memory-optimized LBM framework, we offer a robust yet efficient solver ideally suited for educational digital twins, bridging the gap between real-time interactions and high-fidelity flow simulations. Ongoing work includes further stabilization techniques, refinement of boundary treatments, and expansion to higher Reynolds number regimes to enhance the solver's utility in a broader pedagogical context. This combination of localized GPU computation and a physics-based LBM core ensures a physically accurate digital twin for students.