Realistic experimentation is a key component of systems research and industry prototyping, but experimental clusters are often too small to replay the high traffic rates found in production traces. Thus, it is often necessary to downscale traces to lower their arrival rate, and researchers/practitioners generally do this in an ad-hoc manner. For example, one practice is to multiply all arrival timestamps in a trace by a scaling factor to spread the load across a longer timespan. However, temporal patterns are skewed by this approach, which may lead to inappropriate conclusions about some system properties (e.g., the agility of auto-scaling). Another popular approach is to count the number of arrivals in fixed-sized time intervals and scale it according to some modeling assumptions. However, such approaches can eliminate or exaggerate the fine-grained burstiness in the trace depending on the time interval length. The goal of this paper is to demonstrate the drawbacks of common downscaling techniques and propose new methods for realistically downscaling traces. We introduce a new paradigm for scaling traces that splits an original trace into multiple downscaled traces to accurately capture the characteristics of the original trace. Our key insight is that production traces are often generated by a cluster of service instances sitting behind a load balancer; by mimicking the load balancing used to split load across these instances, we can similarly split the production trace in a manner that captures the workload experienced by each service instance. Using production traces, synthetic traces, and a case study of an auto-scaling system, we identify and evaluate a variety of scenarios that show how our approach is superior to current approaches.