Inference on Networke
Welcome to inference on Networke
Networke offers a variety of scalable inference solutions, all designed with two primary goals in mind: cost efficiency and seamless scalability.
Common inference questions
To begin, here are some frequently asked questions and answers about inference on Networke Cloud.
1: Should I use a request queue to hold requests until compute becomes available?
Answer: ⚠️ Probably not.
A request queue is typically used to ensure that each inference request is submitted and run as GPU resources become available. While this approach is common on other cloud platforms, it is usually unnecessary and not recommended on Networke.
Instead, rely on Networke’s autoscaling capabilities, which automatically adjust compute resources to meet your inference demands. This provides a simpler and more efficient solution for managing workloads.
If your workflow requires or prefers a request queue for specific reasons, please reach out to Networke support to discuss your needs.
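As a rough illustration of this pattern, the sketch below submits requests directly to an autoscaled inference endpoint and relies on simple retries with exponential backoff while new replicas come online, rather than holding requests in an application-level queue. The endpoint URL and request payload are hypothetical placeholders, not part of any Networke API.

```python
import time

import requests

# Hypothetical autoscaled inference endpoint; replace with your own service URL.
ENDPOINT = "https://my-model.example-tenant.networke.cloud/v1/predict"


def predict(payload: dict, max_retries: int = 5) -> dict:
    """Send a request directly to the endpoint, retrying with exponential
    backoff while replicas scale up, instead of queueing requests locally."""
    for attempt in range(max_retries):
        try:
            response = requests.post(ENDPOINT, json=payload, timeout=60)
            if response.status_code == 200:
                return response.json()
        except requests.RequestException:
            pass  # transient error while capacity scales up
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("Inference request failed after retries")


if __name__ == "__main__":
    print(predict({"inputs": "Hello, world"}))
```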
2: Should I use, or continue to use, GPU "bin packing" on Networke?
Answer: 🚫 Never.
GPU "bin packing" refers to loading multiple models onto a single large GPU or swapping models in and out of a single GPU as needed. This method is highly discouraged on Networke and is generally unnecessary due to the wide range of GPU types and sizes available.
Instead, select a GPU that fits the size of your workload and take advantage of Networke's autoscaling capabilities. Autoscaling allows resources to scale up or down dynamically, including scaling to zero when no resources are needed. This cleaner, more efficient approach reduces complexity and saves on resource costs.
3: Should I purchase compute in the data center located as close as possible to my end users?
Answer: ⚠️ Not necessarily.
While it’s common to prioritize compute hosted in data centers near end users to reduce network latency, this is not always the best approach on Networke.
For GPU-intensive workloads, such as those using large models, the time required for model computation typically outweighs the minor latency added by selecting data centers farther from end users. It’s generally more effective to prioritize data center regions with compute nodes best suited to your workload size rather than focusing solely on geographic proximity.
To compare network latency across data centers, you can use a service like CloudPing. Click the "HTTP Ping" button to initiate a ping test for all available data center locations and compare latency directly.
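If you prefer to script the comparison rather than use the web interface, a minimal sketch along the following lines measures average HTTP round-trip time per region. The region names and endpoint URLs are placeholders; substitute the actual endpoints of the data centers you are evaluating.

```python
import time

import requests

# Placeholder per-region endpoints; substitute the data centers you are evaluating.
REGION_ENDPOINTS = {
    "us-east": "https://us-east.example.networke.cloud/healthz",
    "us-west": "https://us-west.example.networke.cloud/healthz",
    "eu-central": "https://eu-central.example.networke.cloud/healthz",
}


def measure_latency(url: str, samples: int = 5) -> float:
    """Return the average HTTP round-trip time to a URL, in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        requests.get(url, timeout=10)
        timings.append((time.perf_counter() - start) * 1000)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    for region, url in REGION_ENDPOINTS.items():
        print(f"{region}: {measure_latency(url):.1f} ms")
```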
4: Does Networke cache Docker images?
Answer: ✅ Yes.
Docker images are automatically cached on Networke to eliminate the need for repeatedly pulling large images from external registries, ensuring faster and more efficient deployments.
Storage on Networke
Before models, checkpoints, and input data can be loaded for use, they must be stored in an appropriate location—either in remote storage or on a drive local to the compute infrastructure.
Where are models, checkpoints, and input data stored?
Networke Object Storage: Ideal for smaller models or those serialized with Tensorizer.
Networke Accelerated Object Storage: Best suited for larger models, particularly those serialized with Tensorizer.
All-NVMe Network-Attached Storage: Designed for storing large models, including those serialized with Tensorizer.
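For reference, serializing and reloading a model with the open-source `tensorizer` package typically looks like the sketch below. The model and file path are illustrative, and the exact calls may differ between package versions, so treat this as an assumption rather than Networke-specific guidance.

```python
import torch
from tensorizer import TensorSerializer, TensorDeserializer


def save_model(model: torch.nn.Module, path: str) -> None:
    """Serialize a PyTorch module to a .tensors file for fast loading."""
    serializer = TensorSerializer(path)
    serializer.write_module(model)
    serializer.close()


def load_model(model: torch.nn.Module, path: str) -> torch.nn.Module:
    """Stream the serialized weights back into an instantiated module."""
    deserializer = TensorDeserializer(path)
    deserializer.load_into_module(model)
    deserializer.close()
    return model


if __name__ == "__main__":
    # Illustrative toy model and local path.
    save_model(torch.nn.Linear(16, 4), "model.tensors")
    load_model(torch.nn.Linear(16, 4), "model.tensors")
```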
Networke Object Storage
Best for: Training code, training checkpoints.
Networke Object Storage is an S3-compatible solution that enables flexible and efficient data storage and retrieval. Features include:
Multi-region support: Store and access your data from multiple geographic locations.
Ease of use: Quick to set up with straightforward SDK integrations.
Scalable performance: Handles a wide variety of workloads, from smaller datasets to high-throughput operations.
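Because the service is S3-compatible, standard SDKs such as boto3 work against it. The sketch below is only illustrative: the endpoint URL, bucket name, object key, and credentials are placeholders to be replaced with your own values.

```python
import boto3

# Placeholder endpoint, bucket, and credentials; replace with your own values.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object.networke.cloud",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Upload a training checkpoint, then download it again for inference or resume.
s3.upload_file("checkpoint.pt", "my-training-bucket", "runs/exp1/checkpoint.pt")
s3.download_file("my-training-bucket", "runs/exp1/checkpoint.pt", "checkpoint.pt")
```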
All-NVMe Network-Attached Storage
Best for: Training code, datasets.
Networke's all-NVMe storage tier delivers high-performance block storage volumes, making it an ideal solution for storing datasets or training code. These virtual disks outperform local workstation SSDs and are scalable up to petabyte levels.
Presented as generic block devices, they are recognized by the operating system as traditional, physically connected storage devices, ensuring strong compatibility and seamless integration.
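Because a volume mounts like any local disk, application code needs no special handling. For example, assuming a hypothetical mount point at `/mnt/data`, a dataset can be inspected with ordinary file I/O:

```python
from pathlib import Path

# Hypothetical mount point for an all-NVMe network-attached volume.
DATA_DIR = Path("/mnt/data/datasets/my-dataset")

# The volume behaves like a local filesystem, so ordinary file I/O applies.
for shard in sorted(DATA_DIR.glob("*.tar")):
    size_gb = shard.stat().st_size / 1e9
    print(f"{shard.name}: {size_gb:.2f} GB")
```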