Design Patterns for Low-Power On-Device AI: Implications for Developers and TLS Performance
A deep dive into low-power on-device AI, model optimization, and how edge inference reshapes TLS, caching, and traffic patterns.
On-device AI is moving from a premium feature to an architectural decision with real consequences for app performance, hosting costs, and security operations. The reason is straightforward: when inference happens locally, you can reduce latency, improve privacy, and lower the volume of round trips to your servers. But that shift also changes your application architecture in subtle ways, including how often clients reconnect, how long infrastructure needs to stay warm, and how much of your existing hosting capacity planning still applies. If you are building developer tools, APIs, or mobile apps, the move to edge inference should be treated as a systems design problem, not just a model selection exercise.
This guide takes a practical look at memory- and compute-efficient model architectures for low-power inference, then connects them to TLS session behavior, certificate caching, and traffic patterns from a hosting provider’s perspective. The goal is to help developers choose the right architecture for device diagnostics and support workflows, while helping infrastructure teams understand how client-side inference can change cache hit rates, session lifetimes, and origin load. The backdrop matters: memory is getting more expensive, high-end AI memory demand is distorting supply chains, and the industry is being pushed to be more efficient with every byte and watt.
1. Why On-Device AI Is Becoming a Default Architecture
Latency, privacy, and resilience are the obvious wins
The strongest argument for on-device AI is not that it is fashionable, but that it removes avoidable network dependency. When a model runs locally, the user sees instant responses for tasks such as summarization, classification, autocomplete, image enhancement, or device troubleshooting. That improves perceived performance and reduces sensitivity to short network outages or TLS handshake delays. It also keeps sensitive inputs such as typed text, photos, and device telemetry closer to the user, which is increasingly important for privacy-conscious products and enterprise deployments.
The BBC’s reporting on smaller data centers and local AI compute reflects a broader trend: some workloads that once required centralized infrastructure can now be handled at the edge or on-device. That does not mean the cloud goes away; it means the center of gravity shifts. For more context on how this changes infrastructure thinking, see getting started with smaller, sustainable data centers and private cloud modernization strategies. In practice, the best systems are hybrid: local inference for fast, private, low-cost tasks and cloud backends for heavier reasoning, sync, audit, and long-context work.
The economics are forcing architectural discipline
Memory pressure is one of the biggest hidden forces shaping AI product architecture. The BBC’s coverage of rising RAM prices makes the issue tangible: AI infrastructure demand is tightening the supply of memory across devices and servers, and that cost pressure cascades through the stack. On-device AI reduces some server-side spend, but it increases the importance of efficient model size, quantization, and memory bandwidth usage on the client. In other words, you do not escape memory constraints; you move them to a different layer.
This is where product teams need to become more deliberate. If you are planning release cycles or hardware assumptions, it helps to watch pricing and component supply trends the way operators watch dependency health. Resources like what memory price signals mean for deal hunters and timing high-end GPU purchases offer a useful analogy: availability shifts quickly, and what is cheap in one quarter can become a bottleneck in the next. If your app assumes every device can run a 2B-parameter model, you are likely overfitting to premium hardware.
What changes for hosting providers and platform teams
Hosting teams often focus on backend throughput, but on-device AI changes the shape of traffic rather than simply reducing it. Fewer inference calls may reach the origin, yet the calls that remain can become more bursty, more stateful, and more sensitive to session continuity. You may see longer-lived connections, less frequent API polling, and more background syncs after local computation completes. That makes it more important to understand request batching, cache TTLs, and TLS ticket reuse.
If you operate APIs or shared hosting environments, you should track the shift using the same discipline used for capacity planning and service-level decisions. A good companion read is from data center KPIs to better hosting choices, which shows how to ask the right questions of providers. Another helpful framing comes from how infrastructure vendors should communicate AI safety features: trust is no longer just about uptime, but about how well your stack performs under a changing workload mix.
2. Core Design Patterns for Low-Power Inference
Distillation: keep the behavior, cut the bulk
Knowledge distillation remains one of the most practical techniques for on-device AI because it transfers useful behavior from a larger teacher model into a smaller student model. This can preserve task accuracy surprisingly well for classification, extraction, and lightweight generation tasks, while dramatically reducing parameter count and memory footprint. Distilled models are especially effective for apps that need short, focused outputs rather than open-ended reasoning. They are also easier to update and ship, which matters when your target device has limited storage or intermittent connectivity.
In deployment terms, distillation changes the API contract. Instead of shipping a general-purpose remote model that answers everything, you often split the product into a local model for intent detection, routing, or first-pass responses, then escalate to a backend only when necessary. That reduces server calls and can lower TLS handshake frequency for trivial interactions, because more user journeys terminate locally. For developers building workflows that depend on support triage or diagnostics, prompting for device diagnostics is a strong example of how a small model can solve a big operational problem without always reaching the cloud.
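This local-first routing split can be sketched in a few lines. The classifier below is a stand-in for a distilled on-device model, and the confidence threshold, intent names, and routing result shape are all illustrative assumptions, not a prescribed API.

```python
# Sketch of local-first routing: a small on-device model handles
# high-confidence intents, and only uncertain inputs escalate to a
# cloud backend. All names here are hypothetical placeholders.

CONFIDENCE_THRESHOLD = 0.85  # tune per task; higher means more cloud calls

def classify_locally(text: str) -> tuple[str, float]:
    """Stand-in for a distilled on-device intent classifier."""
    # A real implementation would invoke a quantized model runtime here.
    if "battery" in text.lower():
        return ("device_diagnostics", 0.93)
    return ("unknown", 0.40)

def route(text: str) -> dict:
    intent, confidence = classify_locally(text)
    if confidence >= CONFIDENCE_THRESHOLD:
        # The journey terminates locally: no network call, no TLS handshake.
        return {"intent": intent, "source": "local", "confidence": confidence}
    # Escalation path: a real app would batch or defer this remote call.
    return {"intent": intent, "source": "cloud", "confidence": confidence}

print(route("My battery drains overnight"))  # handled locally
print(route("Explain my last invoice"))      # escalates to cloud
```

The value of the pattern is not the classifier itself but the explicit escalation boundary: every journey that ends in the `local` branch is one fewer origin request and one fewer handshake.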
Quantization: trading precision for efficiency
Quantization is the backbone of low-power inference. By reducing weights and activations from 16-bit or 32-bit representations to 8-bit, 4-bit, or mixed precision formats, you shrink memory usage and often improve latency on hardware with optimized kernels. The main tradeoff is accuracy, especially on tasks that are sensitive to small numerical changes. The best results usually come from evaluating a few quantization strategies, not assuming one size fits all.
For teams deciding whether to quantize aggressively, think in terms of product criticality. A spelling assistant or intent classifier can usually tolerate stronger compression than a medical or financial model. Use calibration data, measure task-specific degradation, and test the model on real user inputs, not synthetic benchmarks alone. For broader context on choosing and timing platform upgrades, upgrade timing guides can be surprisingly relevant: hardware constraints are part of the economics of software architecture.
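The core mechanics of post-training quantization are simple enough to sketch without a framework. The example below shows symmetric int8 quantization of a single weight tensor in plain Python; real deployments would use a mobile inference runtime with optimized kernels, and the measured round-trip error is exactly the degradation signal the calibration step above is meant to catch.

```python
# Minimal sketch of post-training symmetric int8 quantization for one
# weight tensor, kept dependency-free for illustration only.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 range [-127, 127] with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.95]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round trip loses precision; measuring this error on calibration
# data is how you decide whether 8-bit (or 4-bit) suits your task.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {max_error:.4f}")
```

Note that the error bound is roughly half the scale factor, which is why tensors with a few large outlier weights quantize badly: one outlier inflates the scale for every other value.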
Pruning, sparsity, and routing: remove what the device never needs
Pruning removes unnecessary weights or entire channels from a model, while sparse architectures try to avoid activating the full network on every input. These approaches are useful when you need to preserve a larger effective capacity without paying full compute cost at inference time. In modern mobile and embedded environments, structured pruning tends to be more hardware-friendly than unstructured sparsity because it maps better to kernels and accelerators.
Mixture-of-Experts-style routing can also work in constrained settings if only a tiny subset of experts is loaded or activated locally. The challenge is operational complexity: sparse or routed models often require more careful packaging, fallback logic, and runtime observability. That is why teams should pair model optimization with a solid release discipline, similar to how firmware teams handle partial rollouts in resilient IoT firmware. The principle is the same: reduce runtime cost without making the system fragile.
3. Hardware Realities: Memory, Bandwidth, and Heat
HBM is not the only memory story, but it defines the high end
High Bandwidth Memory, or HBM, is often associated with training and large inference servers, but it influences the entire ecosystem because it absorbs supply and drives memory pricing across markets. That matters to on-device AI developers for two reasons. First, the cost of server-grade memory affects cloud economics and may change the break-even point for pushing more work to the client. Second, HBM’s prominence signals just how bandwidth-hungry modern AI has become, which is a warning to anyone assuming local devices can tolerate inefficient architectures.
On-device inference typically relies on LPDDR, unified memory, or mobile SoCs with specialized accelerators, not HBM. But the lesson transfers: bandwidth is often the real bottleneck, not raw FLOPS. A model that fits in memory but thrashes cache lines will feel slow and power-hungry. Engineers should profile memory access patterns, use operator fusion where possible, and avoid architectures that move large tensors back and forth unnecessarily. For a market perspective on memory scarcity and supply pressure, see memory price pressure and temporary reprieves.
Heat and battery budgets are product constraints, not afterthoughts
Low-power inference must respect thermal envelopes and battery life. A model that is fast for 10 seconds but drains a phone aggressively is not a viable production design. This is why product teams should measure energy per inference, sustained performance over time, and the impact of background tasks such as syncing results or refreshing remote context. Some developers discover too late that their local model is technically feasible but operationally noisy.
There is a practical design pattern here: do the minimum viable work on-device, then schedule heavier tasks opportunistically. For example, run a compact classifier locally, cache the decision, and only upload anonymized telemetry in batches. That reduces network chatter and protects user experience. It also changes when TLS sessions are opened and how long they need to stay alive, because the app is no longer chatting constantly with the server just to ask small questions.
Why smaller models can still be better systems
One of the mistakes teams make is optimizing for model capability while ignoring system efficiency. A smaller model that runs reliably on five-year-old hardware may create more business value than a larger, more impressive model that only works on the latest flagship devices. This is especially true for cross-platform apps, enterprise tools, and distributed field devices. Real-world deployment is often about heterogeneity, not perfection.
That thinking mirrors the move toward smaller, more distributed infrastructure, explored in smaller sustainable data centers and private cloud modernization. The question is not whether you can centralize everything, but whether you should. When local inference removes 80% of repetitive traffic, your backend can be redesigned for higher-value interactions instead of being a catch-all compute layer.
4. Application Architecture Patterns That Work Well on Device
Split the workflow into local-first and cloud-assisted stages
The most effective pattern for many apps is a two-stage pipeline. Stage one runs locally and handles fast, frequent, low-risk tasks such as entity extraction, ranking, summarization of short text, or UI assistance. Stage two is cloud-assisted and reserved for tasks that need long context, expensive reasoning, shared state, or governance. This keeps the user experience responsive while preserving the flexibility of a centralized backend.
Architecturally, the split changes your API surface. Instead of sending every keystroke or sensor event to the server, you may send only high-confidence events, aggregated summaries, or exception cases. That can reduce request rate, compress payload sizes, and shift your traffic toward fewer but richer interactions. For teams building systems with clear decision boundaries, clinical decision support guardrails are a useful reference point, even outside healthcare, because they emphasize when to trust automation and when to escalate to human or remote review.
Use caches intentionally: model cache, semantic cache, and certificate cache
On-device AI introduces caching at multiple layers. The app may cache embeddings, conversation context, feature vectors, or local results so it can avoid recomputation. At the network layer, the client may also reuse TLS sessions, which becomes more important when fewer requests are made but the app still makes periodic sync calls. Well-tuned caches reduce power usage and lower server load, but they require invalidation logic and telemetry.
Hosting teams should not overlook certificate and session behavior here. If an app now spends more time computing locally and sends only periodic bursts, the importance of certificate caching and session resumption rises because each network call is comparatively more expensive. A poor TLS configuration can erase the latency benefits of edge inference by forcing a full handshake on every sync. If your organization manages many endpoints or service tiers, the reasoning in building trust in AI through security measures and communicating safety features to customers is useful: trust is operational, not just conceptual.
Design for offline tolerance and graceful sync
On-device AI products should degrade gracefully when offline because one of the core advantages of local inference is that it works under weak connectivity. This requires a careful separation between local state and server state, plus conflict handling for when the device eventually reconnects. Good offline-first systems batch work, store timestamps, and avoid assuming a stable session in progress. This is particularly important for mobile apps and field tools.
When you design for offline tolerance, your TLS lifecycle changes too. Instead of a steady stream of small authenticated requests, you may see fewer but more critical reconnects. That places a premium on session resumption, the ability to recover from network changes, and long-lived trust anchors. If you have teams responsible for compliance or regulated data flows, the mindset in credit ratings and compliance and digital compliance rollouts can help frame the operational discipline required.
5. TLS, Certificates, and the New Traffic Shape
Fewer requests do not always mean less TLS work
At first glance, on-device AI should reduce network traffic, and therefore reduce TLS overhead. That is partly true: if a user asks the device to summarize a message locally, your backend never sees the raw prompt, the intermediate tokens, or the repeated polling required for remote inference. But many systems then reintroduce network calls through sync, telemetry, model updates, policy refreshes, and selective escalation to the cloud. The result is not necessarily less TLS volume, but a different pattern: lower steady-state chatter, more bursty synchronization, and more pronounced long-tail session behavior.
This matters because TLS is optimized for both security and performance only when clients behave predictably. If your application suddenly moves from chatty polling to periodic bulk sync, you need to revisit ticket lifetimes, keep-alive strategy, connection pooling, and backend idle timeout settings. Otherwise the app may pay the cost of new handshakes right when it reconnects after a local inference batch. For a general hosting operations lens, see how to evaluate providers using data-center KPIs and API platforms under load, both of which reinforce the value of measuring connection patterns rather than raw request counts alone.
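As a starting point for that revisit, the settings involved look like the following nginx fragment. The values are illustrative defaults to benchmark against your own reconnect patterns, not recommendations.

```nginx
# Illustrative nginx TLS tuning for bursty, reconnect-heavy clients.
# Every value here is a starting point to measure, not a recommendation.

http {
    # Server-side session cache so returning clients can resume quickly.
    ssl_session_cache   shared:SSL:10m;   # roughly 40k sessions per 10 MB
    ssl_session_timeout 4h;               # long enough to span sync gaps
    ssl_session_tickets on;               # stateless resumption for mobile

    # Keep idle connections open across a short burst of sync calls,
    # but not so long that pools fill with dead mobile connections.
    keepalive_timeout   75s;
    keepalive_requests  1000;
}
```

The key tension is that longer session timeouts and keep-alives favor intermittent clients, while shorter ones protect server memory; the right balance depends on how long your app computes locally between syncs.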
Certificate caching becomes more important, not less
Modern TLS stacks already cache certificates, session tickets, and intermediate state, but on-device AI shifts the economics of cache misses. When each network call is rarer and more valuable, a missed session resumption or a cold certificate path can create a noticeable spike in latency and energy use. This is especially true for mobile apps on variable networks, where connection setup can dominate the total cost of a small sync request. If you were previously able to ignore a few milliseconds of handshake overhead, you may no longer have that luxury.
For hosting providers, this means tuning edge termination and CDN behavior for intermittent but high-value calls. Keep session ticket rotation practical, confirm that certificates are served efficiently across regions, and ensure your load balancers are not unintentionally defeating reuse. If your team is watching broader infrastructure signals, pairing this with hosting KPI guidance and smaller data center strategies will help you align architecture with the actual traffic shape.
Traffic patterns change from volume-heavy to value-heavy
When more computation happens on-device, your traffic becomes less about brute-force volume and more about moments of intent. The user may send a compressed feature vector, a brief exception report, a confidence score, or a sync bundle after the local model finishes its work. That means your backend should be optimized for short bursts, edge termination, rate-limit resilience, and intelligent retries. It also means logging and analytics need to capture event meaning, not just request frequency.
A practical implication: your TLS monitoring should be joined with application metrics. Track handshake rate, session resumption percentage, ticket reuse, request size distribution, reconnect churn, and average time between local inference and remote sync. That will show whether your on-device strategy is improving efficiency or simply moving load around. For teams that care about product-level insights, the thinking behind measuring influence through link strategy is a useful reminder that behavior changes are visible only when you measure the right signals.
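Those joined signals are cheap to compute once connection events are logged. The event schema below is a hypothetical simplification to show the calculation, not a real monitoring format.

```python
# Sketch of deriving the TLS health signals mentioned above from a
# stream of connection-level events. The log format is illustrative.

def tls_health(events: list[dict]) -> dict:
    """Summarize handshake behavior. Each event is either
    {"type": "handshake", "resumed": bool} or {"type": "request"}."""
    handshakes = [e for e in events if e["type"] == "handshake"]
    requests = [e for e in events if e["type"] == "request"]
    resumed = sum(1 for e in handshakes if e.get("resumed"))
    n = len(handshakes)
    return {
        "handshake_count": n,
        "resumption_pct": 100.0 * resumed / n if n else 0.0,
        # Requests per handshake shows whether reuse is working; a value
        # near 1.0 means every call pays full connection-setup cost.
        "requests_per_handshake": len(requests) / n if n else 0.0,
    }

log = [
    {"type": "handshake", "resumed": False},
    {"type": "request"}, {"type": "request"}, {"type": "request"},
    {"type": "handshake", "resumed": True},
    {"type": "request"},
]
print(tls_health(log))
```

Tracked per client cohort over time, a falling resumption percentage or a requests-per-handshake ratio drifting toward 1.0 is an early sign that the new bursty traffic shape is defeating session reuse.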
| Pattern | Client Cost | Server/TLS Impact | Best Use Case | Operational Note |
|---|---|---|---|---|
| Distilled intent model | Low memory, low compute | Fewer API calls, fewer handshakes | Routing, classification, support triage | Great for older phones and embedded devices |
| Quantized local summarizer | Very low memory, moderate compute | Periodic sync traffic, bursty TLS | Notes, message drafting, offline assistance | Watch accuracy on domain-specific language |
| Pruned sparse model | Low to moderate compute | Less steady chatter, more event-driven sync | Vision, extraction, ranking | Structured pruning tends to deploy more reliably |
| Hybrid edge + cloud reasoning | Low local / higher remote on demand | Variable TLS load, escalation spikes | Complex queries, compliance workflows | Design for session reuse and fallback retries |
| Fully local offline-first workflow | Highest local responsibility | Minimal traffic, but critical update bursts | Field tools, privacy-sensitive apps | Model update and certificate refresh flows matter most |
6. Practical Evaluation Framework for Developers
Benchmark the right metrics, not just accuracy
For on-device AI, accuracy alone is a misleading metric. You also need to measure cold-start time, warm inference latency, model load time, memory peak, energy per task, and failure mode behavior under thermal throttling. If an app becomes sluggish after 20 minutes of use, it is not ready. Likewise, if it works only with an ideal network and a fresh battery, it is not production-grade.
Start with a representative dataset of real device inputs and run tests across hardware tiers. Include older phones, lower-memory laptops, and the least capable supported devices, because those will define the edge of your reachable market. This is similar to how teams should think about rollout risk in resilient firmware design and automated compatibility testing across device lineups: the main challenge is not the happy path, but the long tail.
Test network behavior under realistic user journeys
Once local inference enters the product, your network tests should simulate real user behavior. A user may open the app, perform several local actions, then reconnect later in a different location and sync a bundle of data. That pattern is very different from a continuous stream of server-side inference requests. You need to validate that session resumption works, idle connection timeouts are not too aggressive, and retry storms do not occur after transient failures.
For teams operating apps with media, productivity, or support flows, the infrastructure lessons are similar to those in migration strategies for seamless integration and high-availability communications platforms. Smooth behavior is rarely accidental; it comes from testing the entire user journey, not just the model inference call.
Build observability for model, device, and network together
Observability should join three layers: model performance, device health, and network behavior. At the model layer, track accuracy, confidence, and fallback rates. At the device layer, monitor memory pressure, thermal state, battery drain, and crash rates. At the network layer, track TLS handshake frequency, session reuse, cache hit ratio, and sync latency. Only then can you tell whether moving work on-device is delivering the intended benefit.
Teams that get this right often discover new product opportunities. For example, a local model may be good enough to provide instant feedback, while the cloud can be reserved for deeper review or audit logs. That tiered design reduces backend load and improves perceived quality. It is also a cleaner story for enterprise buyers who want clear boundaries around privacy and processing, which is why security evaluation and trust communication matter as much as raw performance.
7. Deployment and Operations: What Hosting Providers Should Expect
Expect fewer but more meaningful origin requests
Hosting providers should prepare for a workload profile that looks calmer in aggregate but sharper at the edges. On-device AI can reduce repetitive inference traffic, but it may increase the importance of occasional syncs, policy fetches, audit uploads, and model update downloads. These operations tend to be heavier and more important than the tiny calls they replace. That means origin protection, burst handling, and edge cache design deserve renewed attention.
Providers that optimize only for request count may miss the bigger picture. They should instead monitor transfer size, session reuse, TLS handshake ratio, and cache efficiency across client cohorts. If you want a framework for asking providers the right questions, revisit data center KPI guidance. For organizations thinking about smaller, distributed deployment topologies, smaller sustainable data centers provides a complementary view of why distributed compute is becoming normal.
Edge inference can alter cache behavior downstream
Because more computation happens on the client, backend responses may become more personalized, less frequent, and more valuable per request. That changes CDN and application cache strategy. A remote call may now contain an aggregated summary rather than a raw stream, which can improve cacheability in some cases and reduce it in others. If the payload becomes unique to the user’s local state, cache hit rates may actually fall even though traffic volume drops.
That is why the right optimization is often semantic, not merely mechanical. Cache static model files, policy documents, and certificate artifacts aggressively, but avoid assuming that the application payloads themselves will remain cache-friendly. As with any infrastructure shift, measure before and after, then adjust rather than guessing. This approach is consistent with the practical tuning mindset found in AI security evaluation and vendor trust communication.
Lifecycle management becomes part of product design
On-device models need lifecycle policies: versioning, compatibility, rollback, download resumption, checksum validation, and eventually deprecation. The same applies to certificates and TLS trust chains in the broader service architecture. If your app ships local models and relies on periodic cloud refreshes, model lifecycle and certificate lifecycle should be coordinated so that devices do not get stuck with stale content or broken trust. This is especially important for enterprise deployments, where long-lived devices may lag behind on updates.
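The checksum-validation step in that lifecycle can be sketched directly. The manifest format and file paths below are illustrative assumptions; in practice the manifest itself should also be signed and fetched over the same trust chain as the rest of the service.

```python
# Sketch of verifying a downloaded model artifact before activation,
# one of the lifecycle steps listed above. The manifest format and the
# temporary file standing in for a model blob are illustrative.

import hashlib
import pathlib
import tempfile

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(model_path: pathlib.Path, manifest: dict) -> bool:
    """Activate the model only if its checksum matches the manifest.
    On mismatch the app keeps the previous version (rollback path)."""
    return sha256_of(model_path) == manifest["sha256"]

# Illustrative usage with a temp file in place of real model weights.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"model-weights-v2")
path = pathlib.Path(tmp.name)
manifest = {"version": "2.0.0", "sha256": sha256_of(path)}
print(verify_artifact(path, manifest))  # True
```

Pairing this check with resumable downloads means a device on a flaky network can retry chunks safely: the checksum gate catches any corruption before the new model is ever activated.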
Operational maturity means thinking in terms of release trains and support windows. That mindset is familiar to teams accustomed to staged changes and compatibility matrices, as reflected in testing matrices for device compatibility and firmware resilience patterns. The difference is that now your product includes a model as a deployable artifact, not just code.
8. Common Failure Modes and How to Avoid Them
Overestimating the device fleet
One of the most common mistakes is assuming your users all have access to premium hardware with ample memory and neural acceleration. In reality, many fleets are mixed, aging, or constrained by corporate procurement cycles. If your model depends on massive RAM headroom or a recent processor generation, your addressable market shrinks quickly. Good product design starts by identifying the lowest supported device tier and building from there.
This is where memory awareness becomes strategic. The same market dynamics that are pushing up RAM prices across the industry can also expose hidden product fragility. If your software suddenly becomes memory-hungry, you may find that some devices can no longer handle it well. For additional perspective on how memory pricing affects buying decisions and hardware timing, see memory price signals and upgrade timing guidance.
Forgetting the network is still part of the system
Teams sometimes treat on-device AI as if it eliminates the backend. It does not. It simply changes the contract between client and server. You still need model updates, trust verification, analytics, user identity, policy enforcement, and telemetry. If your TLS sessions are poorly managed, the app may gain local speed but lose remote efficiency when it reconnects. Likewise, if certificate deployment is clumsy, you risk outages precisely when the app tries to sync critical data.
The best practice is to design the network as a sparse, high-quality control plane instead of a continuous compute plane. The local device does the heavy lifting; the server handles governance, aggregation, and durable state. For organizations working through these questions, compliance awareness and regional compliance lessons provide a useful reminder that architecture decisions often have policy consequences.
Shipping a model without an operational budget
A small model can still be expensive if it is updated too often, measured poorly, or packed into a bloated app bundle. The operating cost of on-device AI includes content delivery, update orchestration, test coverage, telemetry, and support. You should estimate the total cost of ownership, not just the inference cost. A model that saves two seconds per request but doubles support cases is not a win.
That is why teams should pair launch decisions with observability and staged rollout. Think in terms of product, infrastructure, and support as a single system. In that sense, the discipline is similar to preparing for infrastructure volatility in resilient firmware deployment and managing changing traffic in communications platforms under live-event load.
9. Implementation Checklist for Teams
Model selection checklist
Choose a model size that matches the task, not the marketing promise. Prefer distilled or quantized models for routing, classification, and concise generation. Use pruning or sparsity only if your deployment target and runtime are well understood. Validate memory peak, thermal behavior, and quality on the exact hardware class you intend to support. If your app needs a remote fallback, define the trigger conditions explicitly.
Architecture checklist
Separate local inference from cloud sync. Batch uploads, defer nonessential tasks, and preserve offline functionality. Design APIs so that a local decision can be committed later without requiring constant connectivity. Tune TLS session reuse, keep-alive, and certificate caching for bursty rather than chatty traffic. Make sure your observability stack can correlate model events with network events.
Operations checklist
Coordinate model updates with certificate and configuration refreshes. Use staged rollouts and explicit rollback paths. Monitor request mix, handshake rate, and sync size, not just total requests. Review hosting KPIs periodically and validate whether your infrastructure still matches the new pattern of usage. For teams that need an operational benchmark, revisit hosting KPI guidance, smaller sustainable data centers, and trust communication best practices.
10. Conclusion: On-Device AI Is a Systems Decision, Not Just a Model Decision
Low-power on-device AI is best understood as a full-stack optimization strategy. The winning architecture is rarely the biggest model; it is the one that fits the device, preserves battery, reduces unnecessary network use, and integrates cleanly with your TLS and certificate posture. The shift toward edge inference can lower latency and protect privacy, but it also changes traffic patterns in ways that matter to hosting providers and DevOps teams. If you do not measure TLS session behavior, certificate caching, and reconnection churn, you may miss the real costs and benefits of the change.
For developers, the practical path is to use distillation, quantization, pruning, and hybrid local-cloud workflows to keep models lightweight and reliable. For infrastructure teams, the task is to tune session reuse, cache strategy, and burst handling to fit a world where fewer requests carry more meaning. The products that win will be the ones that treat model optimization, application architecture, and transport security as one coordinated design space. That is the real promise of on-device AI: not just intelligence at the edge, but better systems everywhere.
FAQ
What is on-device AI?
On-device AI refers to machine learning inference that runs directly on the user’s phone, laptop, or embedded device instead of a remote server. It is used for tasks like classification, summarization, ranking, and lightweight generation. The main benefits are lower latency, better privacy, and improved offline support.
How does low-power inference differ from regular inference?
Low-power inference is optimized to use less memory, less compute, and less battery. It typically relies on techniques such as quantization, distillation, pruning, and hardware-specific acceleration. Regular inference may prioritize accuracy or flexibility, while low-power inference prioritizes efficiency and predictable execution on constrained devices.
Why does on-device AI affect TLS sessions?
Because local inference reduces the frequency of server calls but can make the remaining calls more bursty and important. That changes how often clients reconnect, how long sessions should remain valid, and how much you benefit from ticket reuse or session resumption. In short, TLS traffic becomes less continuous and more event-driven.
Does on-device AI reduce certificate caching needs?
No. In many cases, it increases the importance of certificate caching and efficient TLS setup because network calls become less frequent but more latency-sensitive. If a client only reconnects occasionally, you want those reconnects to be fast and reliable. Poor certificate handling can negate the gains from local inference.
What model optimization technique should I start with?
For most teams, quantization is the simplest starting point because it delivers immediate memory and latency gains with relatively low implementation complexity. Distillation is a strong next step if you need a smaller model that still preserves behavior well. Pruning can help further, but it usually requires more careful benchmarking and deployment validation.
How should hosting providers prepare for edge inference traffic?
They should expect fewer but more meaningful API calls, more bursty syncs, and more sensitivity to session reuse. That means watching handshake rates, cache hit ratios, request sizes, and retry patterns. Providers should also ensure that edge termination, load balancing, and certificate delivery are tuned for intermittent mobile-style traffic rather than constant polling.
Related Reading
- Getting Started with Smaller, Sustainable Data Centers: A Guide for IT Teams - Learn how distributed compute changes capacity planning and ops.
- Private Cloud Modernization: When to Replace Public Bursting with On‑Prem Cloud Native Stacks - A practical guide to infrastructure control and hybrid workloads.
- Design Patterns for Resilient IoT Firmware When Reset IC Supply Is Volatile - Useful patterns for staged rollout and constrained devices.
- Testing Matrix for the Full iPhone Lineup: Automating Compatibility Across Models - Build broader test coverage for heterogeneous client fleets.
- Building Trust in AI: Evaluating Security Measures in AI-Powered Platforms - A security-first lens for AI product and platform teams.
Daniel Mercer
Senior SEO Editor & Technical Content Strategist
Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.