GKE Networking Differentiators (Cloud Next '19)


Welcome to this session on GKE networking differentiators. I'm Manjot, a product manager on GKE networking, and I'm Pavithra, a software engineer on the same team. Today we'll discuss how GKE networking helps your business grow by catering to your needs around scalability, security, load balancing, and visibility.

Here are some of the features we'll cover today. On the topic of scalability, we'll see how VPC-native clusters and optimized IP allocation can help scale your clusters. On security, we'll look at Shared VPC, network-layer features like private clusters and master authorized networks, and application-layer features such as Cloud Armor and managed SSL certificates. You'll learn how to optimize your web applications with container-native load balancing, and finally we'll talk about visibility features like intranode visibility and VPC flow logs that will help you analyze your deployments.

A quick reminder: you're welcome to post questions as we go through the content. Just open the Next app, pick our session, and use the Dory Q&A to post your question, or upvote an existing question you'd like us to answer. Looking forward to that, thank you.

For this session we're going to do something different. To go over the benefits of GKE, Manjot and I are going to put ourselves in the shoes of network admins and architects at a fictitious enterprise. This enterprise runs a popular web application; given that this is tax season, let's pretend it's a tax filing application. The app has become hugely popular: an easy-to-use interface at an unbeatable price, who wouldn't want that? Now this enterprise is starting to use GKE to scale up its service, and here we are, its network admins, brainstorming the architecture. I'm the skeptic who is unsure whether GKE can match our needs, and Pavithra is the advocate who just attended a workshop and learned about all the cool
features GKE has to offer. Now let's begin our story.

We have the best problem you could have: massive demand. We know we're projected to get several million queries per second, especially close to the tax deadline. We need a clear plan and a sketch of our architecture, especially the networking; we'd like to proactively tackle our scalability challenges. Can GKE match up to our needs? Let's talk through it. What's on your mind?

You know we already have a GKE cluster with several hundred nodes. We're using Ingress-based load balancing to expose our services, and we have a single virtual private cloud to protect our deployments with a secure boundary around them. My first concern is the size of the deployment. As we build many more features into our application we'll need more nodes, a lot more nodes; we know for sure we could reach up to a thousand nodes in our GKE cluster. Can GKE handle that? How do we even manage networking for such large clusters?

Let's think about that for a minute. As the number of nodes in a cluster increases, the number of routes and the number of IP addresses become important. We should start by creating our clusters as VPC-native clusters. In this mode, cluster IP addresses come from three different ranges. The first is the node range, the primary range of the subnet, used for assigning IP addresses to all the cluster nodes. There are two secondary ranges as well: one for pods and one for services. Each node takes a smaller slice of the pod range for assigning addresses to the pods running on it. These pod ranges use alias IP addresses, which are natively routable: the logic to route them is built into the networking layer, and there is no need to configure any additional routes.
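The addressing scheme just described can be made concrete with a quick sketch. The ranges below are made-up examples (a real cluster uses whatever ranges the network admin picks when creating the subnet), but the arithmetic shows why the secondary pod range bounds cluster size:

```python
import ipaddress

# Illustrative ranges only (assumed for this sketch); a real cluster uses
# ranges chosen by the network admin when creating the subnet.
node_range = ipaddress.ip_network("10.0.0.0/22")   # primary range: node IPs
pod_range = ipaddress.ip_network("10.4.0.0/14")    # secondary range: pod IPs
svc_range = ipaddress.ip_network("10.8.0.0/20")    # secondary range: service IPs

# Each node takes a /24 slice of the pod range by default, so the pod
# range caps how many nodes the cluster can hold.
node_slices = list(pod_range.subnets(new_prefix=24))
print(f"max nodes supported by pod range: {len(node_slices)}")  # 1024
print(f"first node's pod slice: {node_slices[0]}")              # 10.4.0.0/24
print(f"service IPs available: {svc_range.num_addresses}")      # 4096
```

With these example ranges, a /14 pod range carved into default /24 per-node slices supports up to 1,024 nodes, comfortably above the thousand-node target discussed here.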
Contrast this with routes-based clusters, which need a custom route for every node we create on them; with VPC-native clusters we can easily scale up to a thousand nodes. There are additional benefits to running clusters in this mode. First, the networking layer performs anti-spoofing checks to make sure workloads are not sending packets with arbitrary source IP addresses. Second, the VPC is global, so we can create clusters in different regions, like US West and US East, and all those clusters can talk to each other without any peering on our side. Another benefit GKE provides that's relevant here is that the master VM is automatically scaled up to match the cluster's needs; it is also automatically upgraded, secured, and backed up. Can you imagine how useful this is when we need to repeatedly scale up our clusters as our needs grow? It's one less thing to worry about.

I had come across VPC-native as well. As you mentioned, every node is assigned a pod range from which IPs are picked and allocated to the pods running on that node, and the default for every node is a /24, which roughly translates to 110 pods per node. I know for sure we won't be anywhere close to 110 pods per node; we don't need that many pods on every single node. Moreover, some nodes will actually have fewer resources, which means even fewer pods. I'm just worried: how will we manage our IP space more efficiently under this scheme?

What you mention does seem like a problem. You're right that by default every node gets a setting of 110 maximum pods, which means under the hood it is assigned a /24 CIDR range; multiply that by a thousand nodes and that's a lot of IP addresses, especially if you're not going to use them. Luckily, GKE does allow us to optimize IP allocation by modifying the maximum-pods-per-node setting, and this can be done on a per-node-pool basis: we can put all the nodes with smaller resources into their own node pool and tune the setting there.
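The savings from lowering maximum pods per node can be sketched numerically. The sizing rule assumed here (reserve at least twice as many pod IPs as the per-node pod maximum, rounded up to a power of two) matches the /24-for-110-pods default described above:

```python
import math

def per_node_pod_prefix(max_pods: int) -> int:
    """Smallest prefix whose address count is >= 2 * max_pods.

    Mirrors GKE's rule of reserving roughly twice as many pod IPs as the
    per-node pod maximum (an assumption of this sketch).
    """
    needed = 2 * max_pods
    host_bits = math.ceil(math.log2(needed))
    return 32 - host_bits

print(per_node_pod_prefix(110))  # 24 -> a /24 (256 addresses), the default
print(per_node_pod_prefix(8))    # 28 -> a /28 (16 addresses)

# Savings across a thousand-node cluster:
nodes = 1000
default_ips = nodes * 2 ** (32 - per_node_pod_prefix(110))
tuned_ips = nodes * 2 ** (32 - per_node_pod_prefix(8))
print(f"{default_ips} vs {tuned_ips} pod IPs reserved")  # 256000 vs 16000
```

A sixteen-fold reduction in reserved pod IPs, just from right-sizing one node-pool setting.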
Say we set the maximum pods per node to 8 for the smaller nodes in their node pool. That makes those nodes get a /28 instead of a /24, which is already a significant IP saving. Another thing we can do is share the pod range and the node range between clusters in the same subnet, which lets us save even more IP space.

Interesting, that seems to work for optimizing our IP allocation. But I have a greater concern: what about security? Our application handles user PII, like Social Security numbers, tax details, and payment information. Identity theft is a very real concern; we need to make sure our users feel safe sharing their sensitive data with us. How do we restrict access to our workloads? How do we ensure that some random person on the Internet, or any unauthorized IP, doesn't get access to our clusters? You know compliance is a big concern; compliance is a deal breaker.

I'm glad you brought up security early in our discussion. There are a few things we can do here. We can start by creating our clusters as private clusters. In this mode the nodes in the cluster do not get a public IP address, which means they're isolated from the Internet. Communication between the cluster nodes and the master happens over RFC 1918 addresses, which, as you know, refers to address space that's completely private and not routable on the Internet. Enabling this feature is as simple as checking a box when we create the cluster.

That sounds good for the workloads; we should make all our clusters private clusters. But what about gaining access to the control plane? What do we do about that?

I had a hunch you might ask that next. There is a solution for that as well, called master authorized networks. With this feature you specify IP address ranges that are whitelisted to talk to the master. In addition to originating the connection from a whitelisted IP, one also needs valid credentials to log on to the master.
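The effect of a master authorized networks whitelist can be modeled in a few lines. The CIDRs below are made up, and the real check is enforced in Google's infrastructure rather than in user code; this is only a sketch of the allow/deny semantics:

```python
import ipaddress

# Hypothetical whitelist of ranges allowed to reach the cluster master.
authorized_networks = [
    ipaddress.ip_network("203.0.113.0/29"),    # e.g. corporate office egress
    ipaddress.ip_network("198.51.100.64/28"),  # e.g. CI/CD runners
]

def may_reach_master(client_ip: str) -> bool:
    """True if client_ip falls inside any authorized range.

    This only models reachability; a caller from an allowed range still
    needs valid credentials to do anything on the master.
    """
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in authorized_networks)

print(may_reach_master("203.0.113.5"))  # True: inside the office range
print(may_reach_master("192.0.2.44"))   # False: default is deny
```

An empty `authorized_networks` list models the default-deny whitelist a private cluster starts with: every lookup returns False.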
There is security in layers, known as defense in depth. Like we talked about, when you create a private cluster you automatically get a master authorized networks configuration with a default-deny whitelist, which means no IP addresses can talk to the master until you add some.

I didn't know about that; that seems pretty useful. Speaking of access to clusters: we need to build a lot of features, like tracking tax-deductible donations and enabling you to import your W-2. So, for example, we have our core tax filing team and our W-2 importing team. Ideally these two teams can independently create and manage their clusters, but we know for sure they'll need to share a certain set of network resources, such as firewalls and routes. Ideally those network resources are controlled by us while our teams still create clusters independently. How do I ensure these clusters can communicate with each other without additional VPC peering? How do I ensure my teams are isolated and the network resources are managed by us, while the teams stay agile?

Let me make sure I've summarized the problem right. We have four requirements. First, individual feature teams want to create their own clusters and manage them independently. Second, these clusters will still use network resources that we, the network admins, maintain and have total control over. Third, the team clusters should be able to use those resources but not modify anything beyond what they need, with no organization-level changes. And last, we want all these clusters and their workloads to be able to talk to each other. Four requirements, right? That sounds about right.

Awesome. This is exactly what the feature called Shared VPC solves. With this feature, the network admins centrally manage all the network resources, like load balancers, subnets, routes, and
firewalls. All this management happens in what is known as the host project. Individual feature teams create what are known as service projects, which have use permissions on those resources. The teams' clusters reside in the service projects, and the teams will not be able to modify any of the critical network resources; you can't have the W-2 team go and delete a firewall rule, for instance. We have that isolation. And finally, since all these clusters are on the same VPC, the same virtual private cloud, they can all talk to each other without us needing to do any additional peering.

So you mean our teams can create and manage their clusters, and the clusters can talk to each other, without the need for additional peering, while the network resources are managed by us? That sounds great. That's exactly what I mean. I'll definitely keep that in mind.

Now let's talk about handling bursty traffic. You know that close to the tax deadline, when everyone is filing last-minute, we'll get a very large spike in traffic, especially also when we're giving a promotional rate to some of our users. How do we ensure our application continues to serve users without going down? How do we ensure protection against common web application attacks? We obviously hear a lot about anti-DDoS protection. There might be abusive scripts that try to saturate our network, choking real user traffic; someone might attempt to shut us down or, worse, try to steal our data. How can GKE help us protect against this?

Very valid concerns, I agree; we must be prepared for the worst when it comes to peak traffic. The good news is that Google Cloud Load Balancing, specifically HTTP(S) load balancing, provides some amount of anti-DDoS protection along with the load balancer. When you enable SSL proxy or HTTP(S) load balancing, the Google infrastructure automatically absorbs and mitigates several layer-4-and-below attacks, such as SYN floods, IP fragment floods, and port
exhaustion, to name a few. In addition to this, Cloud Armor is our friend. It arms us with the ability to whitelist trusted IP addresses and blacklist abusive IP addresses that might talk to our backends, and a flexible rules language to customize our defenses and protect against multi-vector attacks is at our disposal. It also provides defense against web application attacks such as SQL injection, cross-site scripting, and several others.

That looks promising. On the topic of application-layer security: we have a lot of domains, and we serve each of those domains using an SSL certificate. For example, we need one for each of our enterprise customers, such as a certificate for a domain like foo.ourtax.com. We're using a single load balancer, and for now we're managing all those SSL certificates manually, which means we need to keep track of things like expiration dates and renew the certs when they expire. This is a very fickle setup; it could actually cause business loss if some day we forget to renew on time. I would ideally like to offload this work to some sort of automation and management rather than have our teams manually manage our certificates.

I hear you, I completely hear you. Managing certificates with one-off scripts or by hand is something I have done myself, and it's not fun; it definitely sounds painful. I have some great news for you: GKE now offers end-to-end, hassle-free SSL certificate management. This means you can offload your certificate-management work: certs for your domains will be automatically issued, provisioned on the load balancer, renewed as needed, and deleted when you delete the load balancer or the associated resources.

That sounds interesting, can you tell me more? Sure. You start by creating a custom resource object for your managed certificate; you just put in the domain names whose certificates you want managed.
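As a sketch of what that custom resource looks like, here is its shape written as a Python dict (in practice you would write it as a YAML manifest and apply it with kubectl). The field names follow GKE's ManagedCertificate resource, but the apiVersion, names, and domain below are illustrative assumptions:

```python
import json

# Sketch of the managed-certificate custom resource described above.
# The apiVersion has changed across releases (it was a beta version at the
# time of this talk), so treat it as an assumption.
managed_cert = {
    "apiVersion": "networking.gke.io/v1",
    "kind": "ManagedCertificate",
    "metadata": {"name": "ourtax-cert"},
    "spec": {
        # Certificates for these domains are issued and renewed for you.
        "domains": ["foo.ourtax.com"],
    },
}

# The certificate is linked to an Ingress via an annotation that names
# the ManagedCertificate resource.
ingress_annotation = {
    "networking.gke.io/managed-certificates": managed_cert["metadata"]["name"]
}

print(json.dumps(managed_cert, indent=2))
```

Deleting the Ingress or the certificate resource tears the certificate down again, matching the lifecycle described above.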
Say we want foo.ourtax.com to have certificates automatically issued and managed. Just put that domain in your spec and link it to your Ingress spec, and you're done: certs for this domain will now, like I mentioned, be automatically issued and renewed. It's not a dream, I'm serious. That would be a relief for our teams, getting out of the business of manually managing certificates. You bet.

Let's recap all the security features we just talked about. First, we covered network-layer security with private clusters and master authorized networks. Then we covered application-layer security with Cloud Armor, easy SSL certificate management with managed certs, and secure delegation of resources through Shared VPC.

That's cool; it seems like GCP puts security first. But I'm still worried about one more thing: latency to our backends. Our app is getting popular, and obviously we're going to get that traffic spike; I'm worried about it. Will our pages take several seconds to load? That would be a very frustrating experience for our users, definitely not something we want to put our customers through. Tell me about it; no one likes staring at a blank browser screen, right? I recently heard about a feature called container-native load balancing that might help us here. What I know so far is this: container-native load balancing allows the load balancer to talk to pods directly and evenly distribute traffic among them.

How so? Let's talk about what happens without this feature. Without container-native load balancing, requests come to the load balancer and travel to the node instance groups; from there they are routed to the correct matching pod, which might or might not be on the same node. As you can see, this introduces an extra connection hop. Because container-native load balancing talks to pods directly, connections have fewer network hops, and latency and throughput improve as a result. There are additional benefits too. With container-native
load balancing, you also get to see the round-trip time between the client and your backend, which makes troubleshooting much easier, and you get Stackdriver UI support as well. Finally, health checking is improved and more accurate, since the load balancer talks to the pods directly.

All this sounds great, but what about when things are not working? What is the visibility and monitoring story here? How do I troubleshoot my deployments?

I was waiting for this one. VPC flow logs can help us there. You can enable VPC flow logs on a per-subnet basis; when enabled, this feature collects flow data from every node on the subnet. But will I see traffic both between two different nodes and within the same node, intranode traffic? Yes, absolutely. You will see traffic between nodes, like we just said, and with a feature called intranode visibility you will get insight into traffic within a VM, that is, pod-to-pod traffic on the same node. That feature launched just today, so it is hot off the press.

But will it cover all the surface areas? Every flow is sampled, whether it's TCP or UDP, and whether it is between two pods, or between a pod and a host on the Internet, in your on-prem data center, or a Google service. And if two pods are communicating and they are both on subnets that have this feature enabled, you will see bidirectional traffic between them. But if it covers all those surface areas, will it add latency to my packets? That's the best part: since VPC flow logs is built into the networking infrastructure, there is no additional delay and no performance penalty for routing the log packets to their destination.

I can't wait to try this out on one of our clusters; that sounds very useful. Me too. Pavithra, after discussing all these features and capabilities with you, I feel confident that our tax app will survive this year's traffic spike. I definitely learned a lot more about GKE and GKE networking today, and I'm confident we can keep our application running securely around the clock. I'm so glad
I could help, and I agree with you on all of that.

OK, that ends our story here, and I hope you've gotten a good overview of the various networking features in GKE. Let's summarize what we talked about once again. With GKE you can manage and scale your clusters with VPC-native networking and optimized IP allocation. GKE offers several network security features: Shared VPC for secure delegation of network resources, application-layer security features such as DDoS protection through Cloud Armor and managed SSL certificates, and network-layer security features such as private clusters and master authorized networks. Optimize load balancing of your web applications with container-native load balancing, and analyze your deployments easily with VPC flow logs and intranode visibility. If there's one key message to take away from our discussion, it is this: with GKE you are armed with all the tools you need to scale, manage, and secure your clusters, and with the visibility features we talked about here, analyzing your deployments is covered.
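As a closing aside on the visibility features: once VPC flow logs are exported, analyzing them is ordinary log processing. The record shape below is a simplified stand-in for real flow log entries (real entries carry a richer, documented schema; the field names here are assumptions for illustration):

```python
# Toy flow records, shaped loosely like exported VPC flow log entries
# (simplified; real entries carry many more fields).
flows = [
    {"connection": {"src_ip": "10.4.0.5", "dest_ip": "10.4.1.9",
                    "protocol": 6, "dest_port": 443},
     "bytes_sent": 5200},
    {"connection": {"src_ip": "10.4.0.5", "dest_ip": "10.4.0.7",
                    "protocol": 17, "dest_port": 53},
     "bytes_sent": 120},  # same-node pod-to-pod, seen with intranode visibility
    {"connection": {"src_ip": "203.0.113.9", "dest_ip": "10.4.0.5",
                    "protocol": 6, "dest_port": 8080},
     "bytes_sent": 900_000},
]

def bytes_by_peer(flows, pod_ip):
    """Total bytes grouped by the remote peer of the given pod IP."""
    totals = {}
    for f in flows:
        c = f["connection"]
        if pod_ip == c["src_ip"]:
            peer = c["dest_ip"]
        elif pod_ip == c["dest_ip"]:
            peer = c["src_ip"]
        else:
            continue
        totals[peer] = totals.get(peer, 0) + f["bytes_sent"]
    return totals

print(bytes_by_peer(flows, "10.4.0.5"))
# {'10.4.1.9': 5200, '10.4.0.7': 120, '203.0.113.9': 900000}
```

A sudden jump in bytes from an unfamiliar peer, like the third record above, is exactly the kind of signal this visibility is meant to surface.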

5 Comments

  • sukrat kaushik says:

    How does it work if projects are not from same organization. Then shared VPC will not work

  • Manoj Shetty says:

    When we run GKE Kubernetes cluster in "Private cluster" mode, all Masters and Workers nodes cannot be accessed from Internet or External IP. If you want to give secured (firewall whitelisting rule) access to Master nodes, then enable "Master Authorized Networks" setting

  • Manoj Shetty says:

    Run GKE Kubernetes cluster in "VPC native" networking mode, which leads to using less number of IP's for Kubernetes Pods and Services

  • Manoj Shetty says:

    Google GKE – Kubernetes cluster can run in "Private cluster" mode, all Master and Worker nodes are assigned Private IP addresses

  • Alfredo Espejel Corvera says:

    Thank you Manjot and Pavithra, awesome and easy to understand talk!
