应用开发运维标准实践

06 Apr 2019 . . Comments #operation #application #process

最近的工作是带一个SRE 团队,打造公司的基础架构平台,并将公司大大小小的各种系统 迁移到云上面去,在这个过程中,为了使各种应用能够顺利上云,并且按照高标准的要求 开发运维,我们提出了在上线前的如下标准,包括开发,运维和流程等方面,在这里记录 一下

  • Commit your code to central git repo as often as possible 
  • Protect master branch, only leader have master permission, developers uses dev and feature branch to do development.
  • Create code review machanism, any code has to be reviewed before it is merged to master
  • Setup automated CI, use this CI process to check the quality of each commit.
  • Load balance the all services to avoid single point of failure
  • Put different services on different VM sets
  • Enforce end to end https encription for external services
  • Design stateless services so that is can be load balanced via round robin mechanism 
  • Use cloud offered PaaS service, eg RDS, instead of self setup.
  • Use dev/qa/prd environment for testing and release pipeline
  • Use immutable build to ensure the same package is promoted from QA to production environment.
  • Setup continues deployment pipeline with no downtime deployment capability
  • Decouple computer resource and storage resource, put application files and data in to data disk
  • Run the application with a service account on the OS.
  • Setup automated startup & shutdown.
  • Log-rotate service logs If it writes on disk logs.
  • Integrate with SystemD,  so that the application can be treat as an OS service and managed by systemctl 
  • Use automated continus deployment method to do OS configuration and application setup
  • Send application logs to a centerlized logging server
  • Format your log with a perdefined fomat
  • Integate your log with a search engine and dashboard solution
  • Provide standard health API 
  • Integate the health API with monitoring solution 
  • Send VM metrics to a standard time serial database and dashboard solution
  • Send cloud PaaS metrics to a time serial database and dashboard solution
  • Send application metrics to a time serial database
  • Setup VM alerts based on VM metrics
  • Setup cloud PaaS service alerting based
  • Setup application health alert
  • Setup application KPI/API alerts based on application metrics
  • Setup 24 * 7 alerting
  • Setup VM Performance Metrics Dashboard 
  • Setup Application Performance Dashboard
  • Setup Application KPI/API Dashboard
  • Setup backup for application VM if neccessary
  • Setup backup solution for application data 
  • Setup application own SLA
  • Setup application change process
  • Setup application incident process
  • Setup application operational scorecard
  • Understand Platform & Cloud support policy and SLA