I created this page to give a fuller picture of my skills and experience. Unlike my CV, it contains details such as DevOps principles, architectural diagrams, and sample code.
I will explain each workstream using the DevOps Matrix (next section) and provide code examples where applicable.
I have built a sample microservice (Sample-App) and implemented DevOps practices around it; I will use it often as an example. Please note that the technical implementations are simplified versions of my real corporate experience.
Target Audience: readers with basic knowledge of Agile, CI/CD, and Cloud.
What is DevOps?
DevOps means different things depending on who you're talking to. For a cloud engineer, DevOps is about infrastructure automation; for a tester, it's about test automation; for a developer, it's about CI/CD; and so on. All of them are correct. However, because of these differences, the term becomes confusing, and no single description covers all DevOps areas. That's why I created the DevOps Matrix.
The DevOps Matrix is a representation of DevOps as a whole. It comprises nine DevOps workstreams that distinguish the practices.
To achieve the full potential of DevOps, an organization must consider all workstreams and not just CI/CD or Cloud.
About the Sample-Apps and other tools used
Source code: https://github.com/mikaelvg/devops-demo/tree/master/crud-with-api-testing
Static Code Analysis: https://sonarcloud.io/dashboard?id=mikaelvg_devops-demo // Fix this
For documentation simplicity, I will use specific tools like JIRA as an example for a generic ticketing system.
Agile Adoption
Agile Adoption covers the processes and technical implementations that integrate Agile practices with DevOps automation.
JIRA Ticketing System
CI/CD Integration
Label each story/feature with the release version in which it is included.
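As a sketch, this labeling can be automated from the CI/CD pipeline through JIRA's REST API (`PUT /rest/api/2/issue/{key}` with an `update` payload). The site URL, issue key, and helper names below are illustrative, not a prescribed implementation:

```python
# Sketch: tag a JIRA issue with its release version from the CI/CD pipeline.
# Uses JIRA's REST API v2 update-payload shape; the site URL, issue key,
# and function names are illustrative assumptions.
import json

def build_label_update(release_version):
    """Build the JIRA 'update' payload that adds a release label."""
    return {"update": {"labels": [{"add": f"release-{release_version}"}]}}

def label_issue(issue_key, release_version, send=None):
    """Prepare (and optionally send) the label update; 'send' is injectable for testing."""
    url = f"https://example.atlassian.net/rest/api/2/issue/{issue_key}"
    payload = build_label_update(release_version)
    if send is not None:          # real code would PUT this with authentication
        send(url, json.dumps(payload))
    return payload
```

Usage: `label_issue("APP-123", "1.4.0")` returns the payload that a real pipeline step would send with an authenticated HTTP PUT.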
Developers Experience (DevEx)
You might already know this term, but it has never been formalized as a standard practice, so let me define it by stating the purpose of this workstream.
For example: Deploy the app to a production-like environment and conduct a Systems Integration Test on a local desktop.
1 - How to migrate from SVN to Git
2 - How to migrate from Ant to Maven
3 - How to break a monolith down into smaller applications, e.g., JAR files or microservices
4 - How to externalize configuration variables and store them in Kubernetes ConfigMaps and in Kubernetes or SOPS Secrets
5 - How to fix technical debt reported by SonarQube or by security scans such as AWS Security Hub and Detectify
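Item 4 above can be sketched in Kubernetes manifests; everything here is illustrative (names and values are made up), and in practice the Secret would be encrypted with SOPS before being committed to Git:

```yaml
# Illustrative only: application settings moved out of the code base into
# a ConfigMap (non-sensitive) and a Secret (sensitive).
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-app-config
data:
  SPRING_PROFILES_ACTIVE: "dev"
  DB_HOST: "postgres.dev.svc.cluster.local"
---
apiVersion: v1
kind: Secret
metadata:
  name: sample-app-secret
type: Opaque
stringData:                 # manage via SOPS when stored in Git
  DB_PASSWORD: "changeme"
```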
//TODO: Deploy this in an aws free tier
Home: http://localhost:9080/
Get all students: http://localhost:9080/api/student
Search by first name: http://localhost:9080/api/fetchstudent?fieldName=firstName&value=Mikael
Continuous Integration
I assume you have some knowledge of CI/CD concepts, so I'll go straight to how I usually design CI/CD pipelines, drawing on my previous projects.
I have implemented this design since 2012 at several companies such as ANZ, DBS, and SC. It works well with development teams and compliance, and it increases delivery speed by moving DevOps-automation responsibilities away from application developers.
Diagram-1 below represents the application lifecycle in relation to CI/CD, from source code down to server deployment. It uses a custom branching strategy (Diagram-2): a combination of a standard branching strategy (Appendix A) and a trunk-based strategy.
Diagram-1
Diagram-2
The Jenkins jobs can be categorized into seven types, each with a different function:
1. Build-Test
• Runs on every code check-in to a FEATURE branch.
• Compiles the code and executes the automated tests.
• Runs security scans and sends alerts via chat for immediate action.
2. Pull-Request
• Runs when a developer requests to merge into the MASTER branch.
• Similar to the Build-Test job, but with additional checks and functions, such as:
o Squashing the commits into one (git squash)
o Enforcing the minimum test coverage
o Enforcing the minimum security checks, etc.
• There are two types of pull-request reviews:
o Automated checks, such as minimum test coverage.
o Human checks, such as technical architecture design.
3. Master (Integration and Snapshot Release)
• Runs on every approved pull request.
• Uploads the latest stable version to the Docker registry, or to Artifactory for libraries such as JAR files.
• Deploys a stable snapshot release to the development environment.
4. Release-Package
• Triggered manually by the release manager.
• Creates the "release candidate" artifacts, including versioning and tagging.
• Uploads the release candidate to the Docker registry or Artifactory.
• Applies Git tagging, Docker tagging, and Artifactory versioning.
5. Release-Deploy
• Not tied to any branch.
• Takes a binary or a Docker image and deploys it.
• Lets you select which version to deploy and to which environment.
• Supports manual or automatic deployments.
6. Relfix
• For UAT and PROD fixes.
• Allows UAT or PROD fixes independently of recent MASTER-branch changes.
• Changes go through a proper testing cycle.
7. Hotfix
• For urgent PROD fixes only.
• Changes are applied directly to the current PROD release branch.
• Note: a new PROD branch is created on every regular production release (not on hotfixes or release fixes).
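To make the Release-Package job's "versioning and tagging" concrete, here is a minimal sketch of the version-bumping and tagging logic. The semantic-versioning scheme, function names, and registry/repository paths are my own illustrative assumptions, not a prescribed implementation:

```python
# Sketch of release-candidate versioning for the Release-Package job.
# Assumes semantic versioning (MAJOR.MINOR.PATCH); a real pipeline would
# read the current tag from Git and push the new tag back.

def bump_version(current, part):
    """Return the next version, e.g. bump_version('1.2.3', 'minor') -> '1.3.0'."""
    major, minor, patch = (int(x) for x in current.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

def release_tags(app, version):
    """Matching Git tag, Docker tag, and Artifactory path for one release."""
    return {
        "git": f"release/{version}",
        "docker": f"registry.example.com/{app}:{version}",
        "artifactory": f"libs-release/{app}/{version}/{app}-{version}.jar",
    }
```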
3 types of release artifacts produced within CI/CD
Release artifacts are binaries, source code archives, or Docker images that are directly deployable to servers.
1 - Snapshot container images / JAR files
2 - Release-Package
3 - (optional) Release-Archive
Central CI/CD Library
Jenkinsfile Traditional Approach
As part of the code deliverables, CI/CD functionality is typically defined in this file. Each application has its own Jenkinsfile maintained by a developer, so code duplication is very high and the CI/CD code is not properly controlled.
Jenkinsfile Library Approach
With the library approach, the Jenkinsfile contains only five to seven lines of code whose sole purpose is to call the shared library. All build, automated-testing, reporting, and deployment processes are defined in the library, which the DevOps team develops and maintains. As far as application developers are concerned, they only need to add a few lines of code. See the sample below.
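A minimal sketch of such a thin Jenkinsfile, assuming a shared library named devops-lib that exposes a standardPipeline step (both names, and the parameters, are illustrative):

```groovy
// Illustrative only: the application repo's entire Jenkinsfile.
// 'devops-lib' and 'standardPipeline' are assumed names for the
// shared library maintained by the DevOps team.
@Library('devops-lib') _

standardPipeline {
    appName      = 'sample-app'
    buildTool    = 'maven'
    minCoverage  = 80        // enforced by the Pull-Request job
    deployTarget = 'dev'
}
```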
5 types of servers used in CI/CD
1 - Jenkins Agent - Short-lived Agent
• For compilation, automated test execution, and package/image creation.
• Created and destroyed automatically by the Jenkins master.
• Life span = build duration.
• A good tool for this is the Kubernetes plugin, which creates dynamic agents.
2 - Dev Server
• Contains the latest snapshot stable version.
• Developers can deploy anytime.
• Starting from the Dev server, all microservices are deployed together so they can be tested against one another.
• Life span = ad hoc or persistent
3- QA Servers
• Same as the Dev server, but dedicated to QA testing.
• Life span = ad hoc or persistent
4- UAT Server
• Same as the QA server, but the test data are closer to production data.
• Life span = ad hoc or persistent
5. PROD Server
• Highly resilient and fault-tolerant servers.
• EKS nodes are spread across different availability zones (or regions).
Additional Info
• The Jenkins application follows a 100% infrastructure-as-code approach: job creation and admin configuration changes are done via GitOps, not via the web console.
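One way to realize this, as a sketch, is the Jenkins Configuration-as-Code (JCasC) plugin combined with the Job DSL plugin; the values below are illustrative, with the file itself version-controlled and applied on Jenkins startup:

```yaml
# jenkins.yaml, loaded by the Configuration-as-Code plugin and kept in Git.
# Values are illustrative; changes arrive via pull request, not the UI.
jenkins:
  systemMessage: "Managed by GitOps - do not edit via the web console"
  numExecutors: 0            # all builds run on dynamic agents
jobs:
  - script: >
      pipelineJob('sample-app-build-test') {
        definition {
          cpsScm {
            scm { git('https://github.com/mikaelvg/devops-demo.git') }
            scriptPath('Jenkinsfile')
          }
        }
      }
```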
Appendix A
You may want to check out how to implement Centralized DevOps in a nutshell: https://www.devops.ph/centralized-devops
Continuous Test
Continuous Test is not automated testing.
Continuous Test includes automated testing.
Continuous Test is highly integrated with the other DevOps workstreams, from JIRA tickets to compliance reports.
Characteristics of Continuous Testing
1 - Spin-up Tomcat Server
2 - Deploy application
3 - Create database and tables
4 - Upload Test data
5 - Execute API-level automated tests
6 - Refresh test data
High test coverage, low test LOC: this single class provides more than 80% automated-test coverage.
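The six-step cycle above can be sketched end to end in miniature. In this illustrative, self-contained example, Python's built-in HTTP server stands in for Tomcat and the deployed app, and an in-memory SQLite database stands in for the real one; all names are made up:

```python
# Minimal sketch of the continuous-test cycle: spin up a server, "deploy"
# an app, create the database, load test data, run an API-level test, and
# refresh the data. Stdlib stand-ins replace Tomcat and a real database.
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

DB = sqlite3.connect(":memory:", check_same_thread=False)

def create_schema():          # step 3: create database and tables
    DB.execute("CREATE TABLE IF NOT EXISTS student (id INTEGER, first_name TEXT)")

def load_test_data():         # steps 4 and 6: upload / refresh test data
    DB.execute("DELETE FROM student")
    DB.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Mikael"), (2, "Ana")])
    DB.commit()

class StudentHandler(BaseHTTPRequestHandler):   # steps 1-2: the "deployed" app
    def do_GET(self):
        rows = DB.execute("SELECT id, first_name FROM student").fetchall()
        body = json.dumps([{"id": r[0], "firstName": r[1]} for r in rows]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):               # keep test output quiet
        pass

def run_cycle():
    create_schema()
    load_test_data()
    server = HTTPServer(("127.0.0.1", 0), StudentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    # step 5: API-level automated test against the running server
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/api/student") as resp:
        students = json.load(resp)
    assert {s["firstName"] for s in students} == {"Mikael", "Ana"}
    load_test_data()          # step 6: refresh test data for the next run
    server.shutdown()
    return len(students)
```

In a real pipeline the same sequence runs inside the CI job, with the Tomcat/database spin-up typically handled by containers rather than stdlib stand-ins.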
Continuous Deployment
I have a specific definition of Continuous Deployment, and it differs from automated deployment.
As illustrated in this mind map, automated deployment (red font) is just one portion of the overall Continuous Deployment workstream.
Regression Test Pack (RTP)
Deployment Rules
I have Jenkins running on my personal Kubernetes cluster at home. Should you need access, simply message me.
http://jenkins.devops.ph:8888/
Elastic Infra
Elastic Infrastructure is the term I use for cloud infrastructure management frameworks such as AWS EC2, AWS ECS/EKS, Google Cloud Platform, Kubernetes, Helm, and Azure.
I make sure at least 90% of cloud infrastructure resources are written as code (IaC), including configurations and deployment scripts. I use a single codebase to set up production and non-production environments, except for the environment-specific variables. I use the following technologies depending on the purpose.
Kubernetes CI/CD and the Servers of each environment type
About the diagram
• Two Kubernetes clusters are involved: one for CI/CD and DevOps-related tools (Jenkins, SonarQube, etc.), and the other for the server environments (DEV, UAT, PROD).
• EKS comprises multiple nodes spread across different availability zones (or regions).
• My common practice is to have one Kubernetes cluster per environment.
• The application deployment procedure comprises two areas:
o Idempotent deployments that set up the infrastructure, e.g., database creation, folders, etc.
o Once the server/environment is set up, the application itself is deployed.
• Lastly, the creation and updates of the environments are done via CI/CD, e.g., creation of VPCs, K8s clusters, IP whitelisting, etc.
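The idempotent-infrastructure idea can be shown with a toy sketch: running the setup step twice leaves the environment in the same state, so it is safe to run on every deployment. Local folders and an SQLite file stand in for real cloud resources, and all names are illustrative:

```python
# Toy illustration of idempotent environment setup followed by an app
# deployment. Re-running setup_environment changes nothing, which is the
# property that makes it safe inside CI/CD.
import os
import sqlite3

def setup_environment(root):
    """Idempotent infra step: create folders and database schema if absent."""
    os.makedirs(os.path.join(root, "uploads"), exist_ok=True)  # no error if present
    con = sqlite3.connect(os.path.join(root, "app.db"))
    con.execute("CREATE TABLE IF NOT EXISTS student (id INTEGER, first_name TEXT)")
    con.commit()
    con.close()

def deploy_application(root, version):
    """App step: runs only after the environment exists; overwriting is fine."""
    with open(os.path.join(root, "RELEASE"), "w") as f:
        f.write(version)
```

Real equivalents of `setup_environment` are tools like `kubectl apply` or Terraform, which converge to a desired state rather than repeating create actions.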
Basic AWS architecture design to ensure security and high availability.
The main components are described below.
Unlike the previous AWS diagram, this one strips out most of the components and focuses on explaining the high-availability design. It illustrates how the replica pods are distributed across availability zones (or regions).
Aside from GCP Cloud Load Balancing, there's another layer of load balancing within the Kubernetes cluster. The diagram also illustrates Blue/Green deployment. There are several ways to implement Blue/Green deployment; in this example, I use an Ingress resource as the switch to change from blue to green.
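As a sketch of the Ingress-as-switch approach (the host and service names are illustrative), flipping from blue to green is a one-line change to the backend service, re-applied via CI/CD:

```yaml
# Illustrative only: the Ingress routes all traffic to the "blue" Service;
# switching to green means changing the backend service name and re-applying.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sample-app
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sample-app-blue   # change to sample-app-green to switch
                port:
                  number: 80
```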
//TODO - in progress
About the diagram
Accounts
There are 3 required accounts:
1. Main - owned by DevOps, hosting the DevOps tools.
2. Non-Prod - for Dev, UAT, QA, and other non-production environments.
3. Live - for production use only.
Continuous Monitoring
"Continuous monitoring is the process and technology used to detect compliance and risk issues associated with an organization's financial and operational environment. The financial and operational environment consists of people, processes, and systems working together to support efficient and effective operations. Controls are put in place to address risks within these components. Through continuous monitoring of the operations and controls, weak or poorly designed or implemented controls can be corrected or replaced – thus enhancing the organization's operational risk profile. Investors, governments, the public and other stakeholders continue to increase their demands for more effective corporate governance and business transparency."
-- Wikipedia
The ultimate goal of implementing Continuous Monitoring is to improve the company's risk profile and gain the confidence of investors. However, as a technical person, I don't intend to bore you to death with risk management or the details of how it contributes to the company's portfolio; I am not an expert in that field either. This section describes how to monitor items related to application development and support.
The target beneficiaries are developers and technical support. Finance also benefits through reduced operational costs, such as the AWS/GCP cloud bills. To achieve the "company portfolio" level of benefits, we must first address continuous monitoring in our own backyard: the delivery and operations teams.
There is no complete Continuous Monitoring framework in the DevOps world, so I invented one! This framework builds on existing DevOps practices such as the RED and USE methods.
I invented the "Stratum Diagram" to clearly define the different types and layers of monitoring. The name is adopted from stratum, the term geologists use for a bed or layer of sedimentary rock that is visually distinguishable from adjacent layers; each layer provides telltales about the earth's past events.
For DevOps monitoring purposes, each layer corresponds to a network layer. The elongated blocks (e.g., blue) represent monitoring scopes. Each monitoring scope has specific metrics and a purpose; combining multiple monitoring scopes allows us to pinpoint where issues occur.
Blue monitoring-scope
- A sanity test that retrieves data from the database and checks its value. This data should be public, and no authentication should be required.
- A sanity automated test that reads page content whose source is an S3 bucket.
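A minimal sketch of such a sanity check (the URL, field, expected value, and function names are illustrative); the fetch step is injectable so the check itself stays testable without a live environment:

```python
# Sketch of a monitoring-scope sanity check: fetch a known public record
# end to end and compare it against an expected value. 'fetch' is injected
# so the check can be exercised without a live environment.
def sanity_check(fetch, url, field, expected):
    """Return 'OK' when the end-to-end read matches, else 'ALERT'."""
    try:
        record = fetch(url)           # real use: urllib.request + json decoding
    except Exception:
        return "ALERT"                # any failure along the path is a miss
    return "OK" if record.get(field) == expected else "ALERT"

# Usage with a fake fetch standing in for the real HTTP call:
fake = lambda url: {"firstName": "Mikael"}
status = sanity_check(fake, "https://example.com/api/student/1", "firstName", "Mikael")
```

Scheduled from the monitoring system, an "ALERT" result pinpoints the layer where the end-to-end path broke.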
Orange monitoring-scope
- A sanity test that retrieves data from the database and checks its value.
- Sanity automated tests that retrieve data from other microservices, which are in turn connected to their corresponding databases.
Green monitoring-scope
Technical Implementations: Traceability
Kiali is an observability console for Istio with configuration capabilities such as canary traffic adjustment. It helps you understand the structure of your microservices architecture and also shows the health of your services. Kiali provides detailed metrics, and a basic Grafana integration is available for advanced queries.
I took this GIF from my demo app. It shows the transaction flows between microservices. Through Kiali, it is much easier to track the flow of transactions compared with the traditional approach of reading console logs.
Kiali is about more than traceability: you can control the flow of traffic between two versions of the same application, for example to deploy a canary version. See the staff-service and staff-service-risky-version modules (rightmost). I configured Kiali to direct 90% of the traffic to staff-service, the stable version, and 10% to staff-service-risky-version, the latest version. For some use cases, it is beneficial to try the application with a small portion of users first before making a full release.
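Under the hood, this kind of split is an Istio weighted route that Kiali edits on your behalf; a sketch of the resulting resource (hosts and metadata are illustrative, with the service names taken from the demo above):

```yaml
# Illustrative only: Istio VirtualService sending 90% of traffic to the
# stable version and 10% to the risky (canary) version.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: staff-service
spec:
  hosts:
    - staff-service
  http:
    - route:
        - destination:
            host: staff-service
          weight: 90
        - destination:
            host: staff-service-risky-version
          weight: 10
```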
Server Resource Utilization
Suppose you have Metrics Server, or Prometheus and Grafana, installed and configured to monitor your servers, and you have gathered enough data. What data will you gather, and what will you do with it?
Data from CPU and memory utilization reports must be a regular input into how we develop our applications and how we manage CPU/RAM allocations and autoscaling configurations.
For example, in Dockerfiles, Kubernetes pod specs, and ECS task definitions, we initially guess the CPU and RAM allocations. Once monitoring data becomes available, those CPU/RAM values must be tweaked for optimal operation.
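In a Kubernetes pod spec, those guessed values live in the container's resources block; once monitoring shows the real usage profile, the requests and limits are tightened to match it. The numbers below are illustrative starting guesses, not recommendations:

```yaml
# Illustrative only: initial guesses for the sample-app container, to be
# revisited once Prometheus/Grafana show the real usage profile.
resources:
  requests:          # what the scheduler reserves for the container
    cpu: "250m"
    memory: "256Mi"
  limits:            # hard ceiling before throttling / OOM-kill
    cpu: "500m"
    memory: "512Mi"
```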
// TODO: Add Grafana / Prometheus Graphics example of CPU and Memory usage.
//TODO Jaeger
Continuous Compliance
Efforts in this workstream must not be separated from the other DevOps workstreams. Continuous Compliance is a mindset, a guideline for how we should structure the overall DevOps workflow. For example: after the Regression Test Pack runs, it generates automated test reports that document the development team's progress.
The example below is the tool I use to implement the traceability matrix report. In simple terms, the traceability matrix report links requirements/functionalities, test executions and results, test data, application code commits, automated-testing (glue) code, coders' names, reviewers' names, etc.
As a DevOps engineer, this kind of report must not be created manually; rather, it should be a by-product of the processes and technical implementations of the other DevOps workstreams. Through the tool, the report is generated automatically.
For example: if the traceability matrix report is implemented correctly, then in my experience, 80% of audit findings related to application development can be resolved.
Security
My approach to security can be categorized into two disciplines: detection and prevention.
Detection covers the technical debt surfaced by security scans. There are plenty of tools you can use from OWASP.org, and security scans must be part of the CI/CD pipeline and the regular peer-review process.
I am not a hardcore security person, but I love following security best practices and making sure they are part of the IaC code. Here is the AWS security checklist I usually follow:
IAM
S3
Firewall
Security Group
EC2, VPC & EBS
CloudTrail
RDS