Splunk is a popular choice for log analytics. I am a Java developer and I love using Splunk for production analytics. I have used Splunk for more than 5 years and like its simplicity.
This article is a list of best practices that I have learned from good Splunk books and from using Splunk in everyday software projects.
Most of these lessons are familiar to any software architect; however, it is important to document them for new developers. Doing so makes the software easier to maintain after it goes live in production.
Almost any software becomes difficult to change after it is live in production. There are so many things you may need to worry about. Using these best practices while implementing Splunk in your software will help you in the long run.
First Things First: Keep Splunk Logs Separate
Keep the Splunk log separate from debug and error logs. Debug logs can be verbose. Define a separate Splunk log file in your application. This also saves licensing cost, since you will not index unwanted logs.
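As a minimal sketch of a dedicated Splunk log file, the example below uses only java.util.logging from the JDK. The class name SplunkFileLogger and the logger name "splunk.audit" are assumptions made for illustration; in a real project you would more likely configure a separate appender in whichever logging framework you already use.

```java
import java.io.IOException;
import java.time.Instant;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch: a dedicated logger that writes only Splunk-bound lines to its own file. */
class SplunkFileLogger {

    /** Hypothetical logger name; only Splunk lines should be logged under it. */
    static final String LOGGER_NAME = "splunk.audit";

    /** Creates a logger that appends one timestamped line per message to filePath. */
    static Logger create(String filePath) throws IOException {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        // Do not forward Splunk lines to the root handlers (console, debug log).
        logger.setUseParentHandlers(false);

        FileHandler handler = new FileHandler(filePath, true); // true = append
        handler.setFormatter(new Formatter() {
            @Override
            public String format(LogRecord record) {
                // ISO-8601 timestamp first, then the message exactly as supplied.
                return Instant.ofEpochMilli(record.getMillis())
                        + " " + record.getMessage() + System.lineSeparator();
            }
        });
        logger.addHandler(handler);
        return logger;
    }
}
```

Debug and error messages keep going to your regular loggers; only lines written through the logger returned by create(...) end up in the file that Splunk indexes.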
Use Standard Logging Framework
Use an existing logging framework to write the Splunk log file. Do not invent your own logging framework. Just make sure the Splunk log file is kept separate. I recommend an asynchronous logger to avoid performance problems caused by heavy logging.
Log In KEY=VALUE Format
Use the Key=Value format in Splunk logging. Splunk understands Key=Value pairs, so your fields are extracted automatically. This format is also easy to read without Splunk. You may want to follow it for all other logs too.
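The sketch below shows one way to build such a line. KeyValueFormatter and its methods are hypothetical names, not part of any Splunk library. It assumes values containing spaces should be wrapped in double quotes so that they stay in a single field, and it simply replaces embedded double quotes with single quotes instead of escaping them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: turns a map of fields into one KEY=VALUE log line. */
class KeyValueFormatter {

    /** Joins the fields as KEY=VALUE pairs separated by single spaces. */
    static String format(Map<String, String> fields) {
        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(field.getKey()).append('=').append(quoteIfNeeded(field.getValue()));
        }
        return line.toString();
    }

    /** Quotes values that are empty or contain spaces, '=' or double quotes. */
    static String quoteIfNeeded(String value) {
        if (value == null) {
            return "\"\"";
        }
        boolean needsQuotes = value.isEmpty() || value.contains(" ")
                || value.contains("=") || value.contains("\"");
        if (!needsQuotes) {
            return value;
        }
        // Simplification: swap embedded double quotes for single quotes.
        return "\"" + value.replace("\"", "'") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("API", "CREATE_USER");
        fields.put("TXS", "S");
        fields.put("MSG", "user created");
        System.out.println(format(fields)); // API=CREATE_USER TXS=S MSG="user created"
    }
}
```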
Use Shorter KEY Names
Keep key names short, preferably under 10 characters. Even if you have plenty of disk space, it is better to keep tabs on how much you log, since it may create performance problems in the long run. At the same time, keep the keys understandable.
Use Enums For Keys
Define a Java enum, SplunkKey, that holds a description of each key and uses the enum constant's name as the Splunk key.
public enum SplunkKey {
    TXID("Transaction id");

    /**
     * Describes the purpose of the field to be splunked - not logged
     */
    private String description;

    SplunkKey(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}
Create A Util Class To Log In Splunk
Define a SplunkAudit class in your project that does all Splunk logging through easy-to-call methods.
import java.util.HashMap;
import java.util.Map;

public class SplunkAudit {
    // Fields collected on the current thread until flush() is called
    private final Map<String, String> values = new HashMap<>();
    // One SplunkAudit instance per thread
    private static final ThreadLocal<SplunkAudit> auditLocal = new ThreadLocal<>();

    public static SplunkAudit getInstance() {
        SplunkAudit instance = auditLocal.get();
        if (instance == null) {
            instance = new SplunkAudit();
            auditLocal.set(instance);
        }
        return instance;
    }

    private SplunkAudit() {
    }

    public void add(SplunkKey key, String message) {
        values.put(key.name(), message);
    }

    public void flush() {
        StringBuilder fullMessage = new StringBuilder();
        for (Map.Entry<String, String> val : values.entrySet()) {
            fullMessage.append(val.getKey());
            fullMessage.append("=");
            fullMessage.append(val.getValue());
            fullMessage.append(" ");
        }
        // log the full message now
        // log.info(fullMessage.toString());
        // Clear the fields so a reused (pooled) thread starts the next transaction empty
        values.clear();
    }
}
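Use Async Log Writer
Write the Splunk log through an asynchronous logger or appender so that heavy logging does not slow down the application threads.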
Set Up Alerts
Set up Splunk queries as alerts to get automatic notifications.
Index GC Logs in Splunk
Index Java garbage collection logs separately in Splunk. The GC log format is different, and it can get mixed up with your regular application logs. It is therefore better to keep it separate.
Log These Fields
Production logs are key to debugging problems in your software. The following fields are almost always useful. This list is only the minimum; add more based on your application domain.
ThreadName
Give each thread a unique ID. It is very easy to set thread names in Java; a one-line statement will do it.
Thread.currentThread().setName("NameOfThread-UniqueId");
Thread Count
java.lang.Thread.activeCount()
Server IP Address
Logging the server IP address becomes essential when the application runs on multiple servers. Most enterprise applications run on a cluster of servers. It is important to be able to distinguish errors that are specific to one server.
InetAddress.getLocalHost().getHostAddress()
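The three calls above can be gathered into one small helper, sketched below. The class name RuntimeFields and the keys THREAD, THREADS and IP are hypothetical. Note that Thread.activeCount() returns an estimate of the active threads in the current thread's group and its subgroups, and that InetAddress.getLocalHost() can throw UnknownHostException, so this sketch falls back to "unknown" in that case.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch: collects thread and host details as KEY=VALUE pairs. */
class RuntimeFields {

    static String threadName() {
        return Thread.currentThread().getName();
    }

    /** Estimate of active threads in the current thread group and its subgroups. */
    static int threadCount() {
        return Thread.activeCount();
    }

    /** IP address of the local host, or "unknown" if it cannot be resolved. */
    static String serverIp() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    /** Builds a fragment such as THREAD=worker-42 THREADS=5 IP=10.0.0.7 (values vary). */
    static String asKeyValue() {
        return "THREAD=" + threadName()
                + " THREADS=" + threadCount()
                + " IP=" + serverIp();
    }
}
```

Since the IP address does not change while the process runs, you may prefer to resolve it once at startup and reuse the value.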
Version
The version of the software source from version control is an important field. The software keeps changing for various reasons, and you need to be able to identify the exact version that is currently live in production. You can include your version control details in the manifest file of the deployable war/ear file. This can easily be done with Maven.
Once the information is available in your war/ear file, it can be read by the application at runtime and logged in the Splunk log file.
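One way to read the version at runtime is sketched below. It assumes your build writes the version into the standard Implementation-Version manifest attribute and that the class is loaded from an archive whose manifest the class loader exposes. VersionInfo is a hypothetical helper name. For classes under WEB-INF/classes of a war this lookup may return nothing, in which case you would need to read META-INF/MANIFEST.MF yourself.

```java
/** Sketch: reads the Implementation-Version attribute of the archive that holds a class. */
class VersionInfo {

    /**
     * Returns the manifest's Implementation-Version for the package of the
     * given class, or "unknown" when no such attribute is available.
     */
    static String version(Class<?> anchor) {
        Package pkg = anchor.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;
    }

    public static void main(String[] args) {
        // Prints VER=unknown when run from plain class files without a manifest.
        System.out.println("VER=" + version(VersionInfo.class));
    }
}
```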
API Name
Every application performs some tasks. They may be called APIs or something else. These are the key identifiers of actions. Log a unique API name for each action in your application. For example:
API=CREATE_USER
API=DELETE_USER
API=RESET_PASS
Transaction ID
The transaction ID is a unique identifier of the transaction. It need not be your database transaction ID. However, you need a unique identifier to be able to trace one full transaction.
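A simple way to get such an identifier is a random UUID kept for the duration of the transaction. The sketch below assumes one transaction runs on one thread at a time; TransactionContext is a hypothetical name. If a transaction hops between threads, you would need to pass the ID along explicitly.

```java
import java.util.UUID;

/** Sketch: holds a generated transaction ID for the current thread. */
class TransactionContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Starts a transaction on this thread and returns its new unique ID. */
    static String begin() {
        String id = UUID.randomUUID().toString();
        CURRENT.set(id);
        return id;
    }

    /** Returns the ID of the transaction on this thread, or null if none is active. */
    static String current() {
        return CURRENT.get();
    }

    /** Ends the transaction so a pooled thread does not reuse a stale ID. */
    static void end() {
        CURRENT.remove();
    }
}
```

Call begin() when a request enters the application, log TXID=... with every Splunk line written for that request, and call end() in a finally block.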
User ID – Unique Identifier
User identification is important for debugging many use cases. You may not want to log user emails or other sensitive information; however, you can always log a unique identifier that represents the user in your database.
Success / Failure of Transaction
Make sure you log the success or failure of each transaction in Splunk. This gives you an easy trend of failures in your system. A sample field would look like:
TXS=S (Success transaction)
TXS=F (Failed transaction)
Error Code
Log error codes whenever there is a failure. Error codes can uniquely identify the exact scenario, so spend time defining them in your application. The best way is to define an enum of error codes, like the one below:
public enum ErrorCodes {
    INVALID_EMAIL(1);

    private int id;

    ErrorCodes(int id) {
        this.id = id;
    }

    public int getId() {
        return id;
    }
}
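To show how the status flag and the error code might appear together in one log line, here is a small sketch. OutcomeFields and the key ERR are hypothetical, and the nested enum repeats the idea of ErrorCodes only so that the example compiles on its own.

```java
/** Sketch: builds the success/failure part of a Splunk log line. */
class OutcomeFields {

    /** Stand-in for the application's error code enum. */
    enum ErrorCode {
        INVALID_EMAIL(1);

        private final int id;

        ErrorCode(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }
    }

    /** Fragment for a successful transaction. */
    static String success() {
        return "TXS=S";
    }

    /** Fragment for a failed transaction, including the numeric error code. */
    static String failure(ErrorCode code) {
        return "TXS=F ERR=" + code.getId();
    }

    public static void main(String[] args) {
        System.out.println(failure(ErrorCode.INVALID_EMAIL)); // TXS=F ERR=1
    }
}
```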
Elapsed Time – Time Taken to Finish Transaction
Log the total time taken by a transaction. It will help you easily identify slow transactions.
Elapsed Time of Each Major Component in Transaction
If your transaction is made of multiple steps, also include the time taken for each step. This narrows the problem down to the component that is performing slowly.
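Both the total and the per-step timings can be captured with System.nanoTime(), as in the sketch below. StepTimer and the keys ET, ET_DB and ET_MAIL are hypothetical; times are reported in whole milliseconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Sketch: measures the elapsed time of a transaction and of each of its steps. */
class StepTimer {

    private final long startNanos = System.nanoTime();
    private long stepStartNanos = startNanos;
    // Step key -> elapsed milliseconds, in the order the steps finished
    private final Map<String, Long> stepMillis = new LinkedHashMap<>();

    /** Records the time since the previous step ended (or since the timer was created). */
    void endStep(String key) {
        long now = System.nanoTime();
        stepMillis.put(key, TimeUnit.NANOSECONDS.toMillis(now - stepStartNanos));
        stepStartNanos = now;
    }

    /** Returns the step timings followed by the total, e.g. ET_DB=12 ET_MAIL=3 ET=15. */
    String asKeyValue() {
        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, Long> step : stepMillis.entrySet()) {
            line.append(step.getKey()).append('=').append(step.getValue()).append(' ');
        }
        long totalMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        return line.append("ET=").append(totalMillis).toString();
    }
}
```

Create one StepTimer at the start of the transaction, call endStep("ET_DB") after the database call, endStep("ET_MAIL") after sending mail, and append asKeyValue() to the Splunk line when the transaction finishes.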