
Current Splunk instances have a configuration issue that needs to be mitigated before the end of 2019.

On the 21st of November, Adarma (as well as other companies, through various channels) was made aware of a Splunk issue dubbed the #Y2K20 Bug, which requires immediate remediation. Without a fix, Splunk will effectively stop ingesting data correctly as the world moves into 2020. Although there has been official communication on this matter through Splunk Docs, blogs, partner emails, and channels such as Slack and /r/splunk, we at Adarma would like to help and offer guidance where we can.

TL;DR: There's a significant configuration issue within almost all Splunk software regarding two-digit-year timestamps that requires urgent remediation. Logs ingested on or after 01/01/20 may not be ingested correctly. Update your software or deploy a fixed datetime.xml file immediately as a short-term measure.

Disclaimer: These are Adarma's conclusions based upon the most recent official information from Splunk and, where noted, our own observations.

We recommend reading in full the Splunk documentation on the issue:
https://docs.splunk.com/Documentation/Splunk/8.0.0/ReleaseNotes/FixDatetimexml2020

A Brief History of Time[Stamp Extraction]

To understand the issue, we need to understand a little about timestamp extraction itself. At a basic level, Splunk indexes machine data to make it searchable using a powerful syntax of commands. To make the data searchable, one of the things Splunk has to index is the event’s timestamp, which is likely to be present in your log.

Splunk is able to collect event data and perform timestamp extraction without user intervention in most cases. This is thanks to a file called datetime.xml, an XML file containing a number of regular expressions that define time and date elements. The majority of date-time data can be extracted in this manner, but it relies on the 'auto' part of the indexing pipeline helping the system guess the format.
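To make that more concrete, datetime.xml is a collection of named regex fragments ('define' elements) that are assembled into the date and time patterns Splunk tries against each event. Below is a heavily abridged sketch of the structure; the names are based on a shipped copy, but check $SPLUNK_HOME/etc/datetime.xml in your own environment for the real contents:

    <datetime>
      <!-- A named regex fragment extracting date components from 'mashed' dates like 191231 -->
      <define name="_masheddate" extract="year, month, day">
        <text><![CDATA[(20\d\d|19\d\d|[901]\d)(0\d|1[012])([012]\d|3[01])]]></text>
      </define>
      <!-- Fragments are combined into the patterns tried at index time -->
      <datePatterns>
        <use name="_masheddate"/>
      </datePatterns>
    </datetime>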

It is recommended that you define your own date-time configuration within your props.conf files, setting at least the 'Magic 6' configuration options so that Splunk carries out explicit timestamp extraction (more on this later). Alternatively, for more convoluted timestamps, you can supply your own datetime.xml file.

Timestamp extraction is also used to automatically line-break events, as the setting BREAK_ONLY_BEFORE_DATE is true by default.

The Problem

The version of datetime.xml shipped with Splunk up to and including 8.0.0 has two flaws that grew more apparent towards the end of this year (2019):
1. The regexes used in datetime.xml to determine two-digit years do not recognise any year component starting with '2' (e.g. 01/01/20 00:00:00).
2. The regexes used in datetime.xml to determine epoch timestamps do not recognise any value of 1600000000 (Sunday, 13 September 2020 12:26:40 UTC) or above.

This means that if Splunk automatically attempts to extract a timestamp with only a two-digit year, such as 01/01/20 00:00:00, it will not recognise it as a proper timestamp.
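You can see why from the year alternation used in the file (this is the pattern the Splunk fix amends):

    (20\d\d|19\d\d|[901]\d)
    # 20\d\d  - four-digit years 2000-2099
    # 19\d\d  - four-digit years 1900-1999
    # [901]\d - two-digit years 90-99 and 00-19, but nothing starting with '2'

A two-digit year of '19' is caught by the third branch, while '20' falls through every branch, so no timestamp is extracted.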

[Image: Add Data page ingesting some dummy data using Splunk 8.0.0]

In this example, timestamp extraction fails for the entries set in 2020, with Splunk resorting to the current date for these entries rather than a detected timestamp.

All of the 2020 events are clustered together, as Splunk has failed to detect a timestamp to line-break against. The next event, from the 31st of December 2019, was detected and parsed correctly. (For the purposes of this test, I had to set MAX_DAYS_HENCE and MAX_DAYS_AGO to tell Splunk to allow larger time differences between events than it normally expects; otherwise, all my timestamps would be parsed incorrectly.)
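For reference, those overrides look something like this in props.conf; the sourcetype name and values below are just what I needed for the test data (by default, Splunk only accepts timestamps up to 2000 days in the past and 2 days in the future):

    [y2k20_test]
    # Accept timestamps up to ~10 years in the past
    MAX_DAYS_AGO = 3650
    # Accept timestamps up to a year in the future
    MAX_DAYS_HENCE = 366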

Straight away you can see how this can be a major issue for your data ingestion. On top of this, bad timestamping can cause incorrect bucket rollover, incorrect retention, and incorrect search results, all due to the wrong timestamp being assigned to the event.

This issue also applies to events using Unix epoch timestamps of 1600000000 or above, as the regex in place only captures values up to 1599999999: (?:1[012345]|9)\d{8}
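Breaking that pattern down shows where the ceiling comes from:

    (?:1[012345]|9)\d{8}
    # 9\d{8}         - matches  900000000 to  999999999 (Jul 1998 - Sep 2001)
    # 1[012345]\d{8} - matches 1000000000 to 1599999999 (Sep 2001 - 13 Sep 2020)

An epoch value of 1600000000 would need a '16' prefix, which neither branch allows, so the timestamp goes unrecognised.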

What is affected?

Splunk Cloud, Splunk Light and Splunk Enterprise in all configurations are affected by this issue. Universal Forwarders are affected if:

  1. INDEX_EXTRACTIONS is used in props.conf to process structured data;
  2. force_local_processing has been set in props.conf; or
  3. a monitored input encounters an unknown file type.
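The first two conditions correspond to props.conf settings like the sketch below (the sourcetype name is hypothetical):

    [my_structured_sourcetype]
    # Structured-data parsing happens on the forwarder itself...
    INDEX_EXTRACTIONS = csv
    # ...and this forces full local parsing, so the UF does its own timestamp extraction
    force_local_processing = true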

The Solution

Currently, there are three possible solutions provided by Splunk.

  1. Upgrade to the latest maintenance release for your version of Splunk. At the time of writing, only 7.3.3 has been released, with further bugfix versions on the way.
  2. Replace the datetime.xml file across your infrastructure with the updated copy supplied by Splunk.
  3. Manually edit your existing datetime.xml files as per the instructions supplied by Splunk (see the sketch below).
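For option 3, the manual edit amounts to widening two character classes so the regexes accept the new ranges. Per our reading of the Splunk instructions (verify the exact stanzas and number of occurrences against the documentation linked above):

    In $SPLUNK_HOME/etc/datetime.xml:
    Two-digit years:  [901]\d          becomes  [9012]\d
    Epoch seconds:    (?:1[012345]|9)  becomes  (?:1[01234567]|9)

The second change extends epoch recognition up to 1799999999, pushing the problem out to early 2027.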

Another solution provided is a temporary app that you can deploy across your infrastructure.

Splunk Cloud instances are being fixed automatically by Support, but any forwarders that forward data into Splunk Cloud will need to be remediated as per above.

For my own instance, I replaced the datetime.xml file with the patched XML file supplied by Splunk. After restarting the instance, the example data from before is parsed properly.

[Image: Fixed datetime.xml file using Splunk 8.0.0]

If you are manually updating the file, either by replacing it entirely or by editing the specific lines, be aware that Splunk's file integrity checker will start throwing InstalledFileHashChecker warning messages, as the file no longer matches its manifest entry. If you patch through a version upgrade instead, the upgrade ships with an updated manifest file. The warnings themselves are benign, but you can silence them by updating the datetime.xml entry in your manifest file with the updated file's sha256 value.

One additional measure we recommend, on top of patching your datetime.xml file, is to specify your Magic 6 configuration settings per sourcetype. By default, Splunk resorts to datetime.xml for datetime extraction; however, if TIME_FORMAT is supplied in your props.conf file, Splunk uses it to determine how to process your timestamp, interpreting the value in strptime() format. This is highly recommended, as it requires less processing by your indexing tier than datetime.xml does: the indexer is given explicit instructions on what your timestamp looks like. The other settings (TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD, SHOULD_LINEMERGE, LINE_BREAKER, and TRUNCATE) should also be implemented for every sourcetype, as they tell Splunk explicitly where to expect a timestamp in your event.
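As an illustration, a fully explicit stanza for a sourcetype whose events begin with a timestamp like 31/12/19 23:59:59 might look like the following (the sourcetype name and format are examples, not prescriptions):

    [my_custom_sourcetype]
    # The timestamp sits at the very start of each event
    TIME_PREFIX = ^
    # Explicit strptime() pattern - the two-digit year is handled here, bypassing datetime.xml
    TIME_FORMAT = %d/%m/%y %H:%M:%S
    # Only scan the first 17 characters of the event for the timestamp
    MAX_TIMESTAMP_LOOKAHEAD = 17
    # Break events on newlines and never attempt to re-merge lines
    LINE_BREAKER = ([\r\n]+)
    SHOULD_LINEMERGE = false
    # Safety cap on event length, in bytes
    TRUNCATE = 10000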

Be aware that, due to how Splunk operates, any data ingested before the fix is put in place WILL NOT be corrected retrospectively, as it has already been indexed by that point. That data would have to be re-indexed to correct its timestamps.

References

As well as the below links, there is active discussion in the #splunky2k Splunk Community Slack Channel — you can sign up for the Slack Group at http://splk.it/slack

YouTube Video of the WebEx Event Adarma Hosted

Speaker: Harry McLaren

Adarma Slide Deck Detailing the Situation
