Splunk is a high-performance, scalable software server written in C/C++ and Python. It indexes and searches logs and other IT data in real time. Splunk works with data generated by any application, server, or device. The Splunk Developer API is accessible via REST, SOAP, or the command line. After downloading, installing, and starting Splunk, you’ll find two Splunk server processes running on your host: splunkd and splunkweb.
splunkd is a distributed C/C++ server that accesses, processes and indexes streaming IT data and also handles search requests. splunkd processes and indexes your data by streaming it through a series of pipelines, each made up of a series of processors. Pipelines are single threads inside the splunkd process, each configured with a single snippet of XML. Processors are individual, reusable C/C++ or Python functions that act on the stream of IT data passing through a pipeline. Pipelines can pass data to one another via queues. splunkd supports a command line interface for searching and viewing results.
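The real pipelines are C/C++ threads inside splunkd, but the idea of a stream flowing through a series of reusable processors can be sketched with Python generators. Everything below (processor names, the sample log line) is illustrative, not Splunk code:

```python
# Toy model of a splunkd pipeline: each "processor" consumes a stream and
# yields a transformed stream, and chaining them mimics how pipelines pass
# data along via queues. All names here are made up for illustration.

def utf8_decoder(stream):
    """Processor: decode raw byte chunks into text."""
    for chunk in stream:
        yield chunk.decode("utf-8", errors="replace")

def line_breaker(stream):
    """Processor: split the text stream into individual lines."""
    for chunk in stream:
        for line in chunk.splitlines():
            yield line

def annotator(stream, source):
    """Processor: attach metadata to each event."""
    for line in stream:
        yield {"source": source, "_raw": line}

def pipeline(raw_chunks, source):
    """Chain the processors into one pipeline."""
    return annotator(line_breaker(utf8_decoder(raw_chunks)), source)

events = list(pipeline([b"error: disk full\nok: recovered\n"], "/var/log/app.log"))
```

Each processor only knows about the stream it receives, which is what makes the pieces reusable across pipelines.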
splunkweb is a Python-based application server providing the Splunk Web user interface. It allows users to search and navigate IT data stored by Splunk servers and to manage your Splunk deployment through the browser interface. splunkweb communicates with your web browser via REST and communicates with splunkd via SOAP.
- You can receive data from various network ports, or run scripts that automate data forwarding
- You can monitor incoming files and detect changes in real time
- The forwarder can intelligently route the data, clone the data, and load-balance it before it reaches the indexer. Cloning creates multiple copies of an event right at the data source, whereas load balancing ensures that even if one instance fails, the data can be forwarded to another instance hosting the indexer
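Cloning and load balancing in the forwarder can be pictured with a small toy model. This is not forwarder code: the output groups and indexer addresses below are made-up placeholders, and real forwarders configure this behavior in outputs.conf rather than in code.

```python
import itertools

# Toy model: cloning sends a copy of every event to each output group,
# while load balancing rotates events across the indexers *within* a
# group, so a healthy peer can take over if one indexer fails.
GROUPS = {
    "primary": ["idx-p1:9997", "idx-p2:9997"],  # hypothetical indexers
    "backup":  ["idx-b1:9997"],
}

def route(events):
    cursors = {g: itertools.cycle(members) for g, members in GROUPS.items()}
    routed = []
    for ev in events:
        for group in GROUPS:                           # cloning: one copy per group
            routed.append((next(cursors[group]), ev))  # LB: rotate within group
    return routed

routed = route(["e1", "e2", "e3"])
```

With three events, each one lands in both groups (cloning), and within the "primary" group the copies alternate between the two indexers (load balancing).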
- As I mentioned earlier, the deployment server is used to manage the entire deployment, its configurations, and policies
- When this data is received, it is stored in an indexer. The index is then broken down into different logical data stores, and at each data store you can set permissions that control what each user views, accesses, and uses
- Once the data is in, you can search the indexed data and also distribute searches to other search peers; the results will be merged and sent back to the search head
- Apart from that, you can also run scheduled searches and create alerts, which are triggered when the conditions of a saved search are met
- You can use saved searches to create reports and perform analysis using visualization dashboards
- Finally you can use Knowledge objects to enrich the existing unstructured data
- Search heads and Knowledge objects can be accessed from a Splunk CLI or a Splunk Web Interface. This communication happens over a REST API connection
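The splunkd REST API mentioned above is what both the CLI and Splunk Web talk to. The `/services/search/jobs` endpoint and the default management port 8089 are standard, but the host, session key, and query below are placeholders, and this sketch only constructs the request rather than sending it:

```python
import urllib.request
import urllib.parse

# Hypothetical splunkd management endpoint (default port is 8089).
BASE = "https://localhost:8089"

def build_search_request(query, session_key):
    """Build (but do not send) a POST to splunkd's search jobs endpoint."""
    data = urllib.parse.urlencode({"search": f"search {query}"}).encode()
    return urllib.request.Request(
        BASE + "/services/search/jobs",
        data=data,
        headers={"Authorization": f"Splunk {session_key}"},  # key from /services/auth/login
    )

req = build_search_request("error | head 10", "<session-key>")
```

Sending the request would return a search job ID, which you would then poll for results; the official Splunk SDKs wrap this same REST surface.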
Different Stages In Data Pipeline
There are primarily 3 different stages in Splunk:
Data Input stage
Data Storage stage
Data Searching stage
Data Input Stage
In this stage, Splunk software consumes the raw data stream from its source, breaks it into 64K blocks, and annotates each block with metadata keys. The metadata keys include hostname, source, and source type of the data. The keys can also include values that are used internally, such as character encoding of the data stream and values that control the processing of data during the indexing stage, such as the index into which the events should be stored.
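The input-stage behavior described above can be illustrated with a short sketch. This is a toy model, not Splunk internals: the block size is shrunk to 16 bytes so the example stays small (Splunk uses 64K), and the host, source, and sourcetype values are made up.

```python
# Chop a raw byte stream into fixed-size blocks and annotate each block
# with metadata keys, mirroring the input stage described above.
BLOCK_SIZE = 16  # Splunk uses 64 * 1024; shrunk here for illustration

def annotate_blocks(raw, host, source, sourcetype):
    return [
        {
            "host": host,
            "source": source,
            "sourcetype": sourcetype,
            "data": raw[i:i + BLOCK_SIZE],
        }
        for i in range(0, len(raw), BLOCK_SIZE)
    ]

blocks = annotate_blocks(b"a" * 40, "web01", "/var/log/app.log", "app_log")
```

Forty bytes become two full 16-byte blocks plus one 8-byte remainder, each carrying the same source-wide metadata keys.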
Data Storage Stage
Data storage consists of two phases: Parsing and Indexing.
- In the Parsing phase, Splunk software examines, analyzes, and transforms the data to extract only the relevant information. This is also known as event processing. It is during this phase that Splunk software breaks the data stream into individual events. The parsing phase has many sub-phases:
- Breaking the stream of data into individual lines
- Identifying, parsing, and setting timestamps
- Annotating individual events with metadata copied from the source-wide keys
- Transforming event data and metadata according to regex transform rules
- In the Indexing phase, Splunk software writes parsed events to the index on disk, writing both the compressed raw data and the corresponding index files. The benefit of indexing is that the data can be accessed quickly during searching.
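The parsing sub-phases above can be sketched in a few lines. The timestamp pattern and the masking rule are illustrative stand-ins, not Splunk's actual defaults (which are driven by props.conf and transforms.conf):

```python
import re
from datetime import datetime

# Illustrative timestamp pattern; real deployments configure this per sourcetype.
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def parse(stream, meta):
    events = []
    for line in stream.splitlines():                       # 1. line breaking
        m = TS.search(line)                                # 2. timestamp extraction
        ts = datetime.strptime(m.group(), "%Y-%m-%d %H:%M:%S") if m else None
        raw = re.sub(r"\d{16}", "<masked>", line)          # 4. regex transform (e.g. mask card numbers)
        events.append({"_time": ts, "_raw": raw, **meta})  # 3. copy source-wide metadata keys
    return events

events = parse(
    "2024-05-01 12:00:00 login ok\n2024-05-01 12:00:05 card=4111111111111111",
    {"host": "web01", "sourcetype": "app_log"},
)
```

The two input lines become two events, each stamped with a parsed time, the shared metadata keys, and the masked raw text.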
Data Searching Stage
This stage controls how the user accesses, views, and uses the indexed data. As part of the search function, Splunk software stores user-created knowledge objects, such as reports, event types, dashboards, alerts and field extractions. The search function also manages the search process.
If you look at the image below, you will understand which data pipeline stage each Splunk component falls under.
There are 3 main components in Splunk:
- Splunk Forwarder, used for data forwarding
- Splunk Indexer, used for Parsing and Indexing the data
- Splunk Search Head, a GUI used for searching, analyzing, and reporting