Introduction
I recently interviewed for a Senior Platform Engineer position at a company that uses Elixir. The coding assessment was to write a script that does the following:
- Parse the `access.log` file hosted as a gist
- Get the `TCP_HIT` percentage per Video ID
  - Extract the Video ID based on these formats:
    - `http://example.com/04C0BF/v2/sources/content-owners/.../<VideoID>/...`
    - `http://example.com/04C0BF/ads/transcodes/<VideoID>/...`
  - Calculate the percentage of `TCP_HIT` responses for each Video ID.
- Sort by Video ID
  - Ensure the Video IDs are treated as integers for sorting.
- Print results to the console or write to a file
- Add tests if there is still time
Here are the instructions, verbatim:
Write a script that does the following:
1. Parse access.log file hosted as gist
2. Get tcp_hit percentage per video id
3. Sort by video id (video id is an integer)
4. print to console or write to file
5. add tests if there is still time
there are two different url formats to handle:
http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/384055/v201708302148-2273k.mp4+4023936.ts
http://example.com/04C0BF/ads/transcodes/006817/2791522/v0402000243-854x480-HD-1401k.mp4+22355.ts
example line:
1523756544 3 86.45.165.83 1845784 152.195.141.240 80 TCP_HIT/200 1846031 GET http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts - 0 486 "-" "ItubExoPlayer/2.12.9 (Linux;Android 6.0) ExoPlayerLib/2.4.2" 49343 "-"
cache hit/miss: TCP_HIT
video id: 275211
URL: https://gist.github.com/space-rocket/4980b59d205a33158b27bc10c6a13ed5
Steps to process a log file
From this information, we can break the task down into these steps:
- Fetch the data from the URL
- Split the text on newlines so each line becomes a list item
- Split each line on spaces so each field becomes a list item
- Filter the fields down to the URL and the `TCP_HIT`/`TCP_MISS` result
- Find the six-digit Video ID in the URL
  - It is the first integer in the path after one of these prefixes:
    - "example.com/04C0BF/v2/sources/content-owners/"
    - "example.com/04C0BF/ads/transcodes/"
- Group by Video ID
- Count the cache hits and misses for each video
- Calculate the cache hit percentage
- Sort by Video ID
- Print to a file
- Get a job (profit)
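The sketch below covers the two splitting steps and the filtering step from the list above. It is a minimal sketch under stated assumptions, not the author's implementation. The module name `LogLines` is hypothetical. The field positions are read off the example line: index 6 holds `TCP_HIT/200` and index 9 holds the URL.

```elixir
defmodule LogLines do
  # Hypothetical helper: turns the raw gist body into {url, cache_result} pairs.
  # Assumes field 6 is e.g. "TCP_HIT/200" and field 9 is the URL, as in the example line.
  def parse(body) when is_binary(body) do
    body
    # One list item per line; trim: true drops empty strings such as the one after the final "\n"
    |> String.split("\n", trim: true)
    # Skip the "#Fields: ..." header line
    |> Enum.reject(&String.starts_with?(&1, "#"))
    # One list item per space-separated field
    |> Enum.map(&String.split(&1, " "))
    |> Enum.flat_map(&pick_fields/1)
  end

  # Keep only the URL and the cache result ("TCP_HIT/200" -> "TCP_HIT")
  defp pick_fields([_, _, _, _, _, _, status, _, _, url | _]) do
    [cache_result | _] = String.split(status, "/")
    [{url, cache_result}]
  end

  # Lines with fewer than ten fields are dropped
  defp pick_fields(_), do: []
end
```

Only the first ten fields are used, so it does not matter that the quoted user agent gets split across several list items.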
We start by fetching the data from a GitHub Gist, splitting the text on newlines so each line becomes a list item, and then splitting each line on spaces. Each line ends up as something like this:
```elixir
["1523756544", "3", "86.45.165.83", "1845784", "152.195.141.240", "80",
 "TCP_HIT/200", "1846031", "GET",
 "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts",
 "-", "0", "486", "\"-\"", "\"ItubExoPlayer/2.12.9", "(Linux;Android",
 "6.0)", "ExoPlayerLib/2.4.2\"", "49343", "\"-\"", ""]
```
Paying attention to the details is important here. The requirements state that the URLs come in two formats:
http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/......
and
http://example.com/04C0BF/ads/transcodes/006817/....
This is one of the most important details. There are 5000+ lines, and some of them contain URLs like this:
http://example.com/80C0BF/subtitles/422e3734-382b-4bb3-a753-e3f003d9cdd6.m3u8
We don’t want those URLs. It’s also important to note that the Video IDs must be treated as integers for the sort by ID to work. We then group by Video ID, count the cache hits and misses, and calculate the hit percentage: the number of `TCP_HIT` requests divided by the total number of requests for that video, times 100. Finally, we sort by Video ID and choose how to present the data.
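The filtering, grouping, percentage, and integer-sort steps can be sketched together as shown below. This is a minimal sketch, not the post's code, and it rests on several assumptions:

- `VideoStats` and its functions are hypothetical names.
- The input is a list of `{url, cache_result}` pairs.
- Only the exact string `"TCP_HIT"` counts as a hit.
- The two regular expressions encode the "first integer after the prefix" rule for the two formats from the instructions.
- `String.to_integer/1` turns `"006817"` into `6817`. That sorts numerically but drops the leading zeros.

```elixir
defmodule VideoStats do
  # Hypothetical module: turns {url, cache_result} pairs into
  # [{video_id, tcp_hit_percentage}] sorted by integer video ID.
  def hit_percentages(pairs) do
    pairs
    # Attach an integer video ID; URLs in neither format (e.g. subtitles) get nil
    |> Enum.map(fn {url, result} -> {video_id(url), result} end)
    |> Enum.reject(fn {id, _} -> is_nil(id) end)
    |> Enum.group_by(fn {id, _} -> id end, fn {_, result} -> result end)
    |> Enum.map(fn {id, results} ->
      hits = Enum.count(results, &(&1 == "TCP_HIT"))
      # TCP_HIT requests divided by all requests for this video, as a percentage
      {id, hits / length(results) * 100}
    end)
    # Integer IDs sort numerically (9 before 10), unlike strings
    |> Enum.sort_by(fn {id, _} -> id end)
  end

  # First all-digit path segment after either prefix, as an integer; nil otherwise
  def video_id(url) when is_binary(url) do
    Enum.find_value(patterns(), fn pattern ->
      case Regex.run(pattern, url, capture: :all_but_first) do
        [id] -> String.to_integer(id)
        nil -> nil
      end
    end)
  end

  defp patterns do
    [
      # content-owners/<owner>/.../<VideoID>/ : lazily skip segments until an all-digit one
      ~r{^https?://example\.com/04C0BF/v2/sources/content-owners/(?:[^/]+/)*?(\d+)/},
      # transcodes/<VideoID>/ : the segment right after the prefix
      ~r{^https?://example\.com/04C0BF/ads/transcodes/(\d+)/}
    ]
  end
end
```

For example, two requests for video 10, one of them a `TCP_HIT`, produce `{10, 50.0}`. Rounding for display can be left to the printing step.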
Getting started
First, let's create our app:
```shell
mix new access_log_app
```
Install dependencies
Next, we need the HTTPoison library. Add it to the deps in mix.exs:
```elixir
defmodule AccessLogApp.MixProject do
  use Mix.Project
  ...

  # Run "mix help deps" to learn about dependencies.
  defp deps do
    [
      # {:dep_from_hexpm, "~> 0.3.0"},
      # {:dep_from_git, git: "https://github.com/elixir-lang/my_dep.git", tag: "0.1.0"}
      {:httpoison, "~> 1.8"}
    ]
  end
  ...
end
```
Then fetch the dependency:
```shell
mix deps.get
```
Next, create a directory for the access_log_app namespace:
```shell
mkdir lib/access_log_app
```
Now create a file named cli.ex in the lib/access_log_app directory we just created.
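```elixir
defmodule AccessLogApp.CLI do
  def fetch do
    HTTPoison.get("https://gist.githubusercontent.com/clanchun/2b5e07cda53718ccbf64f62fb31900c8/raw/64be7f018973717dd5faa7be2bfb817f50ed05bb/access.log")
    |> handle_response()
  end

  # The request completed: tag the body with :ok or :error based on the status code
  def handle_response({:ok, %{status_code: status_code, body: body}}) do
    {check_for_error(status_code), body}
  end

  # Transport-level failure (DNS, timeout, ...): HTTPoison returns an Error struct
  def handle_response({:error, %HTTPoison.Error{reason: reason}}) do
    {:error, reason}
  end

  def check_for_error(200), do: :ok
  def check_for_error(_), do: :error
end
```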
Start the project in IEx with iex -S mix and call fetch. We get back a heap of data:
```shell
iex -S mix
```
```
iex(1)> recompile && AccessLogApp.CLI.fetch
Compiling 1 file (.ex)
{:ok,
 "#Fields: timestamp time-taken c-ip filesize s-ip s-port sc-status sc-bytes cs-method cs-uri-stem - rs-duration rs-bytes c-referrer c-user-agent customer-id x-ec_custom-1\n1523756544 3 86.45.165.83 1845784 152.195.141.240 80 TCP_HIT/200 1846031 GET http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts - 0 486 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 6.0) ExoPlayerLib/2.4.2\" 49343 \"-\" \n1523756611 58 86.165.81.111 3364824 152.195.141.240 80 TCP_HIT/200 3365071 GET http://example.com/04C0BF/v2.... <> ...}
```
## Creating Environment Variables with Elixir
Update: It turns out the following is an anti-pattern, so instead you should pass the URL in as a function parameter. The URL is long, and we might want to change it later for other projects, so let's put it in an environment variable. Create a file named config.exs inside a directory named config. In that file, import the Config module and set the github_url variable for our access_log_app:
```elixir
import Config

config :access_log_app,
  github_url: "https://gist.githubusercontent.com/clanchun/2b5e07cda53718ccbf64f62fb31900c8/raw/64be7f018973717dd5faa7be2bfb817f50ed05bb/access.log"
```
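We can then use it in our cli.ex file:
```elixir
defmodule AccessLogApp.CLI do
  # Module attributes are evaluated at compile time, so changing config.exs
  # requires a recompile. Application.compile_env/2 (Elixir 1.10+) makes that
  # explicit and is preferred over Application.get_env/2 in a module body.
  @github_url Application.compile_env(:access_log_app, :github_url)

  def fetch do
    HTTPoison.get(@github_url)
    |> handle_response()
  end

  # The request completed: tag the body with :ok or :error based on the status code
  def handle_response({:ok, %{status_code: status_code, body: body}}) do
    {check_for_error(status_code), body}
  end

  # Transport-level failure (DNS, timeout, ...)
  def handle_response({:error, %HTTPoison.Error{reason: reason}}) do
    {:error, reason}
  end

  def check_for_error(200), do: :ok
  def check_for_error(_), do: :error
end
```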
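Following the update at the top of this section, the URL can instead be passed in as a function parameter. This is a minimal sketch, not the author's final code. The module name `FetchSketch` and the `{:http_status, code}` error tuple are hypothetical. It assumes the `:httpoison` dependency from mix.exs, whose `HTTPoison.get/1` returns `{:ok, response}` or `{:error, %HTTPoison.Error{}}`.

```elixir
defmodule FetchSketch do
  # Hypothetical module: the caller supplies the URL, so nothing is read
  # from config at compile time. Requires the :httpoison dependency at runtime.
  def fetch(url) when is_binary(url) do
    url
    |> HTTPoison.get()
    |> handle_response()
  end

  # 200 response: return just the body
  def handle_response({:ok, %{status_code: 200, body: body}}), do: {:ok, body}

  # The request completed with any other status code
  def handle_response({:ok, %{status_code: status_code}}),
    do: {:error, {:http_status, status_code}}

  # Transport-level failure; matches HTTPoison.Error's :reason field
  def handle_response({:error, %{reason: reason}}), do: {:error, reason}
end
```

In `iex -S mix`, this would be called as `FetchSketch.fetch(url)` with the gist's raw URL. Because the caller supplies the URL, the same code can point at another log without touching config.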
Next steps
In this post we covered the log-processing requirements, outlined the steps, created our Elixir project, and fetched our data from an external URL. We also stored the URL in an environment variable so our code is easier to reuse. In the next post we will take the newline-separated text and split each line into a list item. Subscribe to receive updates!