Introduction
This is part five of the nine-post series on Processing a Log File with Elixir. If you find this article helpful, please subscribe and share 🚀 In the last post, Filtering a List Using Regex Match and Elixir, we trimmed our list to only contain the paths "example.com/04C0BF/v2/sources/content-owners/" and "example.com/04C0BF/ads/transcodes/". Looking at our remaining steps, we see that we still need to trim the URL down to just the 4, 5, or 6 digit video_id.
- Fetch data from URL
- Split each new line into a list item
- Split each line into list items
- Filter items to only contain the URL and TCP_HIT/MISS
- Find the six-digit Video ID from the URL, it should be the first integer in HTTP paths of: "example.com/04C0BF/v2/sources/content-owners/" and "example.com/04C0BF/ads/transcodes/"
- Group by Video ID
- Get Cache Hit and Misses for each Video
- Calculate the Cache Hit Misses
- Sort by Video ID
- Print to file
Our data is still looking something like this:
[
....
[
http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts",
tcp: "TCP_HIT/200"
],
[
http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/326260/v20169101326-1256x544-3063k.mp4+3713710.ts",
tcp: "TCP_HIT/200"
],
[
http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
tcp: "TCP_HIT/200"
],
[
http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
tcp: "TCP_HIT/206"
],
...
]
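Each entry in this list is a keyword list, that is, a list of two-element tuples with atom keys. As a minimal sketch of what that means in practice, the snippet below takes one sample entry copied from the data above and reads it both by pattern matching and by key. The variable names are only illustrative.

```elixir
entry = [
  http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts",
  tcp: "TCP_HIT/200"
]

# A keyword list is a plain list of {atom, value} tuples, so it can be
# pattern matched like any other list.
[{:http, url}, {:tcp, status}] = entry

# Keyword.get/2 looks a value up by key.
IO.puts(status == Keyword.get(entry, :tcp))
IO.puts(url == Keyword.get(entry, :http))
```

Because every item is a tuple, a function can replace the http tuple and hand every other tuple back unchanged.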
Write a Test
Let's start by writing a simple test:
defmodule AccessLogAppTest do
  use ExUnit.Case

  test "Gets first integer in URL path" do
    data = [
      [
        tcp_hit: "TCP_HIT/200",
        http: "http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts"
      ],
      [
        tcp_hit: "TCP_HIT/206",
        http: "http://example.com/ABCD/e/f/789012/someotherfile.mp4.ts"
      ]
    ]

    # The function under test lives in AccessLogApp.CLI (see the solution below).
    result = AccessLogApp.CLI.get_id_from_url_path(data)

    assert result == [
             [
               tcp_hit: "TCP_HIT/200",
               video_id: 123456
             ],
             [
               tcp_hit: "TCP_HIT/206",
               video_id: 789012
             ]
           ]
  end
end
From the above test, we see that we need a function that takes a URL and finds the video_id, which is the first integer in the URL path. To do this, we can split the URL string by "/" and use a regex to keep only the first chunk that begins with a 4, 5, or 6 digit integer.
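To make the idea concrete before looking at the full solution, here is a minimal sketch for a single URL. The module VideoIdSketch and the function first_id/1 are hypothetical names used only for illustration; they are not part of AccessLogApp. The sketch also assumes that returning nil is acceptable when no chunk qualifies.

```elixir
defmodule VideoIdSketch do
  # Hypothetical helper: returns the first path chunk that starts with
  # 4 to 6 digits as an integer, or nil when no chunk qualifies.
  def first_id(url) do
    url
    |> String.split("/")
    |> Enum.find(&Regex.match?(~r/^\d{4,6}\b/, &1))
    |> case do
      nil ->
        nil

      chunk ->
        # Integer.parse/1 returns {integer, remainder} for a leading integer.
        {id, _rest} = Integer.parse(chunk)
        id
    end
  end
end

url =
  "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts"

IO.inspect(VideoIdSketch.first_id(url))
# prints 275211

IO.inspect(VideoIdSketch.first_id("http://example.com/about"))
# prints nil
```

The chunk "04C0BF" is skipped because it does not start with at least four digits, and the file name is skipped because it starts with a letter.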
Solution
defmodule AccessLogApp.CLI do
  ...
  # Replaces the {:http, url} pair of every entry with {:video_id, integer}
  # and leaves all other key-value pairs untouched.
  def get_id_from_url_path(list) do
    Enum.map(list, fn entry ->
      Enum.map(entry, fn items ->
        case items do
          {:http, url} ->
            # Keep only the chunks that start with 4, 5, or 6 digits and take
            # the first one. This raises a MatchError if no chunk qualifies.
            [video_id | _] =
              url
              |> String.split("/")
              |> Enum.map(fn keep_if_int ->
                case Regex.match?(~r(\b^\d{6}\b|\b^\d{5}\b|\b^\d{4}\b), keep_if_int) do
                  true -> keep_if_int
                  _ -> ""
                end
              end)
              |> Enum.filter(&(!is_blank(&1)))

            {:video_id, elem(Integer.parse(video_id), 0)}

          {k, v} ->
            {k, v}

          _ ->
            ""
        end
      end)
    end)
  end
  ...
  # Returns true for nil, empty maps, empty lists, and whitespace-only strings.
  def is_blank(nil), do: true
  def is_blank(val) when val == %{}, do: true
  def is_blank(val) when val == [], do: true
  def is_blank(val) when is_binary(val), do: String.trim(val) == ""
  def is_blank(_val), do: false
  ...
end
iex -S mix
iex(1)> AccessLogApp.CLI.fetch
Compiling 1 file (.ex)
{:ok,
[
[video_id: 275211, tcp: "TCP_HIT/200"],
[video_id: 326260, tcp: "TCP_HIT/200"],
[video_id: 398629, tcp: "TCP_HIT/200"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 351421, tcp: "TCP_HIT/200"],
[video_id: 12410, tcp: "TCP_HIT/200"],
[video_id: 339342, tcp: "TCP_HIT/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 367665, tcp: "TCP_HIT/200"],
[video_id: 367706, tcp: "TCP_HIT/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 312985, tcp: "TCP_MISS/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 23261, tcp: "TCP_HIT/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 12410, tcp: "TCP_HIT/200"],
[video_id: 291986, tcp: "TCP_HIT/200"],
[video_id: 360634, tcp: "TCP_HIT/200"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, ...],
[...],
...
]}
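As a point of comparison, the same transformation can also be sketched without splitting the URL, by capturing the digits directly with Regex.run/2. This is an alternative technique, not the implementation used in this series, and it is slightly stricter: it only accepts a path segment made up entirely of 4 to 6 digits. The module name RegexRunSketch and the choice to emit video_id: nil when nothing matches are assumptions made for illustration.

```elixir
defmodule RegexRunSketch do
  # Hypothetical alternative: capture the first all-digit segment (4 to 6
  # digits) that sits between two slashes, instead of splitting on "/".
  def replace_url(entry) do
    Enum.map(entry, fn
      {:http, url} ->
        case Regex.run(~r"/(\d{4,6})(?=/)", url) do
          [_full, id] -> {:video_id, String.to_integer(id)}
          nil -> {:video_id, nil}
        end

      # Every other key-value pair passes through unchanged.
      other ->
        other
    end)
  end
end

entry = [
  http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
  tcp: "TCP_HIT/206"
]

IO.inspect(RegexRunSketch.replace_url(entry))
# prints [video_id: 398629, tcp: "TCP_HIT/206"]
```

Returning nil instead of raising makes missing IDs visible in the output, at the cost of having to handle nil in later steps.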
Conclusion
In this post, we finally distilled a URL value to an integer value, all while keeping the remaining key-value pairs in the list intact. In the next post, we group the TCP_HIT/MISS values by their video_id. If you like this post, please share and subscribe!