This is part six of the nine-post series on Processing a Log File with Elixir. If you find this article helpful, please subscribe and share 🚀

Looking at our list of things to do, the next step is to group by video_id.
- Fetch data from URL
- Split each new line into a list item
- Split each line into list items
- Filter items to only contain the URL and TCP_HIT/MISS
- Find the six-digit video id from the URL, it should be the first integer in HTTP paths of:
  - "example.com/04C0BF/v2/sources/content-owners/"
  - "example.com/04C0BF/ads/transcodes/"
- Group by Video ID
- Get Cache Hit and Misses for each Video
- Calculate the Cache Hit Misses
- Sort by video id
- Print to file
Our data is now looking something like this:
[
[video_id: 275211, tcp: "TCP_HIT/200"],
[video_id: 326260, tcp: "TCP_HIT/200"],
[video_id: 398629, tcp: "TCP_HIT/200"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 351421, tcp: "TCP_HIT/200"],
[video_id: 12410, tcp: "TCP_HIT/200"],
[video_id: 339342, tcp: "TCP_HIT/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 160842, tcp: "TCP_HIT/206"],
[video_id: 367665, tcp: "TCP_HIT/200"],
[video_id: 367706, tcp: "TCP_HIT/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 312985, tcp: "TCP_MISS/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 398629, tcp: "TCP_HIT/206"],
[video_id: 23261, tcp: "TCP_HIT/200"],
[video_id: 414098, tcp: "TCP_HIT/200"],
[video_id: 12410, tcp: "TCP_HIT/200"],
[video_id: 291986, tcp: "TCP_HIT/200"],
[video_id: 360634, tcp: "TCP_HIT/200"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, tcp: "TCP_HIT/206"],
[video_id: 186001, ...],
[...],
...
]
Looking at the Enum.group_by/3 documentation, we see that it returns a map in which each key is a value returned by the key function and each value is the list of items that produced that key. We can write our test to get started:
defmodule AccessLogAppTest do
  # ExUnit.Case provides the test/2 macro and assert/1
  use ExUnit.Case
  # group_by_id/1 is defined in AccessLogApp.CLI
  import AccessLogApp.CLI

  test "Groups list by video_id" do
    list = [
      [video_id: 1, tcp: "TCP_HIT/200"],
      [video_id: 1, tcp: "TCP_HIT/200"],
      [video_id: 1, tcp: "TCP_HIT/206"],
      [video_id: 1, tcp: "TCP_HIT/304"],
      [video_id: 2, tcp: "TCP_HIT/200"],
      [video_id: 2, tcp: "TCP_HIT/200"],
      [video_id: 2, tcp: "TCP_HIT/206"],
      [video_id: 2, tcp: "TCP_HIT/304"],
      [video_id: 3, tcp: "TCP_HIT/200"],
      [video_id: 3, tcp: "TCP_HIT/200"],
      [video_id: 3, tcp: "TCP_HIT/206"],
      [video_id: 3, tcp: "TCP_HIT/304"],
      [video_id: 4, tcp: "TCP_HIT/200"],
      [video_id: 4, tcp: "TCP_HIT/200"],
      [video_id: 4, tcp: "TCP_HIT/206"],
      [video_id: 4, tcp: "TCP_HIT/304"]
    ]

    result = group_by_id(list)

    assert result == %{
             [video_id: 1] => [
               [video_id: 1, tcp: "TCP_HIT/200"],
               [video_id: 1, tcp: "TCP_HIT/200"],
               [video_id: 1, tcp: "TCP_HIT/206"],
               [video_id: 1, tcp: "TCP_HIT/304"]
             ],
             [video_id: 2] => [
               [video_id: 2, tcp: "TCP_HIT/200"],
               [video_id: 2, tcp: "TCP_HIT/200"],
               [video_id: 2, tcp: "TCP_HIT/206"],
               [video_id: 2, tcp: "TCP_HIT/304"]
             ],
             [video_id: 3] => [
               [video_id: 3, tcp: "TCP_HIT/200"],
               [video_id: 3, tcp: "TCP_HIT/200"],
               [video_id: 3, tcp: "TCP_HIT/206"],
               [video_id: 3, tcp: "TCP_HIT/304"]
             ],
             [video_id: 4] => [
               [video_id: 4, tcp: "TCP_HIT/200"],
               [video_id: 4, tcp: "TCP_HIT/200"],
               [video_id: 4, tcp: "TCP_HIT/206"],
               [video_id: 4, tcp: "TCP_HIT/304"]
             ]
           }
  end
end
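To make the description of Enum.group_by/3 more concrete, here is a minimal sketch on unrelated data. The word list and variable names are made up for illustration; only Enum.group_by, String.length/1 and String.first/1 come from the standard library.

```elixir
# Hypothetical sample data, not taken from the access log.
words = ["ant", "buffalo", "cat", "dingo"]

# The key function decides which map key each item is filed under.
by_length = Enum.group_by(words, &String.length/1)
IO.inspect(by_length)

# The optional third argument (value_fun) transforms each item
# before it is stored in its group.
initials = Enum.group_by(words, &String.length/1, &String.first/1)
IO.inspect(initials)
```

Items keep their original order inside each group, which is why the test above can compare against plain lists.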
Our function looks like this:
defmodule AccessLogApp.CLI do
  ...
  def group_by_id(list) do
    Enum.group_by(list, fn [video_id, _] ->
      [video_id]
    end)
  end
  ...
end
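As a quick sanity check, the function can be tried on a short list. This is only a sketch: GroupSketch is a hypothetical stand-in module so the snippet runs on its own, and the three sample entries are invented.

```elixir
defmodule GroupSketch do
  # Same body as group_by_id/1 above, copied into a hypothetical module.
  def group_by_id(list) do
    Enum.group_by(list, fn [video_id, _] -> [video_id] end)
  end
end

# Invented sample entries in the same shape as our parsed log data.
list = [
  [video_id: 1, tcp: "TCP_HIT/200"],
  [video_id: 2, tcp: "TCP_MISS/200"],
  [video_id: 1, tcp: "TCP_HIT/206"]
]

grouped = GroupSketch.group_by_id(list)

# Each key is a one-element keyword list such as [video_id: 1].
IO.inspect(grouped)
```

Note that the pattern [video_id, _] only matches entries with exactly two elements, so this sketch assumes every entry has just the video_id and tcp pairs.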
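Our function simply enumerates over the list and groups the items by the video_id returned from the key_fun. Because the pattern [video_id, _] binds video_id to the whole {:video_id, id} tuple, each key ends up as a one-element keyword list such as [video_id: 1]. Easy peas!

That is it for today! Tomorrow we will be wrapping this up with step 7, "Calculating by Percentage". If you like, please share and subscribe!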