Getting Items from a List with Elixir

This is part three of a nine-post series on processing a log file with Elixir. If you find this article helpful, please subscribe and share 🚀 In the last post, we split each line of the log into a list of its space-separated fields, giving us a list of lists. Looking at the steps defined in the first post, the next step is to filter each row so that it contains only the URL and the TCP_HIT/MISS cache status.

  1. Fetch data from URL
  2. Split each new line into a list item
  3. Split each line into list items
  4. Filter items to only contain the URL and TCP_HIT/MISS
  5. Find the six-digit Video ID in the URL; it should be the first integer in HTTP paths of:
    • "example.com/04C0BF/v2/sources/content-owners/"
    • "example.com/04C0BF/ads/transcodes/"
  6. Group by Video ID
  7. Get Cache Hit and Misses for each Video
  8. Calculate the Cache Hit Misses
  9. Sort by Video ID
  10. Print to file

Our data now looks something like this:

terminal

{:ok,
 [
   ...
   ["1523756639", "3", "88.110.35.157", "2227424", "152.195.141.240", "80",
    "TCP_HIT/200", "2227671", "GET",
    "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    "-", "0", "604", "\"-\"", "\"Mozilla/5.0", "(Linux;", "Android", "5.1.1;",
    "AFTM", "Build/LVY48F;", "wv)", "AppleWebKit/537.36", "(KHTML,", "like",
    "Gecko)", "Version/4.0", "Chrome/55.0.2883.91", "Mobile", "Safari/537.36\"",
    "49343", "\"-\"", ""],
   ["1523756653", "0", "81.132.50.208", "2227424", "152.195.141.240", "80",
    "TCP_HIT/206", "262442", "GET",
    "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    "-", "0", "519", "\"-\"", "\"Roku/DVP-8.0", "(068.00E04155A)\"", "49343",
    "\"-\"", ""],
   ...
 ]}
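As a quick recap of steps 2 and 3, the sketch below turns a raw log body into rows shaped like the ones above. The two-line body is shortened and hypothetical (real lines carry many more fields), and splitting on a single space with String.split/2 is an assumption about how the previous post did it. A line that ends in a space produces a trailing "" field, which may explain the trailing "" at the end of each row above.

```elixir
# Shortened, hypothetical log body; each real line has many more fields.
body =
  "1523756639 3 TCP_HIT/200 GET http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts \n" <>
    "1523756653 0 TCP_MISS/200 GET http://example.com/ABCD/e/f/789012/someotherfile.mp4.ts \n"

rows =
  body
  # Step 2: one string per line; trim: true drops the empty string after the final "\n".
  |> String.split("\n", trim: true)
  # Step 3: split each line on single spaces; a trailing space leaves a trailing "".
  |> Enum.map(&String.split(&1, " "))

IO.inspect(rows)
```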

We want only the cache status (such as "TCP_HIT/200") and the string beginning with http, which contains the video ID that we will extract from the URL in a later step. Since step 4 calls for both hits and misses, we match on "TCP" rather than "TCP_HIT", so TCP_MISS entries are kept as well. Let's write a simple test.

test/access_log_app_test.exs

defmodule AccessLogAppTest do
  use ExUnit.Case
  doctest AccessLogApp

  # ... (other tests omitted)

  test "keeps only the fields that contain the given strings" do
    data = [
      ["a1", "TCP_HIT/200", "http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts"],
      ["a2", "TCP_HIT/206", "http://example.com/ABCD/e/f/789012/someotherfile.mp4.ts"]
    ]

    result = AccessLogApp.CLI.filter_data(data, ["TCP", "http"])

    # Each row becomes a keyword list keyed by the downcased match string.
    assert result == [
             [tcp: "TCP_HIT/200", http: "http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts"],
             [tcp: "TCP_HIT/206", http: "http://example.com/ABCD/e/f/789012/someotherfile.mp4.ts"]
           ]
  end
end

Writing the test first forces us to define the data structure we expect back before we write the function.
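Before looking at the implementation, here is the single list operation it depends on: pulling the first field that contains a substring out of one row. The row below is shortened and hypothetical. Enum.find/2 returns the first matching element or nil, which is the same result you get by taking index 0 of the list returned by Enum.filter/2.

```elixir
# A shortened, hypothetical row of fields from one log line.
row = ["1523756653", "0", "TCP_MISS/200", "GET", "http://example.com/ABCD/e/f/789012/x.ts"]

# Enum.find/2 returns the first element for which the function returns a truthy value.
status = Enum.find(row, &String.contains?(&1, "TCP"))
url = Enum.find(row, &String.contains?(&1, "http"))

# Enum.filter/2 keeps every match; Enum.at(list, 0) then takes the first one.
first_via_filter = row |> Enum.filter(&String.contains?(&1, "TCP")) |> Enum.at(0)

# When nothing matches, the result is nil: searching for "TCP_HIT" misses this TCP_MISS row.
missing = Enum.find(row, &String.contains?(&1, "TCP_HIT"))

IO.inspect({status, url, first_via_filter, missing})
```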

lib/access_log_app/CLI.ex

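defmodule AccessLogApp.CLI do
  # ... (other functions omitted)

  # Reduce every row to the fields that contain the given match strings.
  # Each row becomes a keyword list with one {key, value} pair per string:
  # the key is the downcased string as an atom (for example "http" -> :http)
  # and the value is the first field containing that string, or nil if none does.
  def filter_data(list, strings) do
    Enum.map(list, fn item ->
      Enum.map(strings, fn string ->
        {
          string |> String.downcase() |> String.to_atom(),
          Enum.find(item, &String.contains?(&1, string))
        }
      end)
    end)
  end

  # ...
end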

In our CLI.ex file we define a function, filter_data/2, that takes the list of rows and a list of match strings. The outer Enum.map/2 runs an anonymous function on each row. Inside it, a second Enum.map/2 goes through the match strings and, for each one, searches the row for the first field that contains that string. That field is paired with an atom key made by downcasing the match string (so "http" becomes :http); if no field matches, the value is nil. Every other field in the row is discarded. The result is a list of keyword lists, one per row, containing only the fields picked out by our match strings. The result looks like this:
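To see the behaviour on rows the test does not cover, here is a standalone copy of the same logic in a hypothetical FilterDataSketch module, run against three shortened, hypothetical rows. The last row has no URL field, so its :http value comes back as nil; later steps would need to handle or drop rows like that.

```elixir
defmodule FilterDataSketch do
  # Same idea as AccessLogApp.CLI.filter_data/2: for every row, build a keyword
  # list with one {key, value} pair per match string. The key is the downcased
  # match string as an atom; the value is the first field containing that
  # string, or nil when no field does.
  def filter_data(rows, strings) do
    Enum.map(rows, fn row ->
      Enum.map(strings, fn string ->
        key = string |> String.downcase() |> String.to_atom()
        {key, Enum.find(row, &String.contains?(&1, string))}
      end)
    end)
  end
end

rows = [
  ["a1", "TCP_HIT/200", "http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts"],
  ["a2", "TCP_MISS/200", "http://example.com/ABCD/e/f/789012/someotherfile.mp4.ts"],
  # Hypothetical row with no URL field.
  ["a3", "TCP_HIT/200", "-"]
]

result = FilterDataSketch.filter_data(rows, ["TCP", "http"])
IO.inspect(result)
```

Because String.to_atom/1 creates atoms, and atoms are never garbage-collected, it is safest to call it only with a small, fixed set of match strings like these rather than with data read from the log itself.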

terminal

[
  [
    http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts",
    tcp: "TCP_HIT/200"
  ],
  [
    http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/326260/v20169101326-1256x544-3063k.mp4+3713710.ts",
    tcp: "TCP_HIT/200"
  ],
  [
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    tcp: "TCP_HIT/200"
  ],
  [
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    tcp: "TCP_HIT/206"
  ],
  [
    http: "http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts",
    ...
  ],
  [...],
  ...
]
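Because each row is a keyword list, later steps can read fields back out by key with Keyword.get/2 or the row[:key] access syntax. A minimal sketch, using two shortened, hypothetical rows in the shape shown above:

```elixir
# Two shortened, hypothetical rows in the shape returned by filter_data/2.
result = [
  [http: "http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts", tcp: "TCP_HIT/200"],
  [http: "http://example.com/ABCD/e/f/789012/someotherfile.mp4.ts", tcp: "TCP_MISS/200"]
]

# Keyword.get/2 looks up a key and returns nil if it is absent.
urls = Enum.map(result, &Keyword.get(&1, :http))

# The Access syntax row[:tcp] does the same lookup on a keyword list.
statuses = Enum.map(result, fn row -> row[:tcp] end)

IO.inspect(urls)
IO.inspect(statuses)
```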

In this post we showed a way to get only the items we want from a list. We passed a function the list of rows and a list of match strings, and it returned, for each row, a keyword list of key-value pairs for the fields we asked for. But what if we wanted to return a list of maps instead? Check out the Elixir - List vs Maps blog post to see how to add to a list, map, or tuple. If you like this post, please share and subscribe!
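As a rough idea of what a map version could look like, here is a sketch that builds one map per row with Map.new/2 instead of a keyword list. The FilterMapSketch module name and the sample row are hypothetical, and the linked post may take a different approach. One practical difference: a map cannot hold duplicate keys, while a keyword list can.

```elixir
defmodule FilterMapSketch do
  # Like filter_data/2, but each row becomes a map such as
  # %{tcp: "TCP_HIT/200", http: "http://..."} instead of a keyword list.
  def filter_data(rows, strings) do
    Enum.map(rows, fn row ->
      Map.new(strings, fn string ->
        key = string |> String.downcase() |> String.to_atom()
        {key, Enum.find(row, &String.contains?(&1, string))}
      end)
    end)
  end
end

rows = [["a1", "TCP_HIT/200", "http://example.com/ABCD/a/b/c/123456/somefile.mp4.ts"]]
maps = FilterMapSketch.filter_data(rows, ["TCP", "http"])

# With atom keys, map fields can be read with dot syntax.
[first] = maps
IO.inspect(first.http)
```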

