Elixir String Split

This is part two of a nine-post series on processing a log file with Elixir. If you find this article helpful, please subscribe and share 🚀

In the first post on processing a log file with Elixir, we outlined the steps needed to retrieve the cache hit/miss percentage of each video_id:

  1. Fetch data from URL
  2. Split each new line into a list item
  3. Split each line into list items
  4. Find the six-digit video id in the URL; it should be the first integer in HTTP paths such as:
    “example.com/04C0BF/v2/sources/content-owners/”,
    “example.com/04C0BF/ads/transcodes/”
  5. Group by Video ID
  6. Get Cache Hit and Misses for each Video
  7. Calculate the cache hit/miss percentage
  8. Sort by video id
  9. Print to file

We are now getting data that looks like this:

["#Fields: timestamp time-taken c-ip filesize s-ip s-port sc-status sc-bytes cs-method cs-uri-stem - rs-duration rs-bytes c-referrer c-user-agent customer-id x-ec_custom-1",
  "1523756544 3 86.45.165.83 1845784 152.195.141.240 80 TCP_HIT/200 1846031 GET http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts - 0 486 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 6.0) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756611 58 86.165.81.111 3364824 152.195.141.240 80 TCP_HIT/200 3365071 GET http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/326260/v20169101326-1256x544-3063k.mp4+3713710.ts - 0 616 \"-\" \"Mozilla/5.0 (Linux; Android 5.1.1; AFTT Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36\" 49343 \"-\" ",
  "1523756639 3 88.110.35.157 2227424 152.195.141.240 80 TCP_HIT/200 2227671 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 604 \"-\" \"Mozilla/5.0 (Linux; Android 5.1.1; AFTM Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36\" 49343 \"-\" ",
  "1523756653 0 81.132.50.208 2227424 152.195.141.240 80 TCP_HIT/206 262442 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 519 \"-\" \"Roku/DVP-8.0 (068.00E04155A)\" 49343 \"-\" ",
  "1523756653 0 81.132.50.208 2227424 152.195.141.240 80 TCP_HIT/206 393359 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 524 \"-\" \"Roku/DVP-8.0 (068.00E04155A)\" 49343 \"-\" ",
  "1523756653 0 81.132.50.208 2227424 152.195.141.240 80 TCP_HIT/206 393360 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 525 \"-\" \"Roku/DVP-8.0 (068.00E04155A)\" 49343 \"-\" ",
  "1523756653 0 81.132.50.208 2227424 152.195.141.240 80 TCP_HIT/206 393361 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 526 \"-\" \"Roku/DVP-8.0 (068.00E04155A)\" 49343 \"-\" ",
  "1523756653 0 81.132.50.208 2227424 152.195.141.240 80 TCP_HIT/206 393361 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 526 \"-\" \"Roku/DVP-8.0 (068.00E04155A)\" 49343 \"-\" ",
  "1523756654 0 81.132.50.208 2227424 152.195.141.240 80 TCP_HIT/206 393361 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 526 \"-\" \"Roku/DVP-8.0 (068.00E04155A)\" 49343 \"-\" ",
  "1523756679 0 82.132.244.184 321104 152.195.141.240 80 TCP_HIT/200 321350 GET http://example.com/04C0BF/v2/sources/content-owners/kinonation/351421/v201703152222-284k.mp4+3441187.ts - 0 615 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 6.0.1) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756747 704 178.153.185.211 3790456 152.195.141.240 80 TCP_HIT/200 3790703 GET http://example.com/04C0BF/v2/sources/content-owners/digital-media-rights/012410/v0308223807-2512k.mp4+2610026.ts - 0 525 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 7.1.1) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756748 7 95.44.108.250 2939004 152.195.141.240 80 TCP_HIT/200 2939251 GET http://example.com/04C0BF/v2/sources/content-owners/drg/339342/v201701052300-872x486-1816k.mp4+612144.ts - 0 511 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 7.0) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756800 41 90.204.114.187 2081912 152.195.141.240 80 TCP_HIT/200 2082159 GET http://example.com/04C0BF/v2/sources/content-owners/gravitas/414098/v201802131917-2011k.mp4+5162390.ts - 0 601 \"-\" \"Mozilla/5.0 (Linux; Android 5.1.1; AFTM Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36\" 49343 \"-\" ",
  "1523756803 1 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 262442 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 512 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756803 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 419733 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 517 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756803 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 419734 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 518 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756803 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 519 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756803 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 519 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756804 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 519 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756804 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 398593 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 519 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756804 0 84.92.136.15 3177012 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/go-digital/160842/v1220040557-2420k.mp4+2300006.ts - 0 519 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756806 1 156.197.120.108 962184 152.195.141.240 80 TCP_HIT/200 962430 GET http://example.com/04C0BF/v2/sources/content-owners/the-asylum/367665/v201706110929-596k.mp4+31990.ts - 0 627 \"-\" \"AppleCoreMedia/1.0.0.15B202 (iPhone; U; CPU OS 11_1_2 like Mac OS X; en_gb)\" 49343 \"-\" ",
  "1523756808 41 62.30.222.127 2229116 152.195.141.240 80 TCP_HIT/200 2229363 GET http://example.com/04C0BF/v2/sources/content-owners/the-asylum/367706/v201706121717-2103k.mp4+2913660.ts - 0 511 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 7.0) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756810 123 90.204.114.187 2432156 152.195.141.240 80 TCP_HIT/200 2432403 GET http://example.com/04C0BF/v2/sources/content-owners/gravitas/414098/v201802131917-2011k.mp4+5232560.ts - 0 601 \"-\" \"Mozilla/5.0 (Linux; Android 5.1.1; AFTM Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36\" 49343 \"-\" ",
  "1523756817 195 98.192.88.74 2602296 152.195.141.240 80 TCP_MISS/200 2602503 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/312985/v201604170144-640x352-2251k.mp4+892808.ts - 485 2603022 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 7.0) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756864 119 90.204.114.187 2081912 152.195.141.240 80 TCP_HIT/200 2082159 GET http://example.com/04C0BF/v2/sources/content-owners/gravitas/414098/v201802131917-2011k.mp4+5162390.ts - 0 601 \"-\" \"Mozilla/5.0 (Linux; Android 5.1.1; AFTM Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36\" 49343 \"-\" ",
  "1523756872 0 90.202.209.109 2227424 152.195.141.240 80 TCP_HIT/206 262442 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 521 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756872 0 90.202.209.109 2227424 152.195.141.240 80 TCP_HIT/206 393359 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 526 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756872 0 90.202.209.109 2227424 152.195.141.240 80 TCP_HIT/206 393360 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 527 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756872 0 90.202.209.109 2227424 152.195.141.240 80 TCP_HIT/206 393361 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 528 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756872 0 90.202.209.109 2227424 152.195.141.240 80 TCP_HIT/206 393361 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 528 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756873 0 90.202.209.109 2227424 152.195.141.240 80 TCP_HIT/206 393361 GET http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/398629/v201711170053-2061k.mp4+4582327.ts - 0 528 \"-\" \"Roku/DVP-8.1 (518.01E04090A)\" 49343 \"-\" ",
  "1523756875 121 156.219.85.204 1481440 152.195.141.240 80 TCP_HIT/200 1481687 GET http://example.com/04C0BF/v2/sources/content-owners/digital-media-rights/023261/v0312201216-1918k.mp4+0.ts - 0 490 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 6.0.1) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756878 374 90.204.114.187 2432156 152.195.141.240 80 TCP_HIT/200 2432403 GET http://example.com/04C0BF/v2/sources/content-owners/gravitas/414098/v201802131917-2011k.mp4+5232560.ts - 0 601 \"-\" \"Mozilla/5.0 (Linux; Android 5.1.1; AFTM Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36\" 49343 \"-\" ",
  "1523756901 5 178.153.185.211 3790456 152.195.141.240 80 TCP_HIT/200 3790703 GET http://example.com/04C0BF/v2/sources/content-owners/digital-media-rights/012410/v0308223807-2512k.mp4+2610026.ts - 0 525 \"-\" \"ItubExoPlayer/2.12.9 (Linux;Android 7.1.1) ExoPlayerLib/2.4.2\" 49343 \"-\" ",
  "1523756919 257 89.240.182.239 4697556 152.195.141.240 443 TCP_HIT/200 4697803 GET http://example.com/04C0BF/v2/sources/content-owners/gravitas/291986/v0813002204-1280x720-HD-3312k.mp4+1900033.ts - 0 696 \"-\" \"Kodi/17.6 (Linux; Android 7.1.2; H96 PRO+ Build/NHG47L) Android/7.1.2 Sys_CPU/aarch64 App_Bitness/64 Version/17.6-Git:20171119-ced5097\" 49343 \"-\" ",
  "1523756974 550 172.114.205.131 3179080 152.195.141.240 443 TCP_HIT/200 3179327 GET http://example.com/04C0BF/v2/sources/content-owners/indie-rights-films/360634/v201705030503-1981k.mp4+270061.ts - 0 809 \"https://tubitv.com/movies/360634/the_test\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36\" 49343 \"-\" ",
  "1523757000 1 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 262442 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 486 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757000 1 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419733 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 491 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757000 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419734 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 492 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757000 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757001 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757001 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757010 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757010 1 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757010 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  "1523757010 0 86.145.212.248 4501096 152.195.141.240 80 TCP_HIT/206 419735 GET http://example.com/04C0BF/v2/sources/content-owners/artcast/186001/v0205061236-3219k.mp4+1980019.ts - 0 493 \"-\" \"Roku/DVP-8.0 (048.00E04184A)\" 49343 \"-\" ",
  ...]}

Looking at our steps, the next one is to split each line into list items.

We can write a simple test to get started.

defmodule CliTest do
  use ExUnit.Case

  import AccessLogApp.CLI, only: [
    ...
    split_by_space: 1
  ]

  ...

  test "splits lists by spaces" do
    data = [
      "0 a b c",
      "1 a b c",
      "2 a b c"
    ]
    result = split_by_space(data)
    assert result == [
      ["0", "a", "b", "c"],
      ["1", "a", "b", "c"],
      ["2", "a", "b", "c"]
    ]
  end
end

Writing the test first makes iteration much faster as we go along. Instead of running our code against a log file of 5000 entries, we develop against a very small data set that runs quickly. Once the function is hammered out, we can run it on the actual log file data.

To achieve our desired result we can use Enum.map, which takes an enumerable and a function. The function simply runs String.split on each line, which returns a list of the strings found between the spaces. Enum.map then collects those lists into a new list.

defmodule AccessLogApp.CLI do
  ...
  def split_by_space(lines) do
    lines
    |> Enum.map(fn x ->
        String.split(x, " ")
    end)
  end
  ...

  ...
end

We run the tests and they pass!

access_log_app mix test
Compiling 1 file (.ex)
....

Finished in 0.06 seconds (0.00s async, 0.06s sync)
1 doctest, 3 tests, 0 failures

Randomized with seed 756677
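
Before running this on the real log, it may be worth checking how the split behaves on a line shaped like the sample data, which ends with a trailing space and contains a quoted user agent with spaces in it. The sketch below is only an illustration: the `SplitDemo` module and the shortened `line` value are made up for this example and are not part of the app.

```elixir
defmodule SplitDemo do
  # Hypothetical helper module, not part of the app in this post.

  # Splits on every single space, so a trailing space leaves a trailing "".
  def by_space(line), do: String.split(line, " ")

  # Same pattern, but empty strings are dropped from the result.
  def by_space_trimmed(line), do: String.split(line, " ", trim: true)

  # No pattern: splits on runs of whitespace and ignores
  # leading and trailing whitespace.
  def by_whitespace(line), do: String.split(line)
end

# A shortened, made-up line shaped like the log entries above.
line = "1523756544 3 TCP_HIT/200 GET \"Roku/DVP-8.0 (068.00E04155A)\" "

IO.inspect(SplitDemo.by_space(line))
IO.inspect(SplitDemo.by_space_trimmed(line))
IO.inspect(SplitDemo.by_whitespace(line))
```

With the `" "` pattern the quoted user agent is broken into several items and the trailing space adds an empty string at the end. That is probably fine for this series, since the cache status and the URL both appear before the quoted fields, but it is something to keep in mind when indexing into each list.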

Three ways to write anonymous functions

Functions and anonymous functions can be written several ways:

option 1, separate function

def split_by_space(lines) do
    # Bind the anonymous function to x before it is used
    x = fn line ->
      line
      |> String.split(" ")
    end

    lines
    |> Enum.map(x)
  end

Option 1 creates a separate anonymous function and binds it to the name ‘x’. I named it ‘x’ because it is only used by our map call, but writing ‘x’ as a separate function outside of the Enum.map can get messy and pollute the function body if more map functions are added later.

def split_by_space(lines) do
    lines
    |> Enum.map(fn x ->
        String.split(x, " ")
    end)
  end

Option 2 keeps the anonymous function scoped to the Enum.map call, with ‘x’ as its argument. But we can get even more succinct. In comes the capture operator, &.

def split_by_space(lines) do
    lines
    |> Enum.map(&String.split(&1, " "))
  end

Option 3 uses the capture operator. The ampersand creates an anonymous function from the expression that follows it, and ‘&1’ is a placeholder for the first argument passed to that function. Enum.map calls the captured function once for each item in ‘lines’, so ‘&1’ is the current line, and String.split splits it on the space inside the quotation marks. In other words, &String.split(&1, " ") is shorthand for the fn in option 2. Capture operators can be confusing to grasp (for me they have been), but these three examples should make it clear how the capture operator can be used in your functions.
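
To make the capture syntax concrete, here is a small sketch that puts the long and short forms side by side. The variable names (`long_form`, `short_form`, `add`, `upcase`) are only for illustration and do not appear in the app.

```elixir
# The same anonymous function written two ways.
long_form = fn x -> String.split(x, " ") end
short_form = &String.split(&1, " ")

lines = ["0 a b c", "1 a b c"]

# Both calls produce the same nested list.
IO.inspect(Enum.map(lines, long_form))
IO.inspect(Enum.map(lines, short_form))

# &1 is a placeholder for the first argument; &2 is the second.
add = &(&1 + &2)
IO.inspect(add.(1, 2))

# A named function can also be captured by its name and arity.
upcase = &String.upcase/1
IO.inspect(upcase.("hit"))
```

Reading `&1` as "the first argument" rather than as anything to do with Enum.map is what made the syntax click for this example: the captured function knows nothing about who calls it.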

I tend to start with option 2 and refactor into option 3. All three ways pass the test. There might be instances where you want a separate function, as in option 1, that can be called from outside the function that uses it, for instance when it is needed by multiple functions. In that case it has to be a named function (def or defp) rather than a variable.
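
As a rough sketch of that last case, the helper can be pulled out into a named private function and passed to Enum.map with the capture operator. The `CaptureDemo` module, `split_line/1` and `first_fields/1` are hypothetical names invented for this example, not code from the app.

```elixir
defmodule CaptureDemo do
  # Hypothetical module for illustration only.

  def split_by_space(lines) do
    # Capture the named helper by name and arity.
    Enum.map(lines, &split_line/1)
  end

  # A second caller that reuses the same helper.
  def first_fields(lines) do
    lines
    |> Enum.map(&split_line/1)
    |> Enum.map(&List.first/1)
  end

  # Private helper shared by both public functions.
  defp split_line(line), do: String.split(line, " ")
end

IO.inspect(CaptureDemo.split_by_space(["0 a b c", "1 a b c"]))
IO.inspect(CaptureDemo.first_fields(["0 a b c", "1 a b c"]))
```

The helper gets a descriptive name instead of ‘x’, and any function in the module can reuse it without redefining the split.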

In this post we introduced Elixir’s built-in functions to loop (or enumerate, in Elixir terms) over a list of items and split each entry on spaces. We started with a simple test and were then easily able to experiment with three different ways of writing a function with the same result. In the next post we will filter our data to only get the items we want from each list item. If you would like to be notified of new articles, be sure to share and subscribe!

Categorized as Elixir

By mchavez

Michael Chavez is a web and software developer from San Francisco, California. His experience spans almost a decade, working with San Francisco Bay Area design and development agencies, and high-profile Silicon Valley start-ups and enterprises. After studying Multimedia at City College of San Francisco, Michael taught himself programming languages and tools such as JavaScript, Node.js, and PHP, and founded the web development consultancy Space-Rocket. Michael is currently working with the Elixir programming language.
