15. Chunking through records

Transcript

00:00
So by the end of this episode, we'll have created a generator that lets us iterate over batches of CSV records, whether that's a hundred or a thousand, it doesn't really matter, without them all being loaded into memory at once, which is what we looked at in the last episode.
00:15
So we're going to go ahead and upload a file. The first thing we're going to do is look at what we get from this. Now, remember we used this to count the CSV records, so that gives us the total, but let's actually see what we get when we use this Statement create method,
00:31
and we process the CSV that we've read. So under the import section, I'm just going to do this up here, and I'm going to die dump on the result we get back from this, just so we can see what we've got. So I'm going to give this a refresh, head over to import, we'll
00:46
import the same file here, and let's just see what we get when we import this. So let's choose and map up all of these columns, hit import, and there we go. We get back a result set. Now this result set, which we're going to use a method on to grab the records, has
01:00
an iterator. This iterator is what allows that count method to tell us how many records we have in here, but what we're going to do is iterate over these and build up a generator with these chunked, and again, they won't all be loaded into
01:14
memory. Imagine we have a million records in a CSV file: if we were to create an array of a thousand chunks, with a huge amount of different chunks inside of that array, we're going to run into a memory issue.
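To recap, here's a rough sketch of the reading side so far, using league/csv; the uploaded file and the result set living on $this->file and $this->csvRecords properties are assumptions for this example:

```php
use League\Csv\Reader;
use League\Csv\Statement;

// Read the uploaded CSV, treating the first row as the header
$csv = Reader::createFromPath($this->file->getRealPath(), 'r');
$csv->setHeaderOffset(0);

// Process the CSV; this gives us back a ResultSet
$this->csvRecords = Statement::create()->process($csv);

dd($this->csvRecords); // League\Csv\ResultSet, with an internal iterator
```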
01:27
So let's go ahead and just get started on doing this. I'm going to pull over some pre-made code and walk you through it, but we're not going to go into the specifics of generators, because that's for another lesson. So let's first of all just have a look at
01:40
this foreach on what we get back from this. So we can actually reference the CSV records property here, and say foreach this csvRecords as record. With this, what we need to do is
01:58
actually get those records that we saw over in the browser, inside of that object; we just need to use the getRecords method. So I'm just going to var_dump each record and see what happens.
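Something like this, again assuming the result set sits on a hypothetical $this->csvRecords property:

```php
// Each $record comes back as a single row, keyed by the CSV headers
foreach ($this->csvRecords->getRecords() as $record) {
    var_dump($record);
}

dd('done');
```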
02:12
I don't think the var_dump output is actually going to appear, but we'll do this anyway and then just die dump at the end. Okay, let's come over and try this again, just so we're clear how this is working and what we're getting back. So again, let's just map all of these up and hit import, and yeah, we don't
02:28
actually see anything. So essentially, as we iterate over these, each of these records is going to be a single CSV record as an array, but that only gives us individual records. What we want are chunks of records; we
02:42
don't want to create a job just to process a single CSV item; that's kind of the point here. Okay, so let's just get rid of this, we could always do that another way, but hopefully that makes sense. And here we're going to use a chunk iterator
02:56
to grab the iterator instance that we've got and build up chunks using a generator. So we're going to create our class: in the root directory, let's just create a Helpers folder, it doesn't really matter too much for now, and I'm going to create
03:11
a ChunkIterator file. Now, I'm going to pull over the code for this, but we're going to look through it properly. I think I put this inside of Utilities before, so let's change that to Helpers.
03:22
Okay, so this ChunkIterator takes in an iterator, which we can iterate over in PHP, and it takes the size of the chunks that we want to iterate in.
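Here's a sketch of that class as described; the exact code is available in the course resources, so treat this as an approximation:

```php
<?php

namespace App\Helpers;

use Generator;
use Iterator;

class ChunkIterator
{
    public function __construct(
        protected Iterator $iterator,
        protected int $chunkSize,
    ) {}

    public function get(): Generator
    {
        $chunk = [];

        while ($this->iterator->valid()) {
            // Append the current item, then advance the iterator
            $chunk[] = $this->iterator->current();
            $this->iterator->next();

            // Once the chunk is full, yield it to the caller and reset
            if (count($chunk) === $this->chunkSize) {
                yield $chunk;
                $chunk = [];
            }
        }

        // Yield any remaining records as a final, smaller chunk
        if (count($chunk)) {
            yield $chunk;
        }
    }
}
```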
03:38
So the way this is going to work in code looks a bit like this. Let's create a variable, we could call it iterator, or let's just call it chunks. We're going to new up the ChunkIterator that we've just created, pass through the records, so that will be this csvRecords getRecords, which we just saw, and the second argument to this
03:55
is going to be the amount that we want to batch by. Now, we've only got 100 in total, so let's batch this by 10, which means we're going to get 10 chunks of 10 records, making up the 100, and then from this we're going to call that get method.
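As a sketch, again with $this->csvRecords assumed as the property holding the result set:

```php
use App\Helpers\ChunkIterator;

// $chunks is a Generator — nothing is read until we iterate over it
$chunks = (new ChunkIterator($this->csvRecords->getRecords(), 10))->get();
```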
04:10
I wrote this code out previously, so you can grab it from the resources if you need to and copy it over into your app. So what's happening here then? Well, when we pass in the iterator along with the chunk size
04:25
that we want to iterate by, we start by creating an empty array. So we're still returning an array per chunk, but we're iterating over the items that we've passed in, which are just all of the CSV records. Then what we're doing is
04:40
building the chunk, so appending the current item in this iterator to that array. So imagine we had 100 records: the first one would be given to us by current, then we call next to move on to the next
04:55
iterator item, which gives us the next one, and we build this up into the chunk over and over again. So this will run 10 times, and we don't want it to build up more than 10 items, because we only want chunks of 10.
05:07
So what happens then is, if the count equals the chunk size, we yield the entire chunk, like we did back in the last part. Remember, we yielded the item there, so this will just kind of pop off, if you like, the chunk that we've built up, and then it
05:23
will reset the chunk. And then down at the bottom, we yield whatever is left in the chunk anyway, if it has a positive count. So essentially, what this is doing is iterating 10 times, building up a chunk, yielding it from within this generator,
05:37
then going on to the next batch of 10 and yielding that too (it's not actually returning in the usual sense, it's yielding as it goes). So this whole collection of chunks that we're building up is never going to be all in
05:51
memory at the same time. So now that we've got this and we can call the get method, let's see what we get from it. Let's just die dump on chunks here and go through the whole import process again. So we'll import this.
06:04
Remember, we're only working with 100 records here, so you wouldn't necessarily need to chunk these, but we want to do this anyway in case we get a huge amount of records. So let's map up the email, hit import, and there we go. We get
06:17
this generator returned back to us, which is what we looked at in the last episode. We can't see a collection of the items here, because it's a generator: they're only given to us as we iterate over them. So what we could do now
06:28
is actually iterate over them. So, just to demonstrate this within Livewire, I'm going to create a kind of fake array here, and we're going to iterate over the chunks as chunk,
06:43
appending each chunk that we get back, and then we're going to die dump on abc. So we're just building up a new array of each of the chunks that we're iterating through with the generator.
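A throwaway sketch of that demo ($abc is just a placeholder name):

```php
$abc = [];

// Chunks are only produced by the generator as we loop over them
foreach ($chunks as $chunk) {
    $abc[] = $chunk;
}

dd($abc); // 10 arrays, each containing 10 CSV records
```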
06:58
Hopefully that makes sense. Let's go and import this, and what we should get is an array of 10 arrays, each with 10 items in. So again, let's map these up to each of the columns, and there we go. So we've got 10,
07:12
10, 10, 10. Together these contain all of the items from the CSV, but each chunk is only pulled out when it's needed. We don't build up an entire array of all of these chunks up front, because then we would run into memory issues,
07:26
so hopefully that's starting to come together. If you're still confused about generators, I highly recommend you read up on them to get a little bit more background.
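As a minimal refresher, separate from the importer itself, the core idea looks like this:

```php
// Values are produced lazily, one at a time, as the caller iterates
function numbers(): Generator
{
    foreach ([1, 2, 3] as $n) {
        yield $n; // execution pauses here until the next value is requested
    }
}

foreach (numbers() as $n) {
    echo $n; // prints 123
}
```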
07:37
The whole goal here is that we can have as many records as we want in this CSV, up into the millions, and it's never going to run out of memory, because we're chunking through them using a generator, yielding each of those chunks, and they're never all loaded into
07:50
memory at once. So I'm going to get rid of this silly example here, but hopefully that has demonstrated it well. The point now is that we've got our ChunkIterator, and for each of the chunks it yields, we can then create a job to process them,
08:04
and you can bump this number up as high as you want. So if you're always expecting millions of records, maybe bump that up to a thousand; you could even process 5,000 at once. I'm going to leave this at 10 for now, because with our
08:16
slightly smaller CSV, we still want to test this out and make sure each of these jobs gets put into the batch. So there we go: we've built up our chunks. Let's head over and start creating some queues to handle these.
25 episodes · 2 hrs 20 mins

Overview

Let's build a powerful CSV importer with Livewire, completely from scratch. It can handle millions of rows, can be reused for multiple models, and, because it uses job batches, doesn't require the browser to stay open.

This course is for you if:

  • You need a robust importer component for your models that you have full control over
  • You want to brush up on some advanced Livewire concepts
  • You want to learn about job batching and queues in Laravel
Alex Garrett-Smith
Hey, I'm the founder of Codecourse!
