Import Data to DynamoDB from CSV using Lambda and TypeScript

This article guides you on how to import data from a CSV file into Amazon DynamoDB using a Lambda function written in TypeScript. This approach is ideal for scenarios where you need to bulk-load data into DynamoDB from a structured data source.

Setting Up Your Project

  1. AWS Account and IAM Role:

    • Ensure you have an AWS account with the necessary permissions to create Lambda functions, DynamoDB tables, and interact with S3.
    • Create an IAM role with permissions to access DynamoDB and S3.
  2. DynamoDB Table:

    • Create a DynamoDB table whose schema and primary key match your CSV data (a minimal creation sketch follows this list).
  3. S3 Bucket:

    • Create an S3 bucket to store your CSV file.
  4. Lambda Function and TypeScript Project:

    • Create a new TypeScript project.
    • Initialize a Lambda function in your project with the appropriate runtime (Node.js) and permissions to access S3 and DynamoDB.
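
You can create the table from the console, infrastructure-as-code, or a short script. Below is a minimal sketch using the AWS SDK for JavaScript v2; the table name csv-import-table and the id partition key are assumptions, so adjust them to match your CSV columns and reuse the table name in the TABLE_NAME environment variable later:

import * as AWS from 'aws-sdk';

const dynamoDb = new AWS.DynamoDB();

async function createTable(): Promise<void> {
  await dynamoDb.createTable({
    TableName: 'csv-import-table',                  // assumed name, must match TABLE_NAME
    AttributeDefinitions: [
      { AttributeName: 'id', AttributeType: 'S' },  // assumes each CSV row has an "id" column
    ],
    KeySchema: [
      { AttributeName: 'id', KeyType: 'HASH' },     // partition key
    ],
    BillingMode: 'PAY_PER_REQUEST',                 // on-demand capacity, nothing to size up front
  }).promise();
  console.log('Table created');
}

createTable().catch(console.error);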

Code Walkthrough

Here's a breakdown of the TypeScript code for the Lambda function:

import * as AWS from 'aws-sdk';
import csv from 'csv-parser'; // requires "esModuleInterop": true in tsconfig.json
import { Readable } from 'stream';

const dynamoDb = new AWS.DynamoDB.DocumentClient();
const s3 = new AWS.S3();

export const handler = async (event: any) => {
  // 1. Get the CSV file from the S3 bucket
  const s3Object = await s3.getObject({
    Bucket: process.env.BUCKET_NAME!,
    Key: process.env.CSV_FILE_NAME!,
  }).promise();

  // 2. Parse the CSV data by streaming the file body through csv-parser
  const csvData: any[] = [];
  await new Promise<void>((resolve, reject) => {
    Readable.from(s3Object.Body as Buffer)   // wrap the downloaded body in a readable stream
      .pipe(csv())
      .on('data', (row) => csvData.push(row))
      .on('end', () => resolve())
      .on('error', reject);
  });

  // 3. Batch write data to DynamoDB (BatchWriteItem accepts at most 25 items per request)
  const tableName = process.env.TABLE_NAME!;
  const batches: any[][] = [];
  for (let i = 0; i < csvData.length; i += 25) {
    batches.push(csvData.slice(i, i + 25));
  }

  try {
    for (const batch of batches) {
      await dynamoDb.batchWrite({
        RequestItems: {
          [tableName]: batch.map((row) => ({ PutRequest: { Item: row } })),
        },
      }).promise();
    }
    console.log('CSV data successfully imported to DynamoDB');
    return { message: 'Data imported successfully' };
  } catch (err) {
    console.error(err);
    return { message: 'Error importing data' };
  }
};

Explanation:

  1. Import Modules: The code imports the AWS SDK, the csv-parser package, and Node's Readable stream helper, which is used to turn the S3 file body into a stream the parser can consume.
  2. DynamoDB Client: An instance of the DynamoDB Document Client is initialized for interactions with the DynamoDB table.
  3. Lambda Handler: The handler function is the entry point for your Lambda function.
  4. Get CSV File:
    • Retrieves the CSV file from the S3 bucket using the provided bucket name and file key (obtained from environment variables).
  5. Parse CSV Data:
    • Uses csv-parser to parse the CSV data, converting each row into a plain object.
    • The file body returned by S3 is wrapped in a Readable stream and piped through the parser, so rows are collected as the parser emits them. Note that csv-parser returns every column value as a string (see the conversion sketch after this list).
  6. Batch Write to DynamoDB:
    • The parsed rows are split into chunks of 25 items, because a single BatchWriteItem request accepts at most 25 put requests.
    • Each chunk is written to the specified table with the Document Client's batchWrite method.
  7. Error Handling:
    • The code includes basic error handling to log errors and return a descriptive error message.
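
Because csv-parser does not infer types, numeric or boolean columns arrive as strings. Below is a minimal conversion and validation sketch; the columns id, name, and price are hypothetical placeholders for whatever your CSV actually contains:

// Hypothetical row shape; replace the fields with your real CSV columns.
interface ProductItem {
  id: string;
  name: string;
  price: number;
}

// Convert a raw csv-parser row (all values are strings) into a typed item,
// returning null for rows that fail basic validation so they can be skipped.
function toItem(row: Record<string, string>): ProductItem | null {
  const price = Number(row.price);
  if (!row.id || !row.name || Number.isNaN(price)) {
    return null;
  }
  return { id: row.id, name: row.name, price };
}

Rows that return null can be logged and skipped before the batch write instead of being written as bad data.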

Deployment and Configuration

  1. Environment Variables:
    • Set environment variables (e.g., BUCKET_NAME, CSV_FILE_NAME, TABLE_NAME) within your Lambda function configuration to provide necessary parameters.
  2. Install Dependencies:
    • Install the required npm packages, aws-sdk and csv-parser, using npm install or yarn add.
  3. Deploy:
    • Deploy your Lambda function and configure the associated trigger, for example an S3 event trigger so the import runs automatically whenever a new CSV file is uploaded (see the sketch after this list for reading the bucket and key from that event).
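
If you use an S3 event trigger, the bucket name and object key are available on the event itself, so the BUCKET_NAME and CSV_FILE_NAME environment variables become unnecessary. A minimal sketch of that variant, assuming the @types/aws-lambda package for the event typing:

import * as AWS from 'aws-sdk';
import { S3Event } from 'aws-lambda'; // type-only; install @types/aws-lambda as a dev dependency

export const handler = async (event: S3Event) => {
  // Each record describes one uploaded object; object keys arrive URL-encoded.
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

  const s3Object = await new AWS.S3().getObject({ Bucket: bucket, Key: key }).promise();
  // ...parse and batch write exactly as in the handler shown earlier
};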

Additional Notes

  • Data Validation: Implement data validation and error handling within the Lambda function to catch inconsistencies in the CSV data before writing to DynamoDB.
  • Batch Size: A single BatchWriteItem request accepts at most 25 items, so split large imports into chunks of 25 or fewer (as the handler above does) and retry any UnprocessedItems returned in the response (see the sketch after this list).
  • Concurrency Control: If several imports can run at the same time, use conditional writes or DynamoDB transactions to keep concurrent writers from overwriting each other's items.
  • Logging and Monitoring: Enable detailed logging for your Lambda function and monitor the performance of the import process.
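
DynamoDB can throttle part of a batch and hand the rejected items back in the UnprocessedItems field of the batchWrite response. A minimal retry sketch with exponential backoff; the helper name and attempt limit are arbitrary choices:

import * as AWS from 'aws-sdk';

const dynamoDb = new AWS.DynamoDB.DocumentClient();

// Write one chunk of items and retry whatever DynamoDB could not process.
async function writeBatchWithRetry(tableName: string, items: any[], maxAttempts = 5): Promise<void> {
  let requestItems: any = {
    [tableName]: items.map((item) => ({ PutRequest: { Item: item } })),
  };

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await dynamoDb.batchWrite({ RequestItems: requestItems }).promise();
    if (!result.UnprocessedItems || Object.keys(result.UnprocessedItems).length === 0) {
      return; // everything was written
    }
    requestItems = result.UnprocessedItems;                                  // retry only the leftovers
    await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt)); // simple backoff
  }
  throw new Error('Batch write still had unprocessed items after retries');
}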

This article provides a basic implementation for importing CSV data into DynamoDB. You can further customize and enhance the code based on your specific requirements and use cases. Remember to review the official AWS documentation for detailed information on each service and best practices for efficient development.
