Content moderation in Social Media with AWS services – Capstone Project


Today, social media is an integral part of our lives, and it is hard to imagine life without it. Social media has grown by leaps and bounds, and as more people use it, offensive content is on the rise. Such content can draw social media platforms into controversy and often into legal trouble. Hence, social media platforms bear a greater responsibility for moderating the content published on them.

As part of our PGP-CC capstone project, we came up with an innovative approach to reduce the overhead of content moderation for social media platforms. The solution demonstrates how AWS cognitive (AI) services can be used to automate content moderation.

Proposed solution  

This solution proposes using the following AWS cognitive services for content moderation:

  • Amazon Comprehend to perform sentiment analysis of the blog text. If the text sentiment is reported as negative with high confidence, the blog is likely to contain offensive or inappropriate content. Such blogs can be further checked for restricted words. If any are found, the blog can be either rejected automatically or queued for manual moderation, depending on the requirement.
  • Amazon Rekognition to moderate the blog photos for inappropriate content. If Rekognition reports any moderation category with a high confidence level, the photo very likely contains inappropriate content. Certain moderation categories (such as explicit content and violence) can be rejected automatically, while others (such as suggestive content and hate symbols) can be pushed for manual moderation.
  • Amazon Rekognition to detect celebrities in the blog photos. Inappropriate content involving celebrities is especially likely to cause controversy. When a celebrity is detected, the moderation rules (the confidence thresholds for text sentiment analysis and photo moderation) can therefore be made stricter.
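The three rules above can be expressed as a small decision policy. The following is a minimal sketch. It assumes responses shaped like Comprehend's `DetectSentiment` output and Rekognition's `DetectModerationLabels` output. The `Thresholds` class, the threshold values and the category names in `AUTO_REJECT` are hypothetical illustrations, not the values used in the project. Rekognition's label names also differ between moderation model versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    negative_sentiment: float  # Comprehend "Negative" score (0-1) treated as high confidence
    moderation_label: float    # Rekognition label confidence (0-100) treated as high confidence


# Hypothetical values: normal rules, and stricter rules when a celebrity is detected.
DEFAULT = Thresholds(negative_sentiment=0.80, moderation_label=80.0)
CELEBRITY = Thresholds(negative_sentiment=0.60, moderation_label=60.0)

# Hypothetical auto-reject categories; any other confident label goes to manual review.
AUTO_REJECT = {"Explicit Nudity", "Explicit", "Violence"}


def thresholds_for(celebrity_detected: bool) -> Thresholds:
    return CELEBRITY if celebrity_detected else DEFAULT


def text_decision(sentiment: dict, text: str, restricted_words: set, t: Thresholds) -> str:
    """Decide on blog text using a Comprehend DetectSentiment response."""
    if sentiment["SentimentScore"]["Negative"] < t.negative_sentiment:
        return "APPROVED"
    # Negative with high confidence: look for restricted words (expected in lowercase).
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return "MANUAL_REVIEW" if words & restricted_words else "APPROVED"


def photo_decision(moderation: dict, t: Thresholds) -> str:
    """Decide on a photo using a Rekognition DetectModerationLabels response."""
    decision = "APPROVED"
    for label in moderation.get("ModerationLabels", []):
        if label["Confidence"] < t.moderation_label:
            continue
        # Match on the label itself or on its parent category.
        if {label["Name"], label.get("ParentName", "")} & AUTO_REJECT:
            return "REJECTED"
        decision = "MANUAL_REVIEW"
    return decision
```

Tightening the rules is only a matter of swapping the threshold set. The same mildly negative text or borderline photo passes under `DEFAULT` but goes to a moderator under `thresholds_for(True)`.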

Apart from these primary objectives, the project demonstrates the following aspects:

  • Amazon Translate to translate the blog text
  • Amazon Lex to build a text chatbot support system into the web app
  • Lambda functions for all backend processing
  • API Gateway to provide an API interface for the Lambda functions
  • S3 to store blog photos and host the static website
  • DynamoDB to store blog text
  • Aurora MySQL DB to store blog metadata
  • Cognito for user management and authentication
  • SNS for sending email notifications
  • CloudFront distribution to enable faster content delivery
  • Route 53 to route traffic for the site's domain name to the CloudFront distribution
  • VPC endpoints to route traffic between the Lambda functions deployed in the VPC and SNS, DynamoDB, S3, Comprehend and Rekognition

Because resilience and fault tolerance are critical nowadays, cross-region data replication has been implemented for the following:

  • S3 photo store
  • DynamoDB blog data
  • Aurora MySQL blog metadata
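For the S3 photo store, cross-region replication needs versioning on both buckets and an IAM role that S3 can assume. The following is a minimal sketch. It assumes boto3-style S3 clients (for example `boto3.client("s3", region_name=...)`) are passed in for the source and destination regions. The bucket names, rule ID and role ARN are hypothetical.

```python
def replication_config(role_arn: str, destination_bucket: str) -> dict:
    """Build a ReplicationConfiguration that copies every new object to the destination bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to read the source and write the destination
        "Rules": [
            {
                "ID": "replicate-blog-photos",                      # hypothetical rule name
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},                           # empty prefix: all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},  # required when Filter is used
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def enable_cross_region_replication(source_s3, dest_s3, source_bucket: str,
                                    destination_bucket: str, role_arn: str) -> None:
    versioning = {"Status": "Enabled"}
    # Versioning must be on for both buckets before replication can be configured.
    source_s3.put_bucket_versioning(Bucket=source_bucket, VersioningConfiguration=versioning)
    dest_s3.put_bucket_versioning(Bucket=destination_bucket, VersioningConfiguration=versioning)
    source_s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_config(role_arn, destination_bucket),
    )
```

A rule like this applies to objects written after it is created. Photos that already exist need a separate backfill, for example with S3 Batch Replication.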

Implementation overview

Some key use cases are detailed below to give insight into the implementation.

Blog creation

Create Blog 
  1. Store blog metadata in Aurora MySQL
  2. Store blog text in DynamoDB
  3. Get an S3 pre-signed URL for uploading the photo
  4. Call the S3 pre-signed URL to post the photo from the blog app
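The creation flow can be sketched as a single backend function. This is a minimal illustration under several assumptions. It takes a DB-API cursor for Aurora MySQL (MySQL Connector style, `%s` placeholders), a boto3 DynamoDB client and a boto3 S3 client. The table names, column names, the `PENDING` status, the bucket name and the `photos/<blog_id>.jpg` key layout are all hypothetical.

```python
import uuid

PHOTO_BUCKET = "blog-photos"   # hypothetical bucket name
BLOG_TEXT_TABLE = "BlogText"   # hypothetical DynamoDB table name


def create_blog(cursor, dynamodb, s3, author: str, title: str, text: str) -> dict:
    blog_id = str(uuid.uuid4())

    # Step 1: blog metadata in Aurora MySQL (hypothetical table and columns).
    cursor.execute(
        "INSERT INTO blog_metadata (blog_id, author, title, status) VALUES (%s, %s, %s, %s)",
        (blog_id, author, title, "PENDING"),
    )

    # Step 2: blog text in DynamoDB; this insert is what fires the text-moderation stream.
    dynamodb.put_item(
        TableName=BLOG_TEXT_TABLE,
        Item={"blog_id": {"S": blog_id}, "text": {"S": text}},
    )

    # Step 3: short-lived pre-signed PUT URL so the browser uploads straight to S3.
    photo_key = f"photos/{blog_id}.jpg"
    upload_url = s3.generate_presigned_url(
        "put_object", Params={"Bucket": PHOTO_BUCKET, "Key": photo_key}, ExpiresIn=300
    )
    return {"blog_id": blog_id, "photo_key": photo_key, "upload_url": upload_url}
```

In step 4, the blog app sends an HTTP PUT with the image bytes to `upload_url`, and the caller commits the MySQL transaction. If the app uploads with an HTML form POST instead, boto3's `generate_presigned_post` is the corresponding call.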

Blog text moderation

Process blog text
  1. A DynamoDB Streams trigger invokes the text processor Lambda function, which processes the record only if it is a new-record creation event.
  2. Perform sentiment analysis of the text using Comprehend. If the blog sentiment is negative and the text contains any restricted words, mark the blog for manual moderation; otherwise, mark the text as approved for public access.
  3. Store the sentiment analysis results from Comprehend against the blog in DynamoDB.
  4. Update the blog status in Aurora MySQL based on the text analysis in step 2.
  5. If the blog text requires manual moderation, send a message to SNS to trigger an email to the admin.
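The text-processing steps can be sketched as a stream handler. This minimal version assumes:

- the stream view type includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`)
- the item stores `blog_id` and `text` as string attributes
- "negative with high confidence" means a `Negative` score of at least a hypothetical 0.8

`save_analysis` and `update_status` are hypothetical callbacks standing in for the DynamoDB and Aurora writes. The Comprehend and SNS clients are injected.

```python
def process_text_stream(event, comprehend, sns, save_analysis, update_status,
                        topic_arn, restricted_words, negative_threshold=0.8):
    """Moderate new blog texts from a DynamoDB Streams event (clients and callbacks injected)."""
    results = []
    for record in event.get("Records", []):
        # Step 1: only INSERT events; writing the analysis back (step 3) emits MODIFY events.
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]  # attributes arrive in DynamoDB JSON form
        blog_id, text = image["blog_id"]["S"], image["text"]["S"]

        # Step 2: sentiment analysis plus a restricted-word check.
        sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
        words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
        flagged = (sentiment["SentimentScore"]["Negative"] >= negative_threshold
                   and bool(words & restricted_words))
        status = "TEXT_MANUAL_REVIEW" if flagged else "TEXT_APPROVED"

        save_analysis(blog_id, sentiment)  # Step 3 (hypothetical DynamoDB write)
        update_status(blog_id, status)     # Step 4 (hypothetical Aurora MySQL UPDATE)
        if flagged:                        # Step 5: SNS topic with an email subscription
            sns.publish(TopicArn=topic_arn, Subject="Blog needs moderation",
                        Message=f"Blog {blog_id} requires manual text moderation.")
        results.append((blog_id, status))
    return results
```

Filtering on `INSERT` matters here. Storing the Comprehend results back on the same item produces a `MODIFY` stream record, which would otherwise trigger the function again.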

Blog photo moderation

Process blog photo
  1. An S3 event invokes the photo processor Lambda function when a new photo is added to the bucket.
  2. Check the photo for restricted content using the content moderation capability of Rekognition. Based on the moderation results, the blog photo is approved, rejected or marked for manual moderation. Identify any celebrities using the celebrity recognition capability of Rekognition; if no celebrities are found, identify the entities in the picture using the entity identification capability of Rekognition.
  3. Store the photo moderation results and the celebrity/entity identification results from Rekognition against the blog in DynamoDB.
  4. Update the blog status in Aurora MySQL based on the photo moderation results in step 2.
  5. If the blog photo requires manual moderation, send a message to SNS to trigger an email to the admin.
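The photo-processing steps can be sketched as an S3 event handler. This minimal version assumes photo keys of the hypothetical form `photos/<blog_id>.<ext>` and an injected boto3 Rekognition client. It also assumes that "entity identification" maps to Rekognition's `DetectLabels`. `save_results`, `update_status` and `notify` are hypothetical callbacks for the DynamoDB write, the Aurora update and the SNS alert. For brevity, `simple_policy` only approves or routes to manual review, and its thresholds are made up. A richer policy with auto-rejection can be passed in through `policy`.

```python
import posixpath
from urllib.parse import unquote_plus


def blog_id_from_key(key: str) -> str:
    # Hypothetical key layout: "photos/<blog_id>.<ext>".
    return posixpath.splitext(posixpath.basename(key))[0]


def simple_policy(moderation: dict, celebrity_found: bool) -> str:
    # Hypothetical thresholds; stricter (lower) when a celebrity is present.
    threshold = 60.0 if celebrity_found else 80.0
    confident = [lbl for lbl in moderation["ModerationLabels"] if lbl["Confidence"] >= threshold]
    return "PHOTO_MANUAL_REVIEW" if confident else "PHOTO_APPROVED"


def process_photo_event(event, rekognition, save_results, update_status, notify,
                        policy=simple_policy):
    """Moderate photos from an S3 ObjectCreated event (client and callbacks injected)."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # S3 URL-encodes keys in events
        image = {"S3Object": {"Bucket": bucket, "Name": key}}

        moderation = rekognition.detect_moderation_labels(Image=image)
        celebrities = rekognition.recognize_celebrities(Image=image)["CelebrityFaces"]
        # Fall back to general label detection only when no celebrity is recognized.
        labels = [] if celebrities else rekognition.detect_labels(Image=image, MaxLabels=10)["Labels"]

        blog_id = blog_id_from_key(key)
        status = policy(moderation, bool(celebrities))
        save_results(blog_id, moderation["ModerationLabels"],
                     [c["Name"] for c in celebrities], [lbl["Name"] for lbl in labels])
        update_status(blog_id, status)
        if status == "PHOTO_MANUAL_REVIEW":
            notify(blog_id)  # e.g. publish to the SNS moderation topic
        processed.append((blog_id, status))
    return processed
```

This sketch runs celebrity recognition before applying the policy. That ordering lets a detected celebrity tighten the photo moderation thresholds, as the proposed solution suggests.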

Read Blog

Fetch blogs
  1. Call a Lambda function to fetch the list of blogs eligible for public access.
  2. Get the details of each eligible blog:
     a. Get the blog text from DynamoDB.
     b. If the user selected a language other than English, translate the text using Amazon Translate.
     c. Get the S3 pre-signed URL for the blog photo.
  3. Fetch the blog photo using the S3 pre-signed URL.
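The read path can be sketched as follows. This minimal version assumes:

- a DB-API cursor for Aurora MySQL
- boto3 DynamoDB, Translate and S3 clients
- blog text stored in English

The `blog_metadata` table, the `PUBLISHED` status value, the `BlogText` table, the bucket name and the photo key layout are hypothetical.

```python
def blog_details(blog_id, dynamodb, translate, s3, language="en",
                 table="BlogText", bucket="blog-photos"):
    # Step 2a: blog text from DynamoDB (hypothetical table and attribute names).
    item = dynamodb.get_item(TableName=table, Key={"blog_id": {"S": blog_id}})["Item"]
    text = item["text"]["S"]

    # Step 2b: translate only when the reader picked a language other than English.
    if language != "en":
        text = translate.translate_text(
            Text=text, SourceLanguageCode="en", TargetLanguageCode=language
        )["TranslatedText"]

    # Step 2c: short-lived pre-signed GET URL for the photo (hypothetical key layout).
    photo_url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": f"photos/{blog_id}.jpg"}, ExpiresIn=900
    )
    return {"blog_id": blog_id, "text": text, "photo_url": photo_url}


def fetch_public_blogs(cursor, dynamodb, translate, s3, language="en"):
    # Step 1: blogs whose text and photo have both passed moderation (hypothetical status value).
    cursor.execute("SELECT blog_id FROM blog_metadata WHERE status = %s", ("PUBLISHED",))
    return [blog_details(row[0], dynamodb, translate, s3, language) for row in cursor.fetchall()]
```

S3 does not check that the photo exists when the URL is generated, so the blog app should be prepared for the photo fetch in step 3 to fail.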

Admin moderation of blog

Admin moderation
  1. Call a Lambda function to fetch the blog that requires manual moderation:
     a. Fetch the blog pending manual moderation from Aurora MySQL.
     b. Fetch the pre-signed S3 URL for the blog photo.
     c. Fetch the blog text from DynamoDB.
  2. The blog app fetches the photo from S3 using the pre-signed URL.
  3. Call a Lambda function to approve or reject the blog based on the admin's action.
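Step 3 can be sketched as a guarded status update. This minimal version assumes a DB-API cursor for Aurora MySQL (MySQL Connector style) and a hypothetical `blog_metadata` table. The status values `MANUAL_REVIEW`, `PUBLISHED` and `REJECTED` are also hypothetical.

```python
# Hypothetical mapping from admin action to the new blog status.
VALID_ACTIONS = {"approve": "PUBLISHED", "reject": "REJECTED"}


def apply_admin_decision(cursor, blog_id: str, action: str) -> bool:
    """Approve or reject a blog that is pending manual moderation."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Only rows still awaiting review are updated, so a repeated or stale click is a no-op.
    cursor.execute(
        "UPDATE blog_metadata SET status = %s WHERE blog_id = %s AND status = %s",
        (VALID_ACTIONS[action], blog_id, "MANUAL_REVIEW"),
    )
    return cursor.rowcount == 1
```

MySQL Connector does not autocommit by default, so the caller still needs to commit the connection after a successful update.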

Key learnings

Over the course of our capstone journey, we came across many challenges that gave us more insight into the AWS cloud. Here are a few key takeaways:

  • Versioning must be enabled on both the source and destination S3 buckets for cross-region replication.
  • S3 must be granted access, via an IAM role, to replicate objects from the source bucket to the destination bucket in the other region.
  • DynamoDB Streams must be enabled to use the DynamoDB global tables feature.
  • Free-tier provisioned capacity does not support global tables in DynamoDB.
  • Cross-region read replicas not supported for MySQL RDS
  • RDS proxy not supported for Aurora RDS instances
  • The default DB parameter group doesn't support cross-region replication of Aurora RDS; a custom DB cluster parameter group is needed.
  • VPC endpoints enable Lambda functions inside a VPC to access S3, DynamoDB, Comprehend, Rekognition and SNS.
  • Custom classification in Comprehend classifies content into a predefined set of labels. Even if content doesn't match any label, it is mapped to one of them, so custom classification suits only content whose scope is limited to a known context.
  • A custom classification endpoint is charged for as long as it is provisioned, even when it is not in use.
  • Before enabling RDS cross-region replication, a DB subnet group must be created in the destination region.
  • A stopped RDS instance is automatically started again after 7 days if it has not been started manually before then.
  • Even if the Lambda function doesn't have the rights to access S3, it can still generate a pre-signed URL for an object, because the URL is signed locally; however, requests made with that URL fail.
  • Likewise, a pre-signed URL for fetching an object is generated even if the object doesn't exist in the bucket. The client must handle the resulting error.
  • Database users enabled for IAM database authentication can be used to access Aurora MySQL RDS instances from Lambda.
  • CORS must be enabled in API Gateway so that web apps can call the APIs.
  • Lambda functions require appropriate rights to access Comprehend, Rekognition, Translate, SNS, etc. These can be granted by attaching the appropriate policies to the execution role of the Lambda function.
  • Additional dependencies required by Lambda functions, such as the MySQL connector library, can be added via Lambda layers.
  • To stop or delete an RDS read replica cluster, it must first be promoted to a standalone cluster.
  • To associate alternate domain names with a CloudFront distribution, you need a custom SSL/TLS certificate that covers those names (for example, a wildcard certificate); the domain name can then be associated with the distribution.
  • To route traffic from Route 53 to a CloudFront distribution, the domain name must be configured as an alternate domain name in the distribution, with the custom certificate associated.
  • Custom certificates for the domain can be created using AWS Certificate Manager (ACM); to validate a certificate, the CNAME record provided by ACM must be added to the domain's DNS.
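On the CORS point: for a REST API with Lambda proxy integration, enabling CORS in the API Gateway console mainly sets up the OPTIONS preflight. The Lambda response itself must still return the `Access-Control-Allow-Origin` header. The following is a minimal sketch of such a response builder. The allowed origin, header list and handler body are hypothetical placeholders.

```python
import json

# Hypothetical origin of the blog web app (e.g. the CloudFront / Route 53 domain).
ALLOWED_ORIGIN = "https://blog.example.com"


def api_response(status_code: int, body) -> dict:
    """Build a Lambda proxy-integration response that includes CORS headers."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
            "Access-Control-Allow-Headers": "Content-Type,Authorization",
            "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
        },
        "body": json.dumps(body),  # proxy integration expects the body as a string
    }


def lambda_handler(event, context):
    # A real handler would dispatch on event["httpMethod"] and event["path"].
    return api_response(200, {"message": "ok", "method": event.get("httpMethod")})
```

Restricting the origin to the site's own domain rather than `*` keeps other sites from calling the API from a browser. The Cognito authorizer still protects the API from non-browser callers.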

Authored by 

Ashok K A Setty – More than 17 years of experience in software development in the Aerospace, Telecom and Supply Chain domains. Currently working at Boeing India as a Technical Lead for Cabin System software development.

Vijay Rajagopalan – More than 20 years of experience in the IT industry. Currently working as a Senior Manager with Hitachi Vantara, managing a global centralized Marketing Center of Excellence team that drives initiatives around Delivery Excellence, Process Adherence and Standardization.

Maheshkumar Rajagopalan – More than 14 years of experience in Technical Support and Project Management. Currently working with Sabre as Supervisor, Service Delivery, managing their Asia Pacific operations for End User Computing, Unified Communication and Office Infrastructure.

Great Learning Editorial Team
The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.
