{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "0eccd20e", "metadata": {}, "outputs": [], "source": [ "from langchain_groq import ChatGroq" ] }, { "cell_type": "code", "execution_count": 2, "id": "c16ff50e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The first person to land on the moon was Neil Armstrong. He stepped onto the lunar surface on July 20, 1969, as part of the Apollo 11 mission.\n" ] } ], "source": [ "llm = ChatGroq(\n", " temperature=0, \n", " groq_api_key='your_api_key_here', \n", " model_name=\"llama-3.1-70b-versatile\"\n", ")\n", "# checking the response, and it is very fast\n", "response = llm.invoke(\"The first person to land on moon was ...\")\n", "print(response.content)" ] }, { "cell_type": "code", "execution_count": 3, "id": "66815076-34c6-4588-bcfc-853ad226d1a9", "metadata": {}, "outputs": [], "source": [ "# we need to setup a vector database, and we going to use chromadb\n", "# there are other solutions too, but chromadb is open source and very light weight" ] }, { "cell_type": "code", "execution_count": 4, "id": "90d33612", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "USER_AGENT environment variable not set, consider setting it to identify your requests.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Data Scientist\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "About\n", "Alum\n", "Inclusion\n", "Careers\n", "Culture\n", "Blog\n", "Tech\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Data Scientist\n", "Bengaluru\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Share\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "Apply\n", "\n", "\n", "\n", "About Team\n", "Myntra Data Science team delivers a large number of data science solutions for the company which are deployed at various customer touch points every quarter. The models create significant revenue and customer experience impact. The models involve real-time, near-real-time and offline solutions with varying latency requirements. The models are built using massive datasets. You will have the opportunity to be part of a rapidly growing organization and gain exposure to all the parts of a comprehensive ecommerce platform. You’ll also get to learn the intricacies of building models that serve millions of requests per second at sub second latency. \n", "The team takes pride in deploying solutions that not only leverage state of the art machine learning models like graph neural networks, diffusion models, transformers, representation learning, optimization methods and bayesian modeling but also contribute to research literature with multiple peer-reviewed research papers.\n", "Roles and Responsibilities\n", "\n", "Design, develop and deploy machine learning models,algorithms and systems to solve complex business problems for Myntra Recsys, Search, Vision, SCM, Pricing, Forecasting, Trend and Virality prediction, Gen AI and other areas\n", "Theoretical understanding and practise of machine learning and expertise in one or more of the topics, such as, NLP, Computer Vision, recommender systems and Optimisation. 
\n", "Implement robust and reliable software solutions for model deployment.\n", "Support the team in maintaining machine learning pipelines, contributing to tasks like data cleaning, feature extraction and basic model training.\n", "Participate in monitoring the performance of machine learning models, gaining experience in using statistical methods for evaluation.\n", "Working with the Data Platforms teams for understanding and collecting the data.\n", "Conduct performance testing, troubleshooting and tuning as required.\n", "Stay current with the latest research and technology and communicate your knowledge throughout the enterprise.\n", "\n", "Qualifications & Experience\n", "\n", "Master’s/PhD in Computer Science, Mathematics, Statistics/related fields ‘or’ 1+ years of relevant industry experience with a Bachelor’s degree.\n", "Proficiency in Python or one other high-level programming language.\n", "Theoretical understanding of statistical models such as regression, clustering and ML algorithms such as decision trees, neural networks, etc.\n", "Strong written and verbal communication skills\n", "Intellectual curiosity and enthusiastic about continuous learning\n", "Experience developing machine learning models in Python, or equivalent programming language.\n", "Basic familiarity with machine learning frameworks like TensorFlow, PyTorch, or scikit-learn.\n", "Introductory understanding of statistics as it applies to machine learning.\n", "Ability to manage and prioritize your workload and support his/her manager.\n", "Experience with SQL and/or NoSQL databases.\n", "If you are an exceptional candidate, write in. 
We are happy to hire you even if you don't have the certified qualifications.\n", "\n", "Nice to Have:\n", "\n", "Publications or presentations in recognized Machine Learning and Data Science journals/conferences.\n", "Experience with ML orchestration tools (Airflow, Kubeflow or MLFlow)\n", "Exposure to GenAI models.\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "×\n", "\n", "\n", "\n", "\n", "Apply now\n", "\n", "\n", "\n", "\n", "Name *\n", "\n", "\n", "\n", "\n", "\n", "Last Name *\n", "\n", "\n", "\n", "\n", "\n", "Your Email *\n", "\n", "\n", "\n", "\n", "\n", "Phone *\n", "\n", "\n", "\n", "\n", "\n", "Your current location *\n", "\n", "\n", "\n", "\n", "\n", "Resume/CV *\n", "\n", "\n", "Attach\n", "\n", "×\n", "\n", "\n", "\n", "Cover Letter\n", "\n", "\n", "Attach\n", "Paste\n", "\n", "×\n", "\n", "\n", "\n", "\n", "\n", "Submit \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "We got your Appliaction, our team will get back to you soon.\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Looks like the application has not uploaded, Please try agin.\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Bengaluru (HQ)\n", "\n", "gurgaon\n", "\n", "Mumbai\n", "\n", "\n", "\n", "\n", "\n", "\n", "contact\n", "Shop\n", "Careers\n", "Privacy Policy\n", "Terms & Conditions\n", "\n", "\n", "Myntra is proud to be an Equal Opportunity Employer\n", "\n", "\n", "© 2019 www.myntra.com. 
All rights reserved.\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "# WebBaseLoader fetches the URL and extracts the page text, i.e. web scraping\n", "\n", "from langchain_community.document_loaders import WebBaseLoader\n", "\n", "loader = WebBaseLoader(\"https://careers.myntra.com/job-detail/?id=7431200002\")\n", "page_data = loader.load().pop().page_content\n", "print(page_data)" ] }, { "cell_type": "code", "execution_count": 5, "id": "85c89a57", "metadata": {}, "outputs": [], "source": [ "from langchain_core.prompts import PromptTemplate\n", "# (NO PREAMBLE) tells the model to skip introductory text such as 'Here is your response.'\n", "prompt_extract = PromptTemplate.from_template(\n", " \"\"\"\n", " ### SCRAPED TEXT FROM WEBSITE:\n", " {page_data}\n", " ### INSTRUCTION:\n", " The scraped text is from the careers page of a website.\n", " Your job is to extract the job postings and return them in JSON format containing the \n", " following keys: `role`, `experience`, `skills` and `description`.\n", " Only return the valid JSON.\n", " ### VALID JSON (NO PREAMBLE): \n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "id": "5267bb13-3402-4f91-9899-77c8b9e08e48", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[\n", " {\n", " \"role\": \"Data Scientist\",\n", " \"experience\": \"1+ years of relevant industry experience with a Bachelor’s degree or Master’s/PhD in Computer Science, Mathematics, Statistics/related fields\",\n", " \"skills\": [\n", " \"Python or one other high-level programming language\",\n", " \"Theoretical understanding of statistical models such as regression, clustering and ML algorithms such as decision trees, neural networks, etc.\",\n", " \"Machine learning frameworks like TensorFlow, PyTorch, or scikit-learn\",\n", " \"SQL and/or NoSQL databases\"\n", " ],\n", " 
\"description\": \"Design, develop and deploy machine learning models, algorithms and systems to solve complex business problems for Myntra Recsys, Search, Vision, SCM, Pricing, Forecasting, Trend and Virality prediction, Gen AI and other areas. Theoretical understanding and practise of machine learning and expertise in one or more of the topics, such as, NLP, Computer Vision, recommender systems and Optimisation.\"\n", " }\n", "]\n" ] } ], "source": [ "chain_extract = prompt_extract | llm # this will form a langchain chain ie you are getting a prompt and passing it to LLM \n", "res = chain_extract.invoke(input={'page_data':page_data})\n", "print(res.content)\n", "\n", "# we got the json format of the job description" ] }, { "cell_type": "code", "execution_count": 7, "id": "c0213559-8127-4ce4-90b9-8ad913fa5b69", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# but the type of it is string, we want json object so we will use JSON Parser\n", "type(res.content)" ] }, { "cell_type": "code", "execution_count": 8, "id": "5415fd54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'role': 'Data Scientist',\n", " 'experience': '1+ years of relevant industry experience with a Bachelor’s degree or Master’s/PhD in Computer Science, Mathematics, Statistics/related fields',\n", " 'skills': ['Python or one other high-level programming language',\n", " 'Theoretical understanding of statistical models such as regression, clustering and ML algorithms such as decision trees, neural networks, etc.',\n", " 'Machine learning frameworks like TensorFlow, PyTorch, or scikit-learn',\n", " 'SQL and/or NoSQL databases'],\n", " 'description': 'Design, develop and deploy machine learning models, algorithms and systems to solve complex business problems for Myntra Recsys, Search, Vision, SCM, Pricing, Forecasting, Trend and Virality prediction, Gen AI and other areas. 
Theoretical understanding and practise of machine learning and expertise in one or more of the topics, such as, NLP, Computer Vision, recommender systems and Optimisation.'}]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain_core.output_parsers import JsonOutputParser\n", "\n", "json_parser = JsonOutputParser()\n", "json_res = json_parser.parse(res.content)\n", "json_res" ] }, { "cell_type": "code", "execution_count": 9, "id": "c4226c86-9f8c-4206-9706-c4d93724a584", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(json_res)" ] }, { "cell_type": "code", "execution_count": 10, "id": "39961ed6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(json_res)\n", "# but we want a dictionary" ] }, { "cell_type": "code", "execution_count": 11, "id": "eb173c02-93d5-4cff-8763-483834fc7c5c", "metadata": {}, "outputs": [], "source": [ "# Check if the result is a list and extract the first dictionary\n", "if isinstance(json_res, list):\n", " json_res = json_res[0]" ] }, { "cell_type": "code", "execution_count": 12, "id": "0614b58c-7ac4-48ad-a20a-69180d759b93", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'role': 'Data Scientist',\n", " 'experience': '1+ years of relevant industry experience with a Bachelor’s degree or Master’s/PhD in Computer Science, Mathematics, Statistics/related fields',\n", " 'skills': ['Python or one other high-level programming language',\n", " 'Theoretical understanding of statistical models such as regression, clustering and ML algorithms such as decision trees, neural networks, etc.',\n", " 'Machine learning frameworks like TensorFlow, PyTorch, or scikit-learn',\n", " 'SQL and/or NoSQL databases'],\n", " 'description': 'Design, develop and deploy machine learning 
models, algorithms and systems to solve complex business problems for Myntra Recsys, Search, Vision, SCM, Pricing, Forecasting, Trend and Virality prediction, Gen AI and other areas. Theoretical understanding and practise of machine learning and expertise in one or more of the topics, such as, NLP, Computer Vision, recommender systems and Optimisation.'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_res" ] }, { "cell_type": "code", "execution_count": 13, "id": "62c524d8-3e3a-4922-af5b-4874307298f0", "metadata": {}, "outputs": [], "source": [ "# now it's a dictionary" ] }, { "cell_type": "code", "execution_count": 14, "id": "1e8a0f74", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | Techstack | Links |\n",
"|---|---|---|\n",
"| 0 | Machine Learning, ML, Python | https://github.com/MandarBhalerao/Gurgaon-Real... |\n",
"| 1 | Recommendation System, Python | https://github.com/MandarBhalerao/Movie-Recomm... |\n",
"| 2 | C++, CUDA | https://github.com/MandarBhalerao/Dilated-Conv... |\n",
"| 3 | React, Node.js, MongoDB | https://example.com/react-portfolio |\n",
"| 4 | Angular,.NET, SQL Server | https://example.com/angular-portfolio |\n",
"| 5 | Vue.js, Ruby on Rails, PostgreSQL | https://example.com/vue-portfolio |\n",
"| 6 | Java, Spring Boot, Oracle | https://example.com/java-portfolio |\n",
"| 7 | Flutter, Firebase, GraphQL | https://example.com/flutter-portfolio |\n",
"| 8 | WordPress, PHP, MySQL | https://example.com/wordpress-portfolio |\n",
"| 9 | Magento, PHP, MySQL | https://example.com/magento-portfolio |\n",
"| 10 | React Native, Node.js, MongoDB | https://example.com/react-native-portfolio |\n",
"| 11 | iOS, Swift, Core Data | https://example.com/ios-portfolio |\n",
"| 12 | Android, Java, Room Persistence | https://example.com/android-portfolio |\n",
"| 13 | Kotlin, Android, Firebase | https://example.com/kotlin-android-portfolio |\n",
"| 14 | Android TV, Kotlin, Android NDK | https://example.com/android-tv-portfolio |\n",
"| 15 | iOS, Swift, ARKit | https://example.com/ios-ar-portfolio |\n",
"| 16 | Cross-platform, Xamarin, Azure | https://example.com/xamarin-portfolio |\n",
"| 17 | Backend, Kotlin, Spring Boot | https://example.com/kotlin-backend-portfolio |\n",
"| 18 | Frontend, TypeScript, Angular | https://example.com/typescript-frontend-portfolio |\n",
"| 19 | Full-stack, JavaScript, Express.js | https://example.com/full-stack-js-portfolio |\n",
"| 20 | DevOps, Jenkins, Docker | https://example.com/devops-portfolio |\n", "