<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Machine Learning by Japkeerat Singh]]></title><description><![CDATA[Learn 1 concept from Machine Learning, Artificial Intelligence, or Data Science, every other day at 2:30 PM IST in 4 minutes or less!]]></description><link>https://japkeeratsingh.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 12 May 2026 08:40:18 GMT</lastBuildDate><atom:link href="https://japkeeratsingh.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building Production-Ready Agentic Systems: Lessons from Real Implementation]]></title><description><![CDATA[Most articles about agentic systems show you the glossy marketing version. Multi-agent orchestration! Intelligent decision making! Autonomous task execution! The reality is more nuanced. After building a production agentic system that handles real El...]]></description><link>https://japkeeratsingh.com/building-production-ready-agentic-systems-lessons-from-real-implementation</link><guid isPermaLink="true">https://japkeeratsingh.com/building-production-ready-agentic-systems-lessons-from-real-implementation</guid><category><![CDATA[google adk]]></category><category><![CDATA[agents]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[#agent]]></category><category><![CDATA[elasticsearch]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Wed, 27 Aug 2025 23:41:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756334668096/9e78ee35-3597-4cb4-a821-2a7d4067ecae.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most articles about agentic systems show you the glossy marketing version. Multi-agent orchestration! Intelligent decision making! 
Autonomous task execution! The reality is more nuanced. After building a production agentic system that handles real ElasticSearch queries from actual users, I learned that reliable agent systems are less about AI magic and more about thoughtful software architecture with agents as components.</p>
<p>This tutorial examines a real implementation: an LLM ElasticSearch Agent, built with Google's Agent Development Kit (ADK) <a target="_blank" href="https://japkeeratsingh.com/i-finally-built-my-first-mcp-server">(yes, I am a hypocrite)</a>, that converts natural language queries into database operations. The goal is to understand what actually makes agentic systems work in production environments where failures matter.</p>
<h2 id="heading-understanding-what-agentic-systems-actually-solve">Understanding What Agentic Systems Actually Solve</h2>
<p>Before diving into architecture patterns, let me establish why you'd build an agentic system instead of a traditional application. Consider this user request: "Show me all failed login attempts from last week."</p>
<p>A traditional approach might involve a hardcoded query builder with preset filters. This works until users start asking variations like "Find security incidents from the past 7 days" or "Display authentication failures since Monday." Each variation requires code changes.</p>
<p>An agentic approach treats this as a multi-step reasoning problem. The system needs to understand that "failed login attempts" relates to authentication data, determine which database index contains this information, translate temporal expressions like "last week" into actual date ranges, generate the appropriate query syntax, execute it safely, and present results in human-readable format.</p>
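<p>To make the temporal step concrete, here is a minimal sketch of what turning "last week" into an Elasticsearch range clause can look like. In the agentic approach the LLM does this translation; the code only shows the shape of the target. The function names, the phrase table, the <code>@timestamp</code> field, and reading "last week" as the previous seven days are all assumptions for illustration, not code from my system.</p>

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Hypothetical mapping from phrases to look-back windows (illustrative only).
RELATIVE_WINDOWS = {
    "last week": timedelta(days=7),
    "past 7 days": timedelta(days=7),
    "last 24 hours": timedelta(hours=24),
}


def resolve_time_range(phrase: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Turn a known relative phrase into ISO-8601 start and end timestamps."""
    now = now or datetime.now(timezone.utc)
    window = RELATIVE_WINDOWS.get(phrase.strip().lower())
    if window is None:
        raise ValueError(f"Unsupported time expression: {phrase!r}")
    return {"gte": (now - window).isoformat(), "lte": now.isoformat()}


def build_range_filter(phrase: str, field: str = "@timestamp",
                       now: Optional[datetime] = None) -> Dict[str, Any]:
    """Wrap the resolved window in an Elasticsearch range query clause."""
    return {"range": {field: resolve_time_range(phrase, now)}}
```

<p>Passing <code>now</code> explicitly keeps the function deterministic, which makes this kind of step easy to unit test.</p>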
<p>The key insight is that complex user requests often require multiple distinct capabilities working together. Rather than building one monolithic system that handles everything, agentic architectures decompose these requests into specialized agents that can be developed, tested, and debugged independently.</p>
<h2 id="heading-the-multi-agent-pipeline-pattern">The Multi-Agent Pipeline Pattern</h2>
<p>The most important architectural decision in my system was breaking the query process into distinct stages, each handled by a specialized agent. This isn't just good software engineering - it's essential for reliability when dealing with the unpredictable outputs that LLMs can produce.</p>
<p>My pipeline consists of three specialized agents working in sequence. The Index Selection Agent determines which ElasticSearch index contains the relevant data. The Query Generation Agent converts the natural language request into proper ElasticSearch DSL syntax. The Query Execution Agent runs the query safely and interprets results for the user.</p>
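<pre><code class="lang-python"><span class="hljs-keyword">from</span> google.adk.agents <span class="hljs-keyword">import</span> SequentialAgent

<span class="hljs-comment"># The create_*_agent factories are defined elsewhere in the project</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ElasticsearchPipelineAgent</span>:</span>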
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment"># Each agent specializes in one part of the problem</span>
        self.index_selection_agent = create_index_selection_agent()
        self.query_generation_agent = create_query_generation_agent()
        self.query_execution_agent = create_query_execution_agent()

        <span class="hljs-comment"># SequentialAgent coordinates the pipeline</span>
        self.agent = SequentialAgent(
            name=<span class="hljs-string">"ElasticsearchPipelineAgent"</span>,
            sub_agents=[
                self.index_selection_agent.agent,
                self.query_generation_agent.agent,
                self.query_execution_agent.agent,
            ],
        )
</code></pre>
<p>This design provides several practical advantages. First, each agent can be optimized for its specific task with tailored prompts and tools. The Index Selection Agent uses tools for discovering available indices and analyzing their schemas, while the Query Execution Agent focuses on safe query execution and result formatting. Second, failures can be isolated and handled appropriately at each stage. If index selection fails, you know exactly where to look and can potentially recover by prompting the user for clarification. Third, the system becomes much easier to test and debug when you can examine each stage independently.</p>
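<p>The failure-isolation argument can be illustrated without any agent framework. The sketch below is plain Python, not ADK's <code>SequentialAgent</code>; the <code>StageResult</code> type, the stage functions, and the <code>auth-logs</code> index name are hypothetical stand-ins for the real agents.</p>

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class StageResult:
    """Outcome of one pipeline stage (hypothetical type for this sketch)."""
    ok: bool
    data: Dict[str, Any]
    error: Optional[str] = None


Stage = Callable[[Dict[str, Any]], StageResult]


def run_pipeline(stages: List[Tuple[str, Stage]], request: Dict[str, Any]) -> Dict[str, Any]:
    """Run stages in order, stopping at the first failure and naming it."""
    context = dict(request)
    for name, stage in stages:
        result = stage(context)
        if not result.ok:
            # The failing stage is reported by name, so debugging starts there.
            return {"failed_stage": name, "error": result.error, "context": context}
        context.update(result.data)  # later stages see everything produced so far
    return {"failed_stage": None, "error": None, "context": context}


def select_index(ctx: Dict[str, Any]) -> StageResult:
    """Toy index selection: only login-related requests map to an index."""
    if "login" in ctx["query"]:
        return StageResult(ok=True, data={"selected_index": "auth-logs"})
    return StageResult(ok=False, data={}, error="No index matches the request")


def generate_query(ctx: Dict[str, Any]) -> StageResult:
    """Toy query generation: always returns a match_all query."""
    return StageResult(ok=True, data={"query_dsl": {"query": {"match_all": {}}}})


STAGES = [("index_selection", select_index), ("query_generation", generate_query)]
```

<p>Because a failed run carries the name of the stage that stopped it, each stage can be tested on its own with a hand-built context.</p>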
<p>The sequential approach also handles the state management challenge that trips up many agentic systems. Each agent receives the complete context from previous agents, building up the information needed for the final query execution.</p>
<h2 id="heading-structured-output-the-foundation-of-reliability">Structured Output: The Foundation of Reliability</h2>
<p>The biggest practical challenge in building reliable agentic systems is getting agents to produce consistent, parseable outputs that downstream agents can consume. This is where structured output schemas become critical.</p>
<p>Each agent in my pipeline defines exactly what it will output using Pydantic models, so downstream agents never have to parse free-form text. Here's what the Index Selection Agent produces:</p>
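<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional

<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel, Field

<span class="hljs-comment"># IndexSchema, SelectionMetadata and ValidationResult are Pydantic models defined elsewhere</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">IndexSelectionOutput</span>(<span class="hljs-params">BaseModel</span>):</span>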
    <span class="hljs-string">"""Structured output for the Index Selection Agent."""</span>

    selected_index: Optional[str] = Field(
        description=<span class="hljs-string">"Name of the selected index, or null if selection failed"</span>
    )
    index_schema: Optional[IndexSchema] = Field(
        description=<span class="hljs-string">"Complete schema information for the selected index"</span>
    )
    selection_metadata: SelectionMetadata = Field(
        description=<span class="hljs-string">"Metadata about the selection process"</span>
    )
    validation: ValidationResult = Field(
        description=<span class="hljs-string">"Validation results for the selection"</span>
    )
</code></pre>
<p>Notice how this structure handles partial failures gracefully. If index selection fails, the <code>selected_index</code> field is null, but the <code>selection_metadata</code> explains why it failed and the <code>validation</code> section provides specific error information. This allows the system to provide meaningful feedback to users rather than cryptic error messages.</p>
<p>The validation component is particularly important. Each agent validates not just its inputs, but also its outputs before passing them to the next stage. The Index Selection Agent doesn't just return an index name - it verifies that the index actually exists, that the schema was retrieved successfully, and that the selection is ready for query generation.</p>
<p>This validation-first approach catches problems early in the pipeline rather than letting them cascade through multiple agents before failing in confusing ways.</p>
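<p>The supporting models are not shown in this article. As a rough sketch of what the validation step might compute, the example below uses stdlib dataclasses instead of Pydantic so it runs without dependencies. The three boolean field names mirror the validation keys used in the state-saving helper in the state management section; the <code>errors</code> list, the <code>validate_selection</code> helper, and its <code>known_indices</code> parameter are hypothetical.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class ValidationResult:
    """Stdlib stand-in for the article's ValidationResult model (assumed shape)."""
    index_exists: bool
    schema_retrieved: bool
    ready_for_query_generation: bool
    errors: List[str] = field(default_factory=list)


def validate_selection(selected_index: Optional[str],
                       index_schema: Optional[Dict[str, Any]],
                       known_indices: Set[str]) -> ValidationResult:
    """Check an index selection before handing it to the next agent."""
    errors: List[str] = []
    index_exists = selected_index is not None and selected_index in known_indices
    if selected_index is None:
        errors.append("No index was selected")
    elif not index_exists:
        errors.append(f"Index '{selected_index}' does not exist")
    schema_retrieved = bool(index_schema)
    if index_exists and not schema_retrieved:
        errors.append("Schema could not be retrieved")
    return ValidationResult(
        index_exists=index_exists,
        schema_retrieved=schema_retrieved,
        ready_for_query_generation=index_exists and schema_retrieved,
        errors=errors,
    )
```

<p>The next agent then only needs to look at <code>ready_for_query_generation</code>, while the <code>errors</code> list gives the user something more useful than a stack trace.</p>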
<h2 id="heading-tool-integration-connecting-agents-to-real-systems">Tool Integration: Connecting Agents to Real Systems</h2>
<p>Agents become useful when they can interact with external systems through tools. However, tool integration introduces significant complexity around error handling, security, and connection management that doesn't exist in simple LLM applications.</p>
<p>My ElasticSearch tools demonstrate several patterns for reliable tool integration. The most important principle is defensive programming - assume everything will fail and design accordingly.</p>
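<pre><code class="lang-python"><span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Any, Dict

logger = logging.getLogger(__name__)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">QueryExecutionTools</span>:</span>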
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">execute_query</span>(<span class="hljs-params">self, query_data: Dict[str, Any]</span>) -&gt; Dict[str, Any]:</span>
        <span class="hljs-keyword">try</span>:
            <span class="hljs-comment"># Extract and validate query components - always check inputs first</span>
            <span class="hljs-keyword">if</span> <span class="hljs-string">"generated_query"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> query_data:
                <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">"No generated query found in query data"</span>}

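            generated_query = query_data[<span class="hljs-string">"generated_query"</span>]
            query_dsl = generated_query.get(<span class="hljs-string">"query_dsl"</span>)

            <span class="hljs-comment"># Assumption: the target index travels with the query data under</span>
            <span class="hljs-comment"># "selected_index"; the original excerpt never defines target_index</span>
            target_index = query_data.get(<span class="hljs-string">"selected_index"</span>)
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> target_index:
                <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">"No target index found in query data"</span>}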

            <span class="hljs-comment"># Security validation - ensure read-only operations</span>
            <span class="hljs-comment"># This prevents agents from accidentally modifying data</span>
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> self._is_read_only_query(query_dsl):
                <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">"Only read-only queries are allowed"</span>}

            <span class="hljs-comment"># Resource validation - ensure index exists before querying</span>
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> self.es.indices.exists(index=target_index):
                <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">f"Index '<span class="hljs-subst">{target_index}</span>' does not exist"</span>}

            <span class="hljs-comment"># Execute query with structured error handling</span>
            response = self.es.search(index=target_index, body=query_dsl)

            <span class="hljs-comment"># Return clean, structured results for LLM analysis</span>
            <span class="hljs-keyword">return</span> {
                <span class="hljs-string">"total_hits"</span>: response[<span class="hljs-string">"hits"</span>][<span class="hljs-string">"total"</span>][<span class="hljs-string">"value"</span>],
                <span class="hljs-string">"documents"</span>: [
                    {
                        <span class="hljs-string">"id"</span>: hit.get(<span class="hljs-string">"_id"</span>),
                        <span class="hljs-string">"score"</span>: hit[<span class="hljs-string">"_score"</span>],
                        <span class="hljs-string">"source"</span>: hit[<span class="hljs-string">"_source"</span>],
                    }
                    <span class="hljs-keyword">for</span> hit <span class="hljs-keyword">in</span> response[<span class="hljs-string">"hits"</span>][<span class="hljs-string">"hits"</span>]
                ],
                <span class="hljs-string">"took_ms"</span>: response[<span class="hljs-string">"took"</span>]
            }

        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            logger.error(<span class="hljs-string">f"Error executing query: <span class="hljs-subst">{str(e)}</span>"</span>)
            <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">f"Failed to execute query: <span class="hljs-subst">{str(e)}</span>"</span>}
</code></pre>
<p>The pattern here is safety first, then functionality. Every tool method validates its inputs, checks that required resources exist, ensures operations are safe to execute, and returns errors in a consistent format that agents can understand and handle appropriately.</p>
<p>Connection management is another critical concern. Tools that interact with external systems need robust connection handling to prevent resource leaks and handle network failures gracefully. My system uses a singleton pattern for ElasticSearch connections, ensuring all tools share a single connection pool while maintaining thread safety across the agent system.</p>
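<p>The singleton itself is not shown here. One common way to get a shared, thread-safe client is double-checked locking, sketched below. The <code>get_client</code> function and its <code>factory</code> parameter are hypothetical; in a real system the factory would construct the ElasticSearch client.</p>

```python
import threading
from typing import Any, Callable, Optional

_client: Optional[Any] = None
_client_lock = threading.Lock()


def get_client(factory: Callable[[], Any]) -> Any:
    """Return the shared client, creating it once even under concurrent calls."""
    global _client
    if _client is None:  # fast path: no locking once the client exists
        with _client_lock:
            if _client is None:  # re-check: another thread may have won the race
                _client = factory()
    return _client
```

<p>Every tool calls <code>get_client</code> instead of constructing its own client, so there is one place to configure timeouts and retries.</p>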
<p>The security validation deserves particular attention. The <code>_is_read_only_query</code> method examines the query DSL to ensure it doesn't contain any write operations. This prevents the system from accidentally modifying data, which is essential when giving AI agents access to production databases.</p>
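<p>The body of <code>_is_read_only_query</code> is not shown in this article. One conservative way to write such a check is an allowlist of top-level search keys plus a recursive scan for disallowed keys. The sketch below is an assumption about how it could work, not the actual implementation. Both key sets are illustrative, and scripting is rejected as a cautious default rather than because every script writes data.</p>

```python
from typing import Any

# Illustrative sets; review them against your cluster and client version.
ALLOWED_TOP_LEVEL_KEYS = {"query", "size", "from", "sort", "aggs", "_source", "track_total_hits"}
FORBIDDEN_KEYS = {"script", "script_fields", "runtime_mappings"}


def contains_forbidden_key(node: Any) -> bool:
    """Recursively look for forbidden keys anywhere in the request body."""
    if isinstance(node, dict):
        return any(
            key in FORBIDDEN_KEYS or contains_forbidden_key(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(contains_forbidden_key(item) for item in node)
    return False


def is_read_only_query(query_dsl: Any) -> bool:
    """Accept only non-empty dict bodies made of allowlisted keys with no scripting."""
    if not isinstance(query_dsl, dict) or not query_dsl:
        return False
    if not set(query_dsl).issubset(ALLOWED_TOP_LEVEL_KEYS):
        return False
    return not contains_forbidden_key(query_dsl)
```

<p>An allowlist fails closed: a key the check has never seen is rejected, which is the safer default when the query text comes from an LLM. The trade-off is false positives, for example a document field that happens to be named <code>script</code>.</p>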
<h2 id="heading-state-management-between-agent-interactions">State Management Between Agent Interactions</h2>
<p>Unlike stateless HTTP APIs, agentic systems need to maintain context across multiple agent interactions. The Index Selection Agent needs to communicate its findings to the Query Generation Agent, which in turn needs to pass both the index information and the generated query to the Query Execution Agent.</p>
<p>My system handles this through ADK's session management capabilities, with utility functions that make state sharing explicit and reliable:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_index_selection_data</span>(<span class="hljs-params">
    context: ToolContext,
    selected_index: str,
    index_schema: Dict[str, Any],
    reasoning: str,
    confidence: str = <span class="hljs-string">"high"</span>
</span>) -&gt; str:</span>
    <span class="hljs-string">"""Save index selection data to session state for the next pipeline agent."""</span>

    selection_data = {
        <span class="hljs-string">"selected_index"</span>: selected_index,
        <span class="hljs-string">"index_schema"</span>: index_schema,
        <span class="hljs-string">"selection_metadata"</span>: {
            <span class="hljs-comment"># Store the reasoning so later agents understand the decision</span>
            <span class="hljs-string">"reasoning"</span>: reasoning,
            <span class="hljs-string">"confidence"</span>: confidence
        },
        <span class="hljs-string">"validation"</span>: {
            <span class="hljs-comment"># Include validation status for error handling</span>
            <span class="hljs-string">"index_exists"</span>: <span class="hljs-literal">True</span>,
            <span class="hljs-string">"schema_retrieved"</span>: <span class="hljs-literal">True</span>,
            <span class="hljs-string">"ready_for_query_generation"</span>: <span class="hljs-literal">True</span>
        }
    }

    <span class="hljs-comment"># Save to session state for next agent to access</span>
    context.state[<span class="hljs-string">"index_selection_data"</span>] = selection_data
    context.state[<span class="hljs-string">"selected_index"</span>] = selected_index

    <span class="hljs-keyword">return</span> <span class="hljs-string">f"Index selection data saved. Selected: <span class="hljs-subst">{selected_index}</span>"</span>
</code></pre>
<p>This approach makes state management explicit rather than relying on implicit context passing. Each agent can access previous results through well-defined state keys, and the validation information helps agents understand whether they have reliable input data to work with.</p>
<p>State management becomes especially important for error recovery. If the Query Generation Agent fails to produce a valid query, the system still has the index selection results and can potentially retry with a different approach or ask the user for clarification, rather than starting the entire pipeline from scratch.</p>
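<p>A sketch of that recovery path: the retry loop below reuses the saved index selection and repeats only query generation. The <code>generate</code> callable, the <code>max_attempts</code> parameter, the returned status dicts, and the plain dict standing in for session state are hypothetical.</p>

```python
from typing import Any, Callable, Dict, Optional

# generate(selection_data, feedback) returns a query dict, or None on failure.
Generator = Callable[[Dict[str, Any], Optional[str]], Optional[Dict[str, Any]]]


def generate_with_retry(state: Dict[str, Any], generate: Generator,
                        max_attempts: int = 2) -> Dict[str, Any]:
    """Retry query generation without re-running index selection."""
    selection = state.get("index_selection_data")
    if not selection or not selection["validation"]["ready_for_query_generation"]:
        return {"status": "needs_clarification", "reason": "No usable index selection"}

    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        query = generate(selection, feedback)
        if query is not None:
            state["generated_query"] = query
            return {"status": "ok", "attempts": attempt}
        # Feed the failure back so the next attempt has more context.
        feedback = f"Attempt {attempt} produced no valid query"
    return {"status": "needs_clarification", "reason": "Query generation failed"}
```

<p>The index selection result is read from state on every attempt and never recomputed, which is the point of keeping it there.</p>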
<h2 id="heading-observability-making-agent-decisions-transparent">Observability: Making Agent Decisions Transparent</h2>
<p>Production agentic systems require comprehensive observability. When an agent makes a poor decision or produces unexpected results, you need to understand its reasoning process to improve the system. This goes beyond traditional application logging because agent decisions involve complex reasoning that isn't visible from external behavior alone.</p>
<p>My system integrates with Phoenix for distributed tracing, treating each agent interaction as a span in a larger trace that shows how user queries flow through the agent pipeline:</p>
<pre><code class="lang-python"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_user_query</span>(<span class="hljs-params">runner, session_service, query: str, user_id: str, 
                           app_name: str, logger: logging.Logger, tracer=None</span>):</span>
    <span class="hljs-string">"""Process user query with comprehensive tracing."""</span>

    <span class="hljs-keyword">if</span> tracer:
        <span class="hljs-keyword">with</span> tracer.start_as_current_span(<span class="hljs-string">"process_user_query"</span>) <span class="hljs-keyword">as</span> span:
            <span class="hljs-comment"># Record key attributes for debugging later</span>
            span.set_attribute(<span class="hljs-string">"user.id"</span>, user_id)
            span.set_attribute(<span class="hljs-string">"user.query"</span>, query)
            span.set_attribute(<span class="hljs-string">"app.name"</span>, app_name)

            <span class="hljs-comment"># Process through agent pipeline with full trace visibility</span>
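            <span class="hljs-keyword">await</span> _process_query_internal(
                runner, session_service, query, user_id, app_name, logger, span
            )
    <span class="hljs-keyword">else</span>:
        <span class="hljs-comment"># Assumed fallback (not in the original excerpt): run untraced when no tracer is configured</span>
        <span class="hljs-keyword">await</span> _process_query_internal(
            runner, session_service, query, user_id, app_name, logger, <span class="hljs-literal">None</span>
        )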
</code></pre>
<p>This provides me with a complete view of how user queries flow through the agent pipeline, how long each stage takes, and where failures occur. More importantly, it captures the reasoning and intermediate results from each agent, making it possible to understand why the system made particular decisions.</p>
<p>I encountered a practical challenge with OpenTelemetry tracing in async environments. The async generators and streaming patterns common in agent frameworks can cause context detachment errors that crash the application. I solved this with safe tracing utilities that handle these issues gracefully:</p>
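<pre><code class="lang-python"><span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">from</span> contextlib <span class="hljs-keyword">import</span> contextmanager

logger = logging.getLogger(__name__)

<span class="hljs-meta">@contextmanager</span>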
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">safe_tracing_context</span>():</span>
    <span class="hljs-string">"""Context manager that safely handles OpenTelemetry tracing errors."""</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">yield</span>
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        error_msg = str(e).lower()
        <span class="hljs-comment"># Check if this is a context-related tracing error</span>
        <span class="hljs-keyword">if</span> any(keyword <span class="hljs-keyword">in</span> error_msg <span class="hljs-keyword">for</span> keyword <span class="hljs-keyword">in</span> [<span class="hljs-string">"context"</span>, <span class="hljs-string">"token"</span>, <span class="hljs-string">"detach"</span>]):
            <span class="hljs-comment"># Silently ignore context-related errors to prevent crashes</span>
            logger.debug(<span class="hljs-string">f"Suppressed OpenTelemetry context error: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-comment"># Re-raise non-context related errors since they're real problems</span>
            <span class="hljs-keyword">raise</span>
</code></pre>
<p>This ensures that tracing problems don't crash the agent system while still providing observability when possible. The lesson here is that production systems need to be resilient to their monitoring infrastructure failing.</p>
<h2 id="heading-error-handling-when-agents-make-poor-decisions">Error Handling: When Agents Make Poor Decisions</h2>
<p>Reliable agentic systems need sophisticated error handling strategies that account for the probabilistic nature of LLM outputs. Not all errors are equal - some should cause the system to stop completely, while others should trigger graceful degradation or alternative approaches.</p>
<p>My system implements error handling at multiple layers, each with different recovery strategies. At the tool level, individual tools return structured error information rather than throwing exceptions. This allows agents to understand what went wrong and potentially try alternative approaches.</p>
<p>At the agent level, agents interpret tool errors and decide whether to retry, use alternative approaches, or escalate the error. The structured output format includes success indicators and error messages that make this decision-making explicit:</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">QueryExecutionOutput</span>(<span class="hljs-params">BaseModel</span>):</span>
    <span class="hljs-string">"""Structured output for the Query Execution Agent."""</span>

    execution_results: Optional[ExecutionResults] = Field(
        description=<span class="hljs-string">"Raw query execution results, or null if execution failed"</span>
    )
    success: bool = Field(
        description=<span class="hljs-string">"Whether the query execution was successful"</span>
    )
    error_message: Optional[str] = Field(
        description=<span class="hljs-string">"Error message if execution failed"</span>
    )
    natural_language_response: str = Field(
        description=<span class="hljs-string">"Natural language response based on analyzing the results"</span>
    )
</code></pre>
<p>At the pipeline level, the orchestrator can route queries to alternative handlers if the primary ElasticSearch pipeline fails. For example, if a query can't be mapped to any available index, the system can fall back to answering general questions about ElasticSearch concepts.</p>
<p>The key insight is that different types of errors require different recovery strategies. A syntax error in query generation might be recoverable by retrying with additional context, while a security violation should immediately terminate the request. The structured error handling approach makes these distinctions explicit and actionable.</p>
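<p>One way to make those distinctions explicit is a small lookup from error category to recovery action. The enum names, the categories, and the <code>max_attempts</code> limit below are hypothetical; the sketch only shows the shape of the decision, not the routing code in my system.</p>

```python
from enum import Enum


class ErrorKind(Enum):
    SYNTAX = "syntax"        # malformed query from the generation stage
    NO_INDEX = "no_index"    # request could not be mapped to an index
    SECURITY = "security"    # tool rejected a non-read-only operation
    TRANSIENT = "transient"  # timeouts, dropped connections


class Action(Enum):
    RETRY = "retry"
    FALLBACK = "fallback"
    ABORT = "abort"


def choose_action(kind: ErrorKind, attempt: int, max_attempts: int = 3) -> Action:
    """Map an error category to a recovery action."""
    if kind is ErrorKind.SECURITY:
        return Action.ABORT  # never retry a security violation
    if kind is ErrorKind.NO_INDEX:
        return Action.FALLBACK  # e.g. answer as a general question instead
    if attempt >= max_attempts:
        return Action.ABORT  # retries exhausted
    return Action.RETRY
```

<p>Keeping this decision in ordinary code rather than in a prompt means it behaves the same way every time and can be covered by unit tests.</p>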
<h2 id="heading-configuration-and-deployment-patterns">Configuration and Deployment Patterns</h2>
<p>Building reliable agentic systems isn't just about the code - it's also about how they're configured, deployed, and maintained in production environments. My system demonstrates several patterns that have proven essential for operational reliability.</p>
<p>Different agents benefit from different language models based on their task complexity and cost requirements. My configuration approach allows easy experimentation without code changes:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># config.yaml - Flexible model configuration</span>
<span class="hljs-attr">agents:</span>
  <span class="hljs-attr">orchestrator:</span> <span class="hljs-string">"openai/gpt-4o-mini"</span>
  <span class="hljs-attr">elasticsearch:</span> <span class="hljs-string">"openai/gpt-4o-mini"</span>  
  <span class="hljs-attr">index_selection:</span> <span class="hljs-string">"openai/gpt-3.5-turbo"</span>  <span class="hljs-comment"># Simpler task, cheaper model</span>
  <span class="hljs-attr">query_generation:</span> <span class="hljs-string">"openai/gpt-4o-mini"</span>   <span class="hljs-comment"># Complex task, better model</span>
  <span class="hljs-attr">query_execution:</span> <span class="hljs-string">"openai/gpt-4o-mini"</span>
</code></pre>
<p>This configuration-driven approach allows me to optimize for cost and performance independently for each agent, and makes it easy to experiment with new models as they become available without touching the core system logic.</p>
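<p>On the code side, the lookup can stay very small. The sketch below assumes the YAML above has already been parsed into a dict; the <code>model_for</code> helper, the default model, and the provider/model format check are assumptions for illustration.</p>

```python
from typing import Any, Dict

DEFAULT_MODEL = "openai/gpt-4o-mini"  # assumed fallback for this sketch


def model_for(agent_name: str, config: Dict[str, Any]) -> str:
    """Look up the model for an agent, falling back to a default."""
    agents = config.get("agents") or {}
    model = agents.get(agent_name, DEFAULT_MODEL)
    if not isinstance(model, str) or "/" not in model:
        raise ValueError(f"Invalid model for {agent_name!r}: {model!r}")
    return model


# Same shape as the YAML config once parsed into a dict.
CONFIG = {"agents": {"index_selection": "openai/gpt-3.5-turbo",
                     "query_generation": "openai/gpt-4o-mini"}}
```

<p>Failing fast on a malformed entry surfaces a typo in the config at startup instead of on the first user query.</p>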
<p>The deployment strategy acknowledges that agentic systems typically depend on multiple services. My Docker Compose setup orchestrates ElasticSearch for data storage, Kibana for data visualization, Phoenix for observability, and the agent system itself:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># docker-compose.yml - Complete service orchestration</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-attr">elasticsearch:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">docker.elastic.co/elasticsearch/elasticsearch:8.11.0</span>
    <span class="hljs-comment"># Configuration for data storage</span>

  <span class="hljs-attr">kibana:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">docker.elastic.co/kibana/kibana:8.11.0</span>  
    <span class="hljs-comment"># Configuration for data visualization</span>

  <span class="hljs-attr">phoenix:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">arizephoenix/phoenix:latest</span>
    <span class="hljs-comment"># Configuration for observability</span>

  <span class="hljs-attr">llm-es-agent:</span>
    <span class="hljs-attr">build:</span> <span class="hljs-string">.</span>
    <span class="hljs-comment"># Configuration for the agent system</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">ES_HOST=http://elasticsearch:9200</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">PHOENIX_ENDPOINT=http://phoenix:6006</span>
</code></pre>
<p>This approach ensures that the entire system can be deployed consistently across development, staging, and production environments. The environment variables handle deployment-specific configuration while keeping the core system logic environment-agnostic.</p>
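<p>Inside the agent container, those variables can be read once at startup. The sketch below is a minimal example; the <code>Settings</code> class, the <code>load_settings</code> helper, the local default for <code>ES_HOST</code>, and treating a missing <code>PHOENIX_ENDPOINT</code> as "run without tracing" are assumptions.</p>

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    es_host: str
    phoenix_endpoint: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read deployment-specific values from the environment."""
    es_host = env.get("ES_HOST", "http://localhost:9200")  # assumed local default
    if not es_host.startswith(("http://", "https://")):
        raise ValueError(f"ES_HOST must be an http(s) URL, got {es_host!r}")
    # Tracing is optional: no endpoint means the app runs without it.
    return Settings(es_host=es_host, phoenix_endpoint=env.get("PHOENIX_ENDPOINT"))
```

<p>Accepting the environment as a parameter lets tests pass a plain dict instead of patching <code>os.environ</code>.</p>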
<h2 id="heading-user-experience-multiple-interfaces-for-different-needs">User Experience: Multiple Interfaces for Different Needs</h2>
<p>Production agentic systems often need to serve different types of users through different interfaces. My system demonstrates this with both terminal and web interfaces sharing the same underlying agent logic. This architectural pattern has proven valuable in real deployments.</p>
<p>The core agent processing is abstracted into a reusable class that both interfaces can use:</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">UnifiedAgentApp</span>:</span>
    <span class="hljs-string">"""Unified application class supporting multiple interfaces."""</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.orchestrator = <span class="hljs-literal">None</span>
        self.runner = <span class="hljs-literal">None</span>
        self.session_service = <span class="hljs-literal">None</span>

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_query</span>(<span class="hljs-params">self, query: str, user_id: str</span>) -&gt; Dict[str, Any]:</span>
        <span class="hljs-string">"""Process user query through the orchestrator agent."""</span>
        <span class="hljs-comment"># Shared logic for both interfaces - the heavy lifting happens here</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_terminal_interface</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">"""Run the terminal interface for developers."""</span>
        <span class="hljs-comment"># Terminal-specific UI logic - just handles input/output formatting</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_streamlit_interface</span>(<span class="hljs-params">self</span>):</span> 
        <span class="hljs-string">"""Run the web interface for business users."""</span>
        <span class="hljs-comment"># Web-specific UI logic - handles web-specific presentation</span>
</code></pre>
<p>This separation allows the system to serve different user personas - developers who prefer command-line interfaces for debugging and scripting, and business users who prefer web interfaces for ad-hoc queries - without duplicating the complex agent orchestration logic.</p>
<p>The lesson here is that the interface is often less important than the underlying system reliability. Users will adapt to different interfaces as long as the core functionality works consistently and provides useful feedback when things go wrong.</p>
<h2 id="heading-what-actually-works-in-production">What Actually Works in Production</h2>
<p>After building and operating this system, several patterns have proven essential for production reliability that aren't obvious from reading about agentic systems in theory.</p>
<p><strong>Structured everything</strong>. Use structured outputs for all agent communications. The time I spent defining Pydantic models upfront saved enormous debugging time later when agents produced unexpected outputs. Structured inputs and outputs make the system testable and debuggable in ways that free-form text communication simply doesn't allow.</p>
<p><strong>Validate at every boundary</strong>. Each agent should validate its inputs, perform its work safely, and validate its outputs before passing them to the next agent. This catches problems early rather than letting them cascade through multiple agents before failing in confusing ways.</p>
<p><strong>Design for partial failures</strong>. In traditional applications, you often design for success and handle failures as exceptions. In agentic systems, partial failures are common and often recoverable. Design your error handling to distinguish between recoverable issues and hard failures, and make recovery paths explicit.</p>
<p><strong>Make decisions observable</strong>. The complexity of agentic systems means you need comprehensive observability to understand system behavior. But don't let observability infrastructure crash your system - implement defensive observability that degrades gracefully when monitoring systems fail.</p>
<p><strong>Security by default</strong>. When giving agents access to external systems, implement security controls at the tool level rather than relying on prompt engineering or agent instructions. Agents will eventually try to do things they shouldn't, either through user requests or unexpected reasoning patterns.</p>
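<p>As a small illustration of tool-level control, the check below lives in the tool itself, so no prompt can talk the agent past it. The allowlist contents and the <code>run_query</code> callable are placeholders, assumed for this sketch.</p>

```python
ALLOWED_INDICES = frozenset({"app-logs", "orders"})  # hypothetical allowlist


def search_tool(index, query, run_query):
    """Tool entry point: the check lives in code, not in the prompt."""
    if index not in ALLOWED_INDICES:
        # Also rejects wildcards such as "*" or "app-*", which are not in the set.
        raise PermissionError(f"index {index!r} is not allowed for this tool")
    return run_query(index, query)
```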
<p><strong>Configuration over code</strong>. Agent behavior often needs to be tuned based on operational experience. Make key parameters configurable so you can adjust system behavior without code deployments. This is especially important for model selection and cost optimization.</p>
<h2 id="heading-my-one-advice">My One Piece of Advice</h2>
<p>If you're building agentic systems, focus on making each component reliable independently before optimizing agent interactions. The systems that work in production are built on solid engineering fundamentals, not prompt engineering magic.</p>
]]></content:encoded></item><item><title><![CDATA[I finally built my first MCP Server. I expected magic, I got glorified API endpoints]]></title><description><![CDATA[It is 2025. Everybody's releasing their MCP Server. I, on the other hand, was avoiding it like Neo avoided the red pill. But just like in The Matrix, you can only dodge inevitability for so long. And I built one. For BigQuery. Even though Google's MC...]]></description><link>https://japkeeratsingh.com/i-finally-built-my-first-mcp-server</link><guid isPermaLink="true">https://japkeeratsingh.com/i-finally-built-my-first-mcp-server</guid><category><![CDATA[bigquery mcp]]></category><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[MCP-host]]></category><category><![CDATA[Model Context Protocol]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Sat, 16 Aug 2025 14:16:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755353666801/b1078953-c10a-4aba-abb3-cbb226f3782f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It is 2025. Everybody's releasing their MCP Server. I, on the other hand, was avoiding it like Neo avoided the red pill. But just like in The Matrix, you can only dodge inevitability for so long. And I built one. For BigQuery. Even though Google's MCP Toolbox existed and had support for BigQuery. Why? Let's dive in…</p>
<h2 id="heading-wtf-even-is-mcp">WTF even is MCP?</h2>
<p><a target="_blank" href="https://www.instagram.com/p/DJQu6DPMgK_/"><img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSJ71umijIXUAonB6W_pL3tLrACYclq6T4ZUg&amp;s" alt="MCP is basically JSON over HTTP. #developers #aiagents #openai #programming Originally posted by @thecrazyprogramer on Instagram" class="image--center mx-auto" /></a></p>
<p>I'll admit it - until I started building one myself, I didn't fully understand what MCP was trying to solve, primarily because it seemed like such a large mountain to climb and, honestly, the documentation made my brain hurt. Talk about living under a rock.</p>
<p>Here's what finally made it click for me: <strong>MCP (Model Context Protocol) is basically Anthropic's way of saying "hey, let's standardize how AI assistants talk to your stuff."</strong> Your databases, your APIs, your tools - everything.</p>
<p>Picture this: You're one of those people who has very specific requirements for everything. Your morning routine involves exactly 2.5 tablespoons of this specific coffee grind, steamed oat milk at precisely the right temperature, stirred counterclockwise three times (don't judge). You decide to hire an assistant to handle this.</p>
<p>Your assistant is brilliant but has no clue how to use any of your gadgets. The fancy espresso machine, the smart home system, the automated sock-sorting device (yes, that's a thing in my hypothetical world). Without MCP, you'd have to teach each AI assistant how to use each tool individually - different interfaces, different authentication, different everything.</p>
<p>With MCP, you create one "universal manual" that any AI assistant can read. Suddenly, Claude, ChatGPT, or whatever AI assistant comes next can all use your tools through the same standardized interface.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755350804736/c1ab2ab4-48d9-414f-8299-b3af1a5d5fc6.png" alt class="image--center mx-auto" /></p>
<p>That's when the lightbulb went off: this wasn't just another tech buzzword - it was solving a real integration nightmare that's only getting worse as more AI assistants show up to the party.</p>
<h2 id="heading-why-i-decided-to-reinvent-the-wheel-spoiler-googles-version-has-issues">Why I decided to reinvent the wheel (spoiler: Google's version has issues)</h2>
<p>Now, before you start typing "just use Google's MCP Toolbox, you fool" in the comments, let me explain why I went down this rabbit hole despite a perfectly functional solution already existing.</p>
<p>Google's BigQuery MCP connector works... but it's like that friend who's helpful but comes with a lot of baggage.</p>
<p><strong>First, Google themselves treat it like a science experiment.</strong> They literally plaster "BETA" warnings everywhere with the classic "expect breaking changes in future versions" disclaimer. Nothing says "production-ready" like a vendor telling you they might completely change everything tomorrow. I've been burned by Google deprecating products before (<em>cough</em> Google Podcasts <em>cough</em>).</p>
<p><strong>Second, it's about as granular as a sledgehammer.</strong> Want to give your AI access to sales data but not employee records? Tough luck. The AI gets access to ALL tables the service account can see. There's no way to say "hey, only show the marketing tables to this user, and only the finance tables to that user."</p>
<p><strong>Third, the data leakage potential gave me anxiety.</strong> The only workaround is spinning up multiple MCP instances with different service accounts. So instead of one clean connector, you're managing what I like to call "a zoo of MCP servers." Elegant? About as elegant as duct tape.</p>
<p><strong>But here's what really pushed me over the edge:</strong> When you have lots of tables (and BigQuery projects are basically table hoarders), the AI consistently picks the wrong ones. And this isn't Google's fault - it's a fundamental quirk of how LLMs work.</p>
<p>The more options you give them, the worse they get at picking the right one. It's like having a really smart friend who always grabs the first tool they see in your toolbox, even when they need a screwdriver and they picked up a hammer. Your "customer_analytics_2024" table gets ignored while "test_backup_data_2019" gets selected just because it appeared first in the list.</p>
<p>No amount of clever sorting fixes this - alphabetical, by importance, by creation date, even random order. The model will still lean toward the first few options like a kid reaching for candy.</p>
<p><a target="_blank" href="https://odsc.medium.com/evaluating-agent-tool-selection-testing-if-first-really-is-the-worst-b83dad43f641"><img src="https://miro.medium.com/v2/resize:fit:1400/0*opYXpHsrl4J1nIOS.png" alt /></a></p>
<p>That's when I thought, "You know what? I bet I can try to solve this differently."</p>
<h2 id="heading-what-i-built-instead">What I built instead</h2>
<p><em>A quick note: I won't be sharing code for this work as it belongs to my employer. I can share the high-level concepts and architecture, but the specifics are proprietary.</em></p>
<p>Instead of throwing every table at the AI and hoping it picks the right one, I broke down the BigQuery interaction into five distinct tools with a focus on intelligent filtering and user-specific access control.</p>
<p><strong>Tool 1: IAM-Based Content Visibility Controller</strong> This tool integrates with our organization's Identity and Access Management system to dynamically determine what datasets and tables a user can access. Rather than relying on service account permissions (which are static), it pulls the authenticated user's identity and cross-references it with our data governance policies in real-time. The tool queries our IAM provider's API to get group memberships and role assignments, then filters the available BigQuery resources accordingly. A user in the marketing group sees marketing datasets, finance users see finance data, and cross-functional analysts get broader access based on their role matrix.</p>
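<p>Since the real code is proprietary, here is only a generic sketch of the idea, not the actual implementation. It assumes the user's group memberships have already been fetched from the IAM provider, and the <code>ROLE_MATRIX</code> contents and dataset names are invented for the example.</p>

```python
# Hypothetical role matrix: IAM group -> datasets that group may see.
ROLE_MATRIX = {
    "marketing": {"marketing_campaigns", "web_analytics"},
    "finance": {"finance_ledger"},
    "analyst": {"marketing_campaigns", "finance_ledger"},
}


def visible_datasets(user_groups, all_datasets, role_matrix=ROLE_MATRIX):
    """Return only the datasets the user's groups grant access to, sorted by name."""
    allowed = set()
    for group in user_groups:
        allowed |= role_matrix.get(group, set())  # unknown groups grant nothing
    return sorted(name for name in all_datasets if name in allowed)
```

<p>The filter runs per request, so a change in group membership takes effect on the next call without touching any service account.</p>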
<p><strong>Tool 2: Semantic Dataset Discovery</strong> This tool addresses the core table selection problem through semantic search over dataset metadata. It maintains an indexed representation of all dataset names, descriptions, and table schemas, then uses vector similarity search to match user queries with relevant datasets. Instead of presenting 200+ datasets alphabetically, it ranks them by semantic relevance to the user's intent. The key insight: this only works because our organization has invested in proper data cataloging with meaningful dataset names and comprehensive metadata.</p>
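<p>To show the ranking step in isolation, here is a toy version, again not the proprietary code. It swaps the learned embeddings for plain word-count vectors so it runs with the standard library alone; the catalog entries and function names are made up for the example.</p>

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_datasets(question, catalog, top_k=3):
    """Rank datasets by similarity between the question and name plus description."""
    q = embed(question)
    scored = [(cosine(q, embed(name + " " + desc)), name) for name, desc in catalog.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```

<p>With a catalog containing <code>customer_analytics_2024</code> and <code>test_backup_data_2019</code>, a question about customer revenue surfaces the first and never shows the second to the model, regardless of list order. It also makes the dependency on good metadata obvious: with empty descriptions there is nothing to match against.</p>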
<p><strong>Tool 3: Table Metadata Fetcher</strong> Once relevant datasets are identified, this tool retrieves table-level metadata including creation dates, last modified timestamps, row counts, and table descriptions. It essentially wraps BigQuery's INFORMATION_SCHEMA queries but filters results based on the datasets identified in Tool 2, preventing information overload.</p>
<p><strong>Tool 4: Schema and Documentation Inspector</strong> This tool fetches detailed schema information including column names, data types, constraints, and most importantly, column descriptions and business definitions when available. It pulls from both BigQuery's native schema metadata and our organization's data documentation system to provide context about what each field actually represents.</p>
<p><strong>Tool 5: Query Executor</strong> The final tool executes the SQL query with the full context from previous tools. It includes safety mechanisms like query cost estimation and result size limits based on the user's permissions.</p>
<h3 id="heading-what-makes-this-different-from-googles-mcp-toolbox">What Makes This Different From Google's MCP Toolbox</h3>
<p>Yes, Google's BigQuery MCP connector also uses multiple tools for dataset discovery, schema inspection, and query execution. But there are crucial architectural differences:</p>
<p><strong>1. User-Centric vs. Service Account-Centric Access Control</strong> Google's connector operates under a single service account's permissions - what the service account can see, everyone can see. My implementation integrates with organizational IAM at request time, providing user-specific data access without spinning up multiple MCP instances. This isn't just a feature difference; it's a fundamental security model change.</p>
<p><strong>2. Semantic vs. Alphabetical Discovery</strong> Google's tool discovery presents datasets in essentially alphabetical or chronological order. The AI has to pick from a massive list, leading to the position bias problem I mentioned earlier. My semantic search pre-filters and ranks datasets by relevance, dramatically reducing the cognitive load on the AI model.</p>
<p><strong>3. Context-Aware Tool Chaining</strong> While both approaches use multiple tools, Google's connector treats each tool call independently. My implementation maintains context across the tool chain - Tool 3 only fetches metadata for datasets identified by Tool 2, Tool 4 only inspects schemas for tables the user can actually access, etc. This reduces the total information space the AI has to process.</p>
<p><strong>4. Custom Data Governance Integration</strong> Google's connector only knows about BigQuery's native metadata. My implementation pulls from our organization's broader data governance infrastructure - business glossaries, data lineage systems, and custom documentation platforms - providing richer context for query generation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755352349770/461b60cc-2501-4542-b12b-ab75f24b51b5.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-the-reality-check">The Reality Check</h3>
<p>Here's the honest truth that took me way too long to realize - building this taught me that MCP servers really are just fancy API orchestration wearing a protocol costume. Remember my title about expecting magic and getting glorified API endpoints? Yeah, that's exactly what happened.</p>
<p><img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTOS7CjdlORTfR9JO8pJTP8cLTAUXK2iAwdcQ&amp;s" alt="MCP... It means no worries for the rest of your days. jk. But seriously, if  you have no idea what MCP is or why you should care, check out our quick MCP" class="image--center mx-auto" /></p>
<p>The "magic" I was looking for doesn't exist in the MCP protocol itself. It's just a standardized way to wrap your APIs so AI assistants can call them. The real magic (if you can call it that) is in how thoughtfully you design those API interactions and what external systems you decide to integrate with.</p>
<p>All that complexity I was trying to avoid with Google's solution? Well, I didn't eliminate it, I just moved it around and painted it a different color. Instead of dealing with their tool selection issues, I built my own tool selection logic. Instead of managing multiple service accounts, I built complex IAM integration. The work didn't disappear; it just got redistributed to different parts of the system.</p>
<p>But here's the thing - that redistribution actually mattered for our specific use case. We needed user-specific access control and intelligent dataset discovery more than we needed simplicity. Your organization might have completely different priorities, and Google's sledgehammer approach might be exactly what you need.</p>
<p>So did I reinvent the wheel? Absolutely. Was it worth it? That depends on whether you think a wheel with better tires and custom rims is worth the engineering effort, or if you just need something that rolls.</p>
]]></content:encoded></item><item><title><![CDATA[Let's talk about Perplexity]]></title><description><![CDATA[The Generative AI race has been coupled with a rise in usage of the term “Perplexity”.

Google Trends suggests the same and most of the references in academic journals coming from the last 1 year. And no, this is not perplexity.ai. It is a metric bei...]]></description><link>https://japkeeratsingh.com/lets-talk-about-perplexity</link><guid isPermaLink="true">https://japkeeratsingh.com/lets-talk-about-perplexity</guid><category><![CDATA[confidence in llm]]></category><category><![CDATA[LLM confidence]]></category><category><![CDATA[llm evaluation]]></category><category><![CDATA[perplexity]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Thu, 02 Jan 2025 09:00:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735792992164/e5730b69-744c-465e-baa9-1e6a32fbdbfe.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Generative AI race has been coupled with a rise in usage of the term “Perplexity”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735790906803/e6c184d1-ddba-407d-ae77-ba9f631ee96f.png" alt class="image--center mx-auto" /></p>
<p>Google Trends suggests the same, with most of the references in academic journals coming from the last year. And no, this is not <a target="_blank" href="https://perplexity.ai">perplexity.ai</a>. It is a metric being used in Generative AI.</p>
<h2 id="heading-how-generative-models-work">How Generative Models Work</h2>
<p>Generative models, such as those used in language modeling, operate by predicting the next token in a sequence based on the context provided by preceding tokens. This prediction is grounded in probability distributions learned during training on vast corpora of text. The process is iterative, with each token generated influencing the probabilities for the next.</p>
<p>At a high level, the model calculates the likelihood of every possible token at a given step. For example, if the context is “The cat is on the,” the model might assign the following probabilities to the next word:</p>
<ul>
<li><p>“mat”: 0.7</p>
</li>
<li><p>“chair”: 0.2</p>
</li>
<li><p>“sofa”: 0.1</p>
</li>
</ul>
<p>The token with the highest probability (“mat” in this case) is typically selected, although alternative strategies like sampling or beam search can be used to introduce diversity or explore multiple sequences.</p>
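<p>A quick sketch of the difference between greedy selection and sampling, using the toy distribution from the example above. The function names are mine, chosen for illustration.</p>

```python
import random

# Toy next-token distribution for the context "The cat is on the".
next_token_probs = {"mat": 0.7, "chair": 0.2, "sofa": 0.1}


def greedy(probs):
    """Always pick the single most likely token."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Draw a token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the demo repeatable
print(greedy(next_token_probs))
print([sample(next_token_probs, rng) for _ in range(5)])
```

<p>Greedy selection returns “mat” every time, while sampling will occasionally return “chair” or “sofa”, which is where the diversity comes from.</p>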
<p>The foundation of this token-generation mechanism is the softmax function, which ensures that the probabilities of all tokens sum to 1. This normalization allows the model to make probabilistic predictions and enables perplexity calculations by assessing how closely the predicted distribution aligns with the true sequence.</p>
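<p>For reference, softmax itself is only a few lines. This is a minimal sketch; the logits are arbitrary numbers picked for the demo, and subtracting the maximum is a standard trick to avoid overflow.</p>

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.75, 0.05])  # hypothetical scores for "mat", "chair", "sofa"
print(probs, sum(probs))
```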
<h4 id="heading-probability-and-context">Probability and Context</h4>
<p>The power of generative models lies in their ability to use context effectively. Context is established by analyzing the preceding tokens and constructing a vector representation of their meanings. This representation feeds into the model’s architecture—typically a transformer—to predict the next token.</p>
<p>For instance, in the sentence “The weather today is sunny and,” the model might prioritize weather-related tokens like “warm” or “hot” over unrelated ones. This contextual sensitivity enables generative models to produce coherent and contextually appropriate outputs, making them integral to applications such as chatbots, translation systems, and creative content generation.</p>
<h2 id="heading-perplexity">Perplexity</h2>
<p>Generative models are not typical models for which you can calculate accuracy and call it a day. The only thing to work with is the probability of each token. Perplexity uses these probabilities to assign a single numerical score to a piece of text under a model.</p>
<p>At its core, perplexity measures how well a generative model predicts a given sequence. It is calculated using the probabilities assigned by the model to the tokens in the sequence. Lower perplexity indicates that the model has assigned higher probabilities to the correct tokens, which generally correlates with better performance.</p>
<p>The formula for perplexity can be expressed as:</p>
<p>$$\text{Perplexity} = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i)}$$</p><p>where <code>P(wi)</code> is the probability the model assigns to the i-th token given the tokens before it, and <code>N</code> is the number of tokens in the sequence.</p>
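<p>To make the formula concrete, here is a small sketch that computes it directly from a list of per-token probabilities. The probabilities are invented for the example.</p>

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability assigned to each of its tokens."""
    n = len(token_probs)
    avg_log2 = sum(math.log2(p) for p in token_probs) / n
    return 2 ** (-avg_log2)


# A model that is always torn between 4 equally likely tokens has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Log2 values are -1, -2, -2, so the result is 2 ** (5 / 3), roughly 3.17.
print(perplexity([0.5, 0.25, 0.25]))
```

<p>A handy way to read the number: a perplexity of 4 means the model was, on average, as uncertain as if it were choosing uniformly among 4 tokens.</p>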
<h3 id="heading-why-perplexity-matters">Why Perplexity Matters</h3>
<ol>
<li><p><strong>Model Comparison:</strong> Perplexity serves as a benchmark for evaluating and comparing different generative models. Lower perplexity indicates a model’s stronger ability to predict sequences accurately, offering insights into its overall performance.</p>
</li>
<li><p><strong>Training Feedback:</strong> During the training phase, perplexity acts as a critical feedback mechanism. A steadily decreasing perplexity score suggests that the model is learning effectively. Conversely, stagnation, or an increase in perplexity on held-out data, might signal issues such as overfitting or insufficient learning.</p>
</li>
<li><p><strong>Data Quality Assessment:</strong> High perplexity on specific datasets may indicate that the data contains ambiguities or inconsistencies, prompting a closer examination of the dataset quality.</p>
</li>
<li><p><strong>Application Suitability:</strong> By measuring perplexity on task-specific data, developers can determine whether a generative model is well-suited for a particular application, such as summarization or dialogue generation.</p>
</li>
</ol>
<h3 id="heading-limitations-of-perplexity">Limitations of Perplexity</h3>
<p>While perplexity is a valuable metric, it is not without limitations:</p>
<ul>
<li><p><strong>Tokenization Dependence:</strong> Perplexity scores can vary based on the tokenization scheme used. Different tokenization methods (e.g., word-level vs. subword-level) produce varying perplexity values, complicating direct comparisons.</p>
</li>
<li><p><strong>Human Readability Disconnect:</strong> Low perplexity does not always guarantee outputs that are coherent or contextually appropriate to humans. Complementary evaluations, such as human judgment or task-specific metrics, are often necessary.</p>
</li>
<li><p><strong>Cross-Lingual Challenges:</strong> Perplexity may behave differently across languages with varying syntax and morphology, requiring tailored interpretations for multilingual models.</p>
</li>
</ul>
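<p>The tokenization point is easy to see with arithmetic. The sketch below makes a simplifying assumption: two tokenizers assign the same total probability to a sentence (12 bits of surprise in total) but split it into a different number of tokens. The numbers are invented.</p>

```python
def perplexity_from_bits(total_bits, n_tokens):
    """Perplexity when the whole sequence costs `total_bits` of negative log2 probability."""
    return 2 ** (total_bits / n_tokens)


total_bits = 12  # same sentence, same total probability under both tokenizers

word_level = perplexity_from_bits(total_bits, n_tokens=4)     # 2 ** 3
subword_level = perplexity_from_bits(total_bits, n_tokens=6)  # 2 ** 2

print(word_level, subword_level)
```

<p>Same sentence, same total probability, yet the perplexity halves because the average is taken over more tokens. That is why scores from models with different tokenizers are not directly comparable.</p>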
<h3 id="heading-is-perplexity-just-confidence">Is Perplexity Just Confidence?</h3>
<p>At a conceptual level, perplexity can be thought of as reflecting the model's confidence in its predictions. However, this confidence is not necessarily tied to correctness. A model can be "confidently wrong," assigning high probabilities to tokens that do not align with human expectations or task requirements. This duality—confidence versus correctness—is one of the reasons perplexity alone is an imperfect metric.</p>
<p>Consider an example where a model predicts the next word in the sentence: “The cat is on the.” If it assigns a high probability to “ceiling” instead of the more contextually appropriate “mat,” the perplexity computed over its own output might appear low, but the prediction is clearly unsuitable.</p>
<p>This disconnect highlights that perplexity measures the internal consistency of the model's probabilistic predictions rather than their human-like coherence or utility.</p>
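<p>A tiny numeric illustration of "confidently wrong", with a made-up distribution. Scoring the model on the word it chose gives a very low perplexity, while scoring it against the word a human would expect gives a high one.</p>

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities assigned to each token actually scored."""
    avg_log2 = sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** (-avg_log2)


# Hypothetical next-token distribution after "The cat is on the".
probs = {"ceiling": 0.9, "mat": 0.05, "chair": 0.05}

self_scored = perplexity([probs["ceiling"]])    # scored on the model's own pick
against_reference = perplexity([probs["mat"]])  # scored on the expected word

print(self_scored, against_reference)
```

<p>The first number is close to 1 and the second is about 20. Which text you score against decides whether the metric flatters the model or exposes it.</p>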
<h3 id="heading-living-with-perplexity">Living with Perplexity</h3>
<p>Despite its limitations, perplexity remains the go-to metric for generative models. Its ease of calculation and alignment with the probabilistic nature of these models make it a convenient choice. However, developers and researchers are increasingly aware of its shortcomings. To address these, perplexity is often supplemented with human evaluations, task-specific metrics like BLEU and ROUGE, and even adversarial testing.</p>
<p>The future of generative AI evaluation likely involves a combination of metrics, balancing quantitative measures like perplexity with qualitative assessments that better capture human expectations. While perplexity may not tell the full story, it provides a crucial foundation for understanding and improving generative models.</p>
<h3 id="heading-perplexity-and-data-quality">Perplexity and Data Quality</h3>
<p>A generative model’s performance is inherently tied to the quality of the data it is trained on. High-quality input data often leads to better predictions and, consequently, lower perplexity scores. This connection suggests that when the data is robust—well-curated, comprehensive, and representative of the task at hand—perplexity can serve as a reliable indicator of model performance.</p>
<p>However, it is essential to consider a few nuances:</p>
<ol>
<li><p><strong>Reflection of Learned Patterns:</strong> Perplexity is most meaningful when the model has been trained on data that aligns well with the evaluation dataset. If the training data accurately represents the patterns in the test sequences, low perplexity indicates effective learning.</p>
</li>
<li><p><strong>Sensitivity to Outliers:</strong> Even with high-quality input, generative models can struggle with edge cases or outliers, leading to higher perplexity scores on those specific examples. While this reflects the model's difficulty, it does not necessarily undermine perplexity as a metric for the majority of cases.</p>
</li>
<li><p><strong>Scope of Utility:</strong> When data quality is high, perplexity becomes a more direct measure of how well the model captures the structure and probabilities of the language. In this scenario, perplexity may indeed be viewed as a "good metric," particularly for tasks focused on language modeling.</p>
</li>
</ol>
<h3 id="heading-tldr">TLDR</h3>
<p>Perplexity, a key metric in Generative AI, measures how well a model predicts a sequence (e.g., text) by assessing the alignment between its predicted probabilities and actual outcomes. While useful for model comparison, training feedback, and data quality assessment, perplexity has limitations: it doesn't guarantee human-like coherence, varies with tokenization schemes, and behaves differently across languages. To provide a comprehensive evaluation, perplexity is best used in conjunction with other metrics (e.g., BLEU, ROUGE, human evaluations), especially when working with high-quality, representative training data.</p>
]]></content:encoded></item><item><title><![CDATA[What I learned about PyPi from maintaining an Open-Source Package]]></title><description><![CDATA[Last month, I published a package on PyPi - TezzCrawler - a simple CLI tool to convert any website to LLM ready draft for building a RAG capabilities on any website. What spiraled next, was hundreds of hours of analysis on how PyPi works and what hap...]]></description><link>https://japkeeratsingh.com/what-i-learned-about-pypi-from-maintaining-an-open-source-package</link><guid isPermaLink="true">https://japkeeratsingh.com/what-i-learned-about-pypi-from-maintaining-an-open-source-package</guid><category><![CDATA[pypi]]></category><category><![CDATA[bigquery]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[Python]]></category><category><![CDATA[llm]]></category><category><![CDATA[Crawler]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Thu, 26 Dec 2024 09:00:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735199068452/d2318b70-c6c5-4f38-bf62-1886266f82bc.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last month, I published a package on PyPI - <a target="_blank" href="https://pypi.org/project/TezzCrawler">TezzCrawler</a> - a simple CLI tool that converts any website into an LLM-ready draft for building RAG capabilities on that website. What followed was hundreds of hours of analysis of how PyPI works and what happens in the background when you publish a package there.</p>
<h2 id="heading-the-issue-that-caused-me-to-dig-a-rabbit-hole">The issue that caused me to dig a rabbit hole</h2>
<p>A few weeks ago, I got curious about how the package was doing. Had anyone other than me even downloaded it, or was it just a piece of junk in the discarded pile of hundreds of thousands of PyPI packages? So I copied a simple Python script from StackOverflow to get the total download stats and got this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735186612350/700adf14-ce74-403a-87bc-66f07b1dcb48.png" alt="Python script and final results for calculating downloads from PyPi" class="image--center mx-auto" /></p>
<p>Well, to be honest, this is a few too many. When I ran this script, I was expecting to see a number like 10 or 15. Never in my life had I thought it would be more than 3,400!</p>
<p>As you’d expect, I was elated to see a number like this. I didn’t give it much thought until a few days later. TezzCrawler works fine and gets the job done, but it is neither the only package that solves this problem nor the fastest crawler, and most importantly, I had not told anyone that I had published it. Nobody other than me knew it even existed in the past month. So how did a tool with zero marketing get this many downloads? I’d sure want to know in order to replicate the results in other projects.</p>
<p>And then started a deep dive which eventually left me about $10 poorer.</p>
<h2 id="heading-pypi-stats">PyPI Stats</h2>
<p>There are a number of analytics tools where you can check your package’s statistics. Officially, four are promoted, but only two of them provide download numbers as part of their statistics (the two gave different numbers, and neither matched the numbers from the API 😭).</p>
<p><strong>ClickHouse</strong>, an open source data warehousing tool, maintains a dashboard called <strong>ClickPy</strong> for analysing any Python package’s statistics. Searching on the platform gave me a cumulative download count of 2,400. A staggering difference of 1,000 from what the API reported!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735187906846/a7d0b859-87c9-4f6b-81ef-05899d5aed80.png" alt="ClickPy dashboard" class="image--center mx-auto" /></p>
<p><em>Oh cool, the package is highly popular in the US, Canada, China, and Russia.</em></p>
<p>Anyways, this sparked another debate in my head. Why are the numbers so different?</p>
<p>To get to the bottom of it once and for all, I went to <a target="_blank" href="https://pypistats.org">PyPI Stats</a> hoping to find all my answers there (since they are the ones that provide the Python API I used in the first place). But I left with more questions than I started with.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735188607717/cc75bdc3-0d80-4bb8-9667-4dddce5bb86a.png" alt class="image--center mx-auto" /></p>
<p>PyPI Stats doesn’t provide a cumulative download count, only daily, weekly, and monthly counts. But look closely: the last-month count as per PyPI Stats is 861, while the ClickPy dashboard screenshot from before shows a total download count of 2,100. (WTF is that difference?)</p>
<p>This also introduced a new term - <strong>Mirrors</strong>. I knew what mirrors are, I just didn’t expect PyPI to be using them (either I was stupid or I am stupid to not realize this earlier 🥲)</p>
<p>Here’s a key detail - anybody can boot up their own PyPI server. This server is essentially what is called a Mirror. Depending on how many of the packages from the main PyPI server are reflected onto the Mirror, there are essentially three kinds of mirrors - Private, Partial, or Public. Public Mirrors are exact 1:1 replicas of PyPI, while Private Mirrors are hosted by companies internally for the packages they allow their teams to use for development.</p>
<p>There’s a tool called <strong>Bandersnatch</strong> that allows you to replicate any Python package to a Mirror. (This information will come in handy in a minute)</p>
<p>So, the download count shown on PyPI Stats excludes Mirrors: 861 downloads without Mirrors versus 2,100 downloads with Mirrors. That means roughly 1,240 of last month’s downloads (2,100 - 861) were Mirrors copying the package, and 861 were actual downloads. I got the actual download count I was after, and I should call it a day now.</p>
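<p>For anyone who wants to redo this split themselves, here is a rough sketch. It is not the StackOverflow script from the screenshot. The endpoint and the <code>with_mirrors</code> / <code>without_mirrors</code> categories reflect my understanding of the pypistats.org JSON API, so treat them as assumptions and check the API docs first. The sample payload is invented to match the numbers above.</p>

```python
import json
from urllib.request import urlopen

# Endpoint as I understand the pypistats.org JSON API; treat it as an assumption.
API_URL = "https://pypistats.org/api/packages/{package}/overall?mirrors={mirrors}"


def total_downloads(payload, category):
    """Sum the daily download counts for one category of an 'overall' payload."""
    return sum(row["downloads"] for row in payload["data"] if row["category"] == category)


def mirror_split(payload):
    """Return (without_mirrors, with_mirrors, mirror_only) totals."""
    without = total_downloads(payload, "without_mirrors")
    with_mirrors = total_downloads(payload, "with_mirrors")
    return without, with_mirrors, with_mirrors - without


def fetch_overall(package):
    """Fetch both categories; needs network access, so it is not called here."""
    combined = {"data": []}
    for flag in ("true", "false"):
        with urlopen(API_URL.format(package=package, mirrors=flag)) as response:
            combined["data"].extend(json.load(response)["data"])
    return combined


# Invented payload shaped like the API response, totalling 861 and 2,100.
sample = {"data": [
    {"category": "without_mirrors", "date": "2024-12-01", "downloads": 400},
    {"category": "without_mirrors", "date": "2024-12-02", "downloads": 461},
    {"category": "with_mirrors", "date": "2024-12-01", "downloads": 1000},
    {"category": "with_mirrors", "date": "2024-12-02", "downloads": 1100},
]}
print(mirror_split(sample))
```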
<p><img src="https://y.yarn.co/432ec25f-e948-494d-9d66-b322f1b4042b_text.gif" alt="But, hold on a second (meme gif)" class="image--center mx-auto" /></p>
<p>861 downloads is still a lot more than the 10-15 downloads I was expecting! Did I miss something during the analysis?</p>
<h2 id="heading-the-analysis-where-i-lost-my-money">The analysis where I lost my money</h2>
<p>Okay, so far I’ve got 3 things - anybody can create a Mirror, and a sync between a Mirror and the main PyPI server can trigger a “download” count in PyPI Stats; if someone downloads from a Mirror, it doesn’t show up in PyPI Stats; and ClickPy is a waste of a dashboard.</p>
<p>But the main quest, “How many downloads did my package actually get?”, was still unfulfilled. And at this stage, I found the holy grail - PyPI exports its download logs to a public BigQuery dataset.</p>
<p>The PyPI BigQuery dataset consists of 3 tables - package metadata, download events, and download request metadata. Running even a really simple query like the one below was a little out of my budget (it was going to process 15 TB of data), and I didn’t want to burn my entire budget on 1 query.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">count</span>(*)
<span class="hljs-keyword">FROM</span> bigquery-<span class="hljs-keyword">public</span>-data.pypi.file_downloads
<span class="hljs-keyword">WHERE</span> file.project = <span class="hljs-string">'tezzcrawler'</span>
</code></pre>
<p>So I had to get a little creative.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">COUNT</span>(*) <span class="hljs-keyword">as</span> download_frequency,
  DATE_TRUNC(<span class="hljs-built_in">DATE</span>(<span class="hljs-built_in">timestamp</span>), <span class="hljs-keyword">MONTH</span>) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">month</span>
<span class="hljs-keyword">FROM</span> bigquery-<span class="hljs-keyword">public</span>-data.pypi.file_downloads
<span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">project</span> = <span class="hljs-string">'tezzcrawler'</span>
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">month</span>
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">month</span> <span class="hljs-keyword">DESC</span>
</code></pre>
<p>Running the query, I get… 2,408 downloads. Exactly what ClickPy reports. But wait, I can also break the count down by how the package was downloaded.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> 
  details.installer.name <span class="hljs-keyword">as</span> installer,
  <span class="hljs-keyword">COUNT</span>(*) <span class="hljs-keyword">as</span> download_frequency
<span class="hljs-keyword">FROM</span> bigquery-<span class="hljs-keyword">public</span>-data.pypi.file_downloads
<span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">project</span> = <span class="hljs-string">'tezzcrawler'</span>
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> installer
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> download_frequency <span class="hljs-keyword">DESC</span>
</code></pre>
<p><em>To save on costs while writing the article, I added a time period to all the queries before taking the screenshots used to explain the findings. The query above was executed over a 30-day period.</em></p>
<p>Here’s what I found on executing the above query.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735196638769/328baa10-8177-4a88-956b-5862ede8040b.png" alt class="image--center mx-auto" /></p>
<p>This is more along the lines of what I was actually expecting. Only 39 downloads happened via the <code>pip install</code> command. The rest come from other activity, including Bandersnatch, the Mirror tool. My guess is that some bots monitor PyPi packages directly on the main server instead of first creating a Mirror. Unfortunately, the user agent is not tracked in the download request, only the TLS protocol is, and that doesn’t really tell much about the remaining downloads.</p>
<p>So, there we have it, 39 downloads in the last month for <a target="_blank" href="https://github.com/TezzLabs/TezzCrawler">TezzCrawler</a>. The actual count, the honest number, and not what these dashboards were feeding my ego earlier. Now I can have some peace, but more importantly, <strong>I've learned a valuable lesson in the importance of data accuracy and the nuances of open-source package distribution</strong>.</p>
<p>For fellow developers and open-source enthusiasts, the takeaway is this: don't be misled by vanity metrics. Dig deeper, question the numbers, and understand the ecosystem your project operates within. It might not always be as glamorous as thousands of downloads, but <strong>the true measure of your project's impact lies in its genuine usage and the value it brings to its users</strong>. With this newfound understanding, I'm excited to focus on what really matters – improving TezzCrawler for those 39 users, and hopefully, many more to come.</p>
<hr />
<p>Every Thursday, 2:30PM IST, I’ll share 1 article directly to your mailbox. The next few articles are things I learnt while developing my own RAG Framework, which I would have definitely missed had I stuck to using pre-developed frameworks and probably spent days debugging. If this resonates with you, sign up for the free newsletter.</p>
]]></content:encoded></item><item><title><![CDATA[Is the Model making right predictions? - Part 5 of 5 on Evaluation of Machine Learning Models]]></title><description><![CDATA[Preparing your dataset, especially the test set, is a crucial step in building reliable and high-performing machine learning models. Proper dataset preparation ensures that your model is not only accurate but also generalizable to new, unseen data. L...]]></description><link>https://japkeeratsingh.com/is-the-model-making-right-predictions-part-5-of-5-on-evaluation-of-machine-learning-models</link><guid isPermaLink="true">https://japkeeratsingh.com/is-the-model-making-right-predictions-part-5-of-5-on-evaluation-of-machine-learning-models</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Model Evaluation]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Mon, 23 Dec 2024 12:14:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734948188006/72271e3a-2210-4600-b497-2215f9241abe.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Preparing your dataset, especially the test set, is a crucial step in building reliable and high-performing machine learning models. Proper dataset preparation ensures that your model is not only accurate but also generalizable to new, unseen data. Let’s delve into the various techniques and best practices to achieve this effectively.</p>
<hr />
<h3 id="heading-train-test-split-the-basics"><strong>Train-Test Split: The Basics</strong></h3>
<p>The train-test split is a fundamental step in machine learning workflows. It divides your data into two parts:</p>
<ul>
<li><p><strong>Training Set</strong>: This portion is used to train the model. The model learns patterns and relationships from this data.</p>
</li>
<li><p><strong>Test Set</strong>: This part is reserved for evaluating the model’s performance. It simulates how the model will behave on unseen data.</p>
</li>
</ul>
<p>Python’s <code>scikit-learn</code> library provides a simple way to perform this split:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.3</span>, random_state=<span class="hljs-number">42</span>)
</code></pre>
<p>In this example, 30% of the data is set aside for testing (<code>test_size=0.3</code>). Setting a <code>random_state</code> ensures reproducibility. The train-test split is critical for assessing whether your model is overfitting or generalizing well.</p>
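<p>To see what the split actually returns, here is a minimal sketch on toy data. The arrays <code>X</code> and <code>y</code> below are made up for illustration and are not from any real dataset.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # 7 training rows, 3 test rows

# Same random_state, same split: the result is reproducible.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
print(np.array_equal(X_test, X_test_again))
</code></pre>
<p>Changing <code>random_state</code> would most likely give you a different set of test rows, which is exactly why fixing it matters when you compare models.</p>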
<hr />
<h3 id="heading-time-series-data-handle-with-care"><strong>Time Series Data: Handle with Care</strong></h3>
<p>Time series data requires a different approach because the order of observations carries meaningful information. Randomly shuffling the data can break the temporal patterns, leading to unreliable evaluations. For instance, testing a model on past data after training it on future data doesn’t reflect real-world scenarios.</p>
<p>When working with time series data, it’s essential to maintain the chronological order. Train the model on historical data and evaluate it on future data. This ensures that the model’s predictions are based on past trends.</p>
<p>In <code>scikit-learn</code>, the <code>TimeSeriesSplit</code> class facilitates this type of split:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=<span class="hljs-number">5</span>)
<span class="hljs-keyword">for</span> train_index, test_index <span class="hljs-keyword">in</span> tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
</code></pre>
<p>This method allows for multiple train-test splits while preserving the temporal order, providing a robust way to evaluate time series models.</p>
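<p>Printing the indices makes the “no peeking into the future” property visible. This is a small sketch on twelve made-up observations, assumed to be already sorted by time.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve observations in chronological order (toy data, an assumption).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))

for train_index, test_index in folds:
    # Every training index comes before every test index.
    print("train:", train_index, "test:", test_index)
</code></pre>
<p>Each fold trains on a growing window of the past and tests on the block that immediately follows it.</p>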
<hr />
<h3 id="heading-k-fold-cross-validation-a-comprehensive-evaluation"><strong>K-Fold Cross-Validation: A Comprehensive Evaluation</strong></h3>
<p>K-Fold Cross-Validation is an effective technique for evaluating model performance. It works by dividing the dataset into ‘k’ subsets (folds). The model is trained on ‘k-1’ folds and tested on the remaining fold. This process repeats ‘k’ times, with each fold serving as the test set once. The results are then averaged to provide an overall performance metric.</p>
<p>Here’s how to implement K-Fold Cross-Validation in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> KFold

kf = KFold(n_splits=<span class="hljs-number">5</span>, shuffle=<span class="hljs-literal">True</span>, random_state=<span class="hljs-number">42</span>)
<span class="hljs-keyword">for</span> train_index, test_index <span class="hljs-keyword">in</span> kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
</code></pre>
<p>K-Fold Cross-Validation reduces the risk that a single lucky or unlucky split distorts your evaluation, because the model is scored across multiple data partitions. This method is particularly useful for smaller datasets where reserving a large test set isn’t feasible.</p>
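<p>If you don’t need the indices yourself, <code>cross_val_score</code> runs the train-and-score loop and returns one score per fold. The sketch below uses synthetic data and a logistic regression model, both picked only for illustration.</p>
<pre><code class="lang-python">from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data and model are assumptions for this example.
X, y = make_classification(n_samples=200, random_state=42)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf)  # accuracy by default for classifiers

print(scores)         # one score per fold
print(scores.mean())  # the averaged performance metric
</code></pre>
<p>Looking at the spread of the five scores, not just their mean, tells you how much the evaluation depends on the particular split.</p>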
<hr />
<h3 id="heading-stratified-k-fold-fair-evaluation-for-imbalanced-data"><strong>Stratified K-Fold: Fair Evaluation for Imbalanced Data</strong></h3>
<p>When dealing with imbalanced datasets—where some classes are underrepresented—it’s important to ensure that the train and test sets have the same class distribution as the entire dataset. Stratified K-Fold Cross-Validation addresses this issue by maintaining class proportions in each fold.</p>
<p>Here’s how to implement it:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> StratifiedKFold

skf = StratifiedKFold(n_splits=<span class="hljs-number">5</span>, shuffle=<span class="hljs-literal">True</span>, random_state=<span class="hljs-number">42</span>)
<span class="hljs-keyword">for</span> train_index, test_index <span class="hljs-keyword">in</span> skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
</code></pre>
<p>Stratified K-Fold ensures fair representation of all classes in both training and testing phases. This leads to more reliable and unbiased performance metrics, especially in classification tasks.</p>
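<p>A quick way to convince yourself is to count the positives in each test fold. The labels below are a made-up 90/10 imbalance.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels (an assumption): 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # features are irrelevant for this demonstration

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
positives_per_fold = [
    int(y[test_index].sum()) for _, test_index in skf.split(X, y)
]
print(positives_per_fold)  # 2 positives in each test fold of 20 samples
</code></pre>
<p>With a plain <code>KFold</code>, some folds could easily end up with no positives at all, which makes metrics like recall meaningless for that fold.</p>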
<hr />
<h3 id="heading-domain-specific-splits-tailored-testing"><strong>Domain-Specific Splits: Tailored Testing</strong></h3>
<p>For certain types of data, domain-specific considerations are crucial when preparing the test set. For instance:</p>
<ul>
<li><p><strong>Spatial Data</strong>: For geospatial datasets, it’s often better to split data based on regions. For example, you might train a model on data from one geographic area and test it on another to evaluate how well the model generalizes across locations.</p>
</li>
<li><p><strong>Demographic Data</strong>: For human-centered applications, splitting data by demographic groups can highlight biases or ensure fairness. For example, testing a healthcare model separately on age groups can reveal disparities in predictions.</p>
</li>
<li><p><strong>Event-Specific Data</strong>: In event-driven datasets (e.g., sports or financial markets), splitting by events can ensure that the model is evaluated on entirely different scenarios.</p>
</li>
</ul>
<p>These approaches ensure the test set reflects real-world variability and allows for a more robust evaluation.</p>
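<p>One way to implement such splits in <code>scikit-learn</code> is <code>GroupKFold</code>, which keeps every sample of a group on the same side of the split. The sketch below uses hypothetical region labels; the same pattern would apply to demographic groups or events.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical region label for each of 8 samples.
regions = np.array(
    ["north", "north", "south", "south", "east", "east", "west", "west"]
)
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

gkf = GroupKFold(n_splits=4)
splits = list(gkf.split(X, y, groups=regions))

for train_index, test_index in splits:
    train_regions = set(regions[train_index])
    test_regions = set(regions[test_index])
    # No region appears on both sides of the split.
    print(sorted(test_regions), train_regions.isdisjoint(test_regions))
</code></pre>
<p>Each fold tests on a region the model has never seen, which is the generalization question the spatial example above is asking.</p>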
<hr />
<h3 id="heading-data-augmentation-for-robust-evaluation"><strong>Data Augmentation for Robust Evaluation</strong></h3>
<p>Data augmentation is a technique commonly used in training, but it’s also valuable for testing. By applying transformations to the test data, you can assess the robustness of your model under different scenarios. For instance:</p>
<ul>
<li><p><strong>Computer Vision</strong>: Apply augmentations such as rotations, translations, or noise to test images and evaluate if the model’s predictions remain consistent.</p>
</li>
<li><p><strong>Natural Language Processing</strong>: Add variations like synonym replacement, misspellings, or paraphrasing to test the resilience of language models.</p>
</li>
<li><p><strong>Audio Processing</strong>: Add background noise or change the pitch in audio test data to see how well the model adapts to real-world distortions.</p>
</li>
</ul>
<p>Data augmentation during testing is particularly useful when you want to simulate challenging conditions or validate robustness beyond ideal scenarios.</p>
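<p>For tabular or signal-like data, the simplest version of this idea is adding noise to the test features and comparing scores. This is only a sketch: the synthetic dataset, the model, and the <code>noise_scale</code> value are all assumptions you would replace with your own.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Perturb only the test features; the labels stay the same.
rng = np.random.default_rng(42)
noise_scale = 0.5  # hypothetical noise level; tune it to your domain
X_test_noisy = X_test + rng.normal(0.0, noise_scale, size=X_test.shape)

clean_accuracy = model.score(X_test, y_test)
noisy_accuracy = model.score(X_test_noisy, y_test)
print(clean_accuracy, noisy_accuracy)
</code></pre>
<p>A large gap between the two numbers suggests the model is brittle under the kind of distortion you simulated.</p>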
<hr />
<h3 id="heading-why-proper-dataset-preparation-matters"><strong>Why Proper Dataset Preparation Matters</strong></h3>
<p>Proper dataset preparation directly impacts the reliability and generalizability of your machine learning models. Here are a few reasons why it’s critical:</p>
<ol>
<li><p><strong>Detect Overfitting</strong>: Evaluating on data the model never saw during training reveals when it has memorized the training data rather than learned patterns.</p>
</li>
<li><p><strong>Realistic Evaluation</strong>: Testing on unseen data mimics real-world scenarios, providing a realistic measure of the model’s performance.</p>
</li>
<li><p><strong>Fair Metrics</strong>: Techniques like Stratified K-Fold ensure that performance metrics are not biased due to class imbalances.</p>
</li>
<li><p><strong>Respect Temporal Patterns</strong>: For time series data, maintaining the order of observations leads to more meaningful evaluations.</p>
</li>
<li><p><strong>Domain Relevance</strong>: Tailored splits and augmented testing ensure the model’s performance is validated in realistic and varied conditions.</p>
</li>
</ol>
<hr />
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>Dataset preparation is not just a preliminary step—it’s a cornerstone of machine learning. By applying techniques like train-test splits, time series-specific methods, K-Fold Cross-Validation, Stratified K-Fold, domain-specific splits, and data augmentation, you can build models that are both accurate and generalizable. Investing time in preparing your data ensures that your models are reliable and ready to tackle real-world challenges with confidence.</p>
]]></content:encoded></item><item><title><![CDATA[Is the Model making right predictions? - Part 4 of 5 on Evaluation of Machine Learning Models]]></title><description><![CDATA[When it comes to evaluating regression-based machine learning models, picking the right metric is like picking the right seasoning for your dish—the wrong choice could leave a bitter taste. We have already previously covered the metrics which you can...]]></description><link>https://japkeeratsingh.com/is-the-model-making-right-predictions-part-4-of-5-on-evaluation-of-machine-learning-models</link><guid isPermaLink="true">https://japkeeratsingh.com/is-the-model-making-right-predictions-part-4-of-5-on-evaluation-of-machine-learning-models</guid><category><![CDATA[R2 metric]]></category><category><![CDATA[#Regression]]></category><category><![CDATA[evaluation metrics]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[machine learning model evaluation]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Sat, 14 Dec 2024 08:07:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734163582592/62dcede4-41cb-48fb-a44b-6d6c6ab60a0a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When it comes to evaluating regression-based machine learning models, picking the right metric is like picking the right seasoning for your dish—the wrong choice could leave a bitter taste. We have already previously covered the metrics which you can use for evaluation of classification based machine learning models. In this article, and the following one, we will solely focus on metrics for regression based machine learning models.</p>
<h2 id="heading-metrics-for-regression-models-the-basics">Metrics for Regression Models: The Basics</h2>
<p>Unlike classification models, regression models deal with predicting continuous values. This means we’re less concerned with thresholds and more focused on how far off the predictions are from the actual values.</p>
<p>Ideally, to see how far off the predictions are, we could just take the difference between the actual and predicted values and call it a day. But with Machine Learning, there are no ideal scenarios. Extend this idea to thousands or millions of data points and you will sometimes get an average error of 0. Does that mean your model is performing perfectly? Nope. It could be that the model overshoots and undershoots the actual values by the same cumulative amount, so the positive and negative errors cancel out.</p>
<p>For instance, imagine a model predicting daily temperatures where the actual temperature is 30°C. If the model predicts 40°C on one day and 20°C on another, the average error might come out to 0, but it’s clearly far from accurate! This highlights the importance of using more robust metrics to assess the performance of regression models.</p>
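<p>The temperature example is small enough to check by hand. Here it is as a few lines of plain Python, using exactly the numbers above.</p>
<pre><code class="lang-python">actual = [30.0, 30.0]
predicted = [40.0, 20.0]

errors = [a - p for a, p in zip(actual, predicted)]
mean_error = sum(errors) / len(errors)
mean_absolute_error = sum(abs(e) for e in errors) / len(errors)

print(mean_error)           # 0.0: the two errors cancel out
print(mean_absolute_error)  # 10.0: each prediction is off by 10 degrees
</code></pre>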
<p>Keeping this in mind, let's focus on how we can carefully craft metrics that work well for regression models.</p>
<h1 id="heading-mean-absolute-error">Mean Absolute Error</h1>
<p>The Mean Absolute Error (MAE) is one of the simplest yet highly interpretable metrics for evaluating regression models. MAE calculates the average of the absolute differences between the actual and predicted values, making it easy to understand and less sensitive to outliers compared to some other metrics.</p>
<h3 id="heading-formula">Formula</h3>
<p>The formula for MAE is as follows:</p>
<p>$$\text{MAE} = \frac{1}{n} \sum_{i=1}^n \lvert y_i - \hat{y}_i \rvert$$</p><p>where:</p>
<ul>
<li><p><code>n</code> is the total number of observations.</p>
</li>
<li><p><code>y</code>​ represents the true value.</p>
</li>
<li><p><code>y^</code>​ represents the predicted value.</p>
</li>
</ul>
<p>In simple terms, it finds each difference, removes the sign (discarding the direction of the error), sums all the errors, and divides by the number of observations to get the mean error of the model.</p>
<h1 id="heading-mean-squared-error">Mean Squared Error</h1>
<p>MAE brings an important idea to the calculation of error: discarding its direction. MSE does the same thing, with a slight change. Instead of taking the absolute value of each difference, it squares it. This again removes the direction, so the emphasis is only on the magnitude of the error.</p>
<h2 id="heading-formula-1">Formula</h2>
<p>The formula of MSE is</p>
<p>$$\text{MSE} = \frac{1}{n} \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2$$</p><h2 id="heading-why-mse-when-mae-exists">Why MSE when MAE exists?</h2>
<ol>
<li><p>Squaring the error makes the metric sensitive to the magnitude of the error. It amplifies large errors so that even a small share of large errors has a profound impact on the MSE value.</p>
</li>
<li><p>Because of that squaring, MSE is far more sensitive to outliers than MAE.</p>
</li>
<li><p>For a constant prediction, MAE is minimized at the median of the target values, while MSE is minimized at their mean.</p>
</li>
</li>
</ol>
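<p>The points above can be checked on a tiny example. The numbers below are made up, with one deliberate outlier, and the two “models” are just constant predictions at the median and at the mean.</p>
<pre><code class="lang-python">from statistics import mean, median

def mae(actual, predicted):
    """Mean Absolute Error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def mse(actual, predicted):
    """Mean Squared Error."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))

actual = [10.0, 12.0, 11.0, 13.0, 100.0]  # toy values; the last is an outlier

# Two constant predictions: the median and the mean of the targets.
median_pred = [median(actual)] * len(actual)
mean_pred = [mean(actual)] * len(actual)

print("MAE:", mae(actual, median_pred), "vs", mae(actual, mean_pred))
print("MSE:", mse(actual, median_pred), "vs", mse(actual, mean_pred))
</code></pre>
<p>The median prediction gets the lower MAE, while the mean prediction, dragged towards the outlier, gets the lower MSE.</p>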
<h1 id="heading-r-squared-r-metric">R-Squared (R²) Metric</h1>
<p>Now, let’s talk about a metric that is widely recognized and frequently used in the evaluation of regression models: R-squared (R²). While Mean Absolute Error (MAE) and Mean Squared Error (MSE) are good at measuring the magnitude of errors, R-squared takes a different approach. It gives us a measure of how well the regression model fits the data, helping us understand how much of the variance in the target variable is explained by the model.</p>
<h3 id="heading-what-does-r-squared-measure">What Does R-Squared Measure?</h3>
<p>R², also called the coefficient of determination, is essentially a percentage that tells us how well the regression model explains the variability of the target variable. A high R² indicates that the model captures most of the variance, whereas a low R² suggests that the model is missing the mark and not explaining much of the variability in the data.</p>
<p>To understand this better, let’s break it down. The idea behind R² is based on comparing two things:</p>
<ol>
<li><p><strong>The total sum of squares (TSS)</strong>: This measures how much the actual values vary from the mean of the target variable.</p>
</li>
<li><p><strong>The residual sum of squares (RSS)</strong>: This measures how much the predicted values deviate from the actual values.</p>
</li>
</ol>
<p>R² is then calculated using the formula:</p>
<p>$$R^2 = 1 - \frac{RSS}{TSS}$$</p><p>Where:</p>
<ul>
<li><strong>RSS</strong> (Residual Sum of Squares) is the sum of the squared differences between the observed values and the predicted values. It is mathematically written as</li>
</ul>
<p>$$RSS = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$</p><ul>
<li><strong>TSS</strong> (Total Sum of Squares) is the sum of the squared differences between the observed values and the mean of the observed values. It is mathematically written as</li>
</ul>
<p>$$TSS = \sum_{i=1}^n (y_i - \bar{y})^2$$</p><h3 id="heading-interpretation-of-r-squared">Interpretation of R-Squared</h3>
<ul>
<li><p><strong>R² = 1</strong>: The model perfectly fits the data. All data points fall exactly on the regression line.</p>
</li>
<li><p><strong>R² = 0</strong>: The model doesn’t explain any of the variance in the data. Essentially, it’s no better than predicting the mean of the target variable for all instances.</p>
</li>
<li><p><strong>R² &lt; 0</strong>: This happens when the model performs worse than a simple horizontal line at the mean of the target variable. It indicates a poor fit, and it often shows up when a model that overfit or underfit the training data is evaluated on new data.</p>
</li>
</ul>
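<p>The formula and the three cases above fit in a few lines of plain Python. The values are toy numbers chosen so the arithmetic is easy to follow.</p>
<pre><code class="lang-python">def r_squared(actual, predicted):
    """R-squared computed as 1 - RSS / TSS."""
    mean_actual = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    tss = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - rss / tss

actual = [1.0, 2.0, 3.0, 4.0, 5.0]  # mean is 3.0, TSS is 10.0

perfect = r_squared(actual, actual)                   # RSS is 0
mean_only = r_squared(actual, [3.0] * 5)              # RSS equals TSS
reversed_trend = r_squared(actual, actual[::-1])      # RSS is 40, larger than TSS

print(perfect, mean_only, reversed_trend)  # 1.0 0.0 -3.0
</code></pre>
<p>For real work you would normally call <code>sklearn.metrics.r2_score</code>, which follows the same definition.</p>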
<h3 id="heading-the-pros-and-cons-of-r-squared">The Pros and Cons of R-Squared</h3>
<h4 id="heading-pros">Pros:</h4>
<ol>
<li><p><strong>Easy to Interpret</strong>: R² is easy to explain to stakeholders because it represents a percentage of the variance explained by the model.</p>
</li>
<li><p><strong>Model Comparison</strong>: It helps in comparing different models for the same dataset. A higher R² value typically indicates a better model fit.</p>
</li>
</ol>
<h4 id="heading-cons">Cons:</h4>
<ol>
<li><p><strong>Not Robust to Overfitting</strong>: A higher R² doesn’t always mean a better model. It can sometimes mislead when dealing with overfitting, as a model might perfectly fit the training data but fail to generalize well to unseen data.</p>
</li>
<li><p><strong>Doesn't Handle Non-linearity Well</strong>: The “variance explained” interpretation of R² strictly holds for linear regression fitted with least squares. For complex models that don't follow a linear pattern, the number can still be computed, but it may not be an appropriate summary of the fit.</p>
</li>
<li><p><strong>Sensitive to Changes in Data</strong>: Since R² is based on sums of squares, small changes in the data can sometimes cause large variations in the R² value, especially when the data is noisy.</p>
</li>
</ol>
<h3 id="heading-adjusted-r-squared-a-better-alternative">Adjusted R-Squared: A Better Alternative?</h3>
<p>While R² is useful, it has a significant drawback: on the training data, it never decreases when you add more variables to a linear model, even if those variables aren’t actually contributing useful information. This means that R² can give you an inflated sense of the model's quality when you're working with multiple features.</p>
<p>To address this issue, <strong>Adjusted R-squared</strong> comes into play. Adjusted R² adjusts the statistic based on the number of predictors in the model. It is particularly useful when comparing models with different numbers of features.</p>
<p>The formula for Adjusted R² is:</p>
<p>$$Adj R^2 = 1 - (\frac{(1-R^2)(n-1)}{n-p-1})$$</p><p>Where:</p>
<ul>
<li><p><strong>n</strong> is the number of data points.</p>
</li>
<li><p><strong>p</strong> is the number of independent variables (predictors).</p>
</li>
</ul>
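<p>To get a feel for the penalty, here is the formula as a small function. The R² of 0.80 and the 50 data points are hypothetical numbers; only the number of predictors changes.</p>
<pre><code class="lang-python">def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n data points and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R-squared of 0.80 on 50 data points, with a growing number of predictors.
for p in (2, 10, 20):
    print(p, round(adjusted_r_squared(0.80, n=50, p=p), 3))
# 2 0.791
# 10 0.749
# 20 0.662
</code></pre>
<p>The more predictors it takes to reach the same R², the lower the adjusted value, which is the behaviour you want when comparing models with different numbers of features.</p>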
<h3 id="heading-when-to-use-r-squared">When to Use R-Squared?</h3>
<p>R² is most effective when you're dealing with linear regression models and are trying to evaluate the model's ability to capture the relationship between input features and the target variable. It can also be useful when comparing models on the same dataset. However, it's important to remember that a high R² doesn’t always indicate a good model, especially if the model is overfitting the data or fails to generalize well to new data.</p>
<p>In summary, R² is a valuable metric in regression analysis, but it should always be considered alongside other metrics like MAE and MSE, as well as techniques like cross-validation, to get a fuller picture of model performance.</p>
]]></content:encoded></item><item><title><![CDATA[Is the Model making right predictions? - Part 3 of 5 on Evaluation of Machine Learning Models]]></title><description><![CDATA[When it comes to evaluating machine learning models, picking the right metric is like picking the right outfit for an occasion—it can make or break your impression. Sure, accuracy and precision-recall are great, but they sometimes fall short. AUC-ROC...]]></description><link>https://japkeeratsingh.com/is-the-model-making-right-predictions-part-3-of-5-on-evaluation-of-machine-learning-models</link><guid isPermaLink="true">https://japkeeratsingh.com/is-the-model-making-right-predictions-part-3-of-5-on-evaluation-of-machine-learning-models</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[machine learning models]]></category><category><![CDATA[machine learning model evaluation]]></category><category><![CDATA[AUCROCCurve]]></category><category><![CDATA[evaluation metrics]]></category><category><![CDATA[Model Evaluation]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Wed, 11 Dec 2024 13:49:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733924615199/6d228ce0-0acd-410c-bdc0-44ef5e409363.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When it comes to evaluating machine learning models, picking the right metric is like picking the right outfit for an occasion—it can make or break your impression. Sure, <a target="_blank" href="https://japkeeratsingh.com/is-the-model-making-right-predictions-part-1-of-5-on-evaluation-of-machine-learning-models">accuracy</a> and <a target="_blank" href="https://japkeeratsingh.com/is-the-model-making-right-predictions-part-2-of-5-on-evaluation-of-machine-learning-models">precision-recall</a> are great, but they sometimes fall short. <strong>AUC-ROC Curve</strong> is that metric which tells a deeper story than other metrics.</p>
<p>Most classification models calculate a probability for each class and apply a predefined decision boundary of 0.5: any probability greater than 0.5 means the positive class, and anything below means the negative class. This may look good in theory, but it is not always practical. There are scenarios where you need the model to be extremely sure before making a positive classification (again, the rare disease classification example from the previous article), and the AUC-ROC Curve is a sensible tool for finding that decision boundary.</p>
<h2 id="heading-auc-roc-curve-a-quick-overview">AUC-ROC Curve: A Quick Overview</h2>
<p>Let’s break it down:</p>
<ul>
<li><p><strong>ROC</strong> stands for Receiver Operating Characteristic. Fancy name, but all it means is a graph that shows how well your model separates the positive and negative classes as you tweak the decision threshold.</p>
</li>
<li><p><strong>AUC</strong> stands for Area Under the Curve. This is the number that summarizes the ROC curve into one handy score. (Remember the area-under-the-curve concept from learning integration, the one that made many of us ask why we were learning it, and that your teacher probably couldn’t explain? Yeah, that area under the curve.)</p>
</li>
</ul>
<p>In plain English, the ROC curve tells you how good your model is at making the right calls, while the AUC is the gold star rating—higher is better.</p>
<h2 id="heading-why-should-you-care-about-the-auc-roc-curve">Why Should You Care About the AUC-ROC Curve?</h2>
<p>Let’s face it: <a target="_blank" href="https://japkeeratsingh.com/is-the-model-making-right-predictions-part-1-of-5-on-evaluation-of-machine-learning-models">accuracy</a> can be a real jerk sometimes. It looks good on paper but doesn’t always tell you the full story. Imagine a dataset where 99% of the cases are “No” and 1% are “Yes.” A model that just says “No” all the time scores a 99% accuracy. Impressive? Not really.</p>
<p>The AUC-ROC Curve doesn’t fall for such tricks. It doesn’t just focus on getting answers right—it checks if your model can <em>tell the difference</em> between the two classes. It’s like asking, “Does this model have good instincts?”</p>
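<p>Here is that “always say No” model as a quick sketch. The labels mirror the 99/1 example above, and the constant score of 0.0 stands in for a model that cannot tell the classes apart.</p>
<pre><code class="lang-python">from sklearn.metrics import accuracy_score, roc_auc_score

# 99 "No" cases and 1 "Yes" case, as in the example above.
y_true = [0] * 99 + [1]

# A "model" that always says "No" and gives every case the same score.
y_pred = [0] * 100
y_scores = [0.0] * 100

accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_scores)
print(accuracy, auc)  # 0.99 accuracy, but an AUC of 0.5
</code></pre>
<p>An AUC of 0.5 is what random guessing gets, so the 99% accuracy was never a sign of skill.</p>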
<h2 id="heading-breaking-down-the-roc-curve">Breaking Down the ROC Curve</h2>
<p>Here’s the gist:</p>
<ol>
<li><p><strong>True Positive Rate (TPR)</strong>, also known as <strong>Recall</strong>. This measures how good the model is at catching actual positives.</p>
<p>$$TPR = \frac{\text{TP}}{\text{TP} + \text{FN}}$$</p>
</li>
<li><p><strong>False Positive Rate (FPR)</strong>: This checks how often the model cries wolf when there isn’t one.</p>
<p>$$FPR = \frac{\text{FP}}{\text{FP} + \text{TN}}$$</p>
</li>
</ol>
<p>At every threshold, you plot these values to create the ROC curve. A perfect model would hit the top-left corner of the graph (TPR = 1, FPR = 0), which means it’s catching every positive without raising a single false alarm. The AUC quantifies this: the closer the curve gets to that corner, the higher the score.</p>
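<p>To make “at every threshold” concrete, here is a small sketch that computes one (TPR, FPR) point by hand. The helper <code>tpr_fpr</code> and the labels and scores are made up for illustration.</p>
<pre><code class="lang-python">import numpy as np

def tpr_fpr(y_true, y_scores, threshold):
    """Return (TPR, FPR) when scores at or above the threshold count as positive."""
    y_true = np.asarray(y_true)
    predicted_positive = np.greater_equal(y_scores, threshold)
    actual_positive = y_true == 1
    tpr = predicted_positive[actual_positive].mean()   # TP / (TP + FN)
    fpr = predicted_positive[~actual_positive].mean()  # FP / (FP + TN)
    return float(tpr), float(fpr)

# Toy labels and predicted probabilities (assumptions).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_scores = [0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.3, 0.5, 0.75):
    print(threshold, tpr_fpr(y_true, y_scores, threshold))
# 0.3 (1.0, 0.5)
# 0.5 (0.75, 0.25)
# 0.75 (0.5, 0.0)
</code></pre>
<p>Raising the threshold trades recall for fewer false alarms. The ROC curve is just this trade-off traced across all thresholds.</p>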
<h2 id="heading-so-how-do-you-use-it">So, How Do You Use It?</h2>
<p>You didn’t come here just for theory, right? Let’s get our hands dirty with some code. We’ll use Python and the trusty <code>scikit-learn</code> library to show how it’s done.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> make_classification
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_curve, roc_auc_score
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

<span class="hljs-comment"># Generate some synthetic data</span>
X, y = make_classification(n_samples=<span class="hljs-number">1000</span>, n_classes=<span class="hljs-number">2</span>, weights=[<span class="hljs-number">0.9</span>, <span class="hljs-number">0.1</span>], random_state=<span class="hljs-number">42</span>)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.3</span>, random_state=<span class="hljs-number">42</span>)

<span class="hljs-comment"># Train a model</span>
model = RandomForestClassifier(random_state=<span class="hljs-number">42</span>)
model.fit(X_train, y_train)

<span class="hljs-comment"># Get predicted probabilities</span>
y_scores = model.predict_proba(X_test)[:, <span class="hljs-number">1</span>]

<span class="hljs-comment"># Calculate the ROC curve</span>
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

<span class="hljs-comment"># Calculate AUC</span>
auc = roc_auc_score(y_test, y_scores)
print(<span class="hljs-string">f"AUC Score: <span class="hljs-subst">{auc:<span class="hljs-number">.2</span>f}</span>"</span>)

<span class="hljs-comment"># Plot the ROC curve</span>
plt.figure(figsize=(<span class="hljs-number">8</span>, <span class="hljs-number">6</span>))
plt.plot(fpr, tpr, label=<span class="hljs-string">f"ROC Curve (AUC = <span class="hljs-subst">{auc:<span class="hljs-number">.2</span>f}</span>)"</span>)
plt.plot([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>], [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>], <span class="hljs-string">'k--'</span>, label=<span class="hljs-string">"Random Guess"</span>)
plt.xlabel(<span class="hljs-string">"False Positive Rate"</span>)
plt.ylabel(<span class="hljs-string">"True Positive Rate"</span>)
plt.title(<span class="hljs-string">"ROC Curve"</span>)
plt.legend()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733924133007/076f70cb-1284-482c-8a04-062843c6715c.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-what-do-the-results-mean">What Do the Results Mean?</h2>
<ol>
<li><p><strong>AUC Score</strong>:</p>
<ul>
<li><p><strong>1.0</strong>: Your model’s a rockstar!</p>
</li>
<li><p><strong>0.5</strong>: Your model’s as good as flipping a coin.</p>
</li>
<li><p><strong>&lt; 0.5</strong>: Uh-oh, worse than a coin flip. Something’s seriously wrong (flipped labels or inverted scores are common culprits).</p>
</li>
</ul>
</li>
<li><p><strong>ROC Curve</strong>:</p>
<ul>
<li><p>The closer it hugs the top-left corner, the better.</p>
</li>
<li><p>A straight diagonal line? That’s random guessing.</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-when-to-use-auc-roc-vs-precision-recall">When to Use AUC-ROC vs. Precision-Recall</h2>
<p>Here’s a quick tip: AUC-ROC is great for balanced datasets. But when your data is heavily imbalanced (like fraud detection or rare disease diagnosis), <a target="_blank" href="https://japkeeratsingh.com/is-the-model-making-right-predictions-part-2-of-5-on-evaluation-of-machine-learning-models">precision-recall</a> curves often give more meaningful insights. Why choose one when you can compare both?</p>
<p>The <strong>AUC-ROC Curve</strong> <em>can</em> be used for imbalanced datasets, but it’s not always the best tool. Here’s why:</p>
<ul>
<li><p><strong>Balanced Datasets</strong>: The ROC curve works well because both the True Positive Rate (TPR) and False Positive Rate (FPR) are meaningful metrics. The model's ability to distinguish between classes is clear and reliable.</p>
</li>
<li><p><strong>Imbalanced Datasets</strong>: When there’s a severe imbalance, the False Positive Rate (FPR) can become misleading. Because the negative class dominates, the FPR stays very small even when the model raises many false alarms relative to the handful of true positives, so the ROC curve can look better than the model really is. In these cases, the <strong>Precision-Recall (PR) curve</strong> becomes more informative since it focuses on the positive class (which is usually the minority class in imbalanced datasets).</p>
</li>
</ul>
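<p>To see the two views side by side, here is a minimal sketch on the same kind of synthetic imbalanced data used earlier. It uses <code>average_precision_score</code> as the single-number summary of the precision-recall curve; the dataset, the <code>stratify</code> split and the <code>random_state</code> values are assumptions made for illustration, not results from a real project.</p>
<pre><code class="lang-python">from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Same kind of synthetic, imbalanced data as earlier (about 10% positives).
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_scores = model.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, y_scores)
# Average precision summarizes the precision-recall curve in one number.
pr_auc = average_precision_score(y_test, y_scores)
# A random classifier scores about 0.5 on ROC AUC, but only about the
# positive-class share on average precision.
baseline = y_test.mean()

print(f"ROC AUC: {roc_auc:.2f}")
print(f"Average precision: {pr_auc:.2f} (random baseline is about {baseline:.2f})")
</code></pre>
<p>The point of comparing both is the baseline: 0.5 for ROC AUC versus roughly the share of positives for average precision, so the same model can look strong on one and modest on the other.</p>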
<h2 id="heading-using-the-roc-curve-to-set-decision-boundaries">Using the ROC Curve to Set Decision Boundaries</h2>
<p>The AUC-ROC Curve isn’t just about evaluating performance; it’s also a handy tool for finding the optimal decision boundary for your model.</p>
<p>By default, many binary classifiers use <strong>0.5</strong> as the threshold for assigning a positive or negative class. However, this one-size-fits-all approach might not work in all cases. For instance:</p>
<ul>
<li><p>In <strong>medical diagnostics</strong>, false negatives (missing an actual disease) can be catastrophic. You might want to set a threshold closer to <strong>0.3</strong> or <strong>0.4</strong> to ensure fewer false negatives, even if it means slightly more false positives.</p>
</li>
<li><p>In <strong>spam detection</strong>, false positives (marking a legitimate email as spam) are annoying. You might prefer a threshold closer to <strong>0.7</strong> or <strong>0.8</strong> to minimize those errors.</p>
</li>
</ul>
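<p>Here is a small sketch of what moving the threshold does to the two kinds of error. The labels and scores are made up for illustration, and <code>counts_at</code> is a hypothetical helper, not a scikit-learn function.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.90])

def counts_at(threshold):
    """Return (false negatives, false positives) when scores at or above the threshold count as positive."""
    y_pred = np.greater_equal(y_scores, threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(fn), int(fp)

for threshold in (0.3, 0.5, 0.7):
    fn, fp = counts_at(threshold)
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
</code></pre>
<p>With these made-up numbers, dropping the threshold to 0.3 removes every false negative at the cost of 3 false positives, while raising it to 0.7 leaves 1 false positive but misses 2 actual positives.</p>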
<h3 id="heading-how-the-roc-curve-helps">How the ROC Curve Helps</h3>
<p>The ROC curve gives you a visual way to assess these trade-offs:</p>
<ol>
<li><p>At different thresholds, calculate the True Positive Rate (TPR) and False Positive Rate (FPR).</p>
</li>
<li><p>Choose the threshold where the balance between TPR and FPR aligns with your business goals.</p>
</li>
</ol>
<p>For example:</p>
<ul>
<li><p>A steeper curve at the top-left corner means the model achieves high TPR with minimal FPR. This is a good spot to consider your threshold.</p>
</li>
<li><p>A threshold closer to <strong>0.8</strong> may prioritize precision over recall, useful for high-stakes scenarios where false positives are costly.</p>
</li>
</ul>
<h3 id="heading-how-to-find-the-optimal-decision-boundary-from-tpr-amp-fpr">How to find the optimal decision boundary from TPR &amp; FPR?</h3>
<p>You can calculate <strong>Youden’s Index</strong> for this. It sounds fancy, but it is simply the difference between TPR &amp; FPR at a given threshold (J = TPR - FPR). The threshold where this index is highest is a good candidate for your decision boundary, assuming false positives and false negatives matter equally to you.</p>
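<p>As a minimal sketch, the index can be computed directly from the arrays that <code>roc_curve</code> returns. The labels and scores below are hypothetical; in practice you would pass the <code>y_test</code> and <code>y_scores</code> from the model trained earlier.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and scores; in practice use y_test and y_scores from the model above.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Youden's index at each candidate threshold.
j_scores = tpr - fpr
best = int(np.argmax(j_scores))

print(f"Best threshold: {thresholds[best]:.2f}")
print(f"TPR: {tpr[best]:.2f}, FPR: {fpr[best]:.2f}, J: {j_scores[best]:.2f}")
</code></pre>
<p>Keep in mind that this treats both error types as equally costly. If one is clearly worse for your use case, as in the medical and spam examples, weigh the trade-off yourself instead of blindly taking the maximum.</p>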
]]></content:encoded></item><item><title><![CDATA[Is the Model making right predictions? - Part 2 of 5 on Evaluation of Machine Learning Models]]></title><description><![CDATA[We have already discussed accuracy as a metric, its limitations and confusion matrix in the previous post in the series. This post will cover the metrics that we can derive from confusion matrix and how they serve as a better alternative than looking...]]></description><link>https://japkeeratsingh.com/is-the-model-making-right-predictions-part-2-of-5-on-evaluation-of-machine-learning-models</link><guid isPermaLink="true">https://japkeeratsingh.com/is-the-model-making-right-predictions-part-2-of-5-on-evaluation-of-machine-learning-models</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[evaluation metrics]]></category><category><![CDATA[Precision]]></category><category><![CDATA[recall]]></category><category><![CDATA[f1 score]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Mon, 09 Dec 2024 15:16:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1732101787636/1689fe54-c82a-4ed7-ae9f-c59903fc1a84.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We have already discussed <a target="_blank" href="https://japkeeratsingh.com/is-the-model-making-right-predictions-part-1-of-5-on-evaluation-of-machine-learning-models">accuracy as a metric, its limitations and confusion matrix in the previous post in the series</a>. This post will cover the metrics that we can derive from confusion matrix and how they serve as a better alternative than looking at accuracy as a metric for classification problems.</p>
<p>Each cell of this confusion matrix has its own name. To understand the terminology, we need to rename <strong>Class A</strong> and <strong>Class B</strong> from the previous example to <strong>Positive</strong> and <strong>Negative</strong>. Our matrix now looks like this (rows are actual labels, columns are predicted labels):</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td><strong>Positive</strong></td><td><strong>Negative</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Positive</strong></td><td>45</td><td>5</td></tr>
<tr>
<td><strong>Negative</strong></td><td>12</td><td>38</td></tr>
</tbody>
</table>
</div><p>When the actual label is positive and the predicted one is positive as well, that scenario is called a <strong>True Positive (TP)</strong>. Similarly, when the actual label is negative and the predicted one is negative as well, that scenario is called a <strong>True Negative (TN)</strong>.</p>
<p>The matrix will now look like this</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td><strong>Positive</strong></td><td><strong>Negative</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Positive</strong></td><td>True Positive (TP)</td><td>-</td></tr>
<tr>
<td><strong>Negative</strong></td><td>-</td><td>True Negative (TN)</td></tr>
</tbody>
</table>
</div><p>When the prediction is negative but the actual output is positive, the scenario will be called a <strong>False Negative (FN)</strong> and similarly, when the actual is negative but prediction is positive, the scenario becomes a <strong>False Positive (FP)</strong>.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td><strong>Positive</strong></td><td><strong>Negative</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Positive</strong></td><td>True Positive (TP)</td><td>False Negative (FN)</td></tr>
<tr>
<td><strong>Negative</strong></td><td>False Positive (FP)</td><td>True Negative (TN)</td></tr>
</tbody>
</table>
</div><p>Now, the False Positive is sometimes referred to as <strong>Type I Error</strong> and False Negative is referred to as <strong>Type II Error</strong>. Why Type I and Type II Errors? It will be discussed separately as it is a large topic of its own.</p>
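<p>If you want to pull these four cells out in code, here is a minimal sketch using <code>confusion_matrix</code> from <code>scikit-learn</code>. The labels are hypothetical, and passing <code>labels=[1, 0]</code> is a choice made here so that the layout matches the table above.</p>
<pre><code class="lang-python">from sklearn.metrics import confusion_matrix

# Hypothetical labels, for illustration only (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# labels=[1, 0] puts the positive class first, matching the table above.
# Without it, scikit-learn sorts the labels, so the negative class comes first.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
</code></pre>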
<p>With terminologies completed, let’s derive a few metrics that we can use.</p>
<h2 id="heading-precision">Precision</h2>
<p>Precision as a metric explains how precise the model is when predicting a positive output. Meaning, whenever the machine learning model predicted a positive output, how many times was it indeed positive.</p>
<p>Mathematically, it can be written as</p>
<p>$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$</p><p>Simple, right?</p>
<p>Precision comes in handy during development of the machine learning models for which being correct when making a positive prediction is extremely important. One particular example comes in when building a machine learning model for rare disease identification.</p>
<p>Let’s say you build a model for which you get</p>
<ul>
<li><p>True Positives = 50</p>
</li>
<li><p>False Positives = 150</p>
</li>
<li><p>True Negative = 9750</p>
</li>
<li><p>False Negative = 50</p>
</li>
</ul>
<p>Going by these numbers, we get an accuracy of 98%. A really great number, isn’t it? But looking at the precision, we are correct only 25% of the time when we say a person is positive for the disease.</p>
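<p>The arithmetic behind those two numbers, as a quick sketch using the counts from the example above:</p>
<pre><code class="lang-python"># Counts from the rare-disease example above.
tp, fp, tn, fn = 50, 150, 9750, 50

# Correct predictions divided by all predictions.
accuracy = (tp + tn) / (tp + fp + tn + fn)
# Correct positive predictions divided by all positive predictions.
precision = tp / (tp + fp)

print(f"Accuracy: {accuracy:.0%}")
print(f"Precision: {precision:.0%}")
</code></pre>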
<h2 id="heading-recall">Recall</h2>
<p>Recall tells you what fraction of the actual positive examples the model detects.</p>
<p>Mathematically, you can write it as</p>
<p>$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$</p><p>Looking at the same example of rare disease identification, we see that the model identifies only 50% of the positive samples. Meaning, 50% of the people that are actually ill will have a negative test report which means they will not be able to get treatment on time for the disease. We don’t want that to happen at all.</p>
<hr />
<p><em>A little detour as this concept is going to come in great detail later in the series but a basic idea is required to understand the next metric.</em></p>
<p><em>When we develop a machine learning model, we usually perform hyperparameter tuning to identify the right set of hyperparameters which gives the best results. To do so in an automated fashion, we can make a simpler algorithm if we try to optimize for a single metric.</em></p>
<hr />
<h2 id="heading-f1-score">F1 Score</h2>
<p>From the detour, you have gotten the gist of what this metric is. It combines both Precision &amp; Recall to a single metric. It is highly useful for scenarios where both Precision &amp; Recall need to be optimized.</p>
<p>F1 Score is a harmonic mean of both Precision &amp; Recall. If you don’t know what harmonic mean is, it’s this formula:</p>
<p>$$\frac{2}{\text{F1 Score}} = \frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}$$</p><p>For the same use case as above, if we put in precision &amp; recall, we get an F1 score of 33.3%.</p>
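<p>Here is that calculation as a short sketch, with the harmonic mean formula rearranged to solve for the F1 Score directly:</p>
<pre><code class="lang-python"># Counts from the rare-disease example above.
tp, fp, fn = 50, 150, 50

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall, rearranged from the formula above.
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.1%}, Recall: {recall:.1%}, F1: {f1:.1%}")
</code></pre>
<p>Notice that the harmonic mean (33.3%) sits closer to the lower of the two values than a plain average (37.5%) would, which is why F1 punishes a model that is good at only one of them.</p>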
]]></content:encoded></item><item><title><![CDATA[Is the Model making right predictions? - Part 1 of 5 on Evaluation of Machine Learning Models]]></title><description><![CDATA[A student has exams after their training is done. So does the model. There are certain algorithms (now here we don’t mean machine learning models) that are used depending on the problem you have trained the machine learning model for. This is an extr...]]></description><link>https://japkeeratsingh.com/is-the-model-making-right-predictions-part-1-of-5-on-evaluation-of-machine-learning-models</link><guid isPermaLink="true">https://japkeeratsingh.com/is-the-model-making-right-predictions-part-1-of-5-on-evaluation-of-machine-learning-models</guid><category><![CDATA[machine learning model evaluation]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[introduction to machine learning]]></category><category><![CDATA[machine learning models]]></category><category><![CDATA[Machine Learning algorithm]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Sat, 12 Oct 2024 12:32:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728734984908/7036409b-4461-4a55-ab51-93c5cf34c13c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A student has exams after their training is done. So does the model. There are certain algorithms (now here we don’t mean machine learning models) that are used depending on the problem you have trained the machine learning model for. This is an extremely important concept which most courses that you’ll find give the least weight to and therefore, it is one of the earliest topics of this series.</p>
<p>Before jumping right into the algorithms, we first need to discuss one more idea - preparing the data for testing. This is a topic that will be discussed in a separate post in excruciating detail. For now, what we’ll do is take the example dataset from the previous post of this series and split it into 2 datasets - one will be used for training and another for testing. This way, we have a dataset for which we know the actual answers and can easily compare them with the output of the machine learning models we develop.</p>
<p>To do so, we’ll again make use of the <code>scikit-learn</code> library and a function called <code>train_test_split</code> that does exactly what we just described.</p>
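<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LogisticRegression
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split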

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.3</span>)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
</code></pre>
<p>A test size of 0.3 means that 30% of the input data is reserved for testing. This way, we have the actual output the model should give stored in the variable <code>y_test</code> and the output the model actually gave in <code>y_pred</code>.</p>
<p>Another important thing is to answer 3 core questions -</p>
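<p>To check what the split actually produces, here is a small sketch on a hypothetical dataset of 1,000 samples (the toy dataset from the previous post is too small to show the 70/30 ratio cleanly). The <code>random_state</code> argument is an optional addition that makes the split reproducible.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset of 1,000 samples, for illustration only.
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000) % 2

# random_state is optional; fixing it makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(len(X_train), len(X_test))
</code></pre>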
<ol>
<li><p>Why do we evaluate the model?</p>
</li>
<li><p>What insights do we need to collect from a model in order to make better decisions?</p>
</li>
<li><p>How do we use the metric to optimize the model? (This will be covered in a separate post later in the series)</p>
</li>
</ol>
<p>So all the evaluation algorithms that we are going to look at, we will try to answer these 3 questions. Let’s begin with the evaluation of classification models.</p>
<h3 id="heading-accuracy">Accuracy</h3>
<p>Perhaps one of the most straightforward metrics. Let’s say you took an MCQ test that had 100 questions. You answered 74 correctly. Your accuracy is 74%. This is a calculation that all of us have intuitively been doing whenever we see ratios.</p>
<p>In really simple terms, Accuracy is defined as how many times you were correct divided by how many attempts you made. In terms of Machine Learning, you calculate accuracy by how many times the output of the model was correct divided by how many times the model was used.</p>
<p>We can calculate accuracy of our machine learning models using <code>accuracy_score</code> function from the <code>metrics</code> module of the library.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score

accuracy = accuracy_score(y_test, y_pred)
</code></pre>
<p>Now, if accuracy were the best metric, it would have been the only metric to exist, this blog post would be over right here, and I’d probably go to sleep instead of writing this at 11:00PM on a Saturday. But alas, it isn’t.</p>
<p>Let’s take 2 examples here -</p>
<p>Example A:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Actual</td><td>Predicted</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>0</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>0</td><td>0</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>0</td><td>0</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>0</td><td>0</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>0</td><td>0</td></tr>
</tbody>
</table>
</div><p>In this case, there are 9 correct predictions and 1 wrong output. This means, the model is 90% accurate.</p>
<p>Example B:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Actual</td><td>Predicted</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>0</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
<tr>
<td>1</td><td>1</td></tr>
</tbody>
</table>
</div><p>In this case as well, the accuracy of the model is 90%. But if you look closely, the model got it wrong every time it should have predicted 0. This scenario occurs in a lot of problems (credit card fraud identification, for instance) where there are very few examples of a certain class during training and testing, due to which the model gets biased towards the majority class if not trained carefully. This means that, for any problem with an imbalance between classes, accuracy alone is the wrong metric to use.</p>
<p>There is one more challenge with using accuracy as the source of truth when working with a multi-class classification problem. It will not help you identify, at a granular level, whether the model is confusing two of the classes, is biased towards a single class, or is straight up guessing and got lucky.</p>
<p>All these challenges mean that we need to look into slightly more sophisticated evaluation algorithms that do a bit more than just provide a number as an output.</p>
<p>These challenges lead us to a stepping stone towards the solution - Confusion Matrix.</p>
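<p>A quick sketch of the same trap in code. The labels below are hypothetical (9 positives, 1 negative, and a model that always predicts 1), and <code>recall_score</code> with <code>pos_label=0</code> is used here only as a convenient way to ask “how many of the actual 0s did the model catch?”.</p>
<pre><code class="lang-python">from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced test set: 9 positives and 1 negative.
y_true = [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
# A model that has learned to always predict the majority class.
y_pred = [1] * 10

accuracy = accuracy_score(y_true, y_pred)
# Recall for class 0: the share of actual 0s the model identified.
recall_for_zero = recall_score(y_true, y_pred, pos_label=0)

print(f"Accuracy: {accuracy:.0%}")
print(f"Recall for class 0: {recall_for_zero:.0%}")
</code></pre>
<p>The accuracy looks healthy while the model never identifies a single 0, which is exactly the kind of failure that a lone accuracy number hides.</p>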
<h3 id="heading-confusion-matrix">Confusion Matrix</h3>
<p>It is not an evaluation metric, and that is worth clarifying right at the start. It is something you can use alongside the primary metric to answer the question “is my model getting confused between two classes?” while using the primary metric to optimize the model.</p>
<p>A confusion matrix can look something like this:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Class A</td><td>Class B</td></tr>
</thead>
<tbody>
<tr>
<td>Class A</td><td>45</td><td>5</td></tr>
<tr>
<td>Class B</td><td>12</td><td>38</td></tr>
</tbody>
</table>
</div><p>Rows of the confusion matrix are the actual classes while the columns are the predicted classes. To interpret the above matrix, there were 45 instances where the model predicted output as ‘A’ and it indeed was ‘A’ while 5 times it predicted ‘B’ while it was actually ‘A’. Similarly, 12 times model predicted ‘A’ while it was ‘B’ and 38 times the model correctly predicted ‘B’.</p>
<p>Similar to how you get accuracy in <code>scikit-learn</code>, there is a function for confusion matrix that you can use to get the matrix.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> confusion_matrix

cm = confusion_matrix(y_test, y_pred)

print(cm)
</code></pre>
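<p>To make the row and column convention concrete, here is a sketch that rebuilds the Class A / Class B matrix shown earlier from its four counts. The label lists are constructed by hand purely for illustration.</p>
<pre><code class="lang-python">from sklearn.metrics import confusion_matrix

# Rebuild the Class A / Class B example above from its four counts.
y_true = ["A"] * 50 + ["B"] * 50
y_pred = ["A"] * 45 + ["B"] * 5 + ["A"] * 12 + ["B"] * 38

# Rows are actual classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=["A", "B"])
print(cm)
</code></pre>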
<p>This matrix addresses one of the two core challenges of accuracy and allows you to understand if the model is confusing two classes. However, it doesn’t yet solve the imbalanced dataset issue, for which accuracy is a big challenge.</p>
<p>That’s where we get 3 metrics - Precision, Recall, and F1 Score - all of which will be discussed in the next article of the series.</p>
]]></content:encoded></item><item><title><![CDATA[Introduction to Machine Learning]]></title><description><![CDATA[Let’s take an example. You are designing an automation script to perform a particular activity. To develop an automation, you first need the data and the well defined set of rules to follow to reach the desired output.

In Machine Learning, things ar...]]></description><link>https://japkeeratsingh.com/introduction-to-machine-learning</link><guid isPermaLink="true">https://japkeeratsingh.com/introduction-to-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[introduction to machine learning]]></category><category><![CDATA[machine learning for beginners]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Sat, 12 Oct 2024 09:06:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728723836706/f1766d73-067f-4c53-841a-604fab7541a5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let’s take an example. You are designing an automation script to perform a particular activity. To develop an automation, you first need the data and the well defined set of rules to follow to reach the desired output.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728723299487/4da2f22d-d844-42d3-8b69-db9358d28f5b.jpeg" alt class="image--center mx-auto" /></p>
<p>In <a target="_blank" href="https://arc.net/l/quote/alauteez">Machine Learning</a>, things are a little different. You don’t make the rules. You know what the input is, you know what the output is but don’t really know what operations helped in reaching the specific output.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728723311275/4c37f8c5-9a16-4c4e-bdbc-1a89166863f6.jpeg" alt class="image--center mx-auto" /></p>
<p>Let’s take a very simple example to start with.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Input</td><td>Output</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>2</td></tr>
<tr>
<td>2</td><td>4</td></tr>
<tr>
<td>3</td><td>8</td></tr>
<tr>
<td>4</td><td>16</td></tr>
<tr>
<td>5</td><td>32</td></tr>
<tr>
<td>6</td><td>64</td></tr>
</tbody>
</table>
</div><p>In this example, you have an input and an output. If I ask you what the output would be when the input is 7, it is pretty straightforward for you to determine the operation here. It is 2^n where n is the input number, so the answer is 128. We want to give a computer the same pattern-finding ability with Machine Learning so that it can find patterns that even we cannot.</p>
<p>This is a subset of data from a real-world problem being solved with Machine Learning. Try to find a pattern; you have 10 seconds.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Input A</td><td>Input B</td><td>Input C</td><td>Input D</td><td>Output</td></tr>
</thead>
<tbody>
<tr>
<td>30</td><td>70000</td><td>12000</td><td>11.11</td><td>0</td></tr>
<tr>
<td>22</td><td>33000</td><td>10000</td><td>11.12</td><td>1</td></tr>
<tr>
<td>22</td><td>56000</td><td>4000</td><td>13.35</td><td>0</td></tr>
<tr>
<td>25</td><td>25000</td><td>3500</td><td>13.49</td><td>1</td></tr>
<tr>
<td>37</td><td>35000</td><td>6000</td><td>11.49</td><td>0</td></tr>
</tbody>
</table>
</div><p>Couldn’t find one, right? Well it is just 5 examples and only 4 input values. Most datasets are well over 100,000 examples and can span anywhere from 5 to 500 columns (no hard limit on the columns, just stating a number to make a point).</p>
<p>To solve such problems, we make use of Machine Learning. The concepts involved in Machine Learning go both wide and deep. At every turn of applying Machine Learning, you need to make a decision. Which algorithm should I use? How do I check if the algorithm gave a good output? What operations should I do on the data to make it easier for the algorithm to learn? These are just 3 of the questions you’d need answers to before you design a solution. There are tens of such questions, if not a hundred, and the number keeps increasing as you gain experience.</p>
<p>If you have read my previous posts, you’d know how much I emphasize a top-down approach of learning. However, it can’t be used here today. We need to touch base on 2 concepts, before we come back to the top-down approach for the remainder of this series. Please bear with me on this.</p>
<h2 id="heading-concept-1-types-of-machine-learning-algorithms">Concept 1 - Types of Machine Learning Algorithms</h2>
<p>Primarily, there are 2 types of Machine Learning - <a target="_blank" href="https://arc.net/l/quote/sepjmskf">Supervised</a> and <a target="_blank" href="https://arc.net/l/quote/nrvaxizk">Unsupervised</a>. There are more, but let’s not go into that rabbit hole at the start itself.</p>
<p>In Supervised Machine Learning, you know both the input and the output but don’t know how to get from one to the other. The algorithms you use to learn that mapping fall in the category of Supervised Machine Learning.</p>
<p>In Unsupervised Machine Learning, however, you don’t have the output either! Let’s take a real-world example, something I’ve worked on before, to understand what the heck goes into unsupervised machine learning. For any application, sending personalized notifications is extremely important. However, it is not really feasible to write personalized notifications for each user (at least it wasn’t before the ChatGPT era). So you use the information you have about the users to club them into “N” groups and write a notification for each group. Here, you don’t necessarily have predefined groups. You want the algorithm to determine by itself which group each user should be in.</p>
<h2 id="heading-concept-2-problem-types-solved-with-machine-learning">Concept 2 - Problem Types solved with Machine Learning</h2>
<p>Under Supervised Machine Learning, we have 2 problem types - Classification and Regression. Under Unsupervised, we have Clustering (the same example we looked at before).</p>
<p>These problem types are mostly self-explanatory. For Classification problems, we develop an algorithm that is able to assign a data point to one of the classes the algorithm is designed for. A very famous example you’ll come across when starting with Machine Learning is a Dog vs Cat classifier, where you try to build an algorithm that can classify images of a pet into 2 categories - dog or cat. Regression problems, on the other hand, are the ones where the algorithm tries to predict a continuous value instead of a discrete class. An algorithm that estimates tomorrow’s temperature would fall in this category.</p>
<p>Clustering problems, a part of unsupervised machine learning, require you to make groups out of the data that you have. A prominent example of this has already been discussed above.</p>
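<p>To give a feel for what clustering looks like in code, here is a minimal sketch of the notification example using <code>KMeans</code> from <code>scikit-learn</code>. The user features and the choice of 2 groups are made up for illustration; this is not the setup from the real project.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.cluster import KMeans

# Hypothetical user features: [sessions per week, average minutes per session].
users = np.array([
    [1, 5], [2, 4], [1, 6],        # occasional users
    [20, 30], [22, 28], [19, 33],  # heavy users
])

# n_clusters is the "N" groups; there are no labels, only inputs.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
groups = kmeans.fit_predict(users)

print(groups)
</code></pre>
<p>The group numbers themselves are arbitrary. What matters is that similar users end up with the same number, without anyone telling the algorithm what the groups should be.</p>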
<h2 id="heading-time-to-get-hands-dirty">Time to get hands dirty</h2>
<p>We’ll start with a toy dataset.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Input</strong></td><td><strong>Output</strong></td></tr>
</thead>
<tbody>
<tr>
<td>9</td><td>0</td></tr>
<tr>
<td>12</td><td>1</td></tr>
<tr>
<td>13</td><td>1</td></tr>
<tr>
<td>4</td><td>0</td></tr>
<tr>
<td>6</td><td>0</td></tr>
<tr>
<td>11</td><td>1</td></tr>
<tr>
<td>7</td><td>0</td></tr>
</tbody>
</table>
</div><p>This dataset is relatively easy. By just looking at the pattern, it is easy to identify that any input below 10 is getting mapped to 0 while anything larger than 10 is getting mapped to 1.</p>
<p>Let’s implement a really simple script in python.</p>
<p>To do so, we’ll leverage a library called <code>scikit-learn</code> which you can install by running the following command:</p>
<pre><code class="lang-bash">pip install scikit-learn
</code></pre>
<p>scikit-learn is one of the most popular libraries for Machine Learning and implements a vast majority of classical algorithms, along with optimizations and various other concepts that we’ll cover in later topics.</p>
<p>First, let’s get the data ready:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

X = np.array([<span class="hljs-number">9</span>, <span class="hljs-number">12</span>, <span class="hljs-number">13</span>, <span class="hljs-number">4</span>, <span class="hljs-number">6</span>, <span class="hljs-number">11</span>, <span class="hljs-number">7</span>]).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
y = np.array([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>])
</code></pre>
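<p>This creates <code>NumPy</code> arrays for both the input and the output. The <code>reshape(-1, 1)</code> call turns the input into a single column, because <code>scikit-learn</code> expects a 2D array with one row per sample.</p>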
<p>Next, we’ll implement a Logistic Regression algorithm (don’t worry, we’ll cover it) from <code>scikit-learn</code>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LogisticRegression

model = LogisticRegression()

model.fit(X, y)
</code></pre>
<p>That’s basically it. You have implemented your first algorithm. [From now on, we’ll call Machine Learning algorithms “models” to be more in line with Machine Learning terminology.]</p>
<p>In Machine Learning terms, when we feed existing data to the model, we are essentially training it. To train a model, all the models have a <code>.fit()</code> function where we pass our existing data.</p>
<p>So what we have done here is essentially trained the Logistic Regression algorithm on the data that we previously generated.</p>
<p>Now, to get predictions on the new set of data, you can just do:</p>
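<pre><code class="lang-python"><span class="hljs-comment"># New inputs need the same 2D shape as the training data: one row per sample</span>
X_new = np.array([<span class="hljs-number">8</span>, <span class="hljs-number">18</span>]).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)

y_predictions = model.predict(X_new)

print(y_predictions)
<span class="hljs-comment"># [0 1]</span>
</code></pre>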
<p>And that’s it. We have our first model, trained and doing predictions!</p>
<hr />
<p><em>If this was helpful, you can follow the blog or bookmark the series and I’ll be sharing 1-2 articles every week on this series.</em></p>
]]></content:encoded></item><item><title><![CDATA[A Beginner's Guide to Ollama]]></title><description><![CDATA[Ollama is an open-source framework designed to facilitate the local execution of Large Language Models (LLMs) such as LLaMa and others. It allows you to run these models directly on your machines, providing a secure and customizable environment witho...]]></description><link>https://japkeeratsingh.com/a-beginners-guide-to-ollama</link><guid isPermaLink="true">https://japkeeratsingh.com/a-beginners-guide-to-ollama</guid><category><![CDATA[llm]]></category><category><![CDATA[ollama]]></category><category><![CDATA[LLaMa]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Thu, 19 Sep 2024 09:00:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1726496290257/6a833a59-3021-4d8d-ad9f-3dd1e7f0ce03.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ollama is an open-source framework designed to facilitate the local execution of Large Language Models (LLMs) such as LLaMa and others. It allows you to run these models directly on your machines, providing a secure and customizable environment without relying on cloud services.</p>
<p>In this article, we’ll cover how to use Ollama. In the next, we will discuss the intricacies of Ollama to the depth that no other blog has gone before.</p>
<h1 id="heading-building-your-own-chatbot-with-ollama">Building your own Chatbot with Ollama</h1>
<h2 id="heading-step-1-get-the-server-running">Step 1 - Get the Server running</h2>
<p>You will first need to install the Ollama server. The process varies a bit depending on the operating system you are using, so I’ll list all 3 here.</p>
<p><strong>MacOS</strong></p>
<p>There are essentially 2 different ways for you to get started with Ollama on Mac. We’ll of course cover the easiest one. Installation with <code>brew</code>. Running the following command will install the Ollama server on your system.</p>
<pre><code class="lang-bash">brew install ollama
</code></pre>
<p><em>I told you it would be really easy to do so.</em></p>
<p><strong>Linux</strong></p>
<p>To install Ollama on Linux, just copy the following command.</p>
<pre><code class="lang-bash">curl -fsSL https://ollama.com/install.sh | sh
</code></pre>
<p><strong>Windows</strong></p>
<p>On Windows, Ollama is still in preview, but you can download the Ollama .exe from <a target="_blank" href="https://ollama.com/download/windows">https://ollama.com/download/windows</a> and that will install Ollama on your system.</p>
<h2 id="heading-step-2-testing-if-it-works">Step 2 - Testing if it works</h2>
<p>On Windows, you just have to open the .exe you downloaded to start the server. On the other two, we can start the server with the command</p>
<pre><code class="lang-bash">ollama serve
</code></pre>
<p>This will start the server on the machine itself.</p>
<p>Note that at this stage, the server doesn’t have access to any LLM. To download a model onto the server, you’ll need to use a different command.</p>
<h2 id="heading-step-3-downloading-the-model">Step 3 - Downloading the Model</h2>
<p>Downloading the model is pretty straightforward. Knowing which one to download is not. There are 116 different LLMs that Ollama supports at the time of writing, and that excludes the different variants of each model.</p>
<p>You can explore the entire repository of models that Ollama supports at <a target="_blank" href="https://ollama.com/library">https://ollama.com/library</a></p>
<p>For the purpose of this blog, I’ll stick to the Qwen2 model. Not because it is better or anything, but because it has one variant that is just 350MB, making it one of the few models that I can run on my laptop itself without renting a VM.</p>
<p>To download, open a separate terminal while keeping the server running (pro-tip, you can also keep the server running in the background by adding <code>&amp;</code> to the end of the command and keep using the same terminal for next steps) and execute the following command.</p>
<pre><code class="lang-bash">ollama pull qwen2:0.5b-instruct
</code></pre>
<p>It will download the Qwen2 variant that has 0.5 billion parameters. In the same way, you can download llama3.1, phi3, or any other model that suits your needs.</p>
<h2 id="heading-step-4-interact-with-the-model">Step 4 - Interact with the model</h2>
<p>We can interact with the model directly from the command line itself. To prompt a model with the prompt “Who is the Prime Minister of India?”, you will need to run,</p>
<pre><code class="lang-bash">ollama run qwen2:0.5b-instruct <span class="hljs-string">"Who is the Prime Minister of India?"</span>
</code></pre>
<p>This command will first load the model onto the GPU or the CPU, whichever is available, and then prompt the model with your query. Internally, it makes an API call to the “/api/generate” endpoint, which you will be able to see in the logs printed on the terminal alongside the response.</p>
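<p>If you are curious what that API call looks like, here is a minimal sketch using only Python’s standard library. It assumes the server is listening on <code>http://localhost:11434</code> (the same address the chat app in Step 5 uses) and that a non-streaming request is enough. <code>build_payload</code> and <code>generate</code> are names I made up for this illustration, not part of Ollama.</p>

```python
import json
import urllib.request

# Assumed default local address of the Ollama server
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model, prompt, url=OLLAMA_URL):
    """POST the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


# Usage (needs the server running and the model already pulled):
# print(generate("qwen2:0.5b-instruct", "Who is the Prime Minister of India?"))
```

<p>The usage line is commented out because it only works while the server is running.</p>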
<h2 id="heading-step-5-building-your-chat-app-with-this">Step 5 - Building your Chat app with this</h2>
<p>To do so, we’ll use Ollama’s Python client (there is also a JavaScript one) and use Streamlit to build the interface for the chat application.</p>
<p>Installing the dependencies.</p>
<pre><code class="lang-bash">pip install ollama streamlit
</code></pre>
<p>Next, just copy-paste this code into a file named <code>main.py</code>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st
<span class="hljs-keyword">from</span> ollama <span class="hljs-keyword">import</span> Client

client = Client(host=<span class="hljs-string">"http://localhost:11434"</span>)

st.title(<span class="hljs-string">"Hey, it's just like ChatGPT, but free!"</span>)

<span class="hljs-comment"># Initialize chat history</span>
<span class="hljs-keyword">if</span> <span class="hljs-string">"messages"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> st.session_state:
    st.session_state.messages = []

<span class="hljs-comment"># Display chat messages from history on app rerun</span>
<span class="hljs-keyword">for</span> message <span class="hljs-keyword">in</span> st.session_state.messages:
    <span class="hljs-keyword">with</span> st.chat_message(message[<span class="hljs-string">"role"</span>]):
        st.markdown(message[<span class="hljs-string">"content"</span>])

<span class="hljs-comment"># Accept user input</span>
<span class="hljs-keyword">if</span> prompt := st.chat_input(<span class="hljs-string">"What is up?"</span>):
    <span class="hljs-comment"># Add user message to chat history</span>
    st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: prompt})

    <span class="hljs-comment"># Display user message in chat message container</span>
    <span class="hljs-keyword">with</span> st.chat_message(<span class="hljs-string">"user"</span>):
        st.markdown(prompt)

    <span class="hljs-comment"># Generate and display assistant response (example)</span>
    response = client.chat(model=<span class="hljs-string">"qwen2:0.5b-instruct"</span>, messages=st.session_state.messages)[<span class="hljs-string">"message"</span>][<span class="hljs-string">"content"</span>]
    <span class="hljs-keyword">with</span> st.chat_message(<span class="hljs-string">"assistant"</span>):
        st.markdown(response)

    <span class="hljs-comment"># Add assistant response to chat history</span>
    st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: response})
</code></pre>
<p><em>I too generated this code with Perplexity and just added the model name</em> 😂 <em>If you are new to Streamlit, there is</em> <a target="_blank" href="https://www.youtube.com/playlist?list=PLa6CNrvKM5QU7AjAS90zCMIwi9RTFNIIW"><em>this wonderful playlist</em></a> <em>you can check.</em></p>
<p>Now, just execute the script with</p>
<pre><code class="lang-bash">streamlit run main.py
</code></pre>
<p>And you’ll instantly have a chat interface like below</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726495644114/31a68003-995a-4c8a-b23c-9fbd4079fa79.png" alt class="image--center mx-auto" /></p>
<hr />
<p>And there we have it. Our own AI chatbot!</p>
<p>In next week’s article, we will discuss Ollama in much more detail. I’ll share everything I learnt over the last two weeks of reading the Ollama codebase (which, trust me, you’ll not find anywhere else; I have spent hours scouring the internet for it). So for that, please follow and subscribe to the newsletter.</p>
<p>Until then, have a healthy and happy week.</p>
]]></content:encoded></item><item><title><![CDATA[Unidirectional is *not* the only way...]]></title><description><![CDATA[Recurrent Neural Networks, Long Short Term Memory, and Gated Recurrent Units, all three have one similarity. They all are unidirectional. It means, context is created by only and only considering the past. However, some use cases benefit to have cont...]]></description><link>https://japkeeratsingh.com/unidirectional-is-not-the-only-way</link><guid isPermaLink="true">https://japkeeratsingh.com/unidirectional-is-not-the-only-way</guid><category><![CDATA[bidirectional]]></category><category><![CDATA[RNN]]></category><category><![CDATA[nlp]]></category><category><![CDATA[natural language processing]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Thu, 15 Feb 2024 09:00:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1707538881676/bb37e519-d439-4229-9b56-41cd94d1c260.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://hashnode.com/post/clrbmsras000108l1akvuh2u6">Recurrent Neural Networks</a>, <a target="_blank" href="https://hashnode.com/post/clrbn5xe8000709iadvc72f6p">Long Short Term Memory</a>, and <a target="_blank" href="https://hashnode.com/post/clrize5bu000209ju8gfo3ow7">Gated Recurrent Units</a> all have one thing in common: they are unidirectional. This means context is built by considering only the past. However, some use cases benefit from having context from both the past and the future to estimate the present. For instance, machine translation is one NLP use case that benefits heavily from context built from both the past and the future. Neural networks like this are called Bidirectional.</p>
<p><em>While we are discussing the use cases, you might want to hold and think of one case where you would never use Bidirectional Neural Networks and comment the same.</em></p>
<p><img src="https://media.geeksforgeeks.org/wp-content/uploads/20230302163012/Bidirectional-Recurrent-Neural-Network-2.png" alt="Bidirectional Recurrent Neural Network - GeeksforGeeks" /></p>
<p>How a BiRNN works (or essentially any Bidirectional network) is simple. There are 2 different RNN layers, each processing the data from one direction, and their outputs are then merged together to form the final output.</p>
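<p>To make the two directions concrete, here is a rough pure-Python sketch. It assumes a one-unit RNN with a tanh activation and made-up weights, and it merges the two directions by simply pairing their states at every step (real layers usually concatenate them). <code>rnn_pass</code> and <code>bidirectional_pass</code> are hypothetical names used only for this illustration.</p>

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a one-unit tanh RNN over the inputs and return every hidden state."""
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def bidirectional_pass(inputs, fwd=(0.5, 0.8), bwd=(0.4, 0.7)):
    """Pair each step's forward state (past context) with its backward state (future context)."""
    forward = rnn_pass(inputs, *fwd)
    # Process the reversed sequence, then flip the states back into the original order.
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


sequence = [1.0, 2.0, 3.0]
for step, (past, future) in enumerate(bidirectional_pass(sequence)):
    print(step, round(past, 3), round(future, 3))
```

<p>Notice that each direction has its own weights, so the network is effectively two RNNs sharing one input.</p>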
<p>BiRNNs and other bidirectional networks are computationally expensive, since they run two recurrent layers instead of one. This makes them harder to train: they require more memory and take considerably more time. However, they usually perform better than unidirectional networks in terms of accuracy and can also handle variable length sequences better.</p>
]]></content:encoded></item><item><title><![CDATA[Attention!]]></title><description><![CDATA[Recurrent Neural Networks have "Attention Deficiency". Came along LSTMs with their ability to store information for long. But, as they say, all good things have an even better alternatives, came Attention.
Attention Mechanism is a way to provide the ...]]></description><link>https://japkeeratsingh.com/attention</link><guid isPermaLink="true">https://japkeeratsingh.com/attention</guid><category><![CDATA[natural language processing]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[attention-mechanism]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[RNN]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Thu, 25 Jan 2024 09:00:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705745928340/656a7af0-2901-48c4-a609-33d916f3a3af.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://hashnode.com/post/clrbmsras000108l1akvuh2u6">Recurrent Neural Networks have "Attention Deficiency"</a>. Along came <a target="_blank" href="https://hashnode.com/post/clrbn5xe8000709iadvc72f6p">LSTMs</a> with their ability to store information for long. But, as they say, all good things have even better alternatives, and so came Attention.</p>
<p>Attention Mechanism is a way to provide the model with information about which aspects of the data are more useful for making a prediction. If you could, try to compare the Attention Mechanism with that of your brain. When reading something, do you put equal emphasis on each word or more emphasis on the keywords? Or when you've just watched a movie and tell the story to someone else, do you narrate a whole 3 hour script or a summary of the most-important bits? It's the same case with the Attention Mechanism. It focuses on the most important bits of information.</p>
<h1 id="heading-where-does-attention-finds-its-use">Where does Attention find its use?</h1>
<p>The primary use-case for Attention Mechanism is in Natural Language Processing tasks. It also happens to be the reason it was invented. Translation of sentences from one language to another often poses the challenge of remembering the context for longer periods of time, and Attention enables it.</p>
<p>However, lately, it has found its use in Computer Vision tasks as well, enabling the model to focus on the pixels that add the most value to the final prediction. We'll discuss more on it when we cover the Computer Vision topics.</p>
<h1 id="heading-how-does-attention-even-work">How does Attention even work?</h1>
<p>The core idea behind an attention mechanism is relatively straightforward: it allows a model to automatically focus on the most relevant parts of the input for making a decision or prediction.</p>
<p>Each part of the input has its individual weight. These weights are called <strong>attention weights</strong> and basically tell the model how much focus each part of the input deserves. Now, it depends on the kind of Attention Mechanism used (more on it in just a sec) but the gist is the same.</p>
<p>These weights are used to compute a weighted sum of the input representations, called the <strong>context vector</strong>. This vector is used by the model to focus more on the relevant parts of the input.</p>
<p>There are 2 primary forms of Attention - Global and Local. The names are self-explanatory. Global assigns attention weights to the entirety of the input while Local assigns attention weights only to specific parts of the input. There are many more, and each deserves its own article. Perhaps once a few more concepts are covered first.</p>
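<p>Here is a tiny pure-Python sketch of that idea. It assumes dot-product scoring followed by a softmax, which is only one of several ways to compute attention weights, and the names <code>query</code>, <code>keys</code>, and <code>values</code> are my own labels for this illustration rather than anything defined so far in this series.</p>

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dot-product attention: score each input, normalise, then average the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)  # the attention weights
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, context = attend([1.0, 1.0], keys, values)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

<p>The third input matches the query best, so it gets the largest weight and contributes the most to the context.</p>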
<p>(I am skipping over the architecture for now. Will bring it up once the Transformers are covered in this series.)</p>
]]></content:encoded></item><item><title><![CDATA[GRUs are lit 🔥 But why so little traction?]]></title><description><![CDATA[LSTMs are popular because it solved the problem of vanishing gradient with RNNs. But so do Gated Recurrent Units (GRUs). On top of it, GRUs also have less parameters to learn compared to LSTM. And yet, GRUs are not used as much in the industry as LST...]]></description><link>https://japkeeratsingh.com/grus-are-lit-but-why-so-little-traction</link><guid isPermaLink="true">https://japkeeratsingh.com/grus-are-lit-but-why-so-little-traction</guid><category><![CDATA[gated recurrent unit]]></category><category><![CDATA[nlp]]></category><category><![CDATA[natural language processing]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[gru]]></category><category><![CDATA[RNN]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Thu, 18 Jan 2024 09:00:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705152516782/f64d7f34-4005-48df-b03e-aad97d322eb0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://hashnode.com/post/clrbn5xe8000709iadvc72f6p">LSTMs</a> are popular because they solved the <a target="_blank" href="https://hashnode.com/post/clrbmsras000108l1akvuh2u6">problem of vanishing gradient with RNNs</a>. But so do Gated Recurrent Units (GRUs). On top of that, GRUs also have fewer parameters to learn compared to LSTMs. And yet, GRUs are not used as much in the industry as LSTMs. Almost nobody ever mentions GRUs when building a course on NLP. Look at the Google Trends for these 2 terms.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705130534319/f9e8a59f-31a3-43a4-86f1-aa524f81f253.png" alt class="image--center mx-auto" /></p>
<p>An astonishing difference, right? In this post let's learn what GRUs are, how they learn, and the reason behind GRUs not being the go-to algorithm for NLP.</p>
<h1 id="heading-what-are-grus-and-how-does-it-learn">What are GRUs and how do they learn?</h1>
<p><img src="https://i.imgflip.com/8c7z0p.jpg" alt="Gru's Plan Meme | Let's use a neural network for sequence modeling! We'll use GRUs because... Have fewer parameters than LSTMs; But wait, how do they handle long-term dependencies? | image tagged in memes,gru's plan | made w/ Imgflip meme maker" class="image--center mx-auto" /></p>
<p>Gated Recurrent Units (GRUs) were designed with one purpose: to give performance comparable to LSTMs while reducing the number of trainable parameters. And they do it very effectively. This is how they achieve it.</p>
<p>Contrary to LSTMs (it's also counterintuitive), a GRU outputs only 1 state. If you have checked <a target="_blank" href="https://hashnode.com/post/clrbn5xe8000709iadvc72f6p">my previous post on LSTMs</a>, you'll remember that LSTMs have 2 outputs, one aligned with the short term memory and another with the long term memory. So the question remains, how does a GRU maintain the context for long?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705165668789/1f29f232-2f5c-43f4-8cb6-e80292482e30.jpeg" alt="Architecture of GRU" class="image--center mx-auto" /></p>
<p>GRUs simplify the LSTM design with two gates: a <strong>reset gate</strong> and an <strong>update gate</strong>. These gates decide what information should be passed to the output. The update gate helps the model determine how much past information (from previous time steps) needs to be passed along to the future. This is akin to deciding how much of the long-term memory should be used in the current state, and it does the combined job of the forget gate and the input gate in LSTMs, which makes the model more efficient. The reset gate, on the other hand, decides how much past information to forget when computing the new candidate state.</p>
<p>Despite having only one output, GRUs are able to maintain the context for long sequences through a clever balancing act performed by these gates. They modulate the flow of information inside the unit without separate memory cells, which are present in LSTMs. This allows GRUs to still capture dependencies from large spans of time, deciding at each step what to keep from the past and what new information to add. By adapting these gates' settings at each step in the sequence, GRUs can keep track of long-term dependencies, thus enabling them to maintain context and perform various sequence modeling tasks effectively.</p>
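<p>The balancing act described above can be sketched in a few lines of pure Python. This is a scalar toy version of a single GRU step, following one common formulation of the equations; the weight names in <code>params</code> and their values are made up for illustration.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One scalar GRU step; p holds hypothetical weights for each gate."""
    update = sigmoid(p["wz"] * x + p["uz"] * h_prev)  # how much new information to let in
    reset = sigmoid(p["wr"] * x + p["ur"] * h_prev)   # how much of the past feeds the candidate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (reset * h_prev))
    # Blend the old state with the candidate; this single state is the unit's only output.
    return (1.0 - update) * h_prev + update * candidate


params = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
h = 0.0
for x in [1.0, 0.5, -1.0]:
    h = gru_step(x, h, params)
    print(round(h, 4))
```

<p>When the update gate stays close to 0, the old state passes through almost untouched, which is how a GRU can carry context across many steps without a separate memory cell.</p>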
<h1 id="heading-why-arent-they-used-as-much-as-lstms">Why aren't they used as much as LSTMs?</h1>
<p>GRUs, for the most part, give a close fight to LSTMs in terms of performance. The final decision of which one to use depends on the dataset and the problem at hand. It's not set in stone but, from experience, when dealing with datasets with smaller sequence lengths, GRUs outperform LSTMs, and the inverse happens when the sequence length starts to increase. This means LSTMs are better than GRUs when it comes to remembering context for a longer time.</p>
<p>Secondly, GRUs have been around for far less time than LSTMs. GRUs were introduced only in 2014. Coincidentally, Attention Mechanism (we'll discuss this in next week's post) was also introduced around the same time. Attention drew most of the engineers' and researchers' attention towards it, overshadowing GRUs.</p>
<p>Just to make it clear, it is not that GRUs are not used at all. It's just that the use-cases the industry is focussed on right now fit LSTMs and Attention Mechanism more. For instance, chatbots. The need to remember the context of a conversation for a long time is extremely important. And LSTMs do it better than GRUs.</p>
]]></content:encoded></item><item><title><![CDATA[How LSTMs architecture solves the problem created by RNNs]]></title><description><![CDATA[Recurrent Neural Networks had a problem - vanishing gradient problem. Why the problem exists is better discussed in the previous article. In a brief, the vanishing gradient problem implies that the context of the sentence is forgotten about too quick...]]></description><link>https://japkeeratsingh.com/how-lstms-architecture-solves-the-problem-created-by-rnns</link><guid isPermaLink="true">https://japkeeratsingh.com/how-lstms-architecture-solves-the-problem-created-by-rnns</guid><category><![CDATA[LSTM]]></category><category><![CDATA[natural language processing]]></category><category><![CDATA[nlp]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Wed, 10 Jan 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705126553114/9f3ab114-e196-4667-bb28-a41e0e0b2b5c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://japkeerat.hashnode.dev/a-beginners-guide-to-recurrent-neural-networks-part-1-of-2">Recurrent Neural Networks</a> had a problem - the vanishing gradient problem. Why the problem exists is better discussed in <a target="_blank" href="https://japkeerat.hashnode.dev/a-beginners-guide-to-recurrent-neural-networks-part-1-of-2">the previous article</a>. In brief, the vanishing gradient problem implies that the context of the sentence is forgotten too quickly.</p>
<p>To fix the problem of Vanishing Gradient, Long Short-Term Memory (LSTM) Networks were introduced. An LSTM maintains the context in 2 different ways - Long Term Memory and Short Term Memory. Each has a different purpose.</p>
<p>Before diving into the architecture and representation of LSTMs, it is important to understand the concept of Short Term and Long Term Memory and what exactly they do.</p>
<p><strong>Short Term Memory</strong>, to put it simply, keeps the latest information that is important for making predictions in memory.</p>
<p><strong>Long Term Memory</strong> is used to keep important information in context for a long time while making predictions. Basically, it is a persistent storage of all the important keywords in the data.</p>
<p>Here’s an extremely rough approximation of how it looks.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9adad9d4-aabf-4cc2-8e45-396f8b30b30f_320x391.png" alt="Rough Approximate Representation of LSTM Network" class="image--center mx-auto" /></p>
<h1 id="heading-how-lstms-do-the-magic">How do LSTMs do the magic?</h1>
<p>This overwhelming diagram below is a representation of a cell in LSTM.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5816d3-4c64-4177-aac0-63cfd08e65ab_1190x407.png" alt="LSTM Unit" /></p>
<p>The line on top, the green one, is what is called the <strong>Cell State</strong>, which essentially is the Long Term Memory.</p>
<p>The line on the bottom, the pink one, is the <strong>Hidden State</strong> which, as you could guess if you are following along, is the Short Term Memory.</p>
<p>Slowing down for a second. We were discussing the RNN problems. How do the Cell State and Hidden State solve them? Well, if you look closely, there is no weight on the Cell State. It is only scaled by the gates and added to, so the gradient flowing along it is far less likely to explode or vanish than in a plain RNN.</p>
<p>Moving on, let’s label the diagram for better understanding.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9af45f-64bb-479a-8ebb-22537529b0e1_1190x407.png" alt="Boxes labelled diagram for the LSTM Unit" /></p>
<p>Box 1 is a way to determine the impact of the input and the short term memory on the long term memory. It controls how much of the long term memory needs to be remembered. Hence, it is also known as the <strong>Forget Gate</strong>.</p>
<p>Boxes 2 and 3 determine how much of the input to this unit should be remembered for a long time. Box 3 calculates the potential long term memory for this input while box 2 is responsible for determining the “how much” part. Important to note, the only difference between the two boxes is the activation function. These 2 boxes make up what is called the <strong>Input Gate</strong>.</p>
<p>Boxes 4 and 5 are responsible for calculating the Short Term Memory for the next LSTM unit. Box 5 is where the potential short term memory is calculated, which is done using the Long Term Memory and not the input, while the “how much” aspect is calculated by box 4. These boxes make up what is called the <strong>Output Gate</strong>.</p>
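<p>The five boxes translate into very little code. Below is a scalar toy sketch of a single LSTM unit in pure Python, assuming the standard sigmoid and tanh activations. The weight triples in <code>params</code> are made-up numbers, and the box numbers in the comments refer to the labelled diagram above.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; each p[name] is a hypothetical (w_x, w_h, bias) triple."""
    def pre(name):
        w_x, w_h, bias = p[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre("forget"))           # box 1: share of long term memory to keep
    input_share = sigmoid(pre("input"))       # box 2: how much of the candidate to store
    candidate = math.tanh(pre("candidate"))   # box 3: potential long term memory
    c_new = forget * c_prev + input_share * candidate  # updated Cell State
    output_share = sigmoid(pre("output"))     # box 4: how much short term memory to emit
    h_new = output_share * math.tanh(c_new)   # box 5: built from the long term memory
    return h_new, c_new


names = ("forget", "input", "candidate", "output")
params = {name: (0.5, 0.5, 0.0) for name in names}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -1.0]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

<p>Look at the Cell State line: the old value is only scaled by the forget gate and then added to, with no weight of its own in the way.</p>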
<p>So with the use of Input Gate, Forget Gate, Output Gate, Cell State, and Hidden State (that's a lot!), LSTMs solve the problem of vanishing gradient with Recurrent Neural Networks.</p>
]]></content:encoded></item><item><title><![CDATA[A Beginner's Guide to Recurrent Neural Networks (Part 2 of 2)]]></title><description><![CDATA[By the end of this tutorial, you’ll have this.

Let’s get started.
For the purpose of this tutorial, I am using Netflix Stock Price Data available on Kaggle and only choose to predict the Daily High of the stock price.
Step 1 - Preprocess the Data
fr...]]></description><link>https://japkeeratsingh.com/a-beginners-guide-to-recurrent-neural-networks-part-2-of-2</link><guid isPermaLink="true">https://japkeeratsingh.com/a-beginners-guide-to-recurrent-neural-networks-part-2-of-2</guid><category><![CDATA[RNN]]></category><category><![CDATA[recurrent neural network]]></category><category><![CDATA[natural language processing]]></category><category><![CDATA[nlp]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Wed, 03 Jan 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705126128164/d42d99d5-b189-469a-96cf-53ea41b8ed46.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>By the end of this tutorial, you’ll have this.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1213d224-0e86-46e3-a0ff-b1df3cacbb03_1000x600.jpeg" alt /></p>
<p>Let’s get started.</p>
<p>For the purpose of this tutorial, I am using <a target="_blank" href="https://www.kaggle.com/datasets/jainilcoder/netflix-stock-price-prediction/">Netflix Stock Price Data available on Kaggle</a> and choosing to predict only the Daily High of the stock price.</p>
<p><strong>Step 1 - Preprocess the Data</strong></p>
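<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> MinMaxScaler

<span class="hljs-comment"># `data` is assumed to be a pandas DataFrame already loaded from the Kaggle CSV</span>
<span class="hljs-comment"># Take the "High" column as a NumPy array and reshape it into a single column</span>
high_data = data[<span class="hljs-string">"High"</span>].values.reshape(<span class="hljs-number">-1</span>,<span class="hljs-number">1</span>)

<span class="hljs-comment"># Scale the prices into the 0 to 1 range</span>
scaler = MinMaxScaler()
scaled_high_data = scaler.fit_transform(high_data)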
</code></pre>
<p>The code snippet above does 2 things.</p>
<p>First, it selects only the “High” column data and converts it into a NumPy Array. Then it changes its shape. Understand this from the example.</p>
<pre><code class="lang-python">data[<span class="hljs-string">"High"</span>].values
</code></pre>
<p>This part of the code is responsible for converting the “High” column data to a NumPy Array. It will give the output like</p>
<pre><code class="lang-plaintext">[10,20,30,40,50]
</code></pre>
<p>Then, <code>.reshape(-1,1)</code> converts this array into a two-dimensional array with a single column, so each row holds only 1 element. The array above now looks like this</p>
<pre><code class="lang-plaintext">[[10],
 [20],
 [30],
 [40], 
 [50]]
</code></pre>
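<p>The second thing the snippet does is scale the values. With its default settings, <code>MinMaxScaler</code> maps the smallest value to 0 and the largest to 1. Here is a plain-Python sketch of that arithmetic on the same example array; <code>min_max_scale</code> and <code>inverse_scale</code> are hypothetical helper names for illustration, not scikit-learn functions.</p>

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def inverse_scale(scaled, low, high):
    """Map scaled values back to the original price range."""
    return [s * (high - low) + low for s in scaled]


prices = [10, 20, 30, 40, 50]
scaled = min_max_scale(prices)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(inverse_scale(scaled, 10, 50))  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

<p>The inverse step matters later: the model predicts scaled values, and with scikit-learn you would convert them back to prices using <code>scaler.inverse_transform</code>.</p>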
<p><strong>Step 2 - Preparing the Dataset</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_dataset</span>(<span class="hljs-params">dataset, time_step=<span class="hljs-number">1</span></span>):</span>
    X, Y = [], []
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(dataset)-time_step<span class="hljs-number">-1</span>):
        <span class="hljs-comment"># Input: `time_step` consecutive prices starting at position i</span>
        a = dataset[i:(i+time_step), <span class="hljs-number">0</span>]
        X.append(a)
        <span class="hljs-comment"># Target: the price that comes right after that window</span>
        Y.append(dataset[i + time_step, <span class="hljs-number">0</span>])
    <span class="hljs-keyword">return</span> np.array(X), np.array(Y)


X, y = create_dataset(scaled_high_data)

<span class="hljs-comment"># Reshape to (samples, time steps, features) as the RNN layer expects</span>
X = X.reshape(X.shape[<span class="hljs-number">0</span>], X.shape[<span class="hljs-number">1</span>], <span class="hljs-number">1</span>)
</code></pre>
<p>In Step 2 of this tutorial, we're preparing our dataset for the RNN model.</p>
<p>The <code>create_dataset</code> function is key here. It takes our scaled stock price data and creates sequences for training. Each sequence consists of consecutive days' 'High' prices, determined by <code>time_step</code>. If <code>time_step</code> is 1, we use today's price to predict tomorrow's. The function goes through the entire dataset, creating these sequences (<code>X</code>) and their next day's price (<code>Y</code>). Finally, we reshape <code>X</code> to fit the RNN's expected input shape, making it ready for the neural network to process.</p>
<p>This step is crucial as it aligns our data with the way RNNs learn temporal patterns.</p>
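<p>To see what these sequences look like with a longer window, here is a list-based sketch of the same idea using a <code>time_step</code> of 3. <code>make_windows</code> is a hypothetical stand-in for <code>create_dataset</code> that works on plain Python lists. Unlike the loop in Step 2, it also keeps the final window.</p>

```python
def make_windows(series, time_step=1):
    """Split a list of prices into (window, next value) training pairs."""
    windows, targets = [], []
    for start in range(len(series) - time_step):
        windows.append(series[start:start + time_step])
        targets.append(series[start + time_step])
    return windows, targets


X_demo, y_demo = make_windows([10, 20, 30, 40, 50], time_step=3)
print(X_demo)  # [[10, 20, 30], [20, 30, 40]]
print(y_demo)  # [40, 50]
```

<p>Each row of <code>X_demo</code> is three consecutive prices, and the matching entry in <code>y_demo</code> is the price that follows them.</p>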
<p><strong>Step 3 - Keras Model</strong></p>
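<pre><code class="lang-python"><span class="hljs-comment"># Assuming the Keras that ships with TensorFlow</span>
<span class="hljs-keyword">from</span> tensorflow.keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> tensorflow.keras.layers <span class="hljs-keyword">import</span> SimpleRNN, Dense

model = Sequential()

model.add(SimpleRNN(units=<span class="hljs-number">50</span>, activation=<span class="hljs-string">"relu"</span>, input_shape=(<span class="hljs-number">1</span>,<span class="hljs-number">1</span>)))
model.add(Dense(units=<span class="hljs-number">1</span>))

model.compile(optimizer=<span class="hljs-string">"adam"</span>, loss=<span class="hljs-string">"mean_squared_error"</span>)

model.fit(X, y, epochs=<span class="hljs-number">2</span>, batch_size=<span class="hljs-number">16</span>, verbose=<span class="hljs-number">1</span>)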
</code></pre>
<p>The process of developing a model with Keras begins by initializing a <code>Sequential</code> model, which allows us to stack layers linearly. Within this model, we add a <code>SimpleRNN</code> layer with 50 units. The activation function 'relu' helps the model learn non-linear patterns. The <code>input_shape</code> of <code>(1, 1)</code> matches our preprocessed data: 1 time step with 1 feature. Next, we add a <code>Dense</code> layer with a single unit. This is our output layer that will predict the next day's 'High' value.</p>
<p>The model is compiled with the 'adam' optimizer and 'mean_squared_error' loss function, both standard choices for regression problems. Finally, we fit the model to our prepared data (<code>X</code> and <code>y</code>). The <code>epochs</code> parameter controls how many times the model will see the entire dataset, and <code>batch_size</code> determines how many data points the model sees before updating its internal parameters. With these steps, our model is ready to learn from the data and make predictions.</p>
<p>And that’s it, you are ready to run the code and build your own Stock Price Prediction Model.</p>
]]></content:encoded></item><item><title><![CDATA[A Beginner's Guide to Recurrent Neural Networks (Part 1 of 2)]]></title><description><![CDATA[Imagine you're watching a movie. Each scene in the movie is connected to the previous one, right? You understand the story because you remember what happened before. Recurrent Neural Networks (RNNs) work in a similar way. They are a type of Neural Ne...]]></description><link>https://japkeeratsingh.com/a-beginners-guide-to-recurrent-neural-networks-part-1-of-2</link><guid isPermaLink="true">https://japkeeratsingh.com/a-beginners-guide-to-recurrent-neural-networks-part-1-of-2</guid><category><![CDATA[nlp]]></category><category><![CDATA[natural language processing]]></category><category><![CDATA[RNN]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[recurrent neural network]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Wed, 27 Dec 2023 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705125894902/db8b7f87-9ee7-4b2f-a737-7ab49cc870d6.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you're watching a movie. Each scene in the movie is connected to the previous one, right? You understand the story because you remember what happened before. Recurrent Neural Networks (RNNs) work in a similar way. They are a type of Neural Network that remembers past information and uses it to understand new data. Just like you use your memory of the previous scenes to understand the current scene in a movie, RNNs use information from the past to make sense of new information.</p>
<h1 id="heading-towards-intuitive-understanding">Towards Intuitive Understanding</h1>
<p><img src="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3685f05c-b51d-495c-b03e-da879c9834d0_817x182.png" alt="Rolled Representation of a Recurrent Neural Network" /></p>
<p>The above diagram is a high-level representation of a single neuron of an RNN. Unlike a neuron in a simple neural network, it has a feedback loop: the output of the activation function is fed back, summed with the input, and sent through the activation function again.</p>
<p>I’ll be honest, this representation always confused me. Coming from a programming background, I kept wondering: wouldn’t this mean we are stuck in the loop forever? The following representation cleared up the confusion.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb087b14-8010-4c87-90b4-3b6d837362ee_819x751.png" alt="Unrolled Representation of a Recurrent Neural Network" /></p>
<p>Notice anything different in this representation?</p>
<p>The feedback loop affects the next data point in the sequence, not the same one. The activation function’s output for input <code>X1</code> is multiplied by weight <code>W2</code> and summed with input <code>X2</code> before being sent to the activation function. This process repeats for every data point until the sequence ends.</p>
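<p>For the programmers who were as confused as I was, the unrolled picture can be sketched as a plain loop. This is a minimal illustration, not a real RNN layer: the function name <code>rnn_forward</code>, the input weight <code>w1</code>, the choice of <code>tanh</code> as the activation function, and the starting value of 0 are all assumptions made for the example.</p>

```python
import math


def rnn_forward(inputs, w1, w2):
    """Run a single-neuron RNN over a sequence and return every output."""
    outputs = []
    previous = 0.0  # assumption: nothing to remember before the first input
    for x in inputs:
        # Current input plus the previous output scaled by the feedback weight.
        previous = math.tanh(w1 * x + w2 * previous)
        outputs.append(previous)
    return outputs


print(rnn_forward([1.0, 0.5, -0.3], w1=1.0, w2=0.5))
```

<p>The loop runs once per data point and stops when the sequence ends, so there is no infinite loop. The feedback only carries the previous output forward into the next step.</p>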
<h1 id="heading-the-curse-in-disguise">The Curse in Disguise</h1>
<p>Take a good look at the unrolled representation of the RNN once more, especially the feedback loop. The same thing that lets RNNs remember historical context is also their curse (and the basis of further advancements in NLP and Sequence Data Processing).</p>
<p>Let’s consider 2 examples: one where W2 is too small and another where W2 is too large. I’ll keep it intuitive. We’ll discuss the math in a separate post.</p>
<h2 id="heading-too-small-w2">Too Small W2</h2>
<p>If the weight of the feedback loop is too small (for the sake of argument, let’s say 0.1), the impact of X1 on the prediction of ŷ5 (for instance) would be negligible. Why? Because X1 gets multiplied by the tiny value of 0.1 so many times that it ends up close to 0 (X1 × 0.1 × 0.1 × 0.1 × 0.1 = X1 × 0.0001). So no matter what X1 is, only 0.01% of it reaches ŷ5.</p>
<p>In other words, historical information is not well preserved: by the time a few more data points have been processed, the earlier ones barely influence the prediction.</p>
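<p>This problem has a name - <strong>Vanishing Gradient</strong>. (Strictly speaking, the name refers to the gradients used during training, which shrink through the same repeated multiplication.)</p>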
<h2 id="heading-too-large-w2">Too Large W2</h2>
<p>If the weight of the feedback loop is too large (anything above 1; for the sake of argument, let’s say 2), the impact of X1 on the prediction of ŷ5 would overshadow the impact of X2, which in turn would overshadow X3, and so on (X1 × 2 × 2 × 2 × 2 = X1 × 16, while X4 is only multiplied by 2). In other words, the older the input, the more impact it has on the prediction. Recent events would have next to no impact on the output.</p>
<p>This problem too has a name - <strong>Exploding Gradient</strong>.</p>
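<p>The arithmetic behind both cases fits in a few lines. As in the examples above, this sketch ignores the activation function and only tracks the repeated multiplication by the feedback weight. The helper name <code>feedback_factor</code> is made up for the illustration.</p>

```python
def feedback_factor(w2, steps):
    """Factor applied to an input after passing through the loop `steps` times."""
    return w2 ** steps


# X1 passes through the feedback loop 4 times before reaching the 5th output.
for w2 in (0.1, 2.0):
    print(f"W2 = {w2}: X1 is scaled by {feedback_factor(w2, 4):.4f}")
```

<p>With a weight of 0.1, X1 is scaled down to 0.0001 of its value. With a weight of 2, it is scaled up 16 times. Try a longer sequence (say, 50 steps) to see how quickly both cases get out of hand.</p>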
<h1 id="heading-tldr">TL;DR</h1>
<ol>
<li><p>Recurrent Neural Networks are a type of Neural Network that use historical information to predict the next data point in a sequence.</p>
</li>
<li><p>RNNs are cursed by the very thing that makes them unique. If the weight of the feedback loop is too small, the gradient vanishes (meaning the RNN hardly remembers the past), and if the weight of the feedback loop is too large, the gradient explodes (meaning the RNN lives in ancient history, and the present and recent history have less impact on predictions).</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Glossary - Machine Learning]]></title><description><![CDATA[Machine Learning - Machine learning is a way for computers to learn from data and make decisions or predictions without being explicitly programmed for each task. Instead of following fixed rules, the computer analyzes patterns in the data to improve...]]></description><link>https://japkeeratsingh.com/glossary-machine-learning</link><guid isPermaLink="true">https://japkeeratsingh.com/glossary-machine-learning</guid><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Japkeerat Singh]]></dc:creator><pubDate>Sat, 31 Dec 2022 18:30:00 GMT</pubDate><content:encoded><![CDATA[<ol>
<li><p><strong>Machine Learning</strong> - Machine learning is a way for computers to learn from data and make decisions or predictions without being explicitly programmed for each task. Instead of following fixed rules, the computer analyzes patterns in the data to improve its performance over time. Think of it like teaching a child by showing them examples, and they learn to recognize or predict things on their own.</p>
</li>
<li><p><strong>Supervised Machine Learning</strong> - Supervised machine learning is a type of machine learning where a computer learns from labeled data. This means that the data used for training includes both the input (features) and the correct output (labels). The computer analyzes these examples to learn the relationship between inputs and outputs so it can make predictions on new, unseen data. It's like teaching a student with answer keys: when they study the examples with the correct answers, they learn how to solve similar problems in the future.</p>
</li>
<li><p><strong>Unsupervised Machine Learning</strong> - Unsupervised machine learning is a type of machine learning where a computer learns from data that doesn't have labeled answers. Instead of being given the correct output, the computer looks for patterns and relationships in the data on its own. It’s like exploring a new city without a map—you're trying to find groups of similar places or discover hidden patterns without anyone telling you what to look for. Common tasks include clustering similar items together or reducing the dimensions of data to find simpler representations.</p>
</li>
</ol>
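<p>To see the difference between the last two terms in code, here is a minimal sketch in plain Python. Both helpers (<code>predict_label</code> and <code>two_clusters</code>) and the sample numbers are hypothetical, written only for this illustration. The supervised one copies the label of the closest labeled example, and the unsupervised one splits unlabeled numbers into two groups. The clustering helper assumes the input has at least two distinct values.</p>

```python
def predict_label(examples, value):
    """Supervised: copy the label of the closest labeled example."""
    closest = min(examples, key=lambda pair: abs(pair[0] - value))
    return closest[1]


def two_clusters(values, rounds=10):
    """Unsupervised: split unlabeled numbers into two groups (1-D 2-means)."""
    low, high = min(values), max(values)  # starting cluster centers
    for _ in range(rounds):
        # Assign each value to the nearer center (ties go to the low group).
        near_low = [v for v in values if abs(v - high) >= abs(v - low)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high


# Labeled data: every input comes with its answer.
labeled = [(1.0, "small"), (1.2, "small"), (5.0, "large"), (5.5, "large")]
print(predict_label(labeled, 4.6))

# Unlabeled data: only inputs, the groups have to be discovered.
print(two_clusters([1.0, 1.2, 5.0, 5.5, 0.9]))
```

<p>Notice that the supervised helper needs the answers up front, while the unsupervised one finds the two groups without ever being told what they mean.</p>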
]]></content:encoded></item></channel></rss>