LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!
As LLMs continue to evolve, they are becoming smaller and smarter, enabling them to run directly on your phone. Take, for instance, DeepSeek R1 Distill Qwen 1.5B: with only 1.5 billion parameters, this model really shows how advanced AI can now fit in the palm of your hand!
In this blog, we will guide you through creating a mobile app that allows you to chat with these powerful models locally. The complete code for this tutorial is available in our EdgeLLM repository. If you've ever felt overwhelmed by the complexity of open-source projects, fear not! Inspired by the Pocket Pal app, we will help you build a straightforward React Native application that downloads LLMs from the Hugging Face Hub, ensuring everything remains private and runs on your device. We will utilize llama.rn, a binding for llama.cpp, to load GGUF files efficiently!
Why Should You Follow This Tutorial?
This tutorial is designed for anyone who:
- Is interested in integrating AI into mobile applications
- Wants to create a conversational app compatible with both Android and iOS using React Native
- Seeks to develop privacy-focused AI applications that operate entirely offline
By the end of this guide, you will have a fully functional app that allows you to interact with your favorite models.
0. Choosing the Right Models
Before we dive into building our app, let's talk about which models work well on mobile devices and what to consider when selecting them.
Model Size Considerations
When running LLMs on mobile devices, size matters significantly:
- Small models (1-3B parameters): Ideal for most mobile devices, offering good performance with minimal latency
- Medium models (4-7B parameters): Work well on newer high-end devices but may cause slowdowns on older phones
- Large models (8B+ parameters): Generally too resource-intensive for most mobile devices, but can be used if quantized to low precision formats like Q2_K or Q4_K_M
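As a rough rule of thumb (an estimate, not an exact figure), a GGUF file's size is about the parameter count multiplied by the bits per weight, divided by 8. For example, a 1.5B-parameter model at ~4.5 bits per weight (roughly Q4_K_M) comes to about 0.9 GB, while the same model at ~8.5 bits (Q8_0) is around 1.6 GB. Your device needs at least that much free memory to load the model, plus extra room for the KV cache and the app itself.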
GGUF Quantization Formats
When downloading GGUF models, you'll encounter various quantization formats. Understanding these can help you choose the right balance between model size and performance:
Legacy Quants (Q4_0, Q4_1, Q8_0)
- Basic, straightforward quantization methods
- Each block is stored with:
  - Quantized values (the compressed weights)
  - One (_0) or two (_1) scaling constants
- Fast but less efficient than newer methods => not widely used anymore
K-Quants (Q3_K_S, Q5_K_M, ...)
- Introduced in this PR
- Smarter bit allocation than legacy quants
- The K in “K-quants” refers to a mixed quantization format, meaning some layers get more bits for better accuracy.
- Suffixes like _XS, _S, or _M refer to specific mixes of quantization (smaller = more compression), for example:
  - Q3_K_S uses Q3_K for all tensors
  - Q3_K_M uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, and Q3_K for the rest
  - Q3_K_L uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, and Q3_K for the rest
I-Quants (IQ2_XXS, IQ3_S, ...)
- Still uses block-based quantization, but with some new features inspired by QuIP
- Smaller file sizes but may be slower on some hardware
- Best for devices with strong compute power but limited memory
Recommended Models to Try
Here are some models that perform well on mobile devices:
Finding More Models
To find additional GGUF models on Hugging Face:
- Visit huggingface.co/models
- Use the search filters:
  - Visit the GGUF models page
  - Specify the size of the model in the search bar
  - Look for "chat" or "instruct" in the name for conversational models
When selecting a model, consider both the parameter count and the quantization level. For example, a 7B model with Q2_K quantization might run better than a 2B model with Q8_0 quantization. So if a small model fits comfortably on your device, try a bigger, more heavily quantized model instead: it may give better performance.
1. Setting Up Your Environment
React Native is a popular framework for building mobile applications using JavaScript and React. It allows developers to create apps that run on both Android and iOS platforms while sharing a significant amount of code, which speeds up the development process and reduces maintenance efforts.
Before you can start coding with React Native, you need to set up your environment properly.
Tools You Need
Node.js: Node.js is a JavaScript runtime that allows you to run JavaScript code. It is essential for managing packages and dependencies in your React Native project. You can install it from Node.js downloads.
react-native-community/cli: This command installs the React Native command line interface (CLI), which provides tools to create, build, and manage your React Native projects. Run the following command to install it:
npm i @react-native-community/cli
Virtual Device Setup
To run your app during development, you will need an emulator or a simulator:
If you are on macOS:
- For iOS: Use the iOS Simulator that ships with Xcode
- For Android: Install Java Runtime and Android Studio -> Go to Device Manager and create an emulator
If you are on Windows or Linux:
- For iOS: We need to rely on cloud-based simulators like LambdaTest and BrowserStack
- For Android: Install Java Runtime and Android Studio -> Go to Device Manager and create an emulator
If you are curious about the difference between simulators and emulators, you can read this article: Difference between Emulator and Simulator, but to put it simply, emulators replicate both hardware and software, while simulators only replicate software.
For setting up Android Studio, follow this excellent tutorial by Expo: Android Studio Emulator Guide
2. Create the App
Let's start this project!
You can find the full code for this project in the EdgeLLM repo here; there are two folders:
- EdgeLLMBasic: A basic implementation of the app with a simple chat interface
- EdgeLLMPlus: An enhanced version of the app with a more complex chat interface and additional features
First, we need to initiate the app using @react-native-community/cli:
npx @react-native-community/cli@latest init <ProjectName>
Project Structure
App folders are organized as follows:
Default Files/Folders
android/
- Contains native Android project files
- Purpose: To build and run the app on Android devices
ios/
- Contains native iOS project files
- Purpose: To build and run the app on iOS devices
node_modules/
- Purpose: Holds all npm dependencies used in the project
App.tsx
- The main root component of your app, written in TypeScript
- Purpose: Entry point to the app's UI and logic
index.js
- Registers the root component (App)
- Purpose: Entry point for the React Native runtime. You don't need to modify this file.
Additional Configuration Files
- tsconfig.json: Configures TypeScript settings
- babel.config.js: Configures Babel for transpiling modern JavaScript/TypeScript, which means it converts modern JS/TS code into older JS/TS code compatible with older browsers or devices
- jest.config.js: Configures Jest for testing React Native components and logic
- metro.config.js: Customizes the Metro bundler for the project. Metro is a JavaScript bundler specifically designed for React Native. It takes your project's JavaScript and assets, bundles them into a single file (or multiple files for efficient loading), and serves them to the app during development. Metro is optimized for fast incremental builds, supports hot reloading, and handles React Native's platform-specific files (.ios.js or .android.js)
- .watchmanconfig: Configures Watchman, a file-watching service used by React Native for hot reloading
3. Running the Demo & Project
Running the Demo
To run the project and see how it looks on your own virtual device, follow these steps:
Clone the Repository:
git clone https://github.com/MekkCyber/EdgeLLM.git
Navigate to the Project Directory:
cd EdgeLLMPlus #or cd EdgeLLMBasic
Install Dependencies:
npm install
Navigate to the iOS Folder and Install:
cd ios
pod install
Start the Metro Bundler: Run the following command in the project folder (EdgeLLMPlus or EdgeLLMBasic):
npm start
Launch the App on iOS or Android Simulator: Open another terminal and run:
# For iOS
npm run ios
# For Android
npm run android
This will build and launch the app on your emulator/simulator to test the project before we start coding.
Running the Project
Running a React Native application requires either an emulator/simulator or a physical device. We'll focus on using an emulator since it provides a more streamlined development experience with your code editor and debugging tools side by side.
We start by ensuring our development environment is ready: from the project folder, run the following commands:
# Install dependencies
npm install
# Start the Metro bundler
npm start
In a new terminal, we will launch the app on our chosen platform:
# For iOS
npm run ios
# For Android
npm run android
This should build and launch the app on your emulator/simulator.
4. App Implementation
Installing Dependencies
First, let's install the required packages. We aim to load models from the Hugging Face Hub and run them locally. To achieve this, we need to install:
- llama.rn: a binding for llama.cpp for React Native apps
- react-native-fs: allows us to manage the device's file system in a React Native environment
- axios: a library for sending requests to the Hugging Face Hub API
npm install axios react-native-fs llama.rn
Let's run the app on our emulator/simulator as shown before so we can start development.
State Management
We will start by deleting everything from the App.tsx file and creating an empty code structure like the following:
App.tsx
import React from 'react';
import {StyleSheet, Text, View} from 'react-native';
function App(): React.JSX.Element {
return <View> <Text>Hello World</Text> </View>;
}
const styles = StyleSheet.create({});
export default App;
Inside the return statement of the App function we define the UI that gets rendered, and outside it we define the logic; all of this code lives inside the App function.
We will have a screen that looks like this:

The text "Hello World" is not displayed properly because we are using a simple View
component, we need to use a SafeAreaView
component to display the text correctly, we will deal with that in the next sections.
Now let's think about what our app needs to track for now:
Chat-related:
- The conversation history (messages between user and AI)
- Current user input
Model-related:
- Selected model format (like Llama 1B or Qwen 1.5B)
- Available GGUF files list for each model format
- Selected GGUF file to download
- Model download progress
- A context to store the loaded model
- A boolean to check if the model is downloading
- A boolean to check if the model is generating a response
Here's how we implement these states using React's useState hook (we will need to import it from react):
State Management Code
import { useState } from 'react';
...
type Message = {
role: 'system' | 'user' | 'assistant';
content: string;
};
const INITIAL_CONVERSATION: Message[] = [
{
role: 'system',
content:
'This is a conversation between user and assistant, a friendly chatbot.',
},
];
const [conversation, setConversation] = useState<Message[]>(INITIAL_CONVERSATION);
const [selectedModelFormat, setSelectedModelFormat] = useState<string>('');
const [selectedGGUF, setSelectedGGUF] = useState<string | null>(null);
const [availableGGUFs, setAvailableGGUFs] = useState<string[]>([]);
const [userInput, setUserInput] = useState<string>('');
const [progress, setProgress] = useState<number>(0);
const [context, setContext] = useState<any>(null);
const [isDownloading, setIsDownloading] = useState<boolean>(false);
const [isGenerating, setIsGenerating] = useState<boolean>(false);
This will be added to the App.tsx file inside the App function but outside the return statement, as it's part of the logic.
The Message type defines the structure of chat messages, specifying that each message must have a role (either 'user' or 'assistant' or 'system') and content (the actual message text).
Now that we have our basic state management set up, we need to think about how to:
- Fetch available GGUF models from Hugging Face
- Download and manage models locally
- Create the chat interface
- Handle message generation
Let's tackle these one by one in the next sections...
Fetching available GGUF models from the Hub
Let's start by defining the model formats our app is going to support and their repositories. Since llama.rn is a binding for llama.cpp, we need to load GGUF files. To find GGUF repositories for the models we want to support, we can use the search bar on Hugging Face and search for GGUF files for a specific model, or use the quantize_gguf.py script provided here to quantize the model ourselves and upload the files to our Hub repository.
const modelFormats = [
{label: 'Llama-3.2-1B-Instruct'},
{label: 'Qwen2-0.5B-Instruct'},
{label: 'DeepSeek-R1-Distill-Qwen-1.5B'},
{label: 'SmolLM2-1.7B-Instruct'},
];
const HF_TO_GGUF = {
"Llama-3.2-1B-Instruct": "medmekk/Llama-3.2-1B-Instruct.GGUF",
"DeepSeek-R1-Distill-Qwen-1.5B":
"medmekk/DeepSeek-R1-Distill-Qwen-1.5B.GGUF",
"Qwen2-0.5B-Instruct": "medmekk/Qwen2.5-0.5B-Instruct.GGUF",
"SmolLM2-1.7B-Instruct": "medmekk/SmolLM2-1.7B-Instruct.GGUF",
};
The HF_TO_GGUF object maps user-friendly model names to their corresponding Hugging Face repository paths. For example, when a user selects 'Llama-3.2-1B-Instruct', it maps to medmekk/Llama-3.2-1B-Instruct.GGUF, which is one of the repositories containing the GGUF files for the Llama 3.2 1B Instruct model.
The modelFormats array contains the list of model options that will be displayed to users on the selection screen. We chose Llama 3.2 1B Instruct, DeepSeek R1 Distill Qwen 1.5B, Qwen 2 0.5B Instruct, and SmolLM2 1.7B Instruct, as they are among the most popular small models.
Next, let's create a way to fetch and display available GGUF model files from the hub for our selected model format.
When a user selects a model format, we make an API call to Hugging Face using the repository path we mapped in our HF_TO_GGUF object. We're specifically looking for files that end with the '.gguf' extension, which are our quantized model files. Once we receive the response, we extract just the filenames of these GGUF files and store them in our availableGGUFs state using setAvailableGGUFs. This allows us to show users a list of available GGUF model variants they can download.
Fetching Available GGUF Files
const fetchAvailableGGUFs = async (modelFormat: string) => {
if (!modelFormat) {
Alert.alert('Error', 'Please select a model format first.');
return;
}
try {
const repoPath = HF_TO_GGUF[modelFormat as keyof typeof HF_TO_GGUF];
if (!repoPath) {
throw new Error(
`No repository mapping found for model format: ${modelFormat}`,
);
}
const response = await axios.get(
`https://huggingface.co./api/models/${repoPath}`,
);
if (!response.data?.siblings) {
throw new Error('Invalid API response format');
}
const files = response.data.siblings.filter((file: {rfilename: string}) =>
file.rfilename.endsWith('.gguf'),
);
setAvailableGGUFs(files.map((file: {rfilename: string}) => file.rfilename));
} catch (error) {
const errorMessage =
error instanceof Error ? error.message : 'Failed to fetch .gguf files';
Alert.alert('Error', errorMessage);
setAvailableGGUFs([]);
}
};
Note: Make sure to import axios and Alert at the top of your file if they are not already imported.
We need to test that the function is working correctly. Let's add a button to the UI to trigger it: instead of View we will use a SafeAreaView component (more on that later), and we will display the available GGUF files in a ScrollView component. The onPress handler is triggered when the button is pressed:
<TouchableOpacity onPress={() => fetchAvailableGGUFs('Llama-3.2-1B-Instruct')}>
<Text>Fetch GGUF Files</Text>
</TouchableOpacity>
<ScrollView>
{availableGGUFs.map((file) => (
<Text key={file}>{file}</Text>
))}
</ScrollView>
This should look something like this:

Note: For the whole code until now, you can check the first_checkpoint branch in the EdgeLLMBasic folder here.
Model Download Implementation
Now let's implement the model download functionality in the handleDownloadModel function, which should be called when the user clicks on the download button. It will download the selected GGUF file from Hugging Face and store it in the app's Documents directory:
Model Download Function
const handleDownloadModel = async (file: string) => {
const downloadUrl = `https://huggingface.co./${
HF_TO_GGUF[selectedModelFormat as keyof typeof HF_TO_GGUF]
}/resolve/main/${file}`;
// we set the isDownloading state to true to show the progress bar and set the progress to 0
setIsDownloading(true);
setProgress(0);
try {
// we download the model using the downloadModel function, it takes the selected GGUF file, the download URL, and a progress callback function to update the progress bar
const destPath = await downloadModel(file, downloadUrl, progress =>
setProgress(progress),
);
} catch (error) {
const errorMessage =
error instanceof Error
? error.message
: 'Download failed due to an unknown error.';
Alert.alert('Error', errorMessage);
} finally {
setIsDownloading(false);
}
};
We could have implemented the API requests inside the handleDownloadModel function, but we keep them in a separate file to keep the code clean and readable. handleDownloadModel calls the downloadModel function, located in src/api, which accepts three parameters: modelName, downloadUrl, and a progress callback function. This callback is triggered during the download process to update the progress. Before downloading, we need to have the selectedModelFormat state set to the model format we want to download.
Inside the downloadModel function we use the RNFS module, part of the react-native-fs library, to access the device's file system. It allows developers to read, write, and manage files on the device's storage. In this case, the model is stored in the app's Documents folder using RNFS.DocumentDirectoryPath, ensuring that the downloaded file is accessible to the app. The progress bar is updated accordingly to reflect the current download status, and the progress bar component is defined in the components folder.
Let's create src/api/model.ts and copy the code from the src/api/model.ts file in the repo. The logic should be simple to understand. The same goes for the progress bar component in the src/components folder: it's a simple colored View whose width reflects the progress of the download.
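If you want a picture of what that file contains before opening the repo, here is a minimal sketch of a downloadModel function built on RNFS.downloadFile; the actual implementation in src/api/model.ts may differ in its details:

// src/api/model.ts - minimal sketch, the repo version may differ
import RNFS from 'react-native-fs';

export const downloadModel = async (
  modelName: string,
  downloadUrl: string,
  onProgress: (progress: number) => void,
): Promise<string> => {
  // Store the GGUF file in the app's Documents directory
  const destPath = `${RNFS.DocumentDirectoryPath}/${modelName}`;

  const {promise} = RNFS.downloadFile({
    fromUrl: downloadUrl,
    toFile: destPath,
    // Called periodically with bytesWritten / contentLength
    progress: res => {
      const percentage = Math.floor((res.bytesWritten / res.contentLength) * 100);
      onProgress(percentage);
    },
    progressDivider: 5, // throttle how often the progress callback fires
  });

  const result = await promise;
  if (result.statusCode !== 200) {
    throw new Error(`Download failed with status code ${result.statusCode}`);
  }
  return destPath;
};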
Now we need to test the handleDownloadModel function. Let's add a button to the UI to trigger it, and display the progress bar. This will be added under the ScrollView we added before.
Download Model Button
<View style={{ marginTop: 30, marginBottom: 15 }}>
{Object.keys(HF_TO_GGUF).map((format) => (
<TouchableOpacity
key={format}
onPress={() => {
setSelectedModelFormat(format);
}}
>
<Text> {format} </Text>
</TouchableOpacity>
))}
</View>
<Text style={{ marginBottom: 10, color: selectedModelFormat ? 'black' : 'gray' }}>
{selectedModelFormat
? `Selected: ${selectedModelFormat}`
: 'Please select a model format before downloading'}
</Text>
<TouchableOpacity
onPress={() => {
handleDownloadModel("Llama-3.2-1B-Instruct-Q2_K.gguf");
}}
>
<Text>Download Model</Text>
</TouchableOpacity>
{isDownloading && <ProgressBar progress={progress} />}
In the UI we show a list of the supported model formats and a button to download the model. When the user chooses a model format and clicks the button, the progress bar should be displayed and the download should start. In this test we hardcoded the model to download, Llama-3.2-1B-Instruct-Q2_K.gguf, so we need to select Llama-3.2-1B-Instruct as the model format for the function to work. We should have something like:

Note: For the whole code until now, you can check the second_checkpoint branch in the EdgeLLMBasic folder here.
Model Loading and Initialization
Next, we will implement a function to load the downloaded model into a Llama context, as detailed in the llama.rn documentation available here. If a context is already present, we will release it, set the context to null, and reset the conversation to its initial state. Subsequently, we will utilize the initLlama function to load the model into a new context and update our state with the newly initialized context.
Model Loading Function
import {initLlama, releaseAllLlama} from 'llama.rn';
import RNFS from 'react-native-fs'; // File system module
...
const loadModel = async (modelName: string) => {
try {
const destPath = `${RNFS.DocumentDirectoryPath}/${modelName}`;
// Ensure the model file exists before attempting to load it
const fileExists = await RNFS.exists(destPath);
if (!fileExists) {
Alert.alert('Error Loading Model', 'The model file does not exist.');
return false;
}
if (context) {
await releaseAllLlama();
setContext(null);
setConversation(INITIAL_CONVERSATION);
}
const llamaContext = await initLlama({
model: destPath,
use_mlock: true,
n_ctx: 2048,
n_gpu_layers: 1
});
console.log("llamaContext", llamaContext);
setContext(llamaContext);
return true;
} catch (error) {
Alert.alert('Error Loading Model', error instanceof Error ? error.message : 'An unknown error occurred.');
return false;
}
};
We need to call the loadModel function when the user clicks the download button, so we add it inside the handleDownloadModel function right after the download completes successfully.
// inside the handleDownloadModel function, just after the download is complete
if (destPath) {
await loadModel(file);
}
To test model loading, let's add a console.log inside the loadModel function to print the context, so we can see if the model is loaded correctly. We keep the UI the same as before, because clicking on the download button triggers the handleDownloadModel function, and the loadModel function is called inside it. To see the console.log output we need to open the Developer Tools; for that, press j in the terminal where we ran npm start. If everything is working correctly, we should see the context printed in the console.
Note: For the whole code until now, you can check the third_checkpoint branch in the EdgeLLMBasic folder here.
Chat Implementation
With the model now loaded into our context, we can proceed to implement the conversation logic. We'll define a function called handleSendMessage, which will be triggered when the user submits their input. This function will update the conversation state and send the updated conversation to the model via context.completion. The response from the model will then be used to further update the conversation, which means the conversation is updated twice in this function.
Chat Function
const handleSendMessage = async () => {
// Check if context is loaded and user input is valid
if (!context) {
Alert.alert('Model Not Loaded', 'Please load the model first.');
return;
}
if (!userInput.trim()) {
Alert.alert('Input Error', 'Please enter a message.');
return;
}
const newConversation: Message[] = [
// ... is a spread operator that spreads the previous conversation array to which we add the new user message
...conversation,
{role: 'user', content: userInput},
];
setIsGenerating(true);
// Update conversation state and clear user input
setConversation(newConversation);
setUserInput('');
try {
// we define the list of stop words for all the model formats
const stopWords = [
'</s>',
'<|end|>',
'user:',
'assistant:',
'<|im_end|>',
'<|eot_id|>',
'<|end▁of▁sentence|>',
];
// now that we have the new conversation with the user message, we can send it to the model
const result = await context.completion({
messages: newConversation,
n_predict: 10000,
stop: stopWords,
});
// Ensure the result has text before updating the conversation
if (result && result.text) {
setConversation(prev => [
...prev,
{role: 'assistant', content: result.text.trim()},
]);
} else {
throw new Error('No response from the model.');
}
} catch (error) {
// Handle errors during inference
Alert.alert(
'Error During Inference',
error instanceof Error ? error.message : 'An unknown error occurred.',
);
} finally {
setIsGenerating(false);
}
};
To test the handleSendMessage function, we need to add a text input field and a button to the UI to trigger it, and we will display the conversation in the ScrollView component.
Simple Chat UI
<View
style={{
flexDirection: "row",
alignItems: "center",
marginVertical: 10,
marginHorizontal: 10,
}}
>
<TextInput
style={{flex: 1, borderWidth: 1}}
value={userInput}
onChangeText={setUserInput}
placeholder="Type your message here..."
/>
<TouchableOpacity
onPress={handleSendMessage}
style={{backgroundColor: "#007AFF"}}
>
<Text style={{ color: "white" }}>Send</Text>
</TouchableOpacity>
</View>
<ScrollView>
{conversation.map((msg, index) => (
<Text style={{marginVertical: 10}} key={index}>{msg.content}</Text>
))}
</ScrollView>
If everything is implemented correctly, we should be able to send messages to the model and see the conversation in the ScrollView component. It's not beautiful of course, but it's a good start; we will improve the UI later.
The result should look like this:

Note: For the whole code until now, you can check the fourth_checkpoint branch in the EdgeLLMBasic folder here.
The UI & Logic
Now that we have the core functionality implemented, we can focus on the UI. The UI is straightforward, consisting of a model selection screen with a list of models and a chat interface that includes a conversation history and a user input field. During the model download phase, a progress bar is displayed. We intentionally avoid adding many screens to keep the app simple and focused on its core functionality. To keep track of which part of the app is being used, we will use another state variable called currentPage; it is a string that can be either modelSelection or conversation. We add it to the App.tsx file.
const [currentPage, setCurrentPage] = useState<
'modelSelection' | 'conversation'
>('modelSelection'); // Navigation state
For the styling we will use the same styles as in the EdgeLLMBasic repo; you can copy them from there.
We will start by working on the model selection screen in the App.tsx file by adding a list of model formats (you need to do the necessary imports and delete the previous test code inside the SafeAreaView component):
Model Selection UI
<SafeAreaView style={styles.container}>
<ScrollView contentContainerStyle={styles.scrollView}>
<Text style={styles.title}>Llama Chat</Text>
{/* Model Selection Section */}
{currentPage === 'modelSelection' && (
<View style={styles.card}>
<Text style={styles.subtitle}>Choose a model format</Text>
{modelFormats.map(format => (
<TouchableOpacity
key={format.label}
style={[
styles.button,
selectedModelFormat === format.label && styles.selectedButton,
]}
onPress={() => handleFormatSelection(format.label)}>
<Text style={styles.buttonText}>{format.label}</Text>
</TouchableOpacity>
))}
</View>
)}
</ScrollView>
</SafeAreaView>
We use SafeAreaView to ensure that the app is displayed correctly on devices with different screen sizes and orientations, as we did in the previous section, and we use ScrollView to allow the user to scroll through the model formats. We use modelFormats.map to map over the modelFormats array and display each model format as a button whose style changes when it is selected. We also use the currentPage state to display the model selection screen only when currentPage is set to modelSelection; this is done with the && operator. The TouchableOpacity component allows the user to select a model format by pressing on it.
Now let's define handleFormatSelection in the App.tsx file:
const handleFormatSelection = (format: string) => {
setSelectedModelFormat(format);
setAvailableGGUFs([]); // Clear any previous list
fetchAvailableGGUFs(format);
};
We store the selected model format in the state and clear the previous list of GGUF files from other selections, and then we fetch the new list of GGUF files for the selected format. The screen should look like this on your device:

Next, let's add the view that shows the list of GGUF files available for the selected model format. We will add it below the model format selection section.
Available GGUF Files UI
{
selectedModelFormat && (
<View>
<Text style={styles.subtitle}>Select a .gguf file</Text>
{availableGGUFs.map((file, index) => (
<TouchableOpacity
key={index}
style={[
styles.button,
selectedGGUF === file && styles.selectedButton,
]}
onPress={() => handleGGUFSelection(file)}>
<Text style={styles.buttonTextGGUF}>{file}</Text>
</TouchableOpacity>
))}
</View>
)
}
We only show the list of GGUF files if the selectedModelFormat state is not null, which means a model format has been selected by the user.

We need to define handleGGUFSelection in the App.tsx file as a function that triggers an alert to confirm the download of the selected GGUF file. If the user clicks Yes, the download starts; otherwise the selected GGUF file is cleared.
Confirm Download Alert
const handleGGUFSelection = (file: string) => {
setSelectedGGUF(file);
Alert.alert(
'Confirm Download',
`Do you want to download ${file}?`,
[
{
text: 'No',
onPress: () => setSelectedGGUF(null),
style: 'cancel',
},
{text: 'Yes', onPress: () => handleDownloadAndNavigate(file)},
],
{cancelable: false},
);
};
const handleDownloadAndNavigate = async (file: string) => {
await handleDownloadModel(file);
setCurrentPage('conversation'); // Navigate to conversation after download
};
handleDownloadAndNavigate is a simple function that downloads the selected GGUF file by calling handleDownloadModel (implemented in the previous sections) and navigates to the conversation screen after the download is complete.
Now, after clicking on a GGUF file, we should see an alert to confirm or cancel the download:

We can add a simple ActivityIndicator to the view to display a loading state while the available GGUF files are being fetched. For that we need to import ActivityIndicator from react-native and define isFetching as a boolean state variable that is set to true at the start of the fetchAvailableGGUFs function and back to false when the function finishes, as you can see here in the code. We then add the ActivityIndicator to the view just before {availableGGUFs.map((file, index) => (...))} to show the loading state while the files are being fetched:
{isFetching && (
<ActivityIndicator size="small" color="#2563EB" />
)}
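For reference, here is a minimal sketch of the isFetching state (the repo's version may differ slightly): we simply flip it around the fetch call with a try/finally.

const [isFetching, setIsFetching] = useState<boolean>(false);

const fetchAvailableGGUFs = async (modelFormat: string) => {
  setIsFetching(true); // show the ActivityIndicator
  try {
    // ... the fetching logic shown earlier ...
  } finally {
    setIsFetching(false); // hide the indicator whether the request succeeded or failed
  }
};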
The app should look like this for a brief moment when the GGUF files are being fetched:

Now we should be able to see the different GGUF files available for each model format when we click on it, and we should see the confirmation alert asking whether we want to download the model when clicking on a GGUF file.
Next, we need to add the progress bar to the model selection screen. We can do that by importing the ProgressBar component from src/components/ProgressBar.tsx in the App.tsx file as we did before, and adding it to the view just after {availableGGUFs.map((file, index) => (...))} to display the progress bar while the model is being downloaded.
Download Progress Bar
{isDownloading && (
  <View style={styles.card}>
    <Text style={styles.subtitle}>Downloading: </Text>
    <Text style={styles.subtitle2}>{selectedGGUF}</Text>
    <ProgressBar progress={progress} />
  </View>
)}
The download progress bar will now be positioned at the bottom of the model selection screen. However, this means that users may need to scroll down to view it. To address this, we will modify the display logic so that the model selection screen is only shown when the currentPage state is set to 'modelSelection' and there is no ongoing model download.
{currentPage === 'modelSelection' && !isDownloading && (
<View style={styles.card}>
<Text style={styles.subtitle}>Choose a model format</Text>
...
After confirming a model download we should have a screen like this:

Note: For the whole code until now, you can check the fifth_checkpoint branch in the EdgeLLMBasic folder here.
Now that we have the model selection screen, we can start working on the conversation screen with the chat interface. This screen will be displayed when currentPage is set to conversation. We will add a conversation history and a user input field to the screen. The conversation history will be displayed in a scrollable view, and the user input field will be displayed at the bottom of the screen, outside the scrollable view so it stays visible. Each message will be displayed in a different color depending on the role of the message (user or assistant).
We need to add just under the model selection screen the view for the conversation screen:
Conversation UI
{currentPage == 'conversation' && !isDownloading && (
<View style={styles.chatContainer}>
<Text style={styles.greetingText}>
🦙 Welcome! The Llama is ready to chat. Ask away! 🎉
</Text>
{conversation.slice(1).map((msg, index) => (
<View key={index} style={styles.messageWrapper}>
<View
style={[
styles.messageBubble,
msg.role === 'user'
? styles.userBubble
: styles.llamaBubble,
]}>
<Text
style={[
styles.messageText,
msg.role === 'user' && styles.userMessageText,
]}>
{msg.content}
</Text>
</View>
</View>
))}
</View>
)}
We use different styles for the user messages and the model messages, and we use conversation.slice(1) to remove the first message from the conversation, which is the system message.
We can now add the user input field at the bottom of the screen and the send button (they should not be inside the ScrollView). As mentioned before, we will use the handleSendMessage function to send the user message to the model and update the conversation state with the model response.
Send Button & Input Field
{currentPage === 'conversation' && (
<View style={styles.inputContainer}>
<TextInput
style={styles.input}
placeholder="Type your message..."
placeholderTextColor="#94A3B8"
value={userInput}
onChangeText={setUserInput}
/>
<View style={styles.buttonRow}>
<TouchableOpacity
style={styles.sendButton}
onPress={handleSendMessage}
disabled={isGenerating}>
<Text style={styles.buttonText}>
{isGenerating ? 'Generating...' : 'Send'}
</Text>
</TouchableOpacity>
</View>
</View>
)}
When the user clicks on the send button, the handleSendMessage function is called and the isGenerating state is set to true. The send button is then disabled and its text changes to 'Generating...'. When the model finishes generating the response, the isGenerating state is set back to false and the text changes back to 'Send'.
Note: For the whole code until now, you can check the main branch in the EdgeLLMBasic folder here.
The conversation page should now look like this:

Congratulations, you've just built the core functionality of your first AI chatbot; the code is available here! You can now start adding more features to the app to make it more user-friendly and efficient.
The Other Functionalities
The app is now fully functional: you can download a model, select a GGUF file, and chat with the model, but the user experience is not the best yet. In the EdgeLLMPlus repo, I've added some other features, like on-the-fly generation, automatic scrolling, inference speed tracking, displaying the thought process of models like DeepSeek R1 Distill Qwen 1.5B, and more. We will not go into full detail here, as that would make the blog too long, but we will go through some of the ideas and how to implement them; the whole code is available in the repo.
Generation on the fly
The app generates responses incrementally, producing one token at a time rather than delivering the entire response in a single batch. This approach enhances the user experience, allowing users to begin reading the response as it is being formed. We achieve this by utilizing a callback function within context.completion, which is triggered after each token is generated, enabling us to update the conversation state accordingly.
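Here is a minimal sketch of what this can look like, assuming llama.rn's completion accepts a per-token callback as its second argument (as described in its documentation) and that an empty assistant message is appended to the conversation right before calling completion; the exact code in EdgeLLMPlus may differ:

// Push an empty assistant message first, then fill it token by token
setConversation(prev => [...prev, {role: 'assistant', content: ''}]);

const result = await context.completion(
  {
    messages: newConversation,
    n_predict: 10000,
    stop: stopWords,
  },
  (data: {token: string}) => {
    // Called for every generated token: append it to the last assistant message
    setConversation(prev => {
      const updated = [...prev];
      const last = updated[updated.length - 1];
      updated[updated.length - 1] = {...last, content: last.content + data.token};
      return updated;
    });
  },
);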
Auto Scrolling
Auto scrolling ensures that the latest messages or tokens are always visible to the user by automatically scrolling the chat view to the bottom as new content is added. To implement it, we use a reference to the ScrollView to allow programmatic control over the scroll position, and we use the scrollToEnd method to scroll to the bottom of the ScrollView when a new message is added to the conversation state. We also define an autoScrollEnabled state variable that is set to false when the user scrolls up more than 100px from the bottom of the ScrollView.
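A minimal sketch of this logic (names like scrollViewRef and handleScroll are illustrative; the repo may organize it differently):

import {useEffect, useRef} from 'react';
import {NativeScrollEvent, NativeSyntheticEvent, ScrollView} from 'react-native';

const scrollViewRef = useRef<ScrollView>(null);
const [autoScrollEnabled, setAutoScrollEnabled] = useState<boolean>(true);

// Disable auto-scroll when the user scrolls up more than 100px from the bottom
const handleScroll = (event: NativeSyntheticEvent<NativeScrollEvent>) => {
  const {layoutMeasurement, contentOffset, contentSize} = event.nativeEvent;
  const distanceFromBottom =
    contentSize.height - (contentOffset.y + layoutMeasurement.height);
  setAutoScrollEnabled(distanceFromBottom < 100);
};

// Scroll to the bottom whenever the conversation changes, if auto-scroll is on
useEffect(() => {
  if (autoScrollEnabled) {
    scrollViewRef.current?.scrollToEnd({animated: true});
  }
}, [conversation, autoScrollEnabled]);

// Usage: <ScrollView ref={scrollViewRef} onScroll={handleScroll} scrollEventThrottle={16}>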
Inference Speed Tracking
Inference speed tracking tracks the time taken to generate each token and displays it under each message generated by the model. This feature is easy to implement because the CompletionResult object returned by the context.completion function contains a timings property, a dictionary with many metrics about the inference process. We can use the predicted_per_second metric to track the speed of the model.
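As a small sketch, reading the metric right after a (non-streaming) completion call could look like this; where exactly the value is stored for display is up to the UI:

const result = await context.completion({
  messages: newConversation,
  n_predict: 10000,
  stop: stopWords,
});
// timings is a dictionary of inference metrics returned by the completion;
// predicted_per_second is the generation speed in tokens per second
const tokensPerSecond = result.timings?.predicted_per_second;
console.log(`Generated at ${tokensPerSecond?.toFixed(1)} tokens/s`);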
Thought Process
The thought process is a feature that displays the reasoning of models like DeepSeek R1 Distill Qwen 1.5B. The app identifies the special tokens that wrap the reasoning and adds thought and showThought properties to the Message type. message.thought stores the reasoning of the model, and message.showThought is a boolean that is set to true when the user clicks on the message, toggling the visibility of the thought.
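A minimal sketch of how the reasoning could be separated from the final answer, assuming the model wraps it in <think>...</think> tags as DeepSeek R1 distills do (the helper name parseThought is illustrative, not the repo's exact code):

// Split a response into the model's reasoning and the visible answer
const parseThought = (raw: string): {thought: string; content: string} => {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) {
    return {thought: '', content: raw.trim()};
  }
  return {
    thought: match[1].trim(),
    content: raw.replace(match[0], '').trim(),
  };
};

// Usage once generation is finished:
// const {thought, content} = parseThought(result.text);
// setConversation(prev => [...prev, {role: 'assistant', content, thought, showThought: false}]);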
Markdown Rendering
The app uses the react-native-markdown-display package to render markdown in the conversation. This package allows us to render code in a better format.
Model Management
We added a checkDownloadedModels function to the App.tsx file that checks whether a model is already downloaded on the device: if it is not, we download it; if it is, we load it into the context directly. We also added some elements to the UI to show whether a model is already downloaded.
Stop/Back Buttons
We added two important buttons to the UI: the stop button and the back button. The stop button stops the generation of the response, and the back button navigates back to the model selection screen. For that, we added a handleStopGeneration function to the App.tsx file that stops the generation by calling context.stop and sets the isGenerating state to false. We also added a handleBack function that navigates back to the model selection screen by setting the currentPage state to modelSelection.
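A minimal sketch of the two handlers (assuming llama.rn's stopCompletion method is what backs the stop call; the repo's wrapper may look different):

// Stop the current generation and re-enable the send button
const handleStopGeneration = async () => {
  await context?.stopCompletion(); // assumption: llama.rn exposes stopCompletion()
  setIsGenerating(false);
};

// Go back to the model selection screen
const handleBack = () => {
  setCurrentPage('modelSelection');
};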
5. How to Debug
Chrome DevTools Debugging
For debugging we use Chrome DevTools, as in web development:
1. Press j in the Metro bundler terminal to launch Chrome DevTools
2. Navigate to the "Sources" tab
3. Find your source files
4. Set breakpoints by clicking on line numbers
5. Use debugging controls (top right corner):
   - Step Over - Execute current line
   - Step Into - Enter function call
   - Step Out - Exit current function
   - Continue - Run until next breakpoint
Common Debugging Tips
- Console Logging
console.log('Debug value:', someValue);
console.warn('Warning message');
console.error('Error details');
This will log the output in the console of Chrome DevTools
- Metro Bundler Issues: If you encounter issues with the Metro bundler, you can try clearing the cache first:
# Clear Metro bundler cache
npm start -- --reset-cache
- Build Errors
# Clean and rebuild
cd android && ./gradlew clean
cd ios && pod install
6. Additional Features we can add
To enhance the user experience, we can add some features like:
Model Management:
- Allow users to delete models from the device
- Add a feature to delete all downloaded models from the device
- Add a performance tracking feature to the UI to track memory and CPU usage
Model Selection:
- Allow users to search for a model
- Allow users to sort models by name, size, etc.
- Show the model size in the UI
- Add support for VLMs
Chat Interface:
- Display the code in color
- Math Formatting
I'm sure you can think of some really cool features to add to the app, feel free to implement them and share them with the community 🤗
7. Acknowledgments
I would like to thank the following people for reviewing this blog post and providing valuable feedback:
Their expertise and suggestions helped improve the quality and accuracy of this guide.
8. Conclusion
You now have a working React Native app that can:
- Download models from Hugging Face
- Run inference locally
- Provide a smooth chat experience
- Track the model's performance
This implementation serves as a foundation for building more sophisticated AI-powered mobile applications. Remember to consider device capabilities when selecting models and tuning parameters.
Happy coding! 🚀