Troubleshooting guide for common errors encountered during model installation and deployment in production.
Common Errors During Model Installation and Deployment
This troubleshooting guide covers the most frequently encountered errors when installing the Techsolut platform and deploying computer vision models in production. For each problem, we provide a precise diagnosis and step-by-step solutions.
Installation Problems
Error: "Missing Python Dependencies"
Symptoms
- Error messages like ModuleNotFoundError: No module named 'X'
- Installation that interrupts with dependency errors
- Version conflicts between packages
Solutions
- Use the Recommended Virtual Environment

      python -m venv techsolut_env
      source techsolut_env/bin/activate   # Linux/Mac
      techsolut_env\Scripts\activate      # Windows

- Install All Dependencies with requirements.txt

      pip install --upgrade pip setuptools wheel
      pip install -r requirements.txt

- In Case of Version Conflicts
  - Create a clean new virtual environment
  - Install packages in the order specified in the documentation
  - Use the --no-deps option for problematic packages, then install their dependencies manually
- For Issues with Binaries (PyTorch, TensorFlow)
  - Check CUDA compatibility if using a GPU
  - Install the specific version compatible with your hardware:

        pip install torch==X.X.X+cu11X -f https://download.pytorch.org/whl/torch_stable.html
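Before reinstalling anything, it can help to confirm which modules are actually missing from the active environment. The following is a minimal sketch using only the standard library; the function name find_missing_modules and the example module list are illustrative assumptions, not part of the Techsolut platform.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list; replace it with the import names your project needs
    print(find_missing_modules(["torch", "numpy", "flask"]))
```

Run it inside the activated virtual environment. A module reported as missing there but installed system-wide usually means the wrong interpreter is active.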
Error: "CUDA Not Available"
Symptoms
- Messages indicating "CUDA not available" or "No CUDA GPUs are available"
- Very slow performance during training
- Errors when launching GPU tasks
Solutions
- Check NVIDIA Driver Installation

      nvidia-smi

  If this command fails, reinstall the NVIDIA drivers.
- Check CUDA Version

      nvcc --version

  Make sure it is compatible with your PyTorch/TensorFlow version.
- Reinstall PyTorch with the Appropriate CUDA Support

      pip uninstall torch
      pip install torch==X.X.X+cuXXX -f https://download.pytorch.org/whl/torch_stable.html

- Test CUDA Availability

      import torch

      print(torch.cuda.is_available())
      print(torch.cuda.device_count())
      # get_device_name(0) raises when no GPU is visible, so guard the call
      if torch.cuda.is_available():
          print(torch.cuda.get_device_name(0))

- If CUDA Is Not Available on Your Machine
  - Configure Techsolut to use CPU only
  - Or use our remote computing option on our GPU servers
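For the CPU-only fallback, a common pattern in plain PyTorch code is to choose the device once at startup and use it everywhere. The sketch below is generic and does not show how Techsolut itself is configured for CPU; pick_device is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing: a dependency problem, not a CUDA one
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
# With PyTorch installed: model.to(device) and inputs = inputs.to(device)
```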
Error: "Database Connection Failure"
Symptoms
- Error messages like OperationalError: unable to open database file
- Unable to start the application
- Failure during database creation or migration
Solutions
- Check Permissions on the Database Folder

      ls -la /path/to/db/folder
      chmod -R 755 /path/to/db/folder

- Verify Connection Configuration
  - Make sure the information in config.py is correct
  - For PostgreSQL/MySQL, check that the service is active and accessible
- Reset the Database (if possible)

      flask db reset     # Warning: this deletes all existing data
      flask db upgrade

- For Remote Databases
  - Check that the firewall allows connections
  - Test the connection with a standard SQL client
  - Check your host's limitations (quotas, number of connections)
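The message "unable to open database file" is the one SQLite reports when it cannot create or open its file. Assuming the application uses a local SQLite file (an assumption; check config.py), the permission checks above can be sketched in Python as follows. The helper name diagnose_sqlite_path is hypothetical.

```python
import os
import sqlite3


def diagnose_sqlite_path(db_path):
    """Return a list of human-readable problems found for a SQLite file path."""
    problems = []
    folder = os.path.dirname(os.path.abspath(db_path))
    if not os.path.isdir(folder):
        problems.append(f"Folder does not exist: {folder}")
        return problems
    if not os.access(folder, os.W_OK | os.X_OK):
        problems.append(f"Folder is not writable by this user: {folder}")
    if os.path.exists(db_path) and not os.access(db_path, os.R_OK | os.W_OK):
        problems.append(f"Database file is not readable/writable: {db_path}")
    if not problems:
        try:
            # Opening a connection creates the file if it does not exist yet
            sqlite3.connect(db_path).close()
        except sqlite3.OperationalError as exc:
            problems.append(f"SQLite could not open the file: {exc}")
    return problems
```

Run it as the same user that runs the application, since permissions are evaluated per user. An empty list means the path itself is usable and the cause lies elsewhere.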
Deployment Problems
Error: "Insufficient Memory During Inference"
Symptoms
- "CUDA out of memory" errors
- Application that crashes when processing images
- Performance that degrades over time
Solutions
- Reduce Batch Size
  - In config.py, set BATCH_SIZE to a smaller value
  - For the API, limit the number of simultaneous requests
- Optimize the Model for Inference
  - Use model quantization:

        from techsolut.optimization import quantize_model

        quantized_model = quantize_model(model, quantization_type='dynamic')

- Use Memory-Efficient Inference Mode
  - Enable gradient-free mode:

        with torch.no_grad():
            predictions = model(inputs)

  - Use our optimized inference function:

        from techsolut.inference import efficient_predict

        results = efficient_predict(model, data, max_batch_size=4)

- Free GPU Memory Regularly
  - After each inference of a large batch:

        torch.cuda.empty_cache()

  - For long-running services, schedule periodic restarts
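The idea behind capping the batch size can be illustrated without any framework: split the input into fixed-size chunks so that only one chunk is processed at a time. This is a sketch of the general technique, not the implementation of efficient_predict; predict_in_chunks and fake_predict are hypothetical names.

```python
def predict_in_chunks(predict_fn, inputs, max_batch_size=4):
    """Run predict_fn on slices of inputs and concatenate the results."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    results = []
    for start in range(0, len(inputs), max_batch_size):
        chunk = inputs[start:start + max_batch_size]
        # Only one chunk is in flight at a time, which bounds peak memory
        results.extend(predict_fn(chunk))
    return results


# Stand-in for a real model call; with PyTorch, predict_fn would wrap
# model(chunk) in torch.no_grad() and move the outputs to the CPU.
def fake_predict(chunk):
    return [x * 2 for x in chunk]


print(predict_in_chunks(fake_predict, list(range(10)), max_batch_size=4))
```

Smaller chunks lower peak memory at the cost of throughput, so the value is usually tuned against the available GPU memory.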
Error: "High Latency in Production"
Symptoms
- Very slow response time in production
- Acceptable performance in development but not in production
- Request timeouts
Solutions
- Measure and Identify the Bottleneck

      from techsolut.profiling import profile_inference

      profile_results = profile_inference(model, sample_input)
      print(profile_results)

- Optimize Image Preprocessing
  - Use GPU resizing if possible
  - Preprocess images in batches
  - Use our optimized pipeline:

        from techsolut.preprocessing import FastImageProcessor

        processor = FastImageProcessor(device='cuda')

- Implement Request Batching
  - Collect requests for a short interval
  - Process them together rather than individually
  - Use our automatic batching middleware:

        from techsolut.serving import BatchingMiddleware

        app = BatchingMiddleware(app, batch_size=16, timeout=0.1)

- Use Result Caching
  - For repetitive or similar inputs
  - Configure caching in config.py
  - Or use a solution like Redis for distributed caching
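To see which stage of a request dominates the response time, each stage can be timed separately. The sketch below uses only the standard library and is independent of profile_inference; the stage names and the sleep calls are placeholders for real preprocessing and inference work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)


@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def summarize(recorded):
    """Return {stage: (call count, mean seconds)} for each recorded stage."""
    return {s: (len(v), sum(v) / len(v)) for s, v in recorded.items()}


# Hypothetical request handler with placeholder work in each stage
for _ in range(3):
    with timed("preprocess"):
        time.sleep(0.001)
    with timed("inference"):
        time.sleep(0.002)

print(summarize(timings))
```

When timing GPU work in PyTorch, call torch.cuda.synchronize() before leaving the timed block. CUDA kernels run asynchronously, so without it the timer can stop before the GPU has finished.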
Error: "Incompatible Model Format"
Symptoms
- Errors when loading the model in production
- Messages like "Error loading model" or "Unsupported op"
- Inconsistencies between dev/prod results
Solutions
- Check Version Compatibility
  - Make sure PyTorch/TensorFlow versions are identical in development and production
  - Use the same CPU/GPU architecture in dev and prod if possible
- Convert the Model to a Standard Format
  - Export to ONNX for better portability:

        from techsolut.export import convert_to_onnx

        convert_to_onnx(model, 'model.onnx', input_shape=[1, 3, 224, 224])

- Verify Model Integrity
  - Compare checksums of model files
  - Use our verification tool:

        techsolut-cli verify-model path/to/model.pth

- Use the Correct Model Version
  - Check that you're not using a training checkpoint instead of the final model
  - For Techsolut models, use the dedicated export:

        model.export(format='production', optimized=True)
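Comparing checksums of the model file in development and production can be done with the standard library alone. This is a minimal sketch; the helper names file_sha256 and same_file are illustrative, and the file is read in chunks so that large model files do not need to fit in memory.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True when both files have identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Compute the digest on both machines and compare the two strings. A difference means the file was changed or truncated in transit, or that a different export is deployed.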
Integration Problems
Error: "Integration Failure with Existing Systems"
Symptoms
- Errors when exchanging data with other systems
- Format incompatibilities between Techsolut and your systems
- Data synchronization issues
Solutions
- Use Integration Adapters
  - Check our adapter library in /techsolut/integrations/
  - Install the adapter specific to your system:

        pip install techsolut-adapter-erp

- Correctly Configure Webhooks
  - Check webhook URLs and formats
  - Test with our diagnostic tool:

        techsolut-cli test-webhook http://your-system.com/webhook

- Use Compatibility Mode
  - Enable it in integration settings
  - Specify your external system version
  - Use automatic format converters
- Check Integration-Specific Logs
  - Enable detailed logging in config.py
  - Examine /logs/integration.log for specific error messages
  - Use our integration diagnostic tool:

        techsolut-cli diagnose-integration --system=SAP --level=verbose
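A quick structural check of a webhook URL can catch typos before any request is sent. The sketch below is an illustrative pre-check, not a replacement for techsolut-cli test-webhook; check_webhook_url is a hypothetical helper and the rules it applies are assumptions about what a valid webhook URL should look like.

```python
from urllib.parse import urlparse


def check_webhook_url(url):
    """Return a list of problems found in a webhook URL (empty if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"Unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("Missing host name")
    if parsed.scheme == "http":
        problems.append("Plain HTTP: payloads are sent unencrypted")
    return problems


print(check_webhook_url("http://your-system.com/webhook"))
```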
Error: "API Authentication Problems"
Symptoms
- 401 or 403 errors during API calls
- Tokens unexpectedly expiring
- Intermittent authentication issues
Solutions
- Check API Key Configuration
  - Make sure the keys in .env or secrets.yaml are correct
  - Regenerate API keys if necessary via the admin portal
- Correctly Configure Authentication
  - For OAuth2, ensure the flow is correctly implemented
  - Use our authentication helper:

        from techsolut.auth import OAuth2Helper

        auth = OAuth2Helper(client_id, client_secret, redirect_uri)

- Manage Token Renewal
  - Implement automatic token refresh logic
  - Use our token management middleware:

        from techsolut.auth import TokenRefreshMiddleware

        app.wsgi_app = TokenRefreshMiddleware(app.wsgi_app)

- Test End-to-End Authentication
  - Use our API test tool:

        techsolut-cli test-auth --endpoint=prediction --api-key=your_key
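Automatic token refresh usually means caching the token and renewing it shortly before it expires, so that no request is sent with a token about to lapse. The following sketch shows that logic in isolation. It is not how TokenRefreshMiddleware is implemented; RefreshingToken and fetch_token are hypothetical names.

```python
import time


class RefreshingToken:
    """Cache an access token and renew it shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=30, clock=time.monotonic):
        # fetch_token() must return (token, lifetime_in_seconds)
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one if needed."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch_token()
            self._expires_at = self._clock() + lifetime
        return self._token


# Hypothetical fetch function; a real one would call your OAuth2 token endpoint
def fetch_token():
    return "example-token", 3600


token = RefreshingToken(fetch_token)
print(token.get())
```

The safety margin absorbs clock drift and network delay, two common causes of tokens that seem to expire unexpectedly.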
Solutions to Specific Errors
Error: "ImportError: libcudnn.so.X: cannot open shared object file"
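This error indicates that the dynamic loader cannot find the cuDNN (CUDA Deep Neural Network) library, either because it is not installed or because its folder is not on the library search path.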
Solution
    # Check installed CUDA version
    nvcc --version

    # Install the corresponding cuDNN version
    # Download the archive from the NVIDIA website, then extract and copy it:
    tar -xzvf cudnn-X.X-linux-x64-v8.X.X.X.tgz
    sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
    # -P keeps the libcudnn symlinks as symlinks instead of duplicating the files
    sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

    # Update environment variables
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc
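After installation, you can check from Python whether the loader now resolves cuDNN. This is a small sketch using the standard ctypes module; find_cudnn and can_load are hypothetical helper names. The lookup depends on the loader configuration of the current shell, so run it in the same environment as the application.

```python
import ctypes
import ctypes.util


def find_cudnn():
    """Return the cuDNN library name the loader resolves, or None."""
    # find_library searches the usual loader locations for "libcudnn"
    return ctypes.util.find_library("cudnn")


def can_load(library_name):
    """True if the dynamic loader can actually open the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


name = find_cudnn()
print("cuDNN found:", name, "| loadable:", bool(name) and can_load(name))
```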
Error: "RuntimeError: CUDA error: device-side assert triggered"
This often cryptic error typically indicates a problem with input dimensions or indices.
Solution
    import os

    # Make CUDA calls synchronous so the traceback points at the failing operation.
    # Set this before the first CUDA call (ideally before importing torch).
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

    import torch

    # Check input tensor dimensions
    print(f"Input shape: {input_tensor.shape}")

    # Verify class indices are valid
    num_classes = model.module.num_classes if hasattr(model, 'module') else model.num_classes
    if (targets >= num_classes).any():
        raise ValueError(f"Target class indices exceed number of classes ({num_classes})")

    # Check for NaN values
    if torch.isnan(input_tensor).any():
        raise ValueError("Input contains NaN values")
Error: "OSError: Unable to load weights from pytorch checkpoint file"
This error occurs when the structure of the loaded model doesn't match that of the checkpoint.
Solution
    import torch

    # Load the checkpoint once; map_location='cpu' avoids needing the original GPU
    state_dict = torch.load('model_checkpoint.pth', map_location='cpu')

    # Display key differences (compare the unfiltered checkpoint with the model)
    model_keys = set(model.state_dict().keys())
    checkpoint_keys = set(state_dict.keys())
    print(f"Missing keys: {model_keys - checkpoint_keys}")
    print(f"Unexpected keys: {checkpoint_keys - model_keys}")

    # Load with strict=False to ignore non-critical mismatches
    model.load_state_dict(state_dict, strict=False)

    # Or use our model reconciliation tool
    from techsolut.utils import reconcile_state_dict
    fixed_state_dict = reconcile_state_dict(state_dict, model.state_dict())
    model.load_state_dict(fixed_state_dict)
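One frequent cause of key mismatches in PyTorch is a checkpoint saved from a model wrapped in torch.nn.DataParallel or DistributedDataParallel: every key then carries a "module." prefix that the unwrapped model does not expect. Whether reconcile_state_dict handles this case is not stated here, so the sketch below shows the renaming by hand; strip_prefix is a hypothetical helper, and a plain dict stands in for the real checkpoint.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain dict standing in for a real checkpoint loaded with torch.load
checkpoint = {"module.conv1.weight": 1, "module.fc.bias": 2}
print(sorted(strip_prefix(checkpoint)))
```

If the printed missing and unexpected keys differ only by this prefix, renaming the keys is safer than strict=False, which would silently leave those layers uninitialized.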
How to Get Additional Help
If you encounter a problem not covered by this guide:
- Check Detailed Logs
  - Enable debug logging in config.py
  - Examine /logs/debug.log for detailed error messages
- Generate a Diagnostic Report

      techsolut-cli generate-diagnostic-report --output=diagnostic.zip

- Contact Our Technical Support
  - Send the diagnostic report to support@techsolut.fr
  - Include a detailed description of the problem
  - Mention the steps you've already tried
- Check Our Knowledge Base
  - Visit support.techsolut.fr
  - Search for similar issues
  - Check recent updates and release notes
- Join Our Community
  - Ask your questions on our community forum
  - Participate in weekly troubleshooting webinars
  - Check previous discussions on similar problems
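When describing a problem to support or on the forum, it helps to include the exact Python, operating system, and package versions. The sketch below gathers them with the standard library. It is a supplement to the diagnostic report, not a reproduction of it; collect_environment_info and the default package list are assumptions.

```python
import platform
import sys
from importlib import metadata


def collect_environment_info(packages=("torch", "numpy")):
    """Return OS, Python, and installed package versions as a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_environment_info().items():
    print(f"{key}: {value}")
```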